Page 1of78 between thisversion andtheVersion record. Please citethisarticle asdoi:10.1111/1462-2920.13965. through thecopyediting, typesetting, pagination andproofreading process,whichmay lead todifferences This istheauthor manuscriptaccepted forpublicationandhas undergonefullpeerreview buthasnotbeen

amn Astudillo-García Carmen spongesmarine col f ilgcl cecs Uiest o Auckla of University Sciences, Biological of School Associate Professor Michael W. Taylor *Corresponding author: of Queensland, St Lucia, QLD, Australia Würzburg, Germany 7 6 5 4 3 2 1 bacter enigmatic an SAUL, of genomics and Phylogeny Michael W. Taylor

Email: Zealand Maurice Maurice Wilkins for Centre Molecular Biodiscovery, Christian-Albrechts-Universität zu Germ Kiel, Kiel, Chemis of School Ecogenomics, for Centre Australian Institute Julius-von-Sachs II, Botany of Department RD3 Marine Microbiology, GEOMAR Helmholtz foCentre Institute of Science,Marine University of Auckland School of Biological Sciences, AucklaUniversity of

[email protected] Accepted Article 1,7 *

This article isprotected by copyright. All rights reserved. 1,2 Bae . Slaby M. Beate , ; Telephone +64 ; Telephone 9 +64 9232280 3,4 Dvd . Waite W. David , any any nd, Auckland, ZealandNew , , Auckland, ZealandNew for Biological Sciences, University of Würzburg, of University Sciences, Biological for University Auckland, of Zealand New d Piae a 909 Acln 14, New 1142, Auckland 92019, Bag Private nd, try and Molecular Biosciences, The University The Biosciences, Molecular and try r Research,Ocean Germany Kiel, a lnae rqety soitd with associated frequently lineage ial 5 Kitn Bayer Kristina , 3 Ue Hentschel Ute , 3,6 ,

nutrd irognss Wie igecl genom single-cell While microorganisms. uncultured bnat pne ybot aot hc ltl i kno is little which about symbionts sponge abundant aie pne cnan n xrodnr lvl f mi of level extraordinary an contain sponges Marine ORIGINALITY-SIGNIFICANCE STATEMENT eoee gnms rm eti sog-soitd mi sponge-associated certain from genomes recovered eotd rm pne, h s-ald “sponge-associ so-called the sponges, from reported Keywords: marine sponge; symbiont; phylogeny; metag Running Phylogeny title: genomics and of SAUL u a eaaayi o eitn 1S RA ee datas gene rRNA 16S existing of meta-analysis a out genomic analyses, to gain new insights into the pre the into insights new gain to analyses, genomic enigmaticbacterial clade. Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. valence, identity and potential function of this of function potential and identity valence, ets, coupled with extensive phylogenetic and and phylogenetic extensive with coupled ets, td nlsiid ieg” SU) w carried we (SAUL), lineage” unclassified ated n Fr n sc lnae ht s commonly is that lineage such one For wn. crobes, there remain some key lineages of of lineages key some remain there crobes, enome functionalbinning; analyses c ad eaeois prahs have approaches metagenomics and ics crobial diversity, mostly comprised of of comprised mostly diversity, crobial Page 2of78 Page 3of78

ay aie pne cnan es ad ies commu diverse and dense contain sponges marine Many SUMMARY ebr o te sog-soitd nlsiid line unclassified “sponge-associated the of Members pne, e ltl i kon bu tee . these about known is little yet sponges, hlgntc tts f AL A eaaayi o th of meta-analysis A SAUL. of status phylogenetic ecie fr h frt ie n sog smin t symbiont sponge a in time first the for described distribution of this clade and its association with association its and clade this of distribution sequences, revealed that SAUL is a sister clade of of clade sister a is SAUL that revealed sequences, metabolism, with the capacity to degrade multiple s multiple degrade to capacity the with metabolism, phosphate into the cell and to produce and store po store produce and cellto and the into phosphate

analyses, conducted using both 16S rRNA gene-based rRNA 16S both using conducted analyses, Furthermore, weFurthermore,conductedcomprehensivea analysis eosrcin ugse ta SU mmes r aero are members SAUL that suggested reconstruction hsht rsror o te pne ot n deprivat in host sponge the for reservoir phosphate facilitate facilitate establishment the and maintenance of thi eaeoe, eeln nvl nihs no h phys the into insights novel revealing metagenomes, ietl o SU i smitc ih h hs sponge host the with symbiotic is SAUL of lifestyle Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. taxonomically varied sponge hosts. Phylogenetic hosts. sponge varied taxonomically s s relationship. lyphosphate granules, presumably constituting apresumablylyphosphateconstitutinggranules, aalbe ieaue eeld h widespread the revealed literature available e ponge- and algae-derived carbohydrates. We carbohydrates. algae-derived and ponge- of two draft genomes fromofdraft assembledsponge two ee e netgtd h dsrbto and distribution the investigated we Here , and identify symbiont factors which may may which factors symbiont identify and , o pros Or idns ugs ta the that suggest findings Our periods. ion phylogeny and concatenated marker protein marker concatenated and phylogeny he putative genomic capacity to transport transport to capacity genomic putative he age” (SAUL) are frequently recorded from from recorded frequently are (SAUL) age” h cniae hlm “ phylum candidate the i bcei wt fclaie anaerobic facultative with bacteria bic ooy f hs ybot Metabolic symbiont. this of iology iis f soitd microorganisms. associated of nities Latescibacteria ”.

t al. et approximately 12% and 6% of sequences derived from derived sequences of 6% and 12% approximately 03 rmi pol udrto. n sc cae is clade such One understood. poorly remain 2013) t al. et smin” n “yboi” r ue i a ra defi broad a in used are “symbiosis” and “symbiont” subsequent studies revealing its presence in numero in presence its revealing studies subsequent first identified as a symbiont of the tropical spon tropical the of symbiont a as identified first (SAUL) (Schmitt (SAUL) or the candidate phylum “phylum candidate the or “ archaea, viruses and fungi (Taylor virusesarchaea,and fungi Many sponges also host diverse and abundant microbi abundant and diverse host also sponges Many Consequently, other, less prominent members of the of members prominent less other, Consequently, association of two or more organisms, irrespective organisms, more or two of association numerically dominant and/or known to play significa play to known and/or dominant numerically aoaoy utr hs osrie or blt t u to ability our constrained has culture laboratory oa sog boas (Taylor biomass sponge total Van Soest, 2002) and are key components of the bent the of components key are and 2002) Soest, Van omnte cmrs u t 4 dfeet atra p bacterial different 41 to up comprise communities early de definitionBary 1879; Bary, (de Taylor oevr suis f h sog mcoit hv com have microbiota sponge the of studies Moreover, h rclirne f ay o ee ms, sponge-ass most, even or many, of recalcitrance The Marine sponges (phylumMarine INTRODUCTION Ca

. Synechococcus spongiarum” (Erwin and Thacker, 200 Thacker, and (Erwin spongiarum” Synechococcus .

21) and 2012) , , 2012; Simister Simister 2012; , Accepted Article et al.et noia alataAncorina Wiley-Blackwell andSocietyforApplied Microbiology , 2012), initially designated as PAUC34f(Hentschel as designated ,2012),initially This article isprotected by copyright. All rights reserved. t al. et Porifera Poribacteria , 2012a; Thomas Thomas 2012a; , t al. et ) are among the most ancient of the extantancientmetazoan amongmostare of ) the the (Simister (Simister et al.et 20a 20b Hentschel 2007b; 2007a, , ” (Fieseler (Fieseler ” , 2007a; Webster and Thomas, 2016). In thisstudy, In Webster and Thomas,2016). ,2007a; t al. et et al. ge ge , 2013), respectively. A more recent study of 81 81 of study recent more A respectively. 2013), , et al.et t al. et , , 2007a). Theonella swinhoeiTheonella drtn te hsooy f pne symbionts. sponge of physiology the nderstand of the nature of this relationship, following the the following relationship, this of nature the of us sponge (Taylor (Taylor species sponge us nt functional roles, such as the cyanobacterium the as such roles, functional nt so-called “microbial dark matter” (Rinke matter” dark “microbial so-called hos in an array of marine habitats (Bell, 2008). (Bell, habitats marine of array an in hos al communities which constitute up to 35% of 35% to up constitute which communities al , 2004; Siegl Siegl 2004; , ya (Thomas hyla the sponges the 21) Fr xml, AL represented SAUL example, For 2016). , ociated microorganisms to grow in a pure pure a in grow to microorganisms ociated h sog-soitd nlsiid lineage unclassified sponge-associated the iin t rfr ipy o h long-term the to simply refer to nition, 8; Gao 8; ol fcsd n atra ht are that bacteria on focused monly et al. et t al. et et al. et Rhopaloeides odorabileRhopaloeides t al. et , 2014; Burgsdorf 2014; , (Hentschel (Hentschel

et al.et , 2011; Kamke 2011; , 21) Tee symbiont These 2012). , 21) a wl a many as well as 2016), , , 2002),. This clade wasclade This 2002),. , et al. et et al. et , 2007a; Schmitt 2007a; , s s and(Hooper et al.et , 2002), with with 2002), , et al. et the terms the (Simister , 2014). , , 2015) , et al. et , Page 4of78 Page 5of78

ifrn sog seis cmrsn te pne Mic Sponge the comprising species, sponge different AC4) n 2 f hs seis (Thomas species those of 72 in PAUC34f) elce i te sinet f 7% f AL sequenc SAUL of >70% of assignment the in reflected maintain, over time, host-specific andcomplex micr with SAUL-affiliated sequences being identified in identified being sequences SAUL-affiliated with (Simister this sponge affinity, SAUL has also been detected i detected been also has SAUL affinity, sponge this t rltosi wt te ot pne n te preci the and sponge host the with relationship its enables marine sponges to transfer relevant symbionrelevant transfer to sponges marine enables h oiaos sponge oviparous the marine sponges (Hentschel sponges marine Although the SAUL lineage is commonly associated wi associated commonly is lineage SAUL the Although in sponges. (Taylorsoils and sediments neovd Iiily lsiid s mme of member a as classified Initially unresolved. Planctomycetes-Verrucomicrobia-Chlamydiae usqet tde hv asge SU-fiitd se SAUL-affiliated assigned have studies subsequent Deferribacteres al. eod ebrhp n h domain the in membership beyond motivated us to take a detailed look at this enigma this at look detailed a take to us motivated eadn SU casfcto, oehr ih lack a with together classification, SAUL regarding candidate “ genus candidate hlm “phylum have revealed novel insights into the lifestyles of lifestyles the into insights novel revealed have “ Ca.

20a Kamke 2007a; , Synechococcus spongiarum” (Tian spongiarum” Synechococcus Poribacteria Accepted Article al. et 21a, hc rpeet lds f microorganisms of clades represent which 2012a), , (Webster (Webster Candidatus t al.et Wiley-Blackwell andSocietyforApplied Microbiology (Fieseler ” Ectyoplasia ferox ferox Ectyoplasia This article isprotected by copyright. All rights reserved. , 2010; Simister Simister 2010; , et al.et et al.et t al. et Entotheonella” (Lackner Entotheonella” , 2007a, 2013; Thomas 2013; 2007a, , , 2002; Simister Simister 2002; , t al.et 21) a a idpnet ld coey eae to related closely clade independent an as 2011), , Bacteria 20; Siegl 2006; , et al.et (Gloeckner t al.et t al.et , 2014; Burgsdorf 2014; , (PVC) superphylum (Wagner and Horn, 2006; Taylor 2006; Horn, and (Wagner superphylum (PVC) Mnav ad il 21) Ti lc o agreement of lack This 2011). Hill, and (Montalvo 21a, r ee nbe o lsiy uh sequences such classify to unable were or 2012a), , et al.et other sponge symbionts, including the candidate the including symbionts, sponge other 21) Ti aprn afnt fr pne was sponges for affinity apparent This 2016). , tic clade. Genomics studies of individual lineages individual of studies Genomics clade. tic samples from adult, embryo and larval stages of stages larval and embryo adult, from samples obial communities(Schmitt obial n other environments such as seawater, marine seawater, as such environments other n f nweg aot t gnmc capabilities, genomic its about knowledge of Deltaproteobacteria e hlgntc lsiiain f AL remains SAUL of classification phylogenetic se ts from adults to offspring and consequently to to consequently and offspring to adults from ts t al. et t al.et th sponges, to date no studies have examined have studies no date to sponges, th , 2012a). It can also be vertically transmitted, transmitted, vertically be also can It 2012a). , une a pr o either of part as quences et al. et et al. et oim Poet fud AL lble as (labelled SAUL found Project, robiome s o ocle ‘pneseii clusters’ ‘sponge-specific so-called to es 21) Vria mcoil transmission microbial Vertical 2013). , 21; Kamke 2011; , , 2017), the widespread cyanobacteriumwidespread the 2017), , , 2016), albeit at lower abundance thanabundance lower at albeit 2016), , et al.et , 2015) and a sponge-associated a and 2015) , ht a b hgl erce in enriched highly be can that (Hentschel t al.et etal. 21, 04, the 2014), 2013, , Acidobacteria , 2012). ,Despite2012). t al. et 2002), , the or et

SAUL is widespread and abundantin marine sponge ho and identify symbiotic features of SAUL members by members SAUL of features symbiotic identify and o outy ne is hlgntc oiin amongst position phylogenetic its infer robustly to RESULTS ANDDISCUSSION metagenome data. available 16S rRNAgene available16S sequences; sequenc use (ii) clade, across different environments and among diff among and environments different across clade, In this study we aimed to (i) comprehensively descr comprehensively (i) to aimed we study this In bacterial defence systems including CRISPR-Cas, res CRISPR-Cas, including systems defence bacterial (Fan (Fan sulphur-oxidizing 2010; Fan 2010; sponge species ranging from 20.7% to less than 0.00 than less to 20.7% from ranging speciessponge nomto Tbe 1 ad n 3 ifrn pne sp sponge different 93 in and S1) Table Information h goa SU dsrbto i sogs a wide-ra was sponges in distribution SAUL global The symbionts that include an enrichment of proteins in proteins of enrichment an include that symbionts eaeoi aaye o te pne irboa thus microbiota, sponge the of analyses metagenomic in 15 studies (including the recent Sponge Microbio Sponge recent the (including studies 15 in In the absence of a comprehensive evaluation of SAU of evaluationcomprehensive a of absence the In examine those sponge studies which reported the pre the reported which studies sponge those examine

et et al. , 2012; , Horn2012; Accepted Articleal.et , 2012), universal stress proteins such as UspA (Fa UspA as proteinssuch stress universal 2012), , Gammaproteobacterium Wiley-Blackwell andSocietyforApplied Microbiology et al.et This article isprotected by copyright. All rights reserved. , Slaby 2016; et al.et (Tian (Tian , 2017). t al. et triction-modification and toxin-antitoxin systems toxin-antitoxin triction-modificationand es relatedes fromderived and SAUL bacterial phyla atra (i) eemn te eoi potential genomic the determine (iii) bacteria; volved in microbe-host signalling (Thomas signalling microbe-host in volved ibe the distribution and abundance of the SAUL the of abundance and distribution the ibe 1% (Figure 1, Supporting Information Figure S1). Figure InformationSupporting 1, (Figure 1% me Project (Thomas Project me erent sponge species, using a meta-analysis of meta-analysis a using species, sponge erent L occurrence, we performed a meta-analysis to meta-analysis a performed we occurrence,L ce, ih eaie eune bnacs per abundances sequence relative with ecies, sence of SAUL. The SAUL clade was identified was clade SAUL The SAUL. of sence reconstructing genomes from sponge-derived from genomes reconstructing gn, ih t mmes dniid rm the from identified members its with nging, 21) Smlr idns ee band by obtained were findings Similar 2017). , sts eeln cmo faue o sponge of features common revealing n et al.et , 2012) and an abundance of of abundanceandan 2012) , et al.et , 2016)) (Supporting 2016)) , et al. et , , Page 6of78 Page 7of78

SAUL is clade a sister of “ (Supporting Text). Low genomic similarity was obse was similarity genomic Low Text). (Supporting eoi lvl to level genomic S, Rinke WS3, eune aa bt a nt upre b bosrp r bootstrap by supported not was but data, sequence (Figure 1B). was also recorded in seawater and sediment samples sediment and seawater in recorded also was and phylogenetically distant sponge species (Hentsc species sponge distant phylogenetically and n hglgt h smitc oeta ad iey im likely and potential symbiotic the highlight and identified overlapping microbial community members members community microbial overlapping identified hw) Te paet ih bnac o ti clade, this of abundance high apparent The shown). associated microbial communities. This prevalence This communities. microbial associated Pacific, Atlantic and Indian oceans, as well as the as well as oceans, Indian and Atlantic Pacific, aooial dvre pne pce sget high a suggest species sponge diverse taxonomically r nt oohltc (Hug monophyletic not are weaker bootstrap support (75%), and previous resear previous and (75%), support bootstrap weaker and “ and group (Rinke group represents a monophyletic cluster closely related t related closely cluster monophyletic a represents Having demonstrated its widespread presence and num and presence widespread its demonstrated Having we sought to determine the phylogenetic position of position phylogenetic the determine to sought we supported the clustering of SAUL as a monophyletic a as SAUL of clustering the supported near full-length (>1450 bp) 16S rRNA gene sequences gene rRNA 16S bp) (>1450 full-length near 37 markers suggested that SAUL is a sister clade of cladesister a is thatsuggested SAUL markers 37

Latescibacteria Accepted Article t al.et et al. et “Latescibacteria” 21) Fgr 2) Ti rltosi ws lo ob also was relationship This 2A). (Figure 2013) , , 2013; Anantharaman 2013; , Wiley-Blackwell andSocietyforApplied Microbiology ” clustered together with together clustered ” This article isprotected by copyright. All rights reserved. Latescibacteria t al.et 21) Wt te i o rvaig o SU i rel is SAUL how revealing of aim the With 2016). , w ivsiae gnmc iiaiis ewe thes between similarities genomic investigated we , ”

et al.et Fibrobacteres , 2016; Hug 2016; , Red, Mediterranean and Caribbean seas (data not (data seas Caribbean and Mediterranean Red, thephylum“ candidate o the o rved between SAUL and “and SAUL between rved clade (Figure 2). Phylogenomic analysis of up to to up of Phylogenomic analysis 2). (Figure clade is in agreement with several studies that have that studies several with agreement in is hel hel ch has also demonstrated that these five phyla five these that demonstrated also has ch ere f eeaim ihn aie sponges marine within generalism of degree the SAUL lineage. Phylogenetic inference of of inference Phylogenetic lineage. SAUL the in the study by Thomas and colleagues (2016) colleagues and Thomas by study the in and concatenated marker protein sequences protein marker concatenated and otne f AL n h dfeet sponge- different the in SAUL of portance (including SAUL) in geographically separated geographically in SAUL) (including t bod itiuin ad t peec in presence its and distribution, broad its erical abundance in the sponge microbiota, sponge the in abundance erical et al.et Fibrobacteres-Chlorobi-Bacteroidetes smln (iue B. “ 2B). (Figure esampling et al. et , Chlorobi , 2002; Simister Simister 2002; , , 2016). In our analyses, both SAUL both analyses, our In 2016). , and evd o 1S RA gene rRNA 16S for served Latescibacteria Bacteroidetes Latescibacteria et al.et Latescibacteria” , 2012a). SAUL 2012a). , , albeit with albeit , td t the at ated ” (formerly (formerly ” lineages e ”, likely ”, (FCB) (FCB)

phylogenomic analyses, was 80.8%. According to rec to According 80.8%. was analyses, phylogenomic hlm cas n odr ees 7% 7.% n 82%, and 78.5% (75%, levels order and class phylum, previously for the sponge-associated clade “ clade sponge-associated the for previously nossec bten 6 rN gn-ae phylogeny gene-based rRNA 16S between Inconsistency (“ disp the to related lifestyles different reflecting Spotn Ifrain al S) Aeae 6 rRN 16S Average S2). Table Information (Supporting sequence similarity within SAUL and between members between and SAUL within similarity sequence o ute eaut te hlgntc tts f SAUL of status phylogenetic the evaluate further To lse ws 8, n is vrg smlrt wt “ with similarity average its and 88%, was cluster “ AL eune a a oohltc itr ld t “ to clade sister monophyletic a as sequences SAUL environments (Rinke available for andeach SAUL of “ draft genomes with sufficient completeness to be us be to completeness sufficient with genomes draft such approaches are limited by a small number of av of number small a by limited are approaches such phylum level relationships compared with analysis o analysis with compared relationships level phylum ifrn mre poen eune cn rvd high provide can sequences protein marker different is reproducibly a sister clade of “of clade sister a reproducibly is Although our results revealed that SAUL is a lineag a is SAUL that revealed results Although our n ohr lsl rltd lds rvn frhr as further prevent clades related closely other and are available.are consequence, the phylogenetic status of the SAUL cl SAUL the of status phylogenetic the consequence, Latescibacteria” Latescibacteria Accepted Article ” members are typically free-living bacteria found bacteria free-living typically are members ” would represent sister clades, possibly different possibly clades, sister represent would Wiley-Blackwell andSocietyforApplied Microbiology et al.et This article isprotected by copyright. All rights reserved. , 2013; Youssef, 2013; Latescibacteria” Latescibacteria et al.et Poribacteria , 2015)). , 2015)). . . arate environments with which they are associated are they which with environments arate ”, the paucity of near-complete genomes for SAUL for genomes near-complete of paucity the ”, e closely related with the FCB superphylum, and it it FCB superphylum,the and with closely related e Latescibacteria” ade must be revisited once more genomic data genomic more once revisited be must ade f a single marker gene such as 16S rRNA gene, rRNA 16S as such gene marker single a f ailable draft genomes. In our study, only three only study, our In genomes. draft ailable ently suggested threshold sequence criteria for criteria sequence threshold suggested ently ed for phylogenomic tree reconstruction were reconstruction tree phylogenomic for ed sertions from being made confidently. As a a As confidently. made being from sertions w cluae te vrg 1S RA gene rRNA 16S average the calculated we , gn sqec smlrt wti te SAUL the within similarity sequence gene A Latescibacteria” r eouin o rslig nr- n inter- and intra- resolving for resolution er respectively; Yarza Yarza respectively; of SAUL and those of other bacterial phyla bacterial other of those and SAUL of ” (Kamke ” and phylogenomic analysis was observed observed was analysis phylogenomic and et al.et , its closest relative according to to according relative closest its , in terrestrial, aquatic and marine and aquatic terrestrial, in classes within the same phylum. same the within classes . While a concatenation of of concatenation a While . , 2014), which also clustered also which 2014), , t al. et 21) SU and SAUL 2014), , Page 8of78 Page 9of78

for those three clusters, hereafter referred to as to referred hereafter clusters, three those for n diin o eemnn te lcmn o SU wi SAUL of placement the determining to addition In Internal structureof clade the SAUL SAUL clade suggested these clusters may represent d represent may clusters these suggested clade SAUL RA ee eune soe te xsec o tre s three of existence the showed sequences gene rRNA thresholds based on 16S rRNA gene sequence similari sequence gene rRNA 16S on based thresholds 4.5% of SAUL-affiliated sequences were derived from derived were sequences SAUL-affiliated of 4.5% hrceie h itra pyoeei srcue of structure phylogenetic internal the characterise sediment and other marine source-derived sequences sequences source-derived marine other and sediment independently from each other (Figure 3). High boo High 3). (Figure other each from independently 2. ot AL nqe ersnaie eune are sequences representative unique SAUL Most S2). biofilms. Origins varied when evaluating the three the evaluating when varied Origins biofilms. lse wt te ihs rpeetto o sponge-d of representation highest the with cluster of the sequences being derived from marine sponges sponges marine from derived being sequences the of h 1S RA ee eune drvd rm h to ot two the from derived sequences gene rRNA 16S The of the SAUL metagenome bins described below is also is below described bins metagenome SAUL the of identified SAUL sequence (clone PAUC34f, AF186412). PAUC34f,(clone sequence SAUL identified seby f eaeoe aa from data metagenome of Assembly Functionalpotential and symbiont characteristics o bin_aplysina_2) are inI. Cluster reconstruction of two near-complete draft genomes, draft near-complete two of reconstruction the identification of 104 markers (Parks markers 104 of identification the b fr i_pyia n bnptoi, respectively bin_petrosia, and bin_aplysina for Mbp Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. et al.et pyia aerophobaAplysina , 2015)), and an estimated genome size of 6.3 and 4 and 6.3 of size genome estimated an and 2015)), , Clusters I, II and III. The application of taxonom of application The III. and II I, Clusters clusters individually, with 62.2%, 87.5% and 42% and 87.5% 62.2%, with individually, clusters hs ieg. hlgntc re bsd n 16S on based trees Phylogenetic lineage. this f SAUL SAUL f populationrevealed by genome binning rvd eune (lse I) otis h first the contains II) (Cluster sequences erived with 90.3% and 86.5% completeness (based on (based completeness 86.5% and 90.3% with Tbe ) A hr bn bnalsn_) also (bin_aplysina_2), bin third A 1). (Table tstrap scores (>80%) supported the branching the supported (>80%) scores tstrap ty to interpret the internal architecture of the of architecture internal the interpret to ty A 16S rRNA gene sequence derived from one from derived sequence gene rRNA 16S A istinct families (Supporting Information Table Table Information (Supporting families istinct non-marine sources, primarily freshwater or freshwater primarily sources, non-marine thin the bacterial tree of life, we sought to to sought we life, of tree bacterial the thin included in the same cluster (bin_petrosia). cluster same the in included comprising a smaller fraction (24.7%). Only (24.7%). fraction smaller a comprising o Cutr I I ad I, epciey The respectively. III, and II I, Clusters for sponge derived (70.8%), with seawater, seawater, with (70.8%), derived sponge brus f AL eune clustering sequences SAUL of ubgroups and e SU bn (i_pyia and (bin_aplysina bins SAUL her ersa ficiformisPetrosia e t the to led .7 ic ic

hshrlto, ee dniid n t es oe of one least at in identified were phosphorylation, metabolism, including enzymes encoding the high aff high the encoding enzymes includingmetabolism, ihr n o bt SU bn. e lo eetd gen detected also We bins. SAUL both or one either nye ivle i te pae n/r eaoim of metabolism and/or uptake the in involved enzymes regulon (Figure 4, Box (A)), as well as the enzyme the as well as (A)), Box 4, (Figure regulon cd yl, lclss pnoe hsht pathway, phosphate pentose glycolysis, cycle, acid algae-derived carbohydrates. Genes involved in maj in involved Genes carbohydrates. algae-derived ht hs ld my e novd n hshrs seque phosphorus in involved be may clade this that auttv aarbc eaoim psesn as t also possessing metabolism, anaerobic facultative iegn pneseis (Zhang species sponge divergent reconstruction suggested binsmetagenome oftwo the bacterial cells has been described previously in th previously bacterialdescribed cells in been has paet ak f pcfc nye/rtis hud b should enzymes/proteins specific of lack apparent conversion into the polyphosphate (polyP) storage f (polyP)storage polyphosphate the conversioninto aaiiis f AL Fgr 4. t s motn t important is It 4). (Figure SAUL of capabilities u t lw opeees 3.2) Fntoa annot Functional (39.42%). completeness low to due (bin_petrosia) genes enabled a comprehensive analys comprehensive a enabled genes (bin_petrosia) constructed from Both genomes encoded enzymes involved in the produc the in involved enzymes encoded genomes Both polyP productiongranules has been identified anin yiiie, s el s ercmlt replication, near-complete as well as pyrimidines, bins (more detailed discussion of specific aspects aspects specific of discussion detailed (more bins n oiaie hshrlto, s el s substrate as well as phosphorylation, oxidative and Additionally, the genomic machinery to release and release tomachinery genomic Additionally,the information transfer canmachinery SAUL in genomes Accepted Article Aplysina aerophobaAplysina Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. et al. et metagenomic data, was not used for further genomic forfurther not used was metagenomicdata, , 2015), this is the first time that the genomic po genomic the that time first the is this 2015), , e associated communities of three phylogenetically communities associatedof three e o note that, due to genome incompleteness, any any incompleteness, genome to due that, note o polyphosphate kinase ( kinase polyphosphate actual sponge associate. actual orm. Although the presence of polyPin granules presence of Although the orm. conserve energy via the electron transport chaintransport electron the via energy conserve the bins. Moreover, genes encoding several several encoding genes Moreover, bins. the inity phosphate transporter and control of PHO PHO control of and transporter phosphateinity lvl hshrlto, a rvae fr both for revealed was phosphorylation, -level e aaiy o erd mlil sog- and sponge- multiple degrade to capacity he rncitoa ad rnltoa machineries. translational and transcriptional be foundbe Supporting in Text). or central pathways, such as the tricarboxylic the as such pathways, central or is of the metabolic potential and biosynthetic and potential metabolic the of is that SAUL members are membersSAULaerobic are that with bacteria tain rm h evrnet n is later its and environment the from stration f eta mtbls, isnhss and biosynthesis metabolism, central of irgn n slht wr ietfe in identified were sulphate and nitrogen tion of amino acids, vitamins, purines and purines vitamins, acids, amino of tion odLugal aha ad oxidative and pathway Wood-Ljungdahl itrrtd ih ato. Metabolic caution. with interpreted e to o 492 bnalsn) n 3,711 and (bin_aplysina) 4,932 of ation s novd n hsht tasot and transport phosphate in involved es ppk , EC 2.7.4.1), suggesting 2.7.4.1), EC , tential for tential analyses analyses Page 10of78 Page 11of78

which, including which, y pne fr rtcin gis peaos r epib or predators against protection for sponges by Production of biologically active secondary metabol secondary active biologically of Production Secondary metabolite biosynthesis many compounds remains controversial (Hentschel controversial remains compounds many peptide synthetases (NRPS) (Fischbach and Walsh, 20 Walsh, and (Fischbach (NRPS) synthetases peptide eaoie ae rdcd y oyeie ytae (P synthases polyketide by produced are metabolites (Fieseler (Fieseler eeld h peec o svrl eodr metaboli secondary several of presence the revealed BlastP search conducted with the PKSs identified in identified PKSs the with conducted search BlastP eoe, ye PS wr ietfe (iue , Box 4, (Figure identified were PKSs I Type genomes, pne ybot bqios ye PS Sp identif (Sup) PKS I Type ubiquitous symbiont sponge and others by associated microorganisms (Piel microorganisms associated by others and Host-microbe recognition systems eukaryoticthrough environment. irognss rsn i tee pne my ed to lead may sponges these in present microorganisms microbe-microbe interactions (Thomas interactions microbe-microbe either other symbionts present in the community or or community the in present symbionts other either antimicrobial properties may be used as a defence s defence a as used be may properties antimicrobial PKS modules and related proteins (COG3321). Furthe (COG3321). proteins related and modules PKS 2003), whichcontains 2003), also Wilson

et al. et t al.et Accepted Article , 2014; Flórez 2014; , , 2007). That same study identified the PKS in 10 10 in PKS the identified study same That 2007). , T. swinhoei T. Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. et al.et A.aerophoba , belonged to the “high-microbial-abundance” group group “high-microbial-abundance” the to belonged , , 2015). Both SAUL metagenome bins contained genes contained bins metagenome SAUL Both 2015). , et al.et and and , 2016). The production of secondary metabolites w metabolites secondary of production The 2016). , P. ficiformis P. et al. et both bins showed >60% sequence similarity to a to similarity sequence >60% showed bins both et al.et ites is an important defence mechanism utilised mechanism defence important an is ites -like -like proteins trategy by some sponge symbionts to confront to symbionts sponge some by trategy , 2004; Taylor Taylor 2004; , foreign microorganisms that enter the sponge the enter that microorganisms foreign r analysis with antiSMASH (Weber (Weber antiSMASH with analysis r (B)), as well as several putative clusters. A A clusters. putative several as well as (B)), 06; Newman and Cragg, 2012). The origin of origin The 2012). Cragg, and Newman 06; e isnhss ee lses I bt SAUL both In clusters. gene biosynthesis te e in ied S, any ye PS ad non-ribosomal and PKS, I Type mainly KS), (Gloeckner , 2012), with some produced by the sponge the byproduced some with 2012), , ionts (Pawlik, 2011). Many secondary secondary Many 2011). (Pawlik, ionts nes (n nt eesrl positive) necessarily not (and intense hoel swinhoei Theonella additional sponge species, all of of all species, sponge additional et al.et et al. et , 2014). The abundance of of abundanceThe ,2014). , 2007a; Sala Sala 2007a; , (cosmid pSW1H8) pSW1H8) (cosmid (Hentschel encoding for for encoding et al. et et al. et , 2014; 2014; , , 2015), et al.et ith ,

Burgsdorf (Habyarimana Adaptive symbiosis factors such as eukaryotic-like eukaryotic-like as such factors symbiosis Adaptive mrvn atcmn t te uaytc ot n avo and host eukaryotic the to attachment improving attention due to their presumed medipresumed involvementdueintheirto attention ankyrin (ANK), tetratricopeptide (TPR) and leucine- and (TPR) tetratricopeptide (ANK), ankyrin Cerveny symbiont communities (Liu communities symbiont soitd irba cmuiis ae dniid su identified have communities microbial associated presumed importance in symbiont recognition, we inv we recognition, symbiont in importance presumed encoded ANKs (annotated as COG0666, PF00023), TPRs TPRs PF00023), COG0666, as (annotated ANKs encoded pne ybot ehbt eitne ehnss desi mechanisms resistance exhibit symbionts Sponge Evidence SAUL adaptationfor host to conditions sponge host. Ls n AL is Sreig f O ad FM datab PFAM and COG of Screening bins. SAUL in ELPs irboa tde, n sget ter otltd i postulated their suggests and studies, microbiota in bin_aplysina. The finding of these ELPs within ELPs these of finding The bin_aplysina. in (COG4886). Genes encoding for another ELP, WD40 re WD40 ELP, another for encoding Genes (COG4886). conditions that generally lead to stress. In this this In stress. to lead generally that conditions microbial communities have revealed an enrichment o enrichment an revealed have communities microbial (Hansen (Hansen nldn te rsne f niirba cmons ( compounds antimicrobial of presence the including et al. et 2008; Fan 2008;

, 2016) and sedimentation (Luter (Luter sedimentation and 2016) ,

et al.et t al. et et al.et et al. et Accepted Article , 1995; Webster 1995; , et al. et 21; enls n Toa, 06. eet (meta)g Recent 2016). Thomas, and Reynolds 2013; , , 2015). Due to their ubiquity in sponge-associate in ubiquity their to Due 2015). , , 2012; Liu 2012; , , 2008; Al-Khodor 2008; , Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. et al.et et al.et et al.et , 2012) that may help symbionts cope with environme with cope symbionts help may that 2012) , , 2011, 2012; Fan 2012; 2011, , , 2001), and changes in temperature (Fan temperature in changes and 2001), , et al. et et al. et , 2010; Liu 2010; , , 2012). The SAUL clade possesses genomic signatur genomic possesses clade SAUL The 2012). , SAUL genomes is consistent with previous sponge sponge previous with consistent is genomes SAUL proteins (ELPs) in bacterial symbionts, particularl symbionts, bacterial in (ELPs) proteins et al.et rich (LRR) repeat proteins, have attracted much attracted have proteins, repeat (LRR) rich otx, eoi suis f sponge-associated of studies genomic context, ating host-microbeating and recognition interaction, et al.et prac fr ybot eonto b the by recognition symbiont for mportance Piel, 2009), bioaccumulation of heavy metals metals heavy of bioaccumulation 2009), Piel, f stress-related proteins (López-Legentil proteins stress-related f estigated the presence of genes encoding for for encoding genes of presence the estigated peat proteins (PF00400), were also identified also were (PF00400), proteins peat nd o pcfcly drs cags n host in changes address specifically to gned , 2012; Kamke 2012; , h atr a big ieped n these in widespread being as factors ch dne f h hs’ imn response immune host’s the of idance ss eeld ht oh AL genomes SAUL both that revealed ases , 2011; Siegl Siegl 2011; , CG00 P055 P079, n LRR and PF07719), PF00515, (COG5010, d microbial communities and their and communities microbial d et al.et et al.et nmc tde o sponge- of studies enomic , 2014; Tian Tian 2014; , , 2011; Fan Fan 2011; , et al.et , 2013), pH (Ribes pH 2013), , ntal stressors, ntal et al.et et al.et , 2014; , , 2012; , et al. et es es y , Page 12of78 Page 13of78

i_pyia noe te nye lttin peroxid glutathione enzyme the encoded bin_aplysina oeue ohdoe eoieadmlclr oxygen molecular and peroxide hydrogen to molecules O05) n mnaee ueoie imts (MnSOD, dismutase superoxide manganese and COG0450) ehnss gis oiaie tes Tee enzymes These stress. oxidative against mechanisms is an enzyme member of the superoxide dismutase fam dismutase superoxide the of member enzyme an is “ AphC and glutathione peroxidase reduce organic and and organic reduce peroxidase glutathione and AphC eoi oprsn fSU is ihtodat gen draft two with bins SAUL of comparisons genomic rE (S6, O05) mmrn poess fC (CO HflC proteases membrane COG0459), (HSP60, GroEL also protecting cells from reactive nitrogen interm nitrogen reactive from cells protecting also dniid ihn AL eoe. hs icue alky include These genomes. SAUL within identified nye ivle i cl dfne gis oxidative against defence cell in involved enzymes can act as free radical ion scavengers (Wortham scavengers ion radical free as act can denatured and/or damaged proteins were also identif also were proteins damaged and/or denatured cadaverine were identified in SAUL bins. Polyamine bins. SAUL in identified were cadaverine “ as putrescine, spermidine (the most common polyamid common most (the spermidine putrescine, as Furthermore, genes encoding for the complex PotABCD complex the for encoding genes Furthermore, ybot my e novd n dpain o h host the to adaptation in involved be may symbionts genomicdetailed comparison). noig rtis uh s sA uiesl tes pr stress (universal UspA as such proteins encoding train o epsr t hay eas Nsrm an (Nyström metals heavy to exposure or starvation, related to adaptation to the microenvironment of th of microenvironment the to adaptation to related yteie i rsos t evrnetl tes suc stress environmental to response in synthesised Latescibacteria Latescibacteria Accepted Article , niae te paet bec o Up, s well as UspA, of absence apparent the indicated ”, , uprig h nto ta tee etrs common features these that notion the supporting ”, Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. ediates (Chen (Chen ediates et al.et s play an important role in acid resistance and resistance acid in role important an play s tes nue b te pne ot ee also were host sponge the by induced stress ied in SAUL bins, including chaperone proteins proteins chaperone including bins, SAUL in ied e host sponge, with both bins carrying genes carrying bins both with sponge, host e lipid peroxides, respectively, with the former the with respectively, peroxides, lipid l hydroperoxide reductase subunit C (AphC, (AphC, C subunit reductase hydroperoxide l otein A) (annotated as COG0589), which is is which COG0589), as (annotated A) otein , 2007). Proteins involved in elimination of elimination in involved Proteins 2007). , a ha ado omtc hc, nutrient shock, osmotic and/or heat as h s (C .116 CG36. h enzymes The COG0386). 1.11.1.6, (EC ase omes of the closest relative, the free-living the relative, closest the of omes , involved in the uptake of polyamines such such polyamines of uptake the in involved , (McCord and Fridovich, 1969). Moreover, 1969). Fridovich, and (McCord ily, which is one of the cell’s major defence major cell’s the of one is which ily, es in bacteria (Wortham bacteria in es niomn (e Spotn Tx fr a for Text Supporting (see environment aaye h cneso o superoxide of conversion the catalyse Nihrt 19; Kvint 1994; Neidhardt, d 03) n Da (O04) Several (COG0443). DnaK and G0330) C .511 CG65. Only COG0605). 1.15.1.1, EC et al.et , 1998). Furthermore, MnSOD Furthermore, 1998). , s pC n MSD from MnSOD, and AphC as y dniid n sponge in identified ly et al. et t al. et , 2007)), and 2007)), , , 2003). 2003). ,

aiiae b mbl gntc lmns uh s tran as such elements genetic mobile by facilitated irbs aatto t ete seii nce o t or niches specific either to adaptation microbes, n ekroe (eln ad amr 20; Wiedenbec 2008; Palmer, and (Keeling eukaryotes and Horizontal gene transfer plays an important role in role important an plays transfer gene Horizontal Horizontal gene transfer and defencemechanisms etito-oiiain ytm ae osdrd bac considered are systems Restriction-modification (COG3344, PF00078) and integrases (PF00665). transposases (COG1943, transposases COG3415,retro COG3328), and exchange with non-symbiont and/or pathogen microorg pathogen and/or non-symbiont with exchange oiotl N ecag bten pne ybot bu symbionts sponge between exchange DNA horizontal al. 02 Ae ad nue, 05. AL is noe f encoded bins SAUL 2015). Antunes, and Alex 2012; ybot ht anan n srnte is interacti its strengthen and maintain that symbiont for genetic exchange and rearrangement. This could This rearrangement. and exchange genetic for abundance of restriction-modification systems was f was systems restriction-modification of abundance restriction-modi PF04851) and (COG2189 III Type and PF12161 and PF01420), Type II (COG0270, COG1743, CO COG1743, (COG0270, II Type PF01420), and PF12161 transposable elements within SAUL genomes likely co likely genomes SAUL within elements transposable “ also identified in SAUL bins. These included Type Type included These bins. SAUL in identified also corresponding HGTancestral to events (R≤0.4). on i SU gnms 7 n 5 o bnalsn and bin_aplysina for 5 and (7 genomes SAUL in found COG0338) COG0338) systems were identified in of theonly one transferred genes using HGT-Finder (NguyenHGT-Finderusing genes transferred To further investigate the magnitude of HGT events events HGT of magnitude the investigate further To

Latescibacteria , 2016; Slaby 2016; , Accepted Article et al.et . n hs ae CG icue i Tp I (COG0286) I Type in included COGs case, this In ”. Wiley-Blackwell andSocietyforApplied Microbiology , 2017). COGs including specific DNA modification DNA specific including COGs 2017). , This article isprotected by copyright. All rights reserved. et al.et , 2015). Very few putative transferred genes were genes transferred putative few Very ,2015). cags n niomna cniin cn be can conditions environmental in changes o two “ two dpainadeouin fbt prokaryotes both of evolution and adaptation ound when investigating the draft genomes of genomes draft the investigating when ound within SAUL genomes, we identified candidate identified we genomes, SAUL within sposons, plasmids and prophages (Fan (Fan prophages and plasmids sposons, nfers upon these microorganisms the capacity the microorganisms these upon nfers (O08, O01, O 49, PF02384, 4096, COG COG0610, (COG0286, I allow for the acquisition of functions by the the by functions of acquisition the for allow eil eec sses ht a facilitate may that systems defence terial r rnpsbe neto eeet sc as such elements insertion transposable or n ih h hs. codnl, lower a Accordingly, host. the with on id elements containing reverse transcriptase containingreverse id elements ad oa, 01. n sponge-associated In 2011). Cohan, and k iain ytm. h peec o these of presence The systems. fication anisms (Vasu and Nagaraja, 2013; Horn 2013; Nagaraja, and (Vasu anisms G0863, COG0338, COG4489 and PF00145) and COG4489 COG0338, G0863, Latescibacteria a te ae ie rtc aant DNA against protect time same the at t bin_petrosia, respectively), likely likely respectively), bin_petrosia, n Tp I (O06 and (COG0863 II Type and and restriction systems were systems restriction and ” SAGs(WS3_E07).” t al.et et , Page 14of78 Page 15of78

CRISPR-Cas systems are heritable and adaptive immun adaptive and heritable are systems CRISPR-Cas n mn bcei. hs sses r cmrsd of comprised are systems These bacteria. many and spacers toused are and cleavetarget invading DNA h CIP lcs f h hs gnm; n te interf the and genome; host the of locus CRISPR the involving incorporation of small fragments of forei of fragments small of incorporation involving omnte (Thomas communities nrdcin f oeg DA no hi chromosomes their into DNA foreign of introduction noprtd no hi gnms ytm t effectiv to systems genomes their into incorporated large amount of viral particles and phages. Member phages. and particles viral of amount large Due to the intense pumping activity of the host spo host the of activity pumping intense the to Due nvra Cs ad a2 ti sse icue prote included system this Cas2, and Cas1 universal RSRascae (a) rtis tu frig two forming thus proteins, (Cas) CRISPR-associated Information Table S3b). By contrast, bin_petrosia contrast, By S3b). Table Information ein f 13 t ossig f 8 pcr ein s regions spacer 68 of consisting nt 4173 of region NODE_3759 (Supporting Information Table S3). NODE_ S3). Table Information (Supporting NODE_3759 hrceitc f utp IC Spotn Informati (Supporting I-C subtype of characteristic no Cas proteins were identified. As a consequence, a As identified. were proteins Cas no of the CRISPR region, and included the universal Ca universal the included and region, CRISPR the of Screening of SAUL genomes revealed six CRISPR regio CRISPR six revealed genomes SAUL of Screening consisting of 101 spacer regions separated by a rep a by separated regions spacer 101 of consisting assessed. Pairwise comparisons of CRISPR spacer re spacer CRISPR of comparisons Pairwise assessed. and NODE_3759, respectively, registered hits in pla in hits registered respectively, NODE_3759, and targets of the spacers were mainly unknown targets, unknown mainlywere spacers the of targets the SAUL members representing each bin are exposed exposed are bin each representing members SAUL the rtis Cs (Makarova (Cas) proteins context, clustered, regularly interspaced, short, p short, interspaced, regularly clustered, context, Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology t al.et This article isprotected by copyright. All rights reserved. t al. et , 2010; Fan Fan 2010; , 21) r cmol erce i sponge-associated in enriched commonly are 2011) , t al. et 21; Burgsdorf 2012; , alindromic repeats (CRISPRs) and their associated their and (CRISPRs) repeats alindromic gn DNA into an array of spacer sequences within sequences spacer of array an into DNA gn only contained one confirmed CRISPR region and region CRISPR confirmed one contained only s1 and Cas2, as well as other associated proteins associated other as well as Cas2, and s1 (Deveau (Deveau eat of 37 nt. Cas proteins were found upstream found were proteins Cas nt. 37 of eat nge, associated microbes are likely exposed to a to exposed likely are microbes associated nge, gions identified in SAUL genomes indicated that that indicated genomes SAUL in identified gions smids. No spacer had hits in known phages or phages known in hits had spacer No smids. n al Sa. oevr ND_79 a a has NODE_3759 Moreover, S3a). Table on although six and one spacers from NODE_2846 from spacers one and six although prtd y 2 n rpa. pr fo the from Apart repeat. nt 29 a by eparated the functionality of this system could not be be not could system this of functionality the s of sponge microbial communities have thus have communities microbial sponge of s ins characteristic of subtype I-E (Supporting (Supporting I-E subtype of characteristic ins (Fan e systems that are encoded by most archaea archaea most by encoded are that systems e ns in bin_aplysina. Of these, two contained two these, Of bin_aplysina. in ns to different types of foreign DNA. Potential DNA. foreign of types different to rne tg, hr te eety acquired recently the where stage, erence l poet hmevs n mnms the minimise and themselves protect ely 2846 presents a region of 7304 nucleotides 7304 of region a presents 2846 w mi sae: h aatto stage, adaptation the stages: main two RSRCs ytm: OE24 and NODE_2846 systems: CRISPR-Cas t al.et et al.et , 2010; Makarova, 2010; 21; Horn 2012; , t al.et 21; Horn 2015; , t al.et et al.et 21) n this In 2016). , t al.et , 2011). microbial , 2016). 2016). ,

genomes, suggesting exposure lower a i potential to SAUL SAUL potential hassponge- the degrade andto algae evaluate SAUL’s putative capacity for carbohydrate for capacity putative SAUL’s evaluate carbohydrate-active enzymes (CAZymes) using the CAZ the using (CAZymes) enzymes carbohydrate-active Supporting Text for more detailed information on de on information detailed more for Text Supporting Both SAUL bins present multiple enzymes involved in involved enzymes multiple present bins SAUL Both iue. oevr n cnimd RSR ein wer regions CRISPR confirmed no Moreover, viruses. Proteins assigned to this family in SAUL bins were were bins SAUL in family this to assigned Proteins a te eod ot bnat aiy n oh AL bi SAUL both in family abundant most second the was dehydrogenase,predictedoxidoreductasesoras dehy annotated as sialidase (EC 3.2.1.18), an enzyme tha enzyme an 3.2.1.18), (EC sialidase as annotated acid residues, which are present in marinesponges present areacid residues, which nomto Tbe 4. vrl, 4 ifrn G fa GH different 24 Overall, S4). Table Information (GT) and, to a lesser extent, polysaccharide lyases polysaccharide extent, lesser a to and, (GT) Information Table S5). The most abundant GHThefamily abundant most S5). InformationTable en ecie a αNaeyglcoaiiae E 3 (EC α-N-acetylgalactosaminidase as described been enzyme include glycolipids, glycopeptides and glyco and glycopeptides glycolipids, include enzyme sponge mesohyl and as dissolved organic matter in s in matter organic dissolved as and mesohyl sponge 94, n frs irfbis n re age uh a such algae green in microfibrils forms and 1964), rnia cmoet f h cl wl seeo i ce in skeleton wall cell the of component principal oyacaie mnas glcoann ad glucoma and galactomannans mannans, polysaccharides ...8. hs nye yrlss h (1->4)-beta- the hydrolyses enzyme This 3.2.1.78). let o lse etn, a te aiy H1, wh GH113, family the was extent, lesser a to albeit al.

, 2012). SAUL bins were rich in genes encoding gly encoding genes in rich were bins SAUL 2012). , Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. (PL) and carbohydrate esterases (CE) (Supporting (CE) esterases carbohydrate and (PL) (Garrone nvading nvading DNA their in environment. t hydrolyses glycosidic linkages of terminal sialic terminal of linkages glycosidic hydrolyses t iis ee eetd n AL is (Supporting bins SAUL in detected were milies proteins, compounds typically found within the within found typically compounds proteins, degradation, SAUL genomes were screened for screened were genomes SAUL degradation, -derived carbohydrates rtain species of algae (Frei and Preston, 1961, 1961, Preston, and (Frei algae of species rtain eawater (Genin eawater identified was GH109, activitywhichthehas was of identified dicated sugar catabolic pathways). To further To pathways). catabolic sugar dicated drogenases and related proteins. FamilyGH33proteins. related drogenases and coside hydrolases (GH), glycoside transferases transferases glycoside (GH), hydrolases coside the utilisation of diverse carbon sources (see sources carbon diverse of utilisation the s s -ansdc ikgs n h soae plant storage the in linkages D-mannosidic c cnan te nye -anns (EC β-mannanase enzyme the contains ich Y database in the web server dbCAN (Yin (Yin dbCAN server web the in database Y ietfe i ay f h “ the of any in identified e oim fragileCodium 214) Pyilgcl usrts o this for substrates Physiological .2.1.49). nn. ann elcs ells a the as cellulose replaces Mannan nnans. s SU poen i ti fml were family this in proteins SAUL ns. oty noae a moioio 2- myo-inositol as annotated mostly et al.et , 1971). Also present in SAUL bins,SAUL in present Also , 1971). et al.et and , 2004; Blunt 2004; ,

ctblra crenulataAcetabularia Latescibacteria et al.et , 2017). , et ”

Page 16of78 Page 17of78

eeld eoi caatrsis ht r commonly are that characteristics genomic revealed related clades prevents further assertions from bei fromassertions further clades prevents related microorganisms, which may facilitate establishment facilitate may which microorganisms, hs smiss atr icue n paet abundan apparent an include factors symbiosis These Meta-analysis available of SAUL 16S rRNA sequegene EXPERIMENTAL PROCEDURES defence mechanismsas such CRISPR-Cas. h mt-nlss ok no con toe tde p studies those account into took meta-analysis The irboe n hc SU ado PU3f ee identi were PAUC34f and/or SAUL which in microbiome microbial abundance (HMA) sponge hosts, though can though hosts, sponge (HMA) abundance microbial We have demonstrated here that the SAUL lineage is is lineage SAUL the that here demonstrated have We Concluding remarks SAUL lineage close to the FCB superphylum and as a a as and superphylum FCB the to close lineage SAUL abundance (LMA) sponges and other non-sponge habita non-sponge other and sponges (LMA) abundance “ carbohydrates (Kamke either by the sponge host taking the compounds up d up compounds the taking host sponge the by either identified in SAUL bins. These algae-derived compo algae-derived These bins. SAUL in identified ybot “symbiont uh s ellss edguaae, C ..., GH5 3.2.1.4, EC (endoglucanases, cellulases as such by-product of the sponge feeding on algae. Similar algae. on feeding sponge the of by-product (Mackie and Preston, 1968). Moreover, members of G of members Moreover, 1968). Preston, and (Mackie Latescibacteria Accepted ArticlePoribacteria . oee, h puiy f ercmlt gnms f genomes near-complete of paucity the However, ”. Wiley-Blackwell andSocietyforApplied Microbiology et al.et rvae a ope sie f ee rltd o the to related genes of suite complex a revealed ” This article isprotected by copyright. All rights reserved. , , 2013). ng made confidently. Extensive genomic analyses Extensive confidently. made ng unds may be made available for SAUL utilisation SAUL for available made be may unds ly, genomic analyses of the widespread sponge widespread the of analyses genomic ly, and maintenance of the symbiotic relationship. relationship. symbiotic the of maintenance and nces irectly from the surrounding seawater or as a as or seawater surrounding the from irectly ad eaguoiae (H1) wr also were (GH116), beta-glucosidases and ) also occur in lower numbers in low microbial low in numbers lower in occur also H families involved in cellulose degradation, cellulose in involved families H ts. The available data collected here set the set here collected data available The ts. bihd p o uy 06 n h sponge the on 2016 July to up ublished e f Ls uiesl tes rtis and proteins stress universal ELPs, of ce ieped ad fe audn, n high in abundant, often and widespread, id n epiil mnind (Supporting mentioned explicitly and fied sister clade of the candidate phylum phylum candidate the of clade sister ecie fr sponge-associated for described r AL n ohr closely other and SAUL or erdto o several of degradation

using the SINA Web Aligner (Pruesse (Pruesse Aligner Web SINA the using utilised in the study are listed in Supporting Info Supporting in listed are study the in utilised eie fo to AL eaeoe (e blw. Acc below). (see metagenomes SAUL two from derived eetd eune. s f a 21, l SAUL-affil all 2016, May of As sequences. selected eBn wr rtivd n icue i or analyses our in included and retrieved were GenBank ee eand ad pyoeei te ws construc was tree phylogenetic a and retained, were against the GenBank nr/nt database. The 100 best h best 100 The database. GenBanknr/nt the against neatv to i AB Tes ee osrce in constructed were Trees ARB. in tool Interactive eune wr sbeunl add ihu changing without added subsequently were sequences correction) and maximum likelihood (RAxML) to (RAxML)asses likelihood correction) maximum to and belonging to the distantly related clades related distantly the to belonging ih eune rpeetn coey eae pya ( phyla related closely representing sequences with dfut, TGMA n GRA. h otru fr t for outgroup The GTRCAT. and GTRGAMMA (default), aiu lklho tes ee osrce uig thr using constructed were trees likelihood Maximum analyses were carried out on near-full length (≥145 length near-full on out carried were analyses aaae ad motd no R fr ute manual further for ARB into imported and database, were downloaded and relative abundance was calculat was abundance relative and downloaded were (Simister og 6 rN gn sqecs ≥20 p previously bp) (≥1200 sequences gene rRNA 16S Long Deciphering SAUL phylogeny either orSAUL per PAUC34f sponge species. sponge species was noted for each study. When this When study. each for noted was species sponge nomto Tbe 1. hr psil, h relativ the possible, Where S1). Table Information Accepted Article al. et 21a wr ue a rfrne eune t conduc to sequences reference as used were 2012a) , Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. t al. et Thermotogae 20) mre wt te IV 19 S Rf R 99 NR Ref SSU 119 SILVA the with merged 2007), , rmation Table S6. SAUL sequences were aligned were sequences SAUL S6. Table rmation 0 bp) SAUL 16S rRNA gene sequences together sequences gene rRNA 16S SAUL bp) 0 its with >85% sequence identity for each search each for identity sequence >85% withits s the robustness of the constructedphylogeny. the robustness the of s information was not available, sequence data data sequence available, not was information sqec audne f AL n different in SAUL of abundance sequence e ae 1S RA ee eune aalbe in available sequences gene rRNA 16S iated R uig egbu-onn (Jukes-Cantor neighbour-joining using ARB and a wl a to 6 rN gn sequences gene rRNA 16S two as well as , as per (Simister (Simister per as e t cnim h SU aflain f the of affiliation SAUL the confirm to ted ed as percentage of sequences assigned to to assigned sequences of percentage as ed ee different distribution models, GTRMIX GTRMIX models, distribution different ee uain f h ainet Phylogenetic alignment. the of curation sin ubr fr og AL sequences SAUL long for numbers ession lsiid s en aflae wt SAUL with affiliated being as classified Aquificae e cluain opie sequences comprised calculation ree re oooy sn te Parsimony the using topology tree . Sequence conservation filters filters conservation Sequence . a etnie LS search BLAST extensive an t t al. et 21a) Shorter 2012a)). , Page 18of78 Page 19of78

6 rN gn sqec smlrt wti te AL c SAUL the within similarity sequence gene rRNA 16S xet ht cnevto fle ws o apid i applied not was filter conservation a that except osrce uig h sm mtoooy ecie ea described methodology same the using constructed construction, to help decipher the phylogeny of a g a of phylogeny the decipher help to construction, A custom data set of 37 different marker protein se protein marker different 37 of set data custom A Distance tool.Matrix considered in the phylogenetic analysis, was calcul was analysis, phylogenetic the in considered sequence positionsalignment the in analysis. bin_aplysina) are analysed in detail in this study this in detail in analysed are bin_aplysina) used to conduct a phylogenomic analysis (Rinke analysis phylogenomic a conduct to used this analysis were obtained from three unpublished unpublished three from obtained were analysis this vrg sqec smlrt aog lds a be u been has clades among similarity sequence Average (50%) were applied for tree construction (Ludwig (Ludwig construction tree for applied were (50%) W3, addt dvso OP3, division candidate (WS3), To investigate sub-clusters within the SAUL lineage SAUL the within sub-clusters investigate To with resamplings.500 ( eune (15 b) n a ag o sqecs from sequences of range a and bp) (≥1450 sequences a isfiinl cmlt fr ul eoe analysi genome full for complete insufficiently was 3% oeae f h poen eune n evle < e-value and sequence protein the of coverage >30% identified in a single genome, only the best match best the only genome, single a in identified WAG+Gammabootstrapping model, wasand calculated w (Castresana, 2000) and the markers were concatenate were markers the and 2000) (Castresana, their respective reference alignment using HMMER (v HMMER usingalignment reference respective their Planctomycetes Accepted Article , Verrucomicrobia Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. Firmicutes , Chlamydiae n cniae iiin R1. hlgntc re w trees Phylogenetic BRC1). division candidate and

(see Results and Discussion section), while the thi the while section), Discussion and Results (see et al.et , ated by applying the similarity option of the ARB the of option similarity the applying by ated et al.et , we selected long SAUL-affiliated 16S rRNA gene rRNA 16S SAUL-affiliated long selected we , was retained. Homologues were then aligned to aligned then were Homologues retained. was Lentisphaerae quences (Supporting Information Table S7) was was S7) Table Information (Supporting quences . ah akr ee a ietfe, requiring identified, was gene marker Each s. iven microorganism (Yarza microorganism iven draft genomes: two of these (bin_petrosia and (bin_petrosia these of two genomes: draft n order to include the maximum number of of number maximum the include to order n 3.1b2). Alignments were cleaned with Gblocks with cleaned were Alignments 3.1b2). , 2013). SAUL gene sequences employed for for employed sequences gene SAUL 2013). , d. A tree was then built in RAxML, using the using RAxML, in built then was tree A d. ae ad ewe SU ad te clades other and SAUL between and lade, , 1998), and bootstrap analyses were done were analyses bootstrap and 1998), , .0. hr mlil hmlge were homologues multiple Where 0.001. le fr 6 rN gn-ae phylogeny, gene-based rRNA 16S for rlier ith 100 resamplings.100 ith sed, in addition to phylogenetic tree tree phylogenetic to addition in sed, eea bceil hl a outgroup as phyla bacterial several “ , Poribacteria” et al. et , Latescibacteria , 2014). Thus, 2014). , ere ere rd rd

Samples of the Mediterranean sponges Mediterranean the of Samples Sponge samplecollection metagenomeand sequencing genes was conducted with nhmmer (Wheeler and Eddy, and (Wheeler nhmmer with conducted was genes odcig n msac aant dtbs o 14 e 104 of database a against hmmsearch an conducting bp were filtered out. Raw Illumina data for for data Illumina Raw out. filtered were bp 2015). Target bins were identified by a BLASTn sea BLASTn a by identified were bins Target 2015). bin_aplysina and bin_aplysina_2 (both derived from derived (both bin_aplysina_2 and bin_aplysina against a BLAST database of the rRNA genes ofrRNA databaseidentifi BLAST a against the construct a phylogenetic tree with previously ident previously with tree phylogenetic a construct off of 85%. To confirm their affiliation to the SA the to affiliation their confirm To 85%. of off manually n ter eaeoe wr sqecd n assembled and sequenced were metagenomes their and Slaby .. a dfut etns (Alneberg settings default at 0.4.0 EDN: ILMNCI::01) (Bolger ILLUMINACLIP:2:30:10) LEADING:3 egh n abgos ae. ed wr timd wit trimmed were Reads bases. ambiguous and length .12 (http://www.bioinformatics.babraham.ac.uk/pr 0.11.2 binning process as described in Slaby in described as process binning maxk (Peng -100) house python script mkBinFasta.py ( mkBinFasta.py script python house Trimmomatic for further quality trimming and length and trimming quality further for Trimmomatic (https://sourceforge.net/projects/bbmap/). Merged Merged (https://sourceforge.net/projects/bbmap/). MINLENGTH:150 AVGQUAL:30). The remaining reads wer reads remaining The AVGQUAL:30). MINLENGTH:150 The metagenomes from metagenomes The

et al. et via

Accepted Articlefrom obtained reads sequence Raw 2017). , a previously published R pipeline (Albertsen pipeline R published previously a et et al. Wiley-Blackwell andSocietyforApplied Microbiology , 2012). ,Contigs 2012). shorter bpthan were discard1000 This article isprotected by copyright. All rights reserved. A. aerophoba A. t al.et https://github.com/bslaby/scripts/ et al. et and Petrosia ficiformisPetrosia 21) wt peaain f h cvrg tbe fo tables coverage the of preparation with 2014), , (2017). A fasta file for each bin was created wit created was bin each for file fasta A (2017). P. ficiformisP. A. aerophobaA. t al. et UL clade, bin-derived sequences were then used to to used then were sequences bin-derived clade, UL ed in the metagenomic bins, withcut-identity in edan metagenomic the bins, rch of 86 known SAUL 16S rRNA gene sequences gene rRNA 16S SAUL known 86 of rch ified SAUL sequences. The identified SAUL bins SAUL identified The sequences. SAUL ified . aerophobaA. filtering (SE –phred 33 SLIDINGWINDOW:4:25 33 –phred (SE filtering and unmerged reads were again subjected to subjected again were reads unmerged and jcsfsq/ fr dpes oeal quality, overall adapters, for ojects/fastqc/) were binned using the software CONCOCT v. CONCOCT software the using binned were 21) hn egd sn bbmerge using merged then 2014) , 2013). Bin completeness was estimated by estimated was completeness Bin 2013). snil ee uig hcM (Parks CheckM using genes ssential et al. et e assembled with IDBA-UD 1.1.1 (-mink 10(-mink 1.1.1 IDBA-UD with assembled e P. ficiformisP. are deposited under JGI’s GOLD study ID ID study GOLD JGI’s under deposited are and o peiu suis (Horn studies previous for Timmtc .1 P –he 33 –phred (PR 0.31 Trimmomatic h , 2013) and contigs shorter than 2000 than shorter contigsand 2013) , Aplysina aerophobaAplysina ), and bin_petrosia, were refined were bin_petrosia, and ), were inspected using FastQC FastQC using inspected were ). The identification of rRNA of identification The ). ed. were collected were t al.et h the in- the h , 2016; 2016; , t al.et the r , Page 20of78 Page 21of78

subsystems, followed by manual checking. Clusters Clusters checking. manual by followed subsystems, Overbeek MKWU00000000. Sequence data for data Sequence MKWU00000000. s094. h asml dt ae eoie i GenB in deposited are data assembly The Gs0099546. umte t RS, h SE-ae poaytc genom prokaryotic SEED-based the RAST, to submitted The draft genomes of the two most complete SAUL bin SAUL complete most two the of genomes draft The WGS:LXNJ00000000). ne te iPoet RN385 ad h BioSample the and PRJNA318959 BioProject the under 2003) were annotated using rpsBLAST (v. 2.2.15),(v.wh usingrpsBLAST annotated were 2003) al. KEGG (Kanehisa (Kanehisa KEGG Additionally, SAUL predicted genes were submitted t submitted were genes predicted SAUL Additionally, through the WebMGA (Wu WebMGA the through genes (Kanehisa (Kanehisa genes identified by searching translated protein sequence protein translated searching by identified HMMER 3, and results were filtered using an e-value an using filtered were results and 3, HMMER were manually evaluated with SEED annotation and ex and annotation SEED with evaluated manually were genomes were searched using blastn (v +2.6.0) again +2.6.0) searchedblastngenomes(v using were was subjected to HGT-Finder ( HGT-Finder to subjected was candidates (Nguyen candidates(Nguyen oh ofre ad addt CIPs ee identifie were CRISPRs candidate and confirmed Both annotation, and classified as described previously described as classified and annotation, used for further analysis. Furthermore, CRISPR-ass Furthermore, analysis. further for used were identified using CRISPRFinder (Grissa CRISPRFinder using identified were the confirmed CRISPR spacer sequences were conducte were sequences spacer CRISPR confirmed the Putative targets of spacers were identified with CR with identified were spacers of targets Putative

20) rti fmle wr ietfe wt HMMER with identified were families protein 2003) , Accepted Articleal.et , 2014), for automated open reading frames (ORF) pr (ORF) frames reading open automated for 2014), , et al.et et al. et Wiley-Blackwell andSocietyforApplied Microbiology et al.et , 2004) annotation by GHOSTX searches against a non a against searches GHOSTX by annotation 2004) , , 2016). Carbohydrate active enzymes (CAZymes) (ht (CAZymes) enzymes active Carbohydrate 2016). , This article isprotected by copyright. All rights reserved. , 2015). Clusteredpal regularly short interspaced ,2015). et al. et R threshold ranging from 0.2 to 0.9, 0.9, to 0.2 from ranging threshold , 2011) function annotation tool, with an e-value c e-value an with tool, annotation function 2011) , P. ficiformisP. et al.et , 2008) in CRISPRs web server with default settings default with server web CRISPRs in 2008) , are deposited in the Sequence Read Archive (SRA) Archive Read Sequence the in deposited are (Makarova ociated proteins (Cas) were identified using SEED using identified were (Cas) proteins ociated ISPRTarget (Biswas ISPRTarget s against dbCAN HMMs (Yin HMMs dbCAN against s st the GenBank NR database. The BLAST output BLAST database. The GenBankstthe NR ile Pfam (Finn (Finn Pfam ile o GhostKOALA automatic annotation server for for server annotation automatic GhostKOALA o cut-off of 0.00001. Additionally, all CAZy hits CAZy all Additionally, 0.00001. of cut-off of orthologous groups (COGs) (Tatusov (COGs) groups orthologous of , u ol cnimd RSR ein were regions CRISPR confirmed only but d, cluded when results were conflicting. SAUL conflicting. were results when cluded s (bin_aplysina and bin_petrosia) were then were bin_petrosia) and (bin_aplysina s ih RSRopr (Grissa CRISPRcompar with d antto sre (Aziz server annotation e .. l anttos ee conducted were annotations All 3.0. et al.et n udr h acsin number accession the under ank AN4750 SA SRP074318, (SRA: SAMN04870510 et al.et , 2015). Pairwise comparisons of of comparisons Pairwise 2015). , ediction and annotation of SEED of annotation and ediction Q value <0.05) to identify HGT identify to <0.05) value et al. et , 2016) and TIGRfam(Haft 2016) , indromic repeats (CRISPR)indromic repeats tp://www.cazy.org) were tp://www.cazy.org) -redundant set of KEGG of set -redundant , 2013) using GenBank- using 2013) , et al.et ut-off of 0.001. of ut-off t al. et , 2012) using 2012) , et al.et , 2008). , 2008; , et al.et et , .

SAUL-“ SAUL draft genomes were compared to those of the cl the of those to compared were genomes draft SAUL “ dat RefSeq-Viral and ACLAME GenBank-Plasmid, Phage, uoen no’ Hrzn 00 eerh n innovati and research 2020 Horizon Union’s European secondary metabolites, both bins were analysed with analysed were bins both metabolites, secondary xelne ntaie o h Gaut Sho o Lif of School Graduate the to Initiative Excellence e-value:0.1, cut-off score:20). With the aim of ex of aim the With score:20). cut-off e-value:0.1, cec aadd y h Uiest o Acln. BMS Auckland. of University the by awarded Science 679849 (‘SponGES’). We thank Hannes Horn for his co his for Horn Hannes thank We (‘SponGES’). 679849 A ws upre b a Ecuaig n Supporting and Encouraging an by supported was CAG ACKNOWLEDGEMENTS (Weber (Weber data data of eosrce ad nlsd y ik ad colleagues and Rinke by analysed and reconstructed submitted to the RAST web server for ORF prediction ORF for server web RAST the to submitted rm eBn (seby D: ZAS0000. ad A and NZ_AQSL00000000.1 IDs: (assembly GenBank from between SAUL bins and the two “two the and bins SAUL between between SAUL bins “and IG (A I: CC A22B3 n SG AAA252-E07) SCGC and AAA252-B13 SCGC ID: (SAG (IMG) tools. Additionally, information on COG annotation COG on information Additionally, tools. Latescibacteria

Latescibacteria Petrosia ficiformis et al. Accepted Article , for 2015) detection the of secondary metabolite b . w “ Two ”. Wiley-Blackwell andSocietyforApplied Microbiology ” genome comparison This article isprotected by copyright. All rights reserved. Latescibacteria and valuable discussions. Latescibacteria Latescibacteria ” SAGs V2.1.3 using ” (ParksSTAMP SG, S_0 ad S_1, ee previously were WS3_B13, and WS3_E07 SAGs, ” ” SAGs was then conducted using RAST comparative RAST conducted using then wasSAGs ” ploring the putative capability of SAUL to produce to SAUL of capability putative the ploring Sine, nvriy f üzug ad y the by and Würzburg, of University Sciences, e obtained from Integrated Microbial Genomics Microbial Integrated from obtained and annotation. Sequence-based comparison Sequence-based annotation. and the web-based tool antiSMASH (version 3.0) (version antiSMASH tool web-based the ntribution to processing of the metagenomic metagenomic the of processing to ntribution 21) Tee w SG wr downloaded were SAGs two These (2013). osest related phylum, the candidate phylum candidate the phylum, related osest abases (default setting except: gap open -5, -5, open gap except: setting (default abases a spotd y gat f h German the of grant a by supported was noain otrl coasi i Marine in Scholarship Doctoral Innovation n rga udr rn Areet no. Agreement Grant under program on a ue fr ute comparisons further for used was iosynthetic iosynthetic gene clusters. W0000. rsetvl) and respectively) SWY00000000.1 et al.et , 2014). 2014). ,

Page 22of78 Page 23of78

Figure 1: Relative abundance (percentage) of SAUL- SAUL- of (percentage) abundance Relative 1: Figure LIST OF FIGUREAND LEGENDS TABLE eebsd tde o mrn sog bceil commu bacterial sponge marine of studies gene-based enzymes of the pathway were not identified in the b the in identified not were pathway the of enzymes only present in either bin_aplysina or bin_petrosia or bin_aplysina either in present only functions identified in both metagenome bins. Red Red bins. metagenome both in identified functions iue . ceai oeve soig eoi poten genomic showing overview Schematic 4. Figure same as those for provided 2A. Figure Phylogenomic analysis based on a data set of up to up of set data a on based analysis Phylogenomic iue : hlgn soig oiin f h SU cl SAUL the of position showing Phylogeny 2: Figure support (100 resamplings for (A), 500 for (B)) is s is (B)) for 500 (A), for resamplings (100 support black arrows indicate sequences obtained from SAUL SAUL from obtained sequences indicate arrows black gene sequence-based maximum-likelihood analysis of of analysis maximum-likelihood sequence-based gene architecture of the SAUL clade. Bold sequences den sequences Bold clade. SAUL the of architecture iue . 6 rN gn-ae mxmm ieiod p likelihood maximum gene-based rRNA 16S 3. Figure i_ersa a wl a te is SU/AC4 seq SAUL/PAUC34f first the as well as bin_petrosia) (filled circles) circles).(open or ≥75% bar Scale belonging to the same study defined in Table S1. ( S1. Table in defined study same the to belonging sponge species across 14 sponge microbiota studies. microbiota sponge 14 across species sponge pce rpre i te eet pne irboe Pr Microbiome Sponge recent the in reported species according tomainly according Gloecknertomainly known, sponge species are classified as high (black high as classified are species sponge known, Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. s 10% represent sequence divergence. hown on tree nodes where support is either ≥90% either is support where nodes tree on hown et al. , respectively. Dashed lines indicate that some that indicate lines Dashed respectively. , bars) or low microbial abundance (white bars), (white abundance microbial low or bars) 37 concatenated marker proteins. (B) 16S rRNA 16S (B) proteins. marker concatenated 37 ins (see Supporting Text for details). (A) Model (A) details). for Text Supporting (see ins and green lines/transporters indicate functions functions indicate lines/transporters green and B) Prevalence of SAUL in 72 of the 81 sponge 81 the of 72 in SAUL of Prevalence B) Letters beside bars represent sponge species sponge represent bars beside Letters ec ietfe (F842. eal ae the are Details (AF186412). identified uence ote sequences derived from marine sponges; marine from derived sequences ote r AC4-fiitd eune i 1S rRNA 16S in sequences PAUC34f-affiliated or (2014) and (2014) Moitinho-Silva d rltv t ohr atra pya (A) phyla. bacterial other to relative ade jc suy (Thomas study oject iis () rvlne f AL n different in SAUL of Prevalence (A) nities. AL n is lss rltvs Bootstrap relatives. closest its and SAUL yoeei aayi soig h internal the showing analysis hylogenetic il f SU cl. lc lns indicate lines Black cell. SAUL a of tial is bnalsn, i_pyia2 and bin_aplysina_2 (bin_aplysina, bins t al. et et al.et 21) Where 2016). , (2017).

Table Table 1. General about information metagenomeSAUL ketosynthase; AT, acyltransferase; ACPS, carri acyl i_pyia n bnptoi. brvain o PKS of Abbreviations bin_petrosia. and bin_aplysina of phosphate operon signal transduction; (B) Featur (B) transduction; signal operon phosphate of Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. er er protein; reductase. ER, Enoyl es of the Type I polyketide synthase identified in identified synthase polyketide I Type the of es oan a flos K, eoeuts; KS, ketoreductase; KR, follows: as domains bins. bins. Page 24of78 Page 25of78

Biswas, Biswas, Gagnon, A., J.N., Brouns, S.J.J., Fineran, de Bary, A. (1879) de Die Bary, A. (1879) erscheinung symbiose. der Ve Bell, J.J. (2008) The Bell, The (2008) functionalJ.J. rolesmarine of sp Bolger, A.M., Lohse, Trim M., Usadel, and B. (2014) Blunt, Blunt, J.W., Copp, Keyzers,Munro, M.H.B.R., R.A., Aziz, R.K., A.A., Best, D., DeJongh, Bartels, M., D Anantharaman,C.T., K.,Hug, Brown, L.A., Sharon, I Alneberg, J., Bjarnason, B.S., de SchirBruijn, I., Albertsen, M., P.,Hugenholtz, Skarshewski, Nie A., Alex, A. and WholeAntunes, A. genome(2015) sequen Al-Khodor, C.T., Price, S., Kalia, A., Abu and Kwai REFERENCES

bioinformaticprediction and analysiscrRNA targ of products. rapid annotations using subsystems technology. aquifer system. Thousands of microbial genomes on shed interc light the the marineintertidal sponge metagenomic contigsand coverage composition.by switching permissive lifestyle. multiple metagenomes. Genome sequences uncultured of rare, bacteria obtai microbes:conserved a structure with functional div Accepted Article Nat. Nat. Prod. Rep. Wiley-Blackwell andSocietyforApplied Microbiology Nat. Commun.Nat. This article isprotected by copyright. All rights reserved. Nat. Nat. Biotechnol.

34 : 235–294. Genome Genome Evol.Biol.

7 : 7:13219. doi:10.1038/ncomms13219.: 7:13219.

31 mer, M., Quick, J., Ijaz, U.Z., et al. (2014) mer, Quick, J., BinniM., Ijaz, et al. U.Z., (2014) isz, Edwards,T., isz, The et al. RAST R.A., Serv (2008) P.C., and C.M.Brown, CRISPRTarget: (2013) k, Y. (2010) k, Y. Ankyrin-repeat (2010) containing proteins of onges. onges. : : 533–538. G., G., and Prinsep, Marine(2017) M.R. natural lsen, G.W.,K.L., Tyson, P.H. and (2013) Nielsen, ., Castelle, C.J.,Castelle, ., al. Probst, (2016) et A.J., momatic: flexiblea trimmer for Illumina sequence rlag von Karl rlag J.von Trübner, Strassburg. Karl BMC Genomics BMC

cing of the symbiont 7 Estuar. Shelf Coast. Sci. revealed a gene revealed repertoire for host- : 3022–3032. ets. ersity. Nat. Nat. Methods onnected biogeochemical processes anin ned ned by coverage differential binning of RNA Biol. Trends Microbiol.Trends

9

10 : 75. : 75.

11 : 817–827. : 817–827. : 1144–1146. Pseudovibrio

79

18 : 341–353. : 132–139. sp. from from sp. ng ng er:

Fieseler, L., Hentschel, Grozdanov, L.,U., Schirme Fan, Fan, Liu, D.,L., Reynolds, M., Stark, M., Kjellebe Fan, Fan, Liu, L., R.,Simister, M., Webster, N.S., and Erwin, P.M. Thacker,Phototrophicand (2008) R.W. n Deveau, H., Garneau, and C Moineau,J.E., (2010) S. Castresana, SelectionJ. of(2000) conserved blocks Chen, L., Xie, Q.W., Nathan, Alkyland C. (1998) hy Burgsdorf, I., Slaby, B.M., Handley, K.M., Haber, M Cerveny, Dankova,L., A.,Straskova, V., Hartlova,

occurrence contextand genomic of unusually small p sponge symbionts. 7 Functional evolutionaryequivalence convergenceand up: up: the phylogenetic and functional of response s a Caribbean sponge-cyanobacteria symbioses. interactions. doi:10.1128/mBio.00391–15. phylogenetic analysis. bacterial cellshuman and against reactive nitrogen evolution in cyanobacterial ofsymbionts sponges. mechanisms. Tetratricopeptide Tetratricopeptide repeat motifs the in of world bac data. data. : 991–1002. Accepted Article Bioinformatics Annu. Rev. Annu. Microbiol.Rev. Infect. Immun. Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. Proc. Sci.Natl. Proc. USAAcad.

30 Mol. Mol. Biol. Evol. : 2114–2120.

81 : 629–635. : 629–635.

64

: 475–493. 17 : 540–552. Thomas, T. (2013) Thomas, T. microbialMarine (2013) symbiosis heats rg, Webster, S., N.S., and Thomas, T. (2012) A., Ceckova, A., Staud, and M., F., Stulik, (2013) J. r, A., r, A., Wen, G., M., Platzer, Widespre (2007) et al. droperoxide subunit reductase C (AhpC) protects

., Blom, J., Marshall, C.W., Lifestyl et al. (2015) Mar. Mar. Ecol. Prog. Ser. 109 from multiple alignmentsfrom for use their in RISPR/Cas system and role its phage-bacteria in utrition and utrition and symbiont diversity of two : 1878–1887. MBio intermediates. intermediates. terial pathogens: terial in role virulence ponge ponge holobiont to stress. thermal olyketide olyketide synthase genes microbial in complexin communities of microbial

6 : e00391–15. : e00391–15.

362 Mol Mol Cell : 139–147.

1 : 795–805. ISME ISME J. e e ad ad

Page 26of78 Page 27of78

Fieseler, L., Quaiser, A., Schleper, and C., Hentsc Finn, P.,R.D., Coggill, Eberhardt, Eddy, R.Y., S.R Frei, E. and Preston, Non-cellulosicR.D. (1964) st Flórez, L. V, Biedermann, P.H.W., and Engl, Kal T., Fischbach, M.A. and Walsh, C.T.Assembly-lin (2006) Frei, E. and Preston, VariantsR.D. (1961) in the s Fieseler, L., Horn, M., Hentschel, Wagner, M., and Garrone, Thiney, R., and Y., Pavans Ceccatty, de M. Genin, Njinkoue, J.-M., Wielgosz-Collin,E., G., Ho Gao, Gao, Wang, Z.M., R.M., Tian, Y., Batang Wong, Y.H.,

genomics. from marinethe sponge-associated,candidate novel protein protein families towardsdatabase: more sustainab a Association xylan and of mannan in with prokaryotic microorganisms. and eukaryotic peptide peptide antibiotics: logic machinery, and mechanism “ mucopolysaccharide cell in coat sponges. (2004) (2004) Glycolipids from marine sponges : monoglycos Synechococcus spongiarum adaptation genomedrives streamlining cyanob of the 192 consortia associatedmarine with sponges. Poribacteria

: 939–943. Accepted Article Environ. Environ. Microbiol. ” in ” marinein sponges. Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. .” .” MBio

8 : 612–624. : Appl. Environ. Microbiol.

5 Porphyra Porphyra umbilicalis : e00079–14. doi:10.1128/mBio.00079–14. e00079–14. : ., ., Mitchell, Mistry, TheJ., et A.L., al. Pf(2016) Experientia hel, U. (2006) hel, genomeAnalysis(2006) U. of first fragmenthe tructural ofpolysaccharidestructural cell algal walls. Appl. Appl. Environ. Microbiol. ructural ructural polysaccharidescell walls.in II.algal ussay, C., ussay, Kornprobst,C., Debitus, J.-M., C., et al. tenpoth, M. (2015) tenpoth, DefensiveM. symbioses(2015) of animals U. (2004) Discovery (2004) U. of candidate the phylum novel (1971) Electron microscopy(1971) ofa , Z.B., , Al-Suwailem,Z.B., A.M., Symbiotic(2014) et al. e e enzymology for polyketide and nonribosomal Nat. Prod. Nat. Rep. le le future. s. s.

phylum phylum ylceramides and alkyldiglycosylglycerols: 27 acterial acterial sponge symbiont“Candidatus Chem. Chem. Rev. : 1324–1326. . . Proc. R. Soc. Proc. BR.

70 Poribacteria : 3724–3732. Nucleic Nucleic Acids Res.

32

106

: 904–36. 73 : 3468–3496. : 2144–2155.

160 by environmental : 314–317.

44 : 279–285. am Nature t t

Gloeckner, Lindquist,V., N., Schmitt, and S., Hent Grissa, I., Vergnaud, G., and C.Pourcel, CR (2008) Gloeckner, V., Moitinho-Silva,Wehrl, M., L., Gerne Haft, D.H., Selengut, J.D., Th and O. White, (2003) Habyarimana, F., S., Al-Khodor, Graham,Kalia, A., Hentschel, U., Fieseler, L., Wehrl, Gernert, M., C. Hentschel, U., J., Hopke, Horn, M., A.B.Friedrich, Hansen,I. and Weeks, V., J.M., Depledge, M.H. (199

Acids Res. experimentally model for tractable vertical microbi isolation, characterizationand activity biological di Genovadi Bull. regularly shortpalindromicinterspaced repeats. HMA-LMA HMA-LMA dichotomy revisited: an electronmicroscopi Ecol. hosts and human macrophages. Role for the Ankyrin eukaryotic-like genes of Microbialdiversity marine of sponges. In, biomonitoring. Environ. Microbiol. Molecular evidence for microbialuniform a communit chromium by marine the sponge

227 Accepted Article65 : 462–474. : 78–88.

31 68 : 371–373. : 371–373. : 327–334. Wiley-Blackwell andSocietyforApplied Microbiology Mar. Pollut. Bull. This article isprotected by copyright. All rights reserved.

68 : 4431–4440.:

Environ. Microbiol. 31 Halichondria Halichondria panicea : 133–138. Sponges (Porifera) , , M., Steinert, Horn,J., Hacker, and M. (2003) , Hacker, Wagner, and M., (2002) B.S. Moore, J., ISPRcompar: websitea compare toclustered schel, U. (2013) schel, U. (2013) e e TIGRFAMs ofdatabase protein families. J.E., Price, Garcia,J.E., C.T., M.T., Kwaik, Y.A. (2 and Legionella pneumophila rt, rt, C.,P., Schupp, Hentschel, Th U., et al. (2014) 5) Accumulation5) copper, of zinc, cadmium and . Nucleic Acids Res. Boll. dei dell’Università Ist. Boll. Musei Degli e Biol. al transmission marine sponges. in

10 y sponges in from different oceans. Pallas and the implications for : 1460–1474. cal sponge of survey 56 species. Ectyoplasia ferox . pp. Springer, 59–88.

36 in parasitism in of protozoan : 52–57. , an Nucleic Nucleic Microb. Microb. Biol. Biol. Appl. Appl. 008) 008) e e Page 28of78 Page 29of78

Kanehisa, Morishima, Sato, M., and Y., Bl K. (2016) Keeling, Palmer,P.J. Horizontaland J.D. (2008) ge Kamke, Kamke, Taylor, J., ActM.W., Schmitt, (2010) S. and Kamke, Kamke, Rinke, J., Schwientek, C., Mavromatis,P., K Hooper, J.A. and Systema Por(2002) Van R.M. Soest, Kamke, Kamke, Sczyrba, J., Ivanova,A., N.,P. Schwientek, Kanehisa, Goto, M., S., Okuno,Kawashima, S., a Y., Horn, Slaby, B., H., M., Jahn, Bayer, Moitinho-K., Hentschel,Piel, U., J., Degnan, S.M., and Taylor, Hug, L.A.,Hug, Baker, B.J., Anantharaman, K., C. Brown,

genomics reveals carbohydratecomplex degradation p functionalcharacterizationof and genome metagenom compartmentation, eukaryote-like repeat proteins, a bacteria obtained rRNA rRNAvs 16S 16S by gene comp candidate phylum marine sponges. metagenomes. e87353. doi:10.1371/journal.pone.0087353.e87353. deciphering genome. the CRISPR CRISPR and other features defense-related in marine Hooper,J.A. and Van Springer (eds) US.Soest,R.M. sponge microbiome. view of the tree view of tree the of life. Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology Front. Front. Microbiol. ISME ISME J. Poribacteria This article isprotected by copyright. All rights reserved. Nat. Rev. Nat. Microbiol.Rev. Nat. Nat. Microbiol.

7 Nucleic Nucleic Acids Res. : 2287–300. : by single-cell by genomics: new into phylogeinsights

7 : 1751. doi: 1751. : 10.3389/fmicb.2016.01751.

1 : 16048. doi: 10.1038/nmicrobiol.2016.48.

10 M.W. (2012)M.W. Genomic intoinsights marinethe Silva, L., Silva, Förster, F., An(2016) et al. enrichment

32 : 641–654. ne transfer nein transfer eukaryotic evolution. , Rinke, Mavromatis, C., K., Single-c et al. (2013) T., Probst, A.J., Castelle, al. A (2016) C.J., et n astKOALA and GhostKOALA: KEGG tools for ivity profiles for marine sponge-associated ., Ivanova,., Sczyrba, A., N., The et al. (2014) nd Hattori, M. (2004) Hattori,The nd KEGGM. (2004) resource for : 277–280. ifera. ifera. classification A guide the to of sponges sponge-associated microbial nd nd other genomic features. atterns in poribacterial ofsymbionts arisons. e e sequences. ISME J.ISME J. Mol. Biol.J. Mol.

4 : 498–508. : 498–508. Nat RevGenet Nat

PLoS One 428 ny,cell- : 726–731. ew ew

9 of ell ell :

Makarova, K.S., Alkhnbashi, Wolf, Y.I., Costa O.S., Makarova, K.S., Barrangou,Haft, D.H., Brouns, R., Liu,L., M., Zhong, L., Kjelleberg, Fan, S., and Th Lackner, G., Peters, E.E., E.J.N., Helfrich, and Pi Mackie, Preston,The W. and occurrenceR.D. (1968) Luter, H.M., Whalan, T and S., Webster, (2012) N.S. Liu, Fu M.Y., Kjelleberg, S., Thomas, T. (2011) and Ludwig, W., O., Strunk, S.,Klugbauer, N Klugbauer, López-Legentil, McMurray, Song, B., S., P S.E., and Kvint, K., Nachin, L., Diez, A., Nyström,(2and T.

Evolution and classification of CRISPR–Cas the syst community sponge of symbionts. 114 bacterial productnatural factories withassociated doi:10.1371/journal.pone.0039779. Codium fragile causes of brown coralspot syndrome in the spo reef proteobacterium sponge in the and regulation. Bacterial phylogeny based on comparative a sequence 1840–9. ecosystems: expressionhsp70 theby barrelgiant sp 9 : 605–618.

: E347–E356. Accepted Article and and Wiley-Blackwell andSocietyforApplied Microbiology Curr. Opin. Microbiol. Opin. Curr. This article isprotected by copyright. All rights reserved. Acetabularia crenulata Cymbastelaconcentrica ISME ISME J.

6 : : 140–145.

el, J. (2017) InsightsJ. el, into (2017) ofuncultthe lifestyle omas, Metaproteogenomic(2012) T. analysis of a 6 003) The003) universalbacterial protein: stress functi . . : 1515–1525. Planta , Shah, , F., S.A., et S.J., Saunders, al.An (2015) nctional ofgenomic analysis an uncultured δ- awlik, J.R. (2008) J.R.Bleaching awlik, (2008) stress coraland in re S.J.J., Charpentier, Horvath,E., P., (2011) et al. ., Weizenegger,., M., Neumaier, J., et al. (1998) hermal and sedimentation unlikelyare stress of of mannan microfibrils green the algae in

marine sponges. ems. 253 onge onge nge, : 249–253. nalysis. . . Nat. Nat. Rev. Microbiol. ISME ISME J. Ianthella Ianthella basta Xestospongiamuta Electrophoresis

5 : 427–435. Proc. Sci. Natl. Acad. USA . . PLoS OnePLoS

9 . . : 467–477. : 467–477.

Mol. Mol. Ecol. 19 : 554–568. : 554–568.

7 : e39779. : e39779. ured

17 :

on on ef ef

Page 30of78 Page 31of78

Montalvo, and N.F. Sponge-associa(2011) Hill, R.T. McCord, Fridovich, SuperoxideJ.M. I. (1969) and di Newman, D.J. Cragg, and G.M. Natural (2012) product Nguyen, A., Ekstrom, M., Li, X., Yin, and (2015) Y. Moitinho-Silva, L., Steinert, G., Nielsen, S., Hard Pawlik, J.R. (2011) Pawlik, chemicalThe J.R. (2011) ecology of sponges Parks, D.H., Imelfort, M., Skennerton, Hugenh C.T., Parks, D.H., G.W.,Tyson, Hugenholtz, P., and Beiko Nyström, and Expression (1994) F.C. Neidhardt, T. a Overbeek, G.D., Olson, R., R., Olsen,G.J., Pusch,

updated evolutionary classification of s CRISPR-Cas 752. doi: 10.3389/fmicb.2017.00752. (hemocuprein). from 1981 from 2010. to 1981 finding application and to Predicting HMA-LMAthe marine in status sponges by metagenomes. related butrelated geographically spongedistant hosts. Res. assessing the assessing the quality of microbial genomes recovere taxonomic functional and profiles. Escherichia coli Rapid Annotation of microbial genomes using Subsyst

Accepted42 Article : 206–214. Wiley-Blackwell andSocietyforApplied Microbiology Genome Genome Res. during arrest. growth J. Biol. Chem.J. This article isprotected by copyright. All rights reserved. J.Prod. Nat. Aspergillus

25

244 : 1043–1055.

75 : 6049–6055. Bioinformatics : 311–335. genomes. Mol Mol Microbiol oim, oim, C.C.P., Wu, Y.-C., McCormack, G.P., al. (20 et Davis, Davis, J.J.,The Disz, al. et T., andSEED (2014) t HGT-finder: new a tool for horizontal gene transfe oltz, P., and oltz, G.W. Tyson,P., and CheckM:(2015) , STAMP:(2014) R.G. statistical analysis of ted bacteriated strictly are maintained in closely two smutase. An smutase. enzymic for function erythrocuprein on Caribbean products natural reefs: shape nd role nd of the protein,stress universal UspA, of Toxins s s as of new over sources drugs 30 years the Appl. Environ. Appl. Microbiol.

30 ystems.

d cells, from single isolates, and : 3123–3124. 11 ems ems Technology (RAST). machine learning.

7 : 537–544. : : 4035–4053. Nat. Rev. Nat.Microbiol. Rev. Front. Microbiol.

77 : 7207–7216. Nucleic Acids

13 : 722–736.

he 8 : 17) 17) r

Peng, Leung, H.C.M.,Y., Yiu, S.M., and F.Y.L Chin, Piel, Piel, Metabolites (2009) J. from symbiotic bacteria Piel, Piel, Hui, J., Wen, D., G., D., Butzke, Platzer, M. Pruesse, E., Pruesse, Quast, E., Knittel, C., Fuchs,L K., B.M., Rinke, C., P., Sczyrba, Schwientek, Ivanova, A., N. Ribes, Calvo, M., Movilla, E., J., Logares, R., Com Reynolds, T. D. Evolution(2016) and Thomas, and fu Sala, Sala, Della, G. Teta,Hochmuth, Costantino, T., R., Schmitt, Tsai, J.,P., Fromont,Bell, S., J., Ilan,

natural systems. cell metagenomic sequencingand with data highly un 1428. swinhoei polyketide polyketide anbiosynthesis by uncultivatedbacteria SILVA: comprehensivea online resource qualityfor sponge microbiome tolerance oceanto favors acidifi 544. sequence withcompatible data ARB. symbionts. into the and phylogeny codingof microbiapotential the the microbiome of marine spongethe Drugs spongecore, microbiota: variable and species-speci

12 Accepted Article . . : : 5425–5440. Proc. Sci.Natl. Acad. USA Mol. Mol. Ecol. Wiley-Blackwell andSocietyforApplied Microbiology Bioscience This article isprotected by copyright. All rights reserved.

25 : 5242–5253.

61 : 888–898.

101 Nucleic Nucleic Acids Res. Plakortis halichondrioides Lindquist,M., Assessing N., (2012) et al. com the : 16222–16227. , Fusetani, N., Matsunaga, and Antitumor (2004) S. udwig, Peplies,W., udwig, Glöckner,J., and (2007) F.O. a, R., a, and R., Restructuring(2016) Pelejero, C. of the N., Anderson, I.J., InsiCheng, et J.F., al. (2013) . (2012) . IDBA-UD:(2012) dea novo single-assembler for V., and Mangoni, A. (2014) and V., A. PolyketideMangoni, (2014) synthases in . . Nat. Nat. Prod. Rep. nction of nction of eukaryotic-like proteins from sponge l of symbiont marine the sponge fic bacterial communitiesfic marinein checked checked aligned and RNA ribosomal l dark matter. dark matter. l cation. even depth.

35 : 7188–96. Environ. Environ. Microbiol. Rep.

26 : a : metagenomic a update. : 338–362. Nature Bioinformatics

499 : 431–437. : 431–437.

28

: 1420– 8 Theonella : 536– plex plex ghts Mar. Mar.

Page 32of78 Page 33of78

Taylor, M.W., Radax, D.,R., Steger, and Wagner, M. Simister, Simister, Tsai,R., M.W., Taylor, P., and Webster, Tatusov, R.L., Fedorova, N.D., Jackson, J.D., Jacob Slaby, Hackl, B.M., T., H.,K., Horn, Bayer, and He Taylor, M.W., Thacker, and Hentschel, R.W., (200 U. Simister, Simister, Rogers,R., M.W., Taylor, P K.M., Schupp, Taylor, M.W., Tsai, P., R.L.,Simister, P., Deines, Simister, Simister, R., P., Deines, Botté, E.S., Webster, N.S Siegl, A., Kamke, J., Piel, T., J., Hochmuth, Richt

COG andatabase: updatedversion eukaryote includes evolution, biotechnologicalecology, and potential. high and nutrients temperatures. Ecol. doi:10.1038/ismej.2017.101. sponge microbiome inunity reveals defense but meta isotopic of analysis active bacterial communities i bacteria widespreadare (but diverserare) in marin 316 Microbiol. revisited: comprehensivea phylogeny of sponge-asso sponges. reveals reveals the oflifestyle sponges.

: 1854–1855.

Accepted Article85 : 195–205. ISME J.ISME ISME J.ISME

14 : 517–524. Wiley-Blackwell andSocietyforApplied Microbiology

5 6 : 61–70. : 564–576. : 564–576. This article isprotected by copyright. All rights reserved. Poribacteria PLoS One , candidate a phylum symbiotically withassociated er, Liang,er, et M., C., al. Single-cell (2011) genomi Ericson,E., G., et Botte, al.“Sponge-spec (2013) ., Taylor, and Sponge-specificM.W. cluster(2012a) ntschel, U. (2017) ntschel, Metagenomic(2017) binningU. of marinea

N. (2012b) Sponge-microbe N. (2012b) associations survive 7 s, A.R., s, Kiryutin, et Koonin, B., al. E. V, (2003) .J., and P. .J., TemporalDeines, (2013) molecular and : e52220. : doi:10.1371/journal.pone.0052220.e52220. (2007a) (2007a) Sponge-associated microorganisms: 7b) Evolutionary insightssponges. from n two New n sponges.New two Zealand

e environments. e Microbiol. Mol. Microbiol. Biol. Rev. ciated microorganisms. bolic specialization. s. BMC BMC Bioinformatics ISME ISME J. ISME ISME J.

7

71 4 : 438–443. : 438–443. Environ. : : 41. FEMS Microbiol. : 295–347. : 1–15. 1–15. marine marine Science cs The ific” s

Wagner, M. The (2006) Horn, and M. Thomas, T.,Lurgi, Moitinho-Silva, L., Björk, M., J Vasu, Vasu, K. Nagaraja, and Diverse functions V. (2013) Thomas, T., DeMaere, D., Rusch, Yung, M.Z., P.Y., L Tian, L., R.-M., Cai, Zhang, W., Y.-H., Wong, Ding, Tian, R.M., Gao, Wang, Bougouffa, Y., Z.M., S., Cai Webster, N.S., Botté, E.S., Soo, Whalan, R.M., and Weber, K.,Blin, T., Duddela, S., H.Kim, D., Krug, Webster, and N.S. Thomas,spongeThe T. (2016) holo

Diversity, convergentstructure and evolution of th cellular defense. cold sponge. seep bacterium sponge.in 7 4 comprise with a biotechnological superphylum me and genomic signatures of sponge bacteria uniquereveal microbe-host drive interactions of adaptation a sul analysis versatile heterotrophic reveals capacity o 17 Res. thermal tolerance. comprehensive resource for the mininggenome of bio doi:10.1128/mBio.00135–16. : 11870. doi:10.1038/ncomms11870. : 1557–1567.

: 241–249.

Accepted43 Article : 237–243. Wiley-Blackwell andSocietyforApplied Microbiology Microbiol. Mol. Microbiol. Biol. Rev. mSystems This article isprotected by copyright. All rights reserved. Environ. Microbiol. Environ. Rep. Environ. Environ. Microbiol.

2 : e00184-16. : Planctomycetes

16

U., AntiSMASH Bruccoleri, (2015) et 3.0-a R., al. .R., Easson, C.,Easson, .R., (2016 Astudillo-García, et C., al. 77

W., and P.-Y. Genome(2017) Qian, reduction and 3 : 3548–3561. , L., , Bajic, Genomicand P.Y.V., Qian, (2014) : 756–762. 756–762. : S. (2011) The(2011) S. larval sponge highexhibits holobiont : 53–72. of of restriction-modification in systems toaddition ewis, M., ewis,Functional Halpern, (2010) M., et A., al. , genome. Verrucomicrobia f a potentiallya f symbiotic sulfur-oxidizing e e global sponge microbiome. fur-oxidizing bacterium withfur-oxidizing associated a and shared features of symbiosis. synthetic gene synthetic gene clusters. dical relevance. MBio

7 : e00135–16. , Chlamydiae Curr. Opin. Curr. Biotechnol. and phyla sister Nucleic Nucleic Acids Nat. Nat. Commun. ISME ISME J. )

Page 34of78 Page 35of78

Youssef, N.H., Farag, I.F., Rinke, Hallam, C., S.J. Zhang, F., Blasiak, Karolin,L.C., Powell, R. J.O., Webster, Ridd, N.S., Webb, R.I., Hill, M.J., R.T., Yin, Mao,Y., X.,J., Chen, Yang, Mao, and X., F., Wilson, M.C., Mori, T., C., Rückert, A.R., Uria, He Wiedenbeck, J.F.M. Cohan,Origins (2011) and ofba Yarza, Pruesse,P., Yilmaz, Glöckner, P., E., F.O., Wheeler, T.J. and S.R.Eddy, nhmmer: (2013) DNA hom Wu, Fu, L., S., B., Zhu, Z., and Niu, Li, W. (2011) Wortham, Patel, B.W., C.N., and Oliveira, (200 M.A.

“ analysis the of metabolic and potential speci niche sequestration in theof polyphosphate form microby carbohydrate-active annotation.enzyme bacterial taxon with a and metabolic distinct large Nat. Microbiol.Rev. transfer transfer and newtoadaptation niches.ecological classification cultured uncultured of bacteria and microbialcommunitycoral a sponge.reef of Bioinformatics metagenomic analysis. sequence Springer Springer US, 106–115.pp. specific mechanisms. In, Skurnik,M., Bengoechea,J.A Latescibacteria Accepted Article

Wiley-Blackwell andSocietyforApplied Microbiology 29 ” (WS3). (WS3). ” : 2487–2489. This article isprotected by copyright. All rights reserved.

12 : 635–645. : 635–645. PLoS One BMC Genomics BMC

10 : e0127499. J.,Geddes, and C.D., Hill, PhosphorusR.T. (2015) , , Woyke, T., and Elshahed, M.S. (2015) WebMGA: customizable web a server for fast Nucleic Nucleic Acids Res. and Negri, A.P. (2001) and effects The ofNegri, copperA.P. (2001) on the Xu, Y. (2012) DbCAN: Xu, Y. (2012) web A resource for automated Ludwig, W., Unitin(2014) Schleifer, et K.-H., al. lf, lf, M.J., K., Takada, An et al. (2014) environmenta Environ. Microbiol. Environ. 7) 7) Polyamines bacteria:in pleiotropic yet effects cterial diversitycterial horizontalthrough genetic

12 ology with profile search HMMs. and archaea using 16S archaea and using rRNA gene 16S sequences. FEMS Microbiol. Rev. repertoire. repertoire. alization candidatealization of phylum : 444. ., and ., Granfors,K. (eds), bial symbiontsbial marine sponges. in

40 Nature : 445–451. : 445–451.

3 : 19–31. :

506

35 : 58–62. : 957–976. The genus genus Yersinia The In silico

Proc. g g the l

. .

Natl. Sci. USAAcad. Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved.

112 : 4381–4386. Page 36of78 Page 37of78

amn Astudillo-GarcíaCarmen sponges marine col f ilgcl cecs Uiest o Auckla of University Sciences, Biological of School Associate ProfessorMichael Taylor W. *Corresponding author:

University ofUniversity Queensland, Lucia,QLD, Australia St 4 3 2 1 bacter enigmatic an SAUL, of genomics and Phylogeny 7 6 5 Email: Email: Zealand Michael W. Taylor MichaelW. Würzburg,Germany Maurice Wilkins Centre MauriceWilkins for Molecular Biodiscovery, Christian-Albrechts-UniversitätKiel,Kiel, Germ zu Australian Institute Julius-von-Sachs II, Botany of Department RD3Microbiology, Marine GEOMAR Helmholtz Centre fo MarineInstituteScience, of Auckland ofUniversity Biological School of Sciences, University Auckla of [email protected] 5

Australian

1,7 * AcceptedMo and Chemistry of School Ecogenomics, for Centre Article 1,2 Bae . Slaby M. Beate , Wiley-Blackwell andSocietyforApplied Microbiology ; Telephone; 9232280 +649 This article isprotected by copyright. All rights reserved. 3,4 Dvd . Waite W. David , any nd, Auckland,ZealandNew , Auckland, , NewZealand for Biological Sciences, University of Würzburg, of University Sciences, Biological for University Auckland, ZealandNew of d Piae a 909 Acln 14, New 1142, Auckland 92019, Bag Private nd, rOcean Research, Kiel, Germany a lnae rqety soitd with associated frequently lineage ial 5 Kitn Bayer Kristina , lecular Biosciences, The Biosciences, lecular 3 Ue Hentschel Ute , 3,6 , ,

aie pne cnan n xrodnr lvl f mi of level extraordinary an contain sponges Marine ORIGINALITY-SIGNIFICANCESTATEMENT nutrd irognss Wie igecl genom single-cell While microorganisms. uncultured eoee gnms rmcran pneascae mi sponge-associated certain from genomes recovered is commonly reported from sponges, the so-called “s so-called the sponges, from reported commonly is Keywords: Keywords: marine sponge;symbiont; metag phylogeny; Runningtitle:Phylogenyand genomics SAUL of abundant sponge symbionts about which which about symbionts sponge abundant e are ot mt-nlss f xsig 6 rRNA 16S existing of meta-analysis a out carried we phylogenetic and genomic analyses, to gain new insi new gain to analyses, genomic and phylogenetic functionenigmaticof this bacterial clade. Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. littlevirtually nothinglittlevirtually ghts into the prevalence, identity and potential and identity prevalence, the into ghts ponge-associated unclassified lineage (SAUL)”, lineage unclassified ponge-associated crobes, there remain some key lineages of of lineages key some remain there crobes, enome binning; functionalenome analyses c ad eaeois prahs have approaches metagenomics and ics ee aaes culd ih extensive with coupled datasets, gene rba dvriy msl cmrsd of comprised mostly diversity, crobial is known. For one such lineage that that lineage such one For known. is Page 38of78 Page 39of78

genomes assembled from sponge metagenomes, revealin metagenomes, sponge from assembled genomes from sponges, yet sponges, from hlgntc nlss cnutd sn bt 1S rRN 16S both using conducted analyses, Phylogenetic symbiont. Metabolic reconstruction suggested that that suggested reconstruction Metabolic symbiont. akr rti sqecs rvae ta SU i a m a is SAUL that revealed sequences, protein marker produce to and cell the into phosphate transport to and A distribution SAUL. phylogenetic me of status ay aie pne cnan es ad ies commu diverse and dense contain sponges marine Many SUMMARY defenceasaswell systems, the capability genomic symbios encoding relationship. this of maintenance and establishment genes were revealed Also periods. ho the with symbiotic is SAUL of lifestyle the that facultative anaerobic metabolism, with the capacity the with metabolism, anaerobic facultative hlm “ phylum constituting a phosphate reservoir for the sponge h sponge the for reservoir phosphate a constituting ieped itiuin f hs ld ad t assoc its and clade this of distribution widespread Members of the so-called “sponge-associated unclass “sponge-associated so-called the of Members carbohydrates. We described for the first time in time first the for described We carbohydrates. Latescibacteria

iteital nothinglittlevirtually Accepted Article . utemr, e odce a opeesv analy comprehensive a conducted we Furthermore, ”. Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. is known about these bacteria. Here we investigat we Here bacteria. these about known is a sponge symbiont the putative the symbiont sponge a of secondary of metaboliteproduction. ta-analysis ofthe available literature ta-analysis th revealed ost in deprivation periodsdeprivation in ost to degrade multiple sponge- and algae-derived and sponge- multiple degrade to ain ih aooial vre sog hosts. sponge varied taxonomically with iation and store polyphosphate granules, presumably granules, polyphosphate store and such as eukaryotic-like repeats and CRISPR-Cas and repeats eukaryotic-like as such t pne ad dniy symbiontdeprivation identify and sponge, st ified lineage” (SAUL) are frequently recorded frequently are (SAUL) lineage” ified gn-ae pyoey n concatenated and phylogeny gene-based A AL ebr ae eoi bcei with bacteria aerobic are members SAUL npyei sse cae f h candidate the of clade sister onophyletic g novel insights into the physiology of this of physiology the into insights novel g is iis f soitd microorganisms. associated of nities factors hc my aiiae the facilitate may which . Our findings suggest findings Our . genomic genomic i o to draft two of sis

capacity d the ed e

Van Soest, 2002) and are key components of the bent the of components key are and 2002) Soest, Van rhe, iue ad fungi and viruses archaea, Many sponges also host diverse and abundant microbi abundant and diversehost also sponges Many rmnn mmes f h s-ald mcoil dark “microbial so-called the of members prominent omnte cmrs u t 4 dfeet atra p bacterial different 41 to up comprise communities 02, ih usqet tde rvaig t presen its revealing studies subsequent with 2002), tro the of symbiont a as identified first was clade (SAULlineage “ long-term the to simply refer to definition, broad ese ad hms 2016) Thomas, and Webster Marine sponges(phylum INTRODUCTION 2015)(Erwin and Thacker, 2008; Gao Gao 2008; Thacker, and 2015)(Erwin oa sog boas (Taylor biomass sponge total understood. One such clade is the is clade such One understood. oevr suis f h sog mcoit hv com have microbiota sponge the of studies Moreover, 07; Schmitt 2007a; of the nature of this relationship, following the e the following relationship, this of nature the of numerically dominant and/or known to play significa play to known and/or dominant numerically h rclirne f ay o ee ms, sponge-ass most, even or many, of recalcitrance The 2007a). “ aoaoy utr hs osrie or blt t u to ability our constrained has culture laboratory Poribacteria Ca Sncoocs spongiarum” Synechococcus .

” (Fieseler” )),

(Schmitt (Schmitt t al.et 21; Simister 2012; , Accepted Article et al. et Porifera et al. et (Taylor (Taylor , 2004; Siegl 2004; , I ti suy te em “ybot ad “symbios and “symbiont” terms the study, this In . , 2012) , t al. et Wiley-Blackwell andSocietyforApplied Microbiology ) ) theancient extantare among most of themetazoan (Erwin and Thacker, 2008; Gao Gao 2008; Thacker, and (Erwin et al.et lineage PAUC34f, also known as known also PAUC34f, lineage This article isprotected by copyright. All rights reserved. , initially designated as PAUC34f (Hentschel PAUC34f as designated initially , 20a 20b Hentschel 2007b; 2007a, , et al. et t al. et , 2007a; Webster and Thomas, 2016)(Taylor Thomas, and Webster 2007a; , et al. et , 2014; Burgsdorf 2014; , , 2012a; Thomas Thomas 2012a; , , 2011; Kamke 2011; , arly de Bary definition (de Bary, 1879; Taylor Taylor 1879; Bary, (de definition Bary de arly association of two or more organisms, irrespectiveorganisms, more or two of association pical sponge pical drtn te hsooy fsog symbionts. sponge of physiology the nderstand nt functional roles, such as the cyanobacteriumthe as such roles, functional nt ce in numerous sponge species (Taylor (Taylor species sponge numerous in ce hos in an array of marine habitats (Bell, 2008). (Bell, habitats marine of array an in hos al communities which constitute up to 35% of 35% to upconstitute which communities al hyla (Thomas (Thomas hyla ociated microorganisms to grow in a pure pure a in grow to microorganisms ociated atr (Rinke matter” ol fcsd n atra ht are that bacteria on focused monly et al. et et al. et Theonella swinhoei Theonella t al. et , 2014). Consequently, other, less Consequently,other, 2014). , , 2015), t al. et sponge-associated unclassified sponge-associated t al.et t al. et 21) Fr xml, SAUL example, For 2016). , t al. et 21) Tee symbiont These 2012). , or the candidate phylum phylum candidate the or 21) a wl a many as well as 2016), , , 2014; Burgsdorf Burgsdorf 2014; , 21) ean poorly remain 2013) , et al. et (Hentschel s ae sd n a in used are is” s (Hooper ands , 2002),.. , et al. et 2007a; , t al. et t al. et et al. et et al. et This , , , , Field Code Changed Code Field Changed Code Field Formatted: Formatted: Font: English (U.S.) English Font: (Germany) German Font: Page 40of78 Page 41of78

was reflected in the assignment of >70% of SAUL seq SAUL of >70% of assignment the in reflected was Although the SAUL lineage is commonly associated wi associated commonly is lineage SAUL the Although (Simister (Simister al. al. with SAUL-affiliated sequences being identified in identified being sequences SAUL-affiliated with (labelled as PAUC34f) in 72 of those species(Thoma those 72 of in PAUC34f) (labelledas beyond membership in the domain domain the in membership beyond t rltosi wt te ot pne n te preci the and sponge host the with relationship its regarding SAUL classification, together with a a with togetherclassification, regardingSAUL usqet tde hv asge SU-fiitd se SAUL-affiliated assigned have studies subsequent h oiaos sponge oviparous the marine sponges (Hentschel sponges marine Deferribacteres neovd Iiily lsiid s mme of member a as classified Initially unresolved. nbe mrn sogs to sponges marine enables capabilities, motivated us to take a detailed look look detailed a take to us motivated capabilities, also been detected in other environments such as se as such environments other in detected been also offspringto adults from symbionts relevant host-specific microbial communities (Schmitt (Schmitt communities microbial host-specific d sequences of 6% and 12% approximately represented niiul iegs ae eeld oe isgt in insights novel revealed have lineages individual recent study of 81 different sponge species, compri species, differentsponge 81 of study recent including the candidate phylum “ phylum candidate the including Planctomycetes-Verrucomicrobia-Chlamydiae odorabile , 2007a; Kamke Kamke 2007a; , , , Thomas 2007a,2013; (Simister t al. et , 2012a), which represent clades of microorganisms microorganisms of clades represent which 2012a), , (Webster

et al. et t al.et Accepted Article ferox Ectyoplasia 21; Simister 2010; , et al. et , 2012) and and 2012) , et al. et t al. et anan ope mcoil omnte truh the through communities microbial complex maintain ,2016),albeit lower at abundancesponges.than in Wiley-Blackwell andSocietyforApplied Microbiology , 2002; Simister 2002; , Poribacteria 21) a a idpnet ld coey eae to related closely clade independent an as 2011), , This article isprotected by copyright. All rights reserved. Bacteria noia alataAncorina t al. et (Gloeckner and consequently to maintain, over time, complex a complex time, over maintain, to consequently and (PVC) superphylum (Wagner and Horn, 2006; Taylor Taylor 2006; Horn, and (Wagner superphylum (PVC) ” (Fieseler ” lackcompleteabsence Mnav ad il 21) Ti lc o agreement of lack This 2011). Hill, and (Montalvo et al. et 21a, r ee nbe o lsiy uh sequences such classify to unable were or 2012a), , et al. et samples from adult, embryo and larval stages of stages larval and embryo adult, from samples sing the Sponge Microbiome Project, found SAUL found Project, Microbiome Sponge the sing , 2012).. , s s t hs ngai cae Gnmc suis of studies Genomics clade. enigmatic this at Deltaproteobacteria e hlgntc lsiiain f AL remains SAUL of classification phylogenetic se et al. et (Simister o h lfsye o ohr pne symbionts, sponge other of lifestyles the to t al. et awater, marine sediments and soils (Taylor (Taylor soils and sediments marine awater, th sponges, to date no studies have examined have studies no date to sponges, th , 2012a). It can also be vertically transmitted, vertically be also can It 2012a). , uences to so-called ‘sponge-specific clusters’ ‘sponge-specific so-called to uences une a pr o either of part as quences et al. et , 2016). This apparent affinity for spongesfor affinity apparentThis 2016). , 21) Vria mcoil transmission microbial Vertical 2013). , rvd rm h sponges the from erived Despite this sponge affinity, SAUL has SAUL affinity, sponge this Despite , 2006; Siegl 2006; , t al.et of knowledge about its genomic genomic its knowledgeabout of that can be highly enriched in in enriched highly be can that 21) rsetvl. more A respectively. 2013), , (Hentschel et al. et , 2011; Kamke Kamke 2011; ,

Acidobacteria t al. et Rhopaloeides transfer transfer 2002), , et al. et the the or nd of of et et ,

available 16S rRNA gene available16S sequences; use (ii) sequenc o outy ne is hlgntc oiin amongst position phylogenetic its infer robustly to and identify symbiotic features of SAUL members by members SAUL of features symbiotic identify and RESULTS AND RESULTS DISCUSSION metagenomedata. clade, across different environments and among diff among and environments different across clade, were obtained by metagenomic analyses of the spongethe of analyses metagenomic by obtained were pneascae sulphur-oxidizing sponge-associated f pne ybot ta icue n nihet f p of enrichment an include that symbionts sponge of (Thomas (Thomas ynbceim “cyanobacterium In this study we aimed to (i) comprehensively descr comprehensively (i) to aimed we study this In 2016; Slaby Slaby 2016; “ candidatethegenus 2014), 2013, spongehost. poten the revealed genomes SAUL draft near-complete prevalence the at look detailed first the undertook know is little remarkably However, species. sponge t particularly sponges, marine of surveys diversity has PAUC34f or SAUL as known lineage bacterial The an abundance of bacterial defence systems including systems defence bacterial of abundance an antitoxin systems systems antitoxin et al. et

et et al. , 2010; Fan 2010; ,

, , 2017) Ca. (Fan Synechococcus spongiarum” (Tian spongiarum” Synechococcus

Accepted Articleal. et . et al. et , 2012; Horn Horn 2012; , , 2012), universal stress proteins such as UspA (Fa UspA as such proteins stress universal 2012), , Wiley-Blackwell andSocietyforApplied Microbiology Candidatus This article isprotected by copyright. All rights reserved. Gammaproteobacterium et al. et , 2016; Slaby 2016; , Entotheonella” (Lackner hose involving so-called high-microbial-abundance so-called involving hose es derived from SAUL and related derived SAULand es from bacterial phyla atra (i) eemn te eoi potential genomic the determine (iii) bacteria; ibe the distribution and abundance of the SAUL the of abundance and distribution the ibe n pyoey f AL wie nlss f two of analysis while SAUL, of phylogeny and n about this cryptic clade of bacteria. We thus thus We bacteria. of clade cryptic this about n CRISPR-Cas, restriction-modification and toxin- and restriction-modification CRISPR-Cas, erent sponge species, using a meta-analysis of meta-analysis a using species, sponge erent been reported in many, if not most, microbial most, not if many, in reported been reconstructing genomes from sponge-derived from genomes reconstructing microbiota, thus revealing common features common revealing thus microbiota, tial metabolism of SAUL bacteria within the within bacteria SAUL of metabolism tial oen ivle i mcoehs signalling microbe-host in involved roteins et al. et et al. et (Tian (Tian 04 Burgsdorf 2014; , , 2017)(Fan , t al. et et al. et 21) Smlr findings Similar 2017). , et al.et , 2017), the widespreadthe 2017), , et al. et , 2012; Horn Horn 2012; , n et al. et 05 ad a and 2015) , , 2012) and 2012) , et al. et , Formatted: All caps All Page 42of78 Page 43of78

SAUL is is SAUL a SAUL is is SAUL abundantwidespreadand marine in sponge ho ih eea suis ht ae dniid overlappi identified have that studies several with AL i gorpial sprtd n phylogenetica and separated geographically in SAUL) by Thomasand study colleagues (2016) (Figure 1B). In the absence of a comprehensive evaluation of SAU of evaluationcomprehensive a of absencethe In examine those sponge studies which reported the pre the reported which studies sponge those examine supported the clustering of SAUL as a monophyletic monophyletic a as SAUL clusteringof the supported num and presence widespread its demonstrated Having a Atlantic Pacific, the from identified members its (Thomas relative sequence abundances per sponge species ran species sponge per abundances sequence relative Supporting Information FigureS1).). Information Supporting PAUC34f) (or near full-length (>1450 bp) (>1450 full-length near 37 markers suggested that SAUL is a sistercladeof a is suggestedthatSAUL markers 37 of position phylogenetic the determine to sought we shown).not (data seas MediterraneanCaribbean and broad distribution, and its and its distribution, broad presencetaxonomical in 02 Simister 2002; eeaim ihn aie pne ad ihih the highlight and sponges marine within generalism AL n h dfeet pneascae mcoil c microbial sponge-associated different the in SAUL et al. et monophyletic

ld ws dniid n 5 tde (nldn the (including studies 15 in identified was clade , 2016)) (Supporting Information Table S1) and in 9 in and S1) Table Information (Supporting 2016)) , t al. et Accepted s and seawater in recorded also was SAUL 2012a). , Article sisterof“ clade 16S rRNA gene sequences and concatenated marker pr marker concatenated and sequences gene rRNA 16S Wiley-Blackwell andSocietyforApplied Microbiology The global SAUL distribution in sponges was wide- was sponges in distribution SAUL global The This article isprotected by copyright. All rights reserved. Latescibacteria the candidate phylum “ the phylum candidate ly diversely sponge high ofa speciessuggest degree clade (Figure 2). Phylogenomic analysis of up toup of analysis Phylogenomic 2). clade(Figure muiis Ti peaec i i agreement in is prevalence This ommunities. L occurrence, we performed a meta-analysis tometa-analysis a performed we occurrence, L g irba cmuiy ebr (including members community microbial ng The apparent high abundance of this clade, itsclade, this abundancehighof apparent The ging from 20.7% to less than 0.001% (Figure 1 (Figure 0.001% than less to 20.7% from ging the SAUL lineage. Phylogenetic inference ofinference Phylogenetic lineage. SAUL the yboi ptnil n lkl iprac of importance likely and potential symbiotic l dsat pne pce (Hentschel species sponge distant lly ” sence of of sence sts sts

erical abundance in the sponge microbiota, sponge the in abundance erical d nin cas a wl a te Red, the as well as oceans, Indian nd either eet pne irboe Project Microbiome Sponge recent 3 different sponge species, with species, sponge different 3 SAUL Latescibacteria dmn smls n the in samples ediment or PAUC34f or otein sequences otein ranging, with ranging, . The SAULThe . ” (formerly ” t al.et , ,

ifrn mre poen eune cn rvd high provide can sequences protein marker different monophyletic monophyletic phylum level relationships compared with analysis o analysis with compared relationships level phylum such approaches are limited by a small number of av of number small a by limited are approaches such AL eune a a oohltc itr ld t “ to clade sister monophyletic a as sequences SAUL draft genomes with sufficient completeness to be us be to completeness sufficient with genomes draft available for each SAUL availablefor of “ and upr (5) ad rvos eerh a as demon also has research previous and (75%), support nossec bten 6 RAgn-ae phylogeny gene-based rRNA 16S between Inconsistency to level genomic the with together clustered Anantharaman Anantharaman Rinke WS3, o ute eaut te hlgntc tts f SAUL of status phylogenetic the evaluate further To previously for the sponge-associated clade “ clade sponge-associated the for previously sequence similarity within SAUL and between members between and SAUL within similarity sequence Spotn Ifrain al S) Aeae 6 rRN 16S Average S2). Table Information (Supporting eune aa bt a nt upre b bosrp r bootstrap by supported not was but data, sequence lse ws8% ad t vrg smlrt wt “ with similarity average its and 88%, was cluster (Supporting Text). Low genomic similarity was obse was similarity genomic Low Text). (Supporting ru group represents a monophyletic cluster closely related t related closely cluster monophyletic a represents reflecting different lifestyles related to the disp the to related lifestyles different reflecting (“ environments Latescibacteria (Rinke t al.et

(Hug (Hug (Rinke t al. et t al. et

” members are typically free-living bacteria found bacteria free-living typically are members ” 21) Fgr 2) Ti rltosi ws lo ob also was relationship This 2A). (Figure 2013) , et al. et et al. et 21; Hug 2016; , “Latescibacteria” Accepted Article Anantharaman 2013; , , 2016)(Hug 2016)(Hug , Fibrobacteres ,Youssef 2013; Latescibacteria” Wiley-Blackwell andSocietyforApplied Microbiology t al. et This article isprotected by copyright. All rights reserved. et al.et , we investigated genomic similarities between thes between similarities genomic investigated we , , 2016) , Chlorobi etal. , 2016), Poribacteria t al. et , 2015)(Rinke , I or nlss bt SU ad “ and SAUL both analyses, our In . . arate environments with which they are associated are they which with environments arate and . With the aim of revealing how SAUL is related at related is SAUL how revealing of aim the With . Latescibacteria” o the o rved between SAUL and “and SAUL between rved , 2016; Hug Hug 2016; , f a single marker gene such as 16S rRNA gene, rRNA 16S as such gene marker single a f ailable draft genomes. In our study, only three only study, our In genomes. draft ailable ed for phylogenomic tree reconstruction were reconstruction tree phylogenomic for ed , we calculated the average 16S rRNA gene gene rRNA 16S average the calculated we , Bacteroidetes gn sqec smlrt wti te SAUL the within similarity sequence gene A Latescibacteria” r eouin o rslig nr- n inter- and intra- resolving for resolution er of SAUL and those of other bacterial phyla bacterial other of those and SAUL of ” (Kamke (Kamke ” and phylogenomic analysis was observed was analysis phylogenomic and Fibrobacteres-Chlorobi-Bacteroidetes smln (iue B. “ 2B). (Figure esampling tae ta tee ie hl ae not are phyla five these that strated et et al. , , Youssef 2013; t al. et et al. et t coet eaie codn to according relative closest its , in terrestrial, aquatic and marine and aquatic terrestrial, in abi wt wae bootstrap weaker with albeit , Wie cnaeain of concatenation a While . , 2014), which also clustered also which 2014), , 2016)(Rinke , evd o 1S RA gene rRNA 16S for served Latescibacteria et et al. Latescibacteria Latescibacteria” , 2015) , t al. et e lineages e ). 2013; , ”, likely ”, (FCB) ”

Page 44of78 Page 45of78

RA ee eune soe te xsec o tre s three of existence the showed sequences gene rRNA independently from each other (Figure 3). High boo High 3). (Figure other each from independently for those three clusters, hereafter referred to as to referred hereafter clusters, three those for lse wt h ihs rpeetto o sponge-d of representation highest the with cluster wi SAUL of placement the determining to addition In Internal ofthe SAUL structure clade n ohr lsl rltd lds rvn frhr as further prevent clades related closely other and rec to According 80.8%. was analyses, phylogenomic Although our results revealed that SAUL is a lineag a is revealedSAUL results that our Although identified SAUL sequence (clone PAUC34f, AF186412). PAUC34f, (clone sequence SAUL identified hrceie h itra pyoeei srcue of structure phylogenetic internal the characterise consequence, the phylogenetic status of the SAUL cl SAUL the of status phylogenetic the consequence, hlm cas n odr ees 7% 7.% n 82%, and 78.5% (75%, levels order and class phylum, “ is reproducibly a sister clade of “of clade sister a reproducibly is of the SAUL metagenome bins described below is also is below described bins metagenome SAUL the of areavailable. thresholds based on 16S rRNA gene sequence similari sequence gene rRNA 16S on based thresholds SAUL clade suggested these clusters may represent d represent may clusters these suggested clade SAUL 2. ot AL nqe ersnaie eune are sequences representative unique SAUL Most S2). sediment and other marine source-derived sequences sequences source-derived marine other and sediment 4.5% of SAUL-affiliated sequences were derived from derived were sequences SAUL-affiliated of 4.5% biofilms. Origins varied when evaluating the three the evaluating when varied Origins biofilms. f h sqecs en drvd rm aie sponges marine from derived being sequences the of Latescibacteria”

would represent sister clades, possibly different possibly clades, sister represent would Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology Latescibacteria This article isprotected by copyright. All rights reserved. ”, the paucity of near-complete genomes for SAUL for genomes near-complete of paucity the ”, Clusters I, II and III. The application of taxonom of application The III. and II I, Clusters e closely related with the FCB superphylum, and it andsuperphylum, FCB the with related closely e clusters individually, with 62.2%, 87.5% and 42% and 87.5% 62.2%, with individually, clusters this lineage. Phylogenetic trees based on 16S 16S on based trees Phylogenetic lineage. this ade must be revisited once more genomic data genomic more once revisited be must ade erived sequences (Cluster II) contains the first the contains II) (Cluster sequences erived ently suggested threshold sequence criteria for for criteria sequence threshold suggested ently sertions from being made confidently. As a a As confidently. made being from sertions tstrap scores (>80%) supported the branching the supported (>80%) scores tstrap ty to interpret the internal architecture of the the of architecture internal the interpret to ty A 16S rRNA gene sequence derived from one derivedfrom sequence generRNA 16S A istinct families (Supporting Information Table Information (Supporting families istinct non-marine sources, primarily freshwater or freshwater primarily sources, non-marine hn h bceil re f ie w suh to sought we life, of tree bacterial the thin included in the same cluster (bin_petrosia). cluster same the in included epciey Yarza respectively; comprising a smaller fraction (24.7%). Only (24.7%). fraction smaller a comprising o Cutr I I ad III and II I, Clusters for pne eie (08) wt seawater, with (70.8%), derived sponge brus f AL eune clustering sequences SAUL of ubgroups classes within the same phylum. same the within classes t al.et , epciey The respectively. 21) SU and SAUL 2014), , ic

genome incompleteness, any apparent lack of specifi of lack apparent any incompleteness, genome caution. Metabolic reconstruction of the two metag two the of reconstruction Metabolic caution. eoi bcei wt fclaie neoi metabol anaerobic facultative with bacteria aerobic multiple sponge- and algae-derived carbohydrates. carbohydrates. algae-derived and sponge- multiple as the tricarboxylic acid cycle, glycolysis, pentos glycolysis, cycle, acid tricarboxylic the as oxidative phosphorylation, were identified in at le at in identified were phosphorylation, oxidative several enzymes involved in the uptake and/or metab and/or uptake the in involved enzymes several n ihr n o bt SU bn. e lo detected also We bins. SAUL both or one either in ht hs ld my e novd n hshrs seque phosphorus in involved be may clade this that enabl genes (bin_petrosia) 3,711 and (bin_aplysina) Functionalpotential characteristics symbiont and o metabolism, including enzymes encoding the high aff high theencoding enzymesincluding metabolism, b fr i_pyia n bnptoi, respectively bin_petrosia, and bin_aplysina for Mbp completeness (39.42%).(Parks low to due constructedfrom respectively bin_petrosia, and bin_aplysina for Mbp bin_aplysina_2)are) is gene rRNA 16S The conversion into the polyphosphate (polyP) storagef (polyP) polyphosphatethe into conversion (Fi SAUL of capabilities biosynthetic and potential Assembly of metagenome data from from data metagenome of Assembly regulon (Figure 4, Box (A)), as well as the enzyme the as well as (A)), Box 4, (Figure regulon reconstruction of two near-complete draft genomes, draft near-complete two of reconstruction the identification of 104 markers 104 of identification the

Aplysina Aplysina aerophoba sequencessequence Accepted Articlein I.Cluster Wiley-Blackwell andSocietyforApplied Microbiology (Parks (Parks This article isprotected by copyright. All rights reserved. metagenomic data, was not used for furtherusedfor not genomic was data, metagenomic derived from the from derived et al. et pyia aerophobaAplysina et al. et , 2015)), and an estimated genome size of 6.3 and 4 and 6.3 of size genome estimated an and 2015)), , , 2015)), and an estimated genome size of 6.3 and 4 and 6.3 of genomesize estimated an and2015)), , e phosphate pathway, Wood-Ljungdahl pathway and Wood-Ljungdahl pathway pathway, ephosphate polyphosphate kinase (kinase polyphosphate s n ftebn. Mroe,gns encoding genes Moreover, bins. the of one ast orm. Although the presence of polyP granulesinpolyP presenceof Althoughthe orm. f revealed SAUL genomepopulation by binning gure 4). It is important to note that, due to to due that, note to important is It 4). gure Genes involved in major central pathways, such pathways, central major in involved Genes ed a comprehensive analysis of the metabolic metabolic the of analysis comprehensive a ed inity phosphate transporter and control of PHO of control and phosphatetransporter inity enome bins suggested that SAUL members are members SAUL that suggested bins enome c enzymes/proteins should be interpreted with with interpreted be should enzymes/proteins c with 90.3% and 86.5% completeness (based on (basedcompleteness 86.5% and 90.3% with Tbe ) A hr bn bnalsn_) also (bin_aplysina_2), bin third A 1). (Table s, ossig lo h cpct t degrade to capacity the also possessing ism, olism of nitrogen and sulphate were identified identified were sulphate and nitrogen of olism Tbe 1). (Table tain rm h evrnet n is later its and environment the from stration ee ivle i popae rnpr and transport phosphate in involved genes two two and other SAUL SAUL other ucinl noain f 4,932 of annotation Functional ersa ficiformisPetrosia ppk binsbin , EC 2.7.4.1), suggesting 2.7.4.1), EC , (bin_aplysina e t the to led analyses and and .7 .7 Formatted: Font: Italic Font: Page 46of78 Page 47of78

(Fieseler eeld h peec o svrl eodr metaboli secondary several of presence the revealed eoe, ye PS wr ietfe (iue , Box 4, (Figure identified were PKSs I Type genomes, BlastP search conducted with the PKSs identified in identified PKSs the with conducted search BlastP pne ybot bqios ye PS Sp identif (Sup) PKS I Type ubiquitous symbiont sponge polyP granules polyP production been has identifiedin an peptide synthetases (NRPS) (Fischbach and Walsh, 20 Walsh, and (Fischbach (NRPS) synthetases peptide and others by associated microorganisms (Piel microorganisms associated by others and Wilson Wilson Both genomes encoded enzymes involved in the produc the in involved enzymes encoded genomes Both many compounds remains controversial (Hentschel controversial remains compounds many yiiie, s el s ercmlt replication, near-complete as well as pyrimidines, PKS modules and related proteins (COG3321). Furthe (COG3321). proteins relatedand modules PKS Additionally, the genomic machinery to releaseand to machinery genomic the Additionally, bacterial cells has been described previously inth previously bacterialbeendescribed cells has n oiaie hshrlto, s el s substrate as well as phosphorylation, oxidative and is mr dtie dsuso o seii aspects specific of discussion detailed (more bins divergent sponge species (Zhang species sponge divergent information information machinerytransfergenomes canin SAUL Production of biologically active secondary metabol secondary active biologically of Production metabolite Secondary biosynthesis y pne fr rtcin gis peaos r epib or predators against protection for sponges by eaoie ae rdcd y oyeie ytae (P synthases polyketide by produced are metabolites et al. et et al. et

, 2014; Flórez 2014; , , 2007). That same study identified the PKS in 10 10 in PKS the identified study same That 2007). , Accepted Article et al. et , 2015). Both SAUL metagenome bins contained genes contained bins metagenome SAUL Both 2015). , Wiley-Blackwell andSocietyforApplied Microbiology et al. et This article isprotected by copyright. All rights reserved. 05,ti stefrttm ta h eoi po genomic the that time first the is this 2015), , et al. et e associated communitiesphylogenetically three eassociated of actual sponge associate. actual both bins showed >60% sequence similarity to a to similarity sequence >60% showed bins both et al. et conserve energy via the electron transport chaintransport electron the energyvia conserve ites is an important defence mechanism utilised mechanism defence important an is ites lvl hshrlto, a rvae fr both for revealed was phosphorylation, -level rncitoa ad rnltoa machineries. translational and transcriptional beSupporting found in Text). , 2004; Taylor Taylor 2004; , r analysis with antiSMASH (WeberantiSMASH with analysis r (B)), as well as several putative clusters. A A clusters. putative several as well as (B)), 06; Newman and Cragg, 2012). The origin of origin The 2012). Cragg, and Newman 06; e isnhss ee lses I bt SAUL both In clusters. gene biosynthesis te e in ied S, any ye PS ad non-ribosomal and PKS, I Type mainly KS), f eta mtbls, isnhss and biosynthesis metabolism, central of , 2012), with some produced by the sponge sponge the by produced some with 2012), , tion of amino acids, vitamins, purines and and purines vitamins, acids, amino of tion ot (alk 21) Mn secondary Many 2011). (Pawlik, ionts hoel swinhoei Theonella additional sponge species, all of of all species, sponge additional et al. et , 2007a; Sala 2007a; , csi pSW1H8) (cosmid encoding for for encoding et al. et et al. et eta for tential , 2014; , , 2015) , Field Code Changed Code Field

mrvn atcmn t te uaytc ot n avo and host eukaryotic the to attachment improving PF07719), and LRR (COG4886). Genes encoding for an for encoding Genes (COG4886). LRR and PF07719), were also identified in bin_aplysina. The finding finding The bin_aplysina. in identified also were ankyrin (ANK), tetratricopeptide (TPR) and leucine- and (TPR) tetratricopeptide (ANK), ankyrin attention due to their presumed involvement inmedi involvement due their attention to presumed microbe-microbe interactions (Thomas (Thomas interactions microbe-microbe communities and their presumed importance in symbio in importance presumed their and communities of genes encoding for ELPs in SAUL bins. Screening bins. SAUL in ELPs for encoding genes of AL eoe ecdd Ns anttd s O06, PF COG0666, as (annotated ANKs encoded genomes SAUL antimicrobial properties may be used as a defence s defence a as used be may properties antimicrobial 2014; Tian 2014; either other symbionts present in the community or or community the in present symbionts other either Host-microbe recognition Host-microbe systemseukaryoticthrough environment. Adaptive symbiosis factors such as eukaryotic-like eukaryotic-like as such factors symbiosis Adaptive 2003) tde o sog-soitd irba communities microbial sponge-associated of studies to lead may sponges these in present microorganisms which, including which, Liu Liu Cerveny (Habyarimana widespread in these symbiont communities (Liu (Liu communities symbiont these in widespread et al. et , which also contains also which et al. et 21; Siegl 2011; , et al.et , 2013; Reynolds and Thomas, 2016).(Habyarimana Thomas, and Reynolds 2013; , et al. et

, 2014; Burgsdorf 2014; , T. swinhoeiT.

, 2008; Al-Khodor Al-Khodor 2008; , Accepted al. et Article A.aerophoba , 2011; Fan Fan 2011; , , belonged to the “high-microbial-abundance” group “high-microbial-abundance” the to belonged , Wiley-Blackwell andSocietyforApplied Microbiology et al. et This article isprotected by copyright. All rights reserved. et al. et , 2015). Due to their ubiquity in sponge-associate in ubiquity their to Due 2015). , et al.et et al. et and and , 2010; Liu Liu 2010; , , 2016). The production of secondary metabolites w metabolites secondary of production The 2016). , P. ficiformis P. , 2012; Cerveny 2012; , et al. et proteins (ELPs) in bacterial symbionts, particularl symbionts, bacterial in (ELPs) proteins of these ELPs within SAUL genomes is consistent is genomes SAUL within ELPs these of of COG and PFAM databases revealed that both both that revealed databases PFAM and COG of rich (LRR) repeat proteins, have attracted much attracted have proteins, repeat (LRR) rich trategy by some sponge symbionts to confront to symbionts sponge some by trategy -like proteins ating host-microbe recognition and host-microbe interaction, ating recognition foreign microorganisms that enter the sponge the enter that microorganisms foreign et al. et , 2011, 2012; Fan Fan 2012; 2011, , other ELP, WD40 repeat proteins (PF00400), proteins repeat WD40 ELP, other nt recognition, we investigated the presence the investigated we recognition, nt (Gloeckner ae dniid uh atr a being as factors such identified have dne f h hs’ imn response immune host’s the of idance , 2011; Siegl 2011; , nes (n nt eesrl positive) necessarily not (and intense t al. et 02) TR (O51, PF00515, (COG5010, TPRs 00023), et al. et , 2013). , et al. et , 2008; Al-Khodor Al-Khodor 2008; , et al. et et al. et , 2014). The abundance of abundanceThe of 2014). , eet (meta)genomic Recent , 2011; Fan 2011; , , 2012; Kamke Kamke 2012; , (Hentschel (Hentschel et al.et et al. et d microbial d , 2012; , , 2010; , et al. et et al.et ith y , , Page 48of78 Page 49of78

t al. et (Hansen conditions that generally lead to stress. In this this In stress. to lead generally that conditions microbial communities have revealed an enrichment o enrichment an have revealed communities microbial 2008; Fan Fan 2008; nldn te rsne f niirba cmons ( compounds antimicrobial of presence the including pne ybot ehbt eitne ehnss desi mechanisms resistance exhibit symbionts Sponge for Evidence SAUL to conditionshost adaptation recognitionby the host.sponge yrprxd rdcae uui C Ah, O05) a COG0450) (AphC, C subunit reductase hydroperoxide stress a can and resistance acid in role important an play PotABCD complex the for encoding genes Furthermore, (N metals heavy to exposure or starvation, nutrient s environmental to response in synthesised is which (univ UspA as such proteins encoding genes carrying microenviro the to adaptation to related signatures ih rvos pne irboa tde, n sugges and studies, microbiota sponge previous with al. as putrescine, spermidine putrescine, as (MnSOD, EC 1.15.1.1, COG0605). Only bin_aplysina e bin_aplysina Only COG0605). 1.15.1.1, EC (MnSOD, 2007). Proteins involved in elimination of denatur of elimination in involved Proteins 2007). n AL is icuig hprn poen GoL (H GroEL proteins chaperone including bins, SAUL in (COG0330) and DnaK (COG0443). Several enzymes invo enzymes Several (COG0443). DnaK and (COG0330) , 2007)), and cadaverine were identified in SAUL bi SAUL in identified were cadaverine and 2007)), , 21) n sdmnain (Luter sedimentation and 2016) , induced by the sponge host sponge the by induced et al.et et al. et

, 1995; Webster 1995; , , 2012; Liu Liu 2012; , Accepted Article et al. et and cadaverine and et al. et , 2012) that may help symbionts cope with environme with cope symbionts help may that 2012) , Wiley-Blackwell andSocietyforApplied Microbiology , 2001), and changes in temperature (Fan temperature in changesand 2001), , were also identified within SAUL genomes. These in These genomes. SAUL within identified also were This article isprotected by copyright. All rights reserved. t al. et (the most common polyamides in bacteriain polyamides common most (the , 2012). The SAUL clade clade SAUL The 2012). , ed and/or damaged proteins were also identified also were proteins damaged and/or ed ct as free radical ion scavengers (Wortham (Wortham scavengers ion radical free as ct otx, eoi suis f sponge-associated of studies genomic context, yström and Neidhardt, 1994; Kvint Kvint 1994; Neidhardt, and yström ersal stress protein A) (annotated as COG0589), as (annotated A) protein stress ersal ncoded the enzyme glutathione peroxidase (EC glutathioneperoxidase enzyme the ncoded ns.), were identified in SAUL bins. SAUL in identified were ns.), Piel, 2009), bioaccumulation of heavy metals heavy of bioaccumulation 2009), Piel, f stress-related proteins (López-Legentil proteins stress-related f s hi psuae iprac fr symbiont for importance postulated their ts mn o te ot pne wt bt bins both with sponge, host the of nment SP60, COG0459), membrane proteases HflC HflC proteases membrane COG0459), SP60, gned to specifically address changes in host in changes address specifically to gned rs sc a ha ado omtc shock, osmotic and/or heat as such tress , involved in the uptake of polyamines such polyamines of uptake the in involved , lved in cell defence against against defence cell in lved d agns sprxd dismutase superoxide manganese nd osse genomicpresents possesses et al. et , 2013), pH (Ribes pH 2013), , host host ntal stressors, ntal (Wortham et al. et Polyamines clude alkyl clude oxidative , 2003). , et al. et et al. et et , ,

Field Code Changed Code Field Formatted: Formatted: Changed Code Field Formatted: Formatted: Font: English (New Zealand) (New English Font: Zealand) (New English Font: Zealand) (New English Font: Zealand) (New English Font:

PF12161 and PF01420), Type II (COG0270, COG1743, CO COG1743, (COG0270, II Type PF01420), and PF12161 2012; Alex and Antunes, 2015). SAUL bins encoded f encoded bins SAUL 2015). Antunes, and Alex 2012; and eukaryotes (Keeling and Palmer, 2008; Wiedenbec 2008; Palmer, and (Keeling eukaryotes and etito-oiiain ytm ae osdrd bac considered are systems Restriction-modification irbs aatto t ete seii nce o t or niches specific either to adaptation microbes, oiotl N ecag bten pne ybot bu symbionts sponge between exchange DNA horizontal aiiae b mbl gntc lmns uh s tran as such elements genetic mobile by facilitated also identified in SAUL bins. These included Type Type included These bins. SAUL in identified also exchange with non-symbiont and/or pathogen microorg pathogen and/or non-symbiont with exchange al. s rnpsss CG93 CG45 ad O32) re COG3328), and COG3415, (COG1943, transposases as in role important an plays transfer gene Horizontal geneHorizontal transfer and defencemechanisms (seeSupporting Text detailedcompari a for genomic invo be may symbionts sponge in identified commonly “ from MnSOD, andasaswellAphC UspA, “free-living the relative, closest the of genomes glutathio and AphC enzymes The COG0386). 1.11.1.6, transcriptase(COG3344, PF00078) and integrases (PF peroxides, respectively, with the former also prote also former the with respectively, peroxides, (Chen hc i oe f h cl’ mjr eec mechanisms defence major cell’s the of one is which catalyse the conversion of superoxide molecules to to molecules superoxide of conversion the catalyse MCr ad rdvc, 1969). Fridovich, and (McCord , 2016; Slaby 2016; , et al. et

, 1998). Furthermore, MnSOD is anenzyme of Furthermore,member is MnSOD1998). ,

et al. et , 2017). COGs including specific DNA modification DNA specific including COGs 2017). , Accepted Article oevr gnmc oprsn o SU bn wt t with bins SAUL of comparisons genomic Moreover, Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. Latescibacteria Latescibacteria cting cells from reactive nitrogen intermediates nitrogen reactive from cells cting son). cags n niomna cniin cn be can conditions environmental in changes o adaptation and evolution of both prokaryotes both of evolution and adaptation 00665). 00665). or psn, lsis n pohgs (Fan prophages and plasmids sposons, (O08, O01, O 49, PF02384, 4096, COG COG0610, (COG0286, I eil eec sses ht a facilitate may that systems defence terial highly lved in adaptation to the host environment host the to adaptation in lved yrgn eoie n mlclr oxygen molecular and peroxide hydrogen ad oa, 01. n sponge-associated In 2011). Cohan, and k gis oiaie tes Tee enzymes These stress. oxidative against ”, supporting the notion that thesesupporting”,the that features notion

anisms (Vasu and Nagaraja, 2013; Horn Horn 2013; Nagaraja, and (Vasu anisms G0863, COG0338, COG4489 and PF00145) and COG4489 COG0338, G0863, e eoiae eue rai ad lipid and organic reduce peroxidase ne a te ae ie rtc aant DNA against protect time same the at t transposable insertion elements such elements insertion transposable ri eeet cnann reverse containing elements troid ”, indicated the apparent absence of of absence apparent the indicated ”, the superoxide dismutase family,superoxide thedismutase and restriction systems were systems restriction and o draft wo t al. et et , Page 50of78 Page 51of78

rtis Cs (Makarova (Cas) proteins communities eial ad dpie mue ytm ta ae enco are that systems immune adaptive and heritable p short, interspaced, regularly clustered, context, spo host the of activity pumping intense the to Due on i SU gnms 7 n 5 o bnalsn and bin_aplysina for 5 to corresponding ancestral events HGT (R≤0.4). and (7 genomes SAUL in found (Nguyen HGT-Finder using transferredgenes events HGT of magnitude the investigate further To systemsCOG0338) identifiedwere of one the in only “ f was systems restriction-modification of abundance restriction-modi PF04851) and (COG2189 III Type and t al. et large amount of viral particles and phages. Member phages. and particles viral of amount large These systems are comprised of two main stages: the stages: main two of comprised are systems These transposable elements within SAUL genomes likely co likely genomes SAUL within elements transposable noprtd no hi gnms ytm t effectiv to systems genomes their into incorporated small fragments of foreign DNA into an array of spa of array an into DNA foreign of fragments small for genetic exchange and rearrangement. This could This rearrangement. and exchange genetic for introduction of foreign DNA into their chromosomes their into DNA foreign of introduction host genome; and the interference stage, where the the where stage, interference the and genome; host ybot ht anan n srnte is interacti its strengthen and maintain that symbiont andinvading DNAcleave (Deveau Latescibacteria 21; Fan 2010; ,

(Thomas (Thomas

. n hs ae CG icue i Tp I (COG0286) I Type in included COGs case, this In ”. t al. et et al. et

Accepted Burgsdorf 2012; , Article t al. et , 2010; Fan 2010; , Wiley-Blackwell andSocietyforApplied Microbiology 21) r cmol erce i sponge-associated in enriched commonly are 2011) , et al. et This article isprotected by copyright. All rights reserved. , , 2010; Makarova etal. t al. et , 2012; Burgsdorf 2012; , et al. et , 2015; Horn Horn 2015; , , 2015). Very few putative transferred genes were geneswere transferred putative few Very 2015). , alindromic repeats (CRISPRs) and their associated their and (CRISPRs) repeats alindromic nge, associated microbes are likely exposed to a to exposed likely are microbes associated nge, two “ two ound when investigating the draft genomes of genomes draft the investigating when ound within SAUL genomes, we identified candidate identified we genomes, SAUL within cer sequences within the CRISPR locus of the of locus CRISPR the within sequences cer (Fan nfers upon these microorganisms the capacity the microorganisms these upon nfers s of sponge microbial communities have thus thus have communities microbial sponge of s allow for the acquisition of functions by the by functions of acquisition the for allow recently acquired spacers are used to target to used are spacers acquired recently adaptation stage, involving incorporation of incorporation involving stage, adaptation n ih h hs. host. the with on ely protect themselves and minimise the the minimise and themselves protect ely et al. et iain ytm. h peec o these of presence The systems. fication e b ms acaa n mn bacteria. many and archaea most by ded Latescibacteria et al. et t al.et ,2011). et al. et i_ersa rsetvl) likely respectively), bin_petrosia, , 2012; Horn Horn 2012; , 2016), , 2015; Horn2015; , n Tp I (O06 and (COG0863 II Type and ”SAGs (WS3_E07). CIP-a sses are systems CRISPR-Cas . codnl, lower a Accordingly, et al.et et al.et , 2016).. , , 2016)(Thomas, microbial

In this this In

Both SAUL bins present multiple enzymes involved in involved enzymes multiple present bins SAUL Both SAUL c a As identified. were proteins Cas no and region bin_petrosia contrast, By S3b). Table Information regio CRISPR six revealed genomes SAUL of Screening SAUL has thehas SAUL to potential degradedegrades their environment. “the of any in Moreov viruses. or phages known in hits had spacer respe NODE_3759, and NODE_2846 from spacers one and spa the of targets Potential DNA. foreign of types Supporting Text for more detailed information on de on information detailed more for Text Supporting not be not assessed. RSRascae (a) rtis tu frig two forming thus proteins, (Cas) CRISPR-associated evaluate SAUL’s putative capacity for carbohydrate for capacity putative SAUL’s evaluate NODE_3759 (Supporting Information Table S3). NODE_ S3). Table Information (Supporting NODE_3759 carbohydrate-active enzymes (CAZymes) using the CAZ the using (CAZymes) enzymes carbohydrate-active consisting of 101 spacer regions separated by a rep a by separated regions spacer 101 of consisting al. of the CRISPR region, and included the universal Ca universal the included and region, CRISPR the of hrceitc f utp IC Spotn Informati (Supporting I-C subtype of characteristic ein f 13 t ossig f 8 pcr ein s regions spacer 68 of consisting nt 4173 of region nvra Cs ad a2 ti system this Cas2, and Cas1 universal , 2012). SAUL bins were rich in genes encoding gly encoding genes in rich were bins SAUL 2012). , eoe idctd ht h SU mmes representin members SAUL the that indicated genomes

Latescibacteria Pairwise of spacerCRISPR comparisons regions iden Accepted Article ” genomes, suggesting a lower exposure to potential to exposure lower a suggesting genomes, ” Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. included sponge- and carbohydrates algae-derived rtis hrceitc f utp IE (Supporting I-E subtype of characteristic proteins onsequence, the functionality of this system could system this of functionality the onsequence, s1 and Cas2, as well as other associated proteins associated other as well as Cas2, and s1 cers were mainly unknown targets, although six although targets, unknown mainly were cers eat of 37 nt. Cas proteins were found upstream upstream found were proteins Cas nt. 37 of eat only only er, no confirmed CRISPR regions were identified identified were regions CRISPR confirmed no er, on Table S3a). Moreover, NODE_3759 has a a has NODE_3759 Moreover, S3a). Table on degradation, SAUL genomes were screened for for screened were genomes SAUL degradation, prtd y 2 n rpa. pr fo the from Apart repeat. nt 29 a by eparated dicated sugar catabolic pathways). To further To pathways). catabolic sugar dicated coside hydrolases (GH), glycoside transferases glycoside (GH), hydrolases coside the utilisation of diverse carbon sources (see sources carbon diverse of utilisation the ns in bin_aplysina. Of these, two contained contained two these, Of bin_aplysina. in ns Y database in the web server dbCAN (Yin (Yin dbCAN server web the in database Y 2846 presents a region of 7304 nucleotides 7304 of region a presents 2846 containedpresented RSRCs ytm: OE24 and NODE_2846 systems: CRISPR-Cas tvl, eitrd is n lsis No plasmids. in hits registered ctively, ec bn r epsd o different to exposed are bin each g tified in intified one confirmed CRISPR confirmed one invading DNA in DNA invading

et et

Formatted: Formatted: Font: Not Italic Not Font: Italic Not Font: Page 52of78 Page 53of78

which contains the enzyme β-mannanase (EC 3.2.1.78) (EC β-mannanase enzyme the contains which ansdc ikgs n h soae ln polysaccha plant storage the in linkages mannosidic glucomannans. Mannan replaces cellulose as the pri the as cellulose replaces Mannan glucomannans. certain species of algae (Frei and Preston, 1961, 1 1961, Preston, and (Frei algae of species certain Codium fragile Codium (Garrone (Garrone families involved in cellulose degradation, such as such degradation, cellulose in involved families ee oty noae a moioio 2-dehydrogena myo-inositol as annotated mostly were beta-glucosidases (GH116), were also identified in identified also were (GH116), beta-glucosidases compounds typically found within the sponge cell wa cell sponge the within found typically compounds incl enzyme this for substrates 2017).Physiological s in matter organic dissolved as and mesohyl sponge glyco and glycopeptides glycolipids, include enzyme lyases polysaccharide extent, lesser a to and, (GT) dehydrogenases and related proteins. Family GH33 w GH33 Family proteins. related and dehydrogenases be made available for SAUL utilisation either by th by either utilisation SAUL for available made be genesrelated the of degradation to several carbohy nomto Tbe 4. vrl, 4 ifrn G fa GH different 24 Overall, S4). Table Information SAUL bins. SAUL proteins in this family were annot were family this in proteins SAUL bins. SAUL rm h sronig ewtr r s b-rdc of by-product a as or seawater surrounding the from Information Table S5). The most abundantfamily GHmostThe S5).Table Information hydrolyses glycosidic linkages of terminal sialic a sialic terminal of linkages glycosidic hydrolyses

genomic analyses of the widespread sponge symbiont symbiont sponge widespread the of analyses genomic en ecie a αNaeyglcoaiiae E 3 (EC α-N-acetylgalactosaminidase as described been et al. et and and , 1971). Also present in SAUL bins, albeit to a le a to albeit bins, SAUL in present Also 1971). ,

Acetabularia crenulataAcetabularia Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. (Mackie and Preston, 1968). Moreover, members of members Moreover, 1968). Preston, and (Mackie cid residues, which are present in marine sponges marine in present are which residues, cid 964), and forms microfibrils in green algae such as such algae green in microfibrils forms and 964), (PL) and carbohydrate esterases (CE) (Supporting (CE) esterases carbohydrate and (PL) cellulases (endoglucanases, EC 3.2.1.4, GH5) and GH5) 3.2.1.4, EC (endoglucanases, cellulases ude glycolipids, glycopeptides and glycoproteins, and glycopeptides glycolipids, ude SAUL bins. These algae-derived compounds may compounds algae-derived These bins. SAUL e sponge host taking the compounds up directly up compounds the taking host sponge e drates (Kamke iis ee eetd n AL is (Supporting bins SAUL in detected were milies proteins, compounds typically found within the within found typically compounds proteins, ated as sialidase (EC 3.2.1.18), an enzyme that enzyme an 3.2.1.18), (EC sialidase as ated identified was GH109, the activity of which has which activity the of identified GH109, was ll. ncipal component of the cell wall skeleton in skeleton wall cell the of component ncipal Proteins assigned to this family in SAUL bins SAUL in family this to assigned Proteins as the second most abundant family in both in family abundant most second the as . This enzyme hydrolyses the (1->4)-beta-D- the hydrolyses enzyme This . 214) .2.1.49). eawater (Genin (Genin eawater “ the sponge feeding on algae. Similarly, Similarly, algae. on feeding sponge the Poribacteria ie mnas glcoann and galactomannans mannans, rides e oioeutss r s predicted as or oxidoreductases se, sser extent, was the family GH113, family the was extent, sser et et al. hsooia sbtae fr this for substrates Physiological rvae a ope sie of suite complex a revealed ” ,2013). t al. et , 2004; Blunt Blunt 2004; , t al. et GH ,

Meta-analysis of Meta-analysis available SAUL 16S generRNA seque e ae eosrtd ee ht h SU lnae is lineage SAUL the that here demonstrated have We Concluding remarks h mt-nlss ok no con toe tde p studies those account into took meta-analysis The microbial abundance (HMA) sponge hosts, though can though hosts, sponge (HMA) abundance microbial irboe n hc SU ado PU3f ee identi were PAUC34f and/or SAUL which in microbiome abundance (LMA) sponges and other non-sponge habita non-sponge other and sponges (LMA) abundance nomto Tbe 1. hr psil, h relativ the possible, Where S1). Table Information SAUL lineage close to the FCB superphylum and as a a as and superphylum FCB the to close lineage SAUL sponge species was noted for each study. When this When study. each for noted was species sponge “ were downloaded and relative abundance was calculat was abundance relative and downloaded were Deciphering Deciphering phylogenySAUL eitherper PAUC34f sponge SAULor species. related clades prevents further assertions from bei from assertions further prevents cladesrelated revealed genomic characteristics that are commonly commonly are that characteristics genomic revealed microorganisms, which may facilitate establishment facilitate may which microorganisms, hs smiss atr icue n paet abundan apparent an include factors symbiosis These EXPERIMENTALPROCEDURES defencesuch mechanisms asCRISPR-Cas. Latescibacteria

. oee, h puiy f ercmlt gnms f genomes near-complete of paucity the However, ”. Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. ng made confidently. Extensive genomic analyses genomic Extensive confidently. made ng and maintenance of the symbiotic relationship. symbiotic the of maintenance and nces nces information was not available, sequence data data sequence available, not was information sqec audne f AL n different in SAUL of abundance sequence e also occur in lower numbers in low microbial low in numbers lower in occur also ts. The available data collected here set the the set here collected data available The ts. ed as percentage of sequences assigned to to assigned sequences of percentage as ed bihd p o uy 06 n h sponge the on 2016 July to up ublished e f Ls uiesl tes rtis and proteins stress universal ELPs, of ce ieped ad fe audn, n high in abundant, often and widespread, id n epiil mnind (Supporting mentioned explicitly and fied sister clade of the candidate phylum phylum candidate the of clade sister ecie fr sponge-associated for described r AL n ohr closely other and SAUL or Formatted: Font: All caps All Font: Page 54of78 Page 55of78

against the GenBank nr/nt database. database.nr/nt GenBank the against eune wr sbeunl add ihu changing without added subsequently were sequences neatv to i AB Tes ee osrce in constructed were Trees ARB. in tool Interactive eBn wr rtivd n icue i or analyses our in included and retrieved were GenBank eetd eune. sequences. selected construc was tree phylogenetic a and retained, were previously bp) (≥1200 sequences gene rRNA 16S Long correction) and maximum likelihood (RAxML) to asses (RAxML) to likelihood and correction) maximum eie fo to AL eaeoe (e blw. Acc below). (see metagenomes SAUL two from derived (Simister (Simister aiu lklho tes ee osrce uig thr using constructed were trees likelihood Maximum utilised in the study are listed in Supporting Info Supporting in listed are study the in utilised sn te IA e Ainr (Pruesse Aligner Web SINA the using dfut, TGMA n GRA. h otru fr t for outgroup The GTRCAT. and GTRGAMMA (default), belonging to the distantly related clades related distantly the to belonging (50%) were applied for tree construction (Ludwig (Ludwig construction tree for applied were (50%) aaae ad motd no R fr ute manual further for ARB into imported and database, analyses were carried out on near-full length (≥145 length near-full on out carried were analyses ih eune rpeetn coey eae pya ( phyla related closely representing sequences with To investigate sub-clusters within the SAUL lineage SAUL the within sub-clusters investigate To resamplings. 500 with sequences (≥1450 bp) and a range of sequences from from sequences of range a and bp) (≥1450 sequences ( (WS3), candidate division OP3, OP3, division candidate (WS3), osrce uig h sm mtoooy ecie ea described methodology same the using constructed Planctomycetes t al. et

21a wr ue a rfrne eune t conduc to sequences reference as used were 2012a) , ,

Verrucomicrobia Accepted Article s gene rRNA 16S SAUL-affiliated all 2016, May of As Firmicutes Wiley-Blackwell andSocietyforApplied Microbiology , , This article isprotected by copyright. All rights reserved. The 100 best hits with >85% sequence identity for e for sequenceidentity >85% with hits best 100 The Chlamydiae t al. et Thermotogae Thermotogae n addt dvso BC) hlgntc re w trees Phylogenetic BRC1). division candidate and , 2007), merged with the SILVA 119 SSU Ref NR 99 99 NR Ref SSU 119 SILVA the with merged 2007), , , , mto al 6 SU eune ee aligned were sequences SAUL S6. Table rmation , we selected long SAUL-affiliated 16S rRNA gene rRNA 16S SAUL-affiliated long selected we , et al. et Lentisphaerae 0 bp) SAUL 16S rRNA gene sequences together sequences gene rRNA 16S SAUL bp) 0 s the robustness of the constructed phylogeny. constructed the the robustness s of ARB using neighbour-joining (Jukes-Cantor (Jukes-Cantor neighbour-joining using ARB and a wl a to 6 rN gn sequences gene rRNA 16S two as well as , as per (Simister (Simister per as e t cnim h SU aflain f the of affiliation SAUL the confirm to ted , 1998), and bootstrap analyses were done were analyses bootstrap and 1998), , e ifrn dsrbto mdl, GTRMIX models, distribution different ee le fr 6 rN gn-ae phylogeny, gene-based rRNA 16S for rlier uain f h ainet Phylogenetic alignment. the of curation sin ubr fr og AL sequences SAUL long for numbers ession lsiid s en aflae wt SAUL with affiliated being as classified Aquificae e cluain opie sequences comprised calculation ree re oooy sn te Parsimony the using topology tree eea bceil hl a outgroup as phyla bacterial several “ , . Sequence conservation filters conservation Sequence . Poribacteria” a etnie LS search BLAST extensive an t t al. et qecs vial in available equences 21a) Shorter 2012a)). , , , Latescibacteria ach search ach ere ere

ficiformis 6 rN gn sqec smlrt wti te AL c SAUL the within similarity sequence gene rRNA 16S WAG+Gamma WAG+Gamma and bootstrappingmodel, was calculated w considered in the phylogenetic analysis, was calcul was analysis, phylogenetic the in considered (http://www.bioinformatics.babraham.ac.uk/projects/ xet ht cnevto fle ws o apid i applied not was filter conservation a that except Slaby Slaby Samples of the Mediterranean sponges Mediterranean the of Samples Sponge sample collection metagenome and sequencing A custom data set of 37 different marker protein se protein marker different 37 of set data custom A Distancetool.Matrix n ter eaeoe wr sqecd n assembled and sequenced were metagenomes their and sequence alignment positions insequencethe analysis. alignmentpositions used to conduct a phylogenomic analysis (Rinke analysis phylogenomic a conduct to used vrg sqec smlrt aog lds a be u been has clades among similarity sequence Average this analysis were obtained from three unpublished three from obtained were analysis this bin_aplysina) are analysed in detail in this study study this in detail in analysed are bin_aplysina) construction, to help decipher the phylogeny of a g a of phylogeny the decipher help to construction, a isfiinl cmlt fr ul eoe analysi genome full for complete insufficiently was 3% oeae f h poen eune n evle < e-value and sequence protein the of coverage >30% identified in a single genome, only the best match match best the only genome, single a in identified their respective reference alignment using HMMER (v HMMER alignmentusing reference respective their (Castresana, 2000) and the markers were concatenate were markers the and 2000) (Castresana, t al.et

ee npce uig atC 0.11.2 FastQC using inspected were

2017)(Horn , Accepted Article t al.et 21; Slaby 2016; , Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. Petrosia ficiformisPetrosia

t al. et (see Results and Discussion section), while the thi the while section), Discussion and Results (see et al. et 2017) , ated by applying the similarity option of the ARB the of option similarity the applying by ated was retained. Homologues were then aligned to aligned then were Homologues retained. was quences (Supporting Information Table S7) was was S7) Table Information (Supporting quences . ah akr ee a ietfe, requiring identified, was gene marker Each s. iven microorganism (Yarza (Yarza microorganism iven draft genomes: two of these (bin_petrosia and and (bin_petrosia these of two genomes: draft odr o nld te aiu nme of number maximum the include to order n 3.1b2). Alignments were cleaned with Gblocks Gblocks with cleaned were Alignments 3.1b2). atc) o aatr, vrl qaiy length quality, overall adapters, for fastqc/) , 2013). SAUL gene sequences employed for employed sequences gene SAUL 2013). , d. A tree was then built in RAxML, using the using RAxML, in built then was tree A d. ae ad ewe SU ad te clades other and SAUL between and lade, .0. hr mlil hmlge were homologues multiple Where 0.001. ith 100 resamplings. 100 ith Rw eune ed otie from obtained reads sequence Raw . and o peiu studies previous for e, n diin o hlgntc tree phylogenetic to addition in sed, Aplysina aerophobaAplysina et al. et (Horn were collected were , 2014). Thus, 2014). , t al. et 2016; , rd rd P. P. Page 56of78 Page 57of78

binning process as described in Slaby in described as process binning The metagenomes from from metagenomes The Trimmomatic for further quality trimming and length and trimming quality further for Trimmomatic MINLENGTH:150 AVGQUAL:30). The remaining reads wer reads remainingThe AVGQUAL:30). MINLENGTH:150 house python script mkBinFasta.py (mkBinFasta.py script python house .. a dfut etns (Alneberg settings default at 0.4.0 maxk -100)maxk (Peng thebins metagenomic eoie i te eune ed rhv (R) under (SRA) Archive Read Sequence the in deposited BioSampleSAMN04870510 SRP074318,(SRA: WGS:LXNJ000 (https://sourceforge.net/projects/bbmap/). Merged (https://sourceforge.net/projects/bbmap/). eBn udr h acsin ubr KU0000 S MKWU00000000. number accession the under GenBank aerophoba (Albertsen BLA a against sequences gene rRNA 16S SAUL known 86 Eddy, and (Wheeler nhmmer with conducted was genes odcig n msac aant dtbs o 14 e 104 of database a against hmmsearch an conducting Trimm with trimmed were Reads bases. ambiguous and aerophoba sequences.. SAUL a construct to used then were sequences bin-derived (Parks CheckM using genes essential conducting by estimated was completeness 2015).Bin LUIALP23:0 (Bolger ILLUMINACLIP:2:30:10) are depositedGs0099546. ID study GOLD under JGI’s , n bin_ and ), et al. et

, 2013) and contigs shorter than 2000 bp were filte were bp 2000 than shorter contigs and 2013) , The identified SAUL identified The et al.et , with an identity cut-off of 85%. To confirm thei confirm 85%. To of cut-off anwith identity , , 2012). , Contigs 1000shorter were than bp discard

Acceptedpetrosia Article A. aerophobaA. , were refined manually manually refined were Wiley-Blackwell andSocietyforApplied Microbiology t al. et https://github.com/bslaby/scripts/ This article isprotected by copyright. All rights reserved. et al.et et al. et and t al. et bins 21) wt peaain f h cvrg tbe fo tables coverage the of preparation with 2014), , , 2015). , (2017). A fasta file for each bin was created wit created was bin each for file fasta A (2017). P. ficiformis P. bin_aplysina and bin_and bin_aplysina 21) hn egd sn bbmerge using merged then 2014) , Target bins were identified by a BLASTn search of search BLASTn a by identified were bins Target filtering (SE –phred 33 SLIDINGWINDOW:4:25 33 –phred (SE filtering and unmerged reads were again subjected to subjected again were reads unmerged and were binned using the software CONCOCT v. CONCOCT software the using binned were phylogenetic tree with previously identified previously with tree phylogenetic ST database of the rRNA genes identified in in identified genes rRNA the of database ST n msac aant dtbs o 104 of database a against hmmsearch an 2013). 2013). ssential genes using CheckM (Parks (Parks CheckM using genes ssential e assembled with IDBA-UD 1.1.1 (-mink 10(-mink 1.1.1 IDBA-UD with assembled e via 00000). the BioProject PRJNA318959 and the the and PRJNA318959 BioProject the mtc .1 P –he 3 LEADING:3 33 –phred (PR 0.31 omatic a previously published R pipeline pipeline R published previously a equence data for for data equence The assembly data are deposited in aredata deposited assembly The in Bin completeness was estimated by estimated was completeness Bin aplysina_2 (both derived from from derived (both aplysina_2 red out. Raw Illumina data for for data Illumina Raw out. red r affiliation to the SAUL clade, the SAULclade, raffiliation to ). The identification of rRNA of identification The ). ed. . ficiformis P. h the in- the h t al. et the r are A. A. ,

previously (Makarova previously 0.00001. Additionally, all CAZy hits were manually were hits CAZy all Additionally, 0.00001. hn eut wr cnlcig conflicting. were results when (CAZymes) (http://www.cazy.org) were identified by identified were (http://www.cazy.org) (CAZymes) against a non-redundant set of KEGG genes (Kanehisa genes KEGG of set non-redundant a against dbCAN HMMs (Yin HMMs dbCAN n -au ctof f .0. diinly SU pr SAUL Additionally, 0.001. of cut-off e-value an 08 i CIPs e sre wt dfut settings. default with server web CRISPRs in 2008) uoai antto sre fr EG (Kanehisa KEGG for server annotation automatic identified, but only confirmed CRISPR regions were regions CRISPR confirmed only but identified, soitd rtis Cs wr ietfe uig SEE using identified were (Cas) proteins associated 06 ad IRa (Haft TIGRfam and 2016) noain f ED usses floe b mna c manual by followed subsystems, SEED of annotation annotations were conducted through the WebMGA (Wu (Wu WebMGA the through conducted were annotations CG) (Tatusov (COGs) aim of exploring the putative capability of SAUL to SAUL of capability putative the exploring of aim were (CRISPR) repeats palindromic short interspaced SEED the RAST, to submitted then were bin_petrosia) The draft genomes of the two the of genomes draft The RefSeq-Viral databases (default setting except: gap except: setting (default databases RefSeq-Viral (Biswas CRISPRTarget with (Grissa CRISPRcompar with conductedwere 0.9, to 0.2 subjecte was output BLAST The database. NR GenBank (Aziz (Aziz et al. et , 2008; Overbeek 2008; , Q Q

au <.5 t ietf HT addts (Nguyen candidates HGT identify to <0.05) value

t al. et et al. et et al.et Accepted Article 2.2.15), (v. rpsBLAST using annotated were 2003) , , 2012) using HMMER 3, and results were filtered us filtered were results and 3, HMMER using 2012) , , 2015). 2015). , t al. et et al. et t al. et most complete most Wiley-Blackwell andSocietyforApplied Microbiology SAUL genomes were searched using blastn (v +2.6.0) +2.6.0) (v blastn using searched were genomes SAUL 20) rti fmle wr ietfe wt HMMER with identified were families protein 2003) , , 2014), for automated open reading frames (ORF) pr (ORF) frames reading open automated for 2014), , 21) sn GnakPae GnakPamd ACLAM GenBank-Plasmid, GenBank-Phage, using 2013) , Pairwise comparisons of the confirmed CRISPR spacer CRISPR confirmedthe of comparisonsPairwise This article isprotected by copyright. All rights reserved. et al. et SAUL bins (bins SAUL , 2008). Putative targets of spacers were identifi were spacers of targets Putative 2008). , open -5, e-value:0.1, cut-off score:20). score:20). cut-off e-value:0.1, -5, open produce secondary metabolites, both bins were were bins both metabolites, secondary produce used for further analysis. Furthermore, CRISPR- Furthermore, analysis. further for used evaluated with SEED annotation and excluded and annotation SEED with evaluated dce gns ee umte t GhostKOALA to submitted were genes edicted t al. et searching translated protein sequences against against sequences protein translated searching oh ofre ad addt CIPs were CRISPRs candidate and confirmed Both -based prokaryotic genome annotation server annotation genome prokaryotic -based identified using CRISPRFinder (Grissa (Grissa CRISPRFinder using identified

D annotation, and classified as described described as classified and annotation, D et al. et d to HGT-Finder (HGT-Finder to d et al. et ekn. lses f rhlgu groups orthologous of Clusters hecking. 20) noain y HSX searches GHOSTX by annotation 2004) , hereafter referred to as to referred hereafter , 2016). Carbohydrate active enzymes active Carbohydrate 2016). , , 2011) function annotation tool, with tool, annotation function 2011) , t al. et 21) 2015). , R threshold ranging from ranging threshold hl Pa (Finn Pfam while ing an e-value cut-off of cut-off e-value an ing lsee regularly Clustered bin_aplysina and bin_aplysina ediction and ediction gis the against sequences .. All 3.0. With the With and E t al.et t al. et ed , , Formatted: Font: (Default) Calibri, 11 pt 11 Calibri, (Default) Font: Page 58of78 Page 59of78

data ofdata

reconstructed and analysed by Rinke and colleagues colleagues and Rinke by analysed and reconstructed SAUL draft genomes were compared to those of the cl the of those to compared were genomes draft SAUL xelne ntaie o h Gaut Sho o Lif of School Graduate the to Initiative Excellence “ uoen no’ Hrzn 00 eerh n innovati and research 2020 Horizon Union’s European 679849 (‘SponGES’). We thank Hannes Horn for his co his for Horn Hannes thank We (‘SponGES’). 679849 SAUL-“ secondary biosynthetic metabolite gene clusters. AT oprtv tos Adtoal, information Additionally, tools. comparative RAST Microbial Genomics (IMG) (SAG ID: SCGC AAA252-B13 a AAA252-B13 SCGC ID: (SAG (IMG) Genomics Microbial A wsspotd y n norgn ad Supporting and Encouraging an by supported was CAG ACKNOWLEDGEMENTS server web RAST the to submitted and respectively) from downloaded comparisonsbetween SAULandbins “ analysed with the web-based tool antiSMASH (version antiSMASH tool web-based the with analysed

cec aadd y h Uiest o Acln. BMS Auckland. of University the by awarded Science “two andthe bins betweenSAUL comparison based Latescibacteria Latescibacteria Petrosiaficiformis

. w “ Two ”.

GenBankGenbank Accepted Article”genome comparison and valuable discussions. Latescibacteria Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. asml Is N_QL00001 n ASWY00000000. and NZ_AQSL00000000.1 IDs: (assembly Latescibacteria SG, S_0 ad S_1, ee previously were WS3_B13, and WS3_E07 SAGs, ” Sine, nvriy f üzug ad y the by and Würzburg, of University Sciences, e Latescibacteria ”SAGs using STAMP (ParksV2.1.3 for ORF prediction and annotation. Sequence- annotation. and prediction ORF for n O antto otie fo Integrated from obtained annotation COG on 3.0) (Weber 3.0) ntribution to processing of the metagenomic metagenomic the of processing to ntribution osest related phylum, the candidate phylum phylum candidate the phylum, related osest nd SCGC AAA252-E07) was used for further for used was AAA252-E07) SCGC nd (2013) a spotd y gat f h German the of grant a by supported was noain otrl coasi i Marine in Scholarship Doctoral Innovation on program under Grant Agreement no. no. Agreement Grant under program on (al 1). (Table . ” SAGs was then conducted usingconductedthen SAGs was ” et al. et , 2015) for the detection of detection the for 2015) , These two SAGs were were SAGs two These et al. , 2014). , 1

Formatted: Font: All caps All Font:

iue . ceai oeve soig eoi poten genomic showing overview Schematic 4. Figure identifiedDetails (AF186412). are sameastho the bin_aplysina cl SAUL the of position showing Phylogeny 2: Figure (2017). SAUL- of (percentage) abundance Relative 1: Figure bars), according mainly to(*), mainly according bars), FIGURE OF LIST AND LEGENDSTABLE lc arw idct sequences indicate arrows black functions identified in both metagenome bins. Red bins. metagenome both in identified functions Phylogenomic analysis based on a data set of up to to up of set data a on based analysis Phylogenomic commu bacterial sponge marine of studies gene-based sponge species across 14 sponge microbiota studies. microbiota sponge 14 across species sponge gene sequence-based maximum-likelihood analysis of of analysis maximum-likelihood sequence-based gene belonging to the same study defined in Table S1. ( S1. Table in defined study same the to belonging support (100 resamplings for (A), 500 for (B)) is s is (B)) for 500 (A), for resamplings (100 support (filled circles) or ≥75% (filledor (open circles) Scale bar circles). pce rpre i te eet pne irboe Pr Microbiome Sponge recent the in reported species iue . 6 rN gn-ae mxmm ieiod p likelihood maximum gene-based rRNA 16S 3. Figure rhtcueo h ALcae Bl eune den sequences Bold clade. SAUL the of architecture nw, pne pce ae lsiid s high as classified are species sponge known,

2 bin_petrosia _2, Accepted Article n bin_ and according mainly to Gloeckner to mainly according Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. . band rm AL is (bin bins SAUL from obtained

petrosia78 srepresent sequence 10% divergence. bak bars)(**)(black hown on tree nodes where support is either ≥90% either is support where nodes tree on hown se provided a wl a te is SU/AC4 sequence SAUL/PAUC34f first the as well as ) 37 concatenated marker proteins. (B) 16S rRNA 16S (B) proteins. marker concatenated 37 and green lines/transporters indicate functions indicate lines/transporters green and B) Prevalence of SAUL in 72 of the 81 sponge 81 the of 72 in SAUL of Prevalence B) Letters beside bars represent sponge species sponge represent bars beside Letters t sequences ote r AC4-fiitd eune i 1S rRNA 16S in sequences PAUC34f-affiliated or d rltv t ohr atra pya (A) phyla. bacterial other to relative ade jc suy (Thomas study oject iis () rvlne f AL n different in SAUL of Prevalence (A) nities. AL n is lss rltvs Bootstrap relatives. closest its and SAUL yoeei aayi soig h internal the showing analysis hylogenetic il f SU cl. lc lns indicate lines Black cell. SAUL a of tial et al. et forin r o mcoil abundance microbial low or Figure 2A. (2014) and Moitinho-Silva Moitinho-Silva and (2014) derived from marine sponges; marine from derived alsn, 3 (afterwards 136 _aplysina,

t al. et 21) Where 2016). , (white (white et al. et

Formatted: All caps All Page 60of78 Page 61of78

Table 1. General 1. Tableabout information SAULmetagenome bin_petrosia or bin_aplysina either in present only enzymes of the pathway were not identified in the b the in identified not were pathway the of enzymes of phosphate operon signal transduction; (B) Featur (B) transduction; signal operon phosphate of i_pyia n bnptoi. brvain o PKS of Abbreviations bin_petrosia. and bin_aplysina ketosynthase; AT,ketosynthase; acyltransferase; acyl ACPS, carri Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. erprotein; EnoylER,reductase. , respectively. Dashed lines indicate that some that indicate lines Dashed respectively. , es of the Type I polyketide synthase identified in in identified synthase polyketide I Type the of es ins (see Supporting Text for details). (A) Model (A) details). for Text Supporting (see ins oan a flos K, eoeuts; KS, ketoreductase; KR, follows: as domains bins. bins.

Albertsen,P., Hugenholtz, Skarshewski, M., Nie A., Al-Khodor, S.,Al-Khodor,Price, Kalia, C.T., and A., Kwai Abu REFERENCES Alex,A.(2015) and Antunes, genome sequenA. Whole Alneberg,B.S., J., Bjarnason, Schirde Bruijn, I., de Bary, A. deBary, (1879)symbiose.Die erscheinungder Ve Bell, J.J.Bell,The functional(2008) roles of marine sp Bolger,Lohse, M.,A.M., and (2014)Usadel, B. Trim Anantharaman, Brown,Anantharaman, K., L.A., Sharon,I Hug, C.T., Aziz, R.K., Aziz, D.,Bartels, Best, DeJongh,A.A., D M., Biswas, A., Biswas, J.N.,Gagnon, S.J.J., Brouns, Fineran, Blunt, J.W., Copp,Blunt, J.W., Keyzers, B.R., R.A., M.H. Munro, multiplemetagenomes. Genomesequences uncultured rare, bacteria obtai of microbes: a conserved a microbes: functionalstructure with div switching permissive switching lifestyle. theintertidal marine sponge metagenomic contigs by metagenomiccontigs by coverageandcomposition. Thousands Thousands genomes of microbial shed interc on light aquifer system. rapid annotations using subsystemsrapid annotations technology. bioinformatic prediction andbioinformaticprediction crRNA analysis of targ products. products.

Nat. Prod. Rep. AcceptedNat.Commun. Article Nat.Biotechnol.

34 Wiley-Blackwell andSocietyforApplied Microbiology : 235–294. Polymastiapenicillus Genome Biol. Evol.Genome

7 This article isprotected by copyright. All rights reserved. : 7:13219.: doi:10.1038/ncomms13219.

31 mer, M., mer, Quick, J., U.Z., Ijaz, et(2014)Binni al. isz, T., isz, R.A., et Edwards, al. (2008)The Serv RAST P.C., and (2013) P.C., C.M. Brown, CRISPRTarget: k, (2010) k, Y. Ankyrin-repeat containing proteins of onges. : 533–538. G., G., andM.R.(2017)Prinsep, Marinenatural lsen,G.W.,Tyson, K.L., Nielsen, (2013) and P.H. ., Castelle, ., A.J., C.J., etProbst, (2016) al. momatic: a flexible a momatic: fortrimmer Illumina sequence rlag vonTrübner,J. Karl rlag Strassburg. BMC BMC Genomics

cing of the cingsymbiont of 7 revealed gene host- a for repertoire Estuar.Coast. Sci.Shelf :3022–3032. ets. ersity. Nat.Methods onnected biogeochemical in processes an ned by differential nedcoverageby binning of RNABiol. TrendsMicrobiol.

9

10 : 75. :

11 : 817–827.: :1144–1146. Pseudovibrio

79

18 : 341–353.: : 132–139.: sp. from ng er: er: Formatted: All caps All Page 62of78 Page 63of78

Fan, L., Reynolds, D.,Fan, Reynolds, L., Liu, M., M., Stark, Kjellebe Fan, L., Liu, M., Simister,M., Fan,Liu,L., R.,Webster, N.S.,and Fieseler,Hentschel,L., U., Schirme Grozdanov, L., Cerveny, L., Straskova,Cerveny,L., A., Dankova, Hartlova,V., Chen,Q.W., Xie, L., and Nathan,(1998)Alkyl C. hy Castresana, J. (2000)SelectionCastresana, J. of conserved blocks Burgsdorf, I., Slaby,I., Burgsdorf, B.M., Handley, M K.M., Haber, Erwin, and Erwin, R.W.P.M. Thacker, (2008)Phototrophic n Deveau,Garneau, J.E., S.Moineau, (2010) H., and C 7 Functional equivalence Functional convergence evolutionary and sponge symbionts. occurrence and genomic contextoccurrenceunusuallyandsmall p of genomic up: the up: phylogenetic response and functional s of a mechanisms. mechanisms. Tetratricopeptide repeat motifs inTetratricopeptide bacrepeat theofmotifs world Caribbeansponge-cyanobacteriasymbioses. bacterialcells and against human reactivenitrogen doi:10.1128/mBio.00391–15. phylogenetic analysis. data. data. evolution in evolution cyanobacterial of sponges. symbionts interactions. : 991–1002.: Bioinformatics

Annu.Rev.Microbiol. Infect.Immun. Accepted Article Proc. Natl. Acad. Sci. Proc. USA

30 Mol.Biol. Evol. : 2114–2120.: Wiley-Blackwell andSocietyforApplied Microbiology

81 : 629–635.: This article isprotected by copyright. All rights reserved.

64

: 475–493.: 17 : 540–552.: Thomas, T. Thomas, (2013)Marine symbiosis microbial heats rg, S., Webster, S., rg, N.S.,and Thomas,(2012) T. A.,Ceckova, and F., M., Stulik, Staud, (2013)J. r, A., r, Wen, et al. M., G., (2007) Platzer, Widespre droperoxidereductase (AhpC) protectsC subunit

.,Blom,J., Marshall, C.W., etLifestyl al. (2015) Mar. Ecol.Mar. Prog. Ser. 109 from from multiplealignments for their usein RISPR/Cassystem roleandits in phage-bacteria utrition and symbiontutrition diversity two of : 1878–1887.: MBio intermediates. terialpathogens:role in virulence ponge stress. toholobiont thermal olyketide synthasegenes olyketide in microbial incomplex of microbial communities

6 : e00391–15. e00391–15. :

362 Mol Cell Mol : 139–147.

1 : 795–805. ISME J. ISME e ad

Genin,J.-M., E.,Njinkoue, Wielgosz-Collin, G., Ho Garrone, R.,and Thiney,Garrone, Y.,de Pavans Ceccatty, M. Gao, Z.M., Gao, Y., Wong, Wang, Y.H.,Tian, R.M., Batang Frei, E. andFrei, Preston, E. R.D.s (1961) inVariants the Frei, E. andFrei, Preston, Non-cellulosic E. R.D.(1964) st Fischbach, M.A. andWalsh, Fischbach,M.A. C.T.(2006) Assembly-lin Flórez, Biedermann, Flórez, andP.H.W., L. Engl, V, T., Kal Fieseler, L., Horn, M., Wagner, Fieseler,Horn,L., M., and Hentschel, M., Fieseler,Quaiser,L., Schleper,A., and Hentsc C., Finn,P.,R.D.,Coggill, Eberhardt, R.Y., S.R Eddy, (2004) Glycolipids from (2004)Glycolipids marine monoglycossponges : mucopolysaccharidecell incoat sponges. Synechococcus Synechococcus spongiarum adaptation drives genomeadaptation streamlining of thecyanob 192 Association of xylan and of Association xylan mannanin peptidelogic andantibiotics: machinery, mechanism consortia associated consortia with marine sponges. with prokaryotic andprokaryoticwith eukaryotic microorganisms. “ from the marine sponge-associated,from the candidate novel protein families database:protein families towards asustainab more genomics. Poribacteria : 939–943.:

Environ.Microbiol.

”in sponges. marine Accepted Article .” Wiley-Blackwell andSocietyforApplied Microbiology MBio

8 : 612–624. This article isprotected by copyright. All rights reserved. Appl. Environ. Appl. Microbiol.

5 Porphyra umbilicalis : e00079–14. doi:10.1128/mBio.00079–14. : .,Mistry,J., Mitchell, A.L., et(2016)The al. Pf Experientia hel,(2006)Analysis U. the first of genomefragmen Appl.Microbiol. Environ. tructuralpolysaccharides of algal cellwalls. ructuralpolysaccharidesalgal in II. cellwalls. ussay,C., J.-M., Kornprobst, Debitus, et al. C., tenpoth,(2015) M. Defensive of animals symbioses U. (2004) U. theDiscovery novel of candidate phylum (1971) Electron of microscopy a , Z.B., , A.M.,Al-Suwailem,et al. (2014)Symbiotic efor polyketide enzymology andnonribosomal Nat. Prod. Rep. Nat. Prod. lefuture. s.

phylum phylum ylceramides and alkyldiglycosylglycerols:ylceramidesand 27 acterial sponge symbiontspongeacterial “Candidatus Chem.Rev. : 1324–1326.: . Proc. R. Soc. R. Proc. B

70 Poribacteria :3724–3732. NucleicAcids Res.

32

106

: 904–36.: 73 : 3468–3496. : 2144–2155.:

160 by environmental : 314–317.:

44 : 279–285.: am am Nature t

Page 64of78 Page 65of78

Hansen, I. V., Weeks, Hansen, I. J.M., V., and(199 Depledge, M.H. Hentschel, U., Fieseler,Hentschel, U., Gernert, M., Wehrl,L., C. Hentschel, U., Horn, Hopke,Friedrich, J., M., A.B. Gloeckner, V., Wehrl,V.,Gloeckner, M., Moitinho-Silva,GerneL., Grissa, I., Grissa, G.,Vergnaud,and CR Pourcel, C. (2008) Haft, D.H., Selengut,D.H.,Haft, J.D., (2003) and O. White, Th Habyarimana,Al-Khodor, F., A., Kalia, S., Graham, Gloeckner, V., Lindquist,N.,V.,Gloeckner, andSchmitt,S., Hent Acids Res. Acids Environ. Microbiol. Environ. biomonitoring. chromium by the chromium marine sponge hosts and human hosts macrophages. Microbial diversity Microbial of marine sponges. In, Molecular evidence for a uniformevidence Molecular a microbialfor communit Ecol. experimentally tractable for model vertical microbi Bull. HMA-LMA dichotomyHMA-LMArevisited: an electron microscopi regularlyrepeats.interspaced short palindromic Role for the Role Ankyrineukaryotic-like genes of di Genova di isolation, characterization isolation, and biological activity

227 65 : 462–474.

: 78–88.:

31 68

: 371–373.: : 327–334.: AcceptedBull.Pollut.Mar. Article

68 : 4431–4440. Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved.

Environ. Microbiol. Environ. 31 Halichondriapanicea : 133–138.: Sponges (Porifera) Sponges ,J., Steinert,and M., Hacker, (2003) Horn,M. , Wagner, , andB.S. Moore, Hacker,J., (2002) M., ISPRcompar:website a to compare clustered schel,(2013) U. eTIGRFAMsdatabase protein families. of J.E.,Price,C.T., Garcia, (2 and M.T., Kwaik,Y.A. Legionella pneumophila rt, C., rt, Schupp,Hentschel,et (2014) al. U., P., Th 5) Accumulation copper, 5) zinc,and of cadmium . Nucleic Acids Res. Boll. dei MuseiDegli e Boll.Biol. Ist. dei dell’Università al transmissional in marinesponges.

10 y in sponges in y from different oceans. Pallas and Pallas the implications for : 1460–1474.: cal survey survey cal spongeof 56 species. Ectyoplasiaferox .Springer, pp. 59–88.

36 inparasitism protozoan of : 52–57.: , an , Nucleic Microb. Microb. Biol. Appl. 008) e

Hug, L.A., Baker, Hug, Anantharaman,B.J., Brown, K., C. Hooper, J.A. andJ.A. Hooper, Soest,R.M. (2002)Systema Van Por Horn, H., Slaby,Horn,H., B., Jahn, K., M.,Moitinho- Bayer, Kamke, J., Rinke,Kamke,Schwientek, J., C., P.,K Mavromatis, Hentschel, U., Piel, J., Piel, Hentschel, Degnan, U., Taylor, and S.M., Keeling,P.J.J.D. and Palmer, Horizontal ge (2008) Kamke, J., Sczyrba,Ivanova, Kamke,A., J., Schwientek, N., P. Kanehisa, M., Sato,Kanehisa, andY., M., (2016) K. Morishima, Bl Kanehisa, M., Goto, S., Goto, Kanehisa, M., Kawashima,S., a Okuno,Y., Kamke, J., Taylor,Kamke,M.W., J., Schmitt,S. (2010) and Act

view of the view life. tree of Hooper,J.A.Van Soest,R.M.and (eds) Springer US. CRISPR and CRISPR otherdefense-related in features marine metagenomes. candidate phylum spongemicrobiome. functionalcharacterization genome and of metagenom compartmentation,eukaryote-like repeat proteins,a e87353. doi:10.1371/journal.pone.0087353. e87353. genomics reveals complexgenomicsreveals carbohydratedegradation p marine sponges. decipheringthe genome. bacteria obtained by bacteriarRNAobtained by 16S 16Svs rRNA gene comp Front. Front. Microbiol.

AcceptedISMEJ. Article Poribacteria Nat.Rev. Microbiol. Nat. Microbiol. Nat.

7 NucleicAcids Res. : 2287–300.: Wiley-Blackwell andSocietyforApplied Microbiology by single-cell insightsnew phylogeinto genomics: This article isprotected by copyright. All rights reserved.

7 : doi: 1751. : 10.3389/fmicb.2016.01751.

1 : 16048.: doi: 10.1038/nmicrobiol.2016.48.

10 M.W. M.W. insights (2012)Genomic into the marine Silva,Förster,L., F., al.et(2016) enrichmentAn

: 641–654.: 32 netransfer eukaryotic inevolution. ,Rinke, C., Mavromatis, et al. K., (2013)Single-c T., A.J., Castelle,Probst, T., C.J., et(2016)A al. n astKOALA and astKOALA GhostKOALA: tools KEGG for ivityprofiles for marine sponge-associated .,Ivanova, N., Sczyrba, et A., al. (2014)The nd nd M. Hattori, KEGG(2004) The resource for : 277–280.: ifera.guidethe A classification to sponges of sponge-associated microbial ndfeatures. other genomic atterns in poribacterialsymbionts in atterns of arisons. arisons. esequences. ISMEJ. J.Biol. Mol.

4 : 498–508.: Nat RevNat Genet

PLoS One PLoS 428 ny, cell- ny, : 726–731.: ew

of of 9 ell :

Page 66of78 Page 67of78

Kvint, Nachin, K., A.,Kvint, Diez, L., and Nyström, T. (2 Lackner, G., Lackner,G., Peters,E.E., andE.J.N., Helfrich, Pi Liu, M., Liu, Fan, Zhong, L., Kjelleberg,L., and S., Th Liu, M.Y., Kjelleberg,M.Y., Liu, Thomas, (2011) and S., T. Fu Luter, H.M., Whalan,H.M., Luter, and S., Webster, N.S. (2012)T López-Legentil,McMurray, S.E., and B., Song, S., P Ludwig, W., Ludwig, O.,Klugbauer, Strunk, Klugbauer, S., N Mackie, W. and R.D.Mackie, Preston, occurrence(1968)The Makarova, K.S., Makarova, D.H.,Haft, R., Brouns, Barrangou, Makarova, K.S., Makarova, Wolf, Y.I., CostaAlkhnbashi, O.S., 9 andregulation. bacterial natural productbacterialnatural factories associated with 114 community sponge community of symbionts. Bacterial phylogeny Bacterial based comparative sequence on a proteobacteriumin the sponge causes of brown causesspotsyndrome of in the coral reef spo 1840–9. ecosystems: hsp70ecosystems: expression by thebarrel giant sp doi:10.1371/journal.pone.0039779. Codium fragile Codium Evolution and Evolution classification CRISPR–Casof syst the : 605–618.: : E347–E356.:

and Accepted ArticleOpin. Microbiol. Curr. Acetabulariacrenulata Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. Cymbastelaconcentrica ISMEJ.

6 : 140–145.

el,J. (2017)Insights theinto lifestyle ofuncult omas, T.omas, (2012) Metaproteogenomic analysis of a 6 003) The 003) bacterial functi universal stress protein: . : 1515–1525.: Planta nctionalanalysisgenomic of an uncultured δ- , F., F., , Shah,S.A.,Saunders, al. S.J.,et (2015)An awlik,J.R. (2008) Bleaching coraland re stress in S.J.J.,Charpentier, etE.,(2011)Horvath, al. P., ., Weizenegger, Neumaier, ., M., J., et(1998) al. hermaland sedimentation stress are unlikely of mannan microfibrils in of the greenalgae

marine sponges. ems. 253 onge nge, : 249–253. nalysis. nalysis. . . Nat. Nat. Rev.Microbiol. ISMEJ. Ianthella Ianthella basta Xestospongia muta Xestospongia Electrophoresis

5 : 427–435.: Proc. Natl. Acad. Natl. Proc. USASci. . . PLoS OnePLoS

9 . : 467–477. :

Mol.Ecol. 19 : 554–568. :

7 : e39779. : ured

17 : :

on on ef

Moitinho-Silva, L., Steinert,L., Moitinho-Silva, Nielsen, G., Hard S., McCord, J.M. and Fridovich,J.M. McCord, (1969)Superoxide I. di Montalvo, N.F. Montalvo, and R.T. (2011)Sponge-associa Hill, Parks, D.H., Parks, G.W.,Tyson, P., Hugenholtz, and Beiko Parks, D.H.,Imelfort, Parks, M., Skennerton,C.T., Hugenh Newman, D.J. Newman, and G.M. Cragg, (2012)Natural product Nguyen,A., Ekstrom, and (2015)M., Yin,Li, X., Y. Nyström, T.and Nyström,Neidhardt, F.C.Expression (1994) a Pawlik, J.R.Pawlik, (2011)The ecology spongeschemical of Overbeek, R., Olson, Overbeek,Olson, G.D., R., R., Olsen, G.J., Pusch, Predicting thePredictingHMA-LMA in status marine sponges by (hemocuprein). updated classification CRISPR-Cas evolutionary of s 752. doi: 10.3389/fmicb.2017.00752. doi: 752. relatedgeographicallybut spongedistant hosts. Res. taxonomic and functional taxonomicand profiles. metagenomes. assessing thegenomes of microbial recoverequality findingand application to from 1981 to to from1981 2010. Escherichia coli Escherichiacoli RapidAnnotation of microbial genomesusing Subsyst

42 :206–214. GenomeRes. during growth arrest. Accepted ArticleJ.Biol. Chem. J. Nat. J. Prod. Aspergillus Wiley-Blackwell andSocietyforApplied Microbiology

25

244 This article isprotected by copyright. All rights reserved. : 1043–1055.:

75 : 6049–6055.: Bioinformatics : 311–335.: genomes. MolMicrobiol oim,C.C.P., Wu, etY.-C., al. G.P.,McCormack, (20 Davis,etT., al. J.J., Disz, (2014)The SEED and t toolnew for HGT-finder: a horizontal gene transfe oltz, P., and P., oltz, G.W.Tyson, (2015)CheckM: ,R.G. statistical (2014)STAMP: analysis of tedbacteria are strictlymaintained closely in two smutase. An enzymic function for erythrocuprein Anfunction for smutase. enzymic Caribbean on natural reefs: shape products nd universalof role the stress protein, UspA, of Toxins s as sources sas of drugsnew over the years30 Appl. Environ. Microbiol.

30 ystems.

dfrom isolates, single cells, and : 3123–3124.: 11 ems Technology (RAST). ems machinelearning.

7 : 537–544.: : 4035–4053. : Nat. Rev.Nat.Microbiol. Front. Microbiol.Front.

77 :7207–7216. Nucleic Acids

13 : 722–736.:

he 8 : 17) 17) r

Page 68of78 Page 69of78

Piel, J. (2009) J. Piel, from (2009) Metabolites bacteria symbiotic Piel, J., Hui, D., J., Piel, Hui, D.,Wen,Butzke, G., Platzer,M. Sala, G.Della, Sala, T.,Costantino, Hochmuth,Teta, R., Schmitt, S., Tsai, Schmitt,S., P., Bell, J., Fromont,J., Ilan, Peng, Y., Leung, H.C.M., Yiu,Leung,Peng,Y., H.C.M.,and S.M.,Chin, F.Y.L Rinke, C.,Sczyrba,Rinke, A.,Schwientek, N.P., Ivanova, Pruesse, E., Quast,Pruesse,C., Knittel, E., Fuchs, B.M.,L K., Ribes, M., Calvo, Ribes,M., E.,Movilla, J., Logares, R., Com Reynolds, D. and D. Reynolds, Thomas, T. (2016)Evolution andfu celland metagenomic sequencinghighlyunwithdata 1428. natural systems.natural polyketide biosynthesis polyketide by anuncultivated bacteria the microbiome of themicrobiomeof themarine sponge Drugs sponge microbiota: core, spongemicrobiota: variableand species-speci 544. 544. into thephylogeny into and coding of microbia potential SILVA: a a SILVA: comprehensiveresource online quality for sequenceARB. compatible with data sponge microbiomefavors acidifito tolerance ocean symbionts. symbionts. swinhoei

12

. . : 5425–5440.

Proc. Natl.Proc.Acad. USASci. Mol. Ecol. Mol. Accepted Article Bioscience

25 : 5242–5253.:

Wiley-Blackwell andSocietyforApplied Microbiology 61 : 888–898. This article isprotected by copyright. All rights reserved.

101 NucleicAcids Res. Plakortis halichondrioides Plakortis M., Lindquist, et(2012) N., al. Assessing the com : 16222–16227. : ,andFusetani, N., Matsunaga, S. (2004)Antitumor udwig, W.,Peplies,and Glöckner, J., (2007) F.O. a, R., a,R., andPelejero, (2016)Restructuring C. the of N., Anderson, N., I.J., Cheng,Insi (2013) al. J.F., et . (2012)IDBA-UD: . de a assemblernovo single- for V., and A. (2014) V., Mangoni, Polyketide synthasesin . . Nat.Prod.Rep. nction nction eukaryotic-like of proteins from sponge lsymbiont of sponge the marine fic bacterial communities fic in marine checkedand RNA aligned ribosomal l dark l matter. cation. evendepth.

35 : 7188–96.: Environ.Microbiol. Rep.

26 : a metagenomic : update. : 338–362.: Nature Bioinformatics

499 : 431–437. :

28

: 1420– 8 Theonella Theonella : 536– : plex ghts Mar.

Taylor, M.W., Tsai, Simister,Taylor, M.W., P., Deines, R.L., P., Slaby,T., B.M.,H., Hackl, Bayer, K., and Horn, He Siegl, A., Kamke, J.,Siegl, T., Kamke, A., J., Hochmuth,Piel, Richt Taylor, M.W., Thacker, Taylor, M.W., Hentschel,R.W., (200 U. and Taylor, M.W., Radax, Taylor, R., M.W., Steger, D.,and Wagner, M. Simister, R., Taylor, Simister, M.W., K.M., P Schupp, Rogers, Simister, R., Taylor, Simister, M.W., and Webster, Tsai, P., Simister, R., Deines,Botté, Simister, P., E.S., Webster, N.S Tatusov, R.L., Fedorova,R.L., Tatusov, J.D., N.D., Jackson, Jacob high nutrientshighand temperatures. 316 sponges. bacteriaare widespread inrare) diverse (but marin spongedefense microbiomereveals in unity meta but reveals the of lifestyle Microbiol. evolution, ecology,biotechnologicalevolution,and potential. Ecol. isotopic analysis isotopic bacterial active of communities i sponges. revisited:comprehensive a phylogeny sponge-asso of doi:10.1038/ismej.2017.101. COG database: an database: COG updated includes version eukaryote : 1854–1855.:

85

: 195–205. ISME J. ISME ISME J. ISME

14 : 517–524.:

6 Accepted Article5 :564–576. :61–70. Poribacteria Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. PLoS One PLoS , a a , candidatesymbiotically phylum associated with er, M., er, C., Liang, et(2011)Single-cell al. genomi Botte, G.,et(2013) E.,Ericson,al. “Sponge-spec ., and Taylor, M.W. (2012a)Taylor, .,M.W. and Sponge-specific cluster ntschel, U. (2017) U.ntschel,binning Metagenomic marine a of

N. (2012b) Sponge-microbe N.(2012b) associations survive 7 s, A.R., s, Kiryutin,B., Koonin, E. et (2003) al. V, .J., andDeines, (2013)Temporal P. and molecular : e52220.: doi:10.1371/journal.pone.0052220. (2007a) Sponge-associated (2007a) microorganisms: 7b) Evolutionary insights7b)sponges. Evolutionary from n two New ZealandNewtwo n sponges.

eenvironments. Microbiol. Biol. Rev. Mol. ciated microorganisms. ciatedmicroorganisms. bolic specialization. bolic s. BMC BMC Bioinformatics ISMEJ. ISMEJ.

7

71 4 : 438–443.: Environ. Environ. : 41. FEMS FEMS Microbiol. : 295–347. 1–15. 1–15. marine Science cs cs The ific” s s

Page 70of78 Page 71of78

Webster,N.S. and Thomas,(2016)The T. sponge holo Thomas, T., Moitinho-Silva,T., Thomas, Lurgi,L., M., J Björk, Tian, R.M., Bougouffa,Tian, Gao,Wang, S., Cai Y., Z.M., Weber, T., Blin, K.,Blin,Weber,Duddela, Krug, T., D.,S., Kim, H. Thomas, T., Rusch, D., Rusch, T., DeMaere,Thomas, P.Y., Yung, L M.Z., Tian, R.-M., Cai, Tian, Zhang, W., Wong,L., Y.-H., Ding, Wagner, M. and M. Wagner, M. (2006) TheHorn, Webster, N.S., Botté, E.S., Soo,Webster,N.S.,Botté, E.S., andR.M., Whalan, Vasu, K. andK. Nagaraja, Vasu, (2013)Diverse V. functions doi:10.1128/mBio.00135–16. thermal tolerance.thermal Diversity, structureDiversity,and convergent th of evolution comprise a superphylum a comprise biotechnologicalwith and me 17 analysis reveals analysis heterotrophic versatile capacity o comprehensiveresourcethe genome offor mining bio bacterium in sponge. bacteriumin 7 4 Res. genomic signatures genomic of sponge bacteria reveal unique microbe-host interactions microbe-host adrive adaptation sul of cold seep cold sponge. cellulardefense. : doi:10.1038/ncomms11870. 11870. : : 1557–1567.:

: 241–249.:

43

:237–243. AcceptedBiol. Rev. Mol. Microbiol. Article mSystems Environ. Microbiol. Rep. Microbiol. Environ. Environ. Environ. Microbiol. Wiley-Blackwell andSocietyforApplied Microbiology

2 : e00184-16.: This article isprotected by copyright. All rights reserved. Planctomycetes

16

U.,Bruccoleri, etal.R., (2015)AntiSMASH 3.0-a .R.,Easson, C., Astudillo-García, C., et(2016 al. 77

W.,and Qian,(2017)Genome reductionP.-Y. and 3 :3548–3561. , L., , and Bajic, Qian, P.Y. V., (2014)Genomic : 756–762. : S.larval(2011)The sponge exhibits holobiont high : 53–72.: ofrestriction-modification systemsin addition to ewis, M., etHalpern,(2010) A., al. Functional , genome. Verrucomicrobia f af symbiotic potentially sulfur-oxidizing eglobal sponge microbiome. fur-oxidizingassociated bacterium a with and shared and featuressymbiosis. of synthetic gene synthetic clusters. dicalrelevance. MBio

7 : e00135–16.: , , Chlamydiae Curr. Opin. Biotechnol. Curr. and sisterphyla Nucleic Acids NucleicAcids Nat. Commun. Nat. ISMEJ. ) )

Zhang, F., Blasiak, Zhang,Blasiak, F., Karolin, L.C., J.O., R.Powell, Youssef, N.H., I.F.,Farag, Youssef, Rinke, S.J. C., Hallam, Yarza, P., Yilmaz, Yarza, Pruesse, Glöckner, P., F.O., E., Yin, Y., Yin, J., X., Mao, Chen, Yang, and F., Mao, X., Wortham, B.W., C.N.,Wortham, Patel, and(200 Oliveira, M.A. Wu, S., Zhu, S., Z., Wu, Fu, Niu,L., B., and(2011) Li, W. Wilson, M.C., Wilson, T., A.R.,Mori, Uria, Rückert, C., He Webster,N.S.,Webb, R.T.,Ridd,M.J., R.I., Hill, Wheeler,T.J.andS.R. Eddy, (2013)nhmmer: DNA hom Wiedenbeck,andJ.Cohan, F.M.(2011) Origins of ba sequestrationforminof polyphosphate the by micro analysis the analysis potential metabolic andof niche speci “ carbohydrate-activeenzyme annotation. classification classification cultured unculturedof and bacteria Nat. Rev. Nat. Microbiol. metagenomicsequence analysis. specific mechanisms. In, specificmechanisms. Skurnik,M., Bengoechea,J.A Springerpp.US, 106–115. bacteriallarge taxona with anddistinct metabolic transferadaptation and to niches.ecologicalnew microbial community coral microbial reef a of sponge. Bioinformatics Latescibacteria

29 ”(WS3).

Accepted Article 2487–2489.:

12 : 635–645.: PLoSOne Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. BMCGenomics

10 : e0127499. : J.,Geddes, R.T. C.D., (2015)Phosphorusand Hill, , , andT., Woyke,Elshahed, M.S.(2015) WebMGA: aWebMGA: customizable fastserverweb for NucleicAcids Res. andNegri, (2001) A.P. theeffectsofThe copper on Xu, Y. Y. Xu, (2012)DbCAN: A resourceweb automated for Ludwig, W., Schleifer, Ludwig, et al. W., (2014)Unitin K.-H., lf,M.J., K.,Takada, et(2014) al. Anenvironmenta Environ. Environ. Microbiol. 7) inPolyaminesbacteria: pleiotropiceffects yet cterialthrough horizontal diversity genetic

12 ologysearch with profile HMMs. and archaea using 16S rRNA gene andusing 16Sarchaea rRNA sequences. FEMSMicrobiol. Rev. repertoire. alization candidate alization phylum of : 444.: .,(eds), and Granfors,K. bialsymbionts in marine sponges.

40 Nature : 445–451.:

3 : 19–31. :

506

35 : 58–62. : 957–976.: The genus Yersinia Insilico

Proc. Proc. g the l

. . Page 72of78 Page 73of78

Natl. Acad. Sci. Natl. USA Accepted Article

112 : 4381–4386.: Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved.

high (black bars) or low microbial abundance (white high(blackor bars) low microbialabundance based studies based marine spongeof bacterial communitie across 14sponge microbiota across besid studies.Letters Sponge Microbiome Project study (Thomas et 201 Microbiome(Thomas al., Sponge study Project study defined in Table S1. (B) Prevalence of SAUL PrevalencedefinedSAUL (B) of study in S1. Table Figure 1: Relative abundance (percentage) of SAUL- Figure1:(percentage) Relative SAUL- of abundance Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. 190x275mm(300 300DPI) x Moitinho-Silva et al. (2017) Moitinho-Silvaetal. e bars represent ebars belongingsponge species sa the to bars), accordingGloeckner mainly to bars), (2014) etal. in72 81sponge the speciesof reportedin rece the

s. (A) Prevalence of SAUL in different PrevalenceSAUL of (A) s. sponge spec

or gene- PAUC34f-affiliatedrRNA sequences in 16S 6).sponge Wherespecies known, classifiedare as

and and me nt nt ies Page 74of78 Page 75of78

analysis based on a 37concatenat on to based up dataof set analysis maximumlikelihood analysis of SAUL its closest SAUL maximumlikelihood and of analysis Figurecl 2: Phylogenyshowing position SAUL the of

500 for (B)) is shown on tree on 500 for(B)) is nodes where shown support i Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology circles).10% represent sequencebars diverg Scale This article isprotected by copyright. All rights reserved. 254x190mm(300 300DPI) x edmarker gene rRNA sequence proteins. 16S (B) ade relative to other bacterial phyla. (A) Phyloge (A) relative ade other bacterialto phyla. relatives. Bootstrap support relatives. Bootstrap (100 resamplings for

s either ≥90% either (filled ≥75% s (open circles) or ence. ence.

nomic based based (A), (A),

Figure 3. 16S rRNA gene- rRNA Figure16S 3.

of the SAUL clade. BoldSAUL sequencesthe denoteof sequences sequences obtained from SAUL bins (bin_aplysina, bisequences obtained fromSAUL SAUL/PAUC34f sequence identified SAUL/PAUC34f Detai (AF186412). Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology based maximum likelihood maximum based show phylogeneticanalysis This article isprotected by copyright. All rights reserved. 254x338mm(300 300DPI) x

derived indicate arrows black frommarine sponges; n_aplysina_2bin_petrosia) well and first the as as ls are the same as as provided lsthose same the are forFigure 2A. inginternal the architecture

Page 76of78 Page 77of78

Abbreviations of PKS domains follows: as AbbreviationsPKS of KR,ketore either bin_aplysina or bin_petrosia, respectively. eitheror bin_petrosia, bin_aplysina

transduction; (B) Features of polyketide Features (B) Type the transduction; I identifiedRedgreenbins. in metagenome both and wereidentified(see not Supporting Tex in bins the Figure 4. Schematic overview FigureSchematic showing genomic 4. poten Accepted Article Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved. acyl carrier acyl reductase. protein; Enoyl ER, 207x228mm(300 300DPI) x Dashed lines indicate that some enzymessome lines that pat the of indicate Dashed ductase; KS, ketosynthase; ACP KS, acyltransferase; ductase; AT, synthase identified in bin_aplysina and bin_petros identified and synthase in bin_aplysina t forregulondetails). t Model (A) phosphate of sig

lines/transportersonlyindicate present functions tial of a SAUL cell. Black SAUL lines tiala of indicate function

hway nal nal ia. ia. in s s S, S,

Table 1. General about information metagenomeSAUL Accepted Article COGs (total) COGs (total) COGs (unique) genes of number total of % Number tRNAs genes of number total of % Number rRNA) 23S rRNA, 16S rRNA, (5S rRNAs genes of number total of % Number prediction function without genes coding Protein genes of number total of % Number prediction function with genes coding Protein Hypothetical Non-hypothetical (total) subsystem in not Protein Hypothetical Non-hypothetical functions (total)/SEED subsystem in Protein % Number CDs Protein SEEDsubsystems n50 (%) content GC contigs of Number (%) contamination genome Estimated (bp) size genome total Estimated (%) completeness genome Estimated (bp) size Assembly Wiley-Blackwell andSocietyforApplied Microbiology This article isprotected by copyright. All rights reserved.

bins. bins.

bin_aplysina bin_petrosia bin_petrosia bin_aplysina 6,263,616 4,738,232 4,738,232 4,100,466 6,263,616 5,661,056 3 (1, 1, 1) 3 (1, 1, 1) 1) 1, 3 (1, 1) 1, 3 (1, 11744 13499 13499 11744 46.98 46.03 46.03 46.98 53.00 52.11 99.03 99.09 86.54 90.38 3174 2427 2427 3174 1708 2317 1967 1679 1034 2570 2713 2269 933 1276 962 3545 3675 1294 1342 4887 1349 0.85 0.89 0.89 0.08 0.85 0.06 59.5 8.65 58.5 3.85 285 265 265 349 285 632 42 33 33 42 29 48

1231

Page 78of