Posted on Authorea 2 Dec 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160689628.80980759/v1 — This a preprint and has not been peer reviewed. Data may be preliminary.

Impact of host evolutionary history on endosymbiont genome evolution: a test in Camponotus carpenter ants and their Blochmannia endosymbionts

Joseph Manthey1, Jennifer Giron2, and Jack Hruska1

1 Texas Tech University
2 Purdue University

December 2, 2020

Abstract

Mutualism discernibly connects the evolution of two or more interacting taxa. Endosymbioses, especially those that are obligate, are an intimate link of the evolution of host and endosymbiont. In these instances, we may expect codiversification of hosts and endosymbionts, as well as host demography discernibly affecting the course of endosymbiont evolution. While many studies have demonstrated codiversification of hosts and endosymbionts, detailed investigations of the impact of host demography on endosymbiont molecular evolution are generally lacking. Here, we sequenced complete genomes of carpenter ants (Camponotus) and their Blochmannia endosymbionts to investigate their codiversification and test hypotheses about how host demography impacts molecular evolution in endosymbionts. Using whole genome phylogenomics, we identified strong signatures of codiversification between carpenter ants and their endosymbionts. We found that endosymbiont genes have evolved rapidly, at a pace ~30x that of their hosts. Using multiple tests for selection in endosymbiont genes, we found signatures of positive selection and shifts in selection strength across the phylogeny. We identified a positive relationship between host demography and shifts toward intensified selection in endosymbiont genes, but no relationship between host demography and shifts toward relaxed selection in Blochmannia genes. Lastly, we found variable presence and absence of about 10% of the genes in endosymbiont genomes. Of those, about half exhibited phylogenetic signal and about half exhibited somewhat random patterns of gene loss across genomes, indicating no relationship between host demography and endosymbiont gene loss.

INTRODUCTION

Coevolution, the evolutionary change in one population in response to evolutionary change in another population (Janzen 1980), will generally require reciprocal change. Mutualistic evolution may result in codiversification between partners, such as between flowering plants and their pollinators (Ehrlich & Raven 1964; Kiester et al. 1984) or between endosymbiotic partners. Endosymbioses are common in insects, may be intra- or extracellular, and the endosymbioses range from facultative to obligate in their dependence to hosts and from pathogens to commensalisms or mutualisms (Clark et al. [...]; Kikuchi 2009; Perlmutter & Bordenstein 2020). Obligate endosymbioses are particularly common and are often implicated in host nutrition, [...] resistance (Anbutsu et al. 2017; Brownlie & Johnson 2009; Moreau 2020). Obligate endosymbionts in general have reduced, but not entirely degraded, genome sizes (Gil et al. 2003; Wernegreen 2002), suggesting they retain some genes for their own function as well as those necessary for their continued retention as symbionts (Chong et al. 2019).

As such, we may expect patterns of molecular evolution in endosymbionts to be partially shaped by their respective hosts' evolutionary and demographic histories. First, selection in endosymbionts will be partially affected by selection in their hosts, because endosymbiont fitness is partially linked with host fitness (Wernegreen 2002). Second, host-level
and endosymbiont effective population sizes will be intrinsically linked; as such, the potential strength and relative efficacy of selection and genetic drift in endosymbiont genomes should be influenced by the host demographic history. Third, if endosymbionts are vertically transmitted, we would expect codiversification of hosts and endosymbionts. Indeed, there is evidence of codiversification in several insect hosts and their endosymbionts (e.g., Chen et al. 1999; Clark et al. 2000; Distel et al. 1994; Lo et al. 2003). Despite a plethora of evidence for host-endosymbiont codiversification, the effects of host demography on endosymbiont molecular evolution is largely unexplored.

The symbiotic relationship between Camponotus carpenter ants and their Blochmannia bacterial endosymbionts is an ideal system for investigating coevolution of hosts and endosymbionts. Camponotus is the most speciose of ant genera (Wilson 1976) and Camponotus ants are common in woodlands across most of the world. Camponotine ants have maintained a relationship with Blochmannia for about 40 million years (Wernegreen et al. 2009), and this relationship has likely persisted because Blochmannia provide essential amino acids to their ant hosts (Feldhaar et al. 2007; Russell et al. 2017). Consistent with a long-term endosymbiosis, there is some evidence that Camponotus and Blochmannia have histories of cospeciation (Degnan et al. 2004).

There have been a few studies to date examining whole Blochmannia genomic sequences; in B. vafer, there is some evidence of ongoing purifying selection (Williams & Wernegreen 2012), and in comparative analyses across three Blochmannia genomes, gene loss patterns differ across lineages, suggestive of differential, host-specific selective pressures in different host lineages (Williams & Wernegreen 2015). To investigate coevolutionary patterns between Camponotus and Blochmannia, we assembled a de novo Camponotus genome and several de novo Blochmannia genomes, as well as resequenced genomes for several Camponotus individuals, to address the following questions and hypotheses.

Do hosts and endosymbionts exhibit strict co-evolutionary phylogenomic histories? H0: Largely congruent host-endosymbiont phylogenomic patterns. HA: Evidence of endosymbiont horizontal transfer (e.g., through host species' hybridization) with conflicting host and endosymbiont phylogenomic histories.

1. How fast do Blochmannia genes evolve and is there rate variation across the genome? H0: Fast rates of Blochmannia evolution with similar rates across the genome. HA: Fast rates of Blochmannia evolution with rate variation across the genome indicative of selection.

2. Does host demography shape natural selection strength in endosymbiont genomes? H0: No relationship between host demography and natural selection in endosymbiont genomes. HA: Host demography influences patterns of positive selection and / or shifts in selection strength across the endosymbionts' phylogeny.

3. Does host demography influence rates of gene loss in endosymbiont genomes? H0: No relationship between host demography and endosymbiont gene loss. HA: Hosts with smaller population sizes will exhibit faster endosymbiont gene loss.

METHODS

Field work

We collected Camponotus ant specimens for this study in summer 2018 from Arizona, Colorado, and Utah, USA. We actively searched for Camponotus colonies by searching for woody debris during the day and searching for Camponotus activity during the night. Specimens for genomics were placed in cryotubes and frozen with liquid nitrogen in the field. Specimens for long-term museum storage (e.g., for photography and morphometrics) were preserved in 100% ethanol. In total, we used individuals from 17 colonies in this study (Table S1). Specimens were identified using keys (Mackay 2019) or comparison with sequences on NCBI's GenBank. All specimen collection numbers associated with the manuscript are available in the supplementary table and with voucher specimens housed in the Invertebrate Zoology Collection of the Natural Science Research Laboratory, Museum of Texas Tech University.

Camponotus de novo genome assembly and annotation

Genome sample. We used multiple (N = 4) individuals from one colony for de novo assembly of a high-quality reference genome (Table S1). The only available genome of a Camponotus species to date is that of Camponotus floridanus (subgenus Myrmothrix) (NCBI: GCA_009859135.1; Brelsford et al. 2020). We decided to assemble the genome of a member of the subgenus Camponotus, which are more common in western USA. For the reference genome colony, we chose a sample that we were currently unable to identify. This sample is either a new species to science or a color polymorphism of other Camponotus species in the subgenus Camponotus (perhaps C. novaeboracensis?) generally found in northern or eastern North America. We refrain from naming this species until we complete, in the future, a genomic and morphological analysis that includes all described Camponotus from the USA and Canada. Hereafter in the manuscript, we refer to this species with the code Camponotus sp. (1-JDM).

Extraction & Sequencing. We used two sequencing methods for genome assembly: (1) long reads with Pacific Biosciences (PacBio) sequencing, and (2) a Hi-C library sequenced with Illumina technology. For the PacBio sequencing, we used services of RTL Genomics (Lubbock, TX, USA). They performed a high molecular weight DNA extraction using Qiagen's (Hilden, Germany) MagAttract HMW DNA Kit. A single major worker, with legs and gaster removed, was used for the extraction. The extracted DNA was then used for PacBio SMRTbell library preparation, size selection using a Blue Pippin (Sage Science), and sequencing on four PacBio Sequel SMRTcells 1M v3 with Sequencing 3.0 reagents. We used the services of the Texas A&M University Core Facility to prepare a Hi-C library. They used a single major worker as input for the Arima Genomics Hi-C kit (San Diego, CA, USA). The Hi-C library was then sequenced on a partial lane of an Illumina NovaSeq S1 flow cell at the Texas Tech University Center for Biotechnology and Genomics.

Assembly. We assembled the Camponotus sp. (1-JDM) genome in two stages. First, we used Canu v1.9 (Koren et al. 2017) to assemble the PacBio long reads. Second, we used the Hi-C sequence data to scaffold the initial assembly using the 3D-DNA pipeline (Dudchenko et al. 2017; Durand et al. 2016). All commands for the assembly, annotation, and all further analyses are documented on GitHub (github.com/jdmanthey/camponotus_genomes1). We quality checked the genome assembly for potential contamination with BlobTools v.1.0.1 (DOI:10.5281/zenodo.845347; Laetsch & Blaxter 2017). Briefly, BlobTools attempts to identify contamination through taxonomic annotation and parsing of coverage of resequencing data to the reference genome. With the use of BlobTools, we identified ~37 kbp of potential contamination attributed to molluscs or chordates that we subsequently removed from our assembly (Fig. S1).

Repetitive Element Annotation. We used a multi-step process to annotate transposable elements (TEs) and repetitive elements represented in the Camponotus sp. (1-JDM) genome: (1) identify de novo repeats and over-represented sequences, (2) manually curate repetitive elements, and (3) mask the genome with these elements and create a TE and repetitive element summary file. First, we used RepeatModeler's (v1.0.11; Smit & Hubley 2008) implementations of RECON, RepeatScout, and Tandem Repeats Finder to identify repeats based on homology, structure, and repetitiveness in the de novo assembly (Bao & Eddy 2002; Benson 1999; Price et al. 2005). We refined the RepeatModeler output by filtering matches to closely related sequences in the RepBase invertebrate database v24.03 (Jurka et al. 2005) and then creating consensus sequences of novel repetitive elements. First, we removed any RepeatModeler output sequences at least 98% identical to RepBase sequences. Second, we used BLAST and bedtools (Camacho et al. 2009; Quinlan & Hall 2010) to extract genomic regions matching repetitive elements as well as 1000 bp flanking sequences. We used these extracted sequences to develop consensus sequences for novel TEs using the following steps: (1) alignment using MAFFT (Katoh & Standley 2013) implemented in Geneious (BioMatters Ltd), (2) creating 50% majority consensus sequences in Geneious, and (3) trimming any ambiguous nucleotides on the ends of the newly created consensus sequences. For any sequences where we did not recover endpoints, we repeated this process up to two times. In addition to identifying de novo repeats and manual curation in the Camponotus sp. (1-JDM) genome, we also repeated this process for the published Camponotus floridanus genome ([...] et al. 2020). We added this species to increase the diversity of ant TEs in our database, which has been shown to improve annotations by including TEs that may have been missed in other curated species (Boman et al. 2019). To help with naming some of our de novo TEs, we assessed homology of newly curated sequences to the RepBase invertebrate database using BLAST. Lastly, we used a combination of the RepBase invertebrate database and all newly curated TEs described here in the TE library for use in RepeatMasker v4.08 (Smit et al. 2015). RepeatMasker output included a masked genome and summarized repetitive and TE content in the Camponotus sp. (1-JDM) genome.

Gene Annotation. To annotate genes in the Camponotus sp. (1-JDM) genome, we used the MAKER v2.31.10 pipeline (Cantarel et al. 2008). First, we used MAKER to predict genes using proteins from other ant species: Camponotus floridanus (GCF_003227725.1), Formica exsecta (GCF_003651465.1), Lasius niger (GCA_001045655.1), and Nylanderia fulva (GCF_005281655.1) (Bonasio et al. 2010; Dhaygude et al. 2019). We used these initial MAKER predictions to train gene models in SNAP and Augustus (Korf 2004; Stanke & Waack 2003). Lastly, we used the trained SNAP and Augustus models in a second iteration of MAKER to predict genes. We used BUSCO v3 (Simão et al. 2015) with the single copy orthologous gene set (set: hymenoptera_odb9) to assess genome assembly completeness.

Camponotus molecular clock. We extracted the putative Camponotus sp. (1-JDM) coding sequence (CDS) from the assembly using the MAKER output and bedtools. We downloaded the CDS sequences for Camponotus floridanus, Formica exsecta, Lasius niger, and Nylanderia fulva (same versions as proteins) for homology-based comparisons. We performed a reciprocal BLAST of Camponotus sp. (1-JDM) versus all species using blastn (Camacho et al. 2009) to identify putative homologues across datasets. To align putative homologues between the four ant species, we used T-Coffee (Notredame et al. 2000). T-Coffee translates nucleotide sequences, aligns them using several alignment algorithms, takes the averaged best alignment of all alignments, and back translates the protein alignments to provide a nucleotide alignment for each gene. Before back-translating, we used trimAl (Capella-Gutiérrez et al. 2009) to remove any gaps in the final protein alignments. We tested each gene for selection using gene-wide and branch-specific tests for selection with CODEML (Yang 1997). After correcting significance values for multiple testing using the Benjamini and Hochberg (1995) method, we removed any alignments with evidence for selection. We then extracted and concatenated four-fold degenerate sites from the alignments (N = 806,844) using custom R scripts and the R packages Biostrings and seqinr (Charif & Lobry 2007; Pagès et al. [...]). We identified an appropriate model of sequence evolution (GTR + I) using jModelTest (Darriba et al. 2012) and used the four-fold degenerate sites in PhyML (Guindon et al. 2010) to estimate a phylogenetic tree. To put the evolution of the CDS four-fold degenerate sites in a timed evolutionary context, we downloaded a recent phylogenomic tree of formicine ants (Blaimer et al. 2015) and pruned the tree to the four representative lineages covered by our CDS downloads and novel assembly, using four species as representatives of those lineages: Camponotus hyatti, Formica neogagates, Lasius niger, and Nylanderia dodo. With this alignment of four-fold degenerate sites, along with divergence time estimates from Blaimer et al. (2015), we used the R package ape (Paradis et al. 2004) to obtain an estimate of the Camponotus-specific branch length of the four-fold degenerate sites tree and Camponotus-specific mutation rates.

Resequencing genomes of multiple Camponotus species

Lab work. We aimed to resequence genomes for 17 Camponotus individuals from 7 species (Table S1) at moderate sequencing coverage (e.g., ~10–30x). Each individual is a representative of a single colony. For each individual, we performed two DNA extractions: (1) gaster, and (2) head + mesosoma. We did this because the gaster of many ants has a plethora of Blochmannia DNA relative to ant DNA. For each extraction, we froze the sample with liquid nitrogen and pulverized the sample with a sterile mortar and pestle. We then used the pulverized tissue material as the input for DNA extraction with QIAGEN (Hilden, Germany) DNeasy blood and tissue kits. We quantified DNA concentrations from the extractions with Invitrogen (Carlsbad, California) Qubit fluorescent quantitation, and pooled the head + mesosoma and gaster extracts at a 0.8:0.2 ratio, respectively, so as to have good representation of both ant and endosymbiont DNA for sequencing. Genomic DNA extractions were sent to the Texas Tech University Center for Biotechnology and Genomics for standard Illumina shotgun sequencing library creation and subsequent sequencing on a partial lane of an S4 flow cell on the Illumina NovaSeq6000.

Filtering and Genotyping. First, we downloaded Illumina sequencing reads from the NCBI SRA of two published datasets to use as outgroups: Camponotus floridanus (SRX022802) and Cataglyphis niger (SRX5650044). With our newly generated data and the downloaded data, we trimmed adapters and quality filtered the raw sequencing data using the bbduk.sh script of the bbmap package (Bushnell 2014). We then aligned the filtered data to the Camponotus sp. (1-JDM) reference genome with BWA (Li & Durbin 2009) using the BWA-MEM command. We used samtools v1.4.1 (Li et al. 2009) to convert the output SAM file to BAM format, and lastly cleaned, sorted, added read groups to, and removed duplicates from each BAM file using the Genome Analysis Toolkit (GATK) v4.1.0.0 (McKenna et al. 2010). We used GATK's functions HaplotypeCaller and GenotypeGVCFs to genotype all individuals for both variant and invariant sites on all scaffolds at least two Mbp in length. We measured the distribution of sequencing coverage using the samtools 'depth' command (Fig. S2). We used VCFtools v0.1.14 (Danecek et al. 2011) to initially filter all variant and invariant site calls using the following restrictions: (1) genotyped in at least 70% of individuals, (2) minimum site quality of 20, (3) minimum genotype quality of 20, (4) minimum depth of coverage of 5, and (5) maximum mean depth of coverage of 70.

Blochmannia genome assemblies and annotation

Assembly. With the raw sequencing data, we used the MinYS pipeline (Guyomar et al. 2020) to assemble Blochmannia genomes for each sample. MinYS uses metagenomic samples with mixed host and bacterial DNA in a pipeline that allows targeted assembly of bacterial genomes. Here, we used a Blochmannia pennsylvanicus genome (NC_007292.1) as our target. First, the pipeline maps reads to the reference genome using BWA. Next, it assembles these recruited reads using the program Minia (github.com/GATB/minia), followed by gapfilling contigs using the program MindTheGap (Rizk et al. 2014). Finally, the pipeline simplifies the graphical fragment assembly (GFA) output of MindTheGap, with regions with multiple paths merged by coverage, and output in FASTA format. The resulting GFA output was then visualized in Bandage (see Fig. S3; Wick et al. 2015). This process assembled a circular genome for each of the samples in our study. We also downloaded the sequence and annotation of the Blochmannia endosymbiont of Camponotus floridanus for use as an outgroup (NC_005061.1).

Annotation. We used the NCBI prokaryotic genome annotation pipeline (Tatusova et al. 2016) to annotate genes in each of the Blochmannia genomes.

Camponotus phylogenomics and population genomics

Phylogenomics. We estimated "gene trees" for non-overlapping 50 kbp sliding windows using RAxML v8.2.12 (Stamatakis 2014) with the GTRGAMMA model of sequence evolution (n = 4784). From these gene trees, we estimated a species tree using two methods: (1) the maximum clade credibility tree of all input trees summarized using DendroPy (Sukumaran & Holder 2010), and (2) the coalescent-based species tree approach ASTRAL III (Zhang et al. 2018).

Genetic diversity and demography. We estimated genetic diversity for each individual in two ways. First, we estimated observed heterozygosity simply as the proportion of bi-allelic to invariant sites for all genotyped sites. Because sequencing depth has the potential to impact estimates of genetic diversity, we also used the program ROHan (Renaud et al. 2019) to estimate rates of heterozygosity. ROHan uses a Bayesian framework to estimate heterozygosity while accounting for sequencing depth and per-base quality scores. We found the two estimates to be highly correlated in ingroup samples (R² = 0.7113, p << 0.001), so we only consider raw estimates of heterozygosity hereafter. To estimate demography for each individual, we used the program MSMC2 v1.1.0 (Schiffels & Durbin 2014). For use in MSMC, we masked genomic regions not genotyped, as these would otherwise be mistaken for
runs of homozygosity. It is relevant to note that MSMC estimates are accurate in panmictic populations, but population structure or changes in connectivity between populations through time may mimic changes in population sizes (Chikhi et al. 2018; Mazet et al. 2016). Because of this, some caution should be used when interpreting raw demographic history results. We largely used the inferred demographic histories to estimate harmonic mean population sizes over the past half million years, which is highly correlated with observed heterozygosity (R² = 0.7497, p << 0.001). When running MSMC, we allowed up to 20 iterations and up to 23 distinct time segments. We performed a total of ten bootstrap replicates for each individual to see how signal could vary using different genomic regions. For this, we bootstrapped 1 Mbp segments of genomes and ran the program again with the bootstrapped replicates. We decided to use MSMC with each individual rather than aggregating multiple individuals per species for several reasons: (1) uneven sampling sizes per species, (2) uncertainty of population structure amongst sampling locations, and (3) low certainty in phasing because of the small sample sizes per species. Output of MSMC is presented relative to a species' generation time and mutation rate. Because there are not good estimates of generation times in Camponotus ants, we used a conservative proxy generation time used in other studies: double the age of sexual maturity (Nadachowska-Brzyska et al. 2015). In Camponotus, previous studies have suggested that the earliest age of queens producing winged reproductives is two years following colony formation, with the first winged individuals overwintering until the third year (Fowler 1986; Pricer 1908). As such, we used three years as the age of reproductive maturity and six years as a minimum estimate of generation time for the demographic analyses. This value is generally consistent with generation time estimates of seven to eight years in red harvester ant (Pogonomyrmex barbatus) colonies kept in captivity (Ingram et al. 2013).

Blochmannia phylogenomics and population genomics

Phylogenomics. We extracted CDS regions from each Blochmannia genome using bedtools. We then used BLAST to match each gene from each Blochmannia genome to identify putatively homologous genes in the outgroup Blochmannia floridanus genome. For further analysis we kept 507 genes present in all samples. We aligned all sequences for each gene using T-Coffee and trimmed any portions of the alignments not present in all samples using trimAl. Next, we used RAxML v8.2.12 (Stamatakis 2014) with the GTRGAMMA model to estimate a phylogeny for each of the Blochmannia genes. From these gene trees, we estimated a species tree using two methods: (1) the maximum clade credibility tree of all input trees using DendroPy (Sukumaran & Holder 2010), and (2) the coalescent-based species tree approach ASTRAL III (Zhang et al. 2018).

Tests for selection. We used the HyPhy software package (Pond & Muse 2005) to test for selection in the Blochmannia genes in a phylogenetic framework. We tested for positive selection using aBSREL (Smith et al. 2015) and we tested for shifts in selection strength across the phylogeny using RELAX (Wertheim et al. 2015). We ran aBSREL in exploratory mode, where all branches are tested for positive selection, followed by a Holm-Bonferroni correction (Holm 1979) for multiple testing. RELAX requires a set of test branches and reference branches to identify shifts in selection strength. As such, we ran RELAX six times, once for each Camponotus species in the study with more than a single individual (i.e., excluding C. ocreatus and the outgroup).

Gene loss. We tested for gene loss in all Blochmannia genomes assembled for this study. First, we performed an all-to-all protein BLAST (blastp) of all amino acid sequences from coding genes annotated from Blochmannia samples. From the BLAST analysis results, we tabulated a gene presence / absence matrix for each Blochmannia genome (N = 607 unique coding genes identified from all samples).

Camponotus and Blochmannia coevolution

Evolutionary rates. We explored evolutionary rates of evolution in Blochmannia genomes in multiple ways. First, we examined variation in the Blochmannia gene phylogenies relative to the host species tree. To do this, we calculated the Kuhner-Felsenstein (1994) distance between the host species tree and each of the Blochmannia gene trees, implemented in the R package ape (Paradis et al. 2004). Second, we explored rates of evolutionary change in a phylogenetic context by measuring relative branch lengths in the host species tree versus the Blochmannia gene trees. We calculated these rates of evolutionary change in three groups: (1) subgenus Camponotus, (2) subgenus Tanaemyrmex excluding C. ocreatus because it is on a long branch by itself, and (3) the combined group of the subgenera Camponotus and Tanaemyrmex. Third, we measured nucleotide percent identity for all gene alignments, excluding indels, in the same three groups as aforementioned.

Demographic influences on selection and gene loss. For each of the six ingroup species (i.e., excluding C. ocreatus), we explored the relationship between host population sizes and (1) changes in selection strength in Blochmannia genes and (2) patterns of gene loss in Blochmannia genomes. First, we performed linear regression on the relationship between host harmonic mean population size (per species) and aBSREL and RELAX results, specifically both (1) intensified and (2) relaxed selection regimes in endosymbiont genes. Second, we used phylogenetic independent contrasts (Felsenstein 1985) to explore the relationship between host harmonic mean population size and Blochmannia gene count.

RESULTS

Camponotus reference genome characteristics and molecular clock

Our de novo Camponotus sp. (1-JDM) reference genome was highly contiguous (contig L50 ~ 500 kbp) with a small number of scaffolds composing the majority of the assembly (scaffold L90 ~ 3.58 Mbp, N90 = 29; see Table S2). Overall, we had 31 scaffolds greater than 2 Mbp. Although there are no karyotypes for North American Camponotus (Camponotus), there are estimates for Camponotus ligniperda (haploid N = 14), C. japonicus (N = 13 or 14), and C. obscuripes (N = 14) from the eastern Palearctic (Hauschteck 1983; Imai & Yosida 1964; Imai 1969). As such, it appears we have generated a genome with the contiguity of about two scaffolds per chromosome. While we did have additional signal in the Hi-C contacts to potentially further scaffold the genome (Fig. S4), we chose to be somewhat conservative and only link genomic regions where we were fully confident of the signal (Fig. S4).

The difficulty in scaffolding may relate to the repetitive nature of the genome; many large portions of the genome exhibit high repetitive content (Fig. 1). Overall, the genome averaged 24.7% repetitive element content, with regions of the genome exhibiting greater than 70% repetitive content. About 13% of genomic windows contained more than 60% repetitive content. A large proportion of the repetitive content manually curated for this study was DNA transposons (Fig. S5), and the repetitive landscape is consistent with that previously described in other ant species, exhibiting "islands" of extreme repetitive content in a background of lower repetitive content (Schrader et al. 2014). Coding gene content is heterogeneous across the genome and is negatively correlated with both repetitive element content and GC% in 100 kbp sliding windows (Fig. 1, Fig. S6). BUSCO results suggest our genome is nearly complete and representative of other hymenopterans, containing 98% complete genes and 1.2% fragmented genes of the 4415 hymenopteran near-universal single copy orthologs (Table S3). Using the four-fold degenerate sites from the genome's CDS regions, we estimated a substitution rate of 1.983877 x 10^-9 substitutions / site / year.

Phylogenomics

We estimated an ant host species tree using 4770 "gene trees" estimated in 50 kbp sliding windows across the genome. Both methods we used to create the species tree (maximum clade credibility and ASTRAL) identified an identical topology (Fig. 2). Here, each species was monophyletic. The species in the subgenus Camponotus formed a clade; in contrast, the subgenus Tanaemyrmex was not monophyletic: C. ocreatus was recovered as more closely related to Camponotus floridanus (subgenus Myrmothrix) than other species in the Tanaemyrmex subgenus. Between 40% and 99% of gene trees supported the relationships identified in the species tree, while each node had 100% support in ASTRAL analyses (Fig. 2).

The Blochmannia species tree estimated from 507 gene trees identified a strongly supported phylogeny with a nearly identical topology to the host species tree (Fig. 2). The only differences were some relationships between individuals within species (colored orange in Fig. 2). In contrast to varying proportions of gene trees matching species tree relationships in the ant hosts, a majority of Blochmannia gene trees matched [...] were relatively divergent from the overall phylogenomic signal. In general, the phylogenetic concordance, as measured by the KF94 metric, were consistent across the entire Blochmannia genome [...]

Blochmannia genome sizes, gene composition, rates of molecular evolution

The [...] (Fig. 5A, Fig. S10). As previously mentioned, some gene loss in endosymbiont genomes exhibited phylogenetic signal (Fig. S10). In contrast, we [...] relaxed selection strength and host demography [...]
ee n 2 atrso eels in loss gene of patterns (2) and genes pce reetmtdfo 0 eetesietfidasrnl upre hlgn with phylogeny supported strongly a identified trees gene 507 from estimated tree species ugns ewe 0 n 9 fgn re upre h eainhp dnie in identified relationships the supported trees gene of 99% and 40% Between subgenus. ( Camponotus p 1JM eeec eoewshgl otgos(otgL0˜50kp with kbp) 500 ˜ L50 (contig contiguous highly was genome reference (1-JDM) sp. Tanaemyrmex .obscuripes C. ,teeaeetmtsfor estimates are there ), Blochmannia Camponotus excluding N=1)fo h atr aertc(ashek18;Ia & Imai 1983; (Hauschteck Palearctic eastern the from 14) = (N .floridanus C. .ocreatus C. 7 and eecount. gene Blochmannia o aho h i nru pce ie,excluding (i.e., species ingroup six the of each For Camponotus Tanaemyrmex apntsligniperda Camponotus eas ti naln rnhb tef and itself, by branch long a on is it because (subgenus Blochmannia eoe.Frt epromdlinear performed we First, genomes. ( Tanaemyrmex Blochmannia hr,w esrdnucleotide measured we Third, . Myrmothrix Camponotus ee.Scn,w used we Second, genes. hpodN=14), = N (haploid omdacae but clade, a formed ) eetesmatched trees gene hnohrspecies other than ) ( Camponotus C. -9 ), Posted on Authorea 2 Dec 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160689628.80980759/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. lcmni eoeszs eecmoiin ae fmlclrevolution molecular of The rates composition, entire as gene the concordance, sizes, across phylogenetic genome consistent the Blochmannia were general, metric, In KF94 signal. the phylogenomic by overall measured the from divergent relatively were B i.S0.A rvosymnind oegn osi noybotgnmsehbtdphylogenetic exhibited genomes endosymbiont in (Fig. loss demography we gene host contrast, some and In mentioned, strength selection previously S10). relaxed As of Fig. as with number 5A, S10). 
well genes and as of (Fig. estimates Fig. multiple number strength pressures size without 5B, between selection selection and population relationship with all intensified in host no tests toward between shifts for found selection shifts relationship of lineages for with positive numbers tested results host genes a relative also report endosymbiont found the we identify We We strength, among both selection correction. S5). to relaxed) in Table testing trying shifts or S9; extreme were intensified with we (Fig. loci (i.e., as phylogeny strength Here, the selection in genes. in randomly somewhat shifts appeared for selection and positive traits estimates. 19 endosymbiont of size tests, between selection population correlations positive mean for observed In due harmonic looked estimates with these we in using variance correlated such, sizes (e.g., As highly trends population size host was structure). population years population simply however, the not differential Overall, million if therefore to even half and mind. history, population in last panmixia), reflect this of the estimates (R lack with individual through each interpreted a for size be (i.e., heterozygosity flow should population gene individual mean each population harmonic for among species within trends in couple histories a demographic variation demographic with in overall of Variation and, indicative 300,000 S8). to be (Fig. ˜20,000 may species from within ranged consistent sizes largely were population 3B. exceptions, effective Fig. host in these of estimates rates evolution in of Contemporary genic endosymbiont alignments rates the on the high-quality than say, demography any to higher host much Needless recover of likely in alignments). 
Impacts to are the identity unable regions of were intergenic sequence regions these these we (Darling percent in in progressiveMauve that evolution and sections using non-overlapping species, evolution large alignments host host (e.g., of genome between of regions rates whole scale cases relative taxonomic some performed identify the we in at to this, genome al. tried do the et To also of We regions. most 3C). intergenic across acting (Fig. forces clades evolutionary similar of suggestive 10 Blochmannia x 1.454 = (range year / mean the context, absolute timed S7). Fig. 4, myrmex (Fig. loss samples gene the subgenus of among signal phylogenetic absence exhibited / the 37 presence genes, variable 65 exhibited those The genes Of 65 4). total, (Fig. In here sequenced while annotated. were kbp, genes 781 to coding kbp 775 from The size in for ranged GenBank genomes from the subgenussequence the of in Five with individuals consistent contrast, sizes. is In This kbp]. 3). 791 (Fig. = [ kbp GenBank 792 on to already ˜783 subgenus from size in varied genomes between distance KF94 the measured We node). each for support 99% to each (67% tree species the of relationships Blochmannia Blochmannia Blochmannia Blochmannia Blochmannia 00.Hwvr noybotitrei eune ees iegn ewe otsbeea and subgenera, host between divergent so were sequences intergenic endosymbiont However, 2010). Fg B.I euethe use we If 3B). (Fig. Camponotus eeiett ihnhs ugnr a eeal ossetars h noybotgenome, endosymbiont the across consistent generally was subgenera host within identity gene eoe(i.3) diinly tapae that appeared it Additionally, 3B). (Fig. genome eoe otie ewe 7 n 0 ee,adi oa cosalgnms 0 unique 607 genomes, all across total in and genes, 601 and 576 between contained genomes eoe sebe eelreyvre e ugns ntesubgenus the In subgenus. 
per varied largely here assembled genomes eete n h otseiste oietf fayrgoso the of regions any if identify to tree species host the and tree gene ee vle t˜03xtert fteathss ihsih aito across variation slight with hosts, ant the of rate the ˜20–30x at evolved genes a lgtysoe aeo vlto hntoeo h n otsubgenus host ant the of those than evolution of rate slower slightly a had -8 .vafer B. o126x10 x 1.256 to Blochmannia .pennsylvanicus B. Blochmannia Camponotus 2 (NC .47 p 0.7497, = -7 1992 a 72kp(i.3D). (Fig. kbp ˜722 was 014909.2) ee hwdeiec o eeto TbeS) hs signatures These S4). (Table selection for evidence showed genes usiuin ie/year). / site / substitutions aeo oeua vlto oput to evolution molecular of rate eeeouinrt saot544x10 x 5.474 about is rate evolution gene (NC << Tanaemyrmex 8 0221 9 kbp; 791 = 007292.1) .0) n ugssta h SCpplto size population MSMC the that suggests and 0.001), Blochmannia a ihyvariable highly had Blochmannia .ocreatus B. Blochmannia .chromaiodes B. eoe(i.3A). (Fig. genome ee nathsso the of hosts ant in genes Blochmannia Blochmannia -8 Blochmannia a 76kpada and kbp ˜746 was usiuin site / substitutions eoe rmthis from genomes Camponotus (NC Blochmannia ae na in rates 020075.1) genome genome Tanae- the , Posted on Authorea 2 Dec 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160689628.80980759/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. 02,w a aeanl xetto hths eorpi atrsprilyiflec endosymbiont influence partially endosymbiont patterns of factors demographic two Wernegreen host influenced 2002; demography that Moran host whether & expectation investigated (Mira we null sizes Here, a population evolution. have molecular effective may host effective endosymbiont with we and linked size, 2002), intrinsically population is effective by size influenced population are processes genomic population Because evolution? 
endosymbiont bacterial shape free-living demography their to host and Does hosts their of both rates to relative evolution specific evolution molecular of relatives. host-lineage by rates corroborated have faster have are may endosymbionts results that and previous in These here genome identified 2009). the Ochman rates relatives, endosymbiont across & However, free-living (Kuo somewhat their 1996). evolution varies (Moran to molecular sites rate compared coding rates nonsynonymous evolution 2002). at Wernegreen evolutionary molecular evolution 2002; insect relative of Moran faster histories; rates & increased have (Mira including evolution also their fast endosymbionts relatively inheritance of of and such, mode because sizes As their populations endosymbionts of effective because in small bottlenecks to here expected regular This lead identified undergo are evolution to propensity of rates properly. and rates evolution asexuality them endosymbionts’ the fast scale, align Relatively absolute to an On 1999). able (1995). not colleagues were and in Moran we in by reported that (˜36x) level the lineages to of similar across is rate divergent evolution relative so endosymbiont-host were regions intergenic that found endosymbionts We Blochmannia generally in statistic a KF94 evolution found the molecular of we of estimate Rates Indeed, stable a by genome. 
evidenced as the genome the across the across across pressures overall sorting similar selective lineage relatively of variable pattern with by consistent genome as the caused across trees incongruence signal species and consistent across supported (1) evolution strongly patterns: of two obtain rates of to one sequencing able across expected by were sorting we contrast, lineage we Here, in in endosymbionts, variation markers; and estimate molecular hosts as few well both a for or genomes one using full phylogenies (Degnan inferred ants have carpenter codiversification in studies previous and vertically- 2000), and been hosts has al. endosymbionts between (Toju et and and co-speciation hosts weevils completely host between of in were between codiversification expectations found relationships of incongruence with evidence species-level phylogenetic consistent similar all endosymbionts; are some transmitted but patterns was species, These There within identified congruent. individuals we 2). endosymbionts, amongst bacterial trees their (Fig. endosymbiont and codiversification hosts ant strict carpenter generally both of sequencing endosymbionts demog- whole-genome Blochmannia host Using and of hosts effects selection. ant (3) natural carpenter and and of variation loss genome, Codiversification (2) endosymbiont gene endosymbionts, of the and patterns across endosymbiont hosts evolution on of molecular raphy codiversification and (1) sorting available to lineage related publicly in questions of investigated a number we for the sources, resource doubled genome whole than a more added We endosymbiont dynamics. and size coevolutionary host population ant carpenter host several between sequenced S11). relationship We a (Fig. for contrasts independent evidence phylogenetic no using found DISCUSSION genomes we endosymbiont this, in Despite loss S7). gene and Fig. 4, (Fig. signal Blochmannia 01 Lo 2001; Blochmannia Blochmannia tal. 
et vrg bu nodro antd atrta hs eotdin reported those than faster magnitude of order an about average 03,bvle (Distel bivalves 2003), Blochmannia genome. tal. et Blochmannia ee vle tart 3xfse hntehs eoe(i.3.I addition, In 3). (Fig. genome host the than faster ˜30x rate a at evolved genes 03,fle (Chen flies 2013), eoe Fg ) vrl,teerslscrooaepeiu evidence previous corroborate results these Overall, 3). (Fig. genomes ee,o 2 ihyvral adcp fpyoeei congruence phylogenetic of landscape variable highly a (2) or genes, tal. et Blochmannia tal. et 94,ahd (Clark aphids 1994), 9 Blochmannia tal. et 99 Hosokawa 1999; Blochmannia 04.Gnrly rvossuisinvestigating studies previous Generally, 2004). ee Fg A ihpyoeei statistics. phylogenetic with 3A) (Fig. genes Camponotus ulgnm eune.Wt hs re- these With sequences. full-genome eoe oadesqetosabout questions address to genomes tal. et tal. et 00,pyld (Thao psyllids 2000), ( 02,ccrahs(Clark 2012), Camponotus atra endosymbionts bacterial Buchnera pce and species ) (Clark tal. et tal. et Posted on Authorea 2 Dec 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160689628.80980759/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. CS,rpttv n rnpsbeeeet,adG otn.Pit niaesmaysaitc n100 in statistics summary indicate Points content. GC and elements, transposable for and Jisha repetitive John (CDS), thanks 1. JDM Figure JDM. to funds TABLES & startup FIGURES University Tech Texas in interested by him supported getting was work be This to taxon, those new https://github.com/jdmanthey/camponotus including a files, GithHub: ACKNOWLEDGEMENTS as on Output available it to are archive. 
in describe uploaded project Genome loss or this gene NCBI’s be species and in to will selection the genome for assemblies place tests the from either format uploading GFA to by followed us all subsequently allows while investigation GenBank, novo taxonomic de NCBI’s further The to All SRA. submitted NCBI’s Dryad. to being submitted are being currently first format is the data sequencing wrote specimens. raw and ant All data of the photos and analyzed macro samples, JDM took collecting JDM work. AVAILABILITY for DATA and lab fieldwork JCG performed (2) manuscript. JPH project, the the drafts. of in writing draft ideas initial of of development revisions (1) (3) to contributed somewhat authors all, found All histories. not we demographic Lastly, but host history. some, to demographic that related CONTRIBUTIONS host not found AUTHOR We by was that shaped phylogeny. loss part host gene in the endosymbiont were across random selection strength of in patterns shifts and selection itive the their across investigate rates to of evolutionary endosymbionts codiversification their and strict hosts bionts. identified ant We carpenter both of coevolution. sequencing whole-genome used We a with phylogeny; consistent 2015). the is Wernegreen in This & CONCLUSIONS loss 4). (Williams smallest gene drift (Fig. the of genetic losses with patterns and rest gene constraints random the species singleton selection while relatively were the informative, suggests signal in phylogenetically but phylogenetic This was research lacking loss dataset, previous S4). loss gene entire Table the gene S7; of the of half 4, majority in about that case (Figs. to found not the We leading was drift— not 4). thereby genetic was (Fig. of sizes, loss estimates This population gene and along host sizes loss. 
faster, small population occur gene with estimated would drift of endosymbionts genetic in rates significant that no part anticipated pressures, faster found in initially and selection We loss least gene decreased at S11). endosymbiont of may with (Fig. patterns pressures two on demography the selection host between in of relationship effect shifts an that in for tested appears patterns also but We it processes. to S10), endosymbionts, demographic relative 5, host insect selection by (Figs. in (Alleman influenced results relaxed identified genes be our expect often of on may fraction is Based small we selection 2012). a general, Indeed, in in only endosymbionts 2002). generally in (Wernegreen In pressures selection S6). free-living intensified Table toward relaxation shifts S10; and and 5, selection sizes positive population of host signatures between both loss. in and gene strength demography of selection host patterns of between (2) relationship and no selection, found We natural of patterns (1) evolution: genome Blochmannia Camponotus Blochmannia Camponotus ee r vligaot3xfse hnhs eoe,wt eaieyconsistent relatively with genomes, host than faster 30x about evolving are genes Blochmannia p (1-JDM) sp. Camponotus Blochmannia noybot dniyn ieg-pcfi eels agl u orelaxed to due largely loss gene lineage-specific identifying endosymbionts eoeasml n noain ilb poddt ra ni a until Dryad to uploaded be will annotations and assembly genome ee Fg.5 9 1) ncnrs,w on oiierelationship positive a found we contrast, In S10). S9, 5, (Figs. genes oeta eaeaowt ot-ogfil trip. field month-long a with ago decade a than more enovo de Blochmannia eoe In genome. eoeasml hrceitc,icuigcdn content coding including characteristics, assembly genome 10 tal. et ilb poddt ra.Alcd sdfranalyses for used code All Dryad. to uploaded be will , Blochmannia Camponotus 08 Chong 2018; .laevigatus C. ot n their and hosts ee,w on oeeiec o pos- for evidence some found we genes, tal. 
et ddhv h otendosymbiont most the have —did Blochmannia 09 ilas&Wernegreen & Williams 2019; Blochmannia Blochmannia sebisi FASTA in assemblies genomes1. ee (Figs. genes endosym- Posted on Authorea 2 Dec 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160689628.80980759/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. hlgne n xiie 0%qattspotfreeyrltosi.Hs re eeroe ihthe with phylogenies. rooted endosymbiont were and host trees the Host between vary these relationship. that branches to every identical indicate for were tree support topologies tree quartet niger species 100% Cataglyphis of ASTRAL proportion exhibited indicate The and labels Branch hypothesis. phylogenies phylogenetic endosymbionts. their this and ants supporting host trees of congruence Phylogenetic 2. Figure estimates. mean Mbp) 1 (i.e., window ten indicate lines solid while windows, sliding non-overlapping kbp ape The sample. Blochmannia rewsmdon otd rnebace nthe in branches Orange rooted. midpoint was tree 11 Blochmannia Posted on Authorea 2 Dec 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160689628.80980759/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. ee o lgmnswt nesrmvd o l ttsis ie niaema ausars idw of windows across values mean indicate these lines (NC for statistics, GenBank all on For published (D) molecular of removed. rates genes. indels Relative (B) ten with alignments tree. identical for an genes is zero of value in a evolution where lengths, branch including and topologies tree the species across host evolution the molecular in Variation 3. Figure 007292.1), Blochmannia .vafer B. Blochmannia (NC Blochmannia ee eaiet h otseiste.()Gn dniypretg in percentage identity Gene (C) tree. 
Figure 4. Blochmannia gene presence / absence phylogenetic heatmap. Of 607 total genes annotated, 65 varied in presence / absence among samples (Table S4). 37 of 65 genes lost in at least one individual exhibit phylogenetic signal (Fig. S7). Dark green indicates presence, while light yellow indicates absence.

Figure 5. Relationships of shifts in selection strength and host demography. Shifts in selection strength were obtained using the program aBSREL. Results presented here are for non-multiple testing corrected results. Results with multiple testing correction are consistent with these results and presented in Figure S10.

LITERATURE CITED

Alleman A, Hertweck KL, Kambhampati S (2018) Random Genetic Drift and Selective Pressures Shaping the Blattabacterium Genome. Scientific Reports 8, 1-12.
Anbutsu H, Moriyama M, Nikoh N, et al. (2017) Small genome symbiont underlies cuticle hardness in beetles. Proceedings of the National Academy of Sciences, USA 114, E8382-E8391.
Bao Z, Eddy SR (2002) Automated de novo identification of repeat sequence families in sequenced genomes. Genome Research 12, 1269-1276.
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological), 289-300.
Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573-580.
Blaimer BB, Brady SG, Schultz TR, et al. (2015) Phylogenomic methods outperform traditional multi-locus approaches in resolving deep evolutionary history: a case study of formicine ants. BMC Evolutionary Biology 15, 271.
Boman J, Frankl-Vilches C, da Silva dos Santos M, et al. (2019) The genome of Blue-capped Cordon-Bleu uncovers hidden diversity of LTR retrotransposons in zebra finch. Genes 10, 301.
Bonasio R, Zhang G, Ye C, et al. (2010) Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator. Science 329, 1068-1071.
Brelsford A, Purcell J, Avril A, et al. (2020) An ancient and eroded social supergene is widespread across Formica ants. Current Biology 30, 304-311.e304.
Brownlie JC, Johnson KN (2009) Symbiont-mediated protection in insect hosts. Trends in Microbiology 17, 348-354.
Bushnell B (2014) BBMap: A Fast, Accurate, Splice-Aware Aligner. Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US).
Camacho C, Coulouris G, Avagyan V, et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10, 1.
Cantarel BL, Korf I, Robb SM, et al. (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research 18, 188-196.
Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T (2009) trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973.
Charif D, Lobry JR (2007) SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In: Structural Approaches to Sequence Evolution, pp. 207-232. Springer.
Chen X, Li S, Aksoy S (1999) Concordant evolution of a symbiont with its host insect species: molecular phylogeny of genus Glossina and its bacteriome-associated endosymbiont, Wigglesworthia glossinidia. Journal of Molecular Evolution 48, 49-58.
Chikhi L, Rodríguez W, Grusea S, et al. (2018) The IICR (inverse instantaneous coalescence rate) as a summary of genomic diversity: insights into demographic inference and model choice. Heredity 120, 13-24.
Chong RA, Park H, Moran NA (2019) Genome evolution of the obligate endosymbiont Buchnera aphidicola. Molecular Biology and Evolution 36, 1481-1489.
Clark JW, Hossain S, Burnside CA, Kambhampati S (2001) Coevolution between a cockroach and its bacterial endosymbiont: a biogeographical perspective. Proceedings of the Royal Society of London. Series B: Biological Sciences 268, 393-398.
Clark MA, Moran NA, Baumann P (1999) Sequence evolution in bacterial endosymbionts having extreme base compositions. Molecular Biology and Evolution 16, 1586-1598.
Clark MA, Moran NA, Baumann P, Wernegreen JJ (2000) Cospeciation between bacterial endosymbionts (Buchnera) and a recent radiation of aphids (Uroleucon) and pitfalls of testing for phylogenetic congruence. Evolution 54, 517-525.
Darling AE, Mau B, Perna NT (2010) progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5, e11147.
Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9, 772.
Degnan PH, Lazarus AB, Brock CD, Wernegreen JJ (2004) Host–symbiont stability and fast evolutionary rates in an ant–bacterium association: cospeciation of Camponotus species and their endosymbionts, Candidatus Blochmannia. Systematic Biology 53, 95-110.
Dhaygude K, Nair A, Johansson H, Wurm Y, Sundström L (2019) The first draft genomes of the ant Formica exsecta, and its Wolbachia endosymbiont reveal extensive gene transfer from endosymbiont to host. BMC Genomics 20, 1-16.
Distel D, Felbeck H, Cavanaugh C (1994) Evidence for phylogenetic congruence among sulfur-oxidizing chemoautotrophic bacterial endosymbionts and their bivalve hosts. Journal of Molecular Evolution 38, 533-542.
Dudchenko O, Batra SS, Omer AD, et al. (2017) De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92-95.
Durand NC, Robinson JT, Shamim MS, et al. (2016) Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems 3, 99-101.
Ehrlich PR, Raven PH (1964) Butterflies and plants: a study in coevolution. Evolution, 586-608.
Feldhaar H, Straka J, Krischke M, et al. (2007) Nutritional upgrading for omnivorous carpenter ants by the endosymbiont Blochmannia. BMC Biology 5, 48.
Felsenstein J (1985) Phylogenies and the comparative method. The American Naturalist 125, 1-15.
Fowler HG (1986) Polymorphism and colony ontogeny in North American carpenter ants (Hymenoptera: Formicidae: Camponotus pennsylvanicus and Camponotus ferrugineus). Zoologische Jahrbücher. Abteilung für allgemeine Zoologie und Physiologie der Tiere 90, 297-316.
Gil R, Silva FJ, Zientz E, et al. (2003) The genome sequence of Blochmannia floridanus: comparative analysis of reduced genomes. Proceedings of the National Academy of Sciences, USA 100, 9388-9393.
Guindon S, Dufayard J-F, Lefort V, et al. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic Biology 59, 307-321.
Guyomar C, Delage W, Legeai F, et al. (2020) MinYS: Mine Your Symbiont by targeted genome assembly in symbiotic communities. NAR Genomics and Bioinformatics 2, lqaa047.
Hauschteck J (1983) Ant chromosomes. II. Karyotypes of Western Palearctic species. Insectes Sociaux 30, 149-164.
Holm S (1979) A simple sequentially rejective multiple test procedure. Scandinavian journal of statistics, 65-70.
Hosokawa T, Nikoh N, Koga R, et al. (2012) Reductive genome evolution, host–symbiont co-speciation and uterine transmission of endosymbiotic bacteria in bat flies. The ISME journal 6, 577-587.
Imai H, Yosida T (1964) Chromosome observations in Japanese ants. Annual report of the National Institute of Genetics (Japan) 15, 64-66.
Imai HT (1969) Karyological studies of Japanese ants. I. Chromosome evolution and species differentiation in ants. Science Report Tokyo Kyoiku Daigaku Section B 14, 27-46.
Ingram KK, Pilko A, Heer J, Gordon DM (2013) Colony life history and lifetime reproductive success of red harvester ant colonies. Journal of Animal Ecology 82, 540-550.
Janzen DH (1980) When is it coevolution? Evolution 34, 611-612.
Jurka J, Kapitonov VV, Pavlicek A, et al. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110, 462-467.
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30, 772-780.
Kiester AR, Lande R, Schemske DW (1984) Models of coevolution and speciation in plants and their pollinators. The American Naturalist 124, 220-243.
Kikuchi Y (2009) Endosymbiotic bacteria in insects: their diversity and culturability. Microbes and Environments, 0908180109-0908180109.
Koren S, Walenz BP, Berlin K, et al. (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research, gr. 215087.215116.
Korf I (2004) Gene finding in novel genomes. BMC Bioinformatics 5, 59.
Kuhner MK, Felsenstein J (1994) A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Molecular Biology and Evolution 11, 459-468.
Kuo C-H, Ochman H (2009) Inferring clocks when lacking rocks: the variable rates of molecular evolution in bacteria. Biology Direct 4, 35.
Laetsch DR, Blaxter ML (2017) BlobTools: Interrogation of genome assemblies. F1000Research 6, 1287.
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754-1760.
Lo N, Bandi C, Watanabe H, Nalepa C, Beninati T (2003) Evidence for cocladogenesis between diverse dictyopteran lineages and their intracellular endosymbionts. Molecular Biology and Evolution 20, 907-913.
Mackay W (2019) New World Carpenter Ants of the Hyperdiverse Genus Camponotus. Volume 1: Introduction, Keys to the Subgenera and Species Complexes and the Subgenus Camponotus. Lambert Academic Publishing.
Mazet O, Rodríguez W, Grusea S, Boitard S, Chikhi L (2016) On the importance of being structured: instantaneous coalescence rates and human evolution—lessons for ancestral population size inference? Heredity 116, 362-371.
McKenna A, Hanna M, Banks E, et al. (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research 20, 1297-1303.
Mira A, Moran NA (2002) Estimating population size and transmission bottlenecks in maternally transmitted endosymbiotic bacteria. Microbial Ecology 44, 137-143.
Moran NA (1996) Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proceedings of the National Academy of Sciences, USA 93, 2873-2878.
Moran NA, von Dohlen CD, Baumann P (1995) Faster evolutionary rates in endosymbiotic bacteria than in cospeciating insect hosts. Journal of Molecular Evolution 41, 727-731.
Moreau CS (2020) Symbioses among ants and microbes. Current Opinion in Insect Science 39, 1-5.
Nadachowska-Brzyska K, Li C, Smeds L, Zhang G, Ellegren H (2015) Temporal dynamics of avian populations during Pleistocene revealed by whole-genome sequences. Current Biology 25, 1375-1380.
Notredame C, Higgins DG, Heringa J (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302, 205-217.
Pagès H, Aboyoun P, Gentleman R, DebRoy S (2017) Biostrings: Efficient manipulation of biological strings. R Package Version 2.0.
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289-290.
Perlmutter JI, Bordenstein SR (2020) Microorganisms in the reproductive tissues of arthropods. Nature Reviews Microbiology, 1-15.
Pond SLK, Muse SV (2005) HyPhy: hypothesis testing using phylogenies. In: Statistical Methods in Molecular Evolution, pp. 125-181. Springer.
Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in large genomes. Bioinformatics 21, i351-i358.
Pricer JL (1908) The life history of the carpenter ant. The Biological Bulletin 14, 177-218.
Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.
Renaud G, Hanghøj K, Korneliussen TS, Willerslev E, Orlando L (2019) Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples. Genetics, genetics. 302057.302019.
Rizk G, Gouin A, Chikhi R, Lemaitre C (2014) MindTheGap: integrated detection and assembly of short and long insertions. Bioinformatics 30, 3451-3457.
Russell JA, Sanders JG, Moreau CS (2017) Hotspots for symbiosis: function, evolution, and specificity of ant-microbe associations from trunk to tips of the ant phylogeny (Hymenoptera: Formicidae). Myrmecological News 24, 43-69.
Schiffels S, Durbin R (2014) Inferring human population size and separation history from multiple genome sequences. Nature Genetics 46, 919.
Smith MD, Wertheim JO, Weaver S, [...] for efficient detection of episodic diversifying selection. [...]
Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics [...]
Stanke M, Waack S (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics [...]
Sukumaran J, Holder MT (2010) DendroPy: a Python library for phylogenetic computing. Bioinformatics 26, [...]
Tatusova T, DiCuccio M, Badretdin A, [...] Nucleic Acids Research [...]
Thao ML, Moran NA, Abbot P, [...] symbionts. [...]
Toju H, Tanabe AS, Notsu Y, Sota T, Fukatsu T (2013) Diversification of endosymbiosis: replacements, co-speciation and promiscuity of bacteriocyte symbionts in weevils. [...]
Wernegreen JJ (2002) Genome evolution in bacterial endosymbionts of insects. [...] 850-861.
Wernegreen JJ, Kauppinen SN, Brady SG, Ward PS (2009) One nutritional symbiosis begat another: Phylogenetic evidence that the ant tribe Camponotini acquired Blochmannia by tending sap-feeding insects. BMC Evolutionary Biology [...]
Wertheim JO, Murrell B, Smith MD, Kosakovsky Pond SL, Scheffler K (2015) RELAX: detecting relaxed selection in a phylogenetic framework. [...]
Wick RR, Schultz MB, Zobel J, Holt KE (2015) Bandage: interactive visualization of de novo genome assemblies. [...]
Williams LE, Wernegreen JJ (2012) Purifying selection, sequence composition, and context-specific indel mutations shape intraspecific variation in a bacterial endosymbiont. [...] 44-51.
Williams LE, Wernegreen JJ (2015) Genome evolution in an ancient bacteria-ant symbiosis: parallel gene loss among Blochmannia spanning the origin of the ant tribe Camponotini. [...]
Wilson EO (1976) Which are the most prevalent ant genera. [...]
Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. [...] 13, [...]
Zhang C, Rabiee M, Sayyari E, Mirarab S (2018) ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. [...]
[...] RepeatModeler Open-1.0. [...] 2013–2015. [...] Open-4.0.
Hosted file: Fig_2_camponotus_tree_reduced.pdf, available at https://authorea.com/users/380686/articles/496575-impact-of-host-evolutionary-history-on-endosymbiont-genome-evolution-a-test-in-camponotus-carpenter-ants-and-their-blochmannia-endosymbionts

[Figure, panels A-D. Recoverable axis labels: Prop. CDS, Prop. Repetitive, Prop. GC, KF94 Distance, Relative Evolution Rate, Gene % Identity. Recoverable x-axes: Scaffold Index; Position in Blochmannia Genome (thousands of bp). Panel D: Blochmannia (1000s of bp) for the Full Dataset, Camponotus, and Tanaemyrmex.]

[Figure labels: Tanaemyrmex; B. ocreatus; B. vafer.]

[Figure, panels A-B. x-axis (both panels): Host Species Harmonic Mean Pop. Size (1000s).
Panel A: y-axis, # Blochmannia Genes w/ Intensified Selection Strength; R² = 0.9332, p = 0.0011.
Panel B: y-axis, # Blochmannia Genes w/ Relaxed Selection Strength; R² = 0.0910, p = 0.5613.
Legend: C. vicinus, C. sp. (2-JDM), C. sp. (1-JDM), C. herculeanus, C. modoc, C. laevigatus, C. ocreatus.]