and their endosymbionts: and evolutionary rates Daej A Kh A M Arab

The University of Sydney
Faculty of Science
2021

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

Authorship contribution statement

During my doctoral candidature I published as first-author or co-author three stand-alone papers in peer-reviewed, internationally recognised journals. These publications form the three research chapters of this thesis in accordance with The University of Sydney’s policy for doctoral theses. These chapters are linked by the use of the latest phylogenetic and molecular evolutionary techniques for analysing obligate mutualistic endosymbionts and their host mitochondrial genomes to shed light on the evolutionary history of the two partners.

Therefore, there is inevitably some repetition between chapters, as they share common themes.

In the general introduction and discussion, I use the singular “I” as I am the sole author of these chapters. All other chapters are co-authored and therefore the plural “we” is used, including appendices belonging to these chapters.

Part of chapter 2 has been published as: Bourguignon, T., Tang, Q., Ho, S.Y., Juna, F., Wang, Z., Arab, D.A., Cameron, S.L., Walker, J., Rentz, D., Evans, T.A. and Lo, N., 2018. Transoceanic dispersal and plate tectonics shaped global cockroach distributions: evidence from mitochondrial phylogenomics. Molecular Biology and Evolution, 35(4), pp.970-983.

The chapter was reformatted to include additional data and analyses that I undertook towards this paper. My role in the paper was to sequence samples, assemble mitochondrial genomes, perform phylogenetic analyses, and contribute to the writing of the manuscript.


Chapter 3 of this thesis has been published as: Arab, D.A., Bourguignon, T., Wang, Z., Ho, S.Y. and Lo, N., 2020. Evolutionary rates are correlated between cockroach symbionts and mitochondrial genomes. Biology Letters, 16(1), p.20190702. N.L., D.A.A., T.B. and S.Y.W.H. designed the study. Z.W. and T.B. collected and provided specimens. D.A.A. and T.B. generated sequence data. D.A.A., T.B., N.L. and S.Y.W.H. performed data analysis and interpreted results. N.L. and D.A.A. wrote the manuscript, with contributions from T.B., Z.W. and S.Y.W.H.

Chapter 4 of this thesis has been accepted for publication (18/2/2021) in the Journal of Molecular Evolution. D. A. A. and N. L. designed the study. D. A. A. performed all analyses and wrote the manuscript. N. L. provided feedback on the manuscript.

Chapter 5 includes some material from the abstract of the chapters mentioned above.

Appendix 4 of this thesis includes published versions of all the papers I contributed research to during my PhD, and on which I am an author, including 3 papers which are not described in this thesis.

The first paper has been published as: Gao, F., Chen, C., Arab, D.A., Du, Z., He, Y. and Ho, S.Y.W., 2019. EasyCodeML: A visual tool for analysis of selection using CodeML. Ecology and Evolution, 9(7), pp.3891-3898. I performed tests of the software described in the paper using my own datasets (unpublished) and contributed to the writing of the manuscript of this paper.


The second paper has been published as: Beasley-Hall, P.G., Chui, J., Arab, D.A. and Lo, N., 2019. Evidence for a complex evolutionary history of mound building in the Australian nasute termites (Nasutitermitinae). Biological Journal of the Linnean Society, 126(2), pp.304-314. I generated datasets for this paper and supervised analyses.

The third paper has been published as: Bourguignon, T., Kinjo, Y., Villa-Martín, P., Coleman, N.V., Tang, Q., Arab, D.A., Wang, Z., Tokuda, G., Hongoh, Y., Ohkuma, M., Ho, S.Y., Pigolotti, S. and Lo, N., 2020. Increased mutation rate is linked to genome reduction in prokaryotes. Current Biology, 30(19), pp.3848-3855. I performed analyses on the Blattabacterium datasets as part of the project that led to this paper, including Bayesian, maximum likelihood, rate correlation, and gene loss analyses.

In addition to the statements above, in cases where I am not the corresponding author of a published item, permission to include the published material has been granted by the corresponding author.

Daej A Kh A M Arab, ……………….., 02/03/2021

As supervisor for the candidature upon which this thesis is based, I can confirm that the authorship attribution statements above are correct.

Nathan Lo, ……………………, 02/03/2021

Daej A. Arab 2021


Statement of Originality

I certify that, to the best of my knowledge, the content of this thesis is my own work, except where specifically acknowledged. This thesis has not been submitted for any other degree or purposes.

I certify that the intellectual content of this thesis is the product of my own work and that all the assistance received in preparing this thesis and sources have been acknowledged.

Daej A. Arab 2021


Acknowledgements

This thesis would not exist without the help of many. I would like to thank my main supervisor, Professor Nathan Lo, for everything. I still remember the first email I received from Nate discussing my Master project; the amount of passion and enthusiasm was infectious. Nate also fought for me to obtain a scholarship to do this work. So Nate, thank you for introducing me to the world of cockroaches and termites, which will forever be my favourite insects, and thank you for giving me this opportunity. I would also like to thank my other supervisor, Professor Simon Ho, for helping design the analyses that led to two chapters of this thesis. A very special thank you has to go to Dr Thomas Bourguignon. Not only did Thomas help immensely with the first few years of my PhD, he also helped build up my passion to study endosymbiont evolution. Thanks, Thomas, for being a friend and a mentor. A special acknowledgement to Dr Fangluan Gao for inviting me to work on his project and giving support through his visit to our laboratory. And finally, thanks to all my co-authors who helped with the publications throughout my PhD.

As an international student, I was supported by the Australian government through the International Postgraduate Research Scholarship (IPRS). It was such an honour and privilege to receive such a scholarship. I am forever grateful to be given such an opportunity.

I had the absolute honour and privilege to work alongside incredible and highly talented people in the Molecular Ecology, Evolution and Phylogenetics (MEEP) laboratory at the University of Sydney. They all provided support, knowledge and comfort throughout my years in the lab. To Cara, Charles, Tim, Perry, David and Nik, thank you for everything. MEEP will forever be a place that I was proud to call home and I will miss it dearly.


The School of Life and Environmental Sciences at the University of Sydney has been my home for over 7 years. Throughout my Masters and PhD (and years in between) I managed to build myself in the image of many that run this wonderful institution. I would like to thank Professor Peter Banks for his support throughout my PhD. Many thanks to all the academics, staff and students of the school and I hope to see you in future ventures.

I would not be at this stage of my life without the support of friends that made Sydney my home over the past few years. Dr Michael Holmes, thanks for being an inspiration through my whole academic path, your words and support led to this moment. Dr Zoe Patterson-Ross and Dr Ryan Leonard, thanks for the good company and inspiration. I would also like to thank my chosen family Daniel, Julia, Michael, Elyse and Alex, your love and support have been invaluable.

Mum, thanks for pushing me to aim higher. Thanks for teaching me never to give up no matter how hard things got. There are no words that can show how grateful I am to have such strong support throughout my education and so much love throughout my life. Danah and Fatima, having younger sisters like you is irreplaceable. You inspire me to be a better man, so thank you from the bottom of my heart. Jo (the girl with the cicada tattoo), you helped pick me up and get me back on track. It is funny how a cicada tattoo and a conversation about mitochondria can lead to a publication. Thanks for the support, thanks for the love, thanks for being by my side through the hard times. Here is to more adventures together with Maxine.

Mum, Danah, Fatima and Jo, this is for you.


Table of Contents

Authorship contribution statement
Statement of Originality
Acknowledgements
Table of Contents
Abstract
Chapter 1: General Introduction
    1.1 Molecular evolution
    1.2 Mutualistic insect endosymbionts
    1.3 Insect endosymbiont genomes
    1.4 Blattabacterium cuenoti
    1.5 Cockroach phylogenetics and evolution
Chapter 2: Phylogenetics and evolutionary timescale of cockroaches based on mitochondrial genomes and Blattabacterium symbiont genes
    2.1 Introduction
    2.2 Materials and methods
        2.2.1 Cockroach sampling, DNA extraction and sequencing
        2.2.2 Cockroach mitochondrial genome assembly
        2.2.3 Blattabacterium genomic data assembly
        2.2.4 Phylogenetic analyses
        2.2.5 Molecular dating
    2.3 Results
        2.3.1 Tree topologies
        2.3.2 Molecular dating
    2.4 Discussion
        2.4.1 Cockroach phylogenetics
        2.4.2 Timescale of cockroach evolution
Chapter 3: Evolutionary rates are correlated between cockroach symbiont and mitochondrial genomes
    3.1 Introduction
    3.2 Materials and methods
    3.3 Results
    3.4 Discussion


Chapter 4: Evolutionary rates are correlated between Buchnera endosymbionts and mitochondrial genomes of their aphid hosts
    4.1 Introduction
    4.2 Methods
        4.2.1 Genomic data from symbionts and their hosts
        4.2.2 Phylogenetic analysis
        4.2.3 Comparisons of evolutionary rates between hosts and endosymbionts
    4.3 Results
        4.3.1 Evolutionary rate comparisons between Buchnera and aphids mtDNA
        4.3.2 Evolutionary rate comparisons between Sulcia and host mtDNA
    4.4 Discussion
Chapter 5: General Discussion
    5.1 Thesis summary
    5.2 Future directions
        5.2.1 Phylogenetics of
        5.2.2 Host and endosymbiont evolutionary rates
References
Appendix 1: Supplementary material for Chapter 2
Appendix 2: Supplementary material for Chapter 3
    Sampling and Blattabacterium genomic data
    Phylogenetic analysis
    Root-to-tip distances and comparison of phylogenetically independent pairs of host and symbiont branch lengths
Appendix 3: Supplementary material for Chapter 4
Appendix 4: Publications


Abstract

Insect bacterial endosymbionts are common in nature. Obligate mutualistic intracellular endosymbionts have been acquired multiple times over the last few hundred million years by common ancestors of several major taxonomic groups. These ancient associations between hosts and endosymbionts mean that the phylogenies of the symbiotic partners are congruent.

This allows the use of endosymbiont molecular phylogenies to resolve host relationships.

Bacterial endosymbionts evolve under strong host-driven selection. Factors influencing host evolution might affect symbionts in similar ways, potentially leading to correlations between the molecular evolutionary rates of hosts and symbionts. Although there is evidence of rate correlations between mitochondrial and nuclear genes, similar investigations of hosts and symbionts are lacking.

In chapter 2, I generate molecular phylogenies based on complete mitochondrial genomes from 119 cockroach , 13 species, 7 species and multiple outgroups, including representatives of all 9 cockroach families and 20 out of 27 subfamilies.

I also generate the most comprehensive cockroach phylogeny (at the time the study was undertaken) based on Blattabacterium data, using 104 Blattabacterium genes from strains harboured in 54 cockroach species representing all major cockroach families (except Nocticolidae, in which it is absent) and the only strain found in a termite (Mastotermes darwiniensis). I then estimate divergence times of cockroaches, based on cockroach mtDNA and Blattabacterium genes, to investigate further the large variation in divergence time estimates from recent studies.

In chapter 3, I demonstrate a correlation in molecular rates between the genomes of an endosymbiont (Blattabacterium cuenoti) and the mitochondrial genomes of their hosts (cockroaches). I use partial genome data for multiple strains of B. cuenoti to compare phylogenetic relationships and evolutionary rates for 55 host/symbiont pairs. The phylogenies inferred for B. cuenoti and the mitochondrial genomes of their hosts were largely congruent, as expected from their identical maternal and cytoplasmic mode of inheritance. I find the first evidence of a correlation between evolutionary rates of the two genomes, based on comparisons of root-to-tip distances and on comparisons of the branch lengths of phylogenetically independent species pairs. My results underscore the profound effects that long-term symbiosis can have on the biology of each symbiotic partner.

In Chapter 4, I use partial genome data from multiple strains of the bacterial endosymbionts Buchnera aphidicola and Sulcia muelleri, and the mitochondrial genomes of their sap-feeding insect hosts. Both endosymbionts show phylogenetic congruence with the mitochondria of their hosts, a result that is expected due to their identical mode of inheritance. I compare root-to-tip distances and branch lengths of phylogenetically independent species pairs. Both analyses show a highly significant correlation of molecular rates between genomes of Buchnera and mitochondrial genomes of their hosts. A similar correlation was detected between Sulcia and their hosts but it was not statistically significant.

The results of this chapter indicate that evolutionary rate correlations between hosts and long-term symbionts may be a widespread phenomenon.


Chapter 1: General Introduction

1.1 Molecular evolution

The study of how DNA, RNA and protein sequences change across generations and between taxa has changed the fields of , evolutionary biology and ecology, among others.

Since the discovery of DNA, relationships among and between major groups have become clearer. For example, the discovery and sequencing of 16S ribosomal RNA sequences allowed scientists to split all life into three domains: Archaebacteria, Eubacteria (Bacteria), and Eukarya (Fox et al. 1980; Woese and Fox 1977). Molecular analyses helped to confirm that the lineage that gave rise to Homo sapiens split from the lineage that gave rise to chimpanzees ~5 million years ago, contradicting estimates obtained from fossils (Goodman et al. 1998; Sarich and Wilson 1967). The explosion of DNA and RNA sequencing over the last few decades, coupled with the development of molecular phylogenetic and evolutionary techniques, has helped scientists investigate how these molecules change across generations, and to infer the evolutionary forces that act on them.

Molecular evolutionary techniques enable the calculation of genetic distances between lineages. These genetic distances are inferred from sequence alignments and the phylogenetic relationships among them. Molecular dating can then be used to date divergences between lineages. This is done by adding date estimates based on fossil and biogeographical records to help with estimating time since divergence between lineages. For these dates to be applied to the rest of the tree, molecular dating traditionally relied on the assumption that molecular evolution occurs at the same rate across all lineages on the tree (also referred to as rate consistency). Thus, the term ‘molecular clock’ is used to describe a clock-like, consistent rate of evolution across the tree of life. The rate consistency assumption is a product of the neutral theory of evolution. Kimura and Ohta (1971) suggested that most amino-acid changes in a protein are neutral, which means that these changes do not have an influence on the fitness of an organism. This suggested that natural selection had little or no effect on the rate of change, as advantageous changes will be rare and deleterious mutations will be removed by selection, while most amino acid changes will have little or no effect on the resulting protein. In this framework, the rate of accumulation of these neutral changes will be governed by the underlying mutation rate, suggesting that the rate of change will remain constant as long as the mutation rate remains unchanged. However, proteins were found to differ in their rate of change (Dickerson 1971), challenging the notion that mutation rate is the sole force determining this rate. These differential rates were explained by considering the proportion of neutral sites in a protein sequence. Dickerson (1971) showed that proteins with higher proportions of neutral sites exhibited higher rates of molecular evolution. This transformed the way scientists viewed changes in protein sequences, as it became clear that constant sites are likely to be important for the functionality of proteins, while variable amino acids are likely to be less important (reviewed by Bromham and Penny 2003).
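The reasoning in this paragraph can be summarised in a short worked derivation. This is a sketch under the standard textbook assumptions of the neutral theory (a diploid population of constant size N, a per-generation mutation rate μ, and a fraction f_0 of new mutations that are selectively neutral). The symbols N, μ, f_0, k, d and t are notation introduced here for illustration and are not taken from the cited works.

```latex
\begin{align}
  k &= \underbrace{2N\,\mu f_0}_{\text{new neutral mutations per generation}}
       \times
       \underbrace{\frac{1}{2N}}_{\text{fixation probability of each}}
     = f_0\,\mu \\[4pt]
  t &= \frac{d}{2k}
\end{align}
```

The first line shows why the neutral substitution rate k does not depend on population size and why proteins with a larger neutral fraction f_0 are expected to evolve faster. The second line is the strict-clock estimate of the time t since two lineages diverged, given a genetic distance d between them; the factor of 2 arises because changes accumulate along both lineages. It only holds if k is the same on both branches, which is the rate consistency assumption.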

A number of controversies regarding molecular dating estimates have arisen, especially when dates obtained through molecular dating contradicted traditional fossil, archaeological and biogeographical records (Smith and Peterson 2002). We now know that the rate consistency assumption is often violated, whether between species, between genetic compartments within an individual (e.g. mitochondrial vs nuclear DNA) or even between regions of the same genome (Bromham and Penny 2003; Davies et al. 2004; Woolfit and Bromham 2003; Wu and Li 1985). This led to the creation of modern molecular clock methods that allow for molecular evolutionary rate variation between lineages (Bromham and Penny 2003; Welch and Bromham 2005). So why does the rate of molecular evolution vary between species and genetic compartments?


Studies investigating the rate of molecular evolution have found many factors that might influence the rate at which molecular sequences change, including generation time, population size and environmental conditions (Davies et al. 2004; Lanfear et al. 2014; Martin and Palumbi 1993). These traits affect either the mutation rate or the substitution rate in species. By definition, the mutation rate represents the frequency of nucleotide changes (mutations) arising in a gene or genome per unit of time, while the substitution rate considers the probability of these changes reaching fixation. Variations in mutation rate can cause variations in molecular rates between species. Mutations can arise due to external (environmental) factors, such as ultraviolet light (Wright et al. 2006), or internal factors, such as the species’ DNA replication rate (Bartosch‐Harlid et al. 2003) or generation time (Bromham et al. 1996). We know that an organism’s DNA repair machinery can fix these changes; however, species and individuals vary in their ability to repair these changes (Drake et al. 1998; Woodruff et al. 1984). Similarly, mitochondria and plastids possess their own DNA repair processes which are typically encoded in the nucleus (Boesch et al. 2011). Thus, molecular evolutionary rates usually depend on two factors: the relative effect of mutagens on molecular sequences and the ability of species (or genetic compartments) to repair these changes. Species with shorter generation times may show an increase in evolutionary rate, as a shorter generation time means more rounds of DNA replication per unit of time and therefore more opportunities for replication errors (Bromham et al. 1996).

On the other hand, evolutionary forces such as selection and genetic drift can have an influence on the substitution rate of specific genes or across a whole genome. A species introduced to a new environment or switching to a new diet, for example, will have changes in selection on certain genes to allow for new functions. Relaxation of selection can lead to changes across the whole genome. Species with small population sizes will be affected by genetic drift, which leads to more rapid fixation of mutations (Lynch and Walsh 2007). The effect of effective population size (Ne) on protein sequence evolution was considered by Ohta (1987) with the introduction of the nearly neutral theory of evolution. This theory suggests that in small populations, genetic drift overpowers the efficacy of selection, leading to the fixation of nearly-neutral alleles.
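The effect of population size on the fate of slightly deleterious mutations can be illustrated numerically. The sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutation, assuming an idealised diploid population in which the census and effective sizes are equal. The function names, the selection coefficient and the two population sizes are hypothetical values chosen for illustration and do not come from the studies cited above.

```python
import math


def fixation_probability(s, n_e):
    """Approximate fixation probability of a new mutation (Kimura's diffusion formula).

    s   : selection coefficient (negative = deleterious, 0 = neutral)
    n_e : effective population size, assumed equal to the census size (diploid)
    """
    if s == 0:
        # A neutral mutation fixes with probability equal to its initial frequency.
        return 1.0 / (2 * n_e)
    exponent = -4.0 * n_e * s
    if exponent > 700:
        # Strongly deleterious relative to drift: exp() would overflow, probability ~ 0.
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4*n_e*s)), written with expm1 for numerical stability.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(s, n_e):
    """Fixation probability expressed relative to that of a neutral mutation."""
    return fixation_probability(s, n_e) / fixation_probability(0, n_e)


if __name__ == "__main__":
    s = -1e-4  # hypothetical slightly deleterious mutation
    for n_e in (1_000, 1_000_000):
        print(n_e, relative_to_neutral(s, n_e))
```

With these illustrative values, the slightly deleterious mutation fixes almost as often as a neutral one when the population is small, but has an essentially zero chance of fixing when the population is large. This is the sense in which drift overpowers selection in small populations.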

1.2 Mutualistic insect endosymbionts

Bacteria that reside within cells of eukaryotes, known as endosymbionts, are very common in nature and are known to have played an important role in the evolution and diversity of many lineages (Margulis and Fester 1991). Insecta is the largest class within the phylum Arthropoda, with over 1 million described species; estimates suggest the existence of over 5 million species (Stork 2018). It is considered to be one of the most diverse animal groups. Endosymbionts are found in most insect groups and are known to provide the host with an array of benefits, such as protection against predators (Heyworth and Ferrari 2016; Kaltenpoth et al. 2005; Nakabachi et al. 2013; Oliver et al. 2009; Oliver et al. 2003), insecticide resistance (Kikuchi et al. 2012), and most commonly, provisioning of nutrients such as amino acids and vitamins (Baumann 2005; Douglas 1998; Moran et al. 2003; Sabree et al. 2009; Salem et al. 2017; Tokuda et al. 2013). Because of all these benefits, endosymbionts are believed to be associated with host adaptive radiation (Mitter et al. 1988) and are known to play a role in altering a range of terrestrial systems (Raffa et al. 2008; Schaefer and Panizzi 2000). Bacteria are the most commonly found endosymbionts in insects. These endosymbionts are known to be metabolically diverse, allowing for the production of a range of chemicals and nutrients that their hosts do not have the ability to produce. The availability of these metabolites facilitates the introduction of hosts to novel environmental niches, and is thought to result in significant diversification (Cavaliere et al. 2017).


Early microscopic studies were able to identify the existence of bacterial endosymbionts in specialised cells known as bacteriocytes in some ants and many plant sap-feeding and blood-feeding insects (Blochmann 1892; Buchner 1965). Experiments that aimed to cure hosts of their endosymbionts by using antibiotics found them to be essential for the host’s development, reproduction and longevity (Brooks and Richards 1955; Griffiths and Beck 1974; Ishikawa and Yamaji 1985; Malke 1964). These endosymbionts are referred to as primary obligate mutualistic symbionts (hereafter ‘primary endosymbiont’) and they are believed to be found in 10 to 15% of all insects (Baumann and Moran 1997; Buchner 1965; Dasch 1984; Douglas 1989; Houk and Griffiths 1980). Primary endosymbionts are strictly vertically transmitted from mother to offspring during egg development (Braendle et al. 2003; Buchner 1965) and are known to have been associated with certain species of insects for hundreds of millions of years (Evangelista et al. 2019; Moran et al. 2005b; Takiya et al. 2006). Some insects are also known to harbour two or more primary endosymbionts (co-primary endosymbionts), as is the case in many Auchenorrhyncha species (Buchner 1965) and psyllids (Hall et al. 2016; Morrow et al. 2017).

On the other hand, many insects harbour facultative or secondary endosymbionts. These endosymbionts are not essential for the host’s development, reproduction or longevity, but might provide the host with some fitness benefits (Haynes et al. 2003; Moran et al. 2005a; Tsuchida et al. 2002). Facultative endosymbionts are not necessarily found in specialised or localised cells in the host and can be horizontally or vertically transmitted in the population (Chiel et al. 2007; Fukatsu et al. 2000; Sintupachee et al. 2006). It is important to note that some insect endosymbionts are difficult to categorise as primary or secondary, such as reproductive manipulators like Wolbachia (Werren 1997) and Cardinium (Zchori-Fein et al. 2001; Zchori-Fein et al. 2004). These bacterial endosymbionts are maternally vertically transmitted and are known to be occasionally horizontally transferred between host species (Hinrich et al. 2002; Werren 1997; Zchori-Fein and Brown 2002). Reproductive manipulator endosymbionts increase the frequency of infected females, allowing for an increase in symbiont frequency in the population.

Classification and identification of insect bacterial endosymbionts by microscopy proved to be impossible (Buchner 1965). However, DNA sequencing allowed the classification of many primary bacterial endosymbionts. A great number of primary bacterial endosymbionts belong to the phylum Proteobacteria. For example, primary endosymbionts such as Buchnera aphidicola (aphids), Baumannia cicadellinicola (sharpshooters), Blochmannia floridanus (carpenter ants), Carsonella ruddii (psyllids) and Portiera aleyrodidarum (whiteflies) all belong to the class Gammaproteobacteria. Other primary endosymbionts were found to belong to the phylum Bacteroidetes, such as Sulcia muelleri (various Auchenorrhyncha species) and Blattabacterium cuenoti (cockroaches), both belonging to the class Flavobacteria (Sabree et al. 2009). The classification of insect endosymbionts allowed for direct comparison with their free-living relatives using experimental and phylogenetic techniques, which helped illuminate characteristics of their evolutionary pathways (Sabater-Muñoz et al. 2017).

It was assumed for a long time that the association between insects and their primary endosymbionts is an ancient one (Buchner 1965). This was clear in groups like aphids, Auchenorrhyncha and cockroaches, as most species in these groups harboured their associated primary endosymbiont, which suggested a single infection by the endosymbiont in their last common ancestor. Molecular phylogenetic studies based on a handful of genetic markers have helped confirm these ancient associations in aphids (Munson et al. 1991), tsetse flies (Chen et al. 1999), cockroaches (Lo et al. 2003), whiteflies (Thao and Baumann 2004b) and many more (Baumann and Baumann 2005; Degnan et al. 2004; Lefevre et al. 2004; Schröder et al. 1996; Thao et al. 2000b). Furthermore, molecular phylogenetic studies of primary bacterial endosymbionts and the host nuclear and mitochondrial genes have found high levels of congruence between host and endosymbiont topologies (Clark et al. 2000; Lo et al. 2003; Takiya et al. 2006; Urban and Cryan 2012). This is to be expected due to an ancient association between host and endosymbionts and the strictly vertical transmission of the endosymbiont from mother to offspring. On the other hand, facultative endosymbiont topologies were found to be incongruent with those of their hosts, and the association between host and endosymbiont was found to be much younger in comparison to their primary counterparts (Thao and Baumann 2004a; Thao et al. 2000a). However, evidence of phylogenetic congruence between host and facultative endosymbionts has been reported at a lower taxonomic level (Fromont et al. 2016).

This congruence between host and symbiont topologies positions primary endosymbionts as an important tool to investigate endosymbiont and host evolution. The strict vertical transmission allows us to calibrate molecular clock analyses using host fossil records. This allows for estimation of an evolutionary timeframe for these bacteria, which is considered challenging for their free-living relatives. Using phylogenetic techniques, the association between Buchnera and their aphid (Aphidoidea) hosts was found to have been established over 100 million years ago (Moran et al. 1993; Nováková et al. 2013; Peccoud et al. 2009), while the association between S. muelleri and their sap-feeding hosts from the suborder Auchenorrhyncha is considered one of the most ancient examples of endosymbiosis, dating back over 260 million years (Moran et al. 2005b). Furthermore, high congruence between the phylogenies of hosts and endosymbionts makes molecular phylogenies based on insect endosymbiont genes a powerful tool for investigating host phylogenies. For example, relationships between insect taxa were confirmed in whiteflies (de Moraes et al. 2018), armored scale insects (Andersen et al. 2010), psyllids (Hall et al. 2016) and aphids (Liu et al. 2013) using their respective endosymbiont genes.


1.3 Insect endosymbiont genomes

In the last two decades, there has been an increase in bacterial genome sequencing and analyses of genomic data, which allows for the study of unculturable bacterial communities. This allows comparative genomic studies to shed light on the evolutionary forces shaping the genomes of bacteria living in diverse ecological niches. It was immediately evident that genomes of insect primary endosymbionts are much smaller than those of their free-living relatives (McCutcheon and Moran 2012; Moran and Bennett 2014; Moya et al. 2008; Shigenobu et al. 2000). Comparative analyses suggest this gene loss occurs rapidly upon transition to a strictly host-confined lifestyle (Clayton et al. 2012). Both the environment of the host and the role of the symbiont determine which genes are lost (Degnan et al. 2009).

Examination of the genomes of primary endosymbionts has found a lack of genes associated with cell envelope development and DNA repair, while some genes essential for translation, transcription and replication are maintained (Moran and Bennett 2014). Similar to the case for mitochondria and chloroplasts, some insect endosymbionts lack many genes that are ubiquitous and essential in free-living relatives. This demonstrates an evolutionary pathway leading to metabolic dependence on their hosts, making them unable to live outside their hosts or easily switch to another host (especially distantly related hosts). Shrinkage of insect primary endosymbiont genomes is believed to be associated with continued bottlenecking that occurs due to the strict vertical transmission (Moran et al. 2008). The population size of endosymbionts is dependent on the population size of their hosts. Furthermore, only a small number of bacterial cells can fit in host egg cells at each host generation. Thus, insect primary endosymbiont populations are generally thought to be smaller than those of their free-living relatives (Mira and Moran 2002; Wilkinson et al. 2003). The constant bottlenecking of endosymbiont cells over millions of years is thought to be responsible for the reduction of the effective population size (Ne) of these endosymbionts. This reduction in population size coupled with lack of recombination (Muller’s ratchet) of symbiont cells is thought to lead to genome reduction through the accumulation of slightly deleterious mutations (Fares et al. 2002; Moran 1996; Moran et al. 2008). However, recent work indicates that increased mutation rate may also play a role in genome reduction (Bourguignon et al. 2020).

Tiny genomes of insect endosymbionts lack a great number of genes that are essential

for maintaining function of the bacterial cells. Endosymbionts with small genomes are

usually missing many genes responsible for DNA repair, transcription, translation and RNA

and DNA synthesis (Moran and Bennett 2014). This raises a question regarding how these

bacteria function. It has been found that the obligate symbiont of mealybugs Tremblaya

princeps has acquired its own symbiont that could provide cellular information processing

genes that are missing from the genome of the bacterial host (Husnik et al. 2013;

McCutcheon and Von Dohlen 2011), while other suggestions include the expansion of protein functions to accommodate multiple catabolic needs. More importantly, however, there is evidence that the insect host plays a role in providing these essential genes. Many comparative tissue-specific transcriptomic studies have found overexpression of host support genes in the bacteriocyte tissue containing endosymbionts (Hansen and Moran 2011;

Husnik et al. 2013; Luan et al. 2015; Nakabachi et al. 2005; Sloan et al. 2014). Mao et al.

(2018) performed a comparative tissue-specific transcriptomics study on the aster leafhopper, targeting the bacteriocyte tissues of the two endosymbionts, Sulcia muelleri and Nasuia deltocephalinicola. These endosymbionts are co-primary and have two of the smallest genomes found in bacterial endosymbionts. Interestingly, the genome content of each endosymbiont complements what is missing from the other to provide the host with all 10 essential amino acids (Bennett and Moran 2013). The study found tailored support from the host, which overexpresses certain support genes in each bacteriocyte to compensate for the genes missing from the genome of the respective endosymbiont. We now know that insect host

support genes have multiple origins including insect eukaryotic genes (Hansen and Moran

2011; Nakabachi et al. 2005), gene duplication of certain genes in the host genome (Duncan

et al. 2014; Mao et al. 2018; Price et al. 2011; Price et al. 2010), horizontal transfer of genes from infecting bacteria to the host nuclear genome (Husnik et al. 2013; Luan et al. 2015; Mao

et al. 2018; Sloan et al. 2014) and the reassignment of mitochondrial support genes (Mao et

al. 2018). All these findings highlight the intimate association between the genomes of the

host and endosymbiont.

Endosymbionts themselves have been found to influence the genome of their

neighbouring endosymbionts in the same host as demonstrated recently by Monnin et al.

(2020) in aphids and their co-primary endosymbionts. Buchnera aphidicola (hereafter

Buchnera) is usually the only obligate endosymbiont found in aphids. However, some

examples are known where a co-primary or facultative endosymbiont is found to accompany

Buchnera (Monnin et al. 2020). Serratia symbiotica (hereafter Serratia) is one such

endosymbiont found in some species of aphids. Acquisition of Serratia has occurred multiple

times in the phylogeny of aphids: ~125 Ma in the ancestor of the subfamily Lachninae, ~75 Ma in the ancestor of the genus Periphyllus, and more recently (less than 30 Ma) in Aphis urticata and Microlophium carnosum (Monnin et al. 2020). It is apparent that the acquisition

of Serratia has impacted the genome of Buchnera and the physiology of the host. Ancient

associations with Serratia appear to be responsible for gene loss in the genome of Buchnera,

specifically in genes essential for some amino acid synthesis pathways (Monnin et al. 2020).

Hosts of co-obligate Buchnera and Serratia show high levels of anatomical integration with their symbionts in the form of well-developed bacteriocytes (Monnin et al. 2020). This demonstrates the need to investigate how host biology and other endosymbionts can influence the evolution of endosymbiont genomes.

1.4 Blattabacterium cuenoti

Blattabacterium cuenoti (hereafter Blattabacterium) is an obligate, mutualistic, strictly vertically inherited endosymbiont of cockroaches and the termite Mastotermes darwiniensis (Figure 1.1). Although generally considered to be a single species, three species of Blattabacterium have been described in the wood-feeding cockroach genus Cryptocercus (Clark and Kambhampati 2003), based on their presence in different hosts. The convention of naming different Blattabacterium strains as different species based on their hosts has not been followed since these descriptions in 2003. In this work, I therefore refer to the endosymbiont simply as Blattabacterium, encompassing all strains found in cockroaches and the termite. Blattabacterium is found in almost all cockroaches, with the exception of the members of the family Nocticolidae that have been examined to date. Blattabacterium is found in highly specialised cells in the fat body of the host and, in females, around the oocytes in the ovarioles, allowing for transovarial transmission to offspring (Sacchi et al. 1996). Blattabacterium is essential for host fitness and reproduction (Cornwell 1968). It was hypothesised that Blattabacterium plays a role in recycling nitrogen from the uric acid stored in the cockroach fat bodies (Donnellan and Kilby 1967; Gier 1936; Lanham 1968), and inspection of the genome of Blattabacterium has confirmed this role (Sabree et al. 2009). Work on the cockroach Blattella germanica (Patiño-Navarrete et al. 2014) has shown that the cockroach itself is able to convert uric acid to urea, which is then likely to be taken up by Blattabacterium and converted to amino acids, as Blattabacterium genomes are equipped with gene pathways capable of synthesising almost all of the 10 essential amino acids (Huang et al. 2012; López-Sánchez et al. 2009; Sabree et al. 2009).

Figure 1.1: Blattabacterium cuenoti from a fat body cell of the termite Mastotermes darwiniensis. Endosymbiotic Blattabacterium are shown (b), adjacent to mitochondria (m) and the nucleus (n) of the cell. Used with permission from Luciano Sacchi (University of Pavia, Italy).

The association between cockroaches and Blattabacterium was established over 200 million years ago (Evangelista et al. 2019). In chapter 2, I generate the most comprehensive cockroach phylogeny to date, based on host mitochondrial genomes, and a phylogeny of 55 Blattabacterium taxa based on 104 genes covering all major cockroach families (with the exception of Nocticolidae). I perform comparisons of phylogenies between host and symbiont. This chapter will demonstrate the power of Blattabacterium genes in resolving cockroach phylogenetic relationships, shedding light on the evolution of the group. The ancient association that Blattabacterium has with its host makes it a model organism to study the effects of long-term symbioses on the evolution of endosymbionts and their hosts.

My co-authors and I have sequenced a number of Blattabacterium genomes from all cockroach families that harbour them (Bourguignon et al. 2020; see Appendix 4). Genomes of Blattabacterium sequenced so far range in size from 510 to 645 kb (Bourguignon et al. 2020). At the lower end of this range, a number of Blattabacterium genomes are undergoing degradation through gene erosion. The strict vertical transmission of primary symbionts is thought to cause reduced effective population sizes (Ne), which is believed to be responsible for genome shrinkage (Moran et al. 2008). However, this alone does not explain the genome reduction phenomenon, as some free-living prokaryotes with large population sizes have small genomes (Scanlan et al. 2009). One explanation for why some Blattabacterium genomes are shrinking is an elevated mutation rate in these lineages due to a lack of DNA repair machinery (Bourguignon et al. 2020). This hypothesis prompted the work in chapter 3 of this thesis, where I test for correlations between molecular evolutionary rates of cockroach mitochondrial genomes and genomes of Blattabacterium. I find a significant correlation between the two, indicating similar evolutionary forces acting on the two genomes. Then in chapter 4, I test for correlations between molecular evolutionary rates of aphid and Auchenorrhyncha mitochondrial genomes and the genomes of their endosymbionts Buchnera and Sulcia, respectively. I find a significant correlation between aphids and Buchnera, and evidence that these correlations in molecular evolutionary rates are a trait of long-term, obligate, mutualistic, vertically transmitted endosymbiosis.

1.5 Cockroach phylogenetics and evolution

Cockroaches are among the most recognisable insects and they are infamous for their role as pests. Thirty cockroach species are known as pests, including the American cockroach,

Periplaneta americana and the German cockroach, Blattella germanica (Bell et al. 2007;

Cochran 2009). Due to this bad reputation, the majority of research on cockroaches is aimed at eliminating them from residential and agricultural settings. In reality, though, there are around 7500 species of cockroaches in the order Blattodea, living in almost all terrestrial ecosystems (Beccaloni and Eggleton 2013; Bell et al. 2007; morphological diversity shown in Figure 1.2). This indicates an underrepresentation of the rest of Blattodea in the literature.

This is unfortunate as cockroaches are among the most diverse insect groups, and they exhibit

an array of interesting behaviors such as swimming, sound production, mimicry, and

bioluminescence (Bell et al. 2007). They are also ecologically important as they are known to

play a role in recycling dead plants and animal remains (Bell et al. 2007).

One of the most important discoveries of the past two decades was that termites are social cockroaches (Inward et al. 2007; Klass and Meier 2006). Not only did this

discovery add around 3000 species to the Blattodea, but it also allowed for an opportunity to

understand how termites evolved their wood-feeding and social habits. Termites were found to be nested within Blattodea, forming a sister relationship with the cockroaches of the family Cryptocercidae. Species of Cryptocercidae are known for their wood-feeding and biparental care, and the discovery that they represent the sister group to termites helped to illuminate how these two traits evolved in Blattodea. Comparisons of the gut biota of termite and Cryptocercidae species have found similar protozoan species in the two (Kitade 2004). This indicates that the evolution of wood-feeding has likely been facilitated by the

infection of the last common ancestor of cockroaches and termites with these cellulolytic

flagellates, which are known to boost the capability of their hosts to digest lignocellulose.

This symbiosis is believed to have arisen due to proctodeal trophallaxis, which allows the transfer of the symbionts to offspring and to other individuals in the same colony, and it is believed to have started with the evolution of social behaviour in the group (Klass et al. 2008; Lo and Eggleton 2010; Nalepa 2015). The evolution of sociality in the group remains an evolutionary puzzle. There is a need to identify the group most closely related to Cryptocercidae and the termites. This discovery will allow for the investigation of the evolutionary steps leading to the evolution of sociality in Blattodea. Unfortunately, there are discrepancies in the literature regarding this group. In chapter 2 of this thesis, I attempt to resolve the relationships of Cryptocercidae and termites with their closest relatives using cockroach (and termite) mitochondrial genomes and genes from Blattabacterium.

Figure 1.2: Cockroaches of Blattodea: (a) Ellipsidion sp., (b) Macropanesthia rhinoceros, (c) Tryonicus parvus and the termite (d) Mastotermes darwiniensis workers, soldier (with orange head), and nymphs (larger individuals). Photo credits: Photos a, c and d by Yi Kai Tea and photo b by Cameron Richardson.

Chapter 2: Phylogenetics and evolutionary timescale of cockroaches based on mitochondrial genomes and Blattabacterium symbiont genes

2.1 Introduction

Cockroaches comprise ~7500 extant species within 9 families which form the order

Blattodea. Termites (~3000 species) were originally classified in the order Isoptera

(Kristensen 1981), but have since been shown to be nested within the phylogeny of

cockroaches, and have been proposed to form an epifamily (Termitoidae; Inward et al. 2007;

Lo et al. 2007b) within Blattodea. The sister group of Blattodea is the praying mantids (Mantodea), and together these orders form the superorder Dictyoptera. Phylogenetic

relationships among cockroach families have been the focus of a number of studies in recent decades, which have employed both morphological and molecular data (Djernæs et al. 2015;

Djernæs et al. 2012; Edgecombe 2010; Evangelista et al. 2019; Grandcolas 1996;

Kambhampati 1995; Klass and Meier 2006; Legendre et al. 2017; Legendre et al. 2015; Lo et al. 2003; Lo et al. 2007a; Murienne 2009; Svenson and Whiting 2004; Thao et al. 2000a;

Wang et al. 2017; Ware et al. 2008). There seems to be a consensus on the monophyly of most of the major cockroach families: Anaplectidae, Blaberidae, Blattidae, Cryptocercidae, Corydiidae, Lamproblattidae, Nocticolidae and Tryonicidae. Blaberidae is found to be nested within the paraphyletic family Ectobiidae, while termites form the sister group of

Cryptocercidae. Some of the relationships that remain under discussion include: i) the position of , which is sometimes found as the earliest branching lineage among cockroaches, but sometimes groups with the Ectobiidae; ii) the position of Anaplectidae, which is sometimes found to be the sister group of Cryptocercidae + termites (Djernæs et al.

2015; Wang et al. 2017), but sometimes found to be the sister of Tryonicidae +

Cryptocercidae + termites (Djernæs et al. 2015); iii) the sister group of Cryptocercidae + termites, which has in one study been Tryonicidae (Djernæs et al. 2015), in others

Lamproblattidae (Evangelista et al. 2019; Legendre et al. 2015), and others Anaplectidae

(Legendre et al. 2015; Wang et al. 2017). Resolving the sister group to Cryptocercidae + termites is particularly important in order to shed light on the evolution of eusociality in the group, and the origins of their unique oxymonad and parabasalid protozoa (dozens of species of which are found in no other insects; Inoue et al. 2000). For example, the monophyly of

Blattidae + Lamproblattidae + Tryonicidae + Anaplectidae would rule out Tryonicidae and

Anaplectidae as potential model transitional forms in the evolution of social behaviour and the acquisition of flagellate protozoa in the Cryptocercidae + termites clade.

Cockroaches are one of the most ancient examples of winged insects, as corroborated by their fossil record, which dates to the upper Carboniferous. Fossil “roachoid” insects are believed to be the stem group of the superorder Dictyoptera (mantids, cockroaches and termites) and the oldest examples of these cockroach-like insects appear in the fossil record

315–318 Ma (Garwood and Sutton 2010; Zhang et al. 2013). However, the oldest unambiguous fossils of cockroaches from extant families (Grimaldi et al. 2005; Labandeira

1994; Vršanský 1997), termites (Abe et al. 2000; Krishna et al. 2013) and mantids (Vršanský

2002) all date back to ~140 Ma in the Cretaceous, indicating an emergence of the modern lineages of Dictyoptera during the (Grimaldi et al. 2005). Molecular clock analyses that have investigated the origins of these insects have found varying results that suggest mantids, cockroaches and termites descended from a common ancestor that first appeared sometime between 192 and 307 Ma (Djernæs et al. 2015; Misof et al. 2014; Wang et al.

2017; Ware et al. 2010). This large window for the origin of these taxa indicates that further investigation into their evolutionary timescale is warranted.

The diversification of Blattodea is believed to be associated with the acquisition of the obligate intracellular mutualistic endosymbiont Blattabacterium cuenoti (hereafter Blattabacterium) by their last common ancestor (Sabree et al. 2009). These bacterial endosymbionts are found in highly specialised cells (bacteriocytes) in the fat bodies of all cockroaches (except species of the family Nocticolidae; Lo et al. 2007a) and of the earliest-branching termite lineage, which comprises Mastotermes darwiniensis (Blattabacterium appears to have been lost in the ancestor of all other termites; Lo et al. 2003). Analyses of the genome

of Blattabacterium confirmed that it plays an important role in nitrogen recycling and amino acid provisioning in cockroaches (Huang et al. 2012; Sabree et al. 2009; Tokuda et al. 2013).

This allows hosts to thrive on nutrient-poor diets, which most likely opened new environmental niches to the ancestors of cockroaches, facilitating their diversification.

Blattabacterium is vertically transmitted from mother to offspring and the symbiotic relationship is at least 140 Myr old, and possibly much older. This close association means that phylogenies of host and endosymbiont are congruent (Garrick et al. 2017; Lo et al.

2003), making Blattabacterium an ideal model for studying the phylogenetics of the host. Furthermore, despite this ancient association, genomes of Blattabacterium are highly conserved (Patiño-Navarrete et al. 2013) and exhibit slower evolutionary rates than the mitochondrial genomes of their hosts, making them good candidates for resolving relationships at deeper nodes within Blattodea.

The purpose of this study was to: 1) generate the most comprehensive cockroach phylogeny (at the time the study was undertaken) in terms of sampling and number of molecular characters, based on complete mitochondrial genomes from 119 cockroach species, 13 termite species, 7 mantis species and multiple outgroups, including representatives of all 9 cockroach families and 20 out of 27 subfamilies; 2) generate the most comprehensive cockroach phylogeny (at the time the study was undertaken) based on Blattabacterium data, using 104 Blattabacterium genes from strains harboured by 54 cockroaches representing all major cockroach families (except Nocticolidae, in which it is absent) and the only strain found in a termite (Mastotermes darwiniensis); and 3) estimate divergence times of cockroaches, based on cockroach mtDNA and Blattabacterium genes, to investigate further the large variation in divergence time estimates from recent studies (Djernæs et al. 2015; Misof et al. 2014; Tong et al. 2015; Ware et al. 2010).

2.2 Materials and methods

2.2.1 Cockroach sampling, DNA extraction and sequencing

Specimens for which new molecular data were generated included 115 cockroach species

(Table S2.1). Of these, 113 were used for mitogenome analyses, and 48 were used for

Blattabacterium analyses. In some cases, the same sample was used to generate data for both mitogenome and Blattabacterium analyses (46 specimens), while in other cases, samples were used only for mitogenome (67 specimens) or only for Blattabacterium (2 specimens) analyses. Samples were either purchased from Kyle Kandilian (roachcrossing.com) or

collected by Thomas Bourguignon, Nathan Lo, David Rentz, and others. Specimens were

preserved in RNA-Later or 100% ethanol and kept at −80 °C until DNA extraction. I extracted

fat-body DNA from 22 out of the 115 specimens examined during the study using a DNeasy

Blood & Tissue Kit (Qiagen) in preparation for shotgun sequencing of both whole

mitochondrial genomes and Blattabacterium genomic data. DNA from the remaining samples

was extracted either by Thomas Bourguignon or Qian Tang. For 48 of the total of 115

samples, we aimed to obtain Blattabacterium genomic data as well as mitogenome data by

sequencing a total of 2-4 Gb of data, while for the remaining samples, we obtained less

sequence data, with the aim of obtaining only mitochondrial genome data. Libraries were prepared for each sample at BGI (Shenzhen, China) and paired-end sequenced in multiple

lanes of Illumina HiSeq4000.

2.2.2 Cockroach mitochondrial genome assembly

Cockroach mitochondrial genomes were assembled de novo using CLC Genomics Workbench

10 (available from http://www.clcbio.com) by Thomas Bourguignon and Qian Tang. The

assembled contigs were mapped with ≥5 × coverage for each species to a mitogenome present

in GenBank from the nearest available relative. A consensus sequence for each taxon was

produced. The original reads were then mapped against the consensus sequence and mistakes

that occurred during the initial assembly step were corrected, generating a new consensus

sequence. This procedure was repeated until stability was reached, with no inconsistencies

detected. Then, 22 tRNAs, 13 protein-coding genes, and 2 ribosomal RNAs were annotated

using the MITOS Webserver with the invertebrate genetic code and default settings (Bernt et

al. 2013). These annotations were carried out by Thomas Bourguignon and Qian Tang.

2.2.3 Blattabacterium genomic data assembly

I performed de novo assembly of Blattabacterium genomic data using CLC Genomics

Workbench 10 (available from http://www.clcbio.com). Next, the assembled contigs were

mapped with ≥5 × coverage for each species to the most closely related reference

Blattabacterium genome available on GenBank. For most taxa, it was only possible to

assemble genomic fragments rather than full genome sequences. Original reads were mapped

against the consensus sequence of contigs for each taxon and mistakes that occurred during

the initial assembly step were corrected, generating a new consensus sequence. This

procedure was repeated until stability was reached, with no inconsistencies detected.

In addition to 55 Blattabacterium sequences, the genome sequences of outgroups were obtained from GenBank and included three strains of Sulcia muelleri (accession numbers CP002163, AP013293, and CP010828), one Flavobacterium gilvum (CP017479), one Lutibacter sp. (CP017478), one Tenacibaculum dicentrarchi (CP013671), and one

Polaribacter sp. (LT629752). Annotations for all these sequences were performed using

Prokka v1.12 (Seemann 2014). In each case, automatic annotations were corrected by comparison with homologous Blattabacterium genes.

2.2.4 Phylogenetic analyses

I performed phylogenetic analyses on mitochondrial genomes of 119 cockroach species (113 newly sequenced and an additional 6 species from GenBank), combined with sequences from 13 termites and 7 mantises (Table S2.1). For outgroups, sequences for 14 polyneopteran insect species were obtained from GenBank, including 11 stick insects and single representatives of grasshoppers, stoneflies and grylloblattids. The final alignment included 153 sequences overall. Individual gene alignments were performed using the MUSCLE algorithm (Edgar 2004) with default settings, implemented in MEGA 5.2 (Tamura et al. 2011). Protein-coding genes were aligned as codons. All genes were then concatenated into a single dataset.
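The concatenation step can be illustrated with a minimal sketch. This is not the procedure used in MEGA; it only shows the general idea of joining per-gene alignments into one supermatrix while recording where each gene starts and ends. The function name, gene names and toy sequences are hypothetical, and padding absent genes with gaps is an assumption made for the illustration.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    genes: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into a single supermatrix.

    `genes` maps gene name -> {taxon: aligned sequence}. A taxon that is
    absent from a gene is padded with gap characters (an assumption of this
    sketch). Returns the supermatrix and the 1-based, inclusive coordinates
    (gene, start, end) of every gene in the concatenated alignment.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    boundaries: List[Tuple[str, int, int]] = []
    position = 0
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * length))
        boundaries.append((gene, position + 1, position + length))
        position += length
    supermatrix = {taxon: "".join(chunks) for taxon, chunks in parts.items()}
    return supermatrix, boundaries


# Toy example with two hypothetical genes and two hypothetical taxa.
toy_genes = {
    "cox1": {"taxonA": "ATGAAA", "taxonB": "ATGAAG"},
    "nad1": {"taxonA": "ATGCCC"},
}
toy_matrix, toy_boundaries = concatenate_alignments(toy_genes)
```

The recorded boundaries are what a partition definition for downstream software would be built from.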

Substitution saturation was assessed using Xia’s method (Xia and Lemey 2009) as implemented in DAMBE (Xia 2017). Third codon positions (Iss = 0.682) were found to be much more saturated than 1st (Iss = 0.248) and 2nd (Iss = 0.127) codon positions. Although the saturation was not significant, the Iss score for mitogenome 3rd codon positions was close to the critical saturation value, indicating that these data were not suitable for analysing deep divergences in the cockroach phylogeny. Therefore, 3rd codon positions were removed from the dataset. The concatenated alignment was then partitioned into four subsets: (i) first codon positions of protein-coding genes; (ii) second codon positions of protein-coding genes; (iii) 12S and 16S rRNA genes; and (iv) tRNA genes.
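The removal of 3rd codon positions and the split into 1st and 2nd position partitions can be sketched as follows. This is an illustration under the assumption that each protein-coding alignment is in frame and has a length that is a multiple of three; the function names are hypothetical and do not correspond to any tool used in this study.

```python
from typing import Tuple


def split_codon_positions(seq: str) -> Tuple[str, str, str]:
    """Return the 1st, 2nd and 3rd codon positions of an in-frame sequence."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return seq[0::3], seq[1::3], seq[2::3]


def drop_third_positions(seq: str) -> str:
    """Remove every 3rd codon position, keeping 1st and 2nd in codon order."""
    first, second, _third = split_codon_positions(seq)
    return "".join(a + b for a, b in zip(first, second))


# Toy in-frame sequence of three codons: ATG GCC TAA.
toy_first, toy_second, toy_third = split_codon_positions("ATGGCCTAA")
toy_reduced = drop_third_positions("ATGGCCTAA")
```

Applying the split to every sequence in a protein-coding alignment yields the 1st and 2nd position partitions, while the rRNA and tRNA genes are kept whole.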

For the Blattabacterium phylogenetic analyses, in total, 104 protein-coding genes were selected using the orthology inference software OMA v1.1.2 (Altenhoff et al. 2019).

These genes were found in 95% of all taxa. All taxa had over 90% of 104 genes, except for

Aeluropoda insignis which only had 83 (~80%) genes. Missing genes were presumed to be a result of uneven sequencing coverage of samples and the relatively low sequencing coverage used, rather than the actual absence of these genes from their genomes; further work is required to confirm their presence or absence. All genes were concatenated into a 107,187 bp dataset. A translated amino acid alignment for the dataset was also prepared.
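The per-taxon gene occupancy described above is a simple proportion, sketched below. The gene identifiers are placeholders rather than the actual gene names, and the function name is hypothetical; only the counts (83 of 104 genes for Aeluropoda insignis) are taken from the text.

```python
from typing import Dict, Set


def gene_occupancy(
    presence: Dict[str, Set[str]], all_genes: Set[str]
) -> Dict[str, float]:
    """Proportion of the selected genes recovered for each taxon."""
    if not all_genes:
        raise ValueError("all_genes must not be empty")
    return {
        taxon: len(found & all_genes) / len(all_genes)
        for taxon, found in presence.items()
    }


# Placeholder identifiers standing in for the 104 selected genes.
selected_genes = {f"gene_{i:03d}" for i in range(104)}
recovered = {
    "Aeluropoda insignis": {f"gene_{i:03d}" for i in range(83)},
}
occupancy = gene_occupancy(recovered, selected_genes)
```

For this taxon the proportion is 83/104, or roughly 80%, matching the figure given above.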

Protein-coding Blattabacterium genes were aligned at the amino acid level using TranslatorX (Abascal et al. 2010). Using Xia’s method as implemented in DAMBE 6 (Xia and Lemey 2009; Xia et al. 2003), the Blattabacterium dataset was found not to be significantly saturated at 3rd codon positions (NumOTU = 32, Iss = 0.649, Iss.cAsym = 0.819). Nevertheless, 3rd codon positions were not used for phylogenetic analyses because the Iss value was close to the critical value. Phylogenetic analyses were therefore performed using alignments of 1st and 2nd codon sites only. The dataset was then partitioned into 1st codon positions and 2nd codon positions.

Phylogenies of cockroaches and endosymbionts were generated using maximum

likelihood analyses in RAxML v8.2 (Stamatakis 2014), with 1000 bootstrap replicates to

estimate node support. For the cockroach mtDNA dataset, the Bayesian information criterion

in PartitionFinder (Lanfear et al. 2012) was used to select the best-fitting model of nucleotide

substitution, which was GTR+G+I for the 4 partitions mentioned above. Using jModelTest (Darriba et al. 2012), the Blattabacterium dataset was assigned the GTR+G substitution model for each partition based on Bayesian information criterion scores. Using ProtTest v3.4 (Darriba et al. 2011), the translated amino acid dataset for Blattabacterium was assigned the CAT+CpREV model and the translated amino acid dataset for the cockroach mtDNA subset was assigned the CAT+MtART model, based on Bayesian information criterion scores.
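Model choice by the Bayesian information criterion can be illustrated with the standard definition, BIC = k ln(n) − 2 ln(L), where k is the number of free parameters, n the number of sites and L the maximised likelihood; the model with the lowest score is preferred. The sketch below is not the internal procedure of PartitionFinder, jModelTest or ProtTest, and the log-likelihoods and parameter counts in the example are invented purely for illustration.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_parameters: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_parameters * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Return the candidate model with the lowest BIC score.

    `candidates` maps model name -> (log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda m: bic(*candidates[m], n_sites))


# Invented values: the extra parameter of the second model barely improves fit.
toy_candidates = {
    "GTR+G": (-1000.0, 9),
    "GTR+G+I": (-999.5, 10),
}
toy_choice = best_model(toy_candidates, n_sites=1000)
```

In this toy case the small gain in likelihood does not offset the penalty for the additional parameter, so the simpler model is selected.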

Bayesian phylogenetic analyses for the cockroach mtDNA dataset were conducted using MrBayes 3.2 (Ronquist et al. 2012). The substitution model GTR+I+G was assigned to each partition. Posterior distributions were estimated using Markov chain Monte Carlo

(MCMC) sampling with four chains (three hot and one cold). Samples were drawn every 2,000 steps over a total of 5 × 10⁶ MCMC steps. A burn-in of 2 × 10⁶ steps was discarded, based on inspection of the trace files using Tracer v1.7 (Rambaut et al. 2018). Due to the size of the Blattabacterium alignment, ExaBayes v1.5 (Aberer et al. 2014) was used for Bayesian phylogenetic analyses. The substitution model GTR+G was assigned to each partition. Posterior distributions were estimated using MCMC sampling with a single chain. Samples were drawn every 500 steps over a total of 1 × 10⁶ MCMC steps. A burn-in of 1 × 10⁵ steps was discarded, based on inspection of the trace files using Tracer v1.7.

2.2.5 Molecular dating

The evolutionary timescale of cockroaches, based on mitogenomes or Blattabacterium genes, was inferred using BEAST v1.8.4 (Drummond et al. 2012). For the cockroach mtDNA

dataset, the molecular clock was estimated using minimum age constraints based on 15

fossils (Table S2.2). For the cockroach mtDNA subset and Blattabacterium datasets, the

molecular clock was estimated using minimum age constraints based on four fossils (Table

S2.3). Fossils were selected following suggested criteria for justifying fossil calibrations

described by Parham et al. (2012). Soft maximum bounds were determined using phylogenetic bracketing (Ho and Phillips 2009), and for the Blattabacterium datasets, a soft maximum bound of 315 Ma was set for the root node, representing the oldest known cockroach-like fossil (Carpenter 1966). The same soft maximum was also applied to the cockroach mtDNA subset. BEAST analyses were performed on the complete cockroach mtDNA dataset, and on a subset of 26 Blattabacterium protein-coding genes (this number of genes, out of a total of 104, was chosen due to computational difficulties in running larger datasets). The BEAST analysis was repeated on 3 additional subsets of 26 randomly selected Blattabacterium protein-coding genes (from the total of 78 available). Each Blattabacterium subset was unique and the alignment size was between 17,514 and 18,080 bp after removing 3rd codon positions (26,271–27,120 bp with 3rd codon positions) (Table S2.4).
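The gene subsampling and the alignment lengths quoted above can be checked with a short sketch. It assumes, as the numbers suggest (104 − 26 = 78 = 3 × 26), that the four subsets do not overlap; the function names, the seed and the integer gene identifiers are hypothetical and do not reproduce the actual random draw.

```python
import random
from typing import List, Optional, Sequence


def disjoint_random_subsets(
    items: Sequence[int], subset_size: int, seed: Optional[int] = None
) -> List[List[int]]:
    """Shuffle the items and cut them into non-overlapping subsets."""
    if len(items) % subset_size != 0:
        raise ValueError("number of items must be a multiple of subset_size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[i:i + subset_size] for i in range(0, len(shuffled), subset_size)
    ]


def length_without_third_positions(n_sites: int) -> int:
    """Alignment length left after removing every 3rd codon position."""
    if n_sites % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    return n_sites * 2 // 3


# 104 genes, identified here only by index, split into four subsets of 26.
gene_subsets = disjoint_random_subsets(range(104), subset_size=26, seed=1)
shortest = length_without_third_positions(26271)
longest = length_without_third_positions(27120)
```

Removing one site in three from 26,271 and 27,120 bp leaves 17,514 and 18,080 bp, which agrees with the range reported above.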

For each analysis, posterior distributions of parameters, including the tree, were estimated using MCMC sampling. A single MCMC run was performed, with the tree and parameter values sampled every 10,000 steps over a total of 10⁸ steps. A burn-in of 10⁷ steps was discarded. A maximum-clade-credibility tree was obtained using TreeAnnotator in the BEAST software package. Tracer v1.7 (Rambaut et al. 2018) was used to check for acceptable effective sample sizes and convergence to the stationary distribution.
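As a worked check on the sampling settings reported in sections 2.2.4 and 2.2.5, the number of posterior samples retained after burn-in follows from the chain length, the sampling interval and the burn-in. The sketch below is illustrative only; the function name is hypothetical, and the count ignores the initial state of the chain, which some programs also record.

```python
def retained_samples(total_steps: int, sample_every: int, burnin_steps: int) -> int:
    """Number of samples drawn after the burn-in has been discarded."""
    if burnin_steps > total_steps:
        raise ValueError("burn-in cannot exceed the chain length")
    return (total_steps - burnin_steps) // sample_every


# Settings as stated in the text for each program.
mrbayes_samples = retained_samples(5 * 10**6, 2_000, 2 * 10**6)
exabayes_samples = retained_samples(1 * 10**6, 500, 1 * 10**5)
beast_samples = retained_samples(10**8, 10_000, 10**7)
```

Under these settings, 1,500 samples are retained for the MrBayes analysis, 1,800 for ExaBayes and 9,000 for each BEAST run.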

2.3 Results

2.3.1 Tree topologies

Based on the complete cockroach mtDNA dataset, the tree topologies estimated using maximum likelihood (RAxML) and Bayesian methods (MrBayes and BEAST) were similar with regard to Dictyoptera (Figures 2.1–2.3). All analyses showed strong support for the monophyly of Dictyoptera and cockroaches (including termites). Regarding cockroach interfamily relationships, Blattodea consistently formed 2 monophyletic groups; one included Nocticolidae, Corydiidae, Ectobiidae and Blaberidae, while the other included Anaplectidae, Lamproblattidae, Cryptocercidae, termites, Tryonicidae and Blattidae.

Figure 2.1: Phylogenetic tree of cockroaches inferred from complete mitochondrial genomes using maximum likelihood in RAxML with a GTR+G model. Third codon positions of protein-coding genes were excluded from the analysis. Node labels are bootstrap support values.

Figure 2.2: Phylogenetic tree of cockroaches inferred from complete mitochondrial genomes using Bayesian inference in MrBayes with a GTR+G model. Third codon positions of protein-coding genes were excluded from the analysis. Node labels are posterior probabilities.

Figure 2.3: Bayesian time-tree of cockroaches inferred from complete mitochondrial genomes, with third codon positions excluded. The time-tree was calibrated with 15 fossils, including Mylacris and Juramantis. Numbers are labels for calibrated nodes: 1. Mylacris anthracophila, 2. Juramantis initialis, 3. Valditermes brenanae, 4. Cratokalotermes santanensis, 5. Reticulitermes antiquus, 6. Coptotermes sucineus, 7. Nanotermes, 8. Balatronis libanensis, 9. Ergaula stonebut, 10. houlberti, 11. Gyna obesa, 12. , 13. Pycnoscelus gardneri, 14. Ischnoptera gedanensis, 15. Epilampra. The scale bar is given in millions of years. Grey bars at internal nodes represent the 95% credibility intervals of age estimates. Branches are labelled with symbols representing the minimal support in three analyses: posterior probabilities inferred with BEAST, posterior probabilities inferred with MrBayes under a GTR+G substitution model, and bootstrap support inferred with RAxML.

The monophyly of all cockroach families was obtained with strong support in each of the three analyses except for Ectobiidae (Figures 2.1–2.3). Blaberidae was consistently placed

within the paraphyletic Ectobiidae clade in all analyses. Nocticolidae formed a monophyletic group that was the sister group of Corydiidae, and together they were the sister group of the clade containing Blaberidae and Ectobiidae, although neither of these groupings had high support values (Figures 2.1–2.3). The monophyly of Anaplectidae + Lamproblattidae,

Blattidae + Tryonicidae and Cryptocercidae + termites was consistently recovered with high

support values in all analyses (Figures 2.1 – 2.3; note that in the case of Anaplectidae and

Lamproblattidae, only two taxa from each family were included). The sister clade to

Cryptocercidae + termites was Tryonicidae + Blattidae in both RAxML and BEAST analyses

with low support (Figures 2.1 & 2.3). However, in the MrBayes analysis, Anaplectidae +

Lamproblattidae was the sister group to Cryptocercidae + termites with low support (Figure

2.2).

Based on analyses featuring increased amounts of data (based on Blattabacterium genes), but

fewer taxa, Blattodea consistently formed 2 monophyletic groups; one included Ectobiidae

and Blaberidae, while the other included Corydiidae, Anaplectidae + Lamproblattidae,

Cryptocercidae + termites, Tryonicidae + Blattidae (Figures 2.4–2.7). The monophyly of each

cockroach family was obtained with strong support in all analyses except for Ectobiidae

(Figures 2.4–2.7). In all analyses, Ectobiidae was paraphyletic with respect to Blaberidae, which was nested among two main ectobiid lineages. The monophyly of Anaplectidae + Lamproblattidae, Blattidae + Tryonicidae and Cryptocercidae + termites was consistently recovered with high support values in all analyses (Figures 2.4–2.7). The sister clade to

Cryptocercidae + termites was Tryonicidae + Blattidae in both RAxML and BEAST


Figure 2.4: Phylogenetic tree of Blattabacterium inferred using maximum likelihood from 104 protein-coding genes (3rd codon sites excluded from both data sets). Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Branches are coloured according to their membership of different cockroach families.


Figure 2.5: Phylogenetic tree of Blattabacterium inferred using maximum likelihood, based on amino acid sequences translated from protein-coding genes. Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Branches are coloured according to their membership of different cockroach families.


Figure 2.6: Timescale of Blattabacterium evolution. The tree was inferred using Bayesian analysis in BEAST, on the basis of 26 protein-coding genes with 3rd codon sites excluded. Node bars indicate 95% credibility intervals of node ages. Colours of branches and species names represent different cockroach families, as shown at the base of the figure.


Figure 2.7: Phylogenetic tree of Blattabacterium inferred using Bayesian methods in ExaBayes from 104 protein-coding genes (3rd codon sites excluded from both data sets). Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Branches are coloured according to their membership of different cockroach families.


analyses with low support (Figures 2.4 & 2.6). However, Anaplectidae + Lamproblattidae was the sister group with high support according to the ExaBayes analysis (Figure 2.7) and with low support based on the translated Blattabacterium amino acid dataset (Figure 2.5).

2.3.2 Molecular dating

Based on the molecular dating analysis performed on the cockroach mtDNA dataset, lineages leading to Dictyoptera and its sister clade (containing stick insects and grylloblattids) diverged around 320 Ma (95% credibility interval 315.0–333.6 Ma; Figure 2.3). The lineages leading to mantids and cockroaches + termites subsequently diverged 263.4 Ma (95% CI 236.3–291.5 Ma; Figure 2.3). The last common ancestor of cockroaches + termites was estimated to have appeared 235.2 Ma (95% CI 209.5–263.2 Ma; Figure 2.3), giving rise to a clade containing Anaplectidae, Lamproblattidae, Cryptocercidae, termites, Tryonicidae, and

Blattidae, and a clade containing Nocticolidae, Corydiidae, Ectobiidae and Blaberidae.

The divergence time between Anaplectidae and Lamproblattidae was estimated at around 179 Ma (95% credibility interval 147.1–209.8 Ma; Figure 2.3), while the divergence between Blattidae + Tryonicidae and Cryptocercidae + termites was estimated at 205.1 Ma (95% credibility interval 183.1–227.7 Ma; Figure 2.3). Cryptocercidae and termites were estimated to have diverged 182 Ma (95% CI 160.6–204.1 Ma; Figure 2.3), while Blattidae and Tryonicidae were estimated to have diverged ~144 Ma (95% credibility interval 125–172.7 Ma; Figure 2.3). The paraphyletic Ectobiidae was represented by three main lineages, the earliest of which was estimated to have appeared 170 Ma (95% credibility interval 149.3–190.1 Ma; Figure 2.3). The lineage leading to Blaberidae diverged from its ectobiid sister group ~160 Ma. Blaberidae was estimated to have appeared ~129 Ma (95% credibility interval 114.5–145 Ma; Figure 2.3).


Corydiidae and Nocticolidae were estimated to have diverged 213.6 Ma (95% credibility interval 189.4–238 Ma; Figure 2.3), and the appearance of Corydiidae was estimated at ~169 Ma (95% credibility interval 136–202.5 Ma; Figure 2.3).

Based on the molecular dating analysis performed on the Blattabacterium dataset, the last common ancestor of cockroaches + termites appeared 222 Ma (95% credibility interval 175–284 Ma; Figure 2.6), giving rise to a clade containing strains infecting Corydiidae, termites, Cryptocercidae, Blattidae, Anaplectidae, Tryonicidae, and Lamproblattidae, and a clade containing strains infecting Ectobiidae and Blaberidae. Similar estimates were obtained using the three additional subsets, with the earliest divergence occurring between 225 and 235 Ma (95% CI 177–296 Ma, Figure S1; 95% CI 177–299 Ma, Figure S2; and 95% CI 186–295 Ma, Figure S3).

The divergence time between Anaplectidae and Lamproblattidae was estimated at 133 Ma (95% credibility interval 94.7–170.4 Ma; Figure 2.6), while the divergence between Blattidae + Tryonicidae and Cryptocercidae + termites was estimated at 166.3 Ma (95% credibility interval 143.2–197 Ma; Figure 2.6). Cryptocercidae and termites were estimated to have diverged 136.6 Ma (95% CI 130–156.1 Ma; Figure 2.6), while Blattidae and Tryonicidae were estimated to have diverged ~135 Ma (95% credibility interval 125–158.8 Ma; Figure 2.6). The paraphyletic Ectobiidae was represented by two major lineages: the first was estimated to have appeared 186.6 Ma (95% credibility interval 135.3–249.7 Ma; Figure 2.6), while the sister lineage of Blaberidae appeared 119.1 Ma (95% credibility interval 83.1–170.5 Ma; Figure 2.6). Blaberidae was estimated to have appeared ~80.7 Ma (95% credibility interval 54–107.1 Ma; Figure 2.6). Similar results were obtained with all other subsets (Figures S1–S3).
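As a rough aid to comparing the two sets of date estimates, the 95% credibility intervals reported above for the mtDNA and Blattabacterium datasets can be checked for overlap. The sketch below is illustrative only: the node labels and the `overlap` helper are introduced here for convenience, and overlapping intervals are an informal indication of agreement, not a formal statistical test.

```python
# Each entry: (node label, mtDNA 95% interval, Blattabacterium 95% interval).
# Ages are in Ma and are copied from the text above; the labels are shorthand.
intervals = [
    ("Cockroaches + termites (last common ancestor)", (209.5, 263.2), (175.0, 284.0)),
    ("Anaplectidae vs Lamproblattidae", (147.1, 209.8), (94.7, 170.4)),
    ("Blattidae + Tryonicidae vs Cryptocercidae + termites", (183.1, 227.7), (143.2, 197.0)),
    ("Cryptocercidae vs termites", (160.6, 204.1), (130.0, 156.1)),
    ("Blattidae vs Tryonicidae", (125.0, 172.7), (125.0, 158.8)),
    ("Blaberidae", (114.5, 145.0), (54.0, 107.1)),
]


def overlap(a, b):
    """Return the range shared by two closed intervals, or None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


# Map each node label to the shared range of its two intervals (or None).
shared = {name: overlap(mt, symbiont) for name, mt, symbiont in intervals}

for name, rng in shared.items():
    status = f"overlap {rng[0]}–{rng[1]} Ma" if rng else "no overlap"
    print(f"{name}: {status}")
```

With the values reported here, the intervals for the Cryptocercidae/termite split and for the appearance of Blaberidae do not overlap between the two datasets, whereas those for the other listed nodes do.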


2.4 Discussion

2.4.1 Cockroach phylogenetics

Mitochondrial genomes have been shown to be suitable markers for resolving phylogenetic relationships among families within various insect orders (Bourguignon et al. 2014;

Bourguignon et al. 2017; Bourguignon et al. 2016; Cameron 2014; Cameron et al. 2012).

Results obtained in this study further demonstrate the value of mitochondrial genomes for

resolving ancient divergences among insects, specifically among lineages leading to extant

families of cockroaches. Furthermore, this study included a cockroach phylogeny based on a

relatively large dataset of 104 genes of the obligate endosymbiont Blattabacterium. Results

obtained from analysing Blattabacterium genes provide an alternative view of phylogenetic

relationships among cockroach lineages.

The monophyly of Dictyoptera is well supported in all cockroach mtDNA analyses in

this study, providing further evidence for the sister relationship of mantises and cockroaches, as found in a number of previous studies (Djernaes et al. 2012; Inward et al. 2007; Legendre et al. 2015; Svenson and Whiting 2004; Thao et al. 2000a). The monophyly of all cockroach families was confirmed, with the exception of Ectobiidae, which was consistently paraphyletic with respect to Blaberidae. Analyses based on Blattabacterium genes provided the same results regarding cockroach families (excluding Nocticolidae and Mantodea, which do not harbour Blattabacterium). The sister-group relationship between Lamproblattidae and Anaplectidae was confirmed in all analyses. This result is novel to our analyses: past phylogenies either did not include a representative of one of the families (Evangelista et al. 2019; Legendre et al. 2015) or found the two to be paraphyletic (Djernæs et al. 2015; Wang et al. 2017). The sister-group relationship between Tryonicidae and Blattidae was also confirmed in all analyses, mirroring the findings of Evangelista et al. (2019) but differing from those of Djernæs et al. (2015), Legendre et al. (2015) and Wang et al. (2017).

Determining the sister group of Cryptocercidae + termites is essential for

understanding the evolution of sociality in the group and the key acquisition of parabasalid

and oxymonad flagellates by their ancestor. Previous cockroach phylogenetic studies

(Djernæs et al. 2015; Wang et al. 2017) grouped either Tryonicidae, Anaplectidae, or a combination of these two taxa with Cryptocercus + termites, although without strong support.

Results in this chapter obtained from RAxML and BEAST analyses based on nucleotide datasets support the sister relationship between Blattidae + Tryonicidae and Cryptocercidae

+ termites. However, MrBayes and ExaBayes analyses using nucleotide datasets and the

RAxML analysis using the translated amino acid Blattabacterium alignment found

Anaplectidae + Lamproblattidae to be the sister of Cryptocercidae + termites. These results

appear to rule out tryonicids and anaplectids as potential model transitional forms in the evolution of social behaviour and the acquisition of flagellate protozoa in the

Cryptocercidae + termites clade. Recently, a study based on a large dataset of cockroach nuclear genes found Lamproblattidae to be the sister group to Cryptocercidae + termites

(Evangelista et al. 2019). It is important to note, however, that the phylogeny did not include any representatives of Anaplectidae. Owing to the high level of discrepancy in the literature concerning the sister group of Cryptocercidae + termites, including the results of this chapter, this relationship should be considered unresolved. Resolving it will require increased taxon sampling of Anaplectidae and Lamproblattidae in future phylogenetic studies based on either Blattabacterium or cockroach nuclear genes.

A single species of Nocticolidae was examined in mt genome analyses in this study.

In all analyses, Nocticolidae was found to be the sister group to Corydiidae with very low support. This is in agreement with the most recent phylogeny based on cockroach nuclear


genes (Evangelista et al. 2019). The position of Corydiidae differed between analyses using cockroach mtDNA and Blattabacterium genes. Corydiidae + Nocticolidae was found to be sister to the group containing Ectobiidae and Blaberidae based on cockroach mtDNA trees, as found in previous studies that used a few nuclear and mitochondrial loci (Djernaes et al.

2012; Inward et al. 2007). In all Blattabacterium trees however, Corydiidae was found to be

sister to the monophyletic group containing Lamproblattidae + Anaplectidae, Blattidae +

Tryonicidae and Cryptocercidae + termites. The latter position of Corydiidae is congruent

with the most recent phylogeny based on nuclear cockroach genes (Evangelista et al. 2019).

This finding highlights the value of Blattabacterium genes in resolving deep phylogenetic

relationships of their hosts.

2.4.2 Timescale of cockroach evolution

Estimates of the age of Dictyoptera from the TimeTree database (Hedges et al. 2006) vary greatly, ranging between 137 and 307 Ma. A large insect nuclear genomic dataset using 37 fossil calibrations (none of which was included in this study) estimated that the last common ancestor of Dictyoptera appeared 197 Ma (95% CI 159–243 Ma) (Misof et al. 2014).

Reanalysis of this dataset to include the roachoid fossil Mylacris produced an older date for the origin of Dictyoptera, placing the last common ancestor of the group at 236 Ma (95% CI

215–273 Ma) (Tong et al. 2015). The estimate produced in this chapter agrees with age estimates of the latter (236.3–291.5 Ma), despite the fact that a different dataset and different fossil calibrations were used, except for Mylacris which was used in both analyses.

Molecular clock analyses of cockroach mtDNA and Blattabacterium genes in this chapter improved our understanding of the origin of Blattodea. All analyses estimate that the last common ancestor of cockroaches and termites appeared 222–235 Ma. These estimates are significantly earlier than the first undisputed fossils of modern cockroaches, which date back ~140 Ma to the Cretaceous (Lo et al. 2003; Nalepa and Bandi 2000; Vršanský 1997), and

markedly older than molecular clock estimates obtained by Evangelista et al. (2019), which

found the crown of Blattodea to date back to 202 Ma. It is a common occurrence to find

cockroach-like fossils in deposits representing all epochs from the late Carboniferous to the

late Jurassic. These fossils show a general trend of reduction in size of the ovipositor over

time, until the Cretaceous, where cockroach-like fossils lack ovipositors and resemble extant cockroaches. The absence of fossils resembling extant cockroach families at their inferred origin of ~235 Ma could be explained by proposing that these ancestors are actually represented by fossil taxa with ovipositors. This in turn would suggest that ovipositors were lost independently in multiple lineages leading to modern cockroaches by the Cretaceous. Alternatively, modern cockroaches might not have been common in the

Jurassic, explaining the lack of representation in the fossil record.


Chapter 3: Evolutionary rates are correlated between cockroach symbiont and mitochondrial genomes

3.1 Introduction

Rates of molecular evolution are governed by a multitude of factors and vary significantly among species (Bromham and Penny 2003; Ho and Lo 2013). In the case of symbiotic organisms, such rates may be influenced by the biology of their symbiotic partner, in addition to their own. This is particularly the case for strictly vertically transmitted, obligate intracellular symbionts (hereafter ‘symbionts’), which have a highly intimate relationship with their hosts (Douglas 2010). For example, a small host effective population size will potentially lead to increased fixation of slightly deleterious mutations within both host and symbiont genomes, owing to the reduced efficacy of selection.

When the phylogenies of host and symbiont taxa are compared, simultaneous changes in evolutionary rate between host-symbiont pairs might be evident in their branch lengths.

Some studies have found a correlation in evolutionary rates between nuclear and mitochondrial genes in sharks (Martin 1999), herons (Sheldon et al. 2000), and turtles

(Lourenço et al. 2013), between plastid and mitochondrial genes in angiosperms, and between nuclear, plastid and mitochondrial genes in green algae (Hua et al. 2012; Sloan et al.

2012; Smith and Lee 2010), suggesting that host biology affects substitution rates in nuclear and cytoplasmic genomes in similar ways. In insects, nuclear genes that interact directly with mitochondrial proteins have shown rate correlations with mitochondrial genes (Yan et al.

2019).

Potential correlations in evolutionary rates between hosts and bacterial symbionts remain untested. Evidence for correlated levels of synonymous substitutions was found in a study of one nuclear gene and two mitochondrial genes from Camponotus ants and three


genes from their Blochmannia symbionts (Degnan et al. 2004). However, the study did not

determine whether this correlation was driven by rates of evolution, time since divergence, or

both. Numbers of substitutions tend to be low for closely related pairs of hosts and their

corresponding symbionts, and high for more divergent pairs, leading to a correlation with

time that does not necessarily reflect correlation in evolutionary rates.
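This confounding effect of divergence time can be illustrated with a small simulation. The sketch below is not part of any analysis in this thesis: the uniform distributions for divergence times and rates are arbitrary assumptions, and host and symbiont rates are drawn independently, so any correlation between the resulting distances arises from shared divergence time alone.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_pairs=2000, seed=1):
    """Simulate distances for host-symbiont pairs with independent rates.

    Returns (correlation between rates, correlation between distances).
    All distributions are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each host-symbiont pair shares one divergence time.
    times = [rng.uniform(1.0, 100.0) for _ in range(n_pairs)]
    # Rates are drawn independently for host and symbiont.
    host_rates = [rng.uniform(0.5, 1.5) for _ in range(n_pairs)]
    symb_rates = [rng.uniform(0.5, 1.5) for _ in range(n_pairs)]
    # Expected distance = rate x time.
    host_dist = [r * t for r, t in zip(host_rates, times)]
    symb_dist = [r * t for r, t in zip(symb_rates, times)]
    return pearson(host_rates, symb_rates), pearson(host_dist, symb_dist)


rate_r, dist_r = simulate()
print(f"rates: r = {rate_r:.2f}; distances: r = {dist_r:.2f}")
```

Under these assumptions the distances are strongly correlated even though the rates are not, which is why comparisons need to control for divergence time before a rate correlation can be inferred.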

Blattabacterium cuenoti (hereafter Blattabacterium) is a bacterial symbiont that has been in an obligatory intracellular and mutualistic relationship with cockroaches for over 200 million years (Bourguignon et al. 2018; Evangelista et al. 2019).

Found in highly specialised cells in the fat bodies of cockroaches, Blattabacterium is required for host fitness and fertility, and is transovarially transmitted from the mother to the progeny

(Cornwell 1968; Sacchi et al. 1996). Genome-wide analyses of the symbiont have confirmed its role in host nitrogen metabolism and the synthesis of essential amino acids (López-

Sánchez et al. 2009; Sabree et al. 2012). The genomes of 21 Blattabacterium strains sequenced to date are highly reduced compared with those of their free-living relatives, ranging in size from 590 to 645 kb (Kinjo et al. 2018; Vicente et al. 2018). They contain genes encoding enzymes for DNA replication and repair, with some exceptions (holA, holB, and mutH) (Kambhampati et al. 2013; López-Sánchez et al. 2009; Vicente et al. 2018). The extent to which host nuclear proteins are involved in the cell biology of Blattabacterium, and particularly DNA replication, is not well understood.

We recently performed a study of cockroach evolution and biogeography using mitochondrial genomes (Bourguignon et al. 2018). During this process, we obtained partial genomic information for several Blattabacterium strains. These data provide the opportunity to test for correlation of molecular evolutionary rates between Blattabacterium and host cockroach mitochondrial DNA. Here we infer phylogenetic trees for 55 Blattabacterium strains on the basis of 104 genes and compare branch lengths and rates of evolution for host-symbiont pairs across the phylogeny. We find evidence of markedly increased rates of evolution in some Blattabacterium lineages, which appear to be matched by increased rates of evolution in mitochondrial DNA of host lineages.

3.2 Materials and methods

A list of samples and collection data for each cockroach examined is provided in Table S3.1

(see electronic supplementary material, ESM). For the majority of taxa examined in this study, we obtained Blattabacterium sequence data from genomic libraries originally used in a previous study of cockroach mitochondrial genomes carried out by our laboratories

(Bourguignon et al. 2018). In some cases, new genomic data were obtained from fat bodies of individual cockroaches (see ESM for further details). We obtained 104 genes of 55

Blattabacterium strains from these data and aligned them with orthologues from seven taxa from (details provided in ESM).

Genomic data were assembled and annotated, and then aligned and tested for saturation. After the exclusion of 3rd codon sites in each data set, total lengths for the mitochondrial and Blattabacterium alignments were 11,051 bp and 71,458 bp, respectively. The former was partitioned into four subsets (1st codon sites, 2nd codon sites, rRNAs, and tRNAs),

and the latter into two subsets according to codon positions. Trees were inferred using

maximum likelihood in RAxML v8.2 (Stamatakis 2014), using 1000 bootstrap replicates to

estimate node support. We examined congruence between host and symbiont phylogenies

using the distance-based ParaFit (Legendre et al. 2002) in R 3.5.1 (R Core Team 2018).
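The exclusion of 3rd codon sites and the partitioning by codon position described above can be sketched as follows. This is a minimal illustration and not the pipeline used in this chapter (see the ESM for that); the function names and the toy alignment are hypothetical, and the sequences are assumed to be in frame.

```python
def drop_third_codon_sites(seq):
    """Remove every 3rd site from an in-frame coding sequence."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def split_codon_positions(seq):
    """Return the 1st and 2nd codon-position subsets of a coding sequence."""
    return seq[0::3], seq[1::3]


# Hypothetical toy alignment: two in-frame sequences of three codons each.
alignment = {"taxon_A": "ATGGCCAAA", "taxon_B": "ATGGCTAAG"}

# Alignment with 3rd codon sites excluded.
reduced = {name: drop_third_codon_sites(seq) for name, seq in alignment.items()}
# Separate 1st-position and 2nd-position subsets for partitioned analysis.
partitions = {name: split_codon_positions(seq) for name, seq in alignment.items()}
```

In this toy example the two sequences differ only at 3rd codon sites, so they become identical once those sites are removed.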

Root-to-tip distances from the RAxML analyses for each host and symbiont pair were

subjected to Pearson correlation analysis. Branch-length differences between hosts and symbionts were compared for 27 phylogenetically independent species pairs across the


topology (see Figure S3.1 in appendix 2). These were calculated using a fixed topology

(derived from the Blattabacterium analysis described above) for each of the following three data sets: 1) 1st+2nd codon sites of protein-coding genes; 2) translated amino acid sequences;

3) 1st+2nd codon positions of protein-coding genes plus the inclusion of rRNAs+tRNAs in the case of the mitochondrial data set. Further details on phylogenetic methods are provided in the ESM.
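The root-to-tip comparison described above can be sketched in a few lines. This is an illustration only: the tuple-based tree representation, the two three-taxon trees, and their branch lengths are hypothetical and are not taken from the RAxML output.

```python
import math


def root_to_tip(node, dist=0.0, out=None):
    """Sum branch lengths from the root to every tip.

    A node is a tuple (name, branch_length, children); tips have no children.
    """
    if out is None:
        out = {}
    name, length, children = node
    dist += length
    if not children:
        out[name] = dist
    for child in children:
        root_to_tip(child, dist, out)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical host and symbiont trees with the same topology: (A, (B, C)).
host = ("root", 0.0, [("A", 0.30, []),
                      ("n1", 0.10, [("B", 0.15, []), ("C", 0.40, [])])])
symb = ("root", 0.0, [("A", 0.12, []),
                      ("n1", 0.05, [("B", 0.06, []), ("C", 0.20, [])])])

host_d = root_to_tip(host)
symb_d = root_to_tip(symb)

# Pair each host tip with its symbiont by name before correlating.
tips = sorted(host_d)
r = pearson([host_d[t] for t in tips], [symb_d[t] for t in tips])
```

Note that tips B and C share the internal branch `n1` in both trees, so their root-to-tip distances are not independent observations.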

3.3 Results

In all analyses, there was strong support for the monophyly of each cockroach family with the exception of Ectobiidae (Figure 3.1). The topologies inferred from the host and symbiont data sets were congruent (p = 0.001). In only two cases was a disagreement found to be supported by >85% bootstrap support in both trees (the sister group of Lamproblattidae + Anaplectidae; the sister group to Carbrunneria paraxami + Beybienkoa kurandensis).

We found a correlation between root-to-tip distances for protein-coding genes from hosts and their symbionts (R = 0.75, Figure 3.2a). Similar results were found when

rRNAs+tRNAs were included in the host data set (R = 0.73, Figure S3.4a). The highest rates

of evolution in the host and symbiont data sets (on the basis of branch lengths; see Figure 3.1)

were in members of an ectobiid clade containing Allacta sp., Amazonina sp., Balta sp.,

Chorisoserrata sp., and Euphyllodromia sp., and a separate clade containing the

Anaplectidae. After excluding these taxa, evolutionary rates remained correlated, although to

a lesser degree (R = 0.35, Figure 3.2b). The sharing of branches between taxa in the

estimation of root-to-tip distances renders the data in these plots phylogenetically non-

independent and precludes statistical analysis.


A comparison of branch lengths among phylogenetically independent pairs of host and symbiont taxa based on protein-coding genes revealed a significant correlation between their rates of evolution (R = 0.40, p = 0.039; Figure 3.2c, Figure S3.1). Equivalent analyses of branch lengths inferred from amino acid data also revealed a significant rate correlation between host and symbiont (R = 0.43, p = 0.023; Figure S3.3a). However, there was no rate correlation between host and symbiont following inclusion of rRNA+tRNA in the host mitochondrial data set (R = 0.27, p = 0.17; Figure 3.2d). Analyses involving standardisation of branch-length differences yielded significant rate correlations for the protein-coding gene and amino acid data sets (R = 0.43–0.48, p = 0.011–0.023; see Figures S3.2, S3.3), and mixed results in the case of the inclusion of rRNA+tRNA partitions in the host mitochondrial data set (R = 0.34–0.40, p = 0.041–0.085; see Figure S3.4).
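The logic of comparing phylogenetically independent pairs can be sketched as below. Several details are assumptions made for illustration: each contrast is taken as the difference in log-transformed branch lengths within a pair, significance is assessed with a permutation test as a stand-in for whichever test was actually used, and all taxon names and branch lengths are hypothetical. The ESM describes the procedure applied in this chapter.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pair_contrasts(branch_lengths, pairs):
    """Difference in log branch length within each pair (assumed contrast)."""
    return [math.log(branch_lengths[a]) - math.log(branch_lengths[b])
            for a, b in pairs]


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical terminal branch lengths for five independent pairs.
pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("d1", "d2"), ("e1", "e2")]
host_bl = {"a1": 0.10, "a2": 0.20, "b1": 0.30, "b2": 0.15, "c1": 0.05,
           "c2": 0.06, "d1": 0.40, "d2": 0.10, "e1": 0.12, "e2": 0.25}
symb_bl = {"a1": 0.02, "a2": 0.05, "b1": 0.08, "b2": 0.03, "c1": 0.01,
           "c2": 0.01, "d1": 0.09, "d2": 0.04, "e1": 0.03, "e2": 0.05}

host_c = pair_contrasts(host_bl, pairs)
symb_c = pair_contrasts(symb_bl, pairs)
r = pearson(host_c, symb_c)
p = permutation_p(host_c, symb_c)
```

Because the two members of each pair share the same divergence time, a contrast within a pair reflects a difference in rate and not in time, and the pairs do not share branches with one another.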

3.4 Discussion

We have detected a correlation in molecular evolutionary rates between Blattabacterium and host mitochondrial genomes, using two different methods of analysis. To our knowledge, this is the first evidence of such a correlation in a host-symbiont relationship. Previous studies found a correlation in evolutionary rates between mitochondrial and nuclear genes in various animal groups (Lourenço et al. 2013; Martin 1999; Sheldon et al. 2000; Yan et al. 2019).

Similar forces acting on the underlying mutation rates of both host and symbiont genomes could translate into a relationship between their substitution rates. This could potentially occur if symbiont DNA replication depends on the host’s DNA replication and repair machinery (Moran and Bennett 2014). In leafhoppers, a number of nuclear-encoded proteins targeted to mitochondria have been retargeted to support nutritional symbionts (Mao et al. 2018). A study of insect genomic data found a correlation in rates between


Figure 3.1: Congruence between (a) phylogenetic tree of host cockroaches inferred using maximum likelihood from whole mitochondrial genomes, and (b) phylogenetic tree of Blattabacterium inferred using maximum likelihood from 104 protein-coding genes (3rd codon sites excluded from both data sets). Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Red outlines on circles indicate disagreement between the phylogenies. Branches are coloured according to their membership of different cockroach families.


Figure 3.2: Comparison of evolutionary rates of Blattabacterium symbionts and their host cockroaches. (a) Correlation of root-to-tip distances in phylogenies of Blattabacterium and cockroaches, inferred using maximum-likelihood analysis of protein-coding genes from each data set, with 3rd codon sites excluded. (b) Correlation of root-to-tip distances following the removal of five rapidly evolving ectobiid taxa (Amazonina sp., Chorisoserrata sp., Allacta sp., Balta sp., and Euphyllodromia sp.) and two anaplectids. Colours represent data from representatives of different cockroach families, as shown in the colour key. (c) Correlation of log-transformed branch-length differences between phylogenetically independent pairs of host and symbiont taxa, based on protein-coding genes only, and (d) with the addition of rRNAs and tRNAs to the host mitochondrial data set.


mitochondrial genes and nuclear genes that encode proteins targeted to mitochondria (Yan et al. 2019). The level of integration of host-encoded proteins in the metabolism of

Blattabacterium, and interactions between Blattabacterium and mitochondria, are not well understood. Further exploration of these interactions may shed light on the causes of the correlation in rates that we have found here.

Short host generation times could potentially lead to elevated evolutionary rates in host and symbiont (Bromham 2009), assuming that increased rates of symbiont replication are associated with host reproduction, as is found in Blochmannia symbionts of ants (Wolschin et al. 2004). Variations in metabolic rate and effective population size between host taxa could also explain the rate correlations that we have observed. Unfortunately, with the exception of a few pest and other species, generation time, metabolic rates, and effective population sizes are poorly understood in cockroaches. This precludes an examination of their influence on evolutionary rates in host and symbiont.

The addition of mitochondrial rRNA+tRNA data weakened the correlations found in the branch-length comparisons of species pairs. The reasons for this are unclear but they might be associated with the conserved nature of tRNAs and the stem regions of rRNAs, or highly variable loop regions in the latter.

Blattabacterium is a vertically transmitted, obligate intracellular mutualistic symbiont, whose phylogeny is expected to mirror that of its hosts. This is especially the case for phylogenies inferred from mitochondrial DNA, since mitochondria are linked with

Blattabacterium through vertical transfer to offspring through the egg cytoplasm. As has been found in previous studies (Clark et al. 2001; Garrick et al. 2017; Lo et al. 2003), we observed a high level of agreement between the topologies inferred from cockroach mitochondrial genomes and from the 104-gene Blattabacterium data set. In some cases, however,


disagreements were observed between well-supported relationships. The variability in rates that we observed between some lineages, and/or the highly increased rate of mitochondrial

DNA compared with Blattabacterium DNA, could be responsible for these disagreements.

Owing to long periods of co-evolution and co- between cockroaches and

Blattabacterium (Bourguignon et al. 2018; Lo et al. 2003), potential movement of strains between hosts (for example, via parasitoids) is not expected to result in the establishment of new symbioses, especially between hosts that diverged millions of years ago.

In conclusion, our results highlight the profound effects that long-term symbiosis can

have on the biology of each symbiotic partner. The rate of evolution is a fundamental

characteristic of any species; our study indicates that it can become closely linked between

organisms as a result of symbiosis. Further studies are required to determine whether the

correlation that we have found here also applies to the nuclear genome of the host. Future

investigations of generation time, metabolic rate, and effective population sizes in

cockroaches and Blattabacterium will allow testing of their potential influence on

evolutionary rates.


Chapter 4: Evolutionary rates are correlated between Buchnera endosymbionts and mitochondrial genomes of their aphid hosts

4.1 Introduction

The rate of molecular evolution is a fundamental biological trait and is governed by a multitude of factors (Bromham 2009; Bromham and Penny 2003; Ho and Lo 2013). For example, substitution rates are influenced by variation in selection and population size, while the mutation rate is influenced by generation time and DNA repair mechanisms (Bromham

2009). In symbiotic organisms, evolutionary rates may be influenced by the biology of both host and symbiont. This influence is especially pronounced in strictly vertically transmitted obligate intracellular symbionts (hereafter ‘endosymbionts’), which have a highly intimate relationship with their hosts (Douglas 2010). For example, reductions in the efficacy of selection due to small host population size will potentially lead to increased fixation of deleterious mutations in both host and endosymbiont genomes. One additional factor that could influence rates of molecular evolution in endosymbionts is the dependence of these organisms on the DNA replication and repair machinery of the host, due to loss of the genes encoding these functions in the genome of the endosymbiont (Mao and Bennett 2020). The use of the same DNA replication and repair machinery by host and symbiont could lead to a correlation between the evolutionary rates of the two genomes.

In long-term associations between insect hosts and their endosymbionts, strict maternal vertical transmission and lack of horizontal transfer of symbionts leads to phylogenetic congruence between host and endosymbiont (Garrick et al. 2017; Lo et al.

2003; Moran et al. 1993). This provides an opportunity to test for correlations in evolutionary rates between the genomes of host and endosymbiont. To this end, we recently analysed branch lengths on phylogenetic trees for 55 cockroach species and their bacterial


endosymbiont Blattabacterium cuenoti (Arab et al. 2020). We found evidence of a correlation in evolutionary rates between host mitochondrial and endosymbiont DNA.

Although this was the first time such a correlation was reported, previous studies have found correlations in rates between nuclear DNA and either mitochondrial or chloroplast DNA (Hua et al. 2012; Lourenço et al. 2013; Sheldon et al. 2000; Sloan et al. 2012; Smith and Lee 2010;

Yan et al. 2019). Since endosymbionts are associated with an array of insect groups, the question arises, is the correlation in evolutionary rates between host mitochondrial and endosymbiont DNA specific to cockroaches? Or is it a more general phenomenon?

Plant-feeding insects from the order Hemiptera are highly diverse, and diversification of this group has been partly attributed to the acquisition of bacterial endosymbionts in multiple lineages. Buchnera aphidicola (Gammaproteobacteria; hereafter ‘Buchnera’) is a bacterial endosymbiont that has been in an obligatory intracellular and mutualistic relationship with aphids for over 200 million years (Douglas 1998; Moran et al.

2008; Moran et al. 1993). These endosymbionts can be found within the body cavity of aphids in highly specialised cells (bacteriocytes) and are vertically transmitted to offspring

(Baumann et al. 1995; Braendle et al. 2003; Koga et al. 2012). Buchnera is found in almost all aphids, except in cases where it has been replaced with a fungal (Fukatsu and Ishikawa

1996) or another bacterial endosymbiont (Chong and Moran 2018). Genome analyses of

Buchnera have shown that they range in size from 412 to 645 kb, and have confirmed the important role of these bacteria in provisioning essential amino acids that are lacking in phloem sap, permitting aphids to tap into this resource (Baumann 2005; Moran et al. 2008).

Many host-transcribed genes responsible for amino acid biosynthesis pathways, nitrogen recycling, and transport of nutrients were found to be overexpressed in the bacteriocytes of aphids, indicating a high level of cooperation and integration of host and endosymbiont genomes (Hansen and Moran 2011; Nakabachi et al. 2005).


Sulcia muelleri (Flavobacteria; hereafter ‘Sulcia’) is another endosymbiont found in a number of sap-feeding insects in the hemipteran suborder Auchenorrhyncha, including a number of species of cicada, planthopper, treehopper, and spittlebug. Acquisition of Sulcia in this group is believed to have occurred ~260 Ma, making it among the most ancient examples of bacterial symbiosis (McCutcheon and Moran 2007; Moran et al. 2008; Moran et al. 2005b). The genome of Sulcia ranges in size from 191 to 277 kb, and its gene inventory indicates that Sulcia can provision 8 of the 10 essential amino acids lacking in the host’s xylem or phloem sap diets (Bennett and Moran 2013; McCutcheon and Moran 2010). For that reason, all Auchenorrhyncha infected with Sulcia have co-primary bacterial or fungal endosymbionts that are capable of provisioning the remaining two amino acids (Bennett and Moran 2013; Mao and Bennett 2020). A transcriptomic study investigating the bacteriocytes of the glassy-winged sharpshooter and the aster leafhopper found that host genomes appear to provide support to Sulcia through enhanced expression of genes, including those encoding enzymes involved in replication, transcription, and translation (Mao and Bennett 2020; Mao et al. 2018).

Buchnera and Sulcia are among the best-studied insect endosymbionts, with genome sequences available for multiple taxa of each genus. The availability of such data, combined with the availability of mitochondrial genomes for their host taxa, makes them ideal candidates for further investigations into whether evolutionary rates among hosts and endosymbionts are correlated. Here we infer the phylogenies of 31 Buchnera strains and 28

Sulcia strains and compare them with the mitochondrial phylogenies of their hosts. We then compare branch lengths and the rates of evolution for host-endosymbiont pairs across these phylogenies. We find a correlation in evolutionary rates between Buchnera and aphids and some evidence that evolutionary rates in Sulcia and their hosts might be correlated.

4.2 Methods

4.2.1 Genomic data from symbionts and their hosts

We obtained complete symbiont genomes for 31 Buchnera strains and 15 host mitochondrial

genomes from GenBank (Table S4.1). The remaining 16 Buchnera host mitochondrial

genomes were assembled using Illumina HiSeq 4000 short-read data previously generated for

the sequencing of symbiont genomes, and available in GenBank (Table S4.1). We also

obtained 28 annotated whole Sulcia genomes and 28 whole and partial mitochondrial

genomes from their host taxa from GenBank (Table S4.1). To assemble the 16 aphid

mitochondrial genomes, we filtered reads using a local library of 15 previously published

aphid mitochondrial genomes as reference sequences. De novo mitochondrial genome

assembly was then performed for each sequence library using Velvet v1.2.10 (Zerbino 2010).

All mitochondrial genes were annotated using the MITOS web server (Bernt et al. 2013).

Across the 31 Buchnera genomes and 28 Sulcia genomes, we identified orthologues

using OMA v1.1.2 (Altenhoff et al. 2019). Using a custom script, we identified 240 and 120

protein-coding genes shared by all Buchnera and all Sulcia taxa, respectively. For aphid

mitochondrial gene alignments, all taxa were represented by at least 10 of 12 available

protein-coding genes. For Sulcia-host mitochondrial gene alignments, all taxa were

represented by all 12 available protein-coding genes. Outgroups were selected based on past

publications that examined host or symbiont phylogenies (Chong and Moran

2018; Ortiz-Rivas and Martínez-Torres 2010; Ren et al. 2017) and/or availability of whole

mtDNA genomes or bacterial genomes on GenBank. Genome sequences of outgroups for

Buchnera alignments were obtained from GenBank and included one strain of Escherichia

coli (NC_000913.3), one Serratia marcescens (HG326223.1) and one Yersinia pestis

(NC_003143.1). Genome sequences of outgroups for Sulcia alignments were obtained from

GenBank and included the free-living Bacteroidetes species Flavobacterium gilvum

(CP017479), Lutibacter sp. (CP017478), and Tenacibaculum dicentrarchi (CP013671). The

Daktulosphaira vitifoliae (DQ021446.1) mitochondrial genome was used as an outgroup for

aphid alignments, while Trialeurodes vaporariorum (AY521265) was used as an outgroup

for Sulcia-host alignments.

TranslatorX (Abascal et al. 2010) was used to align at the amino acid level each of the

240 and 120 orthologous genes for the Buchnera and Sulcia datasets, respectively. These

were concatenated into 249,624 bp and 128,214 bp alignments, respectively. The aphid

mitochondrial genome data set included at least 10 out of 12 protein-coding genes, while the

Sulcia host data set included all protein-coding genes from each taxon. Both data sets included the genes encoding 12S rRNA (12S), 16S rRNA (16S) and tRNAs. All mitochondrial protein-coding genes were free of stop codons and indels, indicating that they were not nuclear insertions. Mitochondrial protein-coding genes were aligned using

TranslatorX, while MAFFT (Katoh and Standley 2013) was used to align 12S, 16S, and tRNAs. We then concatenated the mitochondrial sequences into 13,393 bp and 14,164 bp alignments, for aphids and Sulcia-hosts, respectively.

We tested for substitution saturation using Xia’s method in DAMBE 6 (Xia 2017; Xia

and Lemey 2009). Third codon sites in the Buchnera (NumOTU = 32, Iss = 0.712, Iss.cAsym

= 0.601) and aphid (Iss = 0.4808, Iss.cAsym = 0.3932) data sets were found to be saturated

and were thus excluded from our analyses. After we excluded saturated sites, the total lengths of

the final data sets were 166,476 bp and 9825 bp (166,416 bp and 9824 bp without outgroups)

for the Buchnera and host mitochondrial alignments, respectively. No evidence of saturation

was found in the Sulcia data set, but 3rd codon sites, 12S, and 16S in the Sulcia-host

mitochondrial data set were saturated (3rd codon sites: Iss = 0.7218, Iss.cAsym = 0.5570,

12S: Iss = 0.8932, Iss.cAsym = 0.4332, 16S: Iss = 0.853, Iss.cAsym = 0.4070) and were

excluded from our analyses. After we excluded saturated sites, the total length of the final

Sulcia-host mitochondrial data set was 8955 bp (8224 bp without outgroups).
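
As a minimal sketch of how the saturation results above can be read, the following Python snippet applies the usual decision rule for Xia's test: a partition is treated as saturated when the observed index of substitution saturation (Iss) is not smaller than the critical value (Iss.c). The index values are those reported in this section. The helper name `is_saturated` is hypothetical, the assignment of the second pair of values to the aphid data set is an assumption, and the significance test that DAMBE performs on the difference is ignored for brevity. This is an illustration only, not the procedure used to produce the results.

```python
# Hypothetical helper: flags a partition as saturated when Iss >= Iss.c.
# Assumption: the t-test DAMBE applies to the difference is omitted here.
def is_saturated(iss: float, iss_c: float) -> bool:
    return iss >= iss_c

# (Iss, Iss.cAsym) values as reported in the text.
reported = {
    "Buchnera 3rd codon sites": (0.712, 0.601),
    "aphid mtDNA 3rd codon sites": (0.4808, 0.3932),
    "Sulcia-host mtDNA 3rd codon sites": (0.7218, 0.5570),
    "Sulcia-host 12S": (0.8932, 0.4332),
    "Sulcia-host 16S": (0.853, 0.4070),
}

flags = {name: is_saturated(iss, iss_c) for name, (iss, iss_c) in reported.items()}
for name, flag in flags.items():
    print(f"{name}: {'saturated' if flag else 'not saturated'}")
```

Under this simplified rule, every partition listed is flagged as saturated, in line with the exclusions described above.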

4.2.2 Phylogenetic analysis

RAxML v8.2 (Stamatakis 2014) was used to carry out maximum-likelihood analyses, with

1000 bootstrap replicates to estimate node support. The Buchnera data set was partitioned

into two subsets: 1st codon sites and 2nd codon sites. The aphid mitochondrial data set was

partitioned into four subsets: 1st codon sites, 2nd codon sites, rRNAs, and tRNAs. The Sulcia

data set was partitioned into three subsets: 1st codon sites, 2nd codon sites, and 3rd codon

sites. The Sulcia-host mitochondrial data set was partitioned into three subsets: 1st codon

sites, 2nd codon sites, and tRNAs. Using jModelTest v2.1.10 (Darriba et al. 2012), we

selected the GTR+G substitution model for each data set based on the Bayesian information

criterion. We used the Bayesian information criterion in ProtTest v3.4 (Darriba et al. 2011) to

select models of the translated amino acids: the CAT+CpREV model for Buchnera, the

CAT+MtART model for aphid mitochondrial DNA, the CAT+WAG model for Sulcia, and

the CAT+MtART model for Sulcia-host mitochondrial DNA.
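
The codon-position partitions described above can be illustrated with a short Python sketch. It splits an in-frame alignment into 1st, 2nd, and 3rd codon sites and rebuilds a 1st+2nd-site alignment. The toy sequences and the function name `split_codon_positions` are hypothetical; in practice, partitions are usually passed to RAxML as site ranges rather than as separate sequences.

```python
def split_codon_positions(seq: str) -> dict:
    """Return the 1st, 2nd and 3rd codon-position sites of an in-frame sequence."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return {pos: seq[pos - 1::3] for pos in (1, 2, 3)}

# Toy in-frame alignment (hypothetical sequences, three codons each).
toy_alignment = {"taxon_A": "ATGGCATTA", "taxon_B": "ATGGCCTTG"}

subsets = {name: split_codon_positions(seq) for name, seq in toy_alignment.items()}

# Interleave 1st and 2nd sites to obtain an alignment without 3rd codon sites.
first_second = {
    name: "".join(a + b for a, b in zip(s[1], s[2])) for name, s in subsets.items()
}

# Removing every third site from a 249,624 bp alignment leaves two thirds of it.
remaining_length = 249_624 * 2 // 3
print(first_second, remaining_length)
```

The final line is simple arithmetic: dropping 3rd codon sites from the 249,624 bp Buchnera concatenation leaves 166,416 bp, which agrees with the length reported above for the alignment without outgroups.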

We quantified congruence between host and endosymbiont topologies using ParaFit,

as implemented in the R package ape v5.0 (Paradis and Schliep 2018). We first created matrices of

patristic distances calculated from maximum-likelihood phylogenies of host and

endosymbiont and a host-endosymbiont association matrix. We then performed a global test

with 999 permutations, using the ParafitGlobal value and a p-value threshold of 0.05 to

determine significance.
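
To make the term concrete, a patristic distance is the sum of branch lengths along the path joining two tips. The Python sketch below computes a patristic distance matrix for a small toy tree. The tree, its branch lengths, and the function names are hypothetical, and this is not the ParaFit test itself, which was run in ape.

```python
from itertools import combinations

# Toy rooted tree (hypothetical): child -> (parent, branch length in subs/site).
tree = {
    "A": ("n1", 0.10),
    "B": ("n1", 0.20),
    "n1": ("root", 0.05),
    "C": ("root", 0.30),
}

def path_to_root(node):
    """Map `node` and each of its ancestors to their distance from `node`."""
    dist, out = 0.0, {node: 0.0}
    while node in tree:
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out

def patristic(tip1, tip2):
    """Sum of branch lengths on the path joining two tips."""
    up1, up2 = path_to_root(tip1), path_to_root(tip2)
    # With positive branch lengths, the most recent common ancestor
    # is the shared ancestor that minimises the summed distance.
    return min(up1[n] + up2[n] for n in up1 if n in up2)

tips = ["A", "B", "C"]
matrix = {(a, b): patristic(a, b) for a, b in combinations(tips, 2)}
print(matrix)
```

One such matrix per partner (host and endosymbiont), together with the association matrix linking each host to its symbiont, forms the input to the global test.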

4.2.3 Comparisons of evolutionary rates between hosts and endosymbionts

Root-to-tip distances from the RAxML analyses for each host and endosymbiont pair were

calculated and subjected to Pearson correlation analysis using the R packages ape, adephylo

(Jombart et al. 2010) and ggpubr (Kassambara 2018). The use of root-to-tip distances

removes the confounding effects of time, because all lineages leading to the tips of the tree

have experienced the same amount of time since evolving from their common ancestor.

However, the sharing of internal branches by groups of taxa renders these data non-

independent. Therefore, we compared branch-length differences between hosts and

endosymbionts for 15 phylogenetically independent species pairs across aphid and

endosymbiont topologies and 14 phylogenetically independent species pairs across Sulcia

host insects and endosymbiont topologies (see Figures S4.1 and S4.2). These were calculated

using a fixed topology (derived from the endosymbiont analyses described above) for each of

the following three data sets: 1) 1st+2nd codon sites of protein-coding genes; 2) translated

amino acid sequences; and 3) 1st+2nd codon positions of protein-coding genes plus the inclusion of rRNAs+tRNAs in the case of the aphid mitochondrial data set or the inclusion of

tRNAs in the case of Sulcia host mitochondrial data. Branch lengths were log transformed, and differences between pairs of hosts and pairs of endosymbionts were calculated and

compared via Pearson correlation analysis.
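
The sister-pair comparison described above can be sketched in a few lines of Python. For each phylogenetically independent pair, the log-transformed branch lengths of the two taxa are subtracted, and the resulting host differences are correlated with the corresponding endosymbiont differences. The branch lengths below are hypothetical, and the functions are illustrative stand-ins for the R packages used in the actual analyses. The same `pearson_r` function could equally be applied to root-to-tip distances.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log_difference(pair):
    """Difference of log-transformed branch lengths for one sister pair."""
    b1, b2 = pair
    return math.log(b1) - math.log(b2)

# Hypothetical branch lengths (subs/site) for four independent sister pairs.
# The order of taxa within each pair must match between host and symbiont.
host_pairs = [(0.10, 0.05), (0.08, 0.12), (0.20, 0.10), (0.03, 0.06)]
symbiont_pairs = [(0.02, 0.01), (0.015, 0.03), (0.05, 0.02), (0.004, 0.01)]

host_diffs = [log_difference(p) for p in host_pairs]
symbiont_diffs = [log_difference(p) for p in symbiont_pairs]

r = pearson_r(host_diffs, symbiont_diffs)
print(f"R = {r:.2f}")
```

Because each pair contributes a single contrast and no branch is shared between pairs, the data points are independent, which is the reason for preferring this comparison over root-to-tip distances.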

To check the data for potential violations of the assumptions of linear regressions, we

compared the absolute mean value of log-transformed branch lengths with the log-

transformed branch-length differences (Freckleton 2000). We found no significant correlation

between these values (R = -0.25, p = 0.39 for data from aphids; R = 0.36, p = 0.21 for data

from Buchnera; R = -0.34, p = 0.23 for data from Sulcia hosts; R = 0.15, p = 0.61 for data

from Sulcia), indicating that the data were suitable for use in our analyses. We also

performed analyses in which branch-length differences were standardised following previous

recommendations (Welch and Waxman 2008), to account for the potentially confounding

effects of the different amounts of time that sister-pairs have had to diverge. Three

standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first standardisation, time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/Myr (as a reflection of the range of rates seen in eukaryotic genomes (Ho 2020)), while for corresponding endosymbionts it was estimated as the average branch length of the endosymbiont pair, divided by the same assumed rate. In the second and third standardisations, times since divergence for both endosymbionts and hosts were based either on average branch lengths of host pairs only or endosymbiont pairs only.
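
The standardisations described above can be written out as a short Python sketch. Time since divergence is approximated as the mean branch length of a pair divided by the assumed rate of 0.001 subs/site/Myr, and each log-transformed branch-length difference is divided by the square root of that time. The branch lengths and function names are hypothetical; only the assumed rate and the form of the calculation come from the text.

```python
import math

ASSUMED_RATE = 0.001  # subs/site/Myr, the rate assumed in the text

def divergence_time(b1, b2, rate=ASSUMED_RATE):
    """Approximate time since divergence (Myr) from the mean branch length of a pair."""
    return ((b1 + b2) / 2) / rate

def standardised_difference(b1, b2, time_myr):
    """Log-transformed branch-length difference divided by sqrt(time)."""
    return (math.log(b1) - math.log(b2)) / math.sqrt(time_myr)

# Hypothetical sister pair: host and endosymbiont branch lengths (subs/site).
host = (0.10, 0.05)
symbiont = (0.02, 0.01)

t_host = divergence_time(*host)          # about 75 Myr
t_symbiont = divergence_time(*symbiont)  # about 15 Myr

# First standardisation: each partner scaled by its own time estimate.
first = (standardised_difference(*host, t_host),
         standardised_difference(*symbiont, t_symbiont))

# Second standardisation: both partners scaled by the host-based estimate.
second = (standardised_difference(*host, t_host),
          standardised_difference(*symbiont, t_host))

print(first, second)
```

The third standardisation follows the same pattern, with `t_symbiont` used for both partners.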

4.3 Results

4.3.1 Evolutionary rate comparisons between Buchnera and aphid mtDNA

In all aphid-Buchnera analyses, there was strong support for the monophyly of each aphid

subfamily and tribe (Figures 4.1, S4.3 and S4.4). The topologies inferred from the host and

endosymbiont data sets were found to be congruent (p = 0.001). In only one case did a

disagreement have strong bootstrap support in both trees: Aphis glycines was the sister taxon

of Aphis craccivora in host mitochondrial phylogenies and the sister taxon of Aphis nasturtii

in endosymbiont phylogenies. This disagreement is likely to be associated with these species

being present on the shortest terminal branch lengths in the aphid mitochondrial phylogeny,

which could lead to difficulties in resolving their phylogenetic position.

Figure 4.1: Congruence between (a) phylogenetic tree of host aphids inferred using maximum likelihood (RAxML) from whole mitochondrial genomes, and (b) phylogenetic tree of Buchnera inferred using maximum likelihood from 240 protein-coding genes (3rd codon sites excluded from both data sets). Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Red outlines on circles indicate disagreement between the phylogenies. Colours of branches indicate different aphid subfamilies or tribes

We found a correlation in root-to-tip distances between the aphid and Buchnera data

sets, regardless of whether or not rRNA and tRNA genes were included in the host data set (R

= 0.88, Figure 4.2a; R = 0.89, Figure S5; amino acid alignment: R = 0.91, Figure S4.6). A

comparison of branch lengths among phylogenetically independent pairs of host and

endosymbiont taxa based on protein-coding genes, including host rRNA and tRNA genes

(Figure S4.1) revealed a significant correlation between their rates of evolution (R = 0.78, p =

0.0011; Figure 4.2b). A significant correlation was also found when excluding host rRNA

and tRNA genes (R = 0.77, p = 0.013; Figure 4.2c). Equivalent analyses of branch lengths

inferred from amino acid data also revealed a significant rate correlation between host and

endosymbiont (R = 0.83, p = 0.0023; Figure 4.2d). Analyses involving standardisation of

branch-length differences yielded significant rate correlations for all data sets (R = 0.72–0.86,

p = 9 × 10⁻⁵–0.0037; see Figures S4.5, S4.6, and S4.7).

4.3.2 Evolutionary rate comparisons between Sulcia and host mtDNA

In all analyses of Sulcia and their host insects, there was strong support for the monophyly of

each insect subfamily, although leafhoppers (Cicadellidae) were found to be paraphyletic

with respect to treehoppers (Membracidae) (Figures 4.3, S4.8, and S4.9). The topologies inferred from all Auchenorrhyncha sap-feeding insect mitochondrial DNA and Sulcia data

sets were congruent (p = 0.001). There were no disagreements found to be highly supported

in both topologies. We found a correlation between root-to-tip distance for data sets including

tRNA genes in the host data set and protein-coding genes in the endosymbiont data set (R =

0.92, Figure 4.4). Similar results were found for data sets including only protein-coding genes

from hosts and their endosymbionts (nucleotide alignment: R = 0.9, Figure S10; amino acid

alignment: R = 0.89, Figure S4.11).

Figure 4.2: Comparison of evolutionary rates of Buchnera symbionts and their aphid hosts. (a) Correlation of root-to-tip distances in phylogenies of Buchnera and aphids, inferred using maximum-likelihood analysis of whole mitochondrial host genomes and 240 protein-coding Buchnera genes (3rd codon sites excluded from both data sets). (b) Correlation of log-transformed branch-length differences between phylogenetically independent pairs of host and symbiont taxa, based on whole mitochondrial aphid genomes and 240 protein-coding Buchnera genes, (c) protein-coding genes only for host mitochondrial data set (excluding 3rd codon sites from both data sets), and (d) amino acid sequences translated from protein-coding genes.

Figure 4.3: Congruence between (a) phylogenetic tree of Sulcia hosts inferred using maximum likelihood (RAxML) from mitochondrial protein-coding genes and tRNAs, and (b) phylogenetic tree of Sulcia inferred using maximum likelihood from 120 protein-coding genes (3rd codon sites excluded from host data set). Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Red outlines on circles indicate disagreement between the phylogenies. Colours of branches indicate different Sulcia host families. Dash-dotted branches are not to scale and indicate long branches that were shortened to improve readability. Solid lines are to scale according to the legend beneath them.

Figure 4.4: Comparison of evolutionary rates of Sulcia symbionts and their hosts. (a) Correlation of root-to-tip distances in phylogenies of Sulcia and hosts, inferred using maximum-likelihood analysis of host mitochondrial protein-coding genes plus tRNAs and 120 Sulcia protein-coding genes, with 3rd codon sites excluded from the host data set. (b) Correlation of log-transformed branch-length differences between phylogenetically independent pairs of host and symbiont taxa, based on mitochondrial protein-coding genes (3rd codon sites excluded from both data sets) plus tRNAs for the host data set, (c) protein-coding genes only for host mitochondrial data set (excluding 3rd codon sites from both data sets), and (d) amino acid sequences translated from protein-coding genes.

A comparison of branch lengths among phylogenetically independent pairs of host and

endosymbiont taxa based on protein-coding genes, including host tRNA genes (Figure S4.2)

revealed a significant correlation between their rates of evolution (R = 0.62, p = 0.017; Figure

4.4b). However, a non-significant correlation was found when excluding host tRNAs (R =

0.49, p = 0.073; Figure 4.4c). Equivalent analyses of branch lengths inferred from amino acid

data also revealed a weak non-significant rate correlation between host and endosymbiont (R

= 0.49, p = 0.074; Figure 4.4d). Analyses involving standardisation of branch-length

differences yielded non-significant, weak rate correlations for all data sets (R = 0.054–0.35, p

= 0.17–0.85; see Figures S4.10, S4.11 and S4.12).

4.4 Discussion

We have detected a correlation in molecular evolutionary rates between Buchnera and host

mitochondrial genomes, using two different methods of analysis. A correlation in molecular

evolutionary rates was also detected between Sulcia and host mitochondrial genomes using

the same methods, although it was not statistically significant. Our previous work found

similar correlations in the evolutionary rates of cockroach mitochondrial genomes and the

genomes of their endosymbionts Blattabacterium (Arab et al. 2020). Other studies have

found a correlation in evolutionary rates between nuclear and mitochondrial genes in various animal groups

(Lourenço et al. 2013; Martin 1999; Sheldon et al. 2000; Yan et al. 2019), between mitochondrial and plastid

genes in angiosperms (Sloan et al. 2012), and between nuclear, mitochondrial and plastid

genes in algae (Hua et al. 2012; Smith and Lee 2010).

A relationship between host and endosymbiont substitution rates could be explained

by similar forces acting on their underlying mutation rates. One way in which this could

occur is if endosymbiont DNA replication depends on the host’s DNA replication and repair

machinery (Mao et al. 2018). Some Buchnera genomes show a reduction in the number of

repair and replication genes, which suggests that the endosymbiont might be dependent on

host DNA repair and replication machinery, potentially leading to similarities in their

mutation rates (Moran and Bennett 2014; Silva et al. 2001). Correlations in evolutionary rates could also potentially be the result of interactions between host mitochondrial and endosymbiont proteins, as has been found between insect mitochondrial-encoded and nuclear-encoded OXPHOS genes associated with mitochondrial function (Yan et al. 2019).

There is evidence of close physical and metabolic association between Buchnera and mitochondria. Electron microscopy studies indicate a high density of mitochondria in aphid bacteriocytes (Griffiths and Beck 1973; Hinde 1971). A transcriptomic study of the pea aphid (Acyrthosiphon pisum) found an upregulation of several genes for mitochondria-related transporters in bacteriocytes, suggesting cooperative metabolic interactions between Buchnera and mitochondria (Nakabachi et al. 2005). A close physical and metabolic association between Buchnera and mitochondria indicates that changes affecting mitochondrial genome evolution could impact the genome of the endosymbiont, possibly explaining to some degree the correlation in molecular rate between the two. The extent of interactions between host and Buchnera proteins, however, remains to be investigated.

Some characteristics of host biology can also lead to correlations of evolutionary rates between host and symbiont genomes. For example, elevated host and symbiont evolutionary rates can potentially be caused by short host generation times (Bromham 2009). This could

occur if increased rates of symbiont replication are associated with host reproduction

(Braendle et al. 2003; Wilkinson et al. 2003), and generation times vary among hosts.

Increased transmission bottlenecks of both mitochondria and symbionts could also explain the rate correlations that we have observed. Certain species of aphids, such as Uroleucon ambrosiae (Funk et al. 2001), are known to experience severe and repeated bottlenecking.

A significant correlation was not consistently detected when comparing evolutionary rates between Sulcia and host mitochondrial genomes. It was detected in only a single analysis, which compared Sulcia genes with host mitochondrial genes including tRNAs. This significance was lost when tRNAs were excluded, and none of the standardised analyses yielded a significant correlation between Sulcia and host, even when tRNAs were included. The reason for this loss of significance is unclear, but it may reflect a loss of signal due to the smaller amount of data used. This result contrasts with those of analyses involving the genomes of Blattabacterium and Buchnera and the mitochondrial genomes of their hosts. One explanation could be the uneven sampling of host families in the data set. The taxa included in these analyses were limited to members of only four families of insects, with 22 out of 28 species belonging to Cicadidae. Additionally, those 22 samples came from one geographical location, Japan. Sulcia is known to have a relatively slow evolutionary rate, and accordingly short branch lengths were recovered in phylogenetic analyses of this endosymbiont. This slow evolutionary rate may have contributed to the non-significance of the correlation. Future studies on evolutionary rate correlations between Sulcia and host mitochondria should include more representatives of Auchenorrhyncha that contain Sulcia.

Buchnera and Sulcia are vertically transmitted, obligate intracellular mutualistic endosymbionts, whose phylogeny is expected to mirror that of their hosts. This is expected for phylogenies inferred from mitochondrial DNA, since mitochondria are linked with the

endosymbionts through vertical transfer to offspring through the egg cytoplasm. As observed in previous studies (Clark et al. 2000; Nováková et al. 2013; Urban and Cryan 2012), we found a high level of agreement between the topologies inferred from aphid mitochondrial and Buchnera genome data sets and the same was found for host mitochondrial and Sulcia genome data sets. In the case of aphids and Buchnera, however, we observed disagreements between some well-supported relationships. The variability in rates observed between some lineages, coupled with the much higher evolutionary rate of mitochondrial DNA compared with Buchnera DNA and very short terminal branches of some taxa on host mitochondrial phylogenies, could be responsible for these disagreements. Regarding insect host mitochondrial and Sulcia genome data sets, some disagreements were observed between relationships with low support in either topology. This could be explained by the much higher evolutionary rate of mitochondrial DNA compared with Sulcia DNA.

In conclusion, our results for the aphid-Buchnera data set highlight the significant

effects that long-term symbiosis can have on the biology of each symbiotic partner. The rate

of evolution is a fundamental characteristic of any species; our study indicates that it can

become closely linked between organisms as a result of symbiosis. Detecting correlations in

evolutionary rates between hosts and endosymbionts can highlight forces impacting the

evolution of both organisms. We provide evidence that these correlations extend to multiple

groups of insects and their endosymbionts. To further our understanding of the extent of these

correlations, future work is required to determine whether the correlation that we have found

here also applies to the nuclear genome of the host. Additionally, analyses performed in this

work should be applied to different groups of hosts and their endosymbionts to assess how

widespread these correlations are between the genomes of the two. On the other hand, our results for the Sulcia endosymbiont and the mitochondrial genomes of its hosts did not show

a consistent significant correlation in evolutionary rates. Further studies should examine whether this result is due to limitations in the data set used.

Chapter 5: General Discussion

5.1 Thesis summary

The studies in this thesis have focused on the phylogenetics and molecular evolution of cockroaches, aphids, Auchenorrhyncha species and their respective primary endosymbionts.

Using host mitochondrial genomes and a large number of endosymbiont genes, this thesis has demonstrated the impact of over 200 million years of mutualistic endosymbiosis on the molecular evolution of both host and endosymbiont. The major findings of this thesis include:

1) showing the power primary endosymbiont genes possess for resolving deep host phylogenetic relationships; 2) demonstrating the high levels of phylogenetic congruence between insect host mitochondrial genomes with genomes of their primary endosymbionts, and 3) the first evidence that molecular evolutionary rates of insect host mitochondrial genomes correlate with those of their primary endosymbionts, based on evidence from cockroaches and their primary endosymbiont Blattabacterium and aphids and their endosymbiont Buchnera.

The first two chapters focus on cockroaches and their primary endosymbiont Blattabacterium cuenoti. In Chapter 2, I attempt to resolve the phylogeny of cockroaches using the most comprehensive datasets (at the time the study was conducted) based on host whole mitochondrial genomes and 104 Blattabacterium genes. For this chapter, 115 cockroach mitochondrial genomes and genes from 48 Blattabacterium strains were sequenced. I applied maximum likelihood and Bayesian phylogenetic methods and applied fossil calibrations to estimate the timescale of cockroach evolution. Phylogenies produced in this chapter confirm relationships between all major cockroach families. An aim of Chapter 2 was determining the sister group of the monophyletic clade which includes the sub-social family Cryptocercidae

and the eusocial termites, which has the potential for shedding light on how sociality evolved in cockroaches. Our results could not determine a single lineage as the sister group, prompting the need for further sampling of potential candidate groups.

In Chapter 3 I tested for phylogenetic congruence and compared molecular evolutionary rates

of cockroach whole mitochondrial genomes and 104 genes from 55 Blattabacterium strains

and their hosts using two methods: root-to-tip distances and comparisons of the branch lengths of phylogenetically independent species pairs. These analyses revealed high levels of phylogenetic congruence between the cockroach mitochondrial and Blattabacterium

genomes. They also provided the first evidence of correlation in evolutionary rates between

host insect mitochondrial genomes and the genome of their endosymbiont. This correlation

indicates that the genomes of the symbiotic partners are evolving under the same

evolutionary forces.

In Chapter 4 I attempted to test whether the correlation of molecular evolutionary rates found

between cockroach mitochondrial genomes and their endosymbiont is specific to the group or

is a more widespread phenomenon between insects and their primary endosymbionts. For this

chapter, I analysed whole mitochondrial genomes of aphids and 240 genes of their

endosymbiont Buchnera aphidicola, in addition to whole mitochondrial genomes of species

from the suborder Auchenorrhyncha and 120 genes of their endosymbiont Sulcia muelleri. As

was the case with cockroaches and their endosymbionts, I found high levels of phylogenetic

congruence between host and endosymbiont phylogenies. The correlation of molecular evolutionary rates

was significant between aphid mitochondrial and Buchnera genomes. However, it was not

significant when comparing the mitochondrial genomes of Auchenorrhyncha species with their Sulcia

genomes. This may be due to issues with sampling in the Auchenorrhyncha dataset and/or the

slow evolutionary rate found in Sulcia, but future work is needed to confirm this. Overall, the

results of this chapter further confirm the impact that ancient endosymbiosis has on the evolution of the genomes of both host and symbiont.

5.2 Future directions

5.2.1 Phylogenetics of Blattodea

DNA sequencing and analysis have been a driving force in resolving major relationships in the tree of life. DNA sequencing costs are falling dramatically, and methods that improve the speed, cost and efficiency of genome sequencing are being developed rapidly (Goodwin et al. 2016). The use of genome-scale data is allowing a better understanding of phylogenetic

relationships between taxonomic groups. Our understanding of phylogenetic relationships

among members of the Blattodea could benefit from further work regarding the relationships

of major groups. To date, chapter 2 contains the largest data sets of cockroach whole

mitochondrial genomes and Blattabacterium genes. The transcriptomics dataset from

Evangelista et al. (2019) is the most comprehensive based on nuclear genes. Evangelista et al. (2019) found the family Lamproblattidae to be the sister to Cryptocercidae + termites; however, their study did not include representatives of Anaplectidae. In our study (chapter 2),

Lamproblattidae was sister to members of the family Anaplectidae. This result is novel and confirming this relationship should be an aim of future studies. Future studies using transcriptomics should include representatives of both Anaplectidae and Lamproblattidae to confirm or reject the findings of chapter 2.

The age of Blattodea is highly debated in the literature, with its origin estimated at the Middle

Jurassic, or at least 170 Ma (Evangelista et al. 2019; Legendre et al. 2015; Misof et al.

2014; Tong et al. 2015; Wang et al. 2017). Divergence time estimates obtained from

cockroach mtDNA and Blattabacterium genes in chapter 2 put the mean estimate of the

origin of Blattodea at ~230 Ma, which is 30 My younger than the mean estimate of Evangelista et al. (2019), although the 95% confidence intervals of the two estimates largely

overlap. It is important to note here that the oldest modern cockroach dates back to the

Cretaceous (Grimaldi and Engel 2005; Lin 1980), although some controversy surrounds the

dating of some cockroach fossils (Evangelista et al. 2017). This creates a considerable gap in the knowledge of the timeframe of Blattodea evolution. The phylogeny of Blattodea could benefit from developing more robust molecular datasets coupled with robust morphological datasets and reliable fossil calibrations that take into account misdiagnoses (Evangelista et al.

2017). Fossil calibrations should be selected using the best practices (Parham et al. 2012), with priors selected to reflect a suitable amount of uncertainty in the age of the nodes (Tong et al. 2015).

5.2.2 Host and endosymbiont evolutionary rates

Based on the results of chapters 3 and 4, we now have evidence of similar forces shaping the evolution of insect hosts and endosymbionts. I have detected correlations of molecular evolutionary rates between insect host mitochondrial and endosymbiont genomes. Although this has been demonstrated with significance only in cockroaches, aphids and their respective primary endosymbionts, it is possible that further investigation will reveal it to be a general feature of long-term endosymbiosis. Future studies should focus on sampling insect mitochondrial genomes and comparing their rates to the rates of their primary endosymbionts. Furthermore, since the correlations of molecular evolutionary rates between host and endosymbionts appear to be genome-wide, future studies should investigate whether these correlations extend to the nuclear genome of the hosts.

To further explain why a significant correlation in evolutionary rates was not detected between Sulcia and their hosts in chapter 4, it is important to consider the biology and evolutionary history of Sulcia. It is believed that the ancestor of Auchenorrhyncha acquired

Sulcia 260 Ma, making it the oldest known insect endosymbiont. Genomes of Sulcia are highly reduced and are among the smallest of all known insect endosymbionts. These tiny genomes lack genes responsible for cellular information processing mechanisms, which are essential for DNA repair, replication, and translation. Sulcia is always found in hosts with at least one other co-primary endosymbiont (Bennett and Moran 2013). Co-primary endosymbionts have been lost and replaced multiple times in Auchenorrhyncha. For example,

Candidatus Nasuia deltocephalinicola (hereafter Nasuia) is believed to be an ancient endosymbiont that probably infected the ancestor of Auchenorrhyncha (Bennett and Moran

2013), while Candidatus Baumannia cicadellinicola (hereafter Baumannia) is believed to have replaced Nasuia more recently in the ancestor of sharpshooters (McCutcheon and

Moran 2007).

A recent study of the glassy-winged sharpshooter (Homalodisca vitripennis) found that several host repair and replication genes, which might otherwise replace these functions in the endosymbiont, were not overexpressed (Mao and Bennett 2020). An alternative source that may assist Sulcia with these functions is their co-primary endosymbionts. Genomes of co-primary endosymbionts found in Sulcia hosts vary in size. Nasuia has the smallest known genome of any insect endosymbiont (Bennett and Moran 2013), while Baumannia has a relatively large genome (McCutcheon and Moran 2007). A transcriptomic study found an overexpression of host genes in the bacteriocytes of the aster leafhopper (Macrosteles quadrilineatus) that are believed to provide support to Sulcia and Nasuia by performing functions such as replication, transcription, and translation (Mao et al. 2018). However, while investigating glassy-winged sharpshooter bacteriocytes containing Sulcia, Mao and Bennett (2020) found that several genes responsible for DNA replication and repair were missing. An analysis of the genome of Baumannia found large numbers of repair and replication genes in comparison with other known insect endosymbionts (Wu et al. 2006). In this case, Sulcia might depend on Baumannia instead of the host for DNA replication and repair in the glassy-winged sharpshooter, but it is unknown whether such dependence occurs between insect co-primary endosymbionts.

The non-significant correlation between the evolutionary rates of Sulcia and host mitochondrial genomes in chapter 4 could be due to many of the sampled Sulcia hosts harbouring co-primary endosymbionts with relatively large genomes. Sixteen of the 28 species examined in chapter 4 harbour a co-primary endosymbiont with a large genome: in addition to Sulcia, 14 cicada species contain a yeast-like endosymbiont (25.1 Mb) (Matsuura et al. 2018), while H. vitripennis harbours Baumannia (686 kb) (Wu et al. 2006) and Philaenus spumarius harbours an endosymbiont closely related to Sodalis glossinidius (4.1 Mb) (Koga and Moran 2014). This indicates an overrepresentation of Sulcia hosts with recent co-primary endosymbionts that have relatively large genomes and presumably retain function. If co-primary endosymbionts with complete DNA repair and replication machinery were found to provide those essential services to Sulcia, then this lack of dependence on host DNA repair and replication machinery could influence the evolutionary rate correlation between host and endosymbiont genomes. We now know that hosts of Sulcia have evolved tailored mechanisms to support each co-primary endosymbiont. To that end, future studies should investigate correlations of evolutionary rates between genomes of Sulcia, co-primary endosymbionts with large genomes, and their hosts.

References

Abascal F, Zardoya R, Telford MJ (2010) TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic acids research 38 (suppl_2):W7–W13 Abe T, Bignell DE, Higashi M, Higashi T, Abe Y (2000) Termites: evolution, sociality, symbioses, ecology. Springer Science & Business Media Aberer AJ, Kobert K, Stamatakis A (2014) ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Molecular Biology and Evolution 31 (10):2553– 2556 Altenhoff AM, Levy J, Zarowiecki M, Tomiczek B, Vesztrocy AW, Dalquen DA, Müller S, Telford MJ, Glover NM, Dylus D (2019) OMA standalone: orthology inference among public and custom genomes and transcriptomes. Genome Research 29 (7):1152–1163 Andersen JC, Wu J, Gruwell ME, Gwiazdowski R, Santana SE, Feliciano NM, Morse GE, Normark BB (2010) A phylogenetic analysis of armored scale insects (Hemiptera: Diaspididae), based upon nuclear, mitochondrial, and endosymbiont gene sequences. and Evolution 57 (3):992–1003 Arab DA, Bourguignon T, Wang Z, Ho SY, Lo N (2020) Evolutionary rates are correlated between cockroach symbionts and mitochondrial genomes. Biology Letters 16 (1):20190702 Bartosch‐Harlid A, Berlin S, Smith NG, Mosller AP, Ellegren H (2003) Life history and the male mutation bias. Evolution 57 (10):2398–2406 Baumann L, Baumann P (2005) Cospeciation between the primary endosymbionts of mealybugs and their hosts. Current Microbiology 50 (2):84–87 Baumann P (2005) Biology of bacteriocyte-associated endosymbionts of plant sap- sucking insects. Annu. Rev. Microbiol. 59:155–189 Baumann P, Baumann L, Lai C-Y, Rouhbakhsh D, Moran NA, Clark MA (1995) Genetics, physiology, and evolutionary relationships of the genus Buchnera: intracellular symbionts of aphids. Annual Review of Microbiology 49 (1):55–94 Baumann P, Moran NA (1997) Non-cultivable from symbiotic associations of insects and other hosts. Antonie van Leeuwenhoek 72 (1):39–48

Beccaloni G, Eggleton P (2013) Order Blattodea. In: Zhang, Z.-Q.(Ed.) Animal Biodiversity: An Outline of Higher-level Classification and Survey of Taxonomic Richness (Addenda 2013). Zootaxa 3703 (1):46–48 Beccaloni GW (2014) Blattodea species file online., http://cockroach.speciesfile.org/HomePage/Cockroach/HomePage.aspx Bell WJ, Roth LM, Nalepa CA (2007) Cockroaches: ecology, behavior, and natural history. JHU Press Bennett GM, Moran NA (2013) Small, smaller, smallest: the origins and evolution of ancient dual symbioses in a phloem-feeding insect. Genome Biology and Evolution 5 (9):1675–1688 Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Pütz J, Middendorf M, Stadler PF (2013) MITOS: improved de novo metazoan mitochondrial genome annotation. Molecular Phylogenetics and Evolution 69 (2):313–319 Blochmann F (1892) Uber das Vorkommen bakterienahnlicher Gebilden in den Geweben und Elem verschiedener Insekten. Zbl Bakteriol 11:234–240 Boesch P, Weber-Lotfi F, Ibrahim N, Tarasenko V, Cosset A, Paulus F, Lightowlers RN, Dietrich A (2011) DNA repair in organelles: pathways, organization, regulation, relevance in disease and aging. Biochimica et Biophysica Acta (BBA)-Molecular Cell Research 1813 (1):186–200 Bourguignon T, Kinjo Y, Villa-Martín P, Coleman NV, Tang Q, Arab DA, Wang Z, Tokuda G, Hongoh Y, Ohkuma M (2020) Increased mutation rate is linked to genome reduction in prokaryotes. Current Biology 30 (19):3848–3855. e3844 Bourguignon T, Lo N, Cameron SL, Šobotník J, Hayashi Y, Shigenobu S, Watanabe D, Roisin Y, Miura T, Evans TA (2014) The evolutionary history of termites as inferred from 66 mitochondrial genomes. Molecular Biology and Evolution 32 (2):406–421 Bourguignon T, Lo N, Šobotník J, Ho SY, Iqbal N, Coissac E, Lee M, Jendryka MM, Sillam-Dussès D, Křížková B (2017) Mitochondrial phylogenomics resolves the global spread of higher termites, ecosystem engineers of the tropics. 
Molecular Biology and Evolution 34 (3):589–597 Bourguignon T, Lo N, Šobotník J, Sillam-Dussès D, Roisin Y, Evans TA (2016) Oceanic dispersal, vicariance and human introduction shaped the modern distribution of the termites Reticulitermes, Heterotermes and Coptotermes. Proceedings of the Royal Society B: Biological Sciences 283 (1827):20160179

Bourguignon T, Tang Q, Ho SYW, Juna F, Wang Z, Arab DA, Cameron SL, Walker J, Rentz D, Evans TA, Lo N (2018) Transoceanic dispersal and plate tectonics shaped global cockroach distributions: evidence from mitochondrial phylogenomics. Molecular Biology and Evolution 35 (4):970–983 Braendle C, Miura T, Bickel R, Shingleton AW, Kambhampati S, Stern DL (2003) Developmental origin and evolution of bacteriocytes in the aphid–Buchnera symbiosis. PLoS Biol 1 (1):e21 Bromham L (2009) Why do species vary in their rate of molecular evolution? Biology letters 5 (3):401–404 Bromham L, Penny D (2003) The modern molecular clock. Nature Reviews Genetics 4 (3):216–224 Bromham L, Rambaut A, Harvey PH (1996) Determinants of rate variation in mammalian DNA sequence evolution. Journal of Molecular Evolution 43 (6):610–621 Brooks MA, Richards AG (1955) Intracellular symbiosis in cockroaches. I. Production of aposymbiotic cockroaches. The Biological Bulletin 109 (1):22–39 Buchner P (1965) Endosymbiosis of animals with plant microorganisms. Interscience, Inc, New York Cameron SL (2014) Insect mitochondrial genomics: implications for evolution and phylogeny. Annual Review of Entomology 59:95–117 Cameron SL, Barker SC, Whiting MF (2006) Mitochondrial genomics and the new insect order Mantophasmatodea. Molecular Phylogenetics and Evolution 38 (1):274–279 Cameron SL, Lo N, Bourguignon T, Svenson GJ, Evans TA (2012) A mitochondrial genome phylogeny of termites (Blattodea: Termitoidae): robust support for interfamilial relationships and molecular synapomorphies define major clades. Molecular Phylogenetics and Evolution 65 (1):163–173 Cameron SL, Whiting MF (2007) Mitochondrial genomic comparisons of the subterranean termites from the Genus Reticulitermes (Insecta: Isoptera: ). Genome 50 (2):188–202 Carpenter F (1966) The Lower insects of Kansas. Part II. The orders Protorthoptera and . 
Psyche: A Journal of Entomology 73 (1):46–88 Cavaliere M, Feng S, Soyer OS, Jiménez JI (2017) Cooperation in microbial communities and their biotechnological applications. Environmental Microbiology 19 (8):2949–2963

Chen A-H (2013) Complete mitochondrial genome of the double-striped cockroach Blattella bisignata (Insecta: Blattaria: Blaberoidea). Mitochondrial DNA 24 (1):14–16 Chen X, Li S, Aksoy S (1999) Concordant evolution of a symbiont with its host insect species: molecular phylogeny of genus Glossina and its bacteriome-associated endosymbiont, Wigglesworthia glossinidia. Journal of Molecular Evolution 48 (1):49–58 Chiel E, Gottlieb Y, Zchori-Fein E, Mozes-Daube N, Katzir N, Inbar M, Ghanim M (2007) Biotype-dependent secondary symbiont communities in sympatric populations of Bemisia tabaci. Bulletin of Entomological Research 97 (4):407 Chong RA, Moran NA (2018) Evolutionary loss and replacement of Buchnera, the obligate endosymbiont of aphids. The ISME Journal 12 (3):898–908 Clark JW, Hossain S, Burnside CA, Kambhampati S (2001) Coevolution between a cockroach and its bacterial endosymbiont: a biogeographical perspective. Proceedings of the Royal Society B 268 (1465):393–398 Clark JW, Kambhampati S (2003) Phylogenetic analysis of Blattabacterium, endosymbiotic bacteria from the wood roach, Cryptocercus (Blattodea: Cryptocercidae), including a description of three new species. Molecular Phylogenetics and Evolution 26 (1):82–88 Clark MA, Moran NA, Baumann P, Wernegreen JJ (2000) Cospeciation between bacterial endosymbionts (Buchnera) and a recent radiation of aphids (Uroleucon) and pitfalls of testing for phylogenetic congruence. Evolution 54 (2):517–525 Clayton AL, Oakeson KF, Gutin M, Pontes A, Dunn DM, von Niederhausern AC, Weiss RB, Fisher M, Dale C (2012) A novel human-infection-derived bacterium provides insights into the evolutionary origins of mutualistic insect–bacterial symbioses. PLoS Genet 8 (11):e1002990 Cochran DG (2009) Blattodea:(cockroaches) Encyclopedia of insects. Elsevier, pp. 108–112 Cockerell T (1920) XXXVI.—Fossil in the British Museum.—I. Annals and Magazine of Natural History 5 (27):273–279 Cornwell PB (1968) The Cockroach. 
Hutchinson, London Darriba D, Taboada GL, Doallo R, Posada D (2011) ProtTest 3: fast selection of best- fit models of protein evolution. Bioinformatics 27 (8):1164–1165 Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9 (8):772

Dasch G (1984) Endosymbionts of insects. Bergey's Manual of Systematic Bacteriology 1 Davies TJ, Savolainen V, Chase MW, Moat J, Barraclough TG (2004) Environmental energy and evolutionary rates in flowering plants. Proceedings of the Royal Society of London. Series B: Biological Sciences 271 (1553):2195–2200 de Moraes LA, Muller C, de Freitas Bueno RCO, Santos A, Bello VH, De Marchi BR, Watanabe LFM, Marubayashi JM, Santos BR, Yuki VA (2018) Distribution and phylogenetics of whiteflies and their endosymbiont relationships after the Mediterranean species invasion in Brazil. Scientific Reports 8 (1):1–13 Degnan PH, Lazarus AB, Brock CD, Wernegreen JJ (2004) Host–symbiont stability and fast evolutionary rates in an ant–bacterium association: cospeciation of Camponotus species and their endosymbionts, Candidatus Blochmannia. Systematic Biology 53 (1):95– 110 Degnan PH, Yu Y, Sisneros N, Wing RA, Moran NA (2009) Hamiltonella defensa, genome evolution of protective bacterial endosymbiont from pathogenic ancestors. Proceedings of the National Academy of Sciences 106 (22):9063–9068 Dickerson RE (1971) The structure of cytochromec and the rates of molecular evolution. Journal of Molecular Evolution 1 (1):26–45 Djernæs M, Klass K-D, Eggleton P (2015) Identifying possible sister groups of Cryptocercidae+ Isoptera: A combined molecular and morphological phylogeny of Dictyoptera. Molecular Phylogenetics and Evolution 84:284–303 Djernaes M, KLASS KD, Picker MD, Damgaard J (2012) Phylogeny of cockroaches (Insecta, Dictyoptera, Blattodea), with placement of aberrant taxa and exploration of out‐ group sampling. Systematic Entomology 37 (1):65–83 Donnellan J, Kilby B (1967) Uric acid metabolism by symbiotic bacteria from the fat body of Periplaneta americana. Comparative Biochemistry and Physiology 22 (1):235–252 Douglas A (1989) Mycetocyte symbiosis in insects. 
Biological Reviews 64 (4):409– 434 Douglas A (1998) Nutritional interactions in insect-microbial symbioses: aphids and their symbiotic bacteria Buchnera. Annual Review of Entomology 43 (1):17–37 Douglas AE (2010) The Symbiotic Habit. Princeton University Press, Princeton, NJ Drake JW, Charlesworth B, Charlesworth D, Crow JF (1998) Rates of spontaneous mutation. Genetics 148 (4):1667–1686

Drummond AJ, Suchard MA, Xie D, Rambaut A (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution 29 (8):1969–1973 Duncan RP, Husnik F, Van Leuven JT, Gilbert DG, Dávalos LM, McCutcheon JP, Wilson AC (2014) Dynamic recruitment of amino acid transporters to the insect/symbiont interface. Molecular Ecology 23 (6):1608–1623 Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5 (1):113 Edgecombe GD (2010) phylogeny: an overview from the perspectives of , molecular data and the fossil record. Arthropod Structure & Development 39 (2-3):74–87 Emerson AE (1971) Tertiary fossil species of the Rhinotermitidae (Isoptera), phylogeny of genera, and reciprocal phylogeny of associated Flagellata (Protozoa) and the Staphylinidae (Coleoptera). . Bulletin of the AMNH 146 Engel MS, Grimaldi DA, Krishna K (2007) Primitive termites from the early Cretaceous of Asia (Isoptera). Staatliches Museum für Naturkunde Engel MS, Grimaldi DA, Nascimbene PC, Singh H (2011) The termites of Early Eocene Cambay amber, with the earliest record of the (Isoptera). ZooKeys(148):105 Evangelista DA, Djernaes M, Kohli MK (2017) Fossil calibrations for the cockroach phylogeny (Insecta, Dictyoptera, Blattodea), comments on the use of wings for their identification, and a redescription of the oldest Blaberidae. Palaeontologia Electronica 20 (1FC):1–23 Evangelista DA, Wipfler B, Béthoux O, Donath A, Fujita M, Kohli MK, Legendre F, Liu S, Machida R, Misof B, Peters RS (2019) An integrative phylogenomic approach illuminates the evolutionary history of cockroaches and termites (Blattodea). Proc. R. Soc. B 286 (1895):20182076 Fares MA, Ruiz-González MX, Moya A, Elena SF, Barrio E (2002) GroEL buffers against deleterious mutations. Nature 417 (6887):398–398 Fox G, Stackebrandt E, Hespell R, Gibson J, Maniloff J, Dyer T, Wolfe R, Balch W, Tanner R, Magrum L (1980) The phylogeny of prokaryotes. 
Science 209 (4455):457–463 Freckleton RP (2000) Phylogenetic tests of ecological and evolutionary hypotheses: checking for phylogenetic independence. Functional Ecology 14 (1):129–134

Fromont C, Riegler M, Cook JM (2016) Phylogeographic analyses of bacterial endosymbionts in fig homotomids (Hemiptera: Psylloidea) reveal codiversification of both primary and secondary endosymbionts. FEMS Microbiology Ecology 92 (12) Fukatsu T, Ishikawa H (1996) Phylogenetic position of yeast-like symbiont of Hamiltonaphis styraci (Homoptera, ) based on 18S rDNA sequence. Insect Biochemistry and Molecular Biology 26 (4):383–388 Fukatsu T, Nikoh N, Kawai R, Koga R (2000) The secondary endosymbiotic bacterium of the pea aphid Acyrthosiphon pisum (Insecta: Homoptera). Applied and Environmental Microbiology 66 (7):2748–2758 Funk DJ, Wernegreen JJ, Moran NA (2001) Intraspecific variation in symbiont genomes: bottlenecks and the aphid-Buchnera association. Genetics 157 (2):477–489 Garrick RC, Sabree ZL, Jahnes BC, Oliver JC (2017) Strong spatial‐genetic congruence between a wood‐feeding cockroach and its bacterial endosymbiont, across a topographically complex landscape. Journal of Biogeography 44 (7):1500–1511 Garwood R, Sutton M (2010) X-ray micro-tomography of Carboniferous stem- Dictyoptera: new insights into early insects. Biology Letters 6 (5):699–702 Gier HT (1936) The morphology and behavior of the intracellular bacteroids of roaches. The Biological Bulletin 71 (3):433–452 Goodman M, Porter CA, Czelusniak J, Page SL, Schneider H, Shoshani J, Gunnell G, Groves CP (1998) Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Molecular Phylogenetics and Evolution 9 (3):585–598 Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next- generation sequencing technologies. Nature Reviews Genetics 17 (6):333 Grandcolas P (1996) The phylogeny of cockroach families: a cladistic appraisal of morpho-anatomical data. Canadian Journal of Zoology 74 (3):508–527 Griffiths GW, Beck SD (1973) Intracellular symbiotes of the pea aphid, Acyrthosiphon pisum. 
Journal of Insect Physiology 19 (1):75–84 Griffiths GW, Beck SD (1974) Effects of antibiotics on intracellular symbiotes in the pea aphid, Acyrthosiphon pisum. Cell and Tissue Research 148 (3):287–300 Grimaldi D, Engel MS (2005) Evolution of the Insects. Cambridge University Press, Cambridge, UK

Grimaldi DA, Engel MS, Krishna K (2008) The species of Isoptera (Insecta) from the early Cretaceous Crato Formation: a revision. American Museum Novitates 2008 (3626):1– 30 Hall AA, Morrow JL, Fromont C, Steinbauer MJ, Taylor GS, Johnson SN, Cook JM, Riegler M (2016) Codivergence of the primary bacterial endosymbiont of psyllids versus host switches and replacement of their secondary bacterial endosymbionts. Environmental Microbiology 18 (8):2591–2603 Hansen AK, Moran NA (2011) Aphid genome expression reveals host–symbiont cooperation in the production of amino acids. Proceedings of the National Academy of Sciences 108 (7):2849–2854 Haynes S, Darby A, Daniell T, Webster G, Van Veen F, Godfray H, Prosser JI, Douglas A (2003) Diversity of bacteria associated with natural aphid populations. Applied and Environmental Microbiology 69 (12):7216–7223 Hedges SB, Dudley J, Kumar S (2006) TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22 (23):2971–2972 Heyworth ER, Ferrari J (2016) Heat stress affects facultative symbiont-mediated protection from a parasitoid wasp. PLOS ONE 11 (11):e0167180 Hinde R (1971) The control of the mycetome symbiotes of the aphids Brevicoryne brassicae, Myzus persicae, and Macrosiphum rosae. Journal of Insect Physiology 17 (9):1791–1800 Hinrich J, Schulenburg GV, Hurst GD, Tetzlaff D, Booth GE, Zakharov IA, Majerus ME (2002) History of infection with different male-killing bacteria in the two-spot ladybird Adalia bipunctata revealed through mitochondrial DNA sequence analysis. Genetics 160 (3):1075-1086 Ho SY (2020) The Molecular Clock and Evolutionary Rates Across the Tree of Life The Molecular Evolutionary Clock. Springer, pp. 3–23 Ho SYW, Lo N (2013) The insect molecular clock. Australian Journal of Entomology 52 (2):101–105 Houk E, and, Griffiths GW (1980) Intracellular symbiotes of the Homoptera. 
Annual Review of Entomology 25 (1):161–187 Hua J, Smith DR, Borza T, Lee RW (2012) Similar relative mutation rates in the three genetic compartments of Mesostigma and Chlamydomonas. Protist 163 (1):105–115

Huang CY, Sabree ZL, Moran NA (2012) Genome sequence of Blattabacterium sp. strain BGIGA, endosymbiont of the giganteus cockroach. Journal of Bacteriology 194 (16):4450–4451 Huang M, Wang Y, Liu X, Li W, Kang Z, Wang K, Li X, Yang D (2015) The complete mitochondrial genome and its remarkable secondary structure for a stonefly Acroneuria hainana Wu (Insecta: , Perlidae). Gene 557 (1):52–60 Husnik F, Nikoh N, Koga R, Ross L, Duncan RP, Fujie M, Tanaka M, Satoh N, Bachtrog D, Wilson AC (2013) Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis. Cell 153 (7):1567–1578 Inoue T, Kitade O, Yoshimura T, Yamaoka I (2000) Symbiotic associations with protists Termites: evolution, sociality, symbioses, ecology. Springer, pp. 275–288 Inward D, Beccaloni G, Eggleton P (2007) Death of an order: a comprehensive molecular phylogenetic study confirms that termites are eusocial cockroaches. Biology Letters 3 (3):331–335 Ishikawa H, Yamaji M (1985) Symbionin, an aphid endosymbiont-specific protein—I: Production of insects deficient in symbiont. Insect Biochemistry 15 (2):155–163 Jombart T, Balloux F, Dray S (2010) Adephylo: new tools for investigating the phylogenetic signal in biological traits. Bioinformatics 26 (15):1907–1909 Kaltenpoth M, Göttler W, Herzner G, Strohm E (2005) Symbiotic bacteria protect wasp larvae from fungal infestation. Current Biology 15 (5):475–479 Kambhampati S (1995) A phylogeny of cockroaches and related insects based on DNA sequence of mitochondrial ribosomal RNA genes. Proceedings of the National Academy of Sciences 92 (6):2017–2020 Kambhampati S, Alleman A, Park Y (2013) Complete genome sequence of the endosymbiont Blattabacterium from the cockroach Nauphoeta cinerea (Blattodea: Blaberidae). Genomics 102 (5):479–483 Kassambara A (2018) ggpubr:“ggplot2” based publication ready plots. R package version 0.2. 
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution 30 (4):772–780 Kikuchi Y, Hayatsu M, Hosokawa T, Nagayama A, Tago K, Fukatsu T (2012) Symbiont-mediated insecticide resistance. Proceedings of the National Academy of Sciences 109 (22):8618–8622

Kimura M, Ohta T (1971) On the rate of molecular evolution. Journal of Molecular Evolution 1 (1):1–17 Kinjo Y, Bourguignon T, Tong KJ, Kuwahara H, Lim SJ, Yoon KB, Shigenobu S, Park YC, Nalepa CA, Hongoh Y, Ohkuma M (2018) Parallel and gradual genome erosion in the Blattabacterium endosymbionts of Mastotermes darwiniensis and Cryptocercus wood roaches. Genome Biology and Evolution 10 (6):1622–1630 Kitade O (2004) Comparison of symbiotic flagellate faunae between termites and a wood-feeding cockroach of the genus Cryptocercus. Microbes and Environments 19 (3):215– 220 Klass K-D, Meier R (2006) A phylogenetic analysis of Dictyoptera (Insecta) based on morphological characters. Entomologische Abhandlungen 63 (1-2):3–50 Klass K-D, Nalepa C, Lo N (2008) Wood-feeding cockroaches as models for termite evolution (Insecta: Dictyoptera): Cryptocercus vs. Parasphaeria boleiriana. Molecular Phylogenetics and Evolution 46 (3):809–817 Koga R, Meng X-Y, Tsuchida T, Fukatsu T (2012) Cellular mechanism for selective vertical transmission of an obligate insect symbiont at the bacteriocyte–embryo interface. Proceedings of the National Academy of Sciences 109 (20):E1230–E1237 Koga R, Moran NA (2014) Swapping symbionts in spittlebugs: evolutionary replacement of a reduced genome symbiont. The ISME Journal 8 (6):1237–1246 Kômoto N, Yukuhiro K, Ueda K, Tomita S (2011) Exploring the molecular phylogeny of phasmids with whole mitochondrial genome sequences. Molecular Phylogenetics and Evolution 58 (1):43–52 Krishna K, Grimaldi D, Krishna V, Engel M (2013) Treatise on the Isoptera of the world: Vol. 4 Termitidae. Bulletin of the American Museum of Natural History 377:977– 1423 Kristensen NP (1981) Phylogeny of insect orders. Annual Review of Entomology 26 (1):135–157 Kumar S, Stecher G, Tamura K (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular Biology and Evolution 33 (7):1870–1874 Labandeira CC (1994) A compendium of fossil insect families. 
In: Watkins R (ed) Contributions in Biology and Geology. Milwaukee Public Museum, Wisconsin, pp. 1–71 Lanfear R, Calcott B, Ho SY, Guindon S (2012) PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29 (6):1695–1701

Lanfear R, Kokko H, Eyre-Walker A (2014) Population size and the rate of evolution. Trends in Ecology & Evolution 29 (1):33–41 Lanham U (1968) The Blochmann bodies: hereditary intracellular symbionts of insects. Biological Reviews 43 (3):269–286 Lefevre C, Charles H, Vallier A, Delobel B, Farrell B, Heddi A (2004) Endosymbiont phylogenesis in the Dryophthoridae weevils: evidence for bacterial replacement. Molecular Biology and Evolution 21 (6):965–973 Legendre F, Grandcolas P, Thouzé F (2017) Molecular phylogeny of Blaberidae (Dictyoptera, Blattodea) with implications for and evolutionary studies. European Journal of Taxonomy(291) Legendre F, Nel A, Svenson GJ, Robillard T, Pellens R, Grandcolas P (2015) Phylogeny of Dictyoptera: dating the origin of cockroaches, praying mantises and termites with molecular data and controlled fossil evidence. PLOS ONE 10 (7):e0130127 Legendre P, Desdevises Y, Bazin E (2002) A statistical test for host–parasite coevolution. Systematic Biology 51 (2):217–234 Lin Q (1980) Fossil insects from the Mesozoic of Zhejiang and Anhui provinces. Divisions and Correlations of the Meso zoic Volcano-Sedimentary Strata in Zhejiang and Anhui Provinces:211–234 Liu L, Huang X, Zhang R, Jiang L, Qiao G (2013) Phylogenetic congruence between Mollitrichosiphum (Aphididae: Greenideinae) and Buchnera indicates insect–bacteria parallel evolution. Systematic Entomology 38 (1):81–92 Lo N, Bandi C, Watanabe H, Nalepa C, Beninati T (2003) Evidence for cocladogenesis between diverse dictyopteran lineages and their intracellular endosymbionts. Molecular Biology and Evolution 20 (6):907–913 Lo N, Beninati T, Stone F, Walker J, Sacchi L (2007a) Cockroaches that lack Blattabacterium endosymbionts: the phylogenetically divergent genus Nocticola. Biology Letters 3 (3):327–330 Lo N, Eggleton P (2010) Termite phylogenetics and co-cladogenesis with symbionts Biology of termites: a modern synthesis. Springer, pp. 
27–50 Lo N, Engel MS, Cameron S, Nalepa CA, Tokuda G, Grimaldi D, Kitade O, Krishna K, Klass K-D, Maekawa K (2007b) Save Isoptera: A comment on Inward et al. Biology Letters 3 (5):562–563 López-Sánchez MJ, Neef A, Peretó J, Patiño-Navarrete R, Pignatelli M, Latorre A, Moya A (2009) Evolutionary convergence and nitrogen metabolism in Blattabacterium strain
Bge, primary endosymbiont of the cockroach Blattella germanica. PLoS Genet 5 (11):e1000721 Lourenço JM, Glémin S, Chiari Y, Galtier N (2013) The determinants of the molecular substitution process in turtles. Journal of Evolutionary Biology 26 (1):38–50 Luan J-B, Chen W, Hasegawa DK, Simmons AM, Wintermantel WM, Ling K-S, Fei Z, Liu S-S, Douglas AE (2015) Metabolic coevolution in the bacterial symbiosis of whiteflies and related plant sap-feeding insects. Genome Biology and Evolution 7 (9):2635–2647 Lynch M, Walsh B (2007) The origins of genome architecture. Sinauer Associates Sunderland, MA Malke H (1964) Production of aposymbiotic cockroaches by means of lysozyme. Nature 204 (4964):1223–1224 Mao M, Bennett GM (2020) Symbiont replacements reset the co-evolutionary relationship between insects and their heritable bacteria. The ISME Journal:1–12 Mao M, Yang X, Bennett GM (2018) Evolution of host support for two ancient bacterial symbionts with differentially degraded genomes in a leafhopper host. Proceedings of the National Academy of Sciences 115 (50):E11691–E11700 Margulis L, Fester R (1991) Symbiosis as a source of evolutionary innovation: speciation and morphogenesis. MIT Press, Cambridge, Massachusetts Martin AP (1999) Substitution rates of organelle and nuclear genes in sharks: implicating metabolic rate (again). Molecular Biology and Evolution 16 (7):996–1002 Martin AP, Palumbi SR (1993) Body size, metabolic rate, generation time, and the molecular clock. Proceedings of the National Academy of Sciences 90 (9):4087–4091 Matsuura Y, Moriyama M, Łukasik P, Vanderpool D, Tanahashi M, Meng X-Y, McCutcheon JP, Fukatsu T (2018) Recurrent symbiont recruitment from fungal parasites in cicadas. Proceedings of the National Academy of Sciences 115 (26):E5970–E5979 McCutcheon JP, Moran NA (2007) Parallel genomic evolution and metabolic interdependence in an ancient symbiosis. 
Proceedings of the National Academy of Sciences 104 (49):19392–19397 McCutcheon JP, Moran NA (2010) Functional convergence in reduced genomes of bacterial symbionts spanning 200 My of evolution. Genome Biology and Evolution 2:708– 718 McCutcheon JP, Moran NA (2012) Extreme genome reduction in symbiotic bacteria. Nature Reviews Microbiology 10 (1):13–26

McCutcheon JP, Von Dohlen CD (2011) An interdependent metabolic patchwork in the nested symbiosis of mealybugs. Current Biology 21 (16):1366–1372
Mira A, Moran NA (2002) Estimating population size and transmission bottlenecks in maternally transmitted endosymbiotic bacteria. Microbial Ecology 44 (2):137–143
Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, Frandsen PB, Ware J, Flouri T, Beutel RG (2014) Phylogenomics resolves the timing and pattern of insect evolution. Science 346 (6210):763–767
Mitter C, Farrell B, Wiegmann B (1988) The phylogenetic study of adaptive zones: has phytophagy promoted insect diversification? The American Naturalist 132 (1):107–128
Monnin D, Jackson R, Kiers ET, Bunker M, Ellers J, Henry LM (2020) Parallel evolution in the integration of a co-obligate aphid symbiosis. Current Biology 30 (10):1949–1957.e1946
Moran NA (1996) Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proceedings of the National Academy of Sciences 93 (7):2873–2878
Moran NA, Bennett GM (2014) The tiniest tiny genomes. Annual Review of Microbiology 68:195–215
Moran NA, McCutcheon JP, Nakabachi A (2008) Genomics and evolution of heritable bacterial symbionts. Annual Review of Genetics 42:165–190
Moran NA, Munson MA, Baumann P, Ishikawa H (1993) A molecular clock in endosymbiotic bacteria is calibrated using the insect hosts. Proceedings of the Royal Society of London B: Biological Sciences 253 (1337):167–171
Moran NA, Plague GR, Sandström JP, Wilcox JL (2003) A genomic perspective on nutrient provisioning by bacterial symbionts of insects. Proceedings of the National Academy of Sciences 100 (suppl 2):14543–14548
Moran NA, Russell JA, Koga R, Fukatsu T (2005a) Evolutionary relationships of three new species of Enterobacteriaceae living as symbionts of aphids and other insects. Applied and Environmental Microbiology 71 (6):3302–3310
Moran NA, Tran P, Gerardo NM (2005b) Symbiosis and insect diversification: an ancient symbiont of sap-feeding insects from the bacterial phylum Bacteroidetes. Applied and Environmental Microbiology 71 (12):8802–8810
Morrow JL, Hall AA, Riegler M (2017) Symbionts in waiting: the dynamics of incipient endosymbiont complementation and replacement in minimal bacterial communities of psyllids. Microbiome 5 (1):1–23


Moya A, Peretó J, Gil R, Latorre A (2008) Learning how to live together: genomic insights into prokaryote–animal symbioses. Nature Reviews Genetics 9 (3):218–229
Munson M, Baumann P, Clark M, Baumann L, Moran N, Voegtlin D, Campbell B (1991) Evidence for the establishment of aphid-eubacterium endosymbiosis in an ancestor of four aphid families. Journal of Bacteriology 173 (20):6321–6324
Murienne J (2009) Molecular data confirm family status for the Tryonicus–Lauraesilpha group (Insecta: Blattodea: Tryonicidae). Organisms Diversity & Evolution 9 (1):44–51
Nakabachi A, Shigenobu S, Sakazume N, Shiraki T, Hayashizaki Y, Carninci P, Ishikawa H, Kudo T, Fukatsu T (2005) Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera. Proceedings of the National Academy of Sciences 102 (15):5477–5482
Nakabachi A, Ueoka R, Oshima K, Teta R, Mangoni A, Gurgui M, Oldham NJ, van Echten-Deckert G, Okamura K, Yamamoto K (2013) Defensive bacteriome symbiont with a drastically reduced genome. Current Biology 23 (15):1478–1484
Nalepa CA (2015) Origin of termite eusociality: trophallaxis integrates the social, nutritional, and microbial environments. Ecological Entomology 40 (4):323–335
Nalepa CA, Bandi C (2000) Characterizing the ancestors: paedomorphosis and termite evolution. In: Termites: evolution, sociality, symbioses, ecology. Springer, pp. 53–75
Neef A, Latorre A, Pereto J, Silva FJ, Pignatelli M, Moya A (2011) Genome economization in the endosymbiont of the wood roach due to drastic loss of amino acid synthesis capabilities. Genome Biology and Evolution 3:1437–1448
Nováková E, Hypša V, Klein J, Foottit RG, von Dohlen CD, Moran NA (2013) Reconstructing the phylogeny of aphids (Hemiptera: Aphididae) using DNA of the obligate symbiont Buchnera aphidicola. Molecular Phylogenetics and Evolution 68 (1):42–54
Ohta T (1987) Very slightly deleterious mutations and the molecular clock. Journal of Molecular Evolution 26 (1-2):1–6
Oliver KM, Degnan PH, Hunter MS, Moran NA (2009) Bacteriophages encode factors required for protection in a symbiotic mutualism. Science 325 (5943):992–994
Oliver KM, Russell JA, Moran NA, Hunter MS (2003) Facultative bacterial symbionts in aphids confer resistance to parasitic wasps. Proceedings of the National Academy of Sciences 100 (4):1803–1807


Ortiz-Rivas B, Martínez-Torres D (2010) Combination of molecular data support the existence of three main lineages in the phylogeny of aphids (Hemiptera: Aphididae) and the position of the subfamily Lachninae. Molecular Phylogenetics and Evolution 55 (1):305–317
Paradis E, Schliep K (2018) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35 (3):526–528
Parham JF, Donoghue PC, Bell CJ, Calway TD, Head JJ, Holroyd PA, Inoue JG, Irmis RB, Joyce WG, Ksepka DT (2012) Best practices for justifying fossil calibrations. Systematic Biology 61 (2):346–359
Patiño-Navarrete R, Moya A, Latorre A, Peretó J (2013) Comparative genomics of Blattabacterium cuenoti: the frozen legacy of an ancient endosymbiont genome. Genome Biology and Evolution 5 (2):351–361
Patino-Navarrete R, Piulachs M-D, Belles X, Moya A, Latorre A, Peretó J (2014) The cockroach Blattella germanica obtains nitrogen from uric acid through a metabolic pathway shared with its bacterial endosymbiont. Biology Letters 10 (7):20140407
Peccoud J, Simon J-C, McLaughlin HJ, Moran NA (2009) Post-Pleistocene radiation of the pea aphid complex revealed by rapidly evolving endosymbionts. Proceedings of the National Academy of Sciences 106 (38):16315–16320
Piton LE (1940) Paléontologie du gisement Éocene de Menat.
Price DR, Duncan RP, Shigenobu S, Wilson AC (2011) Genome expansion and differential expression of amino acid transporters at the aphid/Buchnera symbiotic interface. Molecular Biology and Evolution 28 (11):3113–3126
Price DR, Tibbles K, Shigenobu S, Smertenko A, Russell CW, Douglas AE, Fitches E, Gatehouse AM, Gatehouse J (2010) Sugar transporters of the major facilitator superfamily in aphids; from gene prediction to functional characterization. Insect Molecular Biology 19:97–112
R Core Team (2018) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Raffa KF, Aukema BH, Bentz BJ, Carroll AL, Hicke JA, Turner MG, Romme WH (2008) Cross-scale drivers of natural disturbances prone to anthropogenic amplification: the dynamics of bark beetle eruptions. Bioscience 58 (6):501–517
Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA (2018) Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Systematic Biology 67 (5):901


Ren Z, Harris A, Dikow RB, Ma E, Zhong Y, Wen J (2017) Another look at the phylogenetic relationships and intercontinental biogeography of eastern Asian–North American Rhus gall aphids (Hemiptera: Aphididae: Eriosomatinae): Evidence from mitogenome sequences via genome skimming. Molecular Phylogenetics and Evolution 117:102–110
Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61 (3):539–542
Sabater-Muñoz B, Toft C, Alvarez-Ponce D, Fares MA (2017) Chance and necessity in the genome evolution of endosymbiotic bacteria of insects. The ISME Journal 11 (6):1291–1304
Sabree ZL, Huang CY, Arakawa G, Tokuda G, Lo N, Watanabe H, Moran NA (2012) Genome shrinkage and loss of nutrient-providing potential in the obligate symbiont of the primitive termite Mastotermes darwiniensis. Applied and Environmental Microbiology 78 (1):204–210
Sabree ZL, Kambhampati S, Moran NA (2009) Nitrogen recycling and nutritional provisioning by Blattabacterium, the cockroach endosymbiont. Proceedings of the National Academy of Sciences 106 (46):19521–19526
Sacchi L, Corona S, Grigolo A, Laudani U, Selmi MG, Bigliardi E (1996) The fate of the endocytobionts of Blattella germanica (Blattaria: Blattellidae) and Periplaneta americana (Blattaria: Blattidae) during embryo development. Italian Journal of Zoology 63 (1):1–11
Salem H, Bauer E, Kirsch R, Berasategui A, Cripps M, Weiss B, Koga R, Fukumori K, Vogel H, Fukatsu T (2017) Drastic genome reduction in an herbivore’s pectinolytic symbiont. Cell 171 (7):1520–1531.e1513
Sarich VM, Wilson AC (1967) Immunological time scale for hominid evolution. Science 158 (3805):1200–1203
Scanlan DJ, Ostrowski M, Mazard S, Dufresne A, Garczarek L, Hess WR, Post AF, Hagemann M, Paulsen I, Partensky F (2009) Ecological genomics of marine picocyanobacteria. Microbiology and Molecular Biology Reviews 73 (2):249–299
Schaefer CW, Panizzi AR (2000) Heteroptera of economic importance. CRC press
Schröder D, Deppisch H, Obermayer M, Krohne G, Stackebrandt E, Hölldobler B, Goebel W, Gross R (1996) Intracellular endosymbiotic bacteria of Camponotus species (carpenter ants): systematics, evolution and ultrastructural characterization. Molecular Microbiology 21 (3):479–489
Scudder S (1868) Supplement to descriptions of Articulates. Description of fossil insects found on Mazon Creek and near Morris, Grundy Co., Ill. Geological Survey of Illinois 3:566–572
Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30 (14):2068–2069
Sendi H, Azar D (2017) New aposematic and presumably repellent bark cockroach from Lebanese amber. Cretaceous Research 72:13–17
Sheldon FH, Jones CE, McCracken KG (2000) Relative patterns and rates of evolution in heron nuclear and mitochondrial DNA. Molecular Biology and Evolution 17 (3):437–450
Shelford R (1910) On a collection of Blattidae preserved in amber, from Prussia. Zoological Journal of the Linnean Society 30 (201):336–355
Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H (2000) Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407 (6800):81–86
Silva FJ, Latorre A, Moya A (2001) Genome size reduction through multiple events of gene disintegration in Buchnera APS. TRENDS in Genetics 17 (11):615–618
Sintupachee S, Milne J, Poonchaisri S, Baimai V, Kittayapong P (2006) Closely related Wolbachia strains within the pumpkin arthropod community and the potential for horizontal transmission via the plant. Microbial Ecology 51 (3):294–301
Sloan DB, Alverson AJ, Wu M, Palmer JD, Taylor DR (2012) Recent acceleration of plastid sequence and structural evolution coincides with extreme mitochondrial divergence in the angiosperm genus Silene. Genome Biology and Evolution 4 (3):294–306
Sloan DB, Nakabachi A, Richards S, Qu J, Murali SC, Gibbs RA, Moran NA (2014) Parallel histories of horizontal gene transfer facilitated extreme reduction of endosymbiont genomes in sap-feeding insects. Molecular Biology and Evolution 31 (4):857–871
Smith AB, Peterson KJ (2002) Dating the time of origin of major clades: molecular clocks and the fossil record. Annual Review of Earth and Planetary Sciences 30 (1):65–88
Smith DR, Lee RW (2010) Low nucleotide diversity for the expanded organelle and nuclear genomes of Volvox carteri supports the mutational-hazard hypothesis. Molecular Biology and Evolution 27 (10):2244–2256


Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30 (9):1312–1313
Svenson GJ, Whiting MF (2004) Phylogeny of Mantodea based on molecular data: evolution of a charismatic predator. Systematic Entomology 29 (3):359–370
Takiya DM, Tran PL, Dietrich CH, Moran NA (2006) Co‐cladogenesis spanning three phyla: leafhoppers (Insecta: Hemiptera: Cicadellidae) and their dual bacterial symbionts. Molecular Ecology 15 (13):4175–4191
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28 (10):2731–2739
Thao ML, Baumann P (2004a) Evidence for multiple acquisition of Arsenophonus by whitefly species (: Aleyrodidae). Current Microbiology 48 (2):140–144
Thao ML, Baumann P (2004b) Evolutionary relationships of primary prokaryotic endosymbionts of whiteflies and their hosts. Applied and Environmental Microbiology 70 (6):3401–3406
Thao ML, Clark MA, Baumann L, Brennan EB, Moran NA, Baumann P (2000a) Secondary endosymbionts of psyllids have been acquired multiple times. Current Microbiology 41 (4):300–304
Thao ML, Moran NA, Abbot P, Brennan EB, Burckhardt DH, Baumann P (2000b) Cospeciation of psyllids and their primary prokaryotic endosymbionts. Applied and Environmental Microbiology 66 (7):2898–2905
Tian X, Liu J, Cui Y, Dong P, Zhu Y (2017) Mitochondrial genome of one kind of giant Asian mantis, Hierodula formosana (Mantodea: Mantidae). Mitochondrial DNA Part A 28 (1):11–12
Tokuda G, Elbourne LD, Kinjo Y, Saitoh S, Sabree Z, Hojo M, Yamada A, Hayashi Y, Shigenobu S, Bandi C (2013) Maintenance of essential amino acid synthesis pathways in the Blattabacterium cuenoti symbiont of a wood-feeding cockroach. Biology Letters 9 (3):20121153
Tokuda G, Isagawa H, Sugio K (2012) The complete mitogenome of the Formosan termite, Coptotermes formosanus Shiraki. Insectes Sociaux 59 (1):17–24
Tong KJ, Duchêne S, Ho SY, Lo N (2015) Comment on “Phylogenomics resolves the timing and pattern of insect evolution”. Science 349 (6247):487–487


Tsuchida T, Koga R, Shibao H, Matsumoto T, Fukatsu T (2002) Diversity and geographic distribution of secondary endosymbiotic bacteria in natural populations of the pea aphid, Acyrthosiphon pisum. Molecular Ecology 11 (10):2123–2135
Urban JM, Cryan JR (2012) Two ancient bacterial endosymbionts have coevolved with the planthoppers (Insecta: Hemiptera: Fulgoroidea). BMC Evolutionary Biology 12 (1):1–19
Vicente CSL, Mondal SI, Akter A, Ozawa S, Kikuchi T, Hasegawa K (2018) Genome analysis of new Blattabacterium spp., obligatory endosymbionts of Periplaneta fuliginosa and P. japonica. PLOS ONE 13 (7):e0200512
Vršanský P (1997) Piniblattella gen. nov.—the most ancient genus of the family Blattellidae (Blattodea) from the Lower Cretaceous of Siberia. Entomological Problems 28 (1):67–79
Vršanský P (2002) Origin and the early evolution of mantises. AMBA Projekty 6 (1):1–16
Vršanský P, Šmídová L, Valaška D, Barna P, Vidlička L, Takáč P, Pavlik L, Kúdelová T, Karim T, Zelagin D (2016) Origin of origami cockroach reveals long-lasting (11 Ma) phenotype instability following viviparity. The Science of Nature 103 (9):1–15
Vršanský P, Vidlička L, Barna P, Bugdaeva E, Markevich V (2013) Paleocene origin of the cockroach families Blaberidae and Corydiidae: Evidence from Amur River region of Russia. Zootaxa 3635 (2):117–126
Wang Z, Shi Y, Qiu Z, Che Y, Lo N (2017) Reconstructing the phylogeny of Blattodea: robust support for interfamilial relationships and major clades. Scientific Reports 7 (1):1–8
Ware JL, Grimaldi DA, Engel MS (2010) The effects of fossil placement and calibration on divergence times and rates: an example from the termites (Insecta: Isoptera). Arthropod Structure & Development 39 (2-3):204–219
Ware JL, Litman J, Klass KD, Spearman LA (2008) Relationships among the major lineages of Dictyoptera: the effect of outgroup selection on dictyopteran tree topology. Systematic Entomology 33 (3):429–450
Welch JJ, Bromham L (2005) Molecular dating when rates vary. Trends in Ecology & Evolution 20 (6):320–327
Welch JJ, Waxman D (2008) Calculating independent contrasts for the comparative study of substitution rates. Journal of Theoretical Biology 251 (4):667–678


Werren JH (1997) Biology of Wolbachia. Annual Review of Entomology 42 (1):587–609
Wilkinson T, Fukatsu T, Ishikawa H (2003) Transmission of symbiotic bacteria Buchnera to parthenogenetic embryos in the aphid Acyrthosiphon pisum (Hemiptera: Aphidoidea). Arthropod Structure & Development 32 (2-3):241–245
Woese CR, Fox GE (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences 74 (11):5088–5090
Wolschin F, Hölldobler B, Gross R, Zientz E (2004) Replication of the endosymbiotic bacterium Blochmannia floridanus is correlated with the developmental and reproductive stages of its ant host. Applied and Environmental Microbiology 70 (7):4096–4102
Woodruff R, Thompson JN, Seeger MA, Spivey WE (1984) Variation in spontaneous mutation and repair in natural population lines of Drosophila melanogaster. 53 (1):223–234
Woolfit M, Bromham L (2003) Increased rates of sequence evolution in endosymbiotic bacteria and fungi with small effective population sizes. Molecular Biology and Evolution 20 (9):1545–1555
Wright S, Keeling J, Gillman L (2006) The road from Santa Rosalia: a faster tempo of evolution in tropical climates. Proceedings of the National Academy of Sciences 103 (20):7718–7722
Wu C-I, Li W-H (1985) Evidence for higher rates of nucleotide substitution in rodents than in man. Proceedings of the National Academy of Sciences 82 (6):1741–1745
Wu D, Daugherty SC, Van Aken SE, Pai GH, Watkins KL, Khouri H, Tallon LJ, Zaborsky JM, Dunbar HE, Tran PL (2006) Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters. PLoS Biol 4 (6):e188
Xia X (2017) DAMBE6: new tools for microbial genomics, phylogenetics, and molecular evolution. Journal of Heredity 108 (4):431–437
Xia X, Lemey P (2009) Assessing substitution saturation with DAMBE. In: Lemey P, Salemi M, Vandamme A-M (eds) The phylogenetic handbook: a practical approach to DNA and protein phylogeny. Cambridge University Press, Cambridge, pp. 615–630
Xia X, Xie Z, Salemi M, Chen L, Wang Y (2003) An index of substitution saturation and its application. Molecular Phylogenetics and Evolution 26 (1):1–7
Xiao B, Chen A-H, Zhang Y-Y, Jiang G-F, Hu C-C, Zhu C-D (2012) Complete mitochondrial genomes of two cockroaches, Blattella germanica and Periplaneta americana, and the phylogenetic position of termites. Current Genetics 58 (2):65–77

Yamauchi M, Miya M, Nishida M (2004) Use of a PCR‐based approach for sequencing whole mitochondrial genomes of insects: two examples (cockroach and dragonfly) based on the method developed for decapod crustaceans. Insect Molecular Biology 13 (4):435–442
Yan Z, Ye G, Werren JH (2019) Evolutionary rate correlation between mitochondrial-encoded and mitochondria-associated nuclear-encoded proteins in insects. Molecular Biology and Evolution 36 (5):1022–1036
Ye F, Lan X-e, Zhu W-b, You P (2016) Mitochondrial genomes of praying mantises (Dictyoptera, Mantodea): rearrangement, duplication, and reassignment of tRNA genes. Scientific Reports 6 (1):1–9
Zchori-Fein E, Brown J (2002) Diversity of prokaryotes associated with Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae). Annals of the Entomological Society of America 95 (6):711–718
Zchori-Fein E, Gottlieb Y, Kelly S, Brown J, Wilson J, Karr T, Hunter M (2001) A newly discovered bacterium associated with parthenogenesis and a change in host selection behavior in parasitoid wasps. Proceedings of the National Academy of Sciences 98 (22):12555–12560
Zchori-Fein E, Perlman SJ, Kelly SE, Katzir N, Hunter MS (2004) Characterization of a ‘Bacteroidetes’ symbiont in Encarsia wasps (: Aphelinidae): proposal of ‘Candidatus Cardinium hertigii’. International Journal of Systematic and Evolutionary Microbiology 54 (3):961–968
Zerbino DR (2010) Using the velvet de novo assembler for short‐read sequencing technologies. Current Protocols in Bioinformatics 31:11.15.11–11.15.12
Zhang Y-y, Xuan W-j, Zhao J-l, Zhu C-d, Jiang G-f (2010) The complete mitochondrial genome of the cockroach (Blattaria: Polyphagidae) and the phylogenetic relationships within the Dictyoptera. Molecular Biology Reports 37 (7):3509–3516
Zhang Z, Schneider JW, Hong Y (2013) The most ancient roach (Blattodea): a new genus and species from the earliest Late Carboniferous (Namurian) of China, with a discussion of the phylomorphogeny of early blattids. Journal of Systematic Palaeontology 11 (1):27–40


Appendix 1: Supplementary material for Chapter 2


Table S2.1: Samples of cockroaches analysed in this chapter, with details of family, sample code, collection locality, date of collection, and GenBank accession numbers. Bolded accession numbers indicate samples sequenced for this study.

| Species | Family | Sample ID | Accession number | Collecting locality | Date |
|---|---|---|---|---|---|
| Locusta migratoria tibetensis | Acrididae, Orthoptera | | NC_015624.1 | GenBank | |
| Anaplecta calosoma | Anaplectidae | Cockroach contig 1688 | MG882215 | Kuranda, Queensland, Australia | 16-Nov-11 |
| Anaplecta omei | Anaplectidae | Anaplecta omei | MG882129 | Mt. Emeishan, Sichuan Province, China | 30-Jun-09 |
| Epilampra maya | Blaberidae | B095 | MG882194 | Arcadia, Florida, USA | 06-Jul-05 |
| Aeluropoda insignis | Blaberidae | B002 | MG882133 | Breeding of Kyle Kandilian | N/A |
| Archimandrita tessellata | Blaberidae | B003 | MG882134 | Breeding of Kyle Kandilian | N/A |
| | Blaberidae | B010 | MG882137 | Breeding of Kyle Kandilian | N/A |
| Blaberus peruvianus | Blaberidae | B004 | MG882135 | Breeding of Kyle Kandilian | N/A |
| Blaberus sp. | Blaberidae | B007 | MG882136 | Breeding of Kyle Kandilian | N/A |
| Blaberus sp. | Blaberidae | B013 | MG882138 | Breeding of Kyle Kandilian | N/A |
| Blaptica dubia | Blaberidae | B056 | MG882164 | Breeding of Kyle Kandilian | N/A |
| Byrsotria cabrerai | Blaberidae | B015 | MG882139 | Breeding of Kyle Kandilian | N/A |
| Byrsotria fumigata | Blaberidae | B016 | MG882140 | Breeding of Kyle Kandilian | N/A |
| Byrsotria rothi | Blaberidae | B017 | MG882141 | Breeding of Kyle Kandilian | N/A |
| Byrsotria sp. | Blaberidae | B018 | MG882142 | Breeding of Kyle Kandilian | N/A |
| | Blaberidae | B020 | MG882143 | Breeding of Kyle Kandilian | N/A |
| Elliptorhina chopardi | Blaberidae | B021 | MG882144 | Breeding of Kyle Kandilian | N/A |
| Elliptorhina davidi | Blaberidae | B022 | MG882145 | Breeding of Kyle Kandilian | N/A |
| Elliptorhina javanica | Blaberidae | B023 | MG882146 | Breeding of Kyle Kandilian | N/A |
| distanti | Blaberidae | B025 | MG882147 | Breeding of Kyle Kandilian | N/A |
| Gromphadorhina grandidieri | Blaberidae | B029 | MG882148 | Breeding of Kyle Kandilian | N/A |
| Gromphadorhina oblongonota | Blaberidae | B031 | MG882149 | Breeding of Kyle Kandilian | N/A |
| Gromphadorhina portentosa | Blaberidae | B032 | MG882150 | Breeding of Kyle Kandilian | N/A |
| Gromphadorhina sp. | Blaberidae | B040 | MG882153 | Breeding of Kyle Kandilian | N/A |
| Gyna caffrorum | Blaberidae | B034 | MG882151 | Breeding of Kyle Kandilian | N/A |
| Gyna capucina | Blaberidae | B035 | MG882152 | Breeding of Kyle Kandilian | N/A |
| Nauphoeta cinerea | Blaberidae | B041 | MG882154 | Breeding of Kyle Kandilian | N/A |
| nivea | Blaberidae | B044 | MG882155 | Breeding of Kyle Kandilian | N/A |
| Phoetalia pallida | Blaberidae | B046 | MG882156 | Breeding of Kyle Kandilian | N/A |
| Pycnoscelus femapterus | Blaberidae | B048 | MG882157 | Breeding of Kyle Kandilian | N/A |
| Pycnoscelus indicus | Blaberidae | B049 | MG882158 | Breeding of Kyle Kandilian | N/A |
| Pycnoscelus nigra | Blaberidae | B050 | MG882159 | Breeding of Kyle Kandilian | N/A |
| Pycnoscelus surinamensis | Blaberidae | B052 | MG882160 | Breeding of Kyle Kandilian | N/A |
| Rhyparobia maderae | Blaberidae | B053 | MG882161 | Breeding of Kyle Kandilian | N/A |
| Rhyparobia sp. | Blaberidae | B054 | MG882162 | Breeding of Kyle Kandilian | N/A |
| Schultesia lampyridiformis | Blaberidae | B055 | MG882163 | Breeding of Kyle Kandilian | N/A |
| Panesthia angustipennis | Blaberidae | Z138 | MG882234 | Breeding, born in Czech rep., orig Vietnam | 31-Jul-11 |
| Gromphadorhina sp. | Blaberidae | Pvii | MG882227 | Breeding, determined as a Princisia vanwaerbecki | 31-Jul-11 |
| Opisthoplatia orientalis | Blaberidae | Z15100 | MG882239 | breeds J. Hromádka | |
| Panesthia sp. | Blaberidae | Salganea | MG882229 | Bubeng, province Yunnan, China | 07-Jul-05 |
| Rhyparobia grandis | Blaberidae | Z95 | MG882231 | Cameroon | 04-May-11 |
| | Blaberidae | BGIGA | NC_017924.1 | GenBank | N/A |
| Nauphoeta cinerea | Blaberidae | BNCIN | CP005488.1 | GenBank | N/A |
| Macropanesthia rhinoceros | Blaberidae | B115 | MG882202 | Gumlu, Australia | 13-Oct-11 |
| Rhabdoblatta sp. | Blaberidae | RHA | MG882228 | Kuranda, Queensland, Australia | 15-Sep-11 |
| Paranauphoeta circumdata | Blaberidae | PARA | MG882225 | N/A | N/A |
| Galiblatta cribrosa | Blaberidae | Z98 | MG882232 | Nouragues, French Guiana | 13-Jun-11 |
| Laxta sp. | Blaberidae | AUS2 | MG882131 | Olney State Forest, New South Wales, Australia | 24-Aug-11 |
| Neolaxta mackerrasae | Blaberidae | B107 | MG882201 | Paluma Range, Australia | 14-Oct-11 |
| Pycnoscelus sp. | Blaberidae | B103 | MG882200 | Singapore, Singapore | 31-Aug-09 |
| Laxta sp. | Blaberidae | BLA 48 | MG882210 | Tidbinbilla Nature Reserve, Australian Capital Territory, Australia | |
| orientalis | Blattidae | B068 | MG882174 | Breeding of Kyle Kandilian | N/A |
| Deropeltis paulinoi | Blattidae | B069 | MG882175 | Breeding of Kyle Kandilian | N/A |
| Eurycotis decipiens | Blattidae | B071 | MG882176 | Breeding of Kyle Kandilian | N/A |
| Eurycotis opaca | Blattidae | B073 | MG882178 | Breeding of Kyle Kandilian | N/A |
| Eurycotis sp. | Blattidae | B074 | MG882179 | Breeding of Kyle Kandilian | N/A |
| rhombifolia | Blattidae | B075 | MG882180 | Breeding of Kyle Kandilian | N/A |
| Periplaneta brunnea | Blattidae | B078 | MG882182 | Breeding of Kyle Kandilian | N/A |
| Shelfordella lateralis | Blattidae | B080 | MG882183 | Breeding of Kyle Kandilian | N/A |
| Melanozosteria sp. | Blattidae | Melanozosteria____sp. | MG882223 | Cairns, Australia | 17-Dec-11 |
| Cosmozosteria sp. | Blattidae | B117 | MG882203 | Cape Upstart, Australia | 12-Oct-11 |
| Melanozosteria nitida | Blattidae | B122 | MG882204 | Cape Upstart, Australia | 12-Oct-12 |
| Periplaneta australasiae | Blattidae | B077 | MG882181 | Florida, USA | N/A |
| Periplaneta americana | Blattidae | BPLAN | NC_013418.2 | GenBank | N/A |
| Blatta orientalis | Blattidae | BOR | CP003605.1 | GenBank | N/A |
| Periplaneta americana | Blattidae | | NC_016956.1 | GenBank - Xiao et al. 2012 | |
| Periplaneta fuliginosa | Blattidae | | NC_006076.1 | GenBank - Yamauchi et al. 2004 | |
| Protagonista lugubris | Blattidae | Cockroach contig 4907 | MG882216 | Mt. Diaoluoshan, Hainan, China | 24-May-11 |
| Methana sp. | Blattidae | AUS1 | MG882130 | North Manly, New South Wales, Australia | 31-Jul-11 |
| Platyzosteria sp. | Blattidae | AUS3 | MG882132 | Olney State Forest, New South Wales, Australia | 24-Aug-11 |
| Eurycotis floridana | Blattidae | B072 | MG882177 | Sarasota, Florida, USA | 04-Jul-05 |
| viridissima | Blattidae | BLA 003 | MG882209 | Thredbo, NSW, Australia | |
| Arenivaga sp. | Corydiidae | B097 | MG882196 | Breeding of Kyle Kandilian | N/A |
| Ergaula capucina | Corydiidae | B086 | MG882188 | Breeding of Kyle Kandilian | N/A |
| Eupolyphaga sinensis | Corydiidae | B081 | MG882184 | Breeding of Kyle Kandilian | N/A |
| Heterogamodes hebraica | Corydiidae | B094 | MG882193 | Breeding of Kyle Kandilian | N/A |
| Therea olegrandjeani | Corydiidae | B089 | MG882190 | Breeding of Kyle Kandilian | N/A |
| petiveriana | Corydiidae | B090 | MG882191 | Breeding of Kyle Kandilian | N/A |
| Eupolyphaga sinensis | Corydiidae | | NC_014274.1 | GenBank - Zhang et al. 2010 | |
| Therea regularis | Corydiidae | B091 | MG882192 | Palm plantation between Puducherry and Auroville, India | N/A |
| Polyphaga aegyptiaca | Corydiidae | B088 | MG882189 | Sinai, Egypt | N/A |
| Cryptocercus punctulatus | Cryptocercidae | CPU | CP003015.1 | GenBank | N/A |
| Cryptocercus relictus | Cryptocercidae | | JX144941 | GenBank - Cameron et al. 2012 | |
| Cryptocercus punctulatus | Cryptocercidae | CR18 | MG882217 | Mountain Lake Biological Station, Virginia, USA | 30-Sep-11 |
| Cryptocercus hirtus | Cryptocercidae | HIR | MG882220 | Mt. Taibaishan, Shaanxi Province, China | 05-Oct-09 |
| Cryptocercus meridianus | Cryptocercidae | MERI | MG882224 | Mt. Yulongxueshan, Yunnan Province, China | 17-Jul-08 |
| Micadina phluctainoides | Diapheromeridae, | | NC_014673 | GenBank - Komoto et al. 2011 | |
| fulvescens | Ectobiidae | B099 | MG882197 | Alabama, USA | 05-Jul-05 |
| | Ectobiidae | B102 | | Breeding colony of Kyle Kandilian | N/A |
| Anallacta methanoides | Ectobiidae | B057 | MG882165 | Breeding of Kyle Kandilian | N/A |
| Asiablatta kyotensis | Ectobiidae | B058 | MG882166 | Breeding of Kyle Kandilian | N/A |
| Ischnoptera bilunata | Ectobiidae | B082 | MG882185 | Breeding of Kyle Kandilian | N/A |
| Lobopterella dimidiatipes | Ectobiidae | B096 | MG882195 | Breeding of Kyle Kandilian | N/A |
| Paratemnopteryx couloniana | Ectobiidae | B061 | MG882168 | Breeding of Kyle Kandilian | N/A |
| Symploce macroptera | Ectobiidae | B066 | MG882172 | Breeding of Kyle Kandilian | N/A |
| Symploce pallens | Ectobiidae | B067 | MG882173 | Breeding of Kyle Kandilian | N/A |
| Ellipsidion sp. | Ectobiidae | BLA 050 | MG882211 | Brisbane Forest Park, Queensland, Australia | 27-Dec-00 |
| Balta sp. | Ectobiidae | Balta_sp. | MG882206 | Cairns, Australia | 17-Dec-11 |
| Beybienkoa kurandanensis | Ectobiidae | Beybienkoa_karandanensis | MG882208 | Cairns, Australia | 17-Dec-11 |
| Ischnoptera sp. | Ectobiidae | B084 | MG882187 | Costa Rica | N/A |
| Phyllodromica sp. | Ectobiidae | Phil | MG882226 | Czech Republic | 31-Jul-11 |
| Parcoblatta pennsylvanica | Ectobiidae | B064 | MG882171 | Dearborn, Michigan, USA | 05-Jul-05 |
| Amazonina sp. | Ectobiidae | Z256E | MG882237 | , Bosque Protector del Alto Nangaritza | Apr-2016 |
| Euphyllodromia sp. | Ectobiidae | Z257 | MG882238 | Ecuador, Podocarpus National Park | Apr-2016 |
| Megaloblatta sp. | Ectobiidae | ECMD1 | MG882218 | Ecuador, Podocarpus National Park | Apr-2016 |
| Ectobiidae sp. | Ectobiidae | Z140 | MG882235 | Ethiopia | 31-Jul-11 |
| Blattella germanica | Ectobiidae | BGE | CP001487.1 | GenBank | N/A |
| Blattella bisignata | Ectobiidae | | NC_018549.1 | GenBank - Chen 2013 | |
| Blattella germanica | Ectobiidae | | NC_012901.1 | GenBank - Xiao et al. 2012 | |
| Allacta australiensis | Ectobiidae | AUS Allacta | MG882127 | James Cook University, Rainforest site, Queensland, Australia | 21-Jun-11 |
| Carbrunneria paramaxi | Ectobiidae | Carbru | MG882214 | James Cook University, Rainforest site, Queensland, Australia | 04-Oct-11 |
| Ectoneura hanitschi | Ectobiidae | Ectoneura___hanitschi | MG882219 | James Cook University, Rainforest site, Queensland, Australia by David Rentz | 17-Dec-11 |
| Parcoblatta bolliana | Ectobiidae | B101 | MG882199 | Montgomery, Alabama, USA | 05-Jul-05 |
| Parcoblatta divisa | Ectobiidae | B062 | MG882169 | Montgomery, Alabama, USA | 05-Jul-05 |
| | Ectobiidae | B063 | MG882170 | Montgomery, Alabama, USA | 05-Jul-05 |
| Parcoblatta uhleriana | Ectobiidae | B100 | MG882198 | Montgomery, Alabama, USA | 05-Jul-05 |
| Blattella vaga | Ectobiidae | BV | MG882213 | Multan, Pakistan | 30-Apr-10 |
| Allacta bimaculata | Ectobiidae | Allacta | MG882128 | Next to G213 National Road, Menglun, Yunnan Province, China | 24-Apr-10 |
| Blattella lituricollis | Ectobiidae | BUidUT | MG882212 | Singapore, Singapore | 31-Mar-09 |
| Ectobiidae sp. | Ectobiidae | BB | MG882207 | Singapore, Singapore | 31-Dec-07 |
| Ectobius sp. | Ectobiidae | Z254C | MG882236 | Slovenia | Apr-2016 |
| Ischnoptera deropeltiformis | Ectobiidae | B083 | MG882186 | Torreya State Park, Bristol, Florida, USA | 06-Jul-05 |
| Blattella asahinai | Ectobiidae | B059 | MG882167 | Tuscaloosa, Alabama, USA | 05-Jul-05 |
| Chorisoserrata sp. | Ectobiidae | CHORI | | Yunnan, China by Zongqing Wang | 30-Jun-09 |
| Sulcia muelleri | Flavobacteriaceae | CARI | CP002163 | GenBank | N/A |
| Sulcia muelleri | Flavobacteriaceae | PSPU | AP013293 | GenBank | N/A |
| Sulcia muelleri | Flavobacteriaceae | CARI | CP010828 | GenBank | N/A |
| Flavobacterium gilvum | Flavobacteriaceae | EM1308 | CP017479 | GenBank | N/A |
| Lutibacter sp. | Flavobacteriaceae | LPB0138 | CP017478 | GenBank | N/A |
| Tenacibaculum dicentrarchi | Flavobacteriaceae | AY7486TD | CP013671 | GenBank | N/A |
| Polaribacter sp. | Flavobacteriaceae | LT629752 | LT629752 | GenBank | N/A |
| sculleni | Grylloblattidae, | | DQ241796 | GenBank - Cameron et al. 2006 | |
| Heteropteryx dilatata | Heteropterygidae, Phasmatodea | | NC_014680 | GenBank - Komoto et al. 2011 | |
| Creobroter gemmatus | Hymenopodidae, Mantodea | | NC_030267.1 | GenBank - Ye et al. 2016 | |
| Mastotermes darwiniensis | Isoptera | MADAR | CP003000.1 | GenBank | N/A |
| Reticulitermes santonensis | Isoptera | | NC_009499.1 | GenBank - Cameron and Whiting 2007 | |
| Drepanotermes sp. | Isoptera | ISO312 | JX144938 | GenBank - Cameron et al. 2012 | |
| Heterotermes sp. | Isoptera | ISO183 | JX144936 | GenBank - Cameron et al. 2012 | |
| Macrognathotermes sp. | Isoptera | ISO360 | JX144939 | GenBank - Cameron et al. 2012 | |
| Macrotermes sp. | Isoptera | IS95 | JX144937 | GenBank - Cameron et al. 2012 | |
| Mastotermes darwiniensis | Isoptera | | NC_018120.1 | GenBank - Cameron et al. 2012 | |
| Microhodotermes sp. | Isoptera | IS17 | JX144931 | GenBank - Cameron et al. 2012 | |
| Nasutitermes sp. | Isoptera | ISO352 | JX144940 | GenBank - Cameron et al. 2012 | |
| Neotermes insularis | Isoptera | | NC_018124.1 | GenBank - Cameron et al. 2012 | |
| Porotermes sp. | Isoptera | ISO207 | JX144930 | GenBank - Cameron et al. 2012 | 31-Mar-99 |
| Schedorhinotermes sp. | Isoptera | ISO314 | JX144935 | GenBank - Cameron et al. 2012 | |
| Zootermopsis sp. | Isoptera | ISO845 | JX144932 | GenBank - Cameron et al. 2012 | |
| Coptotermes formosanus | Isoptera | | NC_015800.1 | GenBank - Tokuda et al. 2012 | |
| Lamproblatta sp. | Lamproblattidae | Z99 | MG882233 | Nouragues, French Guiana | 14-Jun-11 |
| Lamproblatta sp. | Lamproblattidae | LA male | MG882222 | Petit saut, French Guiana | 07-Jul-05 |
| Humbertiella nada | Liturgusidae, Mantodea | | NC_030264.1 | GenBank - Ye et al. 2016 | |
| Tamolanica tamolana | Mantidae, Mantodea | | NC_007702.1 | GenBank - Cameron et al. 2006 | |
| Hierodula formosana | Mantidae, Mantodea | | NC_029326.1 | GenBank - Tian et al. 2015 | |
| Mantis religiosa | Mantidae, Mantodea | | NC_030265.1 | GenBank - Ye et al. 2016 | |
| Statilia sp. | Mantidae, Mantodea | | KU201316 | GenBank - Ye et al. 2016 | |
| Tenodera sinensis | Mantidae, Mantodea | | NC_030266.1 | GenBank - Ye et al. 2016 | |
| Sclerophasma paresisense | Mantophasmatidae (Mantophasmatodea) | | NC_007701.1 | GenBank - Cameron et al. 2006 | |
| Nocticola sp. | Nocticolidae | JW1 9/1 | MG882221 | Glennie Tableland, Queensland, Australia | 14-Sep-11 |
| Acroneuria hainana | Perlidae (Plecoptera) | | NC_026104.1 | GenBank - Huang et al. 2015 | |
| Ramulus hainanense | , Phasmatodea | | NC_013185.1 | GenBank | |
| Entoria okinawaensis | Phasmatidae, Phasmatodea | | NC_014694 | GenBank - Komoto et al. 2011 | |
| Megacrania alpheus adan | Phasmatidae, Phasmatodea | | NC_014688 | GenBank - Komoto et al. 2011 | |
| Phobaeticus serratipes | Phasmatidae, Phasmatodea | | NC_014678 | GenBank - Komoto et al. 2011 | |
| Phraortes illepidus | Phasmatidae, Phasmatodea | | NC_014695 | GenBank - Komoto et al. 2011 | |
| Ramulus irregulariterdentatus | Phasmatidae, Phasmatodea | | NC_014702 | GenBank - Komoto et al. 2011 | |
| Extatosoma tiaratum | Phasmatidae, Phasmatodea | | NC_017748 | GenBank | |
| Timema californicum | Timematidae, Phasmatodea | | DQ241799.1 | GenBank - Komoto et al. 2011 | |
| Tryonicus mackerrasae | Tryonicidae | B124 | MG882205 | Mt Haig, Australia | 23-Oct-11 |
| Tryonicus parvus | Tryonicidae | Tryonicus parvus | MG882230 | Olney State Forest, New South Wales, Australia | 09-Mar-12 |


Table S2.2: Fossils used to calibrate the estimates of divergence times of major cockroach clades using topology based on cockroach mitochondrial genomes.

Species | Age (Ma) / min. age constraint for group | Calibration group | Soft max. bound (97.5% probability) | References
Mylacris anthracophila | 315 | Dictyoptera+Phasmatodea+Grylloblattodea | 407 | (Scudder 1868)
Juramantis initialis | 145 | Dictyoptera | 315.2 | (Vršanský 2002)
Valditermes brenanae | 130 | +other Isoptera, excluding Mastotermes | 235 | (Krishna et al. 2013)
Cratokalotermes santanensis | 112 | +Rhinotermitidae+Termitidae | 145 | (Grimaldi et al. 2008)
Reticulitermes antiquus | 33.9 | Reticulitermes+Coptotermes+Heterotermes | 94.3 | (Engel et al. 2007)
Coptotermes sucineus | 16 | Coptotermes+Heterotermes | 33.9 | (Emerson 1971)
Nanotermes | 47.8 | Termitidae+Coptotermes+Heterotermes+Reticulitermes | 94.3 | (Engel et al. 2011)
Balatronis libanensis | 125 | Blattidae+Tryonicidae | 235 | (Sendi and Azar 2017)
Ergaula stonebut | 61.7 | Ergaula+Therea | 145 | (Vršanský et al. 2013)
Periplaneta houlberti | 56 | Periplaneta+Shelfordella+Blatta+Neostylopyga+Deropeltis | 145 | (Piton 1940)
Gyna obesa | 56 | Gyninae+Panchlorinae+Blaberinae | 145 | (Piton 1940)
Diploptera | 56 | Diplopterinae+Oxyhaloinae | 145 | (Vršanský et al. 2016)
Pycnoscelus gardneri | 41.3 | Panesthiinae+Perisphaerinae+Pycnoscelinae (+Rhabdoblatta) | 145 | (Cockerell 1920)
Ischnoptera gedanensis | 33.9 | Ischnoptera+sister | 145 | (Shelford 1910)
Epilampra | 41.3 | Epilampra+Galiblatta | 145 | (Beccaloni 2014)


Table S2.3: Fossils used to calibrate the estimates of divergence times of major cockroach clades using topology based on Blattabacterium genes.

Species | Age (Ma) / min. age constraint for group | Calibration group | Soft max. bound (97.5% probability) | Reference
Juramantis initialis | 145 | Dictyoptera | 315.2 | (Vršanský 2002)
Baissatermes lapideus | 137 | Cryptocercus + Isoptera | 235 | (Engel et al. 2007)
Balatronis libanensis | 125 | Blattidae + Tryonicidae | 235 | (Sendi and Azar 2017)
Ischnoptera gedanensis | 33.9 | Ischnoptera + sister | 145 | (Shelford 1910)
Epilampra | 41.3 | Epilampra + Galiblatta | 145 | (Beccaloni 2014)


Table S2.4: Blattabacterium genes used for each subset for BEAST analyses.

First set HOG No. | Size (bp) | Second set HOG No. | Size (bp) | Third set HOG No. | Size (bp) | Fourth set HOG No. | Size (bp)
HOG154 | 357 | HOG144 | 942 | HOG159 | 1389 | HOG517 | 1227
HOG172 | 237 | HOG152 | 315 | HOG160 | 519 | HOG458 | 441
HOG259 | 1425 | HOG18 | 426 | HOG174 | 1542 | HOG255 | 852
HOG272 | 1068 | HOG238 | 1116 | HOG212 | 1695 | HOG320 | 855
HOG279 | 984 | HOG254 | 1083 | HOG239 | 2595 | HOG365 | 1542
HOG282 | 789 | HOG261 | 840 | HOG295 | 453 | HOG177 | 1185
HOG291 | 1308 | HOG286 | 762 | HOG319 | 990 | HOG234 | 957
HOG312 | 2319 | HOG288 | 594 | HOG326 | 831 | HOG404 | 2112
HOG321 | 483 | HOG3 | 1188 | HOG34 | 567 | HOG357 | 321
HOG325 | 1107 | HOG346 | 990 | HOG354 | 1971 | HOG335 | 1128
HOG340 | 1356 | HOG362 | 1200 | HOG373 | 1125 | HOG380 | 348
HOG371 | 813 | HOG372 | 1296 | HOG39 | 1200 | HOG399 | 3351
HOG378 | 456 | HOG385 | 2415 | HOG450 | 372 | HOG91 | 2520
HOG382 | 1479 | HOG422 | 1017 | HOG452 | 630 | HOG573 | 864
HOG368 | 1335 | HOG453 | 369 | HOG456 | 213 | HOG467 | 252
HOG388 | 1698 | HOG457 | 1353 | HOG470 | 672 | HOG620 | 1035
HOG401 | 1149 | HOG463 | 267 | HOG471 | 330 | HOG398 | 1020
HOG403 | 1758 | HOG477 | 303 | HOG473 | 828 | HOG600 | 252
HOG451 | 1035 | HOG479 | 468 | HOG478 | 2163 | HOG469 | 423
HOG454 | 381 | HOG480 | 363 | HOG481 | 822 | HOG366 | 711
HOG459 | 504 | HOG502 | 276 | HOG523 | 723 | HOG483 | 669
HOG472 | 282 | HOG503 | 1626 | HOG525 | 579 | HOG476 | 606
HOG485 | 429 | HOG518 | 4350 | HOG536 | 2151 | HOG400 | 1074
HOG564 | 1452 | HOG524 | 420 | HOG572 | 2118 | HOG396 | 1116
HOG599 | 1266 | HOG562 | 1650 | HOG604 | 237 | HOG402 | 1023
HOG632 | 1257 | HOG633 | 1491 | HOG631 | 354 | HOG466 | 387
All codons | 26727 | All codons | 27120 | All codons | 27069 | All codons | 26271
1st and 2nd only | 17818 | 1st and 2nd only | 18080 | 1st and 2nd only | 18046 | 1st and 2nd only | 17514


Table S2.5: Blattabacterium gene names with reference to HOG label.

Gene name | HOG No.
tRNA-2-methylthio-N6-dimethylallyladenosine synthase | HOG386
Elongation Factor | HOG3
hypothetical protein | HOG160
Fructose-bisphosphate aldolase | HOG254
DNA gyrase subunit A | HOG239
3-dehydroquinate synthase | HOG238
hypothetical protein | HOG159
Bifunctional aspartokinase | HOG91
ATP synthase subunit A | HOG39
Ribosome recycling factor | HOG34
Imidazole glycerol phosphate synthase subunit HisH | HOG288
Imidazole glycerol phosphate synthase subunit HisF | HOG286
Ribose-phosphate pyrophosphokinase | HOG234
hypothetical protein | HOG212
Chaperone protein DnaJ | HOG177
ATP synthase subunit beta | HOG174
ATP synthase subunit c | HOG172
Diaminopimelate epimerase | HOG326
Putative aminopeptidase YsdC | HOG325
Ribosomal RNA small subunit methyltransferase A | HOG282
2-oxoisovalerate dehydrogenase subunit beta | HOG279
putative branched-chain-amino-acid aminotransferase | HOG272
2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase | HOG261
Multifunctional CCA protein | HOG259
Lipoyl synthase | HOG255
SsrA-binding protein | HOG321
Transketolase 2 | HOG320
1-deoxy-D-xylulose-5-phosphate synthase | HOG319
Phosphate acetyltransferase | HOG312
hypothetical protein | HOG295
Histidinol dehydrogenase | HOG291
Methionine--tRNA ligase | HOG388
Octanoyltransferase | HOG366
Lysine--tRNA ligase | HOG365
Aspartate aminotransferase | HOG362
30S ribosomal protein S16 | HOG357
DNA gyrase subunit B | HOG354
Phenylalanine--tRNA ligase alpha subunit | HOG346
Histidine--tRNA ligase | HOG340
Lon protease 2 | HOG385
tRNA modification GTPase MnmE | HOG382
30S ribosomal protein S6 | HOG380
50S ribosomal protein L9 | HOG378
Phospho-2-dehydro-3-deoxyheptonate aldolase | HOG373
S-adenosylmethionine synthase | HOG372
Enoyl-acyl-carrier-protein reductase NADH FabI | HOG371
30S ribosomal protein S1 | HOG404
Argininosuccinate synthase | HOG403
N-acetyl-gamma-glutamyl-phosphate reductase | HOG402
Acetylornithine aminotransferase | HOG401
Carbamoyl-phosphate synthase small chain | HOG400
Carbamoyl-phosphate synthase large chain | HOG399
N-acetylornithine carbamoyltransferase | HOG398
Translation initiation factor IF-1 | HOG456
30S ribosomal protein S13 | HOG454
30S ribosomal protein S11 | HOG453
30S ribosomal protein S4 | HOG452
DNA-directed RNA polymerase subunit alpha | HOG451
50S ribosomal protein L17 | HOG450
Aspartate semialdehyde dehydrogenase | HOG422
Two names: 1) Acetylornithine deacetylase; 2) Succinyl-diaminopimelate desuccinylase | HOG396
30S ribosomal protein S3 | HOG470
50S ribosomal protein L16 | HOG469
30S ribosomal protein S17 | HOG467
50S ribosomal protein L14 | HOG466
Alternate 30S ribosomal protein S14 | HOG463
30S ribosomal protein S5 | HOG459
Methionine aminopeptidase 1 | HOG481
30S ribosomal protein S12 | HOG480
30S ribosomal protein S7 | HOG479
Elongation factor G | HOG478
30S ribosomal protein S10 | HOG477
50S ribosomal protein L15 | HOG458
Protein translocase subunit SecY | HOG457
FeS cluster assembly protein SufB | HOG633
Cysteine desulfurase SufS | HOG632
50S ribosomal protein L1 | HOG523
50S ribosomal protein L3 | HOG476
50S ribosomal protein L2 | HOG473
30S ribosomal protein S19 | HOG472
50S ribosomal protein L22 | HOG471
50S ribosomal protein L21 | HOG631
Glyceraldehyde-3-phosphate dehydrogenase A | HOG620
DNA-directed RNA polymerase subunit beta' | HOG518
Glutamine tRNA ligase | HOG517
60 kDa chaperonin | HOG503
10 kDa chaperonin | HOG502
50S ribosomal protein L13 | HOG485
30S ribosomal protein S2 | HOG483
ATP synthase epsilon chain | HOG604
Acyl carrier protein | HOG600
3-oxoacyl-acyl-carrier-protein synthase 2 | HOG599
RNA polymerase sigma factor SigA | HOG573
Polyribonucleotide nucleotidyltransferase | HOG572
Asparagine tRNA ligase | HOG564
Glutamate dehydrogenase | HOG562
Fumarate reductase flavoprotein subunit | HOG536
50S ribosomal protein L11 | HOG525
50S ribosomal protein L11 | HOG524
UDP-N-acetylglucosamine--N-acetylmuramyl-pentapeptide pyrophosphoryl-undecaprenol N-acetylglucosamine transferase | HOG335
Putative 1,2-phenylacetyl-CoA epoxidase, subunit D | HOG154
hypothetical protein | HOG152
Cysteine desulfuration protein | HOG18
Dihydrolipoyllysine-residue succinyltransferase component of 2-oxoglutarate dehydrogenase complex | HOG144


Figure S2.1: Timescale of Blattabacterium evolution. The tree was inferred using Bayesian analysis in BEAST, on the basis of 26 protein-coding genes with 3rd codon sites excluded.

Node bars indicate 95% credibility intervals of node ages. Colours of branches and species names represent different cockroach families, as shown at the base of the figure.


Figure S2.2: Timescale of Blattabacterium evolution. The tree was inferred using Bayesian analysis in BEAST, on the basis of 26 protein-coding genes with 3rd codon sites excluded.

Node bars indicate 95% credibility intervals of node ages. Colours of branches and species names represent different cockroach families, as shown at the base of the figure.


Figure S2.3: Timescale of Blattabacterium evolution. The tree was inferred using Bayesian analysis in BEAST, on the basis of 26 protein-coding genes with 3rd codon sites excluded.

Node bars indicate 95% credibility intervals of node ages. Colours of branches and species names represent different cockroach families, as shown at the base of the figure.


Appendix 2: Supplementary material for Chapter 3

Sampling and Blattabacterium genomic data

A list of samples and collection data for each cockroach and outgroup examined is provided in Table S3.1. All specimens examined in this study are stored at the Okinawa Institute of Science and Technology, Japan. For the majority of taxa examined in this study, we obtained Blattabacterium sequence data from genomic libraries originally used in a previous study of cockroach mitochondrial genomes carried out by our laboratories (Bourguignon et al. 2018). In some cases, new genomic data were obtained from fat bodies of individual cockroaches, as follows. DNA was extracted using a DNeasy Blood and Tissue kit (Qiagen), according to the manufacturer’s protocol. Individual DNA samples were tagged with unique barcode combinations, mixed in equimolar concentration, and sequenced as 150 bp paired-end reads on an Illumina HiSeq 4000, following the methods described previously (Bourguignon et al. 2018).

For each cockroach species, raw sequence data from the previous study (Bourguignon et al. 2018) or the current study were assembled using CLC, and subjected to blastn analysis using the published Blattabacterium genomes from Blattella germanica (López-Sánchez et al. 2009), Periplaneta americana (Sabree et al. 2009), and Cryptocercus punctulatus (Neef et al. 2011) as subject sequences. Contigs identified as being derived from Blattabacterium during this step were then annotated using Prokka v1.12 (Seemann 2014). Across the 55 strains examined in this study, a total of 104 orthologous genes were used for analysis. These were found in ≥95% of all taxa. All taxa had over 90% of the 104 genes, except for Aeluropoda insignis, which had only 83 (79.8%). Missing genes were presumed to be a result of uneven sequencing coverage of samples and the relatively low sequencing coverage used, rather than the actual absence of these genes from their genomes; further work is required to confirm their presence or absence. The genome sequences of outgroups were obtained from GenBank and included three strains of Sulcia muelleri (accession numbers CP002163, AP013293, and CP010828), one Flavobacterium gilvum (CP017479), one Lutibacter sp. (CP017478), one Tenacibaculum dicentrarchi (CP013671), and one Polaribacter sp. (LT629752).
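The gene-selection criterion described above can be illustrated with a minimal sketch. It assumes that gene presence has already been tabulated per taxon. The function names, the `presence` mapping, and the toy gene labels are hypothetical and are not taken from the pipeline used in this study.

```python
def filter_orthologs(presence, min_fraction=0.95):
    """Return genes found in at least min_fraction of taxa.

    presence maps a taxon name to the set of gene labels annotated in it.
    """
    n_taxa = len(presence)
    all_genes = set().union(*presence.values())
    return sorted(
        gene for gene in all_genes
        if sum(gene in genes for genes in presence.values()) / n_taxa >= min_fraction
    )


def completeness(presence, kept):
    """Return the fraction of retained genes present in each taxon."""
    kept_set = set(kept)
    return {taxon: len(genes & kept_set) / len(kept_set)
            for taxon, genes in presence.items()}


# Toy data with hypothetical labels; these are not the thesis data.
toy = {f"taxon{i}": {"geneA", "geneB"} for i in range(24)}
toy["taxon24"] = {"geneA", "geneC"}  # lacks geneB, carries a rare geneC

kept = filter_orthologs(toy)
print(kept)                                # ['geneA', 'geneB']
print(completeness(toy, kept)["taxon24"])  # 0.5

# The figure quoted for Aeluropoda insignis: 83 of 104 genes.
print(round(100 * 83 / 104, 1))            # 79.8
```

The per-taxon completeness value corresponds to the percentages quoted in the text, such as 79.8% for Aeluropoda insignis.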

The 104 orthologous Blattabacterium genes were each aligned at the amino acid level individually using TranslatorX (Abascal et al. 2010) and concatenated into a 107,187 bp alignment. The mitochondrial genome data set included all protein-coding genes from each taxon plus 12S rRNA, 16S rRNA, and the 22 tRNA genes, and was obtained during a previous study (Bourguignon et al. 2018). All mtDNA protein-coding genes were free of stop codons and indels, and could be translated into complete amino acid sequences, indicating that they were not nuclear insertions. Mitochondrial protein-coding genes were aligned using TranslatorX, while MAFFT (Katoh and Standley 2013) was used to align 12S rRNA, 16S rRNA, and the 22 tRNAs. All mitochondrial sequences were then concatenated into a 14,802 bp alignment. MEGA7 (Kumar et al. 2016) was used to calculate the nucleotide composition of the cockroach mtDNA and Blattabacterium data sets. The percentage of A+T in the host and symbiont nucleotide data sets is shown in Figure S3.7. We tested for substitution saturation using Xia’s method implemented in DAMBE 6 (Xia 2017; Xia et al. 2003). Third codon sites in the mitochondrial data set were saturated (NumOTU = 32, ISS = 0.804, ISS.CAsym = 0.809) and were excluded from our analyses. Although the Blattabacterium sequences were not significantly saturated at 3rd codon sites (NumOTU = 32, ISS = 0.649, ISS.CAsym = 0.819), we excluded these sites from our analyses because the test statistic was close to the critical value. After the exclusion of 3rd codon sites, the total lengths of the final data sets were 11,051 bp and 71,458 bp for the mitochondrial and Blattabacterium alignments, respectively.
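As an illustration of the exclusion of 3rd codon sites, the sketch below removes every third column from an in-frame coding sequence and checks the reported Blattabacterium alignment length. It assumes the alignment is in frame and contains only protein-coding sites. The function name and toy sequence are hypothetical and do not represent the software used in this study.

```python
def drop_third_codon_sites(aligned_cds):
    """Remove every third column from an in-frame coding alignment row."""
    if len(aligned_cds) % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    return "".join(aligned_cds[i:i + 2] for i in range(0, len(aligned_cds), 3))


# Toy in-frame sequence with one gapped codon (hypothetical, not thesis data).
toy = "ATGGCC---TAA"
reduced = drop_third_codon_sites(toy)
print(reduced)  # ATGC--TA

# Length check against the figures reported for the Blattabacterium alignment.
full_length = 107_187
print(full_length * 2 // 3)  # 71458
```

The same two-thirds relationship does not apply directly to the mitochondrial alignment, because that data set also contains rRNA and tRNA genes, which have no codon positions.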


Phylogenetic analysis

Maximum-likelihood phylogenetic analyses were carried out in RAxML v8.2 (Stamatakis 2014), using 1000 bootstrap replicates to estimate node support. The cockroach mtDNA data set was partitioned into four subsets: 1st codon sites, 2nd codon sites, rRNAs, and tRNAs. The Blattabacterium data set was partitioned into two subsets: 1st codon sites and 2nd codon sites. Using jModelTest (Darriba et al. 2012), we selected the GTR+G substitution model for each subset based on Bayesian information criterion scores. Using ProtTest v3.4 (Darriba et al. 2011), the translated amino acid data set for Blattabacterium was assigned the CAT+CpREV model and the translated amino acid data set for cockroach mtDNA was assigned the CAT+MtART model based on Bayesian information criterion scores.

We used ParaFit in R 3.5.1 (R Core Team 2018) to quantify congruence between host and symbiont topologies. We first created matrices of patristic distances calculated from maximum-likelihood host and symbiont phylogenies and a host-symbiont association matrix. We then performed a global test with 999 permutations, using the ParaFitGlobal value and a p-value threshold of 0.05 to determine significance.

Root-to-tip distances and comparison of phylogenetically independent pairs of host and symbiont branch lengths

Root-to-tip distances from the RAxML analyses for each host and symbiont pair were calculated and subjected to Pearson correlation analysis using the R packages ape (Paradis and Schliep 2018) and adephylo (Jombart et al. 2010). The use of root-to-tip distances removes the confounding effects of time, because all lineages leading to the tips of the tree have experienced the same amount of time since evolving from their common ancestor. However, the sharing of internal branches by groups of taxa renders these data non-independent. Therefore, we compared branch-length differences between hosts and symbionts for 27 phylogenetically independent species pairs across the topology (see Figure S3.1). These were calculated using a fixed topology (derived from the Blattabacterium analysis described above) for each of the following three data sets: 1) 1st+2nd codon sites of protein-coding genes; 2) translated amino acid sequences; and 3) 1st+2nd codon positions of protein-coding genes plus the inclusion of rRNAs+tRNAs in the case of the mitochondrial data set. Branch lengths were log transformed, and differences between pairs of hosts and pairs of symbionts were calculated and compared via Pearson correlation analysis.
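The two comparisons described in this paragraph can be sketched as follows. This is a simplified illustration in plain Python rather than the R packages (ape and adephylo) used in the study. The tree representation, function names, and all branch lengths are hypothetical.

```python
import math


def root_to_tip(node, dist=0.0, out=None):
    """Sum branch lengths from the root to every tip.

    node is (label, branch_length, children); a tip has an empty child list.
    """
    if out is None:
        out = {}
    label, length, children = node
    dist += length
    if not children:
        out[label] = dist
    for child in children:
        root_to_tip(child, dist, out)
    return out


def log_differences(pairs):
    """Difference in log branch length within each sister pair."""
    return [math.log(a) - math.log(b) for a, b in pairs]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy tree and branch lengths (hypothetical values, not thesis data).
tree = ("root", 0.0, [("A", 0.10, []),
                      ("anc", 0.05, [("B", 0.20, []), ("C", 0.30, [])])])
print(root_to_tip(tree))

host_pairs = [(0.10, 0.20), (0.30, 0.15), (0.05, 0.08)]
symbiont_pairs = [(0.02, 0.05), (0.07, 0.03), (0.01, 0.02)]
r = pearson(log_differences(host_pairs), log_differences(symbiont_pairs))
print(round(r, 3))
```

In the root-to-tip comparison, tips B and C both include the internal branch leading to their ancestor, which is the source of the non-independence noted above. The sister-pair differences use only branches that are not shared between pairs.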

To test for potential biases in the data that violate the assumptions of linear regressions, we compared the absolute mean value of log-transformed branch lengths with the log-transformed branch-length differences (Freckleton 2000). We found no significant correlation between these values (R = 0.07, p = 0.63 for data from host cockroaches; R = 0.06, p = 0.66 for data from Blattabacterium), indicating that the data were suitable for use in our analyses. We also performed analyses in which branch-length differences were standardised following previous recommendations (Welch and Waxman 2008), to account for the potential confounding effects of the different amounts of time that sister pairs have had to diverge. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first, time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second and third standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only.
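The three standardisations can be sketched as below, under the reading that each log-transformed difference is divided by the square root of a time estimate obtained as mean branch length divided by the assumed rate. The function names, the `scheme` labels, and the toy branch lengths are hypothetical. Only the rate of 0.001 subs/site/million years is taken from the text.

```python
import math

ASSUMED_RATE = 0.001  # substitutions/site/million years, as stated in the text


def divergence_time(pair, rate=ASSUMED_RATE):
    """Estimate time since divergence as mean branch length divided by rate."""
    return (sum(pair) / 2) / rate


def standardise(host_pairs, symbiont_pairs, scheme="own"):
    """Divide log branch-length differences by sqrt(estimated divergence time).

    scheme "own": each partner uses its own branch lengths for the time estimate;
    scheme "host" or "symbiont": both partners use that partner's estimate.
    """
    if scheme not in ("own", "host", "symbiont"):
        raise ValueError(f"unknown scheme: {scheme}")
    host_out, symbiont_out = [], []
    for host, symbiont in zip(host_pairs, symbiont_pairs):
        t_host, t_symbiont = divergence_time(host), divergence_time(symbiont)
        if scheme == "host":
            t_symbiont = t_host
        elif scheme == "symbiont":
            t_host = t_symbiont
        host_out.append((math.log(host[0]) - math.log(host[1])) / math.sqrt(t_host))
        symbiont_out.append(
            (math.log(symbiont[0]) - math.log(symbiont[1])) / math.sqrt(t_symbiont))
    return host_out, symbiont_out


# Toy branch lengths (hypothetical values, not thesis data).
host_pairs = [(0.10, 0.30), (0.20, 0.10)]
symbiont_pairs = [(0.02, 0.04), (0.05, 0.03)]
for scheme in ("own", "host", "symbiont"):
    print(scheme, standardise(host_pairs, symbiont_pairs, scheme))
```

The standardised host and symbiont values returned for each scheme would then be compared by Pearson correlation, as for the unstandardised differences.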


Table S3.1: A list of samples and collection data for each cockroach examined. The collecting locality column indicates whether samples were sequenced for this study (bold) or obtained through GenBank. Blattabacterium gene accession numbers can be found in Table S3.2.

Species | Family | Sample ID | Collecting locality | Collector | Date
Aeluropoda insignis | Blaberidae | B002 | Breeding colony of Kyle Kandilian | N/A | N/A
Allacta australiensis | Ectobiidae | AUS Allacta | James Cook University, Rainforest site, Queensland, Australia | David Rentz | 22-Jun-2015
Amazonina sp. | Ectobiidae | Z256E | Ecuador, Bosque Protector del Alto Nangaritza | Frantisek Juna | Apr-2016
Anallacta methanoides | Ectobiidae | B057 | Breeding colony of Kyle Kandilian | N/A | N/A
Anaplecta calosoma | Anaplectidae | Cockroach contig 1688 | Kuranda, Queensland, Australia | David Rentz | 17-Nov-2015
Anaplecta omei | Anaplectidae | Anaplecta_omei | Mt Emei, Sichuan, China | Zongqing Wang | 01-Jul-2013
Balta sp. | Ectobiidae | Balta_sp. | Cairns, Queensland, Australia | David Rentz | 18-Dec-2015
Beybienkoa kurandanensis | Ectobiidae | Beybienkoa_kurandanensis | Cairns, Queensland, Australia | David Rentz | 18-Dec-2015
Blaberus giganteus | Blaberidae | BGIGA | GenBank | N/A | N/A
Blaptica dubia | Blaberidae | B056 | Breeding colony of Kyle Kandilian | N/A | N/A
Blatta orientalis | Blattidae | BOR | GenBank | N/A | N/A
Blattella germanica | Ectobiidae | BGE | GenBank | N/A | N/A
Carbrunneria paramaxi | Ectobiidae | Carbru | Cairns, Queensland, Australia | David Rentz | 05-Oct-2015
Chorisoserrata sp. | Ectobiidae | CHORI | Yunnan, China | Zongqing Wang | 01-Jul-2013
Cosmozosteria sp. | Blattidae | B117 | Cape Upstart, Queensland, Australia | James Walker | 13-Oct-2015
Cryptocercus hirtus | Cryptocercidae | HIR | Mt Taibai, Shaanxi, China | N/A | N/A
Cryptocercus punctulatus | Cryptocercidae | CPU | GenBank | N/A | N/A
Deropeltis paulinoi | Blattidae | B069 | Breeding colony of Kyle Kandilian | N/A | N/A
Ectobius sp. | Ectobiidae | Z254C | Slovenia | Frantisek Juna | Apr-2016
Ectoneura hanitschi | Ectobiidae | Ectoneura_hanitschi | James Cook University, Rainforest site, Queensland, Australia | David Rentz | 18-Dec-2015
Epilampra maya | Blaberidae | B095 | Arcadia, Florida, USA | Kyle Kandilian | 07-Jul-2009
Eublaberus distanti | Blaberidae | B025 | Breeding colony of Kyle Kandilian | N/A |
Euphyllodromia sp. | Ectobiidae | Z257 | Podocarpus National Park, Ecuador | Frantisek Juna | Apr-2016
Eupolyphaga sinensis | Corydiidae | B081 | Breeding colony of Kyle Kandilian | N/A | N/A
Eurycotis decipiens | Blattidae | B071 | Breeding colony of Kyle Kandilian | N/A | N/A
Galiblatta cribrosa | Blaberidae | Z98 | Nouragues, French Guiana | N/A | 14-Jun-2015
Gromphadorhina grandidieri | Blaberidae | B030 | Breeding colony of Kyle Kandilian | N/A | N/A
Gyna capucina | Blaberidae | Z139GY | Ebogo, Cameroon | Frantisek Juna | 08-Sep-2015
Ischnoptera deropeltiformis | Ectobiidae | B083 | Torreya State Park, Bristol, Florida, USA | Kyle Kandilian | 07-Jul-2009
Lamproblatta sp. | Lamproblattidae | LA male | Petit Saut, French Guiana | Frantisek Juna | 08-Jul-2009
Laxta sp. | Blaberidae | AUS2 | Olney State Forest, New South Wales, Australia | Nathan Lo and Thomas Bourguignon | 25-Aug-2015
Macropanesthia rhinoceros | Blaberidae | B092 | Breeding colony of Kyle Kandilian | N/A | N/A
Mastotermes darwiniensis | Isoptera | MADAR | GenBank | N/A | N/A
Megaloblatta sp. | Ectobiidae | ECMD1 | Podocarpus National Park, Ecuador | Frantisek Juna | Apr-2016
Melanozosteria sp. | Blattidae | Melanozosteria_sp. | Cairns, Queensland, Australia | David Rentz | 18-Dec-2015
Methana sp. | Blattidae | AUS1 | North Manly, New South Wales, Australia | Nathan Lo | 01-Aug-2015
Nauphoeta cinerea | Blaberidae | BNCIN | GenBank | N/A | N/A
Neolaxta mackerrasae | Blaberidae | B107 | Paluma Range, Queensland, Australia | David Rentz | 15-Oct-2015
Opisthoplatia orientalis | Blaberidae | Z15100 | Breeding colony of J. Hromádka | N/A | N/A
 | Blaberidae | B044 | Breeding colony of Kyle Kandilian | N/A | N/A
Panesthia angustipennis | Blaberidae | Z138 | Breeding colony in Czech Republic, orig. Vietnam | N/A | N/A
Panesthia sp. | Blaberidae | Panesthia_sp | Bubeng, Yunnan, China | N/A | N/A
Paranauphoeta circumdata | Blaberidae | PARA | N/A | N/A | N/A
Paratemnopteryx couloniana | Ectobiidae | B061 | Breeding colony of Kyle Kandilian | N/A | N/A
Parcoblatta virginica | Ectobiidae | B102 | Breeding colony of Kyle Kandilian | N/A | N/A
Periplaneta americana | Blattidae | BPLAN | GenBank | N/A | N/A
Phyllodromica sp. | Ectobiidae | Phil | Czech Republic | Thomas Bourguignon | 01-Aug-2015
Platyzosteria sp. | Blattidae | AUS3 | Olney State Forest, New South Wales, Australia | Nathan Lo and Thomas Bourguignon | 25-Aug-2015
Protagonista lugubris | Blattidae | Cockroach contig 4907 | Mt Diaoluo, Hainan, China | Zongqing Wang | 25-May-2015
Pycnoscelus femapterus | Blaberidae | B048 | Breeding colony of Kyle Kandilian | N/A | N/A
Rhabdoblatta sp. | Blaberidae | RHA | Kuranda, Queensland, Australia | David Rentz | 16-Sep-2015
Shelfordella lateralis | Blattidae | B080 | Breeding colony of Kyle Kandilian | N/A | N/A
Therea regularis | Corydiidae | B091 | Palm plantation between Puducherry and Auroville, India | Kyle Kandilian | N/A
Tryonicus parvus | Tryonicidae | Tryonicus_parvus | Olney State Forest, New South Wales, Australia | Nathan Lo and Thomas Bourguignon | 10-Mar-2016
Sulcia muelleri | Flavobacteriaceae | CARI | GenBank | N/A | N/A
Sulcia muelleri | Flavobacteriaceae | PSPU | GenBank | N/A | N/A
Sulcia muelleri | Flavobacteriaceae | CARI | GenBank | N/A | N/A
Flavobacterium gilvum | Flavobacteriaceae | EM1308 | GenBank | N/A | N/A
Lutibacter sp. | Flavobacteriaceae | LPB0138 | GenBank | N/A | N/A
Tenacibaculum dicentrarchi | Flavobacteriaceae | AY7486TD | GenBank | N/A | N/A
Polaribacter sp. | Flavobacteriaceae | LT629752 | GenBank | N/A | N/A

Table S3.2: A list of GenBank accession numbers and names of all 104 Blattabacterium genes sequenced for this study.

Accession No. | Gene name
MN038417-MN038462 | Dihydrolipoyllysine-residue succinyltransferase component of 2-oxoglutarate dehydrogenase complex
MN038463-MN038510 | Cysteine desulfuration protein
MN038511-MN038558 | hypothetical protein
MN038559-MN038606 | Putative 1,2-phenylacetyl-CoA epoxidase, subunit D
MN038607-MN038654 | UDP-N-acetylglucosamine--N-acetylmuramyl-pentapeptide pyrophosphoryl-undecaprenol N-acetylglucosamine transferase
MN038655-MN038703 | 50S ribosomal protein L11
MN038704-MN038752 | 50S ribosomal protein L11
MN038753-MN038800 | Fumarate reductase flavoprotein subunit
MN038801-MN038848 | Glutamate dehydrogenase
MN038849-MN038895 | Asparagine tRNA ligase
MN038896-MN038942 | Polyribonucleotide nucleotidyltransferase
MN038943-MN038989 | RNA polymerase sigma factor SigA
MN038990-MN039036 | 3-oxoacyl-acyl-carrier-protein synthase 2
MN039037-MN039083 | Acyl carrier protein
MN039084-MN039130 | ATP synthase epsilon chain
MN039131-MN039177 | 30S ribosomal protein S2
MN039178-MN039223 | 50S ribosomal protein L13
MN039224-MN039271 | 10 kDa chaperonin
MN039272-MN039318 | 60 kDa chaperonin
MN039319-MN039365 | Glutamine tRNA ligase
MN039366-MN039412 | DNA-directed RNA polymerase subunit beta'
MN039413-MN039459 | Glyceraldehyde-3-phosphate dehydrogenase A
MN039460-MN039506 | 50S ribosomal protein L21
MN039507-MN039553 | 50S ribosomal protein L22
MN039554-MN039600 | 30S ribosomal protein S19
MN039601-MN039647 | 50S ribosomal protein L2
MN039648-MN039695 | 50S ribosomal protein L3
MN039696-MN039742 | 50S ribosomal protein L1
MN039743-MN039790 | Cysteine desulfurase SufS
MN039791-MN039837 | FeS cluster assembly protein SufB
MN039838-MN039884 | Protein translocase subunit SecY
MN039885-MN039931 | 50S ribosomal protein L15
MN039932-MN039979 | 30S ribosomal protein S10
MN039980-MN040027 | Elongation factor G
MN040028-MN040075 | 30S ribosomal protein S7
MN040076-MN040123 | 30S ribosomal protein S12
MN040124-MN040171 | Methionine aminopeptidase 1
MN040172-MN040219 | 30S ribosomal protein S5
MN040220-MN040267 | Alternate 30S ribosomal protein S14
MN040268-MN040314 | 50S ribosomal protein L14
MN040315-MN040361 | 30S ribosomal protein S17
MN040362-MN040408 | 50S ribosomal protein L16
MN040409-MN040456 | 30S ribosomal protein S3
MN040457-MN040504 | Two names: 1) Acetylornithine deacetylase; 2) Succinyl-diaminopimelate desuccinylase
MN040505-MN040552 | Aspartate semialdehyde dehydrogenase
MN040553-MN040599 | 50S ribosomal protein L17
MN040600-MN040647 | DNA-directed RNA polymerase subunit alpha
MN040648-MN040695 | 30S ribosomal protein S4
MN040696-MN040743 | 30S ribosomal protein S11
MN040744-MN040791 | 30S ribosomal protein S13
MN040792-MN040839 | Translation initiation factor IF-1
MN040840-MN040887 | N-acetylornithine carbamoyltransferase
MN040888-MN040935 | Carbamoyl-phosphate synthase large chain
MN040936-MN040983 | Carbamoyl-phosphate synthase small chain
MN040984-MN041031 | Acetylornithine aminotransferase
MN041032-MN041079 | N-acetyl-gamma-glutamyl-phosphate reductase
MN041080-MN041127 | Argininosuccinate synthase
MN041128-MN041175 | 30S ribosomal protein S1
MN041176-MN041223 | Enoyl-acyl-carrier-protein reductase NADH FabI
MN041224-MN041271 | S-adenosylmethionine synthase
MN041272-MN041319 | Phospho-2-dehydro-3-deoxyheptonate aldolase
MN041320-MN041366 | 50S ribosomal protein L9
MN041367-MN041412 | 30S ribosomal protein S6
MN041413-MN041460 | tRNA modification GTPase MnmE
MN041461-MN041508 | Lon protease 2
MN041509-MN041556 | Histidine--tRNA ligase
MN041557-MN041603 | Phenylalanine--tRNA ligase alpha subunit
MN041604-MN041650 | DNA gyrase subunit B
MN041651-MN041698 | 30S ribosomal protein S16
MN041699-MN041746 | Aspartate aminotransferase
MN041747-MN041794 | Lysine--tRNA ligase
MN041795-MN041840 | Octanoyltransferase
MN041841-MN041887 | Methionine--tRNA ligase
MN041888-MN041935 | Histidinol dehydrogenase
MN041936-MN041981 | hypothetical protein
MN041982-MN042029 | Phosphate acetyltransferase
MN042030-MN042076 | 1-deoxy-D-xylulose-5-phosphate synthase
MN042077-MN042124 | Transketolase 2
MN042125-MN042170 | SsrA-binding protein
MN042171-MN042216 | Lipoyl synthase
MN042217-MN042264 | Multifunctional CCA protein
MN042265-MN042311 | 2,3,4,5-tetrahydropyridine-2,6-dicarboxylate N-succinyltransferase
MN042312-MN042358 | putative branched-chain-amino-acid aminotransferase
MN042359-MN042405 | 2-oxoisovalerate dehydrogenase subunit beta
MN042406-MN042452 | Ribosomal RNA small subunit methyltransferase A
MN042453-MN042499 | Putative aminopeptidase YsdC
MN042500-MN042546 | Diaminopimelate epimerase
MN042547-MN042594 | ATP synthase subunit c
MN042595-MN042641 | ATP synthase subunit beta
MN042642-MN042688 | Chaperone protein DnaJ
MN042689-MN042735 | hypothetical protein
MN042736-MN042782 | Ribose-phosphate pyrophosphokinase
MN042783-MN042830 | Imidazole glycerol phosphate synthase subunit HisF
MN042831-MN042878 | Imidazole glycerol phosphate synthase subunit HisH
MN042879-MN042926 | Ribosome recycling factor
MN042927-MN042973 | ATP synthase subunit A
MN042974-MN043021 | Bifunctional aspartokinase
MN043022-MN043069 | hypothetical protein
MN043070-MN043116 | 3-dehydroquinate synthase
MN043117-MN043164 | DNA gyrase subunit A
MN043165-MN043211 | Fructose-bisphosphate aldolase
MN043212-MN043259 | hypothetical protein
MN075834-MN075880 | Elongation Factor
MN075881-MN075928 | tRNA-2-methylthio-N6-dimethylallyladenosine synthase
CP003535-CP003536 | Blaberus giganteus
CP003605-CP003606 | Blatta orientalis
CP001487 | Blattella germanica
CP003015-CP003016 | Cryptocercus punctulatus
CP003000, CP003095 | Mastotermes darwiniensis
CP005488-CP005489 | Nauphoeta cinerea
CP001429-CP001430 | Periplaneta americana
CP002163 | Sulcia muelleri
AP013293 | Sulcia muelleri
CP002165 | Sulcia muelleri
CP017479 | Flavobacterium gilvum
CP017478 | Lutibacter sp.
CP013671 | Tenacibaculum dicentrarchi
KT25b | Polaribacter sp.


Figure S3.1: Phylogenetic trees inferred from (a) cockroach mtDNA data (protein-coding genes plus rRNAs and tRNAs) and (b) their Blattabacterium symbiont data, inferred using maximum likelihood in RAxML. A fixed topology (obtained from the Blattabacterium tree shown in Figure 1) was used in each analysis. Twenty-seven phylogenetically independent pairs of lineages used to test for correlations of evolutionary rates are shown in red. Species names are coloured according to the family to which they belong, as shown on the left of the figure.

Figure S3.2: Host cockroach phylogenetic tree inferred using maximum likelihood, based on amino acid sequences translated from mitochondrial protein-coding genes. Support values of 100% are indicated by asterisks


Figure S3.3: Blattabacterium phylogenetic tree inferred using maximum likelihood, based on amino acid sequences translated from protein-coding genes. Support values of 100% are indicated by asterisks.


Figure S3.4: Comparison of evolutionary rates of Blattabacterium symbionts and their host cockroaches, based on protein-coding genes from host and symbiont, plus rRNAs+tRNAs from host mitochondria. (a) Correlation of root-to-tip distances in phylogenies of Blattabacterium and cockroaches. (b–d) Standardised tests for correlation of molecular evolutionary rates between 27 independent pairs of Blattabacterium and host cockroach mitochondria. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first (b), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (c) and third (d) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only.


Figure S3.5: Tests for correlation of molecular evolutionary rates between 27 independent pairs of Blattabacterium and host cockroach mitochondria, based on amino acid data translated from protein-coding genes. (a) Test based on comparison of log-transformed branch-length differences. Three standardisations were also carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first standardisation (b), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (c) and third (d) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only.


Figure S3.6: Standardised tests for correlation of molecular evolutionary rates between 27 independent pairs of Blattabacterium and host cockroach mitochondria, based on protein-coding genes. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first standardisation (a), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (b) and third (c) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only.


Figure S3.7: AT content (%) of Blattabacterium and mtDNA sequences for each taxon, including all codon positions.


Appendix 3: Supplementary material for Chapter 4

Table S4.1. GenBank accession numbers for symbiont and host mtDNA genomes examined in this study. In some cases, mtDNA genomes were assembled using short read data (bold Accession No.). ‘Buchnera’ = Buchnera aphidicola; ‘Sulcia’ = Sulcia muelleri.

Symbiont  Host species name            Symbiont Accession No.  Host mtDNA Accession No.
Buchnera  Cinara tujafilina            NC_015662               KP722583
Buchnera  Anoecia oenotherae           CP033012                SRR7796603
Buchnera  Baizongia pistaciae          NC_004545               NC_035314
Buchnera  Schlechtendalia chinensis    NZ_CP011299             NC_032386
Buchnera  Melaphis rhois               CP033004                NC_036065
Buchnera  Thelaxes californica         CP034852                SRR7796612
Buchnera  Therioaphis trifolii         CP032996                SRR7796611
Buchnera  Stegophylla sp.              CP032998                SRR7796613
Buchnera  Tuberolachnus salignus       NZ_LN890285             KP722566
Buchnera  Muscaphis stroyani           CP034861                SRR7796591
Buchnera  Aphis craccivora             CP034897                KX447141
Buchnera  Aphis glycines               NZ_CP009253             KT889380
Buchnera  Aphis nasturtii              CP034888                SRR7796605
Buchnera  Aphis helianthi              CP034894                SRR7796607
Buchnera  Rhopalosiphum maidis         CP032759                MK3687781
Buchnera  Rhopalosiphum padi           CP034858                KT447631
Buchnera  Schizaphis graminum          NC_004061               NC_006158
Buchnera  Hyperomyzus lactucae         CP034876                MK2510631
Buchnera  Artemisaphis artemisicola    CP034900                SRR7796609
Buchnera  Acyrthosiphon pisum          NC_002528               NC_011594
Buchnera  Acyrthosiphon lactucae       CP034891                SRR7796608
Buchnera  Macrosiphum gaurae           CP034867                SRR7796596
Buchnera  Macrosiphum euphorbiae       CP033006                SRR7796595
Buchnera  Sitobion avenae              CP034855                NC_024683
Buchnera  Macrosiphoniella sanborni    CP034864                SRR7796594
Buchnera  Myzus persicae               NZ_CP002701             NC_029727
Buchnera  Brachycaudus cardui          CP034879                SRR7796601
Buchnera  Hyadaphis tataricae          CP034873                SRR7796597
Buchnera  Diuraphis noxia              NZ_CP013259             NC_022727
Buchnera  Lipaphis pseudobrassicae     CP034870                SRR7796598
Buchnera  Brevicoryne brassicae        CP034882                SRR7796604
Sulcia    Philaenus spumarius          MPAX000000000           AY630340
Sulcia    Nephotettix cincticeps       CP016223                KP749836
Sulcia    Macrosteles quadrilineatus   CP006060                NC_034781
Sulcia    Entylia carinata             CP021172                KX495488
Sulcia    Homalodisca vitripennis      CP000770                AY875213
Sulcia    Tettigades undata            CP007234                KJ193728
Sulcia    Kosemia yezoensis            CP029015                MG737723
Sulcia    Muda kuroiwae                CP029017                MG737729
Sulcia    Magicicada tredecim          CP010828                MG737744
Sulcia    Vagitanus terminalis         CP029022                MG737734
Sulcia    Mogannia minuta              CP029016                MG737728
Sulcia    Diceroprocta semicincta      CP001605                KM000131
Sulcia    Platypleura kaempferi        CP029019                MG737730
Sulcia    Meimuna opalifera            CP029027                MG737726
Sulcia    Meimuna kuroiwae             CP029026                MG737725
Sulcia    Meimuna iwasakii             CP029025                MG737724
Sulcia    Meimuna oshimensis           CP029028                MG737727
Sulcia    Hyalessa maculaticollis      CP029014                MG737722
Sulcia    Graptopsaltria bimaculata    CP029012                MG737720
Sulcia    Graptopsaltria nigrofuscata  CP029013                MG737721
Sulcia    Yezoterpnosia nigricosta     CP029023                MG737732
Sulcia    Yezoterpnosia vacua          CP029024                MG737733
Sulcia    Euterpnosia chibensis        CP029011                MG737719
Sulcia    Tanna japonensis             CP029018                MG737731
Sulcia    Auritibicen bihamatus        CP029020                MG737715
Sulcia    Auritibicen japonicus        CP029021                MG737716
Sulcia    Cryptotympana facialis       CP029010                MG737718
Sulcia    Cryptotympana atrata         CP029009                MG737717


Figure S4.1: A fixed tree topology (obtained from the Buchnera tree shown in Figure 4.1) was used in all analyses of root-to-tip distances and branch-length comparisons. Fifteen phylogenetically independent pairs of lineages used to test for correlations of evolutionary rates are shown in red. Species names are coloured according to the family or tribe to which they belong, as shown on the left of the figure.


Figure S4.2: A fixed tree topology (obtained from the Sulcia tree shown in Figure 4.3) was used in all analyses of root-to-tip distances and branch-length comparisons. Fourteen phylogenetically independent pairs of lineages used to test for correlations of evolutionary rates are shown in red. Species names are coloured according to the family to which they belong, as shown on the left of the figure.


Figure S4.3: Congruence between (a) phylogenetic tree of host aphids inferred using maximum likelihood (RAxML) from mitochondrial protein-coding genes, and (b) phylogenetic tree of Buchnera inferred using maximum likelihood from 240 protein-coding genes (3rd codon sites excluded from both data sets). Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Red outlines on circles indicate disagreement between the phylogenies. Colours of branches indicate different aphid families


Figure S4.4: Congruence between phylogenetic trees inferred using maximum likelihood (RAxML) from translated amino acid data sets of (a) mitochondrial protein-coding genes from host aphids, and (b) 240 protein-coding genes from Buchnera. Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Red outlines on circles indicate disagreement between the phylogenies. Colours of branches indicate different aphid families


Figure S4.5: Standardised tests for correlation of evolutionary rates between 15 independent pairs of Buchnera and host aphid mitochondria, based on 240 symbiont genes and whole host mitochondrial genomes, with 3rd codon sites excluded from both data sets. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first standardisation (a), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/Myr, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (b) and third (c) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only


Figure S4.6 Comparison of evolutionary rates of Buchnera symbionts and their aphid hosts using maximum-likelihood analysis of protein-coding genes only from each data set, with 3rd codon sites excluded. (a) Correlation of root-to-tip distances in phylogenies of Buchnera and aphids. (b–d) Standardised tests for correlation of molecular evolutionary rates between 15 independent pairs of Buchnera and host aphid mitochondria. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first (b), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (c) and third (d) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only


Figure S4.7 Comparison of evolutionary rates of Buchnera symbionts and their aphid hosts using maximum-likelihood analysis of amino acid sequences translated from protein-coding genes. (a) Correlation of root-to-tip distances in phylogenies of Buchnera and aphids. (b–d) Standardised tests for correlation of molecular evolutionary rates between 15 independent pairs of Buchnera and host aphid mitochondria. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first standardisation (b), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (c) and third (d) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only


Figure S4.8: Congruence between (a) phylogenetic tree of Sulcia hosts inferred using maximum likelihood (RAxML) from mitochondrial protein-coding genes, and (b) phylogenetic tree of Sulcia inferred using maximum likelihood from 120 protein-coding genes (3rd codon sites excluded from both data sets). Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Red outlines on circles indicate disagreement between the phylogenies. Colours of branches indicate different Sulcia host families. Dash-dotted branches are not to scale and indicate long branches that were shortened for readability. Solid branches are drawn to scale, according to the legend underneath.


Figure S4.9: Congruence between phylogenetic trees inferred using maximum likelihood (RAxML) from translated amino acid sequences of (a) mitochondrial protein-coding genes from Sulcia hosts, and (b) 120 protein-coding genes from Sulcia. Shaded circles at nodes indicate bootstrap values (black = 100%, grey = 85–99%). Nodes without black or grey circles have bootstrap values <85%. Red outlines on circles indicate disagreement between the phylogenies. Colours of branches indicate different Sulcia host families. The dashed internal branch is not to scale and indicates a long branch that was shortened so that the two trees could be included in a single diagram. Solid branches are drawn to scale, according to the legend underneath.


Figure S4.10: Standardised tests for correlation of evolutionary rates between 14 independent pairs of Sulcia and host mitochondria using maximum-likelihood analysis of Sulcia symbiont genes and mitochondrial protein-coding host genes plus tRNAs, with 3rd codon sites excluded from host data sets. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first standardisation (a), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (b) and third (c) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only


Figure S4.11: Comparison of evolutionary rates of Sulcia symbionts and their hosts using maximum-likelihood analysis of protein-coding genes only, with 3rd codon sites excluded from host data set. (a) Correlation of root-to-tip distances in phylogenies of Sulcia and their hosts. (b–d) Standardised tests for correlation of evolutionary rates between 14 independent pairs of Sulcia and host mitochondria. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first (b), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (c) and third (d) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only


Figure S4.12: Comparison of evolutionary rates of Sulcia symbionts and their hosts using maximum-likelihood analysis of amino acid sequences translated from protein-coding genes. (a) Correlation of root-to-tip distances in phylogenies of Sulcia and their hosts. (b–d) Standardised tests for correlation of molecular evolutionary rates between 14 independent pairs of Sulcia and host mitochondria. Three standardisations were carried out, each based on dividing log-transformed branch-length differences by the square root of an estimate of time since divergence for the pair. In the first standardisation (b), time since divergence for host pairs was estimated as the average branch length of the host pair, divided by an assumed rate of 0.001 subs/site/million years, while for corresponding symbionts it was estimated as the average branch length of the symbiont pair, divided by the same assumed rate. In the second (c) and third (d) standardisations, times since divergence for both symbionts and hosts were based either on average branch lengths of host pairs only or symbiont pairs only.


Appendix 4: Publications


Received: 21 October 2018 | Revised: 1 February 2019 | Accepted: 11 February 2019 DOI: 10.1002/ece3.5015

ORIGINAL RESEARCH

EasyCodeML: A visual tool for analysis of selection using CodeML

Fangluan Gao1,2* | Chengjie Chen3* | Daej A. Arab2 | Zhenguo Du1 | Yehua He3 | Simon Y. W. Ho2

1 Fujian Key Laboratory of Plant Virology, Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, China
2 School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales, Australia
3 College of Horticulture, South China Agricultural University, Guangzhou, China

Correspondence
Fangluan Gao, Fujian Key Laboratory of Plant Virology, Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, China. Email: [email protected]
and
Chengjie Chen, College of Horticulture, South China Agricultural University, Guangzhou, China. Email: [email protected]

Funding information
National Natural Science Foundation of China, Grant/Award Number: 31772103; the Training Program of Fujian Excellent Talents in University; a Future Fellowship from the Australian Research Council, Grant/Award Number: FT160100167

Abstract
The genomic signatures of positive selection and evolutionary constraints can be detected by analyses of nucleotide sequences. One of the most widely used programs for this purpose is CodeML, part of the PAML package. Although a number of bioinformatics tools have been developed to facilitate the use of CodeML, these have various limitations. Here, we present a wrapper tool named EasyCodeML that provides a user‐friendly graphical interface for using CodeML. EasyCodeML has a custom running mode in which parameters can be adjusted to meet different requirements. It also offers a preset running mode in which an evolutionary analysis pipeline and publication‐quality tables can be exported by a single click. EasyCodeML allows visualized, interactive tree labelling, which greatly simplifies the use of the branch, branch‐site, and clade models of selection. The program allows comparison of major codon‐based models for analyses of selection. EasyCodeML is a stand‐alone package that is supported in Windows, Mac, and Linux operating systems, and is freely available at https://github.com/BioEasy/EasyCodeML.

KEYWORDS
CodeML, codon‐based models, likelihood‐ratio test, molecular evolution, positive selection

*These authors contributed equally to this work.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2019 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd.

Ecology and Evolution. 2019;9:3891–3898. www.ecolevol.org

1 | INTRODUCTION

Advances in high‐throughput sequencing technologies have led to an unprecedented wealth of genome‐scale data for evolutionary analysis. These data offer valuable opportunities for investigating the effects of positive selection and constraints on genomic evolution. Although a range of bioinformatics tools and resources are readily available for using codon‐based models of evolution (Pond, Frost, & Muse, 2005; Stern et al., 2007; Valle et al., 2014; Zhang, Wang, Long, & Fan, 2013), the CodeML program in the PAML package (Yang, 2007) is among the most widely used.

One method of testing for selection is to compute ω, the ratio of nonsynonymous to synonymous substitution rates. Under the assumption of neutral evolution, ω is expected to have a value of 1. Positive and purifying (negative) selection are indicated when ω > 1 and ω < 1, respectively (Nei & Gojobori, 1986). Several different models have been implemented in CodeML, varying in terms of their assumptions about how ω varies across the sequence (site models) or across branches of the phylogeny (branch models; Yang, 2007).

Site models can be used to identify positively selected sites in a multiple sequence alignment (Yang & Nielsen, 2002). They employ different site‐class‐specific models, all of which assume that the ω ratio is the same across branches of the phylogeny but different among sites in the alignment. These codon substitution models are: M0 (one‐ratio), M1a (nearly neutral), M2a (positive selection), M3 (discrete), M7 (beta), M8 (beta and ω > 1) and M8a (beta and ω = 1). The fit of these models to the sequence data can be compared using likelihood‐ratio tests. Support for positive selection can be identified if M2a provides a better fit than M1a, or if M8 provides a better fit than M7 or M8a (Yang, Nielsen, Goldman, & Pedersen, 2000). The M8–M7 comparison offers a very stringent test of positive selection (Anisimova, Bielawski, & Yang, 2001), but the M8–M8a comparison has seen growing use because it yields fewer false positives (Swanson, Nielsen, & Yang, 2003; Wong, Yang, Goldman, & Nielsen, 2004).

Branch models can be used to test whether there are significant differences in ω among branches of the tree (Yang & Nielsen, 1998, 2002). There are three branch models in CodeML, including a free‐ratio model allowing an independent ω for each branch in the tree, a one‐ratio model (M0) assuming that ω has been constant throughout the tree, and a two‐ratio model assuming that specific branches have an ω that differs from that throughout the rest of the tree (Yang, 1998). Pairwise comparisons of these models can be performed using likelihood‐ratio tests (Anisimova et al., 2001).

Models with heterogeneous ω across sites and across branches can be combined in the form of branch‐site models. These models can be used to identify signals of episodic selection occurring along a specified branch after gene duplication (Yang & Nielsen, 2002; Zhang, Nielsen, & Yang, 2005). A branch‐site model that allows positive selection along specified branches (Model A) can be compared against a null model (Model Anull) that allows neutral evolution and negative selection (Zhang et al., 2005).

Clade models allow differences in site‐specific selective constraints among clades in the tree (Bielawski & Yang, 2004; Forsberg & Christiansen, 2003). The model C (CmC) estimates a separate ω ratio for each of two or more clades and is compared against a null model 2a_rel (M2a_rel) in which ω is fixed among clades (Weadick & Chang, 2012).

If a likelihood‐ratio test yields a significant result for any of the pairwise comparisons of codon models, the Bayes empirical Bayes (BEB) method (Yang, Wong, & Nielsen, 2005) can then be used to identify amino acid residues that have potentially evolved under selection. The standard threshold for identifying amino acid sites under selection is a posterior probability of 0.95 (Scheffler & Seoighe, 2005).

The use of CodeML is controlled by variables listed in a control file, in which numerical optimization parameters can be modified to perform evolutionary analysis using a chosen codon model. The control file can be daunting for new users of CodeML. For this reason, several computer programs have been developed with the purpose of providing a more user‐friendly interface for CodeML (Table 1). However, these programs have various limitations, such as complex configuration procedures or a reduced set of codon models. For example, two recently released packages, IDEA (Interactive Display for Evolutionary Analyses; Egan et al., 2008) and IMPACT_S (Integrated Multiprogram Platform to Analyze and Combine Tests of Selection; Maldonado et al., 2014), provide a graphical user interface but only implement three pairs of site models (M0 vs. M3, M1a vs. M2a and M7 vs. M8). Xu and Yang (2013) developed a graphical user interface for PAML named pamlX, but the complex parameter settings for CodeML still remained challenging for users. Notably, the foreground and background branches of the phylogeny must be specified (Yang & Nielsen, 2002). None of the available tools allows user‐friendly labelling of branches or nodes in the tree by one click (Table 1).

Here, we describe EasyCodeML, a program that provides a user‐friendly interface for setting up complex analyses of selection in CodeML. In addition to a custom mode in which all parameters can be adjusted to meet the requirements of the user, EasyCodeML offers a preset mode that allows the construction of a pipeline from input to output (Supporting information Figure S1).
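To make the role of the control file more concrete, the following Python sketch writes minimal control files for the M8 and M8a site models. It is an illustration only: the variable names are standard CodeML control‐file variables, but the helper function, the file names, and the chosen values are assumptions made for this example and do not reproduce the default settings of EasyCodeML.

```python
def codeml_control(seqfile, treefile, outfile, nssites, fix_omega, omega,
                   model=0, cleandata=0):
    """Return the text of a minimal CodeML control file (illustrative values)."""
    settings = {
        "seqfile": seqfile,      # alignment in PAML format
        "treefile": treefile,    # tree in Newick format
        "outfile": outfile,      # main result file
        "noisy": 3,
        "verbose": 1,
        "runmode": 0,            # user-supplied tree
        "seqtype": 1,            # codon sequences
        "CodonFreq": 2,          # F3x4 codon frequencies
        "clock": 0,
        "model": model,          # 0 = one omega across branches
        "NSsites": nssites,      # site model, e.g. 8 for M8
        "icode": 0,              # universal genetic code
        "fix_kappa": 0,
        "kappa": 2,
        "fix_omega": fix_omega,  # 1 fixes omega at the value given below
        "omega": omega,
        "cleandata": cleandata,  # 1 removes sites with gaps or ambiguities
    }
    return "\n".join(f"{key:>12} = {value}" for key, value in settings.items()) + "\n"


# Hypothetical file names; M8a is M8 with omega fixed at 1
m8 = codeml_control("ompC.pml", "ompC.nwk", "M8.mlc",
                    nssites=8, fix_omega=0, omega=2)
m8a = codeml_control("ompC.pml", "ompC.nwk", "M8a.mlc",
                     nssites=8, fix_omega=1, omega=1)
print(m8a)
```

Writing each string to its own control file and running CodeML on it would give the two sets of results needed for a likelihood‐ratio test.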

TABLE 1 Comparison of features in EasyCodeML and other tools

Key features                          IDEA   pamlX   IMPACT_S   LMAP(b)   BlastPhyMe(c)   EasyCodeML
Supported codon models
  Branch model                        ×      ✓       ×          ✓         ✓               ✓
  Branch‐site model                   ×      ✓       ×          ✓         ✓               ✓
  Site model                          ✓(a)   ✓       ✓(a)       ✓         ✓               ✓
  Clade model                         ×      ✓       ×          ✓         ✓               ✓
LRT automatically performed           ×      ×       ✓          ✓         ✓               ✓
Visual labelling of tree by one click ×      ×       ×          ×         ×               ✓
Customizing control files             ×      ✓       ✓          ✓         ×               ✓
Exporting preformatted table          ×      ×       ×          ✓         ✓               ✓
Multithreading                        ×      ×       ×          ✓         ✓               ✓
Drag‐and‐drop functionality           ×      ✓       ×          ×         ×               ✓

(a) Only a few codon‐based models available.
(b) Maldonado et al., 2016, https://doi.org/10.1186/s12859-016-1204-5.
(c) Schott et al., 2016, http://dx.doi.org/10.1101/059881.

2 | IMPLEMENTATION

EasyCodeML provides two different running modes. The first is the preset mode (Figure 1a), in which all key parameters of the nested models are built‐in and which has pipelines for the selection analyses (Table 2). The nested models include the site models (M0 vs. M3, M1a vs. M2a, and M7 vs. M8), branch models (M0 vs. two‐ratio model), branch‐site models (Model Anull vs. Model A), and clade models (M2a_rel vs. CmC). The default settings in the control files for these pairs of nested models are given in Supporting information Tables S1–S4.

The second running mode is the custom mode for experienced users (Figure 1b). As with pamlX, the parameters for any codon‐based model can be modified to meet different requirements. Notably, a utility named “control file viewer” is integrated in the custom running mode in EasyCodeML. This includes all of the described codon‐based models, with preoptimized parameters.

When using the models involving heterogeneous ω among branches, it can be a challenging task to label branches or nodes in the phylogenetic tree. Performing this task using a text editor is difficult and prone to error. EasyCodeML provides a graphical interface that allows the labelling of branches and nodes to be done in a visualized, interactive way (Figure 2).

In the preset mode in EasyCodeML, likelihood‐ratio tests between nested models are performed automatically. The results are displayed on the screen at the completion of a CodeML analysis (Figure 1a). In the custom mode, likelihood‐ratio tests can also be conducted using the calculator in the utility menu of EasyCodeML (Figure 3a). We have developed a fully functional export module in the preset mode that produces a publication‐quality table containing the results of the CodeML analysis (Table 3).

Numerous file conversions are often required to prepare input data for CodeML. To improve the efficiency and ease of data exchange among multiple formats, we have incorporated a file‐format convertor into EasyCodeML. Named Seqformat convertor, this utility can convert CLUSTAL, FASTA, MEGA, NEXUS, and PHYLIP formats into PAML format (Figure 3b). A command‐line version of Seqformat convertor is also provided in EasyCodeML, making it possible to convert sequence formats in batch mode (Figure 3c).

We have developed a “check” module that is available for both of the running modes in EasyCodeML. The user is notified if there are discrepancies between the taxon labels in the input files (Figure S2a). This helps to satisfy the requirement of CodeML that the input sequence data and tree file have matching taxon labels.

In addition to the main functions outlined above, EasyCodeML supports parallel computation (multithreading), which is especially helpful when multiple comparisons among codon models are being performed. EasyCodeML also has drag‐and‐drop functionality for ease of use. A comparison of the features of EasyCodeML and other relevant tools or programs is provided in Table 1.
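The kind of consistency check performed by the “check” module can be pictured as a comparison of two sets of taxon labels. The Python sketch below is not the implementation used in EasyCodeML; the function names are hypothetical, and it assumes unquoted taxon names in the Newick tree and an alignment with a header line followed by one “name sequence” record per line.

```python
import re


def newick_taxa(newick):
    """Leaf names in a Newick string: tokens that directly follow '(' or ','."""
    return set(re.findall(r"[(,]\s*([^\s(),:;#$]+)", newick))


def alignment_taxa(text):
    """Taxon names from a simple sequential alignment (assumed layout:
    a 'ntaxa nsites' header, then one 'name  sequence' record per line)."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_taxa = int(lines[0].split()[0])
    return {line.split()[0] for line in lines[1:1 + n_taxa]}


def check_labels(newick, alignment_text):
    """Return (labels only in the tree, labels only in the alignment)."""
    tree = newick_taxa(newick)
    aln = alignment_taxa(alignment_text)
    return tree - aln, aln - tree


# Hypothetical inputs: taxon D is missing from the alignment, E from the tree
tree = "((A:0.1,B:0.2):0.05,(C:0.1,D:0.3) $1);"
alignment = """4 6
A  ATGAAA
B  ATGAAG
C  ATGCAA
E  ATGCAG
"""
only_tree, only_alignment = check_labels(tree, alignment)
print("Only in tree:", sorted(only_tree))
print("Only in alignment:", sorted(only_alignment))
```

Two empty sets would indicate that the tree and the alignment use matching taxon labels.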

FIGURE 1 Screenshot of the main interface of EasyCodeML under the (a) preset and (b) custom running modes. In the preset mode, all key parameters of the nested models are built‐in and there is a pipeline from data input to the output of results. In the custom mode, the parameters of any codon‐based model can be modified to meet the requirements of the user.

TABLE 2 Codon‐based models available in EasyCodeML

                          Running mode
Codon‐based models        Preset   Custom   Nested models (null vs. alternative)
Site models
  M0 (one‐ratio)          ✓        ✓        M3 versus M0(a)
  M1 (nearly neutral)     ✓        ✓        M1a versus M2a
  M2a (positive selection) ✓       ✓        M7 versus M8
  M3 (discrete)           ✓        ✓        M8a versus M8
  M7 (beta)               ✓        ✓
  M8 (beta and ω > 1)     ✓        ✓
  M8a (beta and ω = 1)    ✓        ✓
Branch model
  One‐ratio model (M0)    ✓        ✓        M0 versus BM
  Two‐ratio model (BM)    ✓        ✓        M0 versus FM
  Free‐ratio model (FM)   ×        ✓
Branch‐site models
  Model Anull             ✓        ✓        Model Anull versus Model A
  Model A                 ✓        ✓
Clade models
  M2a_rel                 ✓        ✓        M2a_rel versus CmC
  CmC                     ✓        ✓

(a) The M0–M3 comparison does not allow detection of positive selection.

3 | WORKED EXAMPLE

3.1 | Preset running mode in EasyCodeML

To demonstrate the use of the clade models in the preset running mode in EasyCodeML, we present an analysis of the ECP‐EDN gene family in primates. The analyses are based on data from a study by Bielawski and Yang (2003), which investigated the role of positive selection in the evolution of this gene family.

3.1.1 | Step 1: Loading data and configuring parameters

EasyCodeML has two different running modes, preset and custom. In this case, we choose the preset mode (Figure 1a). We either drag‐and‐drop a folder into EasyCodeML or click on the button “…” to select a local folder as the working directory. The required inputs for analysing selection are the aligned sequences in PAML format and a tree file in Newick format. We can also drag‐and‐drop these two files into the text box. Four different model approaches are available in the preset mode. Here, we select “Clade Model” to test for positive selection in the ECP‐EDN gene family (Figure 1a).

After the sequence and tree files have been selected, press the “Check” button to check the consistency of the taxon labels between the tree and sequence files. The clade models require the nodes of the tree to be labelled in order to indicate the clades that will be assigned independent ω parameters, so we press the “Label” button. We then click on the entire EDN clade to be selected in the tree as the foreground lineage. The dollar symbol “$” with an integer will be shown above the EDN clade (Figure 2a). In EasyCodeML, the symbols “#” (Figure 2b) and “$” (Figure 2a) are used for the branch or branch‐site models and for the clade model, respectively.

We use other default settings for the parameters, including the “Num of Threads” and “Clean data” options. Multithreading will only take effect in the analysis using the site model. If the “Clean data” option is enabled, all sites with ambiguity characters and alignment gaps will be removed from the sequence alignment prior to analysis.

3.1.2 | Step 2: CodeML analysis

Before starting the CodeML analysis, we need to click on the “Save Current Profile” button to enable all parameters for the current analysis. The button “Run CodeML” then starts the CodeML analysis. At the conclusion of the analysis, the log‐likelihood (lnL) values and the number of parameters (np) will be automatically retrieved. A likelihood‐ratio test is performed for the nested models and all results are automatically organized and displayed on the screen (Figure 1a).

3.1.3 | Step 3: Summarizing and interpreting results

A publication‐quality table that contains all of the relevant information from the CodeML analyses can be generated using the “Export” button. Microsoft Excel can be launched to view the saved results file by clicking on “View”. A clear rejection of the null model indicates that divergent selection was detected between the foreground (the entire EDN clade) and background branches (the entire ECP clade). Note that the selection analysis presented here is merely instructional. If there are suboptimal peaks in the likelihood surface, we can load and edit the control file in the custom running mode in EasyCodeML, and then run the program several times to find the globally optimal likelihood score using different initial values of ω.

3.2 | Custom running mode in EasyCodeML

We briefly illustrate the use of the custom running mode in EasyCodeML by analysing a data set from Padhi, Verghese, and Otta (2009). We compare the M8 and M8a models to test for sites under positive selection in the outer membrane protein C (ompC) of strains of Enterobacter aerogenes, although this particular model comparison is also available in the preset running mode of EasyCodeML.

FIGURE 2 Labelling branches in a tree for the branch‐related models can be done in a simple and intuitive way for the (a) clade models and (b) branch and branch‐site models
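For readers who wish to see what such labels look like in the tree file itself, the Python sketch below appends a “$” or “#” label to a clade or tip in a Newick string. It is a simplified illustration rather than the labelling routine of EasyCodeML: the function name and the four‐taxon tree are hypothetical, and the plain substring search assumes that the clade text occurs exactly once in the tree.

```python
def add_label(newick, target, label):
    """Append a label (e.g. '$1' or '#1') directly after a clade or tip.

    `target` is the exact Newick text of the clade, such as '(EDN_1,EDN_2)',
    or the name of a single tip. Only the first occurrence is labelled.
    """
    if target not in newick:
        raise ValueError(f"{target!r} not found in tree")
    return newick.replace(target, f"{target} {label}", 1)


# Hypothetical four-taxon tree with one EDN clade and one ECP clade
tree = "((EDN_1,EDN_2),(ECP_1,ECP_2));"

# Clade model: mark the whole EDN clade with '$1'
clade_tree = add_label(tree, "(EDN_1,EDN_2)", "$1")
print(clade_tree)

# Branch or branch-site model: mark a single foreground branch with '#1'
branch_tree = add_label(tree, "ECP_1", "#1")
print(branch_tree)
```

Editing labels by hand in this way becomes error‐prone for large trees, which is the difficulty that the graphical labelling is intended to avoid.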

3.2.1 | Step 1: Loading data and configuring parameters

We switch the current running mode to the custom mode and specify a local folder as the working directory using drag‐and‐drop, as described above for the preset mode. The “Load” button can be used to load a codon model available from a control file viewer (Supporting information Figure S2b). This will bring up a dialogue box from which we choose the M8a model. We can further modify the various parameter values to meet different requirements. Tree labelling is necessary when examining the branch‐related models (branch models, branch‐site models, and clade models), but not with the site models. Therefore, default values are used for all parameters except for leaving “Clean data” unchecked (Figure 1b). We need to save the current profile using “Save Current Profile” after checking whether the taxon labels match between the tree and sequence files.

3.2.2 | Step 2: CodeML analysis

Clicking “Run CodeML” will start the analysis. In order to perform the subsequent likelihood‐ratio test, we will need to run both models. Therefore, we need to repeat the procedure for the M8 model.

We navigate to the working directory and locate the main result files (mlc) of the models M8 and M8a. After noting the log‐likelihood (lnL) values and the number of parameters (np) in these mlc files, we enter them in the LRT calculator from the “Tools” menu and run a likelihood‐ratio test. Based on the lnL and np values of the null model (M8a, lnL = −1892.5, np = 13) and the alternative model (M8, lnL = −1878.7, np = 14), the test yields a p‐value below 0.05 (Figure 3a).

FIGURE 3 Two utilities available in EasyCodeML: (a) the LRT calculator, and Seqformat convertor in (b) a user‐friendly GUI or (c) command line. Seqformat convertor can convert between diverse types of sequence data formats GAO et al. | 3897 , **

3.2.3 | Step 3: Identifying sites under selection

In the comparison of models M8 and M8a, the BEB analysis under model M8 is used to identify codons under positive selection. Thus, we find a block called “Bayes Empirical Bayes (BEB) analysis” in the mlc file (Supporting information Figure S3). This block lists the amino acids that have a BEB score higher than 0.5. Sites potentially under positive selection are suggested by BEB values higher than 0.95, which are indicated by asterisks. In this data set, we identified nine codons as being under positive selection with posterior probability >0.95, matching the results of Padhi et al. (2009).

TABLE 3 Example of a publication‐quality table created by the export module in EasyCodeML, based on a comparison of site models for the ECP‐EDN gene family from primates

Model | np | Ln L | Models compared | LRT p‐value
M0 | 11 | −1,915.094842 | M0 versus M3 | 0.00E+00
M3 | 15 | −1,876.512700 | |
M1a | 12 | −1,892.515966 | M1a versus M2a | 4.68E−07
M2a | 14 | −1,877.941799 | |
M7 | 12 | −1,897.250798 | M7 versus M8 | 9.00E−09
M8 | 14 | −1,878.735192 | |

Note. [], no data available.

4 | CONCLUSIONS

We have developed EasyCodeML, an interactive visual tool for analyses of selection that incorporates the major codon‐based models in CodeML. EasyCodeML includes a feature that allows interactive labelling of the tree in branch‐ or clade‐specific analyses. We hope that the program proves to be a useful tool for studies of molecular evolution, by broadening the user base of CodeML and improving its usability. EasyCodeML is an ongoing project and we welcome bug reports, feedback, and suggestions.

ACKNOWLEDGMENTS

F. G. was funded by the Natural Science Foundation of China (Grant No. 31772103) and the Training Program of Fujian Excellent Talents in University. S.Y.W.H. was funded by a Future Fellowship (FT160100167) from the Australian Research Council. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We thank Mr Zhenxi Chen (Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences), Dr Han Li (Southwest University), Dr Lin Zhang (Nanjing Normal University), and Dr Qing Chen (Sichuan Agricultural University) for constructive feedback on EasyCodeML. We also thank Prof. Ziheng Yang (University College London) for writing the program CODEML, on which our work is based.

CONFLICT OF INTEREST

None declared.

AUTHOR CONTRIBUTIONS

F. Gao and C. Chen conceived the idea, developed the program, and led the writing of the manuscript. D. A. Arab, Z. Du, Y. He, and S.Y.W. Ho contributed to the manuscript.

DATA ACCESSIBILITY

The stand‐alone package and user manual of EasyCodeML are hosted on GitHub: https://github.com/BioEasy/EasyCodeML.

ORCID

Fangluan Gao https://orcid.org/0000-0001-9031-9944
Simon Y. W. Ho https://orcid.org/0000-0002-0361-2307

REFERENCES

Anisimova, M., Bielawski, J. P., & Yang, Z. (2001). Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Molecular Biology and Evolution, 18, 1585–1592. https://doi.org/10.1093/oxfordjournals.molbev.a003945
Bielawski, J. P., & Yang, Z. (2003). Maximum likelihood methods for detecting adaptive evolution after gene duplication. Journal of Structural and Functional Genomics, 3, 201–212.
Bielawski, J. P., & Yang, Z. (2004). A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. Journal of Molecular Evolution, 59, 121–132. https://doi.org/10.1007/s00239-004-2597-8
Egan, A., Mahurkar, A., Crabtree, J., Badger, J. H., Carlton, J. M., & Silva, J. C. (2008). IDEA: Interactive Display for Evolutionary Analyses. BMC Bioinformatics, 9, 524. https://doi.org/10.1186/1471-2105-9-524
Forsberg, R., & Christiansen, F. B. (2003). A codon‐based model of host‐specific selection in parasites, with an application to the influenza A virus. Molecular Biology and Evolution, 20, 1252–1259. https://doi.org/10.1093/molbev/msg149
Maldonado, E., Sunagar, K., Almeida, D., Vasconcelos, V., & Antunes, A. (2014). IMPACT_S: Integrated multiprogram platform to analyze and combine tests of selection. PLoS ONE, 9, e96243. https://doi.org/10.1371/journal.pone.0096243
Maldonado, E., Almeida, D., Escalona, T., Khan, I., Vasconcelos, V., & Antunes, A. (2016). LMAP: Lightweight multigene analyses in PAML. BMC Bioinformatics, 17, 354. https://doi.org/10.1186/s12859-016-1204-5
Nei, M., & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution, 3, 418–426.
Padhi, A., Verghese, B., & Otta, S. K. (2009). Detecting the form of selection in the outer membrane protein C of Enterobacter aerogenes strains and Salmonella species. Microbiological Research, 164, 282–289. https://doi.org/10.1016/j.micres.2006.12.002
Pond, S. L. K., Frost, S. D. W., & Muse, S. V. (2005). HyPhy: Hypothesis testing using phylogenies. Bioinformatics, 21, 676–679. https://doi.org/10.1093/bioinformatics/bti079
Scheffler, K., & Seoighe, C. (2005). A Bayesian model comparison approach to inferring positive selection. Molecular Biology and Evolution, 22, 2531–2540.
Stern, A., Doron‐Faigenboim, A., Erez, E., Martz, E., Bacharach, E., & Pupko, T. (2007). Selecton 2007: Advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Research, 35, W506–W511. https://doi.org/10.1093/nar/gkm382
Swanson, W. J., Nielsen, R., & Yang, Q. (2003). Pervasive adaptive evolution in mammalian fertilization proteins. Molecular Biology and Evolution, 20, 18–20. https://doi.org/10.1093/oxfordjournals.molbev.a004233
Valle, M., Schabauer, H., Pacher, C., Stockinger, H., Stamatakis, A., Robinson‐Rechavi, M., & Salamin, N. (2014). Optimization strategies for fast detection of positive selection on phylogenetic trees. Bioinformatics, 30, 1129–1137. https://doi.org/10.1093/bioinformatics/btt760
Weadick, C. J., & Chang, B. S. (2012). An improved likelihood ratio test for detecting site‐specific functional divergence among clades of protein‐coding genes. Molecular Biology and Evolution, 29, 1297–1300. https://doi.org/10.1093/molbev/msr311
Wong, W. S., Yang, Z., Goldman, N., & Nielsen, R. (2004). Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics, 168, 1041–1051. https://doi.org/10.1534/genetics.104.031153
Xu, B., & Yang, Z. (2013). pamlX: A graphical user interface for PAML. Molecular Biology and Evolution, 30, 2723–2724. https://doi.org/10.1093/molbev/mst179
Yang, Z. (1998). Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Molecular Biology and Evolution, 15, 568–573. https://doi.org/10.1093/oxfordjournals.molbev.a025957
Yang, Z. (2007). PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24, 1586–1591. https://doi.org/10.1093/molbev/msm088
Yang, Z., & Nielsen, R. (1998). Synonymous and nonsynonymous rate variation in nuclear genes of mammals. Journal of Molecular Evolution, 46, 409–418. https://doi.org/10.1007/PL00006320
Yang, Z., & Nielsen, R. (2002). Codon‐substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular Biology and Evolution, 19, 908–917. https://doi.org/10.1093/oxfordjournals.molbev.a004148
Yang, Z., Nielsen, R., Goldman, N., & Pedersen, A. M. (2000). Codon‐substitution models for heterogeneous selection pressure at amino acid sites. Genetics, 155, 431–449.
Yang, Z., Wong, W. S., & Nielsen, R. (2005). Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution, 22, 1107–1118. https://doi.org/10.1093/molbev/msi097
Zhang, C., Wang, J., Long, M., & Fan, C. (2013). gKaKs: The pipeline for genome‐level Ka/Ks calculation. Bioinformatics, 29, 645–646. https://doi.org/10.1093/bioinformatics/btt009
Zhang, J., Nielsen, R., & Yang, Z. (2005). Evaluation of an improved branch‐site likelihood method for detecting positive selection at the molecular level. Molecular Biology and Evolution, 22, 2472–2479. https://doi.org/10.1093/molbev/msi237

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.

How to cite this article: Gao F, Chen C, Arab DA, Du Z, He Y, Ho SYW. EasyCodeML: A visual tool for analysis of selection using CodeML. Ecol Evol. 2019;9:3891–3898. https://doi.org/10.1002/ece3.5015

Biological Journal of the Linnean Society, 2019, 126, 304–314. With 4 figures.

Evidence for a complex evolutionary history of mound building in the Australian nasute termites (Nasutitermitinae)

Downloaded from https://academic.oup.com/biolinnean/article/126/2/304/5251914 by University of Sydney user on 28 February 2021

PERRY G. BEASLEY-HALL*, JUANITA CHUI, DAEJ A. ARAB and NATHAN LO

School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia

Received 13 August 2018; revised 1 November 2018; accepted for publication 4 November 2018

Termite mounds have intrigued humans for millennia. Despite great interest in their beautiful and often complex structures, the question of why termites acquired mound-building behaviour has received little attention. Here, we focus on two Australian lineages of the Nasutitermitinae (composed primarily of Nasutitermes and Tumulitermes spp.), which have evolved mound-building behaviour in parallel from arboreal and soil/wood-nesting ancestors, respectively. We used environmental niche modelling and ancestral niche reconstructions to investigate whether abiotic factors, including precipitation, temperature and soil composition, were associated with the repeated acquisition of mound building. Although we found strong evidence for ecological speciation leading to niche divergence in the nasutes, ultimately no abiotic variable was consistently correlated with mound-building behaviour. We also observed no trend in the variables limiting the environmental tolerances of mound builders. This suggests a more complex evolutionary history of mound building that cannot be explained by the abiotic factors we examined. Instead, biotic factors not considered here, e.g. colony expansion and protection, might have played a key role in the acquisition of this trait.

ADDITIONAL KEYWORDS: Australia – Blattodea – climate change – environmental niche modelling – Isoptera – mound building – niche divergence – termites.

*Corresponding author. E-mail: perry.beasley-hall@sydney.edu.au

© 2018 The Linnean Society of London, Biological Journal of the Linnean Society, 2019, 126, 304–314

INTRODUCTION

Insect architecture has long been a subject of fascination for humans, and perhaps none more so than the impressive mounds built by termites. Termite mounds, defined as epigean structures that rise from the soil, are common in the savannahs of Africa, Asia, Australia and South America. They are constructed largely by members of the Termitidae, one of the most recently evolved and diverse termite families (Bourguignon et al., 2017). They are intricate structures that can include a wide range of architectural elements comparable to those in human buildings, such as domes and cathedral-like spires (Noirot & Darlington, 2000). Termite mounds have been used as sources of medicinal products, mosquito repellent and cleansing ceremonies in Australian Aboriginal societies (Andersen et al., 2005) and as shields for game hunting in African societies (Rennie, 1857). They have also acted as inspiration for ‘intelligent’ buildings with temperature regulation systems that maintain homeostasis in the face of environmental change (Worall, 2011; Dahl, 2013). The functional aspects of these structures have been studied in detail, from the biological significance of north–south orientations of ‘magnetic’ termite mounds, to the optimal construction of bends in tunnels (e.g. Korb, 2003; Korb & Linsenmair, 1999; Turner, 2001; Lee et al., 2007, 2008). Less well understood are the factors that drove the evolution of mound-building behaviour.

The evolution of any adaptive trait, including complex traits such as nest building, is thought to be driven by a combination of biotic and abiotic selective pressures and conditions (Theokritoff, 1992). For some organisms, abiotic factors have played a relatively large role in shaping the evolution of a trait. For others, the reverse is true, such as in cases of strong sexual selection. The biotic and abiotic factors influencing the evolution of mound building are of particular interest given that nests built by unrelated species often appear similar based on their exterior structure (Hill, 1942).

The earliest termites probably built their nests in rotting wood (Hill, 1942; Inward et al., 2007). The construction of mounds and other nesting strategies (i.e. in soil, or arboreally) have evolved multiple times from this ancestral state (Arab et al., 2017; Lee et al., 2017). The trait of foraging away from the nest was presumably an important preadaptation to the evolution of mound building (Higashi et al., 1991; Inward et al., 2007). Abiotic factors, such as ancient temperature changes and certain types of soil, might have favoured mound building over other nest types. Factors including the underlying geology, annual precipitation and hillslope morphology have been shown to influence mound densities and distributions of savannah-dwelling termites (Davies et al., 2014), although whether they influenced the evolution of mound building in the first place is not known. Multiple biotic factors might also have been important in the development of this trait. These include the benefits of a temperature-controlled environment for development of offspring, areas for the storage of food and immatures, and protection of the colony from predators.

Formal studies examining the role of biotic factors in the evolution of mound building within a phylogenetic framework remain challenging. There is a lack of biotic data available for related mound-building and non-mound-building termite taxa, and data on observable interactions (e.g. direct competition or predation between species) are both rarely encountered and difficult to score statistically. In contrast, investigations into the role of abiotic factors in mound evolution are more feasible owing to the availability of extensive environmental data and the development of ancestral niche reconstruction methods. Lee et al. (2017) recently performed the first formal analysis of abiotic factors influencing the evolution of mound building. Their study focused on Australian members of the genus Coptotermes, the only group outside the family Termitidae that is known to build conspicuous mounds (Lee et al., 2015). Through ancestral niche reconstructions of eight species of this genus, Lee et al. (2017) found that all mound-building taxa examined had significantly different environmental niches and that there was no clear relationship between the abiotic factors considered and the incidence of mound-building behaviour.

A group with a larger number of acquisitions of mound-building behaviour is the Australian lineage of the subfamily Nasutitermitinae, also called snouted termites or ‘nasutes’ (Hare, 1937). This group comprises 13 mound-building species, including Nasutitermes triodiae, whose mounds reach over six metres high and are the tallest in the world (Gay & Calaby, 1970). Nasutes boast an impressive variety of life histories, and in addition to mounds, their nesting habitats include decaying logs, under rocks or in the soil, and tree branches (Hill, 1942).

We have recently performed a phylogenetic analysis of 44 species of Australian nasutes (Arab et al., 2017). Our findings demonstrate that mound building has evolved in parallel from wood- or soil-nesting ancestors on up to eight occasions over the past ~20 Myr, a period in which mesic environments shifted to drier biomes as a result of ‘bursts’ of sustained aridification interspersed by warm and wet periods (Byrne et al., 2011). The appearance of mound building coincident with aridification raises the possibility that abiotic factors played an important role in the evolution of this nesting behaviour. Alternatively, the overlapping of climate change events with the emergence of this behaviour might be purely coincidental, and biotic or other abiotic factors might instead be responsible for the acquisition of mound building.

These hypotheses can be tested using a framework that considers the phylogeny of the group of interest, per-species distribution data and (modern) climate trends from the study site. Environmental niche models (ENMs) predicting the fundamental niche of each species can be constructed from these data and analysed in ancestral niche reconstructions (ANRs). In the present study, we use the above-mentioned framework to examine whether certain abiotic variables were associated with the evolution of mound building in Australian nasutes. We pair this with the most recent phylogenetic framework for the group from Arab et al. (2017). We also test for niche divergence vs. conservatism during the diversification of the Australian nasutes. This study represents the most detailed investigation to date on the potential role played by abiotic factors in the evolution of mound building in termites.

MATERIAL AND METHODS

Data collection

Distribution data were sourced from Watson & Abbey (1993), Arab (2015) and the Atlas of Living Australia (2018), summarized in Table 1. Species without adequate distribution data (> 15 occurrence points) were not included. We obtained a dated phylogenetic tree from Arab et al. (2017) and produced simplified phylogenies of taxa for both lineages of interest (Nasutitermes and Tumulitermes; Fig. 1). Nasutitermes centraliensis was omitted from our study owing to the incorrect placement of its label in the Arab et al. (2017) tree (D. A. Arab, pers. comm.). These two reduced phylogenies were used to represent six independent acquisitions of mound building in total, and analyses were conducted on each tree separately given the independent evolution of the taxa they represent. For simplicity, here we classified the nesting habits


Figure 1. Simplified phylogenetic trees of lineages 1 (L1) and 3 (here coded as L2) from Arab et al. (2017). Nesting types are denoted by coloured circles at the tips. Coloured circles at internal nodes are ancestral state reconstructions from Arab et al. (2017). The wider nasute phylogeny was estimated using 12S, 16S and COII genes using Bayesian inference. Scale bars are in millions of years from the present.


Figure 2. Environmental niche models (ENMs) produced using MaxEnt for the 25 nasute species examined in this study. Environmental niche models were constructed using distribution data and the 11 bioclimatic variables listed in the Supporting Information (Fig. S1). Warmer colours in the heat maps represent a higher probability of occurrence within the constraints of this analysis, whereas cooler colours represent a lower probability of occurrence. The ENMs here represent only the predicted fundamental niches of species as opposed to their realized niches. An interactive map containing the distribution data used to create these ENMs is available in the Supporting Information (Figure S3).


Table 1. The 25 nasute termite species considered in this study, with their taxonomic authorities, total occurrence points used in environmental niche model construction (N), nesting types and localities also denoted

Species name and authority | N | Nesting type | Locality*†

Lineage 1
Nasutitermes carnarvonensis (Hill, 1942) | 14 | Wood or soil | Carnarvon Range (QLD)
Nasutitermes exitiosus (Hill, 1925) | 30 | Mounds | Southern Australia broadly; SW WA
Nasutitermes graveolus (Hill, 1925) | 10 | Arboreal | Darwin and Tiwi Islands region of NT; N QLD
Nasutitermes magnus (Froggatt, 1898) | 20 | Mounds | N QLD
Nasutitermes smithi (Hill, 1942) | 13 | Mounds | Central Desert northward to Birdum (NT)
Nasutitermes torresi (Hill, 1942) | 12 | Mounds | N QLD; Thursday Island (Torres Strait)
Nasutitermes triodiae (Hill, 1942) | 174 | Mounds | N WA; Central Desert northward to Darwin (NT); N QLD
Nasutitermes walkeri (Hill, 1942) | 114 | Arboreal | SE coast of QLD; E coast of NSW
Tumulitermes pastinator (Hill, 1915) | 192 | Mounds | Northern Australia generally

Lineage 2
Nasutitermes dixoni (Hill, 1932) | 57 | Wood or soil | Mid North and South Coast regions of NSW; VIC broadly
Occasitermes occasus (Silvestri, 1909) | 145 | Wood or soil | W and SW WA; Eyre Peninsula region of SA
Occultitermes occultus (Hill, 1927) | 23 | Wood or soil | N NT; Kimberley region of WA; N QLD
Occasitermes watsoni Gay, 1974 | 17 | Wood or soil | Riverina region of NSW
Tumulitermes apiocephalus (Silvestri, 1909) | 120 | Wood or soil | SW regions of WA; Fitzroy region of QLD; central NSW
Tumulitermes comatus (Hill, 1942) | 106 | Wood or soil | SW regions of WA; Central Desert region of NT; QLD broadly; Northern Inland and Riverina regions of NSW
Tumulitermes dalbiensis (Hill, 1942) | 99 | Wood or soil | S WA; Alice Springs and Darwin regions of NT; QLD broadly; Orana and Riverina regions of NSW
Tumulitermes hastilis (Froggatt, 1898) | 144 | Mounds | Central and northern Australia broadly
Tumulitermes kershawi (Hill, 1942) | 12 | Wood or soil | Goldfields–Esperance and W WA; Charters Towers (QLD); Angeldool and Binya SF (NSW)
Tumulitermes marcidus (Hill, 1942) | 39 | Mounds | NW and N QLD, Barcaldine region (QLD)
Tumulitermes peracutus (Hill, 1925) | 89 | Wood or soil | SW WA; Riverina and Orana regions of NSW
Tumulitermes petilus (Hill, 1942) | 46 | Wood or soil | SW WA; Eyre Peninsula region of SA
Tumulitermes recalvus (Hill, 1942) | 87 | Wood or soil | Australian mainland broadly
Tumulitermes subaquilus (Hill, 1942) | 35 | Wood or soil | SW WA
Tumulitermes tumuli (Froggatt, 1898) | 132 | Mounds | WA broadly; Central Desert region of NT; Outback region of SA; SW QLD
Tumulitermes westraliensis (Hill, 1921) | 57 | Mounds | Ngaanyatjarraku Shire and southern WA

Species authorities without brackets represent a reclassification. Locality abbreviations are as follows: E, east; N, north; NSW, New South Wales; NT, Northern Territory; NW, northwest; QLD, Queensland; S, south; SA, South Australia; SE, southeast; SF, state forest; SW, southwest; VIC, Victoria; W, west; WA, Western Australia. *Watson & Abbey (1993). †Hill (1942).

of nasutes into three categories: arboreal, dwellers in wood/soil substrates or mound building.

Environmental data were retrieved from the CliMond Archive (Hutchinson et al., 2009; Kriticos et al., 2012) and the Commonwealth Scientific and Industrial Research Organisation (CSIRO) Australian Soil Resource Information System (ASRIS, 2011) for a total of 26 raster layers (summarised in Table S1). In all, our chosen rasters pertained to temperature, precipitation, soil composition and plant water capacity at three varying soil depths. Rasters sourced from ASRIS were edited using the gdalwarp and gdal_translate functions in GDAL v.1.11.5 (GDAL, 2016) to ensure that their file formats and resolutions were consistent with the CliMond data. To ensure the independence of our variables, correlations between them were assessed using Pearson’s correlation coefficient in the R v.3.3.3 package raster (Hijmans, 2017; R Core Team, 2017). Among variables with positive or negative correlations > 75%, those to be discarded were chosen using a random number generator (Table S2), and the following were used for the construction of ENMs: BIO1, BIO2, BIO3, BIO8, BIO9, BIO11, BIO12 and BIO17 (sourced from CliMond), soil clay content, soil sand content, available water capacity (AWC) at 0–5 cm and AWC at 100–200 cm (sourced from ASRIS).
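The correlation-based filtering of variables can be illustrated with a minimal Python sketch. It is not the R workflow used in the study: the function names `pearson` and `drop_correlated`, the layer name BIO5, the fixed random seed and the toy cell values are all assumptions made for illustration, and the exact order in which correlated pairs were resolved in the original analysis is not stated in the text.

```python
import math
import random


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(layers, threshold=0.75, seed=0):
    """Randomly discard one layer from each pair with |r| above the threshold."""
    rng = random.Random(seed)
    kept = dict(layers)
    while True:
        names = sorted(kept)
        pairs = [(a, b)
                 for i, a in enumerate(names)
                 for b in names[i + 1:]
                 if abs(pearson(kept[a], kept[b])) > threshold]
        if not pairs:
            return kept
        # Resolve one correlated pair at a time, then re-check the rest.
        del kept[rng.choice(pairs[0])]


# Hypothetical cell values for four raster layers (toy data, not from the study).
layers = {
    "BIO1": [20.1, 22.4, 25.0, 27.3, 18.2, 24.1],
    "BIO5": [30.0, 32.5, 35.2, 37.0, 28.1, 34.0],  # nearly collinear with BIO1
    "BIO12": [900, 400, 1200, 300, 700, 650],
    "clay": [12, 30, 18, 25, 40, 22],
}
kept = drop_correlated(layers)
print(sorted(kept))
```

Resolving one pair at a time and then recomputing the remaining correlations avoids discarding more layers than necessary when a single variable is correlated with several others.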


Environmental niche modelling

Environmental niche models predicting potential distributions for each species were created using MaxEnt v.3.3.3k (Phillips et al., 2006). Default model training settings were used, except that duplicate occurrence points were permitted to account for many of our collection records being termites from different mounds sourced from almost identical locations. Ten jackknife replicates were created for each species, using 75% of the data for training and the remaining 25% as a test dataset to produce probability values. We used area-under-the-curve (AUC) values to assess the model performance of our ENMs. Area-under-the-curve values > 0.5 suggest that the model fits better than one generated at random, and values close to 1.0 indicate that presence points are almost always more informative than random background locations, i.e. a ‘perfect’ fit (Baldwin, 2009; Lee et al., 2017).

Ancestral niche reconstructions

In order to assess whether any abiotic variables are consistently associated with the parallel evolution of mound building, we performed ancestral niche reconstructions (ANRs) on all 26 variables considered in this study. The ANRs were produced using the R package phyloclim, following the methods of Evans et al. (2009). Ancestral niche reconstructions make use of the mean climatic tolerance for each species per abiotic variable and the 80% central density of tolerance. These values are combined with a phylogenetic framework that is transformed to match the mean tolerance of each species for the abiotic variable of interest. Patterns can then be observed in sister species or clades, e.g. a group sharing the same nesting type might all have elevated tolerances compared with the ancestral state present in their sisters. This would indicate that this variable might have played a role in the evolution of the trait.
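The two per-species summaries mentioned for the ANRs, a mean tolerance and an 80% central density, can be illustrated with a short Python sketch. This is only one plausible way to compute such summaries from a discretised niche profile and is not the phyloclim implementation: the function names, the use of suitability values as weights and the toy temperature profile are all assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Mean of `values`, each weighted by its predicted suitability."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def central_interval(values, weights, coverage=0.80):
    """Bounds enclosing the central `coverage` share of the total weight."""
    lower_p = (1.0 - coverage) / 2.0
    upper_p = 1.0 - lower_p
    total = sum(weights)
    pairs = sorted(zip(values, weights))
    cumulative = 0.0
    lower = upper = None
    for value, weight in pairs:
        cumulative += weight / total
        if lower is None and cumulative >= lower_p:
            lower = value
        if upper is None and cumulative >= upper_p:
            upper = value
    if upper is None:
        # Guard against rounding leaving the running total just below upper_p.
        upper = pairs[-1][0]
    return lower, upper


# Hypothetical profile: temperature bins (degrees C) and summed suitability.
temperature = [10, 12, 14, 16, 18, 20, 22]
suitability = [1, 2, 4, 6, 4, 2, 1]

mean_tolerance = weighted_mean(temperature, suitability)
interval = central_interval(temperature, suitability)
print(mean_tolerance, interval)
```

For this symmetric toy profile the weighted mean is 16.0 and the central 80% of the weight lies between the 12 and 20 degree bins.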

Niche overlap

Fundamental niche overlap using our ENMs was calculated using ENMTools v.1.0 (Warren et al., 2008, 2010), a program that relies on D and I statistics as a percentage measure of the similarity between niches. We chose these statistics in combination to compensate for a lack of absence data in our study. The degree of niche overlap as a function of the time since divergence was plotted using the age.range.correlation function in the R package phyloclim (Heibl & Calenge, 2013). Additional logistic regression analyses were performed in JMP v.13 (JMP, 2016) to evaluate statistically significant differences between realized niches, i.e. our abiotic data paired with our distribution points. In these analyses, abiotic variables were reduced to principal components accounting for ≥ 95% of cumulative variation. The niches of mound builders, arboreal nesters and wood/soil nesters (excluding phylogenetic relationships) were then compared, in addition to all possible sister taxa.

Phylogenetic signal

Phylogenetic signal, as measured by Blomberg’s K (Blomberg et al., 2003) and Pagel’s λ (Pagel, 1999), was calculated in R using the package phytools (Revell, 2012). For both measures, a value close to or greater than one indicates a higher than expected degree of signal under a Brownian motion model of evolution. Evidence of strong phylogenetic signal would suggest that niches had been preserved over evolutionary time, whereas weak signal would suggest niche divergence as a prevailing method of ecological speciation.

RESULTS

Environmental niche modelling

Our AUC values were generally high (Table 2). All of the species examined had model performance far better than that computed from random (i.e. uninformative) data, and all but two had AUC values of > 0.80. The exceptions were Tumulitermes kershawi and Nasutitermes smithi, which also had very broad predicted fundamental niches (Fig. 2). It might be the case that the distribution data for these two taxa were uninformative (i.e. they did not contain a sufficient number of occurrence points) or that they spanned many different niche types, meaning that a more precise distribution could not be resolved by MaxEnt in either case. Niche overlap was variable in both lineages; for example, ranging from D = 0.09, I = 0.30 (Tumulitermes pastinator × Nasutitermes exitiosus) to D = 0.73, I = 0.93 (T. pastinator × Nasutitermes triodiae) in lineage 1 (Table S3). Our age range correlation plots produced from these niche overlap values show no overall directionality regarding niche overlap significantly increasing or decreasing over evolutionary time (Fig. 3). Subsequent logistic regression analyses found that niches of all possible sister taxa differed significantly from one another, although the degree of difference as measured by the χ2 statistic was not consistent, nor were the variables influencing these differences (see also Supporting Information, Tables S4, S5).

The importance of each variable in our ENMs (i.e. the environmental tolerances of each species based on their predicted fundamental niche) was assessed using


percentage contribution analyses from the MaxEnt output. There was no clear trend in the types of variables influencing our models (grouped as related to rainfall, temperature, water capacity or soil composition) in lineage 1, although the majority of species in lineage 2 were constrained by temperature-related variables above all others aside from Tumulitermes comatus and Nasutitermes dixoni (see also Supporting Information, Fig. S1).

Table 2. Area-under-the-curve values for the environmental niche models constructed in this study

Species | AUC value

Lineage 1
Nasutitermes carnarvonensis | 0.904
Nasutitermes exitiosus | 0.934
Nasutitermes graveolus | 0.977
Nasutitermes magnus | 0.918
Nasutitermes smithi | 0.802
Nasutitermes torresi | 0.964
Nasutitermes triodiae | 0.909
Nasutitermes walkeri | 0.980
Tumulitermes pastinator | 0.914

Lineage 2
Nasutitermes dixoni | 0.978
Occasitermes occasus | 0.981
Occultitermes occultus | 0.996
Occasitermes watsoni | 0.989
Tumulitermes apiocephalus | 0.964
Tumulitermes comatus | 0.848
Tumulitermes dalbiensis | 0.845
Tumulitermes hastilis | 0.828
Tumulitermes kershawi | 0.747
Tumulitermes marcidus | 0.943
Tumulitermes peracutus | 0.956
Tumulitermes petilus | 0.966
Tumulitermes recalvus | 0.782
Tumulitermes subaquilus | 0.992
Tumulitermes tumuli | 0.902
Tumulitermes westraliensis | 0.961

Values have been averaged across the ten replicates of test data for each species. Abbreviation: AUC, area under the curve.

Phylogenetic signal

Statistically significant K values were produced for seven variables in lineage 1 and a single variable in lineage 2, but when combined with Pagel’s λ no variables showed both high and statistically significant values for the two metrics (see also Supporting Information, Table S6). We therefore observe no strong evidence for phylogenetic signal in either lineage, and this is consistent with a scenario of climatic tolerances not being preserved over evolutionary time (ecological speciation causing niche divergence).

Ancestral niche reconstructions

Ancestral niche reconstructions were performed on internal nodes and the 25 species listed in Table 1 for all abiotic variables (Fig. 4; see also Supporting Information, Fig. S2). For all variables examined, we find no clear grouping of mound-building species with respect to climatic tolerances of any variable, indicating a complex evolutionary history of this trait beyond abiotic influence. Mound builders did not display either a positive or a negative bias of tolerance compared with their sister taxa for any variable, nor did their tolerances group independently of arboreal or wood/soil nesting. Such biases or groupings would suggest some degree of environmental influence on the evolution of mound-building behaviour (Lee et al., 2017), but they were not recovered in our analyses.

DISCUSSION

Climate, soil and the evolution of mound-building behaviour

We found no clear evidence that the abiotic variables considered here contributed to the acquisition of mound building, indicating a more complex evolutionary history for this trait than can be explained solely by the factors we examined. There are a variety of additional abiotic factors not considered in this study (e.g. elevation and the presence of water bodies) that might have had a large impact upon determining the distributions of nasutes, and thus on our ability to infer the effect of abiotic factors on mound evolution. Although it would be impossible to account for every abiotic factor shaping the distributions of termites, the incorporation of additional information could improve the accuracy of ENMs inferred for each species.

Role of biotic factors

It might be the case that abiotic factors, no matter how many are considered, are inadequate for resolving any link between the evolution of mound building and external factors. Biotic factors might therefore have been the key selective pressure that led to mound building within the nasutes. Living in a mound affords a colony a suite of benefits, including the ability to support larger population sizes compared with single-piece wood nesting, and the storage of food. A further benefit of mound construction could be enhanced defence from (or less exposure to) predators. Predation pressure

© 2018 The Linnean Society of London, Biological Journal of the Linnean Society, 2019, 126, 304–314

Figure 3. Niche overlap, quantified using D and I statistics, and median ages of nodes (shown as circles) of the Australian nasute phylogeny shown in Figure 1. The D statistic assumes that probability distributions reflect abundance, whereas the I statistic does not. This plot was produced using the phyloclim function age.range.correlation in R.
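The two overlap statistics named in the Figure 3 caption can be illustrated with a short calculation. The sketch below assumes the definitions given by Warren et al. (2008): Schoener's D as one minus half the summed absolute differences between two normalised suitability surfaces, and I as one minus half the summed squared differences of their square roots. The function names and the four-cell suitability scores are hypothetical and are not taken from the study.

```python
import math


def normalise(suitability):
    """Scale raw suitability scores so they sum to 1 across grid cells."""
    total = sum(suitability)
    if total <= 0:
        raise ValueError("suitability scores must sum to a positive value")
    return [s / total for s in suitability]


def schoener_d(p_x, p_y):
    """Schoener's D: 1 - 0.5 * sum(|p_x[i] - p_y[i]|)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p_x, p_y))


def warren_i(p_x, p_y):
    """I statistic: 1 - 0.5 * sum((sqrt(p_x[i]) - sqrt(p_y[i])) ** 2)."""
    return 1.0 - 0.5 * sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p_x, p_y)
    )


# Hypothetical suitability scores for two species over the same four cells.
species_a = normalise([0.9, 0.6, 0.1, 0.0])
species_b = normalise([0.0, 0.1, 0.6, 0.9])

print("D =", round(schoener_d(species_a, species_b), 3))
print("I =", round(warren_i(species_a, species_b), 3))
```

Both statistics run from 0 (no overlap) to 1 (identical niche models), which matches the scale described for Table S3 in the Supporting Information.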

Figure 4. Example ancestral niche reconstructions (ANRs) of Australian nasute lineages 1 and 2 concerning annual precipitation (‘BIO12’). We find no evidence for any of the abiotic variables considered here being correlated with the parallel acquisition of mound building. To the left of each panel is the same phylogenetic framework presented in Figure 1, but the tip positions have been transformed to reflect the mean climatic tolerance of each species for the abiotic variable on the y-axis. The 80% central densities of climatic tolerance are denoted by dotted vertical lines. The nesting types for the species are coded at the tips as arboreal (A), wood/soil/other (W) or mound building (M). Species sharing the same coloured branches are sister to one another.

is thought to be ameliorated as a colony becomes less reliant on the soil–nest interface (Abe, 1987). Foraging away from the nest is also likely to have been a key trait permitting the evolution of mound building.

Within lineage 1, the ancestors of mound builders were arboreal and foraged away from their nests (i.e. separate-piece nesting). In the case of lineage 2, the nesting biology of most non-mound-building

species is not well understood, but most are likely to be separate-piece nesters (Abe, 1987). Intermediate nesters, the precursors to single-piece nesters, notably do not possess a ‘true’ sterile worker caste that is specialized for foraging. Higashi et al. (1991) proposed that workers foraging further away from their nests in intermediate nesting increases colony stability as the nest is consumed less, favouring food–nest separation and the evolution of more specialized foragers (‘true’ workers) in a cyclical fashion. This might have led to the eventual acquisition of separate-piece nesting.

The transition from one separate-piece nesting strategy to another, as in our lineage 1, might also have increased colony stability. Epigeal mounds might provide more stable habitats than other nest types, enabling termites to persist in harsh environments. For instance, lineage 1 mound-builders, such as N. triodiae, N. smithi and T. pastinator, tend to have larger predicted fundamental niches and are less restricted to areas proximate to the eastern and northern coastlines compared with their tree-dwelling sister species (Fig. 2). Thus, although arboreal nests are likely to offer similar advantages to epigeal mounds, it appears that they are more suited to areas proximate to the coast and do not facilitate colonization of more arid areas. The emergence of mound building might therefore have allowed termites in lineage 1 to colonize a wider variety of niches and/or a larger area of the Australian continent.

Unlike the case for the non-mound-building taxa Nasutitermes walkeri and Nasutitermes graveolus from lineage 1, non-mound-building species within lineage 2 are collectively found across a wider area of the continent (Fig. 2). The evolution of mound building within lineage 2 therefore does not appear to have been necessary for the colonization of arid habitats by these taxa in Central Australia. At the lineage level, the earlier (~18 Mya) arrival of lineage 2 compared with lineage 1 (~12 Mya) might have permitted its colonization of Central and Southern Australia. This could be attributable to a longer persistence on the continent before a burst of aridification ~15 Mya (Byrne et al., 2011), during which time Australia would have been more forested. The rarity of lineage 1 in these areas could be attributable to competitive exclusion between the two lineages or to the inability of this lineage to colonize Central Australia once arid conditions had spread in this area.

Finally, we find strong evidence for ecological speciation via niche divergence in the Australian nasutes. Our age range correlation plots, logistic regression analyses and phylogenetic signal data all suggest that niches have not been conserved over evolutionary time in any of these species. This is supported by an additional lack of any pattern regarding climatic tolerance in our ANRs. These data suggest that the nasutes were easily able to colonize new niches, which eventually led to cessation of gene flow from their parent lineage, potentially owing to the great variation in nesting modes mentioned previously.

Conclusion

Our study is the first of its kind to assess comprehensively the effect of abiotic influences on the evolution of mound building in Australian nasute termites. Our data suggest a complex evolutionary history of this trait that cannot be explained by the abiotic factors we examined alone. Biotic factors are likely to have played a key role in the acquisition of mound building in the Nasutitermitinae. The transition from arboreal nesting to epigeal nesting in our first lineage might have afforded the nasutes the ability to colonize a wider geographical area. We also observe strong evidence for niche divergence in the Australian nasutes, in agreement with the findings of Lee et al. (2017). Ultimately, the analysis of additional variables relevant to the biology of termites, such as the availability of nest substrates or biotic interactions with other species, might provide more resolution in future work. An updated classification of termite nesting types might also be necessary to reassess the ancestral states of nest-building behaviour in these insects.

ACKNOWLEDGEMENTS

The authors would like to thank Tim Lee for technical assistance and two reviewers for assisting with an earlier version of this manuscript.

REFERENCES

Abe T. 1987. Evolution of lifetypes in termites. In: Kawano S, Connell JH, Hidaka T, eds. Evolution and coadaptation in biotic communities. Tokyo: University of Tokyo Press.

Andersen A, Jacklyn P, Dawes-Gromadski T, Morris I. 2005. Termites of Northern Australia. Alice Springs: Barker Souvenirs.

Arab D. 2015. Australian Nasutitermitinae: phylogenetics, evolution, and biodiversity. Master’s Thesis, University of Sydney.

Arab DA, Namyatova A, Evans TA, Cameron SL, Yeates DK, Ho SYW, Lo N. 2017. Parallel evolution of mound-building and grass-feeding in Australian nasute termites. Biology Letters 13: 20160665.

ASRIS. 2011. ASRIS Australian soil classification. CSIRO Land and Water. Available at: http://asris.csiro.au

Atlas of Living Australia. 2018. The atlas of living Australia. Available at: http://ala.org.au

Baldwin RA. 2009. Use of maximum entropy modeling in wildlife research. Entropy 11: 854–866.

Blomberg SP, Garland T Jr, Ives AR. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution; international journal of organic evolution 57: 717–745.


Bourguignon T, Lo N, Šobotník J, Ho SY, Iqbal N, Coissac E, Lee M, Jendryka MM, Sillam-Dussès D, Krížková B, Roisin Y, Evans TA. 2017. Mitochondrial phylogenomics resolves the global spread of higher termites, ecosystem engineers of the Tropics. Molecular Biology and Evolution 34: 589–597.

Byrne M, Steane DA, Joseph L, Yeates DK, Jordan GJ, Crayn D, Aplin K, Cantrill DJ, Cook LG, Crisp MD, Keogh JS, Melville J, Moritz C, Porch N, Sniderman JMK, Sunnucks P, Weston PH. 2011. Decline of a biome: evolution, contraction, fragmentation, extinction and invasion of the Australian mesic zone biota. Journal of Biogeography 38: 1635–1656.

Dahl R. 2013. Cooling concepts: alternatives to air conditioning for a warm world. Environmental Health Perspectives 121: A18–A25.

Davies AB, Levick SR, Asner GP, Robertson MP, van Rensburg BJ, Parr CL. 2014. Spatial variability and abiotic determinants of termite mounds throughout a savanna catchment. Ecography 37: 852–862.

Evans ME, Smith SA, Flynn RS, Donoghue MJ. 2009. Climate, niche evolution, and diversification of the “bird-cage” evening primroses (Oenothera, sections Anogra and Kleinia). The American Naturalist 173: 225–240.

Gay FJ, Calaby JH. 1970. Termites of the Australian region. In: Krishna K, Weesner FM, eds. Biology of termites. New York: Academic Press.

GDAL. 2016. GDAL – Geospatial Data Abstraction Library, version 1.11.5. Open Source Geospatial Foundation, Chicago, USA. Available at: http://gdal.osgeo.org

Hare L. 1937. Termite phylogeny as evidenced by soldier mandible development. Annals of the Entomological Society of America 30: 459–486.

Heibl C, Calenge C. 2013. Phyloclim. R package version 0.9–4. Available at: https://cran.r.project.org/web/packages/phyloclim/index.html

Higashi M, Yamamura N, Abe T, Burns TP. 1991. Why don’t all termite species have a sterile worker caste? Proceedings of the Royal Society B: Biological Sciences 246: 25–29.

Hijmans RJ. 2017. raster: geographic data analysis and modelling. R package version 2.6–7. Available at: https://cran.r-project.org/package=raster

Hill GF. 1942. Termites Isoptera from the Australian region. Melbourne: CSIRO.

Hutchinson M, Xu T, Houlder D, Nix H, McMahon J. 2009. ANUCLIM 6.0 user’s guide. Canberra: Australian National University, Fenner School of Environment and Society.

Inward DJ, Vogler AP, Eggleton P. 2007. A comprehensive phylogenetic analysis of termites (Isoptera) illuminates key aspects of their evolutionary biology. Molecular Phylogenetics and Evolution 44: 953–967.

JMP. 2016. JMP, version 13. Cary: SAS Institute Inc., North Carolina, USA.

Korb J. 2003. The shape of compass termite mounds and its biological significance. Insectes Sociaux 50: 218–221.

Korb J, Linsenmair KE. 1999. The architecture of termite mounds: a result of a trade-off between thermoregulation and gas exchange? Behavioural Ecology 10: 312–316.

Kriticos DJ, Webber BL, Leriche A, Ota N, Macadam I, Bathols J, Scott JK. 2012. CliMond: global high resolution historical and future scenario climate surfaces for bioclimatic modelling. Methods in Ecology and Evolution 3: 53–64.

Lee SH, Bardunias P, Su NY. 2007. Optimal length distribution of termite tunnel branches for efficient food search and resource transportation. Bio Systems 90: 802–807.

Lee SH, Bardunias P, Su NY. 2008. Rounding a corner of a bent termite tunnel and tunnel traffic efficiency. Behavioural Processes 77: 135–138.

Lee TRC, Cameron SL, Evans TA, Ho SYW, Lo N. 2015. The origins and radiation of Australian Coptotermes termites: from rainforest to desert dwellers. Molecular Phylogenetics and Evolution 82 Pt A: 234–244.

Lee TRC, Evans TA, Cameron SL, Hochuli DF, Ho SYW, Lo N. 2017. Ecological diversification of the Australian Coptotermes termites and the evolution of mound building. Journal of Biogeography 44: 1405–1417.

Miura T, Matsumoto T. 2000. Soldier morphogenesis in a nasute termite: discovery of a disc-like structure forming a soldier nasus. Proceedings of the Royal Society B: Biological Sciences 267: 1185–1189.

Noirot C, Darlington JPEC. 2000. Termite nests: architecture, regulation and defence. In: Abe T, Bignell DE, Higashi T, eds. Termites: evolution, sociality, symbioses, ecology. Dordrecht, The Netherlands: Springer.

Pagel M. 1999. Inferring the historical patterns of biological evolution. Nature 401: 877–884.

Phillips SJ. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190: 231–259.

R Core Team. 2017. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available at: https://www.R-project.org

Rennie J. 1857. Insect architecture: to which are added miscellanies on the ravages, the preservation for purposes of study, and the classification of insects. London: John Murray.

Revell LJ. 2012. phytools: an R package for phylogenetic comparative biology and other things. Methods in Ecology and Evolution 3: 217–223.

Theokritoff G. 1992. Biotic and abiotic factors in evolution. Bioscience 42: 212–213.

Turner JS. 2001. On the mound of Macrotermes michaelseni as an organ of respiratory gas exchange. Physiological and Biochemical Zoology: PBZ 74: 798–822.

Warren DL, Glor RE, Turelli M. 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution; international journal of organic evolution 62: 2868–2883.

Warren DL, Glor RE, Turelli M. 2010. ENMTools: a toolbox for comparative studies of environmental niche models. Ecography 33: 607–611.

Watson JAL, Abbey HM. 1993. Atlas of Australian termites. Canberra: CSIRO Division of Entomology.

Worall M. 2011. Homeostasis in nature: nest building termites and intelligent buildings. Intelligent Buildings International 3: 87–95.


SUPPORTING INFORMATION

Table S1. Description of environmental variables used in this study. Variables with a BIO prefix are from the BioClim database; soil and plant-related variables are from the ASRIS database. Variables in grey were used to construct environmental niche models. Remaining variables were discarded owing to correlations > 75%.

Table S2. Pearson’s correlation coefficient values between all environmental variables used for the construction of environmental niche models in this study. Correlations ≥ 75% are in bold.

Table S3. Degree of niche similarity, as measured by D and I statistics, between Australian Nasutitermitinae lineages 1 and 2. ‘0’ indicates no similarity and ‘1’ indicates that the niches of the two taxa are identical. D statistics are shown in the upper triangle and I statistics in the lower triangle. Niche models were produced using MaxEnt with the bioclimatic variables described in the Supporting Information (Table S1).

Table S4. Loading matrix for principal components (PCs) 1, 2 and 3 based on 24 climate variables from the BioClim and ASRIS databases. Principal components were included only if they contributed to ≥ 5% variation. Abbreviation: AWC, available water capacity.

Table S5. Effect likelihood-ratio tests for all logistic regression comparisons of niches of all possible sister pairs, indicating the degree of difference between two taxa and the associated χ² values of principal components (PCs). Clades were grouped into a single set to facilitate analysis against a single species. There are three degrees of freedom in all cases. *P < 0.05, **P < 0.01, ***P < 0.001.

Table S6. Phylogenetic signal for all species considered in this study, as measured by Blomberg’s K and Pagel’s λ statistic. K values of > 1 and λ values closer to 1 indicate greater than expected ‘clumping’ of variables with respect to phylogenetic relationships. Bold values have significant P-values and indicate evidence of phylogenetic signal. *P < 0.05, **P < 0.01, ***P < 0.001.

Figure S1. Average percentage permutation importance of temperature (yellow), rainfall (blue), available water capacity (green) and soil-related variables (orange) to the environmental niche model of each taxon, i.e. how limited the environmental tolerance of the taxon is by a certain variable.

Figure S2. Twenty-three ancestral niche reconstructions used in this study for lineages 1 and 2, respectively. Phylogenetic trees are shown to the left of each panel. Tips of the tree are positioned at the mean climatic tolerance for a given variable on the y-axis for each taxon. The 80% central density of climatic tolerance is denoted by a vertical line. Descriptions of variables are listed in the Supporting Information (Table S1). Species abbreviations for lineage 1: api, T. apiocephalus; car, N. carnarvonensis; com, T. comatus; dix, N. dixoni; exi, N. exitiosus; gra, N. graveolus; mag, N. magnus; pas, T. pastinator; smi, N. smithi; tor, N. torresi; tri, N. triodiae; wal, N. walkeri. Species abbreviations for lineage 2: dal, T. dalbiensis; has, T. hastilis; ker, T. kershawi; mar, T. marcidus; occ (in black), O. occasus; occ (in beige), O. occultus; per, T. peracutus; pet, T. petilus; rec, T. recalvus; sub, T. subaquilus; tum, T. tumuli; wat, O. watsoni; wes, T. westraliensis. The 46 ancestral niche reconstructions are available in PDF format at https://www.dropbox.com/s/yt26w7hb4619bxl/Lineage%201%20ancestral%20niche%20reconstructions.zip?dl=0 (lineage 1) and https://www.dropbox.com/s/da3jnwfog8ip7s4/Lineage%202%20ancestral%20niche%20reconstructions.zip?dl=0 (lineage 2).

Figure S3. Distribution data for the 25 nasute species considered in the present study. Points on the maps refer to individual occurrence points used to construct ENMs as per Table 1 in the main text. The 25 maps of our distribution data, one for each species, are available in PNG format at https://www.dropbox.com/s/du4x1c6wj7z0o0f/Nasute%20distribution%20data.zip?dl=0&file_subpath=%2FLocality+maps+for+supp+material.
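Tables S1 and S2 describe discarding environmental variables whose pairwise Pearson correlation reached roughly 75%. The sketch below shows one way such a filter could work. It assumes a greedy rule that keeps a variable only if it is not strongly correlated with any variable already kept. The source does not say which member of a correlated pair was retained, so the rule, the function names and the five-site values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(layers, threshold=0.75):
    """Keep each variable only if |r| with every kept variable is below threshold."""
    kept = []
    for name, values in layers.items():
        if all(abs(pearson(values, layers[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values of three variables sampled at five sites.
layers = {
    "BIO1": [10.0, 12.0, 15.0, 19.0, 24.0],
    "BIO5": [20.5, 24.0, 30.5, 38.0, 48.5],  # almost linear in BIO1
    "BIO12": [800.0, 300.0, 650.0, 200.0, 500.0],
}

print(drop_correlated(layers))
```

With a greedy rule like this the outcome depends on the order in which variables are listed, so in practice the ordering would need to reflect which variables are considered most biologically relevant.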


Increased Mutation Rate Is Linked to Genome Reduction in Prokaryotes

Highlights
• Mutation rate is correlated with gene loss in multiple prokaryotic lineages
• Changes in effective population size are weakly associated with gene loss
• Increased mutation rates should be considered in theories of genome-size evolution

Authors
Thomas Bourguignon, Yukihiro Kinjo, Paula Villa-Martín, ..., Simon Y.W. Ho, Simone Pigolotti, Nathan Lo

Correspondence
[email protected] (T.B.), [email protected] (Y.K.), [email protected] (N.L.)

In Brief Using phylogenomic analyses, Bourguignon et al. show gene loss and DNA substitution rates (dN, dS) are correlated in seven of nine prokaryote lineages examined. In contrast, gene loss rate is weakly associated with dN/dS. These results indicate that mutation rate, rather than effective population size, is a key driver of genome reduction in prokaryotes.

Bourguignon et al., 2020, Current Biology 30, 3848–3855, October 5, 2020 © 2020 Elsevier Inc. https://doi.org/10.1016/j.cub.2020.07.034

Report Increased Mutation Rate Is Linked to Genome Reduction in Prokaryotes

Thomas Bourguignon,1,2,3,9,10,* Yukihiro Kinjo,1,9,* Paula Villa-Martín,1 Nicholas V. Coleman,3 Qian Tang,4 Daej A. Arab,3 Zongqing Wang,5 Gaku Tokuda,6 Yuichi Hongoh,7 Moriya Ohkuma,8 Simon Y.W. Ho,3 Simone Pigolotti,1 and Nathan Lo3,*

1 Okinawa Institute of Science & Technology Graduate University, 1919–1 Tancha, Onna-son, Okinawa 904–0495, Japan
2 Faculty of Tropical AgriSciences, Czech University of Life Sciences, Kamycka 129, Prague CZ-165 00, Czech Republic
3 School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
4 Department of Biological Sciences, National University of Singapore, Singapore 117543, Singapore
5 College of Plant Protection, Southwest University, No. 2 Tiansheng Road, Beibei District, Chongqing 400716, China
6 Tropical Biosphere Research Center, Center of Molecular Biosciences, University of the Ryukyus, Nishihara, Okinawa 903-0213, Japan
7 School of Life Science and Technology, Tokyo Institute of Technology, Tokyo 152-8550, Japan
8 RIKEN Bioresource Research Centre, Tsukuba 305-0074, Japan
9 These authors contributed equally
10 Lead Contact
* Correspondence: [email protected] (T.B.), [email protected] (Y.K.), [email protected] (N.L.)
https://doi.org/10.1016/j.cub.2020.07.034

SUMMARY

The evolutionary processes that drive variation in genome size across the tree of life remain unresolved. Effective population size (Ne) is thought to play an important role in shaping genome size [1–3], a key example being the reduced genomes of insect endosymbionts, which undergo population bottlenecks during transmission [4]. However, the existence of reduced genomes in marine and terrestrial prokaryote species with large Ne indicates that genome reduction is influenced by multiple processes [3]. One candidate process is enhanced mutation rate, which can increase adaptive capacity but can also promote gene loss. To investigate evolutionary forces associated with prokaryotic genome reduction, we performed molecular evolutionary and phylogenomic analyses of nine lineages from five bacterial and archaeal phyla. We found that gene-loss rate strongly correlated with synonymous substitution rate (a proxy for mutation rate) in seven of the nine lineages. However, gene-loss rate showed weak or no correlation with the ratio of nonsynonymous/synonymous substitution rate (dN/dS). These results indicate that genome reduction is largely associated with increased mutation rate, while the association between gene loss and changes in Ne is less well defined. Lineages with relatively high dS and dN, as well as smaller genomes, lacked multiple DNA repair genes, providing a proximate cause for increased mutation rates. Our findings suggest that similar mechanisms drive genome reduction in both intracellular and free-living prokaryotes, with implications for developing a comprehensive theory of prokaryote genome size evolution.

RESULTS AND DISCUSSION

Genome size varies dramatically across the tree of life. Among unicellular organisms, genomes differ in size by over six orders of magnitude [3, 5]. The evolutionary drivers of this variation remain unresolved [6]. One evolutionary parameter that is thought to shape genome size in both eukaryotes and prokaryotes is effective population size (Ne), which determines the rate of genetic drift [1–3]. An important example comes from the genomes of mutualistic insect endosymbionts, which are widely considered to undergo long-term degradation as a result of reductions in Ne caused by population bottlenecks during mother-to-offspring transmission [4, 7–9]. However, a number of free-living bacterial lineages with large Ne have reduced genomes [10], indicating the existence of alternative paths to genome reduction [3, 8].

Additional processes that can explain genome reduction include removal of selective constraints in the case of intracellular endosymbionts [11] and streamlining in the case of marine bacteria [8, 12]. A separate potential driver of genome reduction is enhanced mutation rate [8, 13, 14]. Increased mutation rates can facilitate rapid adaptation in organisms exposed to novel environments [15], an example being bacteria that have recently become intracellular [16]. Such increases can also lead to enhanced gene erosion and loss [8, 13]. The potential role of increased mutation rate in driving prokaryote genome reduction has received relatively little attention [17] and lacks empirical support [3, 8, 18].

The influences of different evolutionary processes on genome reduction can be disentangled in a phylogenetic framework. Because mutations at synonymous sites are selectively neutral (assuming that selection on synonymous codon usage is weak


[19]), the rate of synonymous substitutions (dS) provides a good approximation of mutation rate [20]. On the other hand, the rate of nonsynonymous substitutions (dN) is affected both by selection and the mutation rate. By comparing rates of gene loss with dN and dS across a phylogeny, we can assess the relative importance of changes in Ne, mutation rate, and selection on genome degradation [21].

Previous studies of the influence of these processes on bacterial genome evolution have typically compared a few reduced-genome taxa with distantly related taxa possessing larger genomes [9, 17] or have compared several distantly related taxa [1]. We took a novel approach, performing molecular evolutionary analyses in a phylogenetic framework on closely related strains or species with varying genome sizes. We examined nine lineages from five bacterial and archaeal phyla (Bacteroidetes, Proteobacteria, Cyanobacteria, Actinobacteria, and Euryarchaeota) that displayed notable variation in genome size among closely related taxa and for which we were able either to generate representative genomic data or to retrieve data from GenBank. Because we compared closely related taxa, we assumed that the influence of codon usage bias on dS was approximately equal across members of a given lineage. The intracellular endosymbiont lineages that we investigated (Blattabacterium cuenoti and Buchnera aphidicola) are not known to share their host cells with secondary symbionts that have undergone long-term co-cladogenesis with their hosts [22, 23]. Long-term secondary symbionts might cause extreme genome reduction via removal of selective constraints on redundant genes [24, 25], which could confound interpretation of the roles of mutation rates and Ne on gene loss.

Increased Mutation Rate Is Strongly Associated with Gene Loss in Blattabacterium and Buchnera Endosymbionts

We first examined the genomes of 67 Blattabacterium cuenoti (hereafter Blattabacterium) strains from cockroach and termite hosts that represent the eight dictyopteran families known to harbor this endosymbiont, including 46 sequenced for the present study. Blattabacterium is an obligate intracellular mutualist that participates in host nitrogen recycling [26–29] and has been strictly transmitted from mother to offspring for >200 Myr [30, 31]. Genome sizes were found to vary from 511 to 645 kb among strains. We estimated a maximum-likelihood phylogenetic tree using a set of 353 genes present in the genomes of all 67 taxa. We then reconstructed the evolution of gene loss using a model that allowed gene loss but no gene gain, as is known to occur in intracellular mutualistic endosymbionts [11] (Figure 1). A comparison of numbers of genes lost with phylogenetic root-to-tip distances for each strain revealed a positive correlation (Spearman’s rank correlation coefficient rho = 0.701, Figure 2A). To examine the relative roles of mutation rate, reduced Ne, and selection on rates of gene loss, we calculated dS/time and dN/time along each branch of the phylogeny using the alignment of 353 conserved genes, and performed phylogenetic generalized least-squares regression on terminal branch values (time duration of each branch was estimated using Bayesian analysis). Because the genes used in these analyses have never been lost during the evolution of Blattabacterium, the removal of selective constraints is not expected to have played a major role in their evolution. We found a positive correlation between gene-loss rate (per Myr) and both dS/time (PGLS coefficient of determination r² = 0.313, p = 10⁻⁴, Figure 2B) and dN/time (r² = 0.231, p = 0.001, Figure 2C). We estimated dN/dS along each terminal branch across the tree and found a positive, albeit weak, correlation with per-branch gene-loss rate (r² = 0.036, p = 0.228, Figure 2D). We performed ranked correlation analysis across all branches, which corrected for biases associated with estimation of dS for long branches (dS > 1.5; as a result of substitutional saturation) and short branches (dS < 0.2). We found a positive correlation between gene-loss rate (per Myr) and both dS/time (rho = 0.467, p = 6.81 × 10⁻⁵) and dN/time (rho = 0.443, p = 1.73 × 10⁻⁴). We estimated dN/dS across all branches of the tree and found no correlation with per-branch gene-loss rate (rho = −0.109, p = 0.379). Analyses in COEVOL [32], based on separate 1st plus 2nd and 3rd codon sites (as proxies for nonsynonymous and synonymous substitution sites, respectively), also revealed positive correlations between gene loss and evolutionary rate at these different site classes (r = 0.538, p < 0.01; r = 0.395, p < 0.01). These results indicated that gene loss is associated with increases in mutation rate, which are expected to raise both dS and dN [21], rather than with reductions in Ne (which are expected to lead to increases in dN only). Increases in dN might also be explained by positive selection, although this would not be expected to produce the genome-wide changes detected in our analyses.

We found heterogeneous GC-content across Blattabacterium strains, which potentially leads to biased estimates of dN and dS. To correct this bias, we used a non-homogeneous nucleotide substitution model across branches implemented in nhPhyML [33] on 1st plus 2nd and 3rd codon sites as proxies for dN and dS, respectively. We found highly significant correlations between gene-loss rates and both dS/time and dN/time, and a marginally significant correlation between gene-loss rates and dN/dS (Data S1A). In the analyses described above, we used ratios as measures of evolutionary rates and gene-loss rates, an approach that might introduce spurious correlations [34]. To correct for any potential biases in our analyses, we performed partial correlation analysis, using the residual values of three linear regressions: (1) branch lengths calculated for 3rd codon sites versus time (as a proxy for dS, referred to here as “time-controlled dS”), (2) branch lengths calculated for 1st plus 2nd codon sites versus branch lengths calculated for 3rd codon sites (as a proxy for dN/dS, referred to here as “dS-controlled dN”), and (3) gene loss versus time (referred to as “time-controlled gene loss”). Correlations of time-controlled gene loss with time-controlled dS and dS-controlled dN indicate respective associations of gene-loss rate with mutation rates and Ne. We found a positive correlation between gene-loss rate and time-controlled dS (rho = 0.443, p < 0.001, Figure 3A) but not between gene-loss rate and dS-controlled dN (rho = 0.229, p = 0.071, Figure 3B), confirming that gene loss is strongly associated with mutation rate in Blattabacterium.

We repeated the analyses described in the previous paragraphs on 47 strains of Buchnera aphidicola (hereafter Buchnera), an obligate endosymbiont from the phylum Proteobacteria with genome sizes varying from 412 to 646 kb. Buchnera infected the ancestor of all aphids >150 Ma and has been passed down from mother to offspring since that point [35], being

Current Biology 30, 3848–3855, October 5, 2020 3849 ll Report

Figure 1. Phylogenetic Tree of Blattabacterium Inferred Using Maximum-Likelihood Analysis of 353 Protein-Coding Genes with 3rd Codon Sites Removed. Branch color represents cumulative gene loss. Node symbols indicate bootstrap support values. See also Figure S1; Tables S1 and S2.

occasionally lost in some aphid lineages [36]. We reconstructed a maximum-likelihood phylogenetic tree for Buchnera, inferred the evolution of gene loss, and performed correlation analyses equivalent to those described for Blattabacterium. We found significant correlations between gene-loss rates and both dS/time and dN/time but not between gene-loss rates and dN/dS (see Data S1B). Partial correlation analysis confirmed these results: time-controlled gene loss was strongly correlated with time-controlled dS (rho = 0.619, p < 10⁻⁶, Figure 3C) but not with dS-controlled dN (rho = 0.196, p = 0.080, Figure 3D). Therefore, similar to the case for Blattabacterium, gene loss in Buchnera correlates with mutation rate, while the effect of changes in Ne on genome evolution is less clear.

Gene Loss Is Associated with Mutation Rate in Multiple Free-Living Prokaryote Lineages
We performed the analyses described above on seven additional free-living lineages. Because these taxa can obtain new genetic material through horizontal transfer, we estimated total gene loss per branch using a model that allowed both gene loss and gain. For estimations of dN and dS, we used a set of 31 core genes that are unlikely to have been the subject of lateral gene transfer. We


Figure 2. Evolution of Genome Reduction in Blattabacterium. (A) Spearman's rank correlation between total number of gene losses and root-to-tip dN distance (inferred from the tree represented in Figure 1) for each strain. (B–D) Phylogenetic generalized least-squares regression implemented in the R package CAPER between (B) gene loss/time and dS/time, (C) gene loss/time and dN/time, and (D) gene loss/time and dN/dS, per terminal branch.

initially examined two lineages known for possessing reduced genomes: the marine cyanobacterial group Prochlorococcus + Synechococcus (genome sizes range from 1.64 to 2.79 Mb, n = 28) and the archaean genus Thermococcus (genome sizes range from 1.52 to 2.16 Mb, n = 19). Prochlorococcus and Synechococcus comprise some of the most abundant bacterial species on earth [37], while Thermococcus is a genus of hyperthermophilic archaea found in hydrothermal vents [38]. Multiple analyses consistently revealed significant correlations between gene-loss/time and both dS/time and dN/time, but not dN/dS, in each of these groups (Data S1C and S1D). Partial correlation analysis revealed that time-controlled gene loss significantly correlates with time-controlled dS but not with dS-controlled dN in both Prochlorococcus + Synechococcus (rho = 0.478, p = 0.002; rho = -0.243, p = 0.120) (Figures 3E and 3F) and Thermococcus (rho = 0.881, p < 10⁻⁶; rho = 0.191, p = 0.383) (Figures 3G and 3H). These results indicate that increased mutation rate is strongly associated with genome reduction in free-living bacteria and archaea with reduced genomes. Because codon usage bias has been detected in strains of Synechococcus, but not Prochlorococcus [39], we repeated our analyses examining only members of the latter genus. We found a highly similar correlation between time-controlled dS and time-controlled gene loss (rho = 0.475) compared with the full dataset, although significance was marginal (p = 0.054), possibly due to the lower number of taxa examined (n = 18). Similar to the results using the full dataset, there was no correlation between time-controlled gene loss and dS-controlled dN (rho = -0.281, p = 0.256). These results indicate that codon usage bias does not have a major effect on our results.

We analyzed a further five free-living lineages from a range of habitats. In three of these lineages, Corynebacterium (genome sizes range from 2.45 to 3.57 Mb, n = 18), Micrococcineae (genome sizes range from 1.43 to 5.05 Mb, n = 22), and Flavobacteriaceae (genome sizes range from 2.09 to 6.09 Mb, n = 33), we found results similar to those obtained for Blattabacterium, Buchnera, Prochlorococcus + Synechococcus, and Thermococcus (Figures 3K–3L and 3O–3R; Data S1F, S1H, and S1I), although a significant correlation between gene loss and dN/dS was found in Corynebacterium (rho = -0.685, p < 10⁻⁴) and between time-controlled gene loss and dS-controlled dN in the case of Micrococcineae (rho = 0.388, p = 0.031). In the remaining two lineages, Gammaproteobacteria (genome sizes range from 1.70 to 5.01 Mb, n = 20) and Mycobacteriaceae + Nocardiaceae (genome sizes range from 3.28 to 9.70 Mb, n = 15), we did not find consistent evidence for correlations between gene-loss/time and dS/time or dN/time (Data S1E and S1G), and similar results were found in our partial correlation analysis (Figures 3I, 3J, 3M, and 3N). In the case of Mycobacteriaceae + Nocardiaceae, a correlation was found between time-controlled gene loss and dS-controlled dN (rho = 0.47, p = 0.028) (Figure 3N). These results suggest a potential influence of Ne on genome reduction in Mycobacteriaceae + Nocardiaceae and Micrococcineae. Overall, these results indicate that the association between mutation rate and gene-loss rate applies to free-living bacterial groups with larger genomes, albeit not universally.

Proximate Causes of Increased Mutation Rates and Genome Reduction
Our results provide the first phylogenomic evidence for a link between increased mutation rate and long-term prokaryotic genome reduction, based on analyses of closely related taxa. We found evidence for this link in seven of the nine phylogenetically and ecologically divergent lineages that we tested. Previous studies have noted an inverse relationship between microbial genome size and mutation rate (per base pair, per replication) [40–42]; however, these studies examined relatively few, distantly related taxa and did not specifically look at the process of gene loss and molecular evolution in a phylogenetic framework.


Figure 3. Partial Correlation Analysis of Time-Controlled Gene Loss with Time-Controlled dS (as a proxy for dS) and dS-Controlled dN (as a proxy for dN/dS) for Nine Prokaryote Lineages. (A and B) Blattabacterium, (C and D) Buchnera, (E and F) Prochlorococcus + Synechococcus, (G and H) Thermococcus, (I and J) Gammaproteobacteria, (K and L) Corynebacterium, (M and N) Mycobacteriaceae + Nocardiaceae, (O and P) Micrococcineae, and (Q and R) Flavobacteriaceae. See also Data S1.
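The residual-based analysis summarized in Figure 3 can be illustrated with a short Python sketch. It is not the authors' pipeline (the Key Resources Table lists the R package ppcor [78] for this step); it only shows the idea of regressing out a covariate and then rank-correlating the residuals. The function names, argument names, and the use of ordinary least squares are assumptions made for illustration, and no p values are computed.

```python
from statistics import mean


def ols_residuals(y, x):
    """Residuals of an ordinary least-squares fit y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def controlled_correlations(time, len_3rd, len_12, gene_loss):
    """Per-branch inputs (hypothetical names): branch duration, branch length
    at 3rd codon sites, branch length at 1st plus 2nd codon sites, genes lost."""
    time_ctrl_ds = ols_residuals(len_3rd, time)      # "time-controlled dS"
    ds_ctrl_dn = ols_residuals(len_12, len_3rd)      # "dS-controlled dN"
    time_ctrl_loss = ols_residuals(gene_loss, time)  # "time-controlled gene loss"
    return (spearman(time_ctrl_loss, time_ctrl_ds),
            spearman(time_ctrl_loss, ds_ctrl_dn))
```

Calling `controlled_correlations` with four equal-length lists of per-branch values returns the two rho values that correspond to the paired panels in Figure 3 (for example, panels A and B for one lineage).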

Proximate causes of the increased mutation rates that we identified are likely to include the loss of DNA repair genes [3, 8, 13] and reductions in the accuracy of replication enzymes. In Prochlorococcus, low-light-adapted ecotypes have lower mutation rates and have retained a larger set of DNA repair genes than high-light-adapted ecotypes [37]. The Buchnera strains endowed with the smallest genomes are those associated with Lachninae, Calaphidinae, and Phyllaphidinae, all of which possess reduced repair machinery in comparison with other strains of Buchnera [43, 44]. In Blattabacterium, taxa with small genomes have lost significantly more genes in COG categories F (nucleotide transport and metabolism) and L (DNA replication, recombination, and repair) than do other clades (Figure S1; Table S1). Genes in these categories are thought to play key roles in reducing or removing errors that occur during DNA replication. An inverse correlation between genome size and loss of DNA repair enzymes has been found across numerous prokaryotic taxa [45]. An increased mutation rate can lead to increased levels of gene inactivation and erosion through deletions or nonsense mutations [13, 46]. According to the "error threshold" theory, genes are lost when the mutation rate exceeds the fitness effects of such gene loss [13, 47]. Because fitness effects vary among genes, enhanced mutation rates will remove genes that are less important in the genome.

Ultimate Causes of Increased Mutation Rates and Genome Reduction
Although we identified a strong correlation between mutation rate and gene loss across multiple lineages, causation may be in either direction, or there might be no causal link between the two phenomena. The ultimate causes of increases in rates of mutation and gene loss could be adaptive, neutral, or a combination of both. Below, we briefly consider a number of hypotheses for the ultimate causes of genome reduction in the light of our results.

Enhanced mutation rates have been hypothesized to provide adaptive advantages in prokaryotes [48]. A "mutator" strain that evolved via modification or loss of DNA repair genes or lower fidelity polymerases might initially be selected because of its capacity to rapidly accrue beneficial mutations in novel environments. Increased mutations in such a strain, which could be either free-living or endosymbiotic, would lead to increased gene deterioration and loss, which could lead to increased fitness due to the removal of functions with a high cost-to-benefit ratio [14, 16, 49, 50]. Under this scenario, the adaptive benefits of increased mutation rate would be the ultimate cause of genome reduction, given increased mutation rates were maintained during the evolution of the lineage.

The streamlining hypothesis for genome reduction in marine cyanobacteria proposes that strong selection acts to remove non-essential genes in ocean environments low in nitrogen and phosphorus (which are essential elements of DNA) [12, 51]. A small genome also permits small cell volume, which improves nutrient uptake [52, 53]. One interpretation of the increased mutation rates that we observed in Prochlorococcus spp. could be that they are a consequence of streamlining, stemming from the removal of non-essential DNA-repair genes. The streamlining hypothesis has been considered unlikely to apply to bacteria other than marine bacterioplanktons [1].


However, selection for both increased mutation rate and minimal use of DNA could provide an explanation for genome reduction in a variety of prokaryotes. For example, in hosts that persist on nutritionally restrictive diets, host-level selection for endosymbionts that consume fewer critical nutrients could lead to reduced endosymbiont genome size. During this process, individuals with a higher mutation rate would be selected because they would be likely to lose genes more quickly than individuals with slower rates.

Hypotheses that require non-adaptive processes to explain increases in rates of mutation and gene loss include those based on the removal of selective constraint and Ne. In the former, gene loss occurs because no fitness advantage is provided by retention of particular genes, while in the latter, enhanced genetic drift because of population bottlenecks leads to the fixation of deleterious mutations, ultimately resulting in gene erosion and loss [4, 9]. In each of these cases, the ultimate cause of increases in mutation rate is the non-adaptive loss or degradation of DNA repair genes. A reduction in polymerase fidelity as a result of fixation of mildly deleterious mutations via drift could also contribute to increased mutation rates. Based on the lack of correlation between dN/dS and gene loss, we found no evidence for an effect of reduced Ne on genome reduction during the diversification of the lineages that we examined, although we cannot rule out such an effect.

Our results show links between increased mutation rates and genome reduction in endosymbiotic and multiple free-living bacterial lineages. Our findings are consistent with previous hypotheses for genome reduction in some free-living bacterial lineages but also suggest that currently accepted explanations for endosymbiont genome reduction require revision. The hypothesis that adaptive benefits of increased mutation rates during the early evolution of a lineage ultimately lead to long-term genome reduction should be tested in future studies and considered in the development of a comprehensive theory of prokaryote genome-size evolution.

STAR★METHODS

Detailed methods are provided in the online version of this paper and include the following:
- KEY RESOURCES TABLE
- RESOURCE AVAILABILITY
  - Lead Contact
  - Material Availability
  - Data and Code Availability
- EXPERIMENTAL MODEL AND SUBJECT DETAILS
- METHOD DETAILS
  - Blattabacterium sequencing
- QUANTIFICATION AND STATISTICAL ANALYSIS
  - Blattabacterium genome assembly and annotation
  - Phylogenetic analysis of Blattabacterium
  - Molecular dating of Blattabacterium
  - Phylogenetic analysis of Buchnera and free-living bacteria
  - Timetree reconstruction of Buchnera and free-living bacteria
  - Reconstruction of gene loss
  - Correlation of gene loss with evolutionary rate and dN/dS

SUPPLEMENTAL INFORMATION

Supplemental Information can be found online at https://doi.org/10.1016/j.cub.2020.07.034.

ACKNOWLEDGMENTS

We thank Ales Bucek for assistance with phylogenetic analysis, David Rentz and James Walker for providing samples used in this study, John Alroy for providing advice regarding statistical analyses, and Eddie Holmes for comments on the manuscript. T.B. was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI 18K14767, by the subsidiary funding to OIST, by a University of Sydney Postdoctoral Fellowship, and by the Internal Grant Agency of the Faculty of Tropical AgriSciences, CULS (20205014). D.A.A. was supported by an Australian Government Scholarship. Z.W. and G.T. respectively acknowledge the support of the National Natural Science Foundation of China (31672329, 31872271) and JSPS KAKENHI 17H01510. S.Y.W.H. and N.L. were supported by the Australian Research Council.

AUTHOR CONTRIBUTIONS

T.B., Y.K., and N.L. conceptualized the experiments, with input from S.Y.W.H. T.B., Q.T., Z.W., and N.L. collected the samples. T.B. performed lab experiments and generated data. Y.K. analyzed the data, with significant input from T.B., N.V.C., D.A.A., P.V.-M., S.P., and N.L. T.B. wrote the first draft of the manuscript. N.L. and T.B. wrote subsequent drafts of the manuscript, with significant input from S.Y.W.H., Y.K., P.V.M., N.V.C., Y.H., M.O., G.T., and S.P.

DECLARATION OF INTERESTS

The authors declare no competing interests.

Received: February 4, 2020
Revised: April 27, 2020
Accepted: July 9, 2020
Published: August 6, 2020

REFERENCES

1. Kuo, C.H., Moran, N.A., and Ochman, H. (2009). The consequences of genetic drift for bacterial genome complexity. Genome Res. 19, 1450–1454.
2. Lynch, M., Ackerman, M.S., Gout, J.F., Long, H., Sung, W., Thomas, W.K., and Foster, P.L. (2016). Genetic drift, selection and the evolution of the mutation rate. Nat. Rev. Genet. 17, 704–714.
3. Batut, B., Knibbe, C., Marais, G., and Daubin, V. (2014). Reductive genome evolution at both ends of the bacterial population size spectrum. Nat. Rev. Microbiol. 12, 841–850.
4. Moran, N.A., McCutcheon, J.P., and Nakabachi, A. (2008). Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 42, 165–190.
5. McGrath, C.L., and Katz, L.A. (2004). Genome diversity in microbial eukaryotes. Trends Ecol. Evol. 19, 32–38.
6. Sela, I., Wolf, Y.I., and Koonin, E.V. (2016). Theory of prokaryotic genome evolution. Proc. Natl. Acad. Sci. USA 113, 11399–11407.
7. Wernegreen, J.J. (2002). Genome evolution in bacterial endosymbionts of insects. Nat. Rev. Genet. 3, 850–861.
8. Martínez-Cano, D.J., Reyes-Prieto, M., Martínez-Romero, E., Partida-Martínez, L.P., Latorre, A., Moya, A., and Delaye, L. (2015). Evolution of small prokaryotic genomes. Front. Microbiol. 5, 742.
9. Moran, N.A. (1996). Accelerated evolution and Muller's rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. USA 93, 2873–2878.


10. Biller, S.J., Berube, P.M., Lindell, D., and Chisholm, S.W. (2015). Prochlorococcus: the structure and function of collective diversity. Nat. Rev. Microbiol. 13, 13–27.
11. McCutcheon, J.P., and Moran, N.A. (2011). Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10, 13–26.
12. Giovannoni, S.J., Tripp, H.J., Givan, S., Podar, M., Vergin, K.L., Baptista, D., Bibbs, L., Eads, J., Richardson, T.H., Noordewier, M., et al. (2005). Genome streamlining in a cosmopolitan oceanic bacterium. Science 309, 1242–1245.
13. Wernegreen, J.J. (2017). In it for the long haul: evolutionary consequences of persistent endosymbiosis. Curr. Opin. Genet. Dev. 47, 83–90.
14. Marais, G.A., Calteau, A., and Tenaillon, O. (2008). Mutation rate and genome reduction in endosymbiotic and free-living bacteria. Genetica 134, 205–210.
15. Giraud, A., Radman, M., Matic, I., and Taddei, F. (2001). The rise and fall of mutator bacteria. Curr. Opin. Microbiol. 4, 582–585.
16. Clayton, A.L., Jackson, D.G., Weiss, R.B., and Dale, C. (2016). Adaptation by deletogenic replication slippage in a nascent symbiont. Mol. Biol. Evol. 33, 1957–1966.
17. Itoh, T., Martin, W., and Nei, M. (2002). Acceleration of genomic evolution caused by enhanced mutation rate in endocellular symbionts. Proc. Natl. Acad. Sci. USA 99, 12944–12948.
18. Canbäck, B., Tamas, I., and Andersson, S.G.E. (2004). A phylogenomic study of endosymbiotic bacteria. Mol. Biol. Evol. 21, 1110–1122.
19. Hershberg, R., and Petrov, D.A. (2008). Selection on codon bias. Annu. Rev. Genet. 42, 287–299.
20. Bromham, L. (2011). The genome as a life-history character: why rate of molecular evolution varies between mammal species. Philos. Trans. R. Soc. Lond. B Biol. Sci. 366, 2503–2513.
21. Bromham, L., Cowman, P.F., and Lanfear, R. (2013). Parasitic plants have increased rates of molecular evolution across all three genomes. BMC Evol. Biol. 13, 126.
22. Bandi, C., Damiani, G., Magrassi, L., Grigolo, A., Fani, R., and Sacchi, L. (1994). Flavobacteria as intracellular symbionts in cockroaches. Proc. Biol. Sci. 257, 43–48.
23. Burke, G.R., Normark, B.B., Favret, C., and Moran, N.A. (2009). Evolution and diversity of facultative symbionts from the aphid subfamily Lachninae. Appl. Environ. Microbiol. 75, 5328–5335.
24. McCutcheon, J.P., and Moran, N.A. (2007). Parallel genomic evolution and metabolic interdependence in an ancient symbiosis. Proc. Natl. Acad. Sci. USA 104, 19392–19397.
25. Husnik, F., Nikoh, N., Koga, R., Ross, L., Duncan, R.P., Fujie, M., Tanaka, M., Satoh, N., Bachtrog, D., Wilson, A.C.C., et al. (2013). Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis. Cell 153, 1567–1578.
26. Bandi, C., Sironi, M., Damiani, G., Magrassi, L., Nalepa, C.A., Laudani, U., and Sacchi, L. (1995). The establishment of intracellular symbiosis in an ancestor of cockroaches and termites. Proc. Biol. Sci. 259, 293–299.
27. Cochran, D. (1985). Nitrogen excretion in cockroaches. Annu. Rev. Entomol. 30, 29–49.
28. López-Sánchez, M.J., Neef, A., Peretó, J., Patiño-Navarrete, R., Pignatelli, M., Latorre, A., and Moya, A. (2009). Evolutionary convergence and nitrogen metabolism in Blattabacterium strain Bge, primary endosymbiont of the cockroach Blattella germanica. PLoS Genet. 5, e1000721.
29. Sabree, Z.L., Kambhampati, S., and Moran, N.A. (2009). Nitrogen recycling and nutritional provisioning by Blattabacterium, the cockroach endosymbiont. Proc. Natl. Acad. Sci. USA 106, 19521–19526.
30. Bourguignon, T., Tang, Q., Ho, S.Y.W., Juna, F., Wang, Z., Arab, D.A., Cameron, S.L., Walker, J., Rentz, D., Evans, T.A., and Lo, N. (2018). Transoceanic dispersal and plate tectonics shaped global cockroach distributions: evidence from mitochondrial phylogenomics. Mol. Biol. Evol. 35, 970–983.
31. Evangelista, D.A., Wipfler, B., Bethoux, O., Donath, A., Fujita, M., Kohli, M.K., Legendre, F., Liu, S., Machida, R., Misof, B., et al. (2019). An integrative phylogenomic approach illuminates the evolutionary history of cockroaches and termites (Blattodea). Proc. Biol. Sci. 286, 20182076.
32. Lartillot, N., and Poujol, R. (2011). A phylogenetic model for investigating correlated evolution of substitution rates and continuous phenotypic characters. Mol. Biol. Evol. 28, 729–744.
33. Boussau, B., and Gouy, M. (2006). Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55, 756–768.
34. Kronmal, R.A. (1993). Spurious correlation and the fallacy of the ratio standard revisited. J. R. Stat. Soc. 156, 379–392.
35. Baumann, P., Moran, N.A., and Baumann, L. (1997). The evolution and genetics of aphid endosymbionts. Bioscience 47, 12–20.
36. Chong, R.A., and Moran, N.A. (2018). Evolutionary loss and replacement of Buchnera, the obligate endosymbiont of aphids. ISME J. 12, 898–908.
37. Partensky, F., and Garczarek, L. (2010). Prochlorococcus: advantages and limits of minimalism. Annu. Rev. Mar. Sci. 2, 305–331.
38. Zillig, W., Holz, I., Janekovic, D., Schäfer, W., and Reiter, W.D. (1983). The archaebacterium Thermococcus celer represents, a novel genus within the thermophilic branch of the archaebacteria. Syst. Appl. Microbiol. 4, 88–94.
39. Yu, T., Li, J., Yang, Y., Qi, L., Chen, B., Zhao, F., Bao, Q., and Wu, J. (2012). Codon usage patterns and adaptive evolution of marine unicellular cyanobacteria Synechococcus and Prochlorococcus. Mol. Phylogenet. Evol. 62, 206–213.
40. Drake, J.W. (1991). A constant rate of spontaneous mutation in DNA-based microbes. Proc. Natl. Acad. Sci. USA 88, 7160–7164.
41. Drake, J.W., Charlesworth, B., Charlesworth, D., and Crow, J.F. (1998). Rates of spontaneous mutation. Genetics 148, 1667–1686.
42. Massey, S.E. (2008). The proteomic constraint and its role in molecular evolution. Mol. Biol. Evol. 25, 2557–2565.
43. Perez-Brocal, V., Gil, R., Ramos, S., Lamelas, A., Postigo, M., Michelena, J.M., Silva, F.J., Moya, A., and Latorre, A. (2006). A small microbial genome: the end of a long symbiotic relationship? Science 314, 312–313.
44. Chong, R.A., Park, H., and Moran, N.A. (2019). Genome evolution of the obligate endosymbiont Buchnera aphidicola. Mol. Biol. Evol. 36, 1481–1489.
45. Acosta, S., Carela, M., Garcia-Gonzalez, A., Gines, M., Vicens, L., Cruet, R., and Massey, S.E. (2015). DNA Repair Is Associated with Information Content in Bacteria, Archaea, and DNA Viruses. J. Hered. 106, 644–659.
46. Moran, N.A., McLaughlin, H.J., and Sorek, R. (2009). The dynamics and time scale of ongoing genomic erosion in symbiotic bacteria. Science 323, 379–382.
47. Eigen, M. (1971). Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58, 465–523.
48. Taddei, F., Radman, M., Maynard-Smith, J., Toupance, B., Gouyon, P.H., and Godelle, B. (1997). Role of mutator alleles in adaptive evolution. Nature 387, 700–702.
49. Funchain, P., Yeung, A., Stewart, J.L., Lin, R., Slupska, M.M., and Miller, J.H. (2000). The consequences of growth of a mutator strain of Escherichia coli as measured by loss of function among multiple gene targets and loss of fitness. Genetics 154, 959–970.
50. Koskiniemi, S., Sun, S., Berg, O.G., and Andersson, D.I. (2012). Selection-driven gene loss in bacteria. PLoS Genet. 8, e1002787.
51. Rocap, G., Larimer, F.W., Lamerdin, J., Malfatti, S., Chain, P., Ahlgren, N.A., Arellano, A., Coleman, M., Hauser, L., Hess, W.R., et al. (2003). Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature 424, 1042–1047.
52. Dufresne, A., Garczarek, L., and Partensky, F. (2005). Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 6, R14.


53. Button, D.K. (1991). Biochemical basis for whole-cell uptake kinetics: specific affinity, oligotrophic capacity, and the meaning of the michaelis constant. Appl. Environ. Microbiol. 57, 2033–2038.
54. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
55. Kinjo, Y., Saitoh, S., and Tokuda, G. (2015). An efficient strategy developed for next-generation sequencing of endosymbiont genomes performed using crude DNA isolated from host tissues: a case study of Blattabacterium cuenoti inhabiting the fat bodies of cockroaches. Microbes Environ. 30, 208–220.
56. Nadalin, F., Vezzi, F., and Policriti, A. (2012). GapFiller: a de novo assembly approach to fill the gap within paired reads. BMC Bioinformatics 13 (Suppl 14), S8.
57. Walker, B.J., Abeel, T., Shea, T., Priest, M., Abouelliel, A., Sakthikumar, S., Cuomo, C.A., Zeng, Q., Wortman, J., Young, S.K., and Earl, A.M. (2014). Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963.
58. Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119.
59. Galperin, M.Y., Makarova, K.S., Wolf, Y.I., and Koonin, E.V. (2015). Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43, D261–D269.
60. Lagesen, K., Hallin, P., Rødland, E.A., Staerfeldt, H.H., Rognes, T., and Ussery, D.W. (2007). RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108.
61. Lowe, T.M., and Eddy, S.R. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964.
62. Nawrocki, E.P., and Eddy, S.R. (2013). Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935.
63. Lechner, M., Findeiss, S., Steiner, L., Marz, M., Stadler, P.F., and Prohaska, S.J. (2011). Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinformatics 12, 124.
64. Meyer, F., Overbeek, R., and Rodriguez, A. (2009). FIGfams: yet another set of protein families. Nucleic Acids Res. 37, 6643–6654.
65. Marchler-Bauer, A., and Bryant, S.H. (2004). CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 32, W327-31.
66. Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780.
67. Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609-12.
68. Nguyen, L.T., Schmidt, H.A., von Haeseler, A., and Minh, B.Q. (2015). IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274.
69. Drummond, A.J., and Rambaut, A. (2007). BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214.
70. Wu, M., and Scott, A.J. (2012). Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28, 1033–1034.
71. Capella-Gutierrez, S., Silla-Martínez, J.M., and Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973.
72. Rambaut, A., and Drummond, A.J. (2007). Tracer. http://tree.bio.ed.ac.uk/software/tracer/.
73. Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313.
74. Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591.
75. Paradis, E., Claude, J., and Strimmer, K. (2004). APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289–290.
76. Revell, L.J. (2012). phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223.
77. Orme, D., Freckleton, R., Thomas, G., and Petzoldt, T. (2013). The caper package: comparative analysis of phylogenetics and evolution in R (R Package).
78. Kim, S. (2015). ppcor: An R package for a fast calculation to semi-partial correlation coefficients. Commun. Stat. Appl. Methods 22, 665–674.
79. Fu, L., Niu, B., Zhu, Z., Wu, S., and Li, W. (2012). CD-HIT: accelerated for clustering the next generation sequencing data. Bioinformatics 28, 3150–3152.
80. Minh, B.Q., Nguyen, M.A.T., and von Haeseler, A. (2013). Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195.
81. Drummond, A.J., Ho, S.Y.W., Phillips, M.J., and Rambaut, A. (2006). Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, e88.
82. Gernhard, T. (2008). The conditioned reconstructed process. J. Theor. Biol. 253, 769–778.
83. Ho, S.Y.W., and Phillips, M.J. (2009). Accounting for calibration uncertainty in phylogenetic estimation of evolutionary divergence times. Syst. Biol. 58, 367–380.
84. Pagel, M. (1994). Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc. Biol. Sci. 255, 37–45.


STAR★METHODS

KEY RESOURCES TABLE

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Biological samples
Cockroach samples used for RNA isolation | This Study, see Table S2 | N/A

Chemicals, Peptides, and Recombinant Proteins
RNA-later | ThermoFisher Scientific | Cat#AM7021
Fisherbrand Disposable Pestle System | Fisher Scientific | Cat#03-392-103
DNeasy Blood and Tissue extraction kit | QIAGEN | Cat#69506
Qubit | ThermoFisher Scientific | Cat#Q32854

Deposited data
Blattabacterium genomes associated with 46 cockroach species | This Study; BioProject: PRJNA643811; see Table S2 | N/A

Software and Algorithms
COEVOL | [32] | https://megasun.bch.umontreal.ca/People/lartillot/www/
nhPhyML | [33] | http://pbil.univ-lyon1.fr/software/nhphyml/
BLAST+ package | [54] | https://blast.ncbi.nlm.nih.gov/Blast.cgi
TCSF and IMRA | [55] | https://github.com/Yukihirokinjo/TCSF_IMRA
GapFiller | [56] | https://sourceforge.net/projects/gapfiller/
Pilon | [57] | https://github.com/broadinstitute/pilon
Prodigal | [58] | https://github.com/hyattpd/Prodigal
COG database | [59] | https://www.ncbi.nlm.nih.gov/COG/
RNAmmer | [60] | http://www.cbs.dtu.dk/services/RNAmmer/
tRNAscan-SE | [61] | http://lowelab.ucsc.edu/tRNAscan-SE/
Infernal | [62] | http://eddylab.org/infernal/
Proteinortho ver. 5.16 | [63] | https://www.bioinf.uni-leipzig.de/Software/proteinortho/manual5.html
FIGfam | [64] | http://blog.theseed.org/servers/presentations/t1/figfams.html
CD-search | [65] | https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi
MAFFT v7.300b | [66] | https://mafft.cbrc.jp/alignment/software/
pal2nal v14 | [67] | https://github.com/HajkD/orthologr/tree/master/inst/pal2nal
IQ-TREE 1.6.7 | [68] | http://www.iqtree.org
BEAST 1.8.4 | [69] | https://beast.community
AMPHORA2 | [70] | https://github.com/martinwu/AMPHORA2
trimAl | [71] | http://trimal.cgenomics.org
Tracer 1.5 | [72] | http://tree.bio.ed.ac.uk/software/tracer/
Paleobiology Database | | https://www.paleobiodb.org/#/
RAxML version 8.2 | [73] | https://cme.h-its.org/exelixis/web/software/raxml/
PAML4 | [74] | http://evomics.org/resources/software/molecular-evolution-software/paml/
ape | [75] | https://cran.r-project.org/web/packages/ape/index.html
phytools | [76] | https://cran.r-project.org/web/packages/phytools/
CAPER | [77] | https://cran.r-project.org/web/packages/caper/index.html
ppcor | [78] | https://cran.r-project.org/web/packages/ppcor/index.html
CD-HIT | [79] | http://weizhongli-lab.org/cd-hit/

RESOURCE AVAILABILITY

Lead Contact
Further information and requests may be directed to and will be fulfilled by the lead contact, Thomas Bourguignon (thomas.[email protected]). Yukihiro Kinjo ([email protected]) and Nathan Lo ([email protected]) may also be contacted for further information.

Material Availability
This study did not generate new unique reagents.

Data and Code Availability
The assembled genomes of Blattabacterium generated in this study are freely available on NCBI under the BioProject ID PRJNA643811.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

We obtained samples of 46 cockroach species preserved in RNA-later. Cockroaches were shipped at room temperature to Sydney, Australia, where they were stored at −80 °C until DNA extraction. Details on individual sample collection can be found in Table S2.

METHOD DETAILS

Blattabacterium sequencing
Fat bodies of a single cockroach specimen were dissected using a sterile scalpel, and DNA was extracted with a DNeasy Blood and Tissue kit (QIAGEN) according to the manufacturer’s protocol. Cockroach fat-body DNA, which includes Blattabacterium DNA, was sequenced over multiple Illumina runs. For the first run, DNA samples of 23 cockroach specimens were tagged with unique barcode combinations, mixed in equimolar concentrations, and sequenced as 150 bp paired-end reads on an Illumina HiSeq4000. From this initial sequencing run, 10 Blattabacterium genomes were each assembled into a single circular chromosome (Table S2), while the remaining 13 genomes were each split into several contigs. In the second run, we used the same procedure and sequencing platform and sequenced fat-body DNA of 18 cockroach species, two of which were specimens re-sequenced from the first run (Table S2). In total, this sequencing run yielded four Blattabacterium genomes, each assembled into a single circular chromosome (Table S2). To improve the assembly of fragmented genomes, we re-sequenced specimens over 11 runs of Illumina HiSeq X Ten. The fat-body DNA from two to six species, belonging to different cockroach families or subfamilies (i.e., divergent taxa), was mixed prior to library preparation and sequenced in one run of Illumina HiSeq X Ten. Therefore, the reads obtained from these sequencing runs included DNA from several Blattabacterium strains, which were assembled together. We observed no interaction between Blattabacterium genomes during the assembly steps. Each Blattabacterium contig could be unambiguously attributed to a single cockroach species using blastn searches, implemented in the BLAST+ package [54].

QUANTIFICATION AND STATISTICAL ANALYSIS

Blattabacterium genome assembly and annotation
We assembled high-quality reads using the “TCSF and IMRA” pipeline as previously described [55]. Unknown regions within scaffolds were determined using GapFiller [56]. For each species, we evaluated the final assembly and corrected erroneous regions using Pilon [57]. Regions of low quality, characterized by a high probability of being misassembled, were removed and masked with “N”. We annotated a total of 67 Blattabacterium genomes, 46 of which were sequenced in this study. The remaining 21 genomes were downloaded from RefSeq (Table S2). We predicted protein-coding regions using Prodigal [58] with a cut-off score of 0.6. In addition to the Prodigal prediction, we carried out homology-based open reading frame prediction using blastp searches, implemented in the BLAST+ package [54], against the COG database [59]. Predictions for rRNAs, tRNAs, and other non-coding RNAs were carried out using RNAmmer [60], tRNAscan-SE [61], and Infernal [62], respectively. Pseudogenes were identified by checking for fragmentation and truncation of open reading frames. Briefly, we used blastp to search each predicted gene against the predicted orthologous protein sequences of eight published Blattabacterium genomes, with an e-value threshold of 10⁻³⁰. Genes with fragmented open reading frames and with disrupted conserved functional motifs or domains were regarded as pseudogenes. We used CDD searches to identify functional motifs and domains. Truncated genes missing more than 30% of the typical mean gene length, and missing complete functional motifs or domains, were also considered pseudogenes. We determined all sets of orthologous genes from all genomes used in this study using Proteinortho ver. 5.16 [63] with the parameter -cov = 35. All orthologous gene sets were further curated manually, and only those shared among at least five strains were used for our evolutionary analyses. In addition, to remove uncertainties from the prediction of orthologous gene sets, sets with low clustering confidence scores (< 0.6) were removed from the analyses. Functional annotation of each predicted orthologous gene set was carried out using FIGfam [64]. Annotation was further curated using CD-search [65] against the COG database.
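The pseudogene and ortholog-set criteria above can be sketched as two small decision rules. This is a minimal illustration, not the pipeline used in the study: the `GeneCall` record, its field names, and both function names are hypothetical, and the motif flags are assumed to come from a separate domain search.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    """Hypothetical summary of one predicted gene (all field names are assumptions)."""
    length: int             # length of the predicted ORF, same unit as the ortholog mean
    fragmented: bool        # ORF is split into several pieces
    motifs_disrupted: bool  # a conserved motif or domain is interrupted
    motifs_missing: bool    # a complete motif or domain is absent


def is_pseudogene(gene, mean_ortholog_length, max_missing_fraction=0.30):
    """Apply the two pseudogene rules described in the text."""
    # Rule 1: fragmented ORF with a disrupted conserved motif or domain.
    if gene.fragmented and gene.motifs_disrupted:
        return True
    # Rule 2: more than 30% of the typical length is missing AND
    # a complete functional motif or domain is absent.
    missing_fraction = 1.0 - gene.length / mean_ortholog_length
    return missing_fraction > max_missing_fraction and gene.motifs_missing


def keep_ortholog_set(n_strains, confidence, min_strains=5, min_confidence=0.6):
    """Keep sets shared by at least five strains with confidence of at least 0.6."""
    return n_strains >= min_strains and confidence >= min_confidence
```

For example, a 600-residue gene whose orthologs average 1,000 residues is missing 40% of the typical length, so it is flagged only if it also lacks a complete motif or domain.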

Phylogenetic analysis of Blattabacterium
We carried out phylogenetic analyses on 67 strains of Blattabacterium using the 353 orthologous protein-coding genes that were present in all strains; genes absent from one or more strains were not considered further. We aligned the amino acid sequences of each gene with MAFFT v7.300b using the options “--maxiterate 1000 --globalpair” for maximum accuracy [66]. Amino acid sequence alignments were back-translated to nucleotides using pal2nal v14 [67], and stop codons were masked as “NNN”. The concatenated sequence alignment was partitioned into three subsets, one for each codon position of the protein-coding genes. We removed the 3rd codon sites from subsequent phylogenetic analyses, leaving two subsets: one containing the 1st codon sites and one containing the 2nd codon sites. A maximum-likelihood phylogenetic tree was reconstructed with IQ-TREE version 1.6.7 [68] using 1000 ultrafast bootstrap replicates [80].
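The stop-codon masking and codon-position partitioning described above can be illustrated with a short sketch. It assumes an in-frame nucleotide alignment row and the three standard stop codons; both function names are hypothetical, and the actual masking was done on pal2nal output rather than with this code.

```python
def mask_stop_codons(seq, stops=("TAA", "TAG", "TGA")):
    """Replace in-frame stop codons with 'NNN' (standard stop set is an assumption)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join("NNN" if codon.upper() in stops else codon for codon in codons)


def split_codon_positions(seq):
    """Return the 1st, 2nd and 3rd codon-position sites of an in-frame sequence."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return seq[0::3], seq[1::3], seq[2::3]


masked = mask_stop_codons("ATGAAATAA")
first, second, third = split_codon_positions(masked)
# Only the 1st and 2nd positions are kept for the phylogenetic analysis.
print(masked, first, second)  # ATGAAANNN AAN TAN
```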

Molecular dating of Blattabacterium
We inferred a time-calibrated phylogenetic tree for Blattabacterium using BEAST 1.8.4 [69]. Because BEAST analyses are computationally intensive, we ran the analyses with a subset of 31 of the 353 genes used for our maximum-likelihood phylogenetic analysis with IQ-TREE. The 31 selected genes were standard bacterial phylogenetic marker genes used in AMPHORA2 [70]. Each gene was aligned independently and the 31 gene alignments were concatenated as described above. We further trimmed the concatenated alignment matrix, removed the 3rd codon sites, and removed each column containing gaps using trimAl [71]. The final sequence alignment included 14,100 nucleotide sites. We partitioned our dataset into two subsets: one containing 1st codon sites and one containing 2nd codon sites. An independent GTR+G model of nucleotide substitution was assigned to each subset. We implemented an uncorrelated lognormal relaxed clock to account for rate variation across branches [81]. For each analysis, Markov chain Monte Carlo (MCMC) sampling was used to estimate the tree and the posterior distributions of parameters. Each MCMC analysis was performed in duplicate. The MCMC chains were run for 10⁸ steps and the parameter values were sampled every 10⁴ steps. Following inspection of the MCMC samples in Tracer 1.5 [72], we discarded the samples from the first 10⁷ steps as burn-in. The marginal log-likelihood of the tree inferred with a birth-death tree prior was −174,154, whereas that of the tree inferred with a Yule process was −174,712. Because the birth-death prior had the higher marginal likelihood, we only present the tree inferred using a birth-death tree prior [82]. The molecular clock was calibrated using seven minimum age constraints (Table S3). Each calibration was based on the fossil record and we systematically selected the youngest possible age for each fossil, as reported in the Paleobiology Database (www.paleobiodb.org; last accessed on 27 July 2018). Fossil calibrations were implemented as exponential priors on node times [83]. In each case the 97.5% soft maximum bound was determined using a combination of phylogenetic bracketing and absence of fossil evidence (Table S3).
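Two quantities in this section follow from simple arithmetic, sketched below. The first function shows how the mean of an offset exponential prior could be chosen so that its 97.5% quantile falls on the soft maximum; the function name and the example ages (100 and 200 Ma) are hypothetical and are not taken from Table S3. The second calculation is the difference between the two reported marginal log-likelihoods, which amounts to a log Bayes factor.

```python
import math


def exponential_prior_mean(min_age, soft_max_age, quantile=0.975):
    """Mean of an exponential prior, offset at min_age, with `quantile` at soft_max_age."""
    if soft_max_age <= min_age:
        raise ValueError("soft maximum must be older than the minimum age")
    # For mean m: P(age <= soft_max) = 1 - exp(-(soft_max - min_age) / m) = quantile,
    # so m = (soft_max - min_age) / -ln(1 - quantile).
    return (soft_max_age - min_age) / -math.log(1.0 - quantile)


# Hypothetical calibration: fossil minimum of 100 Ma, soft maximum of 200 Ma.
mean_offset = exponential_prior_mean(100.0, 200.0)

# Reported marginal log-likelihoods: birth-death minus Yule.
log_bayes_factor = -174_154 - (-174_712)
print(log_bayes_factor)  # 558
```

A positive difference favours the birth-death prior, consistent with the choice made in the text.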

Phylogenetic analysis of Buchnera and free-living bacteria
We obtained genomic data from the RefSeq database. For each lineage, we used the CD-HIT program [84] to remove redundant genomes, which we defined as genomes with more than 96% nucleotide identity on the marker gene alignment without 3rd codon positions. As a result, we obtained 46 genomes of Buchnera, 28 genomes of Prochlorococcus and Synechococcus, 19 genomes of Thermococcus, 18 genomes of Corynebacterium, 22 genomes of Micrococcineae, 33 genomes of Flavobacteriaceae, 20 genomes of Gammaproteobacteria, and 15 genomes of Mycobacteriaceae and Nocardiaceae (Table S4). We predicted gene orthology and carried out alignments as described above. We inferred phylogenetic trees using maximum-likelihood analysis of 30 bacterial phylogenetic marker genes for Buchnera, and 31 bacterial phylogenetic marker genes for all other lineages. The marker genes were those used in AMPHORA2 [70]. The alignment was then recoded into RY (A/G to R, T/C to Y) to avoid bias caused by heterogeneous nucleotide composition in the alignments. Phylogenetic trees were reconstructed using RAxML version 8.2 [73] with the BINGAMMA binary character substitution model.
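RY recoding, as described above, collapses purines to R and pyrimidines to Y so that only transversions remain informative. The following is a minimal sketch; the function name is hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption rather than a documented detail of the study.

```python
# Purines (A, G) become R; pyrimidines (C, T, and U for RNA) become Y.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_recode(seq):
    """Recode a nucleotide sequence into the two-state RY alphabet."""
    # Gaps ('-') and ambiguity codes such as 'N' are left untouched (assumption).
    return seq.upper().translate(RY_MAP)


print(ry_recode("ACGT-N"))  # RYRY-N
```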

Timetree reconstruction of Buchnera and free-living bacteria
We used MCMCtree implemented in PAML4 [74] to estimate divergence times, using the alignment generated for the maximum-likelihood phylogenetic analysis. We used the GTR+G nucleotide substitution model and the log-normal correlated clock model to model rate variation across branches with the following priors: rgene_gamma = 1, 15; sigma2_gamma = 1, 10. The MCMC chains were run for 5.05 × 10⁵ steps and the parameter values were sampled every 50 steps. The first 5,000 steps were discarded as burn-in. We ran two independent MCMC chains with different random seed values, and confirmed convergence. The molecular clocks were calibrated using two minimum age constraints for Buchnera and one minimum age constraint for Prochlorococcus and Synechococcus (Table S3). For other free-living prokaryote lineages, we set the root of the tree to an arbitrary depth of 1 to obtain time-related branch lengths.
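The chain settings and priors above can be checked with a short calculation. This sketch assumes that the two numbers given for each prior are the shape and rate of a gamma distribution, which is the usual MCMCtree convention; the function names are hypothetical.

```python
def retained_samples(total_steps, burn_in, sample_every):
    """Number of MCMC samples kept after discarding the burn-in."""
    return (total_steps - burn_in) // sample_every


def gamma_mean_variance(shape, rate):
    """Mean and variance of a gamma distribution parameterised by shape and rate."""
    return shape / rate, shape / rate ** 2


# 5.05 x 10^5 steps, first 5,000 discarded, one sample every 50 steps.
kept = retained_samples(total_steps=505_000, burn_in=5_000, sample_every=50)

# Prior means implied by rgene_gamma = 1, 15 and sigma2_gamma = 1, 10.
rate_prior_mean, rate_prior_var = gamma_mean_variance(1, 15)
sigma2_prior_mean, sigma2_prior_var = gamma_mean_variance(1, 10)
print(kept)  # 10000
```

Under these assumptions each chain contributes 10,000 posterior samples, and the prior mean of the overall rate is 1/15 substitutions per site per time unit.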

Reconstruction of gene loss
We reconstructed the evolution of gene loss using the function “ace” from the R package ape [75]. The presence or absence of each gene was treated independently as a discrete binary character and the ancestral state was estimated using maximum likelihood [84]. For Blattabacterium and Buchnera, the model was specified using the option “model = matrix(c(0, 1, 0, 0), 2)”, which assumes no gene gain. For the seven lineages of free-living bacteria (Prochlorococcus+Synechococcus, Thermococcus, Corynebacterium, Micrococcineae, Flavobacteriaceae, Gammaproteobacteria, and Mycobacteriaceae+Nocardiaceae), we selected the all-rates-different model, which allows unequal rates of gene loss and gain. We ran these analyses on each maximum-likelihood tree. The result of each reconstruction was visualized using the function “plotTree” in the R package phytools [76]. We also used the cumulative maximum-likelihood estimate of gene loss to plot the rate of gene loss across each tree.
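To see why the matrix option above forbids gene gain, it helps to lay the values out the way R does (column by column) and to write down the transition probabilities of the resulting two-state model. The sketch below is written in Python for illustration only. It assumes that rows index the starting state and columns the ending state, and that the states are ordered absent, then present; the function names are hypothetical.

```python
import math


def column_major_matrix(values, nrow):
    """Mimic R's matrix(values, nrow): fill the matrix column by column."""
    ncol = len(values) // nrow
    return [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]


# Rate-index matrix: 0 marks an impossible transition, 1 the single free rate.
index = column_major_matrix([0, 1, 0, 0], 2)
# index == [[0, 0], [1, 0]]: only present -> absent (loss) is allowed.


def loss_only_transition_probs(loss_rate, t):
    """Transition probabilities over time t, states ordered (absent, present)."""
    stay_present = math.exp(-loss_rate * t)
    return [
        [1.0, 0.0],                          # absent stays absent: no gain
        [1.0 - stay_present, stay_present],  # present is lost or retained
    ]
```

Under this model a gene that is absent at a node cannot reappear in any descendant, which is the "no gene gain" assumption stated in the text.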

Correlation of gene loss with evolutionary rate and dN/dS
We investigated the relationship between gene loss and evolutionary rate in Blattabacterium using a combination of methods. First, we calculated the Spearman’s rank correlation coefficient between the total number of genes lost by each strain and phylogenetic root-to-tip distances. To correct for the phylogenetic non-independence of data points in our analyses of root-to-tip distances, we calculated for each branch: a) gene-loss rate per unit of time (based on a subset of genes analyzed in BEAST); and b) dN and dS per unit of time, and dN/dS. As we did for Buchnera, we used CD-HIT to remove genomes with more than 96% nucleotide identity on the marker gene alignment without 3rd codon positions. We calculated dN, dS and dN/dS using codeml implemented in PAML4 [74] with the F3x4 codon substitution model on a concatenated alignment of 30 core protein-coding genes shared by all Blattabacterium strains. We then carried out phylogenetic generalized least-squares regression using the pgls function implemented in the R package CAPER [77] with lambda value estimation (lambda = “ML”). Other parameters were set to default values. To avoid possible bias caused by over-/under-estimation of dS for short/long branches, we performed Spearman’s rank correlation on all branches of the tree. We used the software COEVOL [32] to test for correlation between mutation rates and gene loss. We ran COEVOL twice: once with 1st+2nd codon sites, and once with 3rd codon sites (as proxies for nonsynonymous and synonymous substitution sites, respectively). In addition to the above-described analyses carried out on Blattabacterium only, we carried out two more analyses on all lineages, including Blattabacterium. In these analyses, we used nhPhyML [33] on 1st+2nd and 3rd codon sites to correct for potential rate-estimation bias associated with heterogeneous GC-content, which we found to be present in all lineages. The branch lengths estimated by nhPhyML for 1st+2nd and 3rd codon sites were used as proxies for dN and dS, respectively. In the first analysis, we used dN and dS values calculated with nhPhyML to estimate dS/time and dN/dS for each branch, except for short branches, whose length estimations are imprecise, and which were removed from the analyses. We then carried out Spearman’s rank correlation on gene-loss/time versus nhPhyML-estimated dS/time and dN/dS. In the second analysis, we carried out partial correlation analyses with the ppcor R package [78]. In the partial correlation analyses, we did not use ratios, such as gene-loss/time, dS/time, and dN/dS, because comparisons of fractions can generate spurious correlations [34]. Instead, we used residual values obtained from linear regressions that we refer to as controlled variables. We carried out three linear regressions: 1) time against branch lengths calculated for 3rd codon positions (as a proxy for dS/time, referred to here as ‘time-controlled dS’); 2) branch lengths calculated for 3rd codon positions against branch lengths calculated for 1st+2nd codon positions (as a proxy for dN/dS, referred to here as ‘dS-controlled dN’); and 3) time against gene loss (referred to as ‘time-controlled gene-loss’). We then carried out Spearman’s rank correlation on time-controlled gene-loss versus time-controlled dS and time-controlled gene-loss versus dS-controlled dN.
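The residual-based approach described above can be sketched with two building blocks: ordinary least-squares residuals, which give the "controlled" variables, and Spearman's rank correlation computed on those residuals. This is a minimal standard-library illustration, not the ppcor analysis itself. All function names are hypothetical, and it assumes that each controlled variable is the residual of the quantity of interest regressed on its control (for example, dS regressed on time).

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def residuals(response, predictor):
    """Residuals of the least-squares fit response ~ predictor."""
    n = len(response)
    mp, mr = sum(predictor) / n, sum(response) / n
    slope = (sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
             / sum((p - mp) ** 2 for p in predictor))
    intercept = mr - slope * mp
    return [r - (intercept + slope * p) for p, r in zip(predictor, response)]
```

With per-branch lists `time`, `ds` and `loss` (hypothetical names), the correlation between time-controlled gene loss and time-controlled dS would be `spearman(residuals(loss, time), residuals(ds, time))`.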
