
Genome Evolution During Development of Symbiosis in Extracellular Mutualists of Stink Bugs ()

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Alejandro Otero-Bravo

Graduate Program in Evolution, Ecology and Organismal Biology

The Ohio State University

2020

Dissertation Committee:

Zakee L. Sabree, Advisor

Rachelle Adams

Norman Johnson

Laura Kubatko

Copyrighted by

Alejandro Otero-Bravo

2020

Abstract

Nutritional symbioses between insects and bacteria are prevalent and diverse, and have allowed insects to expand their feeding strategies and niches. It has been well characterized that long-term insect-bacterial mutualisms cause genome reduction resulting in extremely small genomes, some even approaching sizes more similar to organelles than bacteria. While several symbioses have been described, each provides a limited view of a single or few stages of the process of reduction, and only a minority of these involve extracellular symbionts. This dissertation aims to address the knowledge gap in the genome evolution of extracellular insect symbionts using the stink bug – system. Specifically, how do these symbionts' genomes evolve and differ from their free-living or intracellular counterparts?

In the introduction, we review the literature on extracellular symbionts of stink bugs and explore the characteristics of this system that make it valuable for the study of symbiosis.

We find that stink bug symbiont genomes are very valuable for the study of genome evolution due not only to their biphasic lifestyle, but also to the degree of coevolution with their hosts.


In Chapter 1 we investigate one of the traits associated with genome reduction, high mutation rates, for Candidatus ‘Pantoea carbekii’, the symbiont of the economically important insect Halyomorpha halys, the brown marmorated stink bug, and evaluate its potential for elucidating distribution, an analysis which has been successfully used with other intracellular symbionts. We find that while increased mutation rates are present, the symbiont loci are not as effective as host loci for these studies.

In Chapter 2 we sequenced and analyzed the genomes of four tropical stink bug symbionts belonging to the genus Edessa. The four symbionts show similar levels of genome reduction, reaching 0.8 Mb, five times smaller than the closest free-living relative and over 20% smaller than the genome of P. carbekii. Additionally, we show that they display signatures of amino acid supplementation for their host and that they are distinct from P. carbekii, indicating convergence in many genomic traits of the symbiosis.

Chapter 3 expands on chapter 2 by describing 11 more genomes of stink bug symbionts with varying degrees of genome reduction, ranging from those of the previously described symbionts of Edessa to genomes equal in size to their free-living relatives. We identify the multiple stages of genome reduction, including an initial massive pseudogenization, followed by progressive degeneration of metabolic pathways including those involved in cell wall components and amino acid biosynthesis.


Finally, Chapter 4 explores a problem with the study of the phylogenetic relatedness of genome-reduced symbionts, long branch attraction (LBA) artifacts, which is prevalent in many of these studies and can have profound implications in addressing the convergence of these traits. We show that our method, which identifies taxa prone to LBA and analyzes them separately, can address this issue and performs similarly to more sophisticated but also more computationally expensive methods.


This dissertation is dedicated to my parents and my grandparents


Acknowledgments

This dissertation was possible thanks to the help of many individuals. First, I would like to thank my advisor, Zakee Sabree, for taking me as his student and dedicating a great amount of time, resources, and patience to my training. The opportunity to move to Columbus from Colombia was incredible and will likely be one of the most important events not just professionally, but in all aspects of my life; I am deeply thankful to him for it. I would also like to thank him for his guidance early in my program, his constant enthusiasm, and the freedom he gave me to explore both this system and to learn skills outside of the lab’s focus throughout these last few years.

I thank my fellow lab mates, past and present, particularly Ben Jahnes and Arturo Vera Ponce de León, for all their help throughout the years and their kindness and willingness to help and discuss my research. I thank my committee members Rachelle Adams, Laura Kubatko, and Norm Johnson for their input during this project, as well as Shana Goffredi, a member of my Candidacy Committee, who has been vital to this project through samples, discussion, and feedback. I also thank the department of EEOB and GEES for their camaraderie, feedback, and mentorship during the program, especially Megan Smith, Ben Stone, Becca Dillon, and Amara Huddleston. I am deeply thankful to Corey Ash and Chanelle Kinney from the main office, who were exceptional at helping me navigate the complex university bureaucracy.

I acknowledge COLCIENCIAS for the funding that allowed me to focus on my research. I also thank CONAGEBIO and the Organization for Tropical Studies, the staff and crew at La Selva, and Liz Clifton for their support during my fieldwork, and the College of Arts and Sciences Unity cluster at OSU for computational resources. I acknowledge the amazing work of Laura Kenyon and Kalia Bistolas on stink bug symbionts, which has been at the core of this work from the beginning. I thank Tim Haye, Don Weber, Michael Toews, Celeste Welty, and the countless people willing to part with stink bugs from their colonies for the samples they donated to this project.

Finally, I cannot thank my family enough for their love, understanding, and support in spite of the distance. They have done more for me than I can list, even during my prolonged absence. Everything I have and will ever achieve is dedicated to my parents and grandparents. I could only take on this giant project because they ensured I could fully dedicate myself to my studies, supported and encouraged me constantly, and have always celebrated even my smallest achievements. To my partner Mark, I cannot thank you enough for your support, encouragement, and the structure you have given my life since we met. Thank you all for everything.


Vita

2015...... B.S. Biology, Universidad de los Andes

2015...... B.S. Microbiology, Universidad de los Andes

2015-2016 ...... Graduate Fellow, The Ohio State University

2017-2019 ...... Graduate Teaching Associate, Department of Evolution, Ecology and Organismal Biology, The Ohio State University

Publications

Otero-Bravo A, Sabree ZL. 2018. Comparing the utility of host and primary endosymbiont loci for predicting global invasive insect genetic structuring and migration patterns. Biol. Control. 116:10–16. doi: 10.1016/j.biocontrol.2017.04.003.

Otero-Bravo A, Goffredi S, Sabree ZL. 2018. Cladogenesis and Genomic Streamlining in Extracellular Endosymbionts of Tropical Stink Bugs. Genome Biol. Evol. 10:680–693. doi: 10.1093/gbe/evy033.

Otero-Bravo A, Sabree ZL. 2015. Inside or out? Possible genomic consequences of extracellular transmission of crypt-dwelling stinkbug mutualists. Front. Ecol. Evol. 3:1–7. doi: 10.3389/fevo.2015.00064.

Fields of Study

Major Field: Evolution, Ecology and Organismal Biology


Table of Contents

Abstract
Acknowledgments
Vita
Publications
Fields of Study
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction: Inside or out? Possible genomic consequences of extracellular transmission of crypt-dwelling stink bug mutualists
    Abstract
    Introduction
    Who are the stink bugs?
    Transmission strategies
    Coevolution
    Genome reduction
    Conclusion
    Acknowledgements
    References
    Figures and Tables
Chapter 2 Comparing the utility of host and primary endosymbiont loci for predicting global invasive insect genetic structuring and migration patterns
    Abstract
    Introduction
    Methods
        Accelerated rate of molecular evolution in P. carbekii
        Collection and DNA extraction
        Loci variability in the invaded range
        Host-symbiont haplotype correlation
        Comparison of symbiont and host markers
    Results
        Acceleration in the rate of molecular evolution in P. carbekii
        Symbiont haplotype identification
        Host-symbiont haplotype genotyping and correlation
    Discussion
        P. carbekii has a high rate of molecular evolution as other insect symbionts
        P. carbekii has lower diversity than H. halys mitochondria
        Origin of alternate haplotype in North American range
        Use of P. carbekii as a proxy for host population history
    Conclusion
    Acknowledgements
    Funding
    References
    Web References
    Figures and Tables
Chapter 3 Cladogenesis and genomic streamlining in extracellular endosymbionts of tropical stink bugs
    Data deposition statement
    Abstract
    Introduction
    Materials and Methods
        Sample collection & Sequencing
        Assembly and Annotation
        Phylogenetic Reconstruction
        Genome Synteny and Average Nucleotide Identity (ANI)
        Mutation rates
    Results and Discussion
        Edessa primary symbiont genomes are highly reduced
        The SoE represent a distinct, rapidly evolving Pantoea
        SoE Genome Exhibit Considerable Intraspecific Structural Conservation
        SoE Are Enriched in Functions Supportive of Host-Symbiont Association
        SoE Can Provide Several Amino Acids, Vitamins and Cofactors
        SoE Exhibit Intra-Specific Variation and Inter-Symbiont Convergence in Carbon Metabolism
        Proposal of Candidate names
    Conclusion
    Supplementary Material
    Acknowledgements
    References
    Figures and Tables
Chapter 4 Multiple stages of genome shrinkage across extracellular stink bug symbionts
    Introduction
    Methods
        Genome sequencing and assembly
        Annotation and genome comparisons
        Symbiont identification
        Host mitochondrial phylogeny
    Results
        Genome assembly
        Symbiont genome annotation
        Host mitochondrial genome sequences and identification
        Branched chain amino acid biosynthesis pathway
        LPS and antigen biosynthesis gene loss during genome reduction
    Discussion
        Different stages of reduction among pentatomid stink bugs
        Core genome
        Amino acid pathway loss
        Cell membrane component loss during symbiosis
    Conclusion
    References
    Figures and Tables
Chapter 5 Identifying independent origins of symbiosis in the presence of long branch attraction artifacts
    Introduction
    Methods
        Stink bug symbiont genome sequencing, assembly and annotation
        Ortholog discovery and alignment
        Tree reconstruction using branch extraction
    Results
        Summary statistics discriminate genome reduction even with small alignments
        Branch extraction indicates separate origin for genome reduced symbionts of stink bugs
        Genome synteny analysis supports some placements
        Branch extraction produces similar results as more complex models of drawing phylogenetic inferences
    Discussion
        Identification of problematic taxa and branch extraction method
        Detailed evolution models and branch extraction
        Taxa selection and branch extraction
        Genome synteny as evidence of placement
        Importance/Implications of single vs multiple events of symbiosis
    Conclusion
    References
    Figures and Tables
References
Appendix A: Additional Tables


List of Tables

Table 1.1 Summary of primary gammaproteobacterial gut symbionts of stink bugs
Table 2.1 Haplotype diversity and nucleotide diversity across seven gene regions in P. carbekii's invaded range
Table 2.2 Combination of host haplotypes into final haplotypes
Table 2.3 Haplotype diversity and nucleotide diversity for H. halys and P. carbekii
Table 3.1 Genome statistics for stink bug symbionts, members of the Pantoea, and
Table 4.1 Genome characteristics of stink bug symbionts
Table A.1. BOLD ID matches for the COX1 gene of sequenced stink bugs
Table A.2. Accession numbers used for core and pangenome analysis
Table A.3. SINA best match of 16S gene sequence against the SILVA database


List of Figures

Figure 2.1 Sampling and haplotype distribution for H. halys and P. carbekii
Figure 2.2 Haplotype networks for host and symbiont
Figure 3.1 A+T% bias in Edessa primary symbiont genomes
Figure 3.2 Phylogenetic reconstruction of
Figure 3.3 Core genome phylogenetic reconstruction of the Pantoea genus including the Symbionts of Edessa
Figure 3.4 Genome synteny for the Symbionts of Edessa
Figure 3.5 COG Profiles of SoE compared to other Pantoea
Figure 3.6 Metabolic reconstruction of the Symbionts of Edessa
Figure 4.1 Consensus cladogram of the Pantoea and genera including stink bug symbionts
Figure 4.2 Genomic characteristics of pentatomid symbionts
Figure 4.3 Stink bug mitochondrial tree with their respective symbiotic bacterial genome size
Figure 4.4 Bayesian inference consensus tree for host mitochondrial genomes
Figure 4.5 Presence of genes involved in branched chain amino acid biosynthesis pathway
Figure 4.6 Genes involved in the production of Lipid A and attachment of O-antigen
Figure 5.1 Distribution of alignment statistic principal component in relation to genome size
Figure 5.2 Maximum likelihood reconstruction of Pantoea, Erwinia, and stink bug symbionts
Figure 5.3 Branch extraction consensus tree for Pantoea, Erwinia, and stink bug symbionts
Figure 5.4 Select genome synteny plots for symbionts of stink bugs
Figure 5.5 Husnik et al. 2011 dataset reconstructed using Maximum Likelihood
Figure 5.6 Husnik et al. dataset using Elaboorate with the EPA


Chapter 1 Introduction: Inside or out? Possible genomic consequences of extracellular transmission of crypt-dwelling stink bug mutualists

This chapter was originally published in Frontiers in Ecology and Evolution, volume 23 June 2015 (available through the doi: 10.3389/fevo.2015.00064) and has been modified for this format.

Abstract

Genome reduction has been widely studied in obligate intracellular bacterial mutualists of insects because they have, in comparison to closely related, nonhost-associated bacteria, extremely small genomes. Pentatomid stink bugs also maintain bacterial symbionts, yet they are extracellular, residing within host-derived crypts, and are transmitted to offspring outside of the host's tissues, which exposes them to the external environment. In this review, we explore how the biphasic lifestyle of stink bug symbionts (e.g. on the surfaces of eggs in various matrices during transmission and inside host-derived tissues during much of the host's life), in contrast with the solely intracellular lifestyle of many insect endosymbionts, may impact their genome's architecture, size and content.

Furthermore, we demonstrate how additional stink bug symbiont genomes are needed to more fully explore these questions and the potential value of the stink bug-symbiont system in understanding genome evolution and reduction in the absence of intracellularity.


Introduction

Insects spanning many species-rich groups, including , , and , maintain mutualisms with intracellular bacteria that are essential to their growth and development (Moran et al. 2008). Typical characteristics of these bacterial partners are that they are not cultivable by classical methods, are maternally transmitted to offspring (with one reported exception; Watanabe et al. 2014), and have highly reduced, A+T%-biased genomes (Moran and Bennett, 2014). Among the most likely contributors to this latter characteristic of obligate endosymbiotic lifestyles are 1) the oft-observed absence of genes encoding DNA repair and recombination functions in endosymbiont genomes (Wernegreen, 2002); 2) relaxed purifying selection due to the stable intracellular conditions (Nikoh et al. 2011); 3) repeated bottlenecks of small, clonal populations with limited-to-no opportunities for gene exchange with other bacteria (Batut et al. 2014; Wernegreen, 2002); and 4) loss of genes not essential for a strictly intracellular lifestyle (Mira et al. 2001), many of which can be >4,500 nt in length, so that their loss can contribute significantly to genome size shrinkage (Kenyon and Sabree, 2014).

The abundance of genomes for bacteria occupying a wide array of habitats facilitates comparative analyses that can reveal habitat-specific genomic characteristics. Given genomic evolutionary trends observed with intracellular insect symbionts, we are interested in examining whether similar characteristics of an obligate host-bacterial mutualism would be observed in insects that harbor symbionts in specialized tissues but transmit them to offspring by alternative modes of inheritance (Hosokawa et al. 2005; Prado and Almeida, 2009a; Hosokawa et al. 2012a; Hosokawa et al. 2012b; Bansal et al. 2014; Bistolas et al. 2014). Phytophagous stink bugs are useful exemplars in that they harbor bacterial symbionts in modified sections of their midgut, called crypts or caeca, and employ a variety of extracellular intergenerational transmission strategies. In this mini-review, we will detail what is known about the genomic architecture and content of stink bug bacterial symbionts in the context of alternative intergenerational transmission strategies.

Who are the stink bugs?

‘Stink bug’ is the common name given to members of the family Pentatomidae () due to the production of noxious secretions from abdominal glands. However, the name has been applied to members of the superfamily Pentatomoidea (including , , , , Urostylidae, Parastrachiidae and ) and even to members of the more general infraorder Pentatomomorpha (including members of the superfamilies , and Pyrrhocoroidea). Symbionts from members of these groups have been demonstrated to undergo genome reduction (Hosokawa et al. 2010; Nikoh et al. 2011; Kenyon et al. 2015). The present mini-review will focus on the primary extracellular gammaproteobacterial gut symbionts of the Pentatomoidea and include more distant members for comparative purposes.


Transmission strategies

A means of reliable transgenerational transmission is necessary for the evolution of stable host-symbiont mutualisms. Stink bug symbionts are not transmitted to offspring during , as observed in obligate intracellular symbionts (e.g. aphid-Buchnera, cockroach-Blattabacterium, carpenter ant-Blochmannia), but rather they are deposited either on or proximal to the surfaces of eggs, outside of the insect, and await their consumption by newly emerged nymphs to complete their transmission. While intracellular symbionts have been shown to be housed in specialized bacteriomes (Moran et al. 2008) located inside or adjacent to the reproductive system where they have access to nascent oocytes (Matsuura et al. 2012), stink bug symbionts are restricted to the posterior section of the midgut (Prado and Almeida, 2009a). As the gammaproteobacterial primary symbionts have not been reliably detected in the (see Matsuura et al. 2014 for suggestive evidence) of female pentatomids, the only way for symbionts to be acquired by each generation is through infection of midgut tissues following their consumption by nymphs.

Horizontal (e.g. between nest mates) or vertical (e.g. mother to offspring) transmission of gut symbionts has been observed in insect orders including the Blattodea (Ohkuma, 2008) and Hymenoptera (Anderson et al. 2012) in the form of trophallaxis (exchange of gastric contents) between individuals (Sabree et al. 2012). This method can minimize exposure of symbionts, namely strict anaerobes, to the environment and ensure receipt of viable microbes. However, it also requires some or all elements of sociality (e.g. gregariousness, multigenerational colonies, maternal care, brood care), which are not present in many insect groups. While some stink bug species are regarded as sub-social for their display of behaviors such as gregariousness and alarm (Eurydema rugosa, E. pulchra, Nezara viridula, Megalotomus quinquespinosus, Alydus eurinus, A. pilosulus, Dysdercus intermedius; Bell and Cardé 1984), maternal guarding and food provisioning (Adomerus triguttulus, Sehirus cinctus, Canthophorus niveimarginatus, and Parastrachia japonensis; Filippi et al. 2009) and paternal egg guarding (Lopadusa augur and Edessa nigropunctata; Requena et al. 2010), gut symbiont transmission through trophallaxis has only been sparsely recorded, in Brachypelta aterrima (Hosokawa et al. 2012a) and Coriobacterium glomerans (Kaltenpoth et al. 2009).

The most common method for stink bug symbiont transmission, even in species presenting maternal guarding of the nymphs, is egg smearing, in which females smear egg masses with symbiont-containing secretions immediately following egg deposition. Upon hatching, nymphs consume the maternal secretions and the symbiont traverses the gastric tract and colonizes rows of tissues comprised of crypts in the distal midgut (Prado et al. 2006; Prado and Almeida, 2009b; Kikuchi et al. 2009; Kikuchi et al. 2012; Kaiwa et al. 2010; Bauer et al. 2014; Kobayashi et al. 2011; Hosokawa et al. 2013). In this case, the symbiont is exposed to the environment outside the insect for ~1-3 weeks with only the maternal secretions to protect it. After the nymphs hatch, the symbiont must also be able to survive the passage through the immature insect’s digestive system up to the last section of the midgut, where it colonizes the crypts. Through this strategy, vertically transmitted symbionts experience distinct habitats (e.g. maternal tissues-egg surface-gastric tract-distal midgut) that vertically transmitted intracellular symbionts do not. It is expected that this multiphasic lifestyle would select for genes that would not be essential for intracellular symbionts (e.g. cell wall biosynthesis) (Moran and Bennett, 2014). This has been evidenced for Candidatus “Pantoea carbekii” and for the symbiont of Plautia stali, which are transmitted by this means and whose sequenced genomes reveal the presence of genes involved in peptidoglycan and cell wall biosynthesis that are not usually present in intracellular mutualists (Kenyon et al. 2015). However, it cannot be conclusively determined that taxa exposed to these distinct habitats will experience habitat-specific selection of loci, given that few complete genomes of extracellular symbionts are currently available. Specifically, as extracellular mutualists have a different transmission strategy, exactly how this exposure to the environment affects genome reduction is currently unknown. Sampling more genomes from symbionts that use this strategy can show whether the retention of these genes is a common trend and whether the genome reduction for these extracellular mutualists reaches a limit because of this exposure.

Plataspid stink bugs employ a variation on external symbiont transmission that involves production of symbiont capsules that consist of a cuticle-like envelope surrounding a resin-like matrix that contains symbiont cells (Hosokawa et al. 2005). The envelope and matrix protect symbiont cells from exposure. Capsules are placed on the apex of each egg, compelling nymphs to consume the capsule and its symbiont infusion as they emerge. This strategy is highly specialized in that the host organism has had to develop a specialized system for the protection of the symbiont during the vertical transmission phase. The different secretions involved in the capsule are produced in different regions of the midgut and are stored until oviposition, and capsule-based transmission is unique to members of the Plataspidae family (Hosokawa et al. 2005). While this is similar to trophic egg production, in which infertile eggs are laid along with fertile eggs to be eaten by the nymphs upon hatching for nutritional purposes, capsules are produced by different mechanisms. Urostylid stink bugs use a variation on capsule-based symbiont transmission where the symbiont is embedded in a voluminous gel-like substance or 'jelly' that covers the eggs and provides a food source for nymphs during the winter (Kaiwa et al. 2014).

Evolution of the symbiont capsule and jelly structures suggests that reduced environmental exposure provides higher fitness for the host-symbiont association, highlighting that transmission is critical for the symbiotic relationship. These protective structures may also protect the symbionts from contaminating environmental bacteria that could jeopardize the fidelity of the transmission if competing environmental microorganisms are allowed to replace the symbiont. Also, exposure to UV radiation or desiccation may affect the symbionts before they are able to reach their new hosts or reduce their titers below infective levels. On the other hand, horizontal symbiont transfer among stink bugs has been speculated to happen due to symbiont cross contamination between eggs of different species in where clear host-symbiont coevolution is not evident (Kikuchi et al. 2009). This is especially important when considering that symbiont transfers may grant their new hosts the ability to exploit different plant resources (Hosokawa et al. 2007).

Coevolution

In most reported cases, stink bug symbionts are well-integrated into the growth and development of their host. Aposymbiotic individuals commonly display lower fitness parameters such as longer time between molts (Kikuchi et al. 2009), reduced size (Hosokawa et al. 2006), and lower progeny survivorship (Taylor et al. 2014) (see Table 1.1). While there are examples in which aposymbiotic insects do not display reduced fitness in comparison to their symbiotic counterparts (Prado et al. 2006; Prado and Almeida, 2009b), it is possible that the measured parameters or laboratory conditions, such as a particular diet, would not detect the impact of the loss. Nevertheless, host physiological and behavioral investments in the maintenance of the symbiont support the general hypothesis that the relationship is mutually beneficial. This is further evidenced by a range of traits developed specifically to harbor and stably inherit these symbionts, including organs such as the midgut crypts in most pentatomids, the lubricating organ and further modified isolated midgut crypt in Acanthosomatidae (Kikuchi et al. 2009) and certain pentatomids (Hayashi et al. 2015); secretions such as the symbiont capsule in Plataspidae (Nikoh et al. 2011), and the nutritional symbiont jelly in Urostylidae (Kaiwa et al. 2014); and even behaviors such as pre-hatching mucus secretion in (Hosokawa et al. 2012a) and nymphal aggregation or wandering in relation to symbiont acquisition (Hosokawa et al. 2008). Additionally, benefits for the symbiont likely include a regulated and isolated environment, free from host defenses (Futahashi et al. 2013) with a continued provision of nutrients and protection from invasion of competing bacteria.

On a smaller scale, the coevolution of insects and their symbionts can be evidenced through examples where vertical transmission is stricter. In most cases of simple egg smearing, symbiont and host are consistently grouped as corresponding monophyletic groups at the genus level (Matsuura et al. 2014; Bansal et al. 2014), or even species level (Bistolas et al. 2014), even when sampled from distinct geographic locations. However, at the family level this tendency is not always maintained (Hosokawa et al. 2012b). When symbionts are not inherited but are reacquired from the environment, no such tendencies are evidenced (Kikuchi et al. 2011). However, stricter methods of transmission show a more consistent relationship between host and symbiont phylogeny in the form of co-cladogenesis (Hosokawa et al. 2006; Kikuchi et al. 2009).

Genome reduction

A genome of <1 Mb is commonly observed for intracellular obligate endosymbionts (Moran and Bennett, 2014). Intracellular residence under stable environmental conditions results in relaxed purifying selection that subsequently facilitates mutation fixation and accumulation. Additionally, the loss of DNA repair and recombination mechanisms (as evidenced by a survey of complete genomes), erroneous DNA replication and successive genetic bottlenecks with each generation also contribute to dramatic losses of genes nonessential for the maintenance of the mutualism (Kenyon and Sabree 2014). Given the abundance of genomes for endosymbionts of insects for which life history information is available, clear habitat conditions (e.g. consistent intracellular localization) and host demands (e.g. phytophagy and low assimilable nitrogen) are strongly correlated with endosymbiont gene content (e.g. maintenance of biosynthesis pathways). Drawing similar conclusions for the crypt-dwelling, extracellular symbionts of stink bugs is limited by the paucity of available genomes.

Currently, complete sequences for four primary bacterial symbionts of stink bugs are available: the unnamed symbiont of P. stali (Kobayashi et al. 2011); Megacopta punctatissima symbiont, Candidatus ‘Ishikawaella capsulata’ (Nikoh et al. 2011); H. halys symbiont, Candidatus ‘Pantoea carbekii’ (Kenyon et al. 2015); and Urostylis westwoodii symbiont, Candidatus ‘Tachikawaea gelatinosa’ (Kaiwa et al. 2014). Both I. capsulata and T. gelatinosa have considerably reduced genomes (0.7 and 0.75 Mb, respectively) that largely reflect genic repertoires observed in intracellular symbionts. During intergenerational transmission, these symbionts are ensconced in host-derived structures (capsules and jelly, respectively) that may significantly reduce the symbiont’s exposure to the external environment, and thus these bacteria may require fewer genome-encoded products to maintain their viability while awaiting nymphal uptake. Conversely, P. carbekii and the P. stali symbiont show great differences in genome size despite their identical transmission strategy and their hosts’ phylogenetic closeness, which may indicate different ages of their symbiotic relationships with their hosts. With the exception of the P. stali symbiont, the genomes of all of the stink bug symbionts lack genes involved in DNA replication and repair, which is a characteristic shared with intracellular symbiont genomes. It is common in both intra- and extracellular symbionts that amino acid and vitamin synthesis pathways are selectively lost or retained according to the host’s diet (phytophagy) (Kaiwa et al. 2014; Kenyon et al. 2015), which suggests a similar condition in which the host exerts positive selection for symbiont genes involved in the production of nutrients it cannot obtain from its diet. Extreme genome reduction (genomes smaller than 0.25 Mb) has only been observed in obligate intracellular symbionts (McCutcheon and Moran, 2011; Moran and Bennett, 2014), yet moderately reduced genomes (0.7-1.2 Mb) are observed in their extracellular counterparts when compared to free-living relatives, namely Pantoea ananatis and E. coli (Table 1.1; Kenyon et al. 2015). Additional characteristics that stink bug symbionts share with intracellular symbionts are A+T% bias (with symbiont genomes between 69 and 75%, excluding P. stali’s symbiont) and high rates of sequence evolution (Moran and Bennett, 2014; Kenyon and Sabree, 2014; Hosokawa et al. 2013).

While some intracellular symbionts have lost genes involved in canonical ATP synthesis, the sampled extracellular symbiont genomes have retained them. Intracellular symbionts likely draw on host-generated ATP present in the cytoplasm for their energetic needs (Moran and Bennett, 2014), while crypt-dwelling symbionts may not have access to it and therefore must retain these pathways. Additionally, genes involved in lipid and cell wall biosynthesis tend to be missing in intracellular symbionts, and their loss is probably enabled by the presence of the host cell membrane, which provides the missing protective and transport functions (McCutcheon and Moran, 2011). However, these genes remain present in P. carbekii and the P. stali symbiont and are absent in I. capsulata. Their retention might correspond to the ex vivo exposure of these symbionts on egg surfaces during intergenerational transmission, whereas I. capsulata may receive sufficient ex vivo protection from the capsule to alleviate the need for de novo production of these cell wall components (Nikoh et al. 2011). Although extracellular stink bug symbionts do not reside within host cells, they are often the sole occupants of elaborate, specialized host-derived structures, which reflect a significant adaptation and investment of physiological resources in supporting the mutualism.

Given the dynamic life histories of stink bug symbionts and the relatively short time needed to generate complete bacterial genomes, these organisms provide an exemplary opportunity to study the effects of genome reduction in extracellular symbionts. While the vertical transmission method is reliable enough for strict mutualisms to develop, it is possible that new, competing microorganisms occasionally replace the symbiont and start the association anew. As a result, different stink bug symbionts display different degrees of association with and reliance on their hosts (Prado and Almeida, 2009b). This is evidenced by the different degrees of dependence of the hosts on their symbionts, as shown by the differential survivorship and multiple symptoms of aposymbiotic hosts (see Table 1.1), and by the varying levels of genome reduction among the symbionts. Because of this, studying different symbionts could give us detailed information on the different stages of genome reduction as a symbiotic relationship strengthens. At the same time, these symbionts occupy very similar niches (the midgut crypts), must undergo exposure to the environment while they are transmitted, and are pressured by their hosts for certain nutritional benefits (see Table 1.1). This translates to a relatively consistent set of evolutionary pressures that makes comparisons among genomes reproducible.

Conclusion

The last few years have greatly increased our knowledge of the primary extracellular gut symbionts of pentatomid stink bugs. While genome reduction in extracellular symbionts has not reached the extremes seen in their intracellular counterparts, the evidence shows that intracellularity is not necessary for genome reduction and places greater importance on transmission systems. The life history of this particular mutualistic association includes several factors that allow an unparalleled study of the coevolution of insect-microbe mutualisms, and an integrative analysis of this group seems very promising. The few genomes available, while highly informative, come from radically different backgrounds that do not allow certain hypotheses to be tested.

Obtaining more complete genomes from these symbionts and comparing them among similar or different life histories, as well as with the several already published genomes from intracellular symbionts, can allow a greater understanding of the requirements for the simplest non-intracellular cell. In conjunction with studies on host diet, physiology, and metabolism, the effects of particular pressures from the host on the symbiont can be elucidated, such as the requirement for diet-supplementing pathways and mechanisms for host-symbiont communication and nutrient transport between cells.

Acknowledgements

We thank the Ohio Agricultural Research Development Center SEEDS Research Enhancement-Interdisciplinary Grant and The Ohio State University for funding to A.O.B. and Z.L.S., respectively.

References

Abe, Y., Mishiro, K., and Takanashi, M. 1995. Symbiont of brown-winged green bug, Plautia stali Scott. Japanese J. Appl. Entomol. Zool. 39, 109–115. doi:10.1303/jjaez.39.109.

Anderson, K. E., Russell, J. A., Moreau, C. S., Kautz, S., Sullam, K. E., Hu, Y., Basinger, U., Mott, B. M., Buck, N., and Wheeler, D. E. 2012. Highly similar microbial communities are shared among related and trophically similar species. Mol. Ecol. 21, 2282–2296. doi:10.1111/j.1365-294X.2011.05464.x.

Bansal, R., Michel, A., and Sabree, Z. 2014. The crypt-dwelling primary bacterial symbiont of the polyphagous pentatomid pest Halyomorpha halys (Hemiptera: Pentatomidae). Environ. Entomol., 617–625. Available at: http://www.bioone.org/doi/abs/10.1603/EN13341 [Accessed December 18, 2014].

Batut, B., Knibbe, C., Marais, G., and Daubin, V. 2014. Reductive genome evolution at both ends of the bacterial population size spectrum. Nat. Rev. Microbiol. 12, 841–850. doi:10.1038/nrmicro3331.

Bauer, E., Salem, H., Marz, M., Vogel, H., and Kaltenpoth, M. 2014. Transcriptomic immune response of the stainer Dysdercus fasciatus to experimental elimination of vitamin-supplementing intestinal symbionts. PLoS One 9, e114865. doi:10.1371/journal.pone.0114865.

Bell, W. J., and Cardé, R. T. 1984. Chemical Ecology of Insects. Springer US doi:10.1016/0305-1978(86)90124-9.

Bistolas, K. S. I., Sakamoto, R. I., Fernandes, J. A. M., and Goffredi, S. K. 2014. Symbiont polyphyly, co-evolution, and necessity in pentatomid stinkbugs from Costa Rica. Front. Microbiol. 5, 1–15. doi:10.3389/fmicb.2014.00349.

Filippi, L., Baba, N., Inadomi, K., Yanagi, T., Hironaka, M., and Nomakuchi, S. 2009. Pre- and post-hatch trophic egg production in the subsocial burrower bug, Canthophorus niveimarginatus (: Cydnidae). Naturwissenschaften 96, 201–211. doi:10.1007/s00114-008-0463-z.

Fukatsu, T., and Hosokawa, T. 2002. Capsule-transmitted gut symbiotic bacterium of the Japanese common plataspid stinkbug, Megacopta punctatissima. Appl. Environ. Microbiol. 68, 389–396. doi:10.1128/AEM.68.1.389-396.2002.

Futahashi, R., Tanaka, K., Tanahashi, M., Nikoh, N., Kikuchi, Y., Lee, B. L., and Fukatsu, T. 2013. in gut symbiotic organ of stinkbug affected by extracellular bacterial symbiont. PLoS One 8. doi:10.1371/journal.pone.0064557.

Hayashi, T., Hosokawa, T., Meng, X.-Y., Koga, R., and Fukatsu, T. 2015. Female-specific specialization of a posterior end region of the midgut symbiotic organ in Plautia splendens and allied stinkbugs. Appl. Environ. Microbiol. 81, AEM.04057–14. doi:10.1128/AEM.04057-14.

Hosokawa, T., Hironaka, M., Inadomi, K., Mukai, H., Nikoh, N., and Fukatsu, T. 2013. Diverse strategies for vertical symbiont transmission among subsocial stinkbugs. PLoS One 8, 4–11. doi:10.1371/journal.pone.0065081.

Hosokawa, T., Hironaka, M., Mukai, H., Inadomi, K., Suzuki, N., and Fukatsu, T. (2012a). Mothers never miss the moment: A fine-tuned mechanism for vertical symbiont transmission in a subsocial insect. Anim. Behav. 83, 293–300. doi:10.1016/j.anbehav.2011.11.006.

Hosokawa, T., Kikuchi, Y., Nikoh, N., and Fukatsu, T. (2012b). Polyphyly of gut symbionts in stinkbugs of the family Cydnidae. Appl. Environ. Microbiol. 78, 4758–4761. doi:10.1128/AEM.00867-12.

Hosokawa, T., Kikuchi, Y., Nikoh, N., Shimada, M., and Fukatsu, T. 2006. Strict host-symbiont cospeciation and reductive genome evolution in insect gut bacteria. PLoS Biol. 4, 1841–1851. doi:10.1371/journal.pbio.0040337.

Hosokawa, T., Kikuchi, Y., Nikoh, N., Meng, X. Y., Hironaka, M., and Fukatsu, T. 2010. Phylogenetic position and peculiar genetic traits of a midgut bacterial symbiont of the stinkbug japonensis. Appl. Environ. Microbiol. 76, 4130–4135. doi:10.1128/AEM.00616-10.

Hosokawa, T., Kikuchi, Y., Shimada, M., and Fukatsu, T. 2007. Obligate symbiont involved in pest status of host insect. Proc. Biol. Sci. 274, 1979–1984. doi:10.1098/rspb.2007.0620.

Hosokawa, T., Kikuchi, Y., Shimada, M., and Fukatsu, T. 2008. Symbiont acquisition alters behaviour of stinkbug nymphs. Biol. Lett. 4, 45–48. doi:10.1098/rsbl.2007.0510.

Hosokawa, T., Kikuchi, Y., Meng, X. Y., and Fukatsu, T. 2005. The making of symbiont capsule in the plataspid stinkbug Megacopta punctatissima. FEMS Microbiol. Ecol. 54, 471–477. doi:10.1016/j.femsec.2005.06.002.

Kaiwa, N., Hosokawa, T., Kikuchi, Y., Nikoh, N., Meng, X. Y., Kimura, N., Ito, M., and Fukatsu, T. 2011. Bacterial symbionts of the giant jewel stinkbug Eucorysses grandis (Hemiptera: Scutelleridae). Zool. Sci. 28, 169–174. doi:10.2108/zsj.28.169.

Kaiwa, N., Hosokawa, T., Kikuchi, Y., Nikoh, N., Meng, X. Y., Kimura, N., Ito, M., and Fukatsu, T. 2010. Primary gut symbiont and secondary, -allied symbiont of the scutellerid stinkbug ocellatus. Appl. Environ. Microbiol. 76, 3486–3494. doi:10.1128/AEM.00421-10.

Kaiwa, N., Hosokawa, T., Nikoh, N., Tanahashi, M., Moriyama, M., Meng, X., Maeda, T., Yamaguchi, K., and Shigenobu, S. 2014. Symbiont-supplemented maternal investment underpinning host’s ecological adaptation. Curr. Biol., 1–6. doi:10.1016/j.cub.2014.08.065.

Kaltenpoth, M., Winter, S. A., and Kleinhammer, A. 2009. Localization and transmission route of Coriobacterium glomerans, the endosymbiont of pyrrhocorid bugs. FEMS Microbiol. Ecol. 69, 373–383. doi:10.1111/j.1574-6941.2009.00722.x.

Kashima, T., Nakamura, T., and Tojo, S. 2006. Uric acid recycling in the shield bug, (Hemiptera: Parastrachiidae), during diapause. J. Insect Physiol. 52, 816–825. doi:10.1016/j.jinsphys.2006.05.003.

Kenyon, L. J., Meulia, T., and Sabree, Z. L. 2015. Habitat visualization and genomic analysis of “Candidatus Pantoea carbekii,” the primary symbiont of the brown marmorated stink bug. Genome Biol. Evol. 7, 620–635. doi:10.1093/gbe/evv006.

Kenyon, L. J., and Sabree, Z. L. 2014. Obligate insect endosymbionts exhibit increased ortholog length variation and loss of large accessory proteins concurrent with genome shrinkage. Genome Biol. Evol. 6, 763–75. doi:10.1093/gbe/evu055.

Kikuchi, Y., Hosokawa, T., and Fukatsu, T. 2011. An ancient but promiscuous host-symbiont association between gut symbionts and their heteropteran hosts. ISME J. 5, 446–460. doi:10.1038/ismej.2010.150.

Kikuchi, Y., Hosokawa, T., Nikoh, N., and Fukatsu, T. 2012. Gut symbiotic bacteria in the bugs Eurydema rugosa and Eurydema dominulus (Heteroptera: Pentatomidae). Appl. Entomol. Zool. 47, 1–8. doi:10.1007/s13355-011-0081-7.

Kikuchi, Y., Hosokawa, T., Nikoh, N., Meng, X.-Y., Kamagata, Y., and Fukatsu, T. 2009. Host-symbiont co-speciation and reductive genome evolution in gut symbiotic bacteria of acanthosomatid stinkbugs. BMC Biol. 7, 2. doi:10.1186/1741-7007-7-2.

Kobayashi, H., Kawasaki, K., Takeishi, K., and Noda, H. 2011. Symbiont of the stink bug Plautia stali synthesizes rough-type lipopolysaccharide. Microbiol. Res. 167, 48–54. doi:10.1016/j.micres.2011.03.001.

Matsuura, Y., Hosokawa, T., Serracin, M., Tulgetske, G. M., Miller, T. A., and Fukatsu, T. 2014. Bacterial symbionts of a devastating coffee plant pest, the stinkbug Antestiopsis thunbergii (Hemiptera: Pentatomidae). Appl. Environ. Microbiol. 80, 3769–3775. doi:10.1128/AEM.00554-14.

Matsuura, Y., Kikuchi, Y., Hosokawa, T., Koga, R., Meng, X.-Y., Kamagata, Y., Nikoh, N., and Fukatsu, T. 2012. Evolution of symbiotic organs and endosymbionts in lygaeid stinkbugs. ISME J. 6, 397–409. doi:10.1038/ismej.2011.103.

McCutcheon, J. P., and Moran, N. A. 2011. Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10, 13–26. doi:10.1038/nrmicro2670.

Mira, A., Ochman, H., and Moran, N. A. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17, 589–596. doi:10.1016/S0168-9525(01)02447-7.

Moran, N. A., McCutcheon, J. P., and Nakabachi, A. 2008. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 42, 165–190. doi:10.1146/annurev.genet.41.110306.130119.

Moran, N. A., and Bennett, G. M. 2014. The tiniest tiny genomes. Annu. Rev. Microbiol., 195–215. doi:10.1146/annurev-micro-091213-112901.

Nikoh, N., Hosokawa, T., Oshima, K., Hattori, M., and Fukatsu, T. 2011. of bacterial genome in insect gut environment. Genome Biol. Evol. 3, 702–714. doi:10.1093/gbe/evr064.

Ohkuma, M. 2008. Symbioses of flagellates and prokaryotes in the gut of lower . Trends Microbiol. 16, 345–352. doi:10.1016/j.tim.2008.04.004.

Prado, S. S., and Almeida, R. P. P. (2009a). Phylogenetic placement of pentatomid stink bug gut symbionts. Curr. Microbiol. 58, 64–69. doi:10.1007/s00284-008-9267-9.

Prado, S. S., and Almeida, R. P. P. (2009b). Role of symbiotic gut bacteria in the development of hilare and histrionica. Entomol. Exp. Appl. 132, 21–29. doi:10.1111/j.1570-7458.2009.00863.x.

Prado, S. S., Rubinoff, D., and Almeida, R. P. P. 2006. Vertical transmission of a pentatomid caeca-associated symbiont. Ann. Entomol. Soc. Am. 99, 577–585. doi:10.1603/0013-8746(2006)99[577:VTOAPC]2.0.CO;2.

Requena, G. S., Nazareth, T. M., Schwertner, C. F., and Machado, G. 2010. First cases of exclusive paternal care in stink bugs (Hemiptera: Pentatomidae). Zool. (Curitiba) 27, 1018–1021. doi:10.1590/S1984-46702010000600026.

Tada, A., Kikuchi, Y., Hosokawa, T., Musolin, D. L., Fujisaki, K., and Fukatsu, T. 2011. Obligate association with gut bacterial symbiont in Japanese populations of the southern green stinkbug (Heteroptera: Pentatomidae). Appl. Entomol. Zool. 46, 483–488. doi:10.1007/s13355-011-0066-6.

Taylor, C. M., Coffey, P. L., DeLay, B. D., and Dively, G. P. 2014. The importance of gut symbionts in the development of the brown marmorated stink bug, Halyomorpha halys (Stål). PLoS One 9. doi:10.1371/journal.pone.0090312.

Sabree, Z. L., Huang, Y. C., Arakawa, G., Tokuda, G., Lo, N., Watanabe, H., and Moran, N. A. 2012. Genome shrinkage and loss of nutrient-providing potential in the obligate symbiont of the primitive Mastotermes darwiniensis. Appl. Environ. Microbiol. 78, 204–210. doi:10.1128/AEM.06540-11.

Watanabe, K., Yukuhiro, F., Matsuura, Y., Fukatsu, T., and Noda, H. 2014. Intrasperm vertical symbiont transmission. Proc. Natl. Acad. Sci. U. S. A. 111, 7433–7. doi:10.1073/pnas.1402476111.

Wernegreen, J. J. 2002. Genome evolution in bacterial endosymbionts of insects. Nat. Rev. Genet. 3, 850–861. doi:10.1038/nrg931.


Figures and Tables

Table 1.1 Summary of primary gammaproteobacterial gut symbionts of stink bugs

| Host insect species | Host insect family | Symbiont transmission | Symbiont | Genome size (Mb) | Nutritional benefits | Symptoms of aposymbiotic hosts | References |
|---|---|---|---|---|---|---|---|
| Nezara viridula | Pentatomidae | Egg smear | n.a. | n.a. | n.a. | High nymphal mortality | Tada et al. 2011 |
| Acrosternum hilare | Pentatomidae | Egg smear | n.a. | n.a. | n.a. | Prolonged development, lower survivorship | Prado and Almeida 2009b |
| Murgantia histrionica | Pentatomidae | Egg smear | n.a. | n.a. | n.a. | Prolonged development | Prado and Almeida 2009b |
| Eurydema rugosa & E. dominulus | Pentatomidae | Egg smear | n.a. | n.a. | n.a. | Prolonged development, different color, lower weight | Kikuchi et al. 2012 |
| Halyomorpha halys | Pentatomidae | Egg smear | Candidatus “Pantoea carbekii” | 1.2 | Amino acids and vitamins | Lower progeny survivorship and prolonged development | Kenyon et al. 2015; Taylor et al. 2014 |
| Plautia stali | Pentatomidae | Egg smear | Unnamed (Plautia stali symbiont) | 3.8 | Vitamins (A, E, Carotene) | Fewer individuals reaching adulthood | Kobayashi et al. 2011; Abe et al. 1995 |
|  | Scutelleridae | Egg smear | n.a. | n.a. | n.a. | n.a. | Kaiwa et al. 2010 |
| Eucorysses grandis | Scutelleridae | Unconfirmed egg smearing | n.a. | n.a. | n.a. | n.a. | Kaiwa et al. 2011 |
| Elasmostethus, Acanthosoma, Sastragala, Lindbergicoris | Acanthosomatidae | Egg smear (a) | Candidatus “Rosenkranzia clausaccus” | 0.9* | n.d. | Fewer individuals reaching adulthood, prolonged development, different color | Kikuchi et al. 2009 |
| Parastrachia japonensis | Parastrachiidae | Egg smear just before hatching | Candidatus “Benitsuchiphilus tojoi” | 0.85* | Uric acid recycling | High mortality during overwintering diapause in adulthood | Hosokawa et al. 2010; Hosokawa et al. 2012a; Kashima et al. 2006 |
| Megacopta punctatissima & M. cribraria | Plataspidae | Capsule | Candidatus “Ishikawaella capsulata” | 0.75 | Amino acids and vitamins | Lower body size, different color, prolonged development | Nikoh et al. 2011; Fukatsu and Hosokawa 2002 |
| Urostylis westwoodii |  | Jelly | Candidatus “Tachikawaea gelatinosa” | 0.70 | Amino acids and vitamins | High nymphal mortality, lower body size (b) | Kaiwa et al. 2014 |

(a) Uses a closed midgut crypt and modified organs for transmission of symbiont.
(b) Symptoms of individuals without access to nutritional ‘jelly’, which provides both nutrition and symbiont cells.
n.a.: not available. n.d.: not determined. *: approximate genome size as determined by pulsed field gel electrophoresis.

Chapter 2 Comparing the utility of host and primary endosymbiont loci for predicting global invasive insect genetic structuring and migration patterns

Note: This chapter was originally published in Biological Control, volume 116, January 2018 (pages 10-16; available through the doi: 10.1016/j.biocontrol.2017.04.003) and has been modified for this format.

Abstract

Halyomorpha halys, commonly known as the Brown Marmorated Stink Bug, is a highly polyphagous invasive pest introduced from East Asia into North America and Europe. It harbors ‘Candidatus Pantoea carbekii’, an obligately-associated, vertically-inherited gammaproteobacterial mutualist. We evaluated the use of this symbiont as a proxy for measuring host diversity, distribution, and phylogeography. Despite the symbiont’s accelerated molecular evolution, the symbiont genome shows relatively lower genetic diversity and structuring compared to the host mitochondrial genome in both native and invaded ranges. Therefore, we conclude that P. carbekii is not as effective as the host mitochondria for determining recent host population history and migration.

Introduction

The Brown Marmorated Stinkbug, Halyomorpha halys (Pentatomidae) (henceforth called BMSB), is an invasive pest native to eastern Asia that has recently been introduced into North America and Europe. BMSB is highly polyphagous, attacking a wide range of plants from up to 45 different families (Lee et al. 2013), including many economically important such as , soybean, and corn (Leskey et al. 2012). It was initially detected in North America in Allentown, PA in 1996 (Hoebeke and Carter, 2003) and in Europe in Zurich-Seefeld, Switzerland in 2004, and it has since rapidly expanded, reaching 43 US states and two Canadian provinces in North America (StopBMSB, June 2016) and 8 countries in Europe, being widespread in Switzerland (CABI, 2016).

BMSB harbors a primary, obligately-associated, vertically-transmitted endosymbiont, ‘Candidatus Pantoea carbekii’ (henceforth called P. carbekii), that is the sole inhabitant of host midgut gastric invaginations called caeca (Bansal et al. 2014). P. carbekii is vertically transmitted from mother to offspring through symbiont-enriched gastric secretions posteriorly deposited on the eggs. As only a subset of the total maternally-associated P. carbekii population is transmitted to offspring, a population bottleneck of the kind commonly observed in insect endosymbionts (Wernegreen, 2015) is created each generation for the symbiont, leading to reduced effective population size and magnified impacts of (Wernegreen, 2015). On the other hand, given the extracellular nature and transfer of this symbiont, there is a possibility of some degree of horizontal transfer between individuals (Kikuchi et al. 2009).

P. carbekii exhibits many traits typically observed in bacterial mutualists of insects: it has not been cultivated in vitro, it bears a relatively reduced genome (1.15 Mb) that retains many essential and nonessential amino acid and vitamin biosynthesis pathways (Bansal et al. 2014; Kenyon et al. 2015), and its presence enables the host to develop normally (Taylor et al. 2014). Other stinkbugs from the Pentatomidae have shown similar associations with extracellular gut symbionts (Otero-Bravo and Sabree, 2015), and their symbionts are monophyletic within the species, genus, and sometimes subfamily of the host, which is indicative of co-speciation (Bistolas et al. 2014; Duron and Noël, 2016). Additionally, other genome-reduced bacterial symbionts of insects show accelerated rates of molecular evolution (Hosokawa et al. 2006, 2013; Kikuchi et al. 2009; Nikoh et al. 2011).

Mitochondrial loci are often used for tracing the movement and origins of introduced species due to their relatively rapid rates of molecular evolution compared to nuclear loci. Previous studies have used BMSB mitochondrial regions (COI, CYTB, ITS1, 12S+CR) to identify host diversity, population history, distribution, spread, and possible source populations in the three continents (Gariepy et al. 2013; Xu et al. 2014; Zhu et al. 2016).

A different strategy that has not been exploited in BMSB is to use symbiont loci for the same purpose. Symbionts, particularly parasitic species, have successfully been used to trace the source populations and movement of their hosts (Nieberding and Olivieri, 2007). While symbionts may or may not show phylogeographic patterns similar to those of their hosts (Espíndola et al. 2014), they can have different rates of molecular evolution than their hosts (Hafner et al. 1994) and in some cases show even higher resolution than host loci (Criscione et al. 2006; Funk et al. 2000). This can be due to higher structuring of the symbiont population within the host’s sub-population, to different life history parameters such as generation time and effective population size, and to the symbiont’s method of transmission between host individuals.

P. carbekii has life history characteristics, such as stable vertical transmission and an effective population size different from that of its host, that make it a prime candidate for use as a proxy for understanding BMSB population dynamics. We investigated whether the rate of molecular evolution of P. carbekii is accelerated as in other symbionts, which would further support its use as a marker for the host. We then evaluated the genetic diversity and distribution of P. carbekii and their relation to the host genetic diversity obtained with mitochondrial markers in populations across its introduced range, as well as in a population in its native range identified as the most likely source of the American invasion (Xu et al. 2014). We hypothesized that the symbiont, due to a high rate of molecular evolution, would show greater and geographic structuring relative to its host and allow a more detailed picture of its distribution and spread.

Methods

Accelerated rate of molecular evolution in P. carbekii

To compare the rate of sequence evolution between P. carbekii and its congeners, we calculated the average pairwise identity of the entire 16S region for two datasets: a) only congeners of P. carbekii (hereafter called ‘Pantoea’) and b) congeners and members of the sister genus Erwinia (hereafter called ‘P+E’). We extracted all sequences from the SILVA database (Quast et al. 2013) belonging to the taxonomic classifications Pantoea and Erwinia with sequence length >1500 and sequence quality >90, preserving common gaps. We performed the Tajima relative rate test (Tajima, 1993), as implemented in the R package ‘pegas’, on the 16S sequences as well as on the conserved genes rpoB and dnaE, using a variety of outgroups. We evaluated significance at an alpha value of 0.05 as well as at a Bonferroni-corrected alpha value when comparing simultaneously across multiple samples.
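The logic of the relative rate test can be illustrated with a short Python sketch. This is not the ‘pegas’ implementation used in this chapter; it is a minimal re-statement of the Tajima (1993) one-degree-of-freedom test under stated assumptions: the three sequences are already aligned, columns containing gaps or ambiguous bases are skipped, and the function names (tajima_relative_rate, bonferroni_alpha) are hypothetical.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima (1993) relative rate test on three aligned sequences.

    m1 counts sites where only seq1 differs (seq2 matches the outgroup);
    m2 counts sites where only seq2 differs (seq1 matches the outgroup).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) follows a chi-square
    distribution with one degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if a not in valid or b not in valid or o not in valid:
            continue  # assumption: skip gaps and ambiguous bases
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


def bonferroni_alpha(alpha, n_tests):
    """Per-comparison alpha after a Bonferroni correction."""
    return alpha / n_tests


# Toy alignment (hypothetical data, not taken from this study).
focal = "G" * 9 + "A" * 11
reference = "A" * 19 + "C"
out = "A" * 20
m1, m2, chi2, p = tajima_relative_rate(focal, reference, out)
significant = p < bonferroni_alpha(0.05, 18)
```

In the toy alignment the focal sequence carries nine unique substitutions and the reference one, so the test rejects rate equality at the uncorrected alpha of 0.05 but not at the alpha corrected for 18 comparisons.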

Collection and DNA extraction

Individuals were sampled from three populations in the United States: California (38°33’20” N, 121°28’08” W), Ohio (40°48’33” N, 81°56’14” W), and Michigan (42°35’38” N, 86°6’13” W) (n=34) (see Bansal et al. 2014); one population from their native range in China (Langfang, Hebei province, n=18); and three populations in Switzerland (Canton Ticino, Lugano; Canton Basel, Basel; and Canton Zurich, Zurich; n=18). Samples were stored in 70% ethanol before being transported. Individuals were rinsed in 70% ethanol and dissected with sterile forceps to extract the V4 region of the midgut. DNA extraction was done with the Qiagen DNeasy Blood and Tissue kit according to the manufacturer’s instructions. For host loci, DNA was either extracted from muscle tissue or taken from the same extraction used for the symbiont loci. PCR was performed with GoTaq Green Master Mix from Promega, and amplicons were cleaned with the Zymo Research DNA Clean and Concentrator before Sanger sequencing in both directions. Quality trimming, alignment, and base calling were done in Geneious 8.1.8. Unique haplotypes were amplified and sequenced twice to account for PCR errors.

Loci variability in the invaded range

Five loci identified as putative pseudogenes from the annotation of the P. carbekii genome (ΔybgF, ΔftsN, ΔspeA, Δtransglycosylase C, ΔyigL) and a 345 bp region of the 16S rRNA gene (coordinates 649995 to 650339 from NZ_CP010907) were selected to identify variability in the symbiont populations in the invaded range. Primers were designed using Primer3 as implemented in Geneious 8.1.8. Thirty individuals from three locations in the United States (California, Ohio, Michigan) were sequenced for all loci. PCR conditions consisted of an initial denaturation at 95°C for 3 min, followed by 30 cycles of 95°C for 30 s, 50°C for 30 s, and 72°C for 30 s, and a final extension at 72°C for 2 min. Sequences were uploaded to GenBank with accession numbers KY379170-KY379176. Sequences were aligned to the previously sequenced genomes (NZ_CP010907 and NC_022547). Nucleotide diversity and haplotype diversity were compared where possible, and linkage disequilibrium and correlation for was calculated using DnaSP 5.10 (Librado and Rozas, 2009).
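For readers unfamiliar with the two summary statistics, the following Python sketch shows how they are commonly defined. It is an illustration only and does not reproduce DnaSP: it assumes equal-length, gap-free aligned sequences, takes nucleotide diversity as the mean number of pairwise differences per site, and takes haplotype diversity as n/(n-1) times one minus the sum of squared haplotype frequencies. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    total_differences = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_differences / (n_pairs * length)


def haplotype_diversity(seqs):
    """Haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Toy sample: two haplotypes separated by one segregating site
# (hypothetical data, not taken from this study).
sample = ["AAAA", "AAAA", "AAAT", "AAAT"]
pi = nucleotide_diversity(sample)
hd = haplotype_diversity(sample)
```

With a single segregating site, as reported for ΔybgF, both statistics depend only on the relative frequencies of the two haplotypes and on the length of the region.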

Host-symbiont haplotype correlation

Two more markers were selected based on high diversity identified between the two sequenced genomes of P. carbekii: NC_022547 from Tsukuba, Japan and NZ_CP010907 from Wooster, Ohio, United States. Previous studies using host markers have identified these populations as separate with some gene flow (Xu et al. 2014; Zhu et al. 2016). The two regions chosen were a hypothetical protein CDS with similarity to a primosomal protein (“primo”) and the gene for 2-oxoglutarate dehydrogenase subunit E1 (odhA). The primo region is a 232 bp region with 3 SNPs between the two sequenced strains, while odhA is a 408 bp region with 4 SNPs between the two strains (roughly 13 and 10 SNPs per kb, respectively), both considerably higher than the average of 1 SNP per kb identified in the whole genome (Kenyon et al. 2015). PCR conditions consisted of an initial denaturation at 94°C for 3 min, 11 touchdown cycles of 94°C for 30 s, 63°C for 30 s (decreasing by 1°C each cycle), and 72°C for 45 s, followed by 19 regular cycles of 94°C for 30 s, 53°C for 30 s, and 72°C for 45 s, and a final extension at 72°C for 5 min. Sequences were uploaded to GenBank with accession numbers KY379163-KY379169. Nucleotide diversity, haplotype diversity, and population diversity were compared where possible, and linkage between mutations was calculated with DnaSP v5.10.1 (Librado and Rozas, 2009).
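The touchdown program described above can be written out cycle by cycle. The Python sketch below is only a worked illustration; it assumes that the first touchdown cycle anneals at 63°C and that each following touchdown cycle anneals 1°C lower, so that the eleventh cycle reaches the 53°C used in the regular cycles. The function name annealing_schedule is hypothetical.

```python
def annealing_schedule(start_temp, step, touchdown_cycles,
                       regular_temp, regular_cycles):
    """Return the annealing temperature (in °C) of every PCR cycle."""
    # Touchdown phase: the temperature drops by `step` after each cycle.
    touchdown = [start_temp - step * i for i in range(touchdown_cycles)]
    # Regular phase: constant annealing temperature.
    return touchdown + [regular_temp] * regular_cycles


# Program for the primo and odhA loci as described in the text.
schedule = annealing_schedule(
    start_temp=63, step=1, touchdown_cycles=11,
    regular_temp=53, regular_cycles=19,
)
total_cycles = len(schedule)
```

Under this reading the program comprises 30 cycles in total, the same number used for the pseudogene and 16S loci.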

Comparison of symbiont and host markers

Two mitochondrial loci, COII and 12SCR, used by Xu et al. (2014) were chosen for comparison with the symbiont loci. PCR conditions were as described in the original paper with the following modifications: the first step was replaced with a 4-cycle touchdown starting at an annealing temperature of 52°C and decreasing by 1°C each cycle, and the annealing temperature of the second step was increased to 49°C. Nucleotide diversity, haplotype diversity, and population diversity were compared where possible, and linkage between mutations was calculated with DnaSP. Haplotype networks and analysis of molecular variance (AMOVA) were done using Popart 1.7.2 (Leigh and Bryant, 2015).

Results

Acceleration in the rate of molecular evolution in P. carbekii

We obtained 482 16S sequences for the Pantoea dataset and 606 for the P+E dataset. For the six 16S sequences available in the SILVA database for P. carbekii, the average pairwise identity to all other sequences varied between 93.5 and 94%, while average pairwise identities among the other sequences ranged from 96.5 to 98%. When using the second dataset, P+E, which included Erwinia, the average pairwise identity for the P. carbekii sequences ranged between 93 and 94%, while that for all other sequences ranged from 96 to 98%.
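The average pairwise identity reported here can be illustrated with a small Python sketch. It is not the procedure used to produce the values above; it assumes that the sequences are already aligned, that columns in which both sequences carry a gap are ignored, and that a gap opposite a base counts as a mismatch. The function names and the toy alignment are hypothetical.

```python
def pairwise_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = compared = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # assumption: shared gap columns are not compared
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_identity_to_others(alignment, focal):
    """Average identity of one named sequence to all other sequences."""
    others = [name for name in alignment if name != focal]
    if not others:
        raise ValueError("alignment needs at least two sequences")
    total = sum(
        pairwise_identity(alignment[focal], alignment[name])
        for name in others
    )
    return total / len(others)


# Toy alignment (hypothetical data, not taken from this study).
toy_alignment = {
    "focal": "ACGT-ACGTA",
    "ref1": "ACGT-ACGTT",
    "ref2": "ACGTAACGTA",
}
focal_mean = mean_identity_to_others(toy_alignment, "focal")
```

A sequence evolving faster than its relatives will show a lower mean identity to the rest of the dataset than those relatives show to one another, which is the pattern described for P. carbekii.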

When comparing P. carbekii to 18 other strains of the Pantoea+Erwinia clade using the relative rate test with E. coli as an outgroup, significant p-values were obtained even when using the Bonferroni correction, the largest being 0.000379 for Pantoea ananatis LMG 20103. When P. carbekii was compared to other strains of the Enterobacteria using Pseudomonas protegens Pf-5 as an outgroup, we only found significance at the corrected alpha level for Candidatus ‘Ishikawaella gelatinosa’, another stinkbug endosymbiont. The comparison to another endosymbiont in the dataset, Buchnera aphidicola from Aphis glycines, was only significant at the uncorrected alpha level. When using Proteus vulgaris, a basal member of the Enterobacteriaceae, as an outgroup, significant p-values were obtained for all strains except Citrobacter rodentium at the uncorrected alpha value, but not for all strains at the corrected alpha value. With closer outgroups such as Yersinia pestis KIM10+ or Serratia marcescens, significance was found for all strains even at the corrected alpha value. We found similar results when doing the test on the coding genes rpoB and dnaE, with the exception that comparisons to P. vulgaris were not significant, as opposed to the most distant outgroup Aeromonas hydrophila. For comparison, the same test for Pantoea rodasii and Pantoea dispersa did not show significance for most comparisons even at the uncorrected alpha value.

Symbiont haplotype identification

We sequenced amplicons generated for six P. carbekii loci (regions within five putative pseudogenes: ΔybgF, ΔftsN, ΔspeA, Δtransglycosylase C, ΔyigL; and a 345 bp region within the 16S rRNA gene) in three populations in the United States. Of these loci, only ΔybgF showed any variability, with one segregating site dividing populations into two haplotypes. Parameters estimated for these populations using this region are shown in Table 2.1. Due to the lack of variability in the remaining markers, two more P. carbekii loci, odhA and primo, were used. The odhA, primo, and ΔybgF loci were sequenced from multiple individuals obtained from the native (Chinese, CH, n=13) and invaded (American, Ohio, n=3) populations. Two haplotypes were identified, with 4 SNPs separating them across the three genes: one SNP in ΔybgF, one SNP in the primo region, and two SNPs in the odhA region. In all cases, each gene had two versions, which were always correlated with the versions in the other genes (correlation coefficient R=1.000, Chi-Square test p < 0.0001 for all six pairwise comparisons). Nucleotide diversity in the source population was 0.00191 when considering all three regions. Because the variants at the three loci were perfectly correlated, we designated these two haplotypes P1 and P2 and used only the ΔybgF gene for further analysis. Haplotype distributions for ΔybgF are shown in Figure 2.1a-c.
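To make the meaning of a correlation of 1 between two SNPs concrete, the sketch below computes the phi coefficient and the corresponding one-degree-of-freedom chi-square for a pair of biallelic sites. It does not reproduce the DnaSP calculation; the function name is hypothetical and the genotypes are invented purely for illustration (16 individuals, matching only the total sample size mentioned above, with an arbitrary 10:6 split between haplotypes).

```python
import math
from collections import Counter


def phi_and_chi2(alleles_x, alleles_y):
    """Phi correlation and chi-square test (1 d.f.) for two biallelic sites."""
    if len(alleles_x) != len(alleles_y):
        raise ValueError("both sites must be scored in the same individuals")
    x_states = sorted(set(alleles_x))
    y_states = sorted(set(alleles_y))
    if len(x_states) != 2 or len(y_states) != 2:
        raise ValueError("both sites must be biallelic in the sample")
    counts = Counter(zip(alleles_x, alleles_y))
    a = counts[(x_states[0], y_states[0])]
    b = counts[(x_states[0], y_states[1])]
    c = counts[(x_states[1], y_states[0])]
    d = counts[(x_states[1], y_states[1])]
    # The sign of phi depends on the arbitrary ordering of allele labels.
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    chi2 = len(alleles_x) * phi ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return phi, chi2, p_value


# Hypothetical genotypes: 10 individuals carry one haplotype (A at the
# first site, C at the second) and 6 carry the other (G and T).
site_1 = ["A"] * 10 + ["G"] * 6
site_2 = ["C"] * 10 + ["T"] * 6
phi, chi2, p = phi_and_chi2(site_1, site_2)
```

When every individual carrying one variant at the first site also carries the matching variant at the second, the off-diagonal cells of the 2x2 table are empty and the coefficient equals 1, so any single SNP is sufficient to assign the haplotype.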

Host-symbiont haplotype genotyping and correlation

Using the previous samples and additional ones, we identified for each individual the host haplotype using the markers described by Xu et al. (2014) and the symbiont haplotype using ΔybgF. We identified a total of two symbiont haplotypes (P1-P2), three host 12S haplotypes (12S-A, 12S-B, 12S-C), and four host COII haplotypes (CO2-A, CO2-B, CO2-C, CO2-D), which combined yielded six host haplotypes (named Hh1 through Hh6, see Table 2.2). Both symbiont haplotypes were identified in the native population, while all individuals from America contained P1 and those from Europe contained P2. Only one host haplotype was found in the American population (Hh1), which was also present in the Chinese population in a high proportion (41%), while three host haplotypes were found in the European populations (Hh3, Hh4, and Hh6). The population from Canton Zurich, in the north of Switzerland, contained two host haplotypes, Hh3 and Hh6, with only Hh3 being present in the native population. The population from Ticino, in southern Switzerland, contained one haplotype, Hh4, which was also present in the native population. The Chinese population showed the greatest diversity, with 5 of the 6 haplotypes identified (all but Hh6). Haplotype and nucleotide diversity for all populations are summarized in Table 2.3. The haplotype network of the host genes is shown in Figure 2.2. The AMOVA for host genes yielded a ΦST of 0.45 (p<0.001), indicating large genetic differentiation, and for symbiont populations it yielded a ΦST of 0.83 (p<0.001), indicating even larger genetic differentiation.

The symbiont P1 haplotype was detected in the Hh1 and Hh2 hosts and the P2 haplotype was present in the Hh3-6 hosts. While the 2x5 contingency table does not have sufficient statistical power due to a low number of individuals (not all expected outcomes were above 5), the calculated correlation value is 1. The correlation can be observed with host and symbiont haplotype networks as shown in Figure 2.2.
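The caveat about expected outcomes can be made concrete with a short Python sketch that computes the expected cell counts of a contingency table and an association measure. The counts below are hypothetical and are not the data from this study; Cramér's V is used here only as one possible association statistic, since the text does not state which measure produced the reported value of 1.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def cells_below(table, threshold=5.0):
    """Number of cells whose expected count falls below the threshold."""
    return sum(e < threshold for row in expected_counts(table) for e in row)


def cramers_v(table):
    """Cramér's V computed from the Pearson chi-square statistic."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if e > 0
    )
    total = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (total * k))


# Hypothetical 2x5 table: rows are symbiont haplotypes, columns host haplotypes.
hypothetical = [
    [8, 4, 0, 0, 0],  # symbiont haplotype 1
    [0, 0, 6, 5, 1],  # symbiont haplotype 2
]
print(cells_below(hypothetical))
print(cramers_v(hypothetical))
```

In a table like this one, every host haplotype occurs with a single symbiont haplotype, so the association is complete even though the small expected counts make the chi-square approximation unreliable.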

Since the quality of the sequenced regions for the host genes was not uniform around the flanks, the sequence used was only 351 bp for 12S and 322 bp for COII (as opposed to 552 and 534 bp for the previously published sequences, respectively). When restricted to the regions sequenced here, the 43 unique haplotypes obtained by Xu et al. 2014 (named H1 through H43) collapsed into 24 unique haplotypes. After collapsing these haplotypes, no collapsed group contained sequences collected from more than one of the three main regions sampled (China, Korea, Japan). Five of the six haplotypes identified in this study were identical to one of the haplotypes from the previous study, with Hh1 corresponding to either H2 or H23; Hh2 to one of H1, H25, or H26; Hh3 to one of H7, H14, H21, or H22; Hh4 to one of H3, H4, H5, H12, H16, H18, or H19; and Hh6 to H35 or H40. Hh5 was not identical to any of the published sequences.
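The collapsing step described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure used in this study: the function name, the haplotype names, and the toy sequences are hypothetical, and the sequences are assumed to be already aligned so that a shared coordinate window can be cut from each.

```python
from collections import defaultdict


def collapse_haplotypes(haplotypes, start, end):
    """Group haplotype names whose aligned sequences are identical in [start, end)."""
    groups = defaultdict(list)
    for name, seq in haplotypes.items():
        groups[seq[start:end]].append(name)
    # Sort names within each group and the groups themselves for stable output.
    return sorted(sorted(names) for names in groups.values())


# Hypothetical full-length haplotypes that differ only near the flanks.
published = {
    "Ha": "ACGTAC",
    "Hb": "ACGTAT",
    "Hc": "TCGTAC",
}

# Trimming the last position merges Ha and Hb; trimming both flanks merges all.
print(collapse_haplotypes(published, 0, 5))
print(collapse_haplotypes(published, 1, 5))
```

A shorter window can only merge haplotypes, never split them, which is why a sequence from this study may match several previously published haplotypes at once.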


The host haplotype found in America, Hh1, is identical to H2, the haplotype previously identified as the dominant haplotype in invasive American populations. The host haplotype recovered from the Canton Ticino, in the south of Switzerland, as well as from one individual from the native population, Hh4, is identical to haplotype H3, which has been previously found in this location and in the native population (Gariepy et al. 2015). The haplotype Hh3, found in most individuals of the population in the Canton Zurich, north of Switzerland, was not identical to any haplotype previously identified in this region, but was identical to several identified previously in China (H7, H14, H21, H22). The other haplotype found in this location, the singleton Hh6, was not identified in the source population and was identical to H35 and H40, both haplotypes identified from Japanese populations.

Discussion

P. carbekii has a high rate of molecular evolution, like other insect symbionts

P. carbekii showed a high 16S divergence from other members of its genus and the sister genus Erwinia. The 16S divergence between P. carbekii and its congeners is higher than the average difference between Pantoea and Erwinia species. However, phylogenetically, P. carbekii is firmly placed in the Pantoea + Erwinia clade using multilocus sequence analysis (Kenyon et al. 2015). Relative rate tests show that P. carbekii has a significantly different rate of molecular evolution from other members of Pantoea and the Enterobacteriaceae, even when comparing at multiple levels of relatedness.

We found that choosing outgroups too distant from the focal taxa when using this test could result in all comparisons being significant or none at all, even when using stringent corrections for multiple comparisons. We evaluated different outgroups and found that P. carbekii has a different rate than its congeners and other members of the Enterobacteriaceae, similar to other insect endosymbionts (Hosokawa et al. 2013, 2006; Kikuchi et al. 2009; Nikoh et al. 2011).
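To make the logic of a relative rate test concrete, the sketch below implements the one-degree-of-freedom test of Tajima (1993), which the reference list suggests is the test applied here; that attribution is an assumption. The function names and toy sequences are hypothetical, and the Bonferroni division is only a stand-in for whichever correction produced the corrected alpha value.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's 1D relative rate test on three aligned sequences.

    m1 counts sites where seq1 alone differs (seq2 matches the outgroup),
    m2 counts sites where seq2 alone differs (seq1 matches the outgroup).
    Returns (m1, m2, chi-square, p-value) with one degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue  # skip alignment gaps
        if a != b and b == o:
            m1 += 1
        elif a != b and a == o:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical toy alignment: the first ingroup carries all unique changes.
fast = "TTTTTTTTAA"
slow = "AAAAAAAAAA"
outgroup = "AAAAAAAAAA"
m1, m2, chi2, p = tajima_relative_rate(fast, slow, outgroup)
print(m1, m2, chi2, p < bonferroni_alpha(0.05, 5))
```

Sites where all three sequences differ, or where the two ingroup sequences share a change relative to the outgroup, contribute to neither count. When the outgroup is very distant, many sites fall into such uninformative patterns, which may be one reason outgroup choice affected the results described above.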

We found that for the 16S gene as well as two coding regions, rpoB and dnaE, P. carbekii also has a different rate than I. capsulata and B. aphidicola, two insect symbionts with accelerated mutation rates. However, in these cases, both other symbionts have higher rates than P. carbekii. This also correlates with genome size: P. carbekii’s genome is 1.15 Mb, while I. capsulata’s is 0.75 Mb and B. aphidicola’s is 0.64 Mb. This is consistent with the hypothesis that longer, more stable relationships yield smaller genomes and simultaneously faster rates of molecular evolution (Hosokawa et al. 2016).

P. carbekii has lower diversity than H. halys mitochondria

We identified two symbiont haplotypes in 69 individuals spanning a native population and six invaded populations, while in 32 individuals we identified 6 host haplotypes. Haplotype and nucleotide diversity were higher for host loci than for symbiont loci in all populations but one (the Californian population). Despite this exception, the results taken together show that P. carbekii has lower genetic diversity and structure than its host H. halys. Only two haplotypes were identified for P. carbekii in both the native and the introduced populations. Six haplotypes were identified for the host in the same number of individuals and a similar length of sequence using mitochondrial loci. The invaded range exhibited reduced diversity, with two host haplotypes and one symbiont haplotype when considering samples sequenced for both organisms, and two haplotypes for the symbiont when including samples for which host haplotype identification was not possible. This lack of diversity is consistent with an introduced population with few introduction events (Xu et al. 2014). The European range of invasion displays more variability than the American, which has been previously assessed multiple times (Cesari et al. 2015; Gariepy et al. 2015, 2013) and suggests the possibility of multiple introductions. Additionally, the presence of two host haplotypes in the invaded North American range was identified previously (Xu et al. 2014). In that case, one haplotype was present in both eastern and western US populations while the second one was only present in eastern US populations. In our study, we were only able to sample host haplotypes in regions where the previous study found only one of the two haplotypes. Considering that the previous study used specimens sampled between 2006 and 2008, while our specimens were sampled between 2013 and 2015, we can see that despite other nearby populations having a different haplotype, it has not migrated in sufficient numbers to be detected by our sampling.

Origin of alternate haplotype in North American range

Of the two identified P. carbekii haplotypes, P1 was previously sequenced as part of the P. carbekii sequencing project. An alternate haplotype, P2, differed from P1 by only one

SNP in the ΔybgF region, one SNP in the primo region, and two SNPs in the odhA region. When aligning the sequenced regions to the second sequenced genome of P. carbekii (AP012554), from Tsukuba, Japan, we see that the new SNPs are always present in the second genome. However, the Japan haplotype had additional SNPs in the sequenced regions, which means the is likely old and represents part of the native diversity.

The P2 haplotype was found in the native population of China, in the region of Beijing, which is theorized as the source of the North American introduction. This provides further evidence of this area as the source of the invasion. However, more information on the distribution of the symbiont haplotypes in other areas may be needed. In our study, we detected the two mitochondrial host haplotypes detected by Xu et al. (2014) as present in North America, but one of them only in the source population. Both of these haplotypes were always simultaneously present with the symbiont haplotype P1 in the Beijing population. Therefore, the introduction of these two host haplotypes may not be enough to explain the distribution of symbiont haplotypes in North America, unless there was a horizontal symbiont transfer between hosts prior to the invasion.

Use of P. carbekii as a proxy for host population history

Expected elevated mutation rates and/or population structuring of bacterial symbionts of invasive insects have provided attractive characteristics for their use in predicting host migration routes and identifying source populations, and the utility of these characteristics has been assessed multiple times using mostly parasites of these (Criscione et al. 2006; Nieberding and Olivieri, 2007). Other vertically inherited symbionts have shown co-divergence with their hosts (Degnan et al. 2004; Hosokawa et al. 2012), and sometimes similar degrees of resolution at the intra-species level, as is the case of aphid symbionts (Funk et al. 2000). We report that P. carbekii exhibits elevated mutation rates, yet its genes provide no greater resolution for population structuring than those from the BMSB mitochondria. Other symbionts have also shown lower genetic structure than their host. In the case studied by Anderson et al. (2004), hemipteran symbionts associated with a semi- host displayed lower genetic diversity than the host. These differences were attributed to the breeding systems and dispersal capabilities of these organisms. Given that H. halys relies on the consumption of maternal secretions after to acquire P. carbekii, and that this insect is gregarious, it is possible that one female’s depositions could infect a different one’s offspring. This scenario has been used to explain observed incomplete cophylogeny between other stinkbugs and their respective symbionts (Kikuchi et al. 2009). While the probability of this happening has not been directly assessed, small rates of migration of the symbiont between host individuals could help explain reduced genetic structuring.

The lack of new mutations in the pseudogenes of P. carbekii shows that, even with the fast mutation rate of this symbiont and in regions where selection is relaxed, no new mutations have reached a sufficient proportion in the population to be detected. This can be due to the time scale evaluated, with the invasion of America estimated to be 20 years old. Additionally, the difference between symbiont haplotypes and mitochondrial haplotypes may be due to the fact that while they are both maternally inherited, there is a higher probability for the symbiont to undergo .

Conclusion

The results shown here indicate that the use of P. carbekii loci is not as effective as the use of host mitochondrial loci to infer host population history, source populations, and spread, in spite of the symbiont containing the hallmarks of a potential proxy: high mutation rates, obligate mutualism, and vertical inheritance. Whether this is specific to P. carbekii or to other extracellular, externally transmitted symbionts such as other stink bug symbionts remains to be found.

Acknowledgements

We would like to thank Tim Haye for the donation of specimens from Europe and China.

Funding

This work was supported by the Ohio Agricultural Research Development Center SEEDS Research Enhancement-Interdisciplinary Grant, The Ohio State University, and the Colombian Administrative Department of Science, Technology and Innovation COLCIENCIAS.


References

Anderson, B., Olivieri, I., Lourmas, M., Stewart, B.A., 2004. Comparative population genetic structure and local adaptation of two mutualists. Evolution (N. Y). 58, 1730–1747.

Bansal, R., Michel, A., Sabree, Z., 2014. The Crypt-Dwelling Primary Bacterial Symbiont of the Polyphagous Pentatomid Pest Halyomorpha halys (Hemiptera: Pentatomidae). Environ. Entomol. 617–625.

Bistolas, K.S.I., Sakamoto, R.I., Fernandes, J. a M., Goffredi, S.K., 2014. Symbiont polyphyly, co-evolution, and necessity in pentatomid stinkbugs from Costa Rica. Front. Microbiol. 5, 1–15. doi:10.3389/fmicb.2014.00349

Cesari, M., Maistrello, L., Ganzerli, F., Dioli, P., Rebecchi, L., Guidetti, R., 2015. A pest alien invasion in progress: potential pathways of origin of the brown marmorated stink bug Halyomorpha halys populations in Italy. J. Pest Sci. (2004). 88, 1–7. doi:10.1007/s10340-014-0634-y

Criscione, C.D., Cooper, B., Blouin, M.S., 2006. Parasite identify source populations of migratory more accurately than fish genotypes. Ecology 87, 823–828. doi:[823:PGISPO]2.0.CO;2

Degnan, P., Lazarus, A., Brock, C., Wernegreen, J., 2004. Host-Symbiont Stability and Fast Evolutionary Rates in an Ant-Bacterium Association: Cospeciation of Camponotus Species and Their Endosymbionts, Candidatus Blochmannia. Syst. Biol. 53, 95–110. doi:10.1080/10635150490264842

Duron, O., Noël, V., 2016. A wide diversity of Pantoea lineages are engaged in mutualistic symbiosis and cospeciation processes with stinkbugs. Environ. Microbiol. Rep. doi:10.1111/1758-2229.12432

Espíndola, A., Carstens, B.C., Alvarez, N., 2014. Comparative phylogeography of mutualists and the effect of the host on the genetic structure of its partners. Biol. J. Linn. Soc. 113, 1021–1035. doi:10.1111/bij.12393

Funk, D.J., Helbling, L., Wernegreen, J.J., Moran, N.A., 2000. Intraspecific phylogenetic congruence among multiple symbiont genomes. Proc. Biol. Sci. 267, 2517–21. doi:10.1098/rspb.2000.1314

Gariepy, T.D., Bruin, A., Haye, T., Milonas, P., Vétek, G., 2015. Occurrence and genetic diversity of new populations of Halyomorpha halys in Europe. J. Pest Sci. (2004). 88, 451–460. doi:10.1007/s10340-015-0672-0

Gariepy, T.D., Haye, T., Fraser, H., Zhang, J., 2013. Occurrence, genetic diversity, and potential pathways of entry of Halyomorpha halys in newly invaded areas of Canada and Switzerland. J. Pest Sci. (2004). 87, 17–28. doi:10.1007/s10340-013-0529-3

Hafner, M., Sudman, P., Villablanca, F., Spradling, T., Demastes, J., Nadler, S., 1994. Disparate rates of molecular evolution in cospeciating hosts and parasites. Science (80-. ). 265, 1087–1090. doi:10.1126/science.8066445

Hoebeke, E.R., Carter, M.E., 2003. Halyomorpha halys (Stål) (Heteroptera: Pentatomidae): a polyphagous plant pest from Asia newly detected in North America. Proc. Entomol. Soc. Washingt. 105, 225–237.

Hosokawa, T., Hironaka, M., Inadomi, K., Mukai, H., Nikoh, N., Fukatsu, T., 2013. Diverse Strategies for Vertical Symbiont Transmission among Subsocial Stinkbugs. PLoS One 8, 4–11. doi:10.1371/journal.pone.0065081

Hosokawa, T., Kikuchi, Y., Nikoh, N., Shimada, M., Fukatsu, T., 2006. Strict host-symbiont cospeciation and reductive genome evolution in insect gut bacteria. PLoS Biol. 4, 1841–1851. doi:10.1371/journal.pbio.0040337

Hosokawa, T., Matsuura, Y., Kikuchi, Y., Fukatsu, T., 2016. Recurrent evolution of gut symbiotic bacteria in pentatomid stinkbugs. Zool. Lett. 2, 24. doi:10.1186/s40851-016-0061-4

Hosokawa, T., Nikoh, N., Koga, R., Satô, M., Tanahashi, M., Meng, X.-Y., Fukatsu, T., 2012. Reductive genome evolution, host–symbiont co-speciation and uterine transmission of endosymbiotic bacteria in bat flies. ISME J. 6, 577–587. doi:10.1038/ismej.2011.125

Kenyon, L.J., Meulia, T., Sabree, Z.L., 2015. Habitat Visualization and Genomic Analysis of “Candidatus Pantoea carbekii,” the Primary Symbiont of the Brown Marmorated Stink Bug. Genome Biol. Evol. 7, 620–635. doi:10.1093/gbe/evv006

Kikuchi, Y., Hosokawa, T., Nikoh, N., Meng, X.-Y., Kamagata, Y., Fukatsu, T., 2009. Host-symbiont co-speciation and reductive genome evolution in gut symbiotic bacteria of acanthosomatid stinkbugs. BMC Biol. 7, 2. doi:10.1186/1741-7007-7-2

Lee, D.-H., Short, B.D., Joseph, S. V, Bergh, J.C., Leskey, T.C., 2013. Review of the biology, ecology, and management of Halyomorpha halys (Hemiptera: Pentatomidae) in China, Japan, and the Republic of Korea. Environ. Entomol. 42, 627–641.

Leigh, J.W., Bryant, D., 2015. popart: full-feature software for haplotype network construction. Methods Ecol. Evol. 6, 1110–1116. doi:10.1111/2041-210X.12410

Leskey, T.C., Hamilton, G.C., Nielsen, A.L., Polk, D.F., Rodriguez-Saona, C., Christopher Bergh, J., Ames Herbert, D., Kuhar, T.P., Pfeiffer, D., Dively, G.P., Hooks, C.R.R., Raupp, M.J., Shrewsbury, P.M., Krawczyk, G., Shearer, P.W., Whalen, J., Koplinka-Loehr, C., Myers, E., Inkley, D., Hoelmer, K. a., Lee, D.H., Wright, S.E., 2012. Pest status of the brown marmorated stink bug, Halyomorpha halys in the USA. Outlooks Pest Manag. 23, 218–226. doi:10.1564/23oct07

Librado, P., Rozas, J., 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–2. doi:10.1093/bioinformatics/btp187

Nieberding, C.M., Olivieri, I., 2007. Parasites: proxies for host genealogy and ecology? Trends Ecol. Evol. 22, 156–165. doi:10.1016/j.tree.2006.11.012

Nikoh, N., Hosokawa, T., Oshima, K., Hattori, M., Fukatsu, T., 2011. Reductive evolution of bacterial genome in insect gut environment. Genome Biol. Evol. 3, 702–714. doi:10.1093/gbe/evr064

Otero-Bravo, A., Sabree, Z.L., 2015. Inside or out? Possible genomic consequences of extracellular transmission of crypt-dwelling stinkbug mutualists. Front. Ecol. Evol. 3, 1–7. doi:10.3389/fevo.2015.00064

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., Glockner, F.O., 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596. doi:10.1093/nar/gks1219

Tajima, F., 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135, 599–607.

Taylor, C.M., Coffey, P.L., DeLay, B.D., Dively, G.P., 2014. The importance of gut symbionts in the development of the brown marmorated stink bug, Halyomorpha halys (Stahl). PLoS One 9. doi:10.1371/journal.pone.0090312

Wernegreen, J.J., 2015. Endosymbiont evolution: predictions from theory and surprises from genomes. Ann. N. Y. Acad. Sci. 1360, 16–35. doi:10.1111/nyas.12740

Xu, J., Fonseca, D.M., Hamilton, G.C., Hoelmer, K. a., Nielsen, A.L., 2014. Tracing the origin of US brown marmorated stink bugs, Halyomorpha halys. Biol. Invasions 16, 153–166. doi:10.1007/s10530-013-0510-3

Zhu, G.-P., Ye, Z., Du, J., Zhang, D.-L., Zhen, Y.-H., Zheng, C.-G., Zhao, L., Li, M., Bu, W.-J., 2016. Range wide molecular data and niche modeling revealed the Pleistocene history of a global invader (Halyomorpha halys). Sci. Rep. 6, 23192. doi:10.1038/srep23192

Web References

CABI, 2016. Invasive species compendium. Wallingford, UK: CAB International. www.cabi.org/isc

StopBMSB, 2016. Stop BMSB: Biology, ecology, and management of brown marmorated stink bug in specialty crops. www.stopbmsb.org


Figures and Tables

Table 2.1 Haplotype diversity and nucleotide diversity across seven gene regions in P. carbekii's invaded range

                      California               Ohio             Michigan
Locus                 Hd     π        n   H    Hd  π  n   H     Hd  π  n   H
ΔybgF                 0.533  0.00896  10  2    0   0  10  1     0   0  10  1
Δagmanitase           0      0        10  1    0   0  10  1     0   0  10  1
ΔftsN                 0      0        10  1    0   0  10  1     0   0  10  1
ΔspeA                 0      0        10  1    0   0  10  1     0   0  10  1
Δtransglycosylase C   0      0        10  1    0   0  10  1     0   0  10  1
ΔyigL                 0      0        10  1    0   0  10  1     0   0  10  1
16S                   0      0        10  1    0   0  10  1     0   0  10  1

Hd: Haplotype diversity; π: nucleotide diversity; n: number of samples; H: number of haplotypes detected.


Table 2.2 Combination of host haplotypes into final haplotypes

CO2  12S  Total  Identical previously identified haplotypes  Location of previously identified haplotypes
A    A    Hh1    H2, H23                                      China, United States
A    B    Hh2    H1, H25, H26                                 China, United States
B    A    Hh3    H7, H14, H21, H22                            China
B    B    Hh4    H3, H4, H5, H12, H16, H18, H19               China
C    C    Hh5    -                                            -
D    B    Hh6    H35, H40                                     Japan

Table 2.3 Haplotype diversity and nucleotide diversity for H. halys and P. carbekii.

Halyomorpha halys

                               12S                     CO2                     Concatenated
Range       Region             Hd     π        n   H   Hd     π        n   H   Hd     π        n   H
Total                          0.522  0.0016   32  3   0.558  0.00197  32  4   0.783  0.00178  31  6
Native      China, Beijing     0.582  0.00182  18  3   0.386  0.00148  18  3   0.68   0.00155  18  5
Introduced  Europe, Zurich     0      0        6   1   0.333  0.00104  6   2   0.333  0.0005   6   2
Introduced  Europe, Ticino     0      0        4   1   0      0        4   1   0      0        4   1
Introduced  Europe, Basel      -      -        -   -   -      -        -   -   -      -        -   -
Introduced  America, OH        0      0        4   1   0      0        4   1   0      0        4   1
Introduced  America, MI        -      -        -   -   -      -        -   -   -      -        -   -
Introduced  America, CA        -      -        -   -   -      -        -   -   -      -        -   -

Pantoea carbekii

                               ybgF                    primo                   odhA                    Concatenated
Range       Region             Hd     π        n   H   Hd     π        n   H   Hd     π        n   H   Hd     π        n   H
Total                          0.469  0.00317  69  2   0.315  0.0014   17  2   0.431  0.00211  24  2   0.325  0.00161  16  2
Native      China, Beijing     0.308  0.00185  17  2   0.363  0.00156  14  2   0.309  0.00151  17  2   0.385  0.00191  13  2
Introduced  Europe, Zurich     0      0        10  1   -      -        -   -   -      -        -   -   -      -        -   -
Introduced  Europe, Ticino     0      0        4   1   -      -        -   -   0      0        4   1   -      -        -   -
Introduced  Europe, Basel      0      0        4   1   -      -        -   -   -      -        -   -   -      -        -   -
Introduced  America, OH        0      0        14  1   0      0        3   1   0      0        3   1   0      0        3   1
Introduced  America, MI        0      0        10  1   -      -        -   -   -      -        -   -   -      -        -   -
Introduced  America, CA        0.533  0.00896  10  2   -      -        -   -   -      -        -   -   -      -        -   -

Hd: Haplotype diversity; π: nucleotide diversity; n: number of samples; H: number of haplotypes detected.


Figure 2.1 Sampling and haplotype distribution for H. halys and P. carbekii. (a-c) Map for P. carbekii haplotypes in (a) China, (b) Switzerland, and (c) the United States using the ΔybgF gene. Each slice represents an individual and different colors represent different haplotypes. Correlation of host-symbiont haplotypes in (d) China, (e) Switzerland, and (f) the United States per individual, with each slice representing an individual, inner slices representing the symbiont haplotype and outer slices representing the host haplotype.

Figure 2.2 Haplotype networks for host and symbiont. Minimum spanning haplotype network of the ΔybgF gene (right) and the COII + 12SCR regions (left). Size of the circles represents the number of individuals and tick marks represent substitutions between sequences. Colors represent locations sampled: CH = China; OH = Ohio, USA; ZU = Canton Zurich, Switzerland; TI = Canton Ticino, Switzerland; BA = Canton Basel. Boxes indicate haplotypes that coexist within individuals.

Chapter 3 Cladogenesis and genomic streamlining in extracellular endosymbionts of tropical stink bugs

Note: This chapter was originally published in Genome Biology and Evolution, volume 10, issue 2, February 2018 (pages 680–693, available through the doi: 10.1093/gbe/evy033) and has been modified for this format.

Data deposition statement

These Whole Genome Shotgun projects have been deposited at GenBank under the accessions PDKR00000000, PDKS00000000, PDKT00000000, and PDKU00000000. The versions described in this paper are versions PDKR01000000, PDKS01000000, PDKT01000000, and PDKU01000000. Sequence alignments for phylogenetic reconstruction as well as scripts used in this paper are available under DOI 10.6084/m9.figshare.5700718 and 10.6084/m9.figshare.5700721.

Abstract

Genome reduction and degeneration have been widely characterized in intracellular symbionts of insects, but relatively few examples of extracellular insect symbionts with reduced genomes are available. Many phytophagous stink bugs harbor extracellular, vertically inherited bacterial symbionts, some of which have undergone extensive genome reduction. However, those available are taxonomically distant and preclude analyses of closely related genome-reduced symbionts. To address this gap we present the complete de novo genome sequencing of four bacterial symbionts of stink bugs of the neotropical genus Edessa. The four symbionts show similar levels of genomic reduction, reaching 0.8 Mb, five times smaller than the closest free-living relative. Phylogenetic analysis revealed that, like other stink bug symbionts, these form a clade within the Pantoea genus. Furthermore, genome synteny analysis as well as a jackknife approach for phylogenetic reconstruction were able to overcome long branch attraction artifacts and show that these four symbionts form a single symbiotic event distinct from that of previously sequenced stink bug symbionts. We additionally present their inferred metabolic capabilities, indicating a shift in genomic composition characteristic of their lifestyle, as well as a comparative genomic analysis of symbionts spanning three host taxonomic levels: genus, family, and superfamily. This study significantly expands the availability of sequence data for a relatively under-sampled group. Finally, we propose the candidate name ‘Candidatus Pantoea edessiphila’ for the species of these symbionts, with strain designations according to host species.

Introduction

True bugs (Hemiptera), including aphids, cicadas, and stink bugs, as a group are known to maintain ancient mutualist associations (>100 million years old) with 1-2 obligate bacterial species. The study of vertically-inherited bacterial symbionts has yielded surprising discoveries about genome evolution and the minimal requirements for cell subsistence (Wernegreen 2015). Obligately host-associated, vertically-inherited bacterial symbionts have undergone dramatic genome reduction in which they retain the relatively few genes necessary for their symbiotic lifestyle (e.g. primary metabolism and supplementation of host metabolism such as amino acid or cofactor biosynthesis) while losing the bulk of the genes typically found in closely related free-living bacteria. Many of the genes that are lost encode functions that have been inferred to be tangential to the mutualism (Kenyon & Sabree 2014; Moran et al. 2008). Genome reduction in insect endosymbionts has been largely characterized in hemipterans (i.e. Auchenorrhyncha and Sternorrhyncha) with strictly intracellular mutualist bacteria that are vertically transmitted within host tissues (intra-egg to bacteriome), yet studies focusing on symbionts that are extracellularly and vertically transmitted, as observed in stink bugs

(; Heteroptera), are conspicuously lacking.

Stink bugs typically transmit their primary bacterial symbionts via nymphal consumption of symbiont-laden secretions that have been deposited on the surfaces of eggs by gravid females (Prado et al. 2006; Kikuchi et al. 2009; Kaiwa et al. 2010; Otero-Bravo & Sabree 2015; Salem et al. 2015). Primary symbionts traverse the gastric tract following their consumption to reside within a modified posterior midgut, typically referred to as the V4 or M4 region, that contains invaginations or “crypts” that are specialized to house bacterial symbionts (Buchner 1965; Gordon et al. 2016). Prolonged developmental time, aberrant behavior and reduced fecundity have been observed in several pentatomid stink bugs deprived of their primary symbiont, and essential nutrient supplementation is among the functions that they have been hypothesized to perform (Abe et al. 1995; Prado et al. 2006; Kikuchi et al. 2009; Tada et al. 2011; Taylor et al. 2014; Bistolas et al. 2014).

Molecular characterization of pentatomid symbionts, largely using the 16S rRNA gene, has demonstrated that the primary symbionts of these insects are members of the Pantoea genus (Bansal et al. 2014; Bistolas et al. 2014; Duron & Noël 2016). Successful efforts thus far to cultivate Pantoea symbionts from pentatomids have been limited to phyllospheric Pantoea species that could infect Nezara viridula digestive tissues (Esquivel & Medrano 2014) and four Pantoea species that co-occurred with one of two additional uncultivable Pantoea (Hosokawa et al. 2016). Interestingly, the draft genomes of the uncultivable Plautia stali Pantoea symbionts were up to 50% reduced in comparison to the cultivable P. stali Pantoea symbionts, which supports previous work demonstrating that primary insect mutualists with reduced genomes have lost the genes necessary for growth independent of their hosts (Wernegreen 2015). Unlike the relatively well-sampled primary insect mutualists of members of the Auchenorrhyncha and Sternorrhyncha, genomic sequencing of stink bug primary insect mutualists has been limited to a few exemplars, including those associated with P. stali, spanning several stink bug families: ‘Can. Ishikawaella capsulata’ (Megacopta cribraria; Plataspididae) (Hosokawa et al. 2006), ‘Can. Tachikawaea gelatinosa’ (Urostylis westwoodii; Urostylididae) (Kaiwa et al. 2014), and ‘Can. Pantoea carbekii’ (Halyomorpha halys; Pentatomidae) (Kenyon et al. 2015). The paucity of primary insect mutualist genomes available for hosts, especially within a single insect family, precludes identifying patterns of genome evolution under the divergent symbiont transmission modality observed in stink bugs.

The genus Edessa is an exclusively neotropical stink bug genus that comprises over 300 species that exhibit significant morphological and ecological diversity, and the genus includes all but a few members of the subfamily Edessinae (Panizzi & Grazia 2015). Although the ecological roles of the Edessa are poorly described (Silva & Oliveira 2010; Panizzi & Grazia 2015), E. rufomarginata and E. meditabunda are generalists (Rizzo & Saini 1987) and potentially depend on several plants and distinct plant tissues for complete development (Panizzi & Machado-Neto 1992). Nonetheless, Edessa feed on plant material that is likely limited in essential amino acids and vitamins, and thus may rely upon primary bacterial symbionts to provision these and other nutrients limited in their host’s diet.

While symbiont transmission has not been previously documented in Edessa, maternal secretions on recently laid eggs, as is typical of pentatomids, are observed. Symbiont transmission by egg smearing is widespread throughout the Pentatomidae, and symbiont localization in the crypts has been documented for E. bella (Bistolas et al. 2014), both of which suggest that symbiont transmission and localization are likely the same for other Edessa. This study reports the de novo genome sequencing of the primary bacterial symbionts of four Edessa spp. (Pentatomidae). Phylogenetic and metabolic pathway inferences are used to describe Edessa primary symbiont evolution within the context of free-living Pantoea and previously described symbionts of stink bugs. Overall, this is the first comparative genomic analysis of stink bug primary symbionts that span stink bug hosts within the Edessa, Pentatomidae and Pentatomorpha. Additionally, primary symbiont genome evolution is detailed in this relatively under-sampled bacterial-insect symbiosis in the context of obligately intracellular insect endosymbionts.

Materials and Methods

Sample collection & Sequencing

Four Edessa species local to the La Selva Biological Station, Costa Rica were studied: E. eburatula, E. bella (previously called “Edessa sp. nov 1” in Bistolas et al. 2014 and formally described in Fernandes Marin et al. 2015), E. loxdalii and Edessa n. sp. Although the first two were previously characterized as each having a primary symbiont (Bistolas et al. 2014), the latter two edessines were found to harbor related symbionts in this study. Specimens were collected in June 2015 and preserved in 70% ethanol until dissection. Individuals were rinsed three times with filtered 70% ethanol before dissection, and the V4 region of the midgut was removed and subjected to DNA extraction using the DNeasy Blood and Tissue kit (Qiagen) with RNase treatment. Illumina libraries were created using the Nextera XT DNA Library Prep kit and sequenced using an Illumina MiSeq sequencer to generate 2x300 bp paired-end reads at the Ohio State University Molecular and Cellular Imaging Center.

Assembly and Annotation

Reads were evaluated with FastQC (Andrews, 2010) and trimmed using Trimmomatic 0.36 (Bolger et al. 2014) to trim Illumina indices and low quality bases. Individual libraries were assembled using SPAdes 3.5.0 (Bankevich et al. 2012). Bandage version 0.8.0 (Wick et al. 2015) was used to indicate possible plasmids as circular contigs, and infer connections between contigs and scaffolds based on the assembly de Bruijn graphs (See Supplementary Material: Genome Assembly). Additionally, all contigs were assigned to kingdom-level taxonomic bins by querying the NCBI GenBank ‘nt’ database using BLASTN (default parameters) with the contigs and separated based on the kingdom of the subject sequence according to the NCBI GenBank ‘taxdb’ database.
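The kingdom-level binning step can be illustrated with a minimal sketch. It assumes that the BLASTN hits have already been parsed into (contig, kingdom, bit score) records and that each contig is assigned to the kingdom of its highest-scoring hit; the record layout, the function name bin_contigs_by_kingdom, and the example values are hypothetical and do not reproduce the scripts used in this study.

```python
from collections import defaultdict


def bin_contigs_by_kingdom(hits):
    """Assign each contig to the kingdom of its highest-scoring BLAST hit.

    hits: iterable of (contig_id, kingdom, bit_score) tuples (assumed layout).
    Returns a dict mapping kingdom -> sorted list of contig ids.
    """
    best = {}
    for contig, kingdom, score in hits:
        # Keep only the best-scoring hit seen so far for each contig.
        if contig not in best or score > best[contig][1]:
            best[contig] = (kingdom, score)
    bins = defaultdict(list)
    for contig, (kingdom, _) in best.items():
        bins[kingdom].append(contig)
    return {kingdom: sorted(contigs) for kingdom, contigs in bins.items()}


# Hypothetical hits, for illustration only.
example_hits = [
    ("contig_1", "Bacteria", 950.0),
    ("contig_1", "Eukaryota", 120.0),
    ("contig_2", "Eukaryota", 800.0),
    ("contig_3", "Bacteria", 400.0),
]
bins = bin_contigs_by_kingdom(example_hits)
print(bins)
```

In practice the bins would be combined with read coverage, as described in the following paragraph, before deciding which scaffolds belong to the symbiont.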

Within each assembly, eight scaffolds could be distinguished from the rest due to read coverage binning and BLAST hit results, and, given that these scaffolds were linked in the deBruijn graphs, these results suggested that near-complete SoE genomes were captured (See Supplementary Material: ‘Genome Assemblies’). Each assembly resulted in a draft genome comprised of six scaffolds ranging between 47 and 378 kbp, each with an average per-base coverage ranging between 86x and 206x, and two shorter scaffolds

(~1,700 and ~3,600 bp) of higher coverage (270x-1,837x and 289x to 1932x, respectively) that corresponded to the 16S rRNA and the 5S+23S rRNA gene regions

(See Supplementary Material: Assembly Statistics). As the ends of several of the genome-comprising scaffolds contained regions of the rRNA gene, it is likely that the scaffolds for each genome could not be resolved into circular due to the

rRNA gene operons being too long to allow in silico resolution of assembly conflicts.

Putative plasmids were detected in all of the genome projects and were comprised of single scaffolds that had paired reads that mapped at the ends of the scaffold, had best

BLAST hits either to Bacteria or previously identified plasmids and generally had average per base read coverage that was higher than observed in chromosomal scaffolds.

Coding regions were identified using Prokka (Seemann 2014), which uses Prodigal

(Hyatt et al. 2010) for CDS prediction, RNAmmer (Lagesen et al. 2007) for ribosomal genes, Aragorn (Laslett & Canback 2004) for tRNAs, SignalP (Petersen et al. 2011) for signal peptides, and Infernal (Kolbe & Eddy 2011) for non-coding RNAs. Functional annotation was performed in Prokka with the UniProt (Consortium 2017), Pfam (Finn et al. 2014), and TIGRFAMs (Selengut et al. 2007) databases. Additional functional annotation was performed using the KEGG (Kanehisa et al. 2016) and COG (Galperin et al. 2015) databases with KAAS (Moriya et al. 2007) and cdd2cog (Leimbach, 2016), respectively. Genomes and annotations were inspected and manually edited in Geneious v8 (Kearse et al. 2012). Metabolic reconstruction was performed using the KEGG and

MetaCyc (Caspi et al. 2016) databases and web portals. Pseudogenes were identified by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) and by comparing annotated protein length to the reference length of each protein.
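The length-based part of the pseudogene screen can be sketched as follows. This is an illustration only: the 0.8 length-ratio cutoff, the function name flag_length_pseudogenes, and the example lengths are assumptions made for the sketch, since the text does not state the threshold that was applied.

```python
def flag_length_pseudogenes(annotated, reference, min_ratio=0.8):
    """Return gene ids whose annotated protein is much shorter than its reference.

    annotated, reference: dicts mapping gene id -> protein length (amino acids).
    min_ratio: assumed cutoff (0.8 is illustrative, not a value from the study).
    """
    flagged = []
    for gene, length in annotated.items():
        ref_len = reference.get(gene)
        # Genes without a reference length cannot be evaluated and are skipped.
        if ref_len and length / ref_len < min_ratio:
            flagged.append(gene)
    return sorted(flagged)


# Hypothetical lengths, for illustration only.
annotated_lengths = {"serA": 95, "serC": 360, "gltA": 110}
reference_lengths = {"serA": 410, "serC": 362, "gltA": 427}
candidates = flag_length_pseudogenes(annotated_lengths, reference_lengths)
print(candidates)
```

Candidates flagged this way would still need manual inspection, since a short annotation can also result from an assembly break.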

Metabolic reconstructions of the Edessa symbionts were compared to publicly available primary stink bug symbiont genomes: P. carbekii (GenBank accession number

NZ_CP010907.1), ‘Candidatus Ishikawaella capsulata’ (henceforth I. capsulata,


NZ_AP010872.1) and ‘Candidatus Tachikawaea gelatinosa’ (henceforth T. gelatinosa,

NZ_AP014521.1). To compare the functional profiles of the Edessa symbiont genomes, we also generated the COG profiles with the same method for the five available complete reference genomes of non-insect associated species of the genus Pantoea: (NZ_CP016889), P. ananatis (NC_013956), P. stewartii (NZ_CP017581),

P. rwandensis (NZ_CP009454), and P. vagans (NC_014562). We compared the profiles using R v3.3.3 (R Core Team 2017) and used a two-tailed nested ANOVA for comparisons between symbionts and non-symbionts.
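A COG profile in this sense is the relative share of annotated genes in each COG category. The sketch below shows how such profiles could be computed and contrasted before any statistical test; the function name cog_profile and the category assignments are hypothetical, and the nested ANOVA itself (performed in R in this study) is not reproduced here.

```python
from collections import Counter


def cog_profile(categories):
    """Convert a list of per-gene COG category letters into relative proportions."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


# Hypothetical per-gene category assignments, for illustration only.
symbiont = cog_profile(["E", "E", "J", "J", "J", "G"])
free_living = cog_profile(["E", "J", "G", "G", "G", "N"])

# Difference in relative representation (symbiont minus free-living) per category.
all_categories = sorted(set(symbiont) | set(free_living))
difference = {
    cat: symbiont.get(cat, 0.0) - free_living.get(cat, 0.0) for cat in all_categories
}
print(difference)
```

Using proportions rather than raw counts keeps genomes of very different sizes comparable.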

Phylogenetic Reconstruction

Multilocus bacterial phylogenies using protein coding sequences from up to 35 members of the Enterobacteriaceae sensu lato (See Supplementary Table S2 for genome accession numbers), including Edessa primary symbionts, were generated using RAxML

(Stamatakis 2014) for maximum likelihood estimation, FastTree for approximate maximum likelihood (Price et al. 2010) and PhyloBayes (Lartillot et al. 2007) for

Bayesian inference. Alignments on gene clusters were done using TranslatorX (Abascal et al. 2010) and inspected manually and concatenated in Geneious v8.

Bayesian and maximum likelihood phylogenies were reconstructed using eleven protein coding genes from all four Edessa primary symbionts and 35 non-primary insect symbiont members of the Enterobacteriaceae to contextualize them within the

Enterobacteriaceae (See Supplementary Material S1 and Tables S3-4 for assessment of the phylogenetic runs). Next, we inferred the placement of the Edessa primary symbionts within the genus Pantoea using maximum likelihood and approximate maximum likelihood methods on 322 shared protein coding regions between the Edessa primary symbionts and 122 Pantoea species and strains (See Supplementary Material S6-S10).

Additionally, individual Edessa primary symbiont phylogenies were reconstructed using a subset of representative Enterobacteriaceae species exclusive of all other reduced genome insect symbionts to address potentially confounding effects of using one or more symbionts in the inference of the position of the other symbionts (Husník et al. 2011).

Genome Synteny and Average Nucleotide Identity (ANI)

Inter-Edessa primary symbiont pairwise genome alignments were made using LastZ

(parameters: step size =20, –nogapped –notransition) and plotted using R to identify syntenic regions (Harris 2007). Similar methods were used to identify syntenic regions between an exemplar Edessa primary symbiont and four non-insect primary symbiont

Pantoea species (i.e. P. stewartii, P. agglomerans, P. rwandensis and P. ananatis) and P. carbekii. Average Nucleotide Identity (ANI) was calculated for whole genomes using

JSpecies (Richter & Rosselló-Móra 2009) with the ‘nucmer’ option as a way to assess evolutionary divergence from the previously named Pantoea species.

Mutation rates

Synonymous (dS) and nonsynonymous (dN) substitution rate values were calculated for all shared genes between the four symbionts of Edessa, as well as P. agglomerans, P. ananatis, P. stewartii, and P. rwandensis. Shared genes were identified using reciprocal best hit BLAST searches between all pairs of genomes with custom Python and R scripts, and substitution rates were estimated with PAML (Yang 2007), using codeml (parameters: runmode -2, codonfreq 2, nssites 0, model 2).
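The reciprocal best hit criterion can be sketched in a few lines. The example assumes that BLAST results for each direction have been parsed into (query, subject, bit score) records; the function names and the example identifiers are hypothetical and are not the custom scripts referred to above.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples (assumed layout).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between genome A and genome B, for illustration only.
a_vs_b = [("A1", "B1", 500.0), ("A1", "B2", 200.0), ("A2", "B2", 300.0), ("A3", "B1", 450.0)]
b_vs_a = [("B1", "A1", 510.0), ("B2", "A2", 310.0), ("B2", "A1", 190.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here A3 is dropped because its best hit, B1, points back to A1 rather than to A3.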

For each of these we generated the maximum likelihood non-rooted RAxML tree from 5 tree searches. The omega ratio (dN/dS) and dS values were compared between the symbionts of the Edessa and the non-insect associated Pantoea. Genes with saturated values (>3.0) were excluded from the analysis. Additionally, the COG category as annotated by the COG database was used to evaluate differences according to gene function. Two tailed nested ANOVA tests were done in R to evaluate significant differences between symbiotic and non-symbiotic taxa.
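The saturation filter and the omega ratio can be illustrated with a short sketch. It assumes that the 3.0 cutoff applies to dS, as stated later for the SoE branch, and that per-gene dN and dS estimates are already available; the function name, the additional exclusion of genes with dS of zero, and the example values are assumptions made for this illustration.

```python
def omega_values(estimates, ds_cap=3.0):
    """Compute omega = dN/dS per gene, excluding saturated or undefined cases.

    estimates: dict mapping gene id -> (dN, dS).
    ds_cap: genes with dS above this value are treated as saturated and dropped.
    Genes with dS <= 0 are also dropped here (an assumption) to avoid dividing by zero.
    """
    kept = {}
    for gene, (dn, ds) in estimates.items():
        if ds > ds_cap or ds <= 0:
            continue
        kept[gene] = dn / ds
    return kept


# Hypothetical estimates, for illustration only.
example_estimates = {
    "geneA": (0.05, 1.0),
    "geneB": (0.20, 4.5),  # saturated: dS > 3.0
    "geneC": (0.12, 2.4),
}
omegas = omega_values(example_estimates)
print(omegas)
```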

Results and Discussion

Edessa primary symbiont genomes are highly reduced

Full genome data were obtained for four Edessa bacterial symbionts, hereafter referred to collectively as the ‘Symbionts of Edessa’ (SoE) and individually as SoEL (symbiont of E. loxdalii), SoEE (symbiont of E. eburatula), SoEO (symbiont of E. bella), and SoET (symbiont of Edessa n. sp.). In each assembly, eight scaffolds were consistently recovered that comprised the chromosomal genome, and the assemblies appeared to break at rRNA operons. Additionally, three plasmids, pSOE1 (12.8–13.7 kb), pSOE2 (3.4–4.3 kb) and pSOE3 (2.9–3.5 kb), were identified in all four Edessa genome projects, while a fourth plasmid, pSOE4 (2.6–3.0 kb), was identified only in SoEL, SoEE and SoEO.


Gene order was largely maintained across pSOE1, with the following loci being represented: RepB replication protein (repB), a thiamine biosynthesis operon (thiCDEFGHS), glutamate dehydrogenase (gdhA), oxalate decarboxylase (oxdD), the phenolic acid decarboxylase C (bsdC), and two hypothetical proteins. Plasmids pSOE2, pSOE3, and pSOE4 also exhibited high gene order conservation and contained the genes for repA, a hypothetical protein, and either the phospholipase D precursor gene (pld) in pSOE2, the heat shock protein (ibpA) and site-specific recombinase (hin) in pSOE3, or a YGGT family protein gene in pSOE4. Given the presence and functional conservation of pSOE1-pSOE3 in all four symbionts, it is likely that these plasmids were already present in the ancestor prior to Edessa speciation and contribute to the maintenance of the mutualism.

The SoE genome sizes ranged from 0.81 to 0.83 Mb, which are smaller than other primary symbiont genomes (e.g. P. carbekii: 1.2 Mb; Plautia stali: 2.4-5.5

Mb), but slightly larger than the genomes of more distantly related pentatomorpha stink bug primary symbionts (e.g. I. capsulata: 0.75 Mb; T. gelatinosa: 0.71 Mb) (Table 3.1).

SoE genomes exhibited A+T% biases (~27% G+C%) that were generally lower than all other stink bug symbionts and other nonhost-restricted Pantoea (Figure 3.1), yet they were more similar to the A+T% bias observed in the genomes of the gamma proteobacterium symbionts of aphids, Buchnera aphidicola (~25% G+C%; based on 27

Buchnera genomes deposited in NCBI GenBank). Protein coding density in SoE genomes was greater than in P. carbekii and generally more similar to the pentatomorpha stink bug primary symbionts.
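The two summary statistics discussed here, G+C content and protein coding density, can be computed as in the minimal sketch below. The function names, the 0-based half-open interval convention, and the toy inputs are assumptions made for illustration; the values reported in this chapter come from the annotated assemblies, not from this code.

```python
def gc_fraction(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def coding_density(cds_intervals, genome_length):
    """Fraction of genome positions covered by at least one CDS.

    cds_intervals: (start, end) pairs, 0-based and half-open (assumed convention).
    Overlapping or nested intervals are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(cds_intervals):
        start = max(start, last_end)  # skip positions already counted
        if end > start:
            covered += end - start
            last_end = end
    return covered / genome_length


# Toy inputs, for illustration only.
print(gc_fraction("ATGCATAT"))
print(coding_density([(0, 30), (20, 50), (60, 80)], 100))
```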


The SoE represent a distinct, rapidly evolving Pantoea species

Phylogenies inferred by two methods (RAxML and PhyloBayes) consistently placed the common ancestor of the Edessa symbionts within the Enterobacteriaceae sensu lato, in what is now considered the Erwiniaceae (Adeolu et al. 2016). They are firmly placed amongst the Pantoea (Figure 3.2), similarly to P. carbekii, the various symbionts of P. stali, and previous phylogenetic reconstructions of other stink bug symbionts (Duron & Noël 2016;

Bistolas et al. 2014; Kenyon et al. 2015). In order to disentangle the effects of long branch attraction artifacts, several Pantoea phylogenies were reconstructed using a jackknife or ‘leave-one-out’ approach, where a single SoE symbiont was used at a time to evaluate if the SoE shared a recent common ancestor with P. carbekii (See

Supplementary Material S2-S5 and S7-10). By this approach, all four SoE symbionts were consistently placed at the base of a P. ananatis-P. stewartii-P. agglomerans clade in trees constructed either with individual or all SoE symbionts, but excluding P. carbekii

(Figure 3.3). Meanwhile, similar phylogenies generated with P. carbekii, but excluding all SoE symbionts, placed P. carbekii in a distinct clade closer to P. dispersa, P. rodasii, and P. rwandensis, indicating that this symbiont would have originated from a different clade of the Pantoea. Although Pantoea phylogenies that included both SoE and P. carbekii yielded a single clade comprised exclusively of these taxa, the authors warn against inferring ancestral relationships from these results, as this is likely a long branch attraction artifact, which is common in insect symbiont phylogenies and distorts conclusions made about their relatedness (Ruano-Rubio & Fares 2007).
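The taxon sampling behind the single-symbiont reconstructions can be sketched as follows: each analysis keeps the same background taxa and adds exactly one SoE symbiont. The function name and the short background list are hypothetical; the actual analyses used the Pantoea genomes listed in the Supplementary Material.

```python
def single_symbiont_taxon_sets(background_taxa, symbionts):
    """Build one taxon set per symbiont: all background taxa plus that symbiont only."""
    return {symbiont: list(background_taxa) + [symbiont] for symbiont in symbionts}


# Illustrative background; the real analyses used many more Pantoea genomes.
background = ["P. ananatis", "P. stewartii", "P. agglomerans", "P. dispersa"]
soe = ["SoEL", "SoEE", "SoEO", "SoET"]

taxon_sets = single_symbiont_taxon_sets(background, soe)
for name, taxa in taxon_sets.items():
    print(name, taxa)
```

Each taxon set would then be aligned and analyzed separately, so that no two long-branched symbionts are present in the same tree.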


The branch of the SoE is significantly longer than the branch for any other non-symbiotic

Pantoea indicating an accelerated rate of evolution, characteristic of genome reduced symbionts. Synonymous substitutions along the branch leading up to all four SoE were found to be saturated (dS>3.0), disallowing direct comparisons to non-symbiotic members. Comparisons between rates found between symbionts and between distantly related non-symbiotic species of Pantoea yielded no significant differences between groups when separated by COG category (p-value >>0.05) except for category R –

General Function Prediction Only. However, taken together, the omega values from the

SoE are slightly higher than for the non-insect associated Pantoea. No single gene showed a dN/dS value higher than 1. This does not necessarily mean that no genes are under strong selection, but is consistent with hypotheses implicating genetic drift as having a significant impact on insect symbiont genome evolution.

SoE Genomes Exhibit Considerable Intraspecific Structural Conservation

All four SoE genomes were nearly completely syntenic to each other and exhibited a highly conserved gene order, with a single 25 kb inversion in scaffold 1 of SoEL being observed (Figure 3.4a). Fifteen genes, including most of the biotin biosynthesis operon, were found within this inverted region. On the other hand, reduced genomic architectural conservation was observed when an exemplar of the SoE genomes (SoEE) was compared to P. carbekii (Figure 3.4b), contrasting with the high conservation amongst the SoE.

Alignments of the Edessa symbionts were compared to alignments made for non-host restricted Pantoea (Figure 3.4c), and although a general trend of gene order is evident, a large number of recombination events are present. This suggests that the genomic rearrangements between the four SoE and P. carbekii are likely due to recombination events that occurred in their ancestors before they were associated with their hosts or at very early stages of their association with their host, as seen in nodule symbionts

(Pinto-Carbó et al. 2016). This supports the hypothesis that SoE represents an independent origin of symbiosis from a non-symbiotic ancestor and not from P. carbekii.

Genome architectural conservation has been observed for several obligate intracellular insect symbionts with few or no deviations from complete colinearity (Tamas et al. 2002;

Degnan et al. 2005; Sabree et al. 2010) and has been attributed to the loss of phages, mobile elements, and genes encoding recombination proteins, and/or to virtually no interaction with other bacterial species due to their host-restricted lifestyle. Extracellular symbionts on average have fewer mobile elements than facultative intracellular symbionts, but more than obligate intracellular symbionts (Newton & Bordenstein 2011). While stability tends to be the trend, several factors can alter this stability even in very restricted symbionts, such as the presence of large repetitive intergenic regions as in Portiera (Sloan & Moran 2013), long and variable host life as in Hodgkinia (Campbell et al. 2015), and genome fusion and subsequent degeneration as in Tremblaya (Gil et al. 2017).

However, the absence of recombination events among the SoE indicates surprising stability and sparse contact with other bacteria, despite not only being extracellular symbionts, but being exposed to the outside of their host and forced to go through the digestive tract, without being encapsulated during transmission like other extracellular symbionts of comparable genome size (Hosokawa et al. 2006; Kaiwa et al. 2014). This mechanism would increase chances of the SoE coming in contact with other related bacteria as well as open the possibility of symbiont replacement. The fact that these symbionts have reached this level of reduction indicates there is likely high pressure from the host to maintain strict inheritance or a mechanism for symbiont sorting, such as the one found in bean bugs (Ohbayashi et al. 2015). Details of the transmission of stink bug symbionts and their establishment in the midgut crypts could be highly informative for understanding this apparent high fidelity to their host.

SoE Are Enriched in Functions Supportive of Host-Symbiont Association

When COG profiles of non-host-associated Pantoea were compared to those of the SoE, significantly greater relative representation in the SoE was observed in the following categories: general function prediction only, amino acid transport and metabolism, translation, ribosomal structure and biogenesis, energy production and conversion, inorganic ion transport and metabolism, coenzyme transport and metabolism, replication, recombination and repair, posttranslational modification and chaperones, and nucleotide transport and metabolism (Figure 3.5). Most of these can be explained by the host-restricted lifestyle, where only the most vital and necessary functions are maintained while those encoding secondary functions are lost (Kenyon & Sabree 2014; Moran & Bennett 2014). This was exhibited in the SoE genomes, which had a lower proportion of genes dedicated to the categories of carbohydrate metabolism and transport, function unknown, , cell wall, cell membrane and envelope biogenesis, signal transduction, cell motility, intracellular trafficking, secretion, and vesicular transport. The loss of genes that provide these functional redundancies concurrent with many other loci tangential to the mutualism, and the retention of the relatively few genes essential for amino acid metabolism, has resulted in an overall dramatic reduction in the genome sizes of SoE and the preservation of the skewed proportion of genes in SoE genomes that support the mutualism.

SoE Can Provide Several Amino Acids, Vitamins and Cofactors

Stink bug symbionts are thought to generally provide the host with amino acids, vitamins and cofactors that are underrepresented in their herbivorous diets (Kenyon et al. 2015;

Nikoh et al. 2011), and all of the SoE genomes encode complete or near complete canonical pathways for most essential amino acids, including branched chain amino acids

(valine, isoleucine, and leucine) whose pathways are nearly complete except for the lack of the ilvE gene encoding the terminal aminotransferase (Figure 3.6). This gene is conspicuously absent in the genomes of multiple other insect mutualists, including P. carbekii, I. capsulata, T. gelatinosa, Buchnera, and Candidatus “Uzinura aspidicola”, and it has been shown that a host encoded equivalent is up-regulated in the of the hosts for intracellular symbionts (Hansen & Moran 2011; Husnik et al. 2013; Luan et al. 2015). Edessa primary symbionts can also synthesize some nonessential amino acids (i.e. glutamate, aspartate, alanine, cysteine, tyrosine), and notable gene deletions in several pathways have been observed. Canonical proline biosynthesis requires the products of proA, proB, proC, and putA; however, the latter two genes are absent in all four genomes, a pattern that is mirrored in P. carbekii (Kenyon et al. 2015). Although

SoET and SoEL encode the complete canonical serine biosynthesis pathway that requires the products of serA, serB, and serC, the first two genes are either absent or pseudogenized in SoEO and SoEE. P. carbekii, I. capsulata, T. gelatinosa, Buchnera, Baumannia, and Blochmannia also lack intact versions of either or both serA and serB. In SoEE, serB is missing, serC is retained, but serA appears to be pseudogenized, where four shortened ORFs can be found in the region where serA is found in SoET and SoEL. It is unclear whether the products of these four smaller ORFs fulfill the role of SerA for this symbiont, but their presence suggests a potentially recent pseudogenization. In the case of asparagine, only SoEE encodes asparagine synthetase B (asnB), which is sufficient for the asparagine biosynthesis from aspartate. This enzyme is also absent in several other insect symbionts, including I. capsulata and T. gelatinosa, while being present in Wigglesworthia, and is pseudogenized in P. carbekii (Kenyon et al. 2015).

Arginine biosynthesis is accomplished in SoE by a nearly canonical pathway that includes the replacement of the acetylornithine aminotransferase ArgD with the succinylornithine transaminase AstC, which also participates in ornithine degradation

(Kim & Copley 2007). This replacement appears to be consistent across the other stink bug symbionts and it is notable that enzymes that are inferred to participate in multiple metabolic pathways are retained during SoE genome reduction. This phenomenon has been observed in other bacterial endosymbionts of insects (e.g. Sulcia muelleri and


Sodalis; Koga & Moran 2014) and reflects a convergence upon this evolutionary trajectory.

Thiamine and biotin can be generated by all of the SoE, with the former being an essential cofactor in all three domains of life (Costliow & Degnan 2017) and the latter being a key metabolite underlying other insect-microbe symbioses (Nikoh et al. 2014).

While all the genes for the biosynthesis of biotin are present in the SoE genomes, several genes for thiamine biosynthesis (thiCDEFGHS) are on the plasmid pSOE1 that is present in all of the SoE genomes. P. carbekii is able to produce thiamine but not biotin, while the reverse is true for I. capsulata and T. gelatinosa. Interestingly, among stink bug symbionts, T. gelatinosa is the only one encoding a full thiamine transporter despite sharing the absence of the biosynthesis pathway with I. capsulata. Additionally, all SoE encode nearly all of the enzymes for producing folate, another critical vitamin for life, yet they lack nudB (dihydroneopterin triphosphate pyrophosphohydrolase), pabC

(aminodeoxychorismate lyase) and pabB (4-amino-4-deoxychorismate synthase), which catalyze the committed step of folate synthesis in bacteria (Gabelli et al. 2007) and chorismate incorporation into folate biosynthesis, respectively. nudB is also missing in P. carbekii, I. capsulata and T. gelatinosa, while pabC and pabB are missing in T. gelatinosa. Finally, all of the SoE encode pathways for riboflavin biosynthesis except for lacking the 5-amino-6-(5-phospho-D-ribitylamino)uracil phosphatase YigB, which is present in other stink bug symbionts, yet several other phosphatases are encoded on the genomes that may complement this missing function.


SoE Exhibit Intra-Specific Variation and Inter-Symbiont Convergence in Carbon Metabolism

The TCA cycle shows an interesting variation among the SoE and other symbionts.

While SoEE and SoET contain the full TCA cycle, SoEL lacks gltA, acnA, and icd, which catalyze the half of the cycle from oxaloacetate to 2-oxoglutarate. SoEO also lacks icd, but both gltA and acnA are pseudogenized, the first being split into four coding regions, while the second has an early stop codon. Whether the proteins encoded by the coding regions remaining in gltA can perform the role of the complete enzyme is unknown. This case shows how, among genomes from four closely related symbionts, three stages of potential genomic degeneration of an important metabolic pathway are observed. While P. carbekii and I. capsulata retain all the enzymes of the

TCA cycle, T. gelatinosa has lost almost the entire pathway while retaining genes encoding SucAB that catalyze the single step between 2-oxoglutarate and succinyl-CoA.

Additionally, the intergenic region between the genes surrounding icd is only ~500 bp in SoEL, and a BLAST search against the ‘nt’ database results in no hits with high query coverage, while the same intergenic region in SoEO is ~1,500 bp and a BLAST search returns the icd protein of P. carbekii and I. capsulata, which is likely due to a recent loss.

Given that glutamate dehydrogenase (GdhA) can generate 2-oxoglutarate/alpha-ketoglutaric acid from glutamate, it is possible for SoE, all of which encode gdhA on their plasmids, to derive sufficient energy from their incomplete TCA cycles (Koga & Moran 2014).
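The three stages of degeneration described above for the oxaloacetate to 2-oxoglutarate half of the TCA cycle can be summarized programmatically. The gene states below are taken from the text; the stage labels ("complete", "degenerating", "lost") and the function name are labels introduced only for this sketch.

```python
# Gene states for gltA, acnA, and icd as described in the text.
TCA_GENE_STATUS = {
    "SoEE": {"gltA": "intact", "acnA": "intact", "icd": "intact"},
    "SoET": {"gltA": "intact", "acnA": "intact", "icd": "intact"},
    "SoEO": {"gltA": "pseudogene", "acnA": "pseudogene", "icd": "absent"},
    "SoEL": {"gltA": "absent", "acnA": "absent", "icd": "absent"},
}


def degeneration_stage(gene_status):
    """Classify a pathway segment by the states of its genes (illustrative labels)."""
    states = set(gene_status.values())
    if states == {"intact"}:
        return "complete"
    if states == {"absent"}:
        return "lost"
    return "degenerating"


stages = {name: degeneration_stage(genes) for name, genes in TCA_GENE_STATUS.items()}
print(stages)
```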


SoE genomes also exhibit convergence with other insect symbionts in that all SoE retain only the non-oxidative branch of the pentose phosphate cycle, which is similar to

Wigglesworthia but unlike P. carbekii and I. capsulata that both retain the full cycle.

Additionally, all SoE encode full pathways for glycolysis and the production of G3P from glucose-6-phosphate, yet lack phosphoglucomutase (pgm) for the transfer of the phosphate group in glucose, which is reflected in I. capsulata, T. gelatinosa, and Buchnera, and distinct from P. carbekii, which retains genes encoding the entire pathway.

Proposal of Candidate names

The SoE have been shown to be exclusively found in the midgut crypts of the Edessa, and comparisons to other primary symbionts of stink bugs, such as P. carbekii, I. capsulata, and T. gelatinosa, reveal several common motifs of primary unculturable symbionts. The symbiont loci recovered in this study show near perfect identity to those previously sequenced (Bistolas et al. 2014), with individuals collected three years apart, indicating their stability in their host’s population. A protocol for distinguishing the SoE from other stink bug symbionts through FISH microscopy has been established (Bistolas et al. 2014), and the reported multilocus phylogenies and extensive genome sequencing and analyses all suggest that SoE represent a distinct Pantoea species. These data satisfy the recommendations for the naming of Candidatus microbial species (Stackebrandt et al.

2002), and the description of several strains is also contained in this paper. Additionally, they satisfy the conditions recently proposed for uncultivated microbes (Konstantinidis et al. 2017) that require a near-complete genome sequence with no contamination, ecological data on the microbe’s habitat, reliable rDNA sequence, as well as a picture of the . Based on these conditions, we propose the candidate name

“Candidatus Pantoea edessiphila” for the symbionts of the Edessa, due to their close association with their hosts, and we also propose strain designations to distinguish the pentatomid host with which it is associated: “Candidatus Pantoea edessiphila” strain SoEL (symbiont of E. loxdalii); “Candidatus Pantoea edessiphila” strain SoEE (symbiont of E. eburatula); “Candidatus Pantoea edessiphila” strain SoEO (symbiont of E. bella); and “Candidatus Pantoea edessiphila” strain SoET (symbiont of Edessa sp.).

Conclusion

Genomic analysis of the symbionts of Edessa pentatomid stink bugs has revealed several typical signatures of strong host association and vertical inheritance (i.e. significant genome reduction, retention of mutualism-supportive genes, A+T% bias), yet these features are observed in symbionts that neither reside within host cells nor are perpetually within host tissues. Both phylogenetic and genomic structural analyses support the placement of these symbionts within a novel clade amongst the Pantoea that is distinct from previously sequenced stink bug symbionts. Among the Edessa symbionts, metabolic capabilities vary to a small degree, and gene and metabolic pathway losses parallel genome streamlining in distant, unrelated insect symbionts. Exploiting the availability of both insects and low-cost sequencing technology facilitates the analysis of symbionts of closely related insects to infer a more robust depiction of both the relationship between the bacterial symbionts and their hosts and the evolutionary history of the bacterial symbiont as written in their genomes.

Supplementary Material

Supplementary Material is available at Genome Biology and Evolution online

(http://www.gbe.oxfordjournals.org/).

Acknowledgements

The authors thank La Selva Biological Research Station and staff, as well as the Organization for Tropical Studies (OTS) and CONAGEBio, and Dr. Fernandes Marin for his help with identification of specimens. This work was supported by the Ohio Agricultural Research and Development Center SEEDS Research Enhancement-Interdisciplinary Grant, The Ohio State University, the computation resources of the Ohio Biodiversity Conservation Partnership (OBCP) cluster, and the Colombian Administrative Department of Science, Technology and Innovation COLCIENCIAS.


References

Abascal F, Zardoya R, Telford MJ. 2010. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 38:W7-13. doi: 10.1093/nar/gkq291.

Abe Y, Mishiro K, Takanashi M. 1995. Symbiont of Brown-Winged Green Bug, Plautia stali SCOTT. Japanese J. Appl. Entomol. Zool. 39:109–115. doi: 10.1303/jjaez.39.109.

Adeolu M, Alnajar S, Naushad S, Gupta RS. 2016. Genome-based phylogeny and of the ‘Enterobacteriales’: Proposal for ord. nov. divided into the families Enterobacteriaceae, Erwiniaceae fam. nov., Pectobacteriaceae fam. nov., Yersiniaceae fam. nov., Hafniaceae fam. nov., Morgane. Int. J. Syst. Evol. Microbiol. 66:5575–5599. doi: 10.1099/ijsem.0.001485.

Bankevich A et al. 2012. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 19:455–477. doi: 10.1089/cmb.2012.0021.

Bansal R, Michel A, Sabree Z. 2014. The Crypt-Dwelling Primary Bacterial Symbiont of the Polyphagous Pentatomid Pest Halyomorpha halys (Hemiptera: Pentatomidae). Environ. Entomol. 617–625.

Bistolas KSI, Sakamoto RI, Fernandes JAM, Goffredi SK. 2014. Symbiont polyphyly, co-evolution, and necessity in pentatomid stink bugs from Costa Rica. Front. Microbiol. 5:1–15. doi: 10.3389/fmicb.2014.00349.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30:2114–2120. doi: 10.1093/bioinformatics/btu170.

Buchner P. 1965. Endosymbiosis of with plant microorganisms. Interscience Publishers: New York.

Campbell MA et al. 2015. Genome expansion via lineage splitting and genome reduction in the endosymbiont Hodgkinia. Proc. Natl. Acad. Sci. U. S. A. 112:10192–9. doi: 10.1073/pnas.1421386112.

Caspi R et al. 2016. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44:D471–D480. doi: 10.1093/nar/gkv1164.


Consortium U. 2017. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45:D158–D169. doi: 10.1093/nar/gkw1099.

Costliow ZA, Degnan PH. 2017. Thiamine Acquisition Strategies Impact Metabolism and in the Gut Microbe Bacteroides thetaiotaomicron Gilbert, JA, editor. mSystems. 2:e00116-17. doi: 10.1128/mSystems.00116-17.

Degnan PH, Lazarus AB, Wernegreen JJ. 2005. Genome sequence of Blochmannia pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects. Genome Res. 15:1023–33. doi: 10.1101/gr.3771305.

Duron O, Noël V. 2016. A wide diversity of Pantoea lineages are engaged in mutualistic symbiosis and cospeciation processes with stink bugs. Environ. Microbiol. Rep. doi: 10.1111/1758-2229.12432.

Eisen JA, Heidelberg JF, White O, Salzberg SL. 2000. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 1:research0011.1. doi: 10.1186/gb-2000-1-6-research0011.

Esquivel JF, Medrano EG. 2014. Ingestion of a Marked Bacterial Pathogen of Cotton Conclusively Demonstrates Feeding by First Southern (Hemiptera: Pentatomidae). Environ. Entomol. 43:110–115. doi: 10.1603/EN13051.

Fernandes Marin JA, Juliete Da Silva V, Oliveira Correia A, Mendes Nunes B. 2015. New species of Edessa Fabricius, 1803 (Hemiptera: Pentatomidae) from Costa Rica. Zootaxa. 3999:511–536.

Finn RD et al. 2014. Pfam: the protein families database. Nucleic Acids Res. 42:D222–D230. doi: 10.1093/nar/gkt1223.

Gabelli SB et al. 2007. Structure and Function of the E. coli Dihydroneopterin Triphosphate Pyrophosphatase: A Nudix Enzyme Involved in Folate Biosynthesis. Structure. 15:1014–1022. doi: 10.1016/j.str.2007.06.018.

Galperin MY, Makarova KS, Wolf YI, Koonin E V. 2015. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43:D261–D269. doi: 10.1093/nar/gku1223.

Gil R et al. 2017. Tremblaya phenacola PPER: an evolutionary beta-gammaproteobacterium collage. ISME J. doi: 10.1038/ismej.2017.144.

Gordon ERL, McFrederick Q, Weirauch C. 2016. Phylogenetic Evidence for Ancient and Persistent Environmental Symbiont Reacquisition in (Hemiptera: Heteroptera) Stabb, E V., editor. Appl. Environ. Microbiol. 82:7123–7133. doi: 10.1128/AEM.02114-16.

Hansen AK, Moran NA. 2011. Aphid genome expression reveals host-symbiont cooperation in the production of amino acids. Proc. Natl. Acad. Sci. U. S. A. 108:2849–54. doi: 10.1073/pnas.1013465108.

Harris RS. 2007. Improved pairwise alignment of genomic DNA. PhD Thesis.

Hosokawa T et al. 2016. Obligate bacterial mutualists evolving from environmental bacteria in natural insect populations. Nat. Microbiol. 1:15011. doi: 10.1038/nmicrobiol.2015.11.

Hosokawa T, Kikuchi Y, Nikoh N, Shimada M, Fukatsu T. 2006. Strict host-symbiont cospeciation and reductive genome evolution in insect gut bacteria. PLoS Biol. 4:1841–1851. doi: 10.1371/journal.pbio.0040337.

Husník F, Chrudimský T, Hypša V. 2011. Multiple origins of endosymbiosis within the Enterobacteriaceae (γ-): convergence of complex phylogenetic approaches. BMC Biol. 9:87. doi: 10.1186/1741-7007-9-87.

Husnik F et al. 2013. Horizontal Gene Transfer from Diverse Bacteria to an Insect Genome Enables a Tripartite Nested Symbiosis. Cell. 153:1567–1578. doi: 10.1016/j.cell.2013.05.040.

Hyatt D et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11:119. doi: 10.1186/1471-2105-11-119.

Kaiwa N et al. 2010. Primary gut symbiont and secondary, Sodalis-allied symbiont of the scutellerid stink bug Cantao ocellatus. Appl. Environ. Microbiol. 76:3486–3494. doi: 10.1128/AEM.00421-10.

Kaiwa N et al. 2014. Symbiont-Supplemented Maternal Investment Underpinning Host’s Ecological Adaptation. Curr. Biol. 1–6. doi: 10.1016/j.cub.2014.08.065.

Kanehisa M et al. 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44:D457–D462. doi: 10.1093/nar/gkv1070.

Kearse M et al. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 28:1647–1649. doi: 10.1093/bioinformatics/bts199.

Kenyon LJ, Meulia T, Sabree ZL. 2015. Habitat Visualization and Genomic Analysis of ‘Candidatus Pantoea carbekii,’ the Primary Symbiont of the Brown Marmorated Stink Bug. Genome Biol. Evol. 7:620–635. doi: 10.1093/gbe/evv006.


Kikuchi Y et al. 2009. Host-symbiont co-speciation and reductive genome evolution in gut symbiotic bacteria of acanthosomatid stink bugs. BMC Biol. 7:2. doi: 10.1186/1741-7007-7-2.

Kim J, Copley SD. 2007. Why Metabolic Enzymes Are Essential or Nonessential for Growth of Escherichia coli K12 on Glucose. Biochemistry. 46:12501–12511. doi: 10.1021/bi7014629.

Koga R, Moran NA. 2014. Swapping symbionts in spittlebugs: evolutionary replacement of a reduced genome symbiont. ISME J. 8:1237–46. doi: 10.1038/ismej.2013.235.

Kolbe DL, Eddy SR. 2011. Fast filtering for RNA homology search. Bioinformatics. 27:3102–3109. doi: 10.1093/bioinformatics/btr545.

Konstantinidis KT, Rosselló-Móra R, Amann R. 2017. Uncultivated microbes in need of their own taxonomy. ISME J. doi: 10.1038/ismej.2017.113.

Lagesen K et al. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35:3100–3108. doi: 10.1093/nar/gkm160.

Lartillot N et al. 2007. Suppression of long-branch attraction artefacts in the phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7:S4. doi: 10.1186/1471-2148-7-S1-S4.

Laslett D, Canback B. 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 32:11–16. doi: 10.1093/nar/gkh152.

Luan J-B et al. 2015. Metabolic Coevolution in the Bacterial Symbiosis of Whiteflies and Related Plant Sap-Feeding Insects. Genome Biol. Evol. 7:2635–2647. doi: 10.1093/gbe/evv170.

Moran NA, Bennett GM. 2014. The Tiniest Tiny Genomes. Annu. Rev. Microbiol. 195–215. doi: 10.1146/annurev-micro-091213-112901.

Moran NA, McCutcheon JP, Nakabachi A. 2008. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 42:165–190. doi: 10.1146/annurev.genet.41.110306.130119.

Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35:W182–W185. doi: 10.1093/nar/gkm321.

Newton ILG, Bordenstein SR. 2011. Correlations Between Bacterial Ecology and Mobile DNA. Curr. Microbiol. 62:198–208. doi: 10.1007/s00284-010-9693-3.


Nikoh N et al. 2014. Evolutionary origin of insect- nutritional mutualism. Proc. Natl. Acad. Sci. 111:10257–62. doi: 10.1073/pnas.1409284111.

Nikoh N, Hosokawa T, Oshima K, Hattori M, Fukatsu T. 2011. Reductive evolution of bacterial genome in insect gut environment. Genome Biol. Evol. 3:702–714. doi: 10.1093/gbe/evr064.

Ohbayashi T et al. 2015. Insect’s intestinal organ for symbiont sorting. Proc. Natl. Acad. Sci. 201511454. doi: 10.1073/pnas.1511454112.

Otero-Bravo A, Sabree ZL. 2015. Inside or out? Possible genomic consequences of extracellular transmission of crypt-dwelling stink bug mutualists. Front. Ecol. Evol. 3:1–7. doi: 10.3389/fevo.2015.00064.

Panizzi AR, Grazia J. 2015. True bugs (Heteroptera) of the neotropics. Springer Netherlands: Dordrecht doi: 10.1007/978-94-017-9861-7.

Panizzi AR, Machado-Neto E. 1992. Development of Nymphs and Feeding Habits of Nymphal and Adult Edessa meditabunda (Heteroptera: Pentatomidae) on Soybean and Sunflower. Ann. Entomol. Soc. Am. 85:477–481. doi: 10.1093/aesa/85.4.477.

Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Meth. 8:785–786. http://dx.doi.org/10.1038/nmeth.1701.

Pinto-Carbó M et al. 2016. Evidence of horizontal gene transfer between obligate leaf nodule symbionts. ISME J. 10:2092–2105. doi: 10.1038/ismej.2016.27.

Prado SS, Rubinoff D, Almeida RPP. 2006. Vertical Transmission of a Pentatomid Caeca-Associated Symbiont. Ann. Entomol. Soc. Am. 99:577–585. doi: 10.1603/0013-8746(2006)99[577:VTOAPC]2.0.CO;2.

Price MN, Dehal PS, Arkin AP. 2010. FastTree 2- Approximately maximum-likelihood trees for large alignments. PLoS One. 5:e9490. doi: 10.1371/journal.pone.0009490.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. https://www.r-project.org.

Richter M, Rosselló-Móra R. 2009. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. U. S. A. 106:19126–31. doi: 10.1073/pnas.0906412106.

Rizzo HF, Saini ED. 1987. Aspectos morfologicos y biologicos de Edessa rufomarginata (De Geer)(Hemiptera, Pentatomidae). Rev. Fac. Agron. 8:51–63.


Ruano-Rubio V, Fares MA. 2007. Artifactual Phylogenies Caused by Correlated Distribution of Substitution Rates among Sites and Lineages: The Good, the Bad, and the Ugly. Syst. Biol. 56:68–82. doi: 10.1080/10635150601175578.

Sabree ZL, Degnan PH, Moran NA. 2010. stability and gene loss in cockroach endosymbionts. Appl. Environ. Microbiol. 76:4076–9. doi: 10.1128/AEM.00291-10.

Sabree ZL, Huang CY, Okusu A, Moran NA, Normark BB. 2013. The nutrient supplying capabilities of Uzinura, an endosymbiont of armoured scale insects. Environ. Microbiol. 15:1988–99. doi: 10.1111/1462-2920.12058.

Salem H, Florez L, Gerardo N, Kaltenpoth M. 2015. An out-of-body experience: the extracellular dimension for the transmission of mutualistic bacteria in insects. Proc. Biol. Sci. 282:20142957. doi: 10.1098/rspb.2014.2957.

Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30:2068–2069. doi: 10.1093/bioinformatics/btu153.

Selengut JD et al. 2007. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 35:D260–D264. doi: 10.1093/nar/gkl1043.

Silva DP, Oliveira PS. 2010. Field Biology of Edessa rufomarginata (Hemiptera: Pentatomidae): Phenology, Behavior, and Patterns of Host Plant Use. Environ. Entomol. 39:1903–1910. doi: 10.1603/EN10129.

Sloan DB, Moran NA. 2013. The evolution of genomic instability in the obligate endosymbionts of whiteflies. Genome Biol. Evol. 5:783–93. doi: 10.1093/gbe/evt044.

Stackebrandt E et al. 2002. Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int. J. Syst. Evol. Microbiol. 52:1043–1047. doi: 10.1099/00207713-52-3-1043.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30:1312–1313. doi: 10.1093/bioinformatics/btu033.

Tada A et al. 2011. Obligate association with gut bacterial symbiont in Japanese populations of the southern green stink bug Nezara viridula (Heteroptera: Pentatomidae). Appl. Entomol. Zool. 46:483–488. doi: 10.1007/s13355-011-0066-6.

Tamas I et al. 2002. 50 Million Years of Genomic Stasis in Endosymbiotic Bacteria. Science. 296. http://science.sciencemag.org/content/296/5577/2376.full (Accessed August 3, 2017).

Taylor CM, Coffey PL, DeLay BD, Dively GP. 2014. The importance of gut symbionts in the development of the brown marmorated stink bug, Halyomorpha halys (Stal). PLoS One. 9. doi: 10.1371/journal.pone.0090312.

Wernegreen JJ. 2015. Endosymbiont evolution: predictions from theory and surprises from genomes. Ann. N. Y. Acad. Sci. 1360:16–35. doi: 10.1111/nyas.12740.

Wick RR, Schultz MB, Zobel J, Holt KE. 2015. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 31:3350–3352. doi: 10.1093/bioinformatics/btv383.

Yang Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24:1586–1591. doi: 10.1093/molbev/msm088.


Figures and Tables

Table 3.1 Genome statistics for stink bug symbionts, members of the Pantoea, and Buchnera. Grey indicates previously sequenced stink bug symbionts. *Symbiont genome size is estimated from the best assembly.

| Bacteria | Association | Insect Host | Genome size# (Mb) | GC% | Coding density (%) | CDS | Pseudogenes | tRNAs | rRNAs |
|---|---|---|---|---|---|---|---|---|---|
| Pantoea carbekii | Vertically inherited symbiont | Halyomorpha halys () | 1.15 | 30.6 | 67.40 | 829 | 12 | 40 | 8 |
| SoET | Vertically inherited symbiont | Edessa sp. 2 (Edessinae) | 0.81 | 27.8 | 81.79 | 677 | 11 | 35 | 6* |
| SoEO | Vertically inherited symbiont | Edessa bella (Edessinae) | 0.81 | 26.6 | 80.76 | 668 | 9 | 35 | 6* |
| SoEL | Vertically inherited symbiont | Edessa loxdalii (Edessinae) | 0.82 | 27.6 | 83.67 | 689 | 8 | 35 | 6* |
| SoEE | Vertically inherited symbiont | Edessa eburatula (Edessinae) | 0.83 | 26.9 | 83.71 | 698 | 11 | 35 | 6* |
| Ishikawaella capsulata | Vertically inherited symbiont | Megacopta spp. (Plataspididae) | 0.75 | 30.2 | 82.10 | 620 | 35 | 37 | 9 |
| Tachikawaea gelatinosa | Vertically inherited symbiont | Urostylis spp. (Urostylididae) | 0.71 | 25.1 | 85.16 | 613 | 8 | 35 | 9 |
| P. ananatis | Non-host-restricted | - | 4.70 | 53.7 | 86.69 | 4282 | 153 | 71 | 22 |
| P. rwandensis | Non-host-restricted | - | 4.33 | 53.9 | 87.51 | 3941 | 57 | 79 | 22 |
| P. vagans | Non-host-restricted | - | 4.02 | 55.5 | 87.29 | 3670 | 60 | 80 | 22 |
| P. agglomerans | Non-host-restricted | - | 4.18 | 55.5 | 86.92 | 3844 | 213 | 76 | 22 |
| Plautia stali-A | Vertically inherited symbiont | Plautia stali (Pentatominae) | 3.87 | 57.0 | NA | 3890 | - | - | - |
| Plautia stali-B | Vertically inherited symbiont | Plautia stali (Pentatominae) | 2.43 | 55.9 | NA | NA | - | - | - |
| Plautia stali-C | Insect symbiont | Plautia stali (Pentatominae) | 5.14 | 57.4 | NA | 4882 | - | - | - |
| Plautia stali-D | Insect symbiont | Plautia stali (Pentatominae) | 5.54 | 53.8 | NA | 5311 | - | - | - |
| Plautia stali-E | Insect symbiont | Plautia stali (Pentatominae) | 5.41 | 53.7 | NA | 5064 | - | - | - |
| Plautia stali-F | Insect symbiont | Plautia stali (Pentatominae) | 4.67 | 56.7 | NA | 4457 | - | - | - |
| Buchnera aphidicola str. APS | Vertically inherited symbiont | A. pisum () | 0.66 | 26.4 | 88.00 | 617 | 1 | 32 | 3 |


Figure 3.1 A+T% bias in Edessa primary symbiont genomes

G+C% values for the genomes of Edessa symbionts (red dots) were plotted alongside those of other gammaproteobacterial stink bug symbionts (black-outlined light grey circles), non-obligately host-associated and free-living Pantoea species (dark grey circles), and other gammaproteobacteria that span several known ecological niches (small light grey circles). Data were generated from 40,356 gammaproteobacterial genomes deposited in NCBI GenBank as of October 2017.


Figure 3.2 Phylogenetic reconstruction of Enterobacteriaceae

The tree was based on 11 coding sequences and was inferred with PhyloBayes under the GTR+CAT model with 6-category Dayhoff recoding. Support values are posterior probabilities, shown as dots as indicated in the legend.


Figure 3.3 Core genome phylogenetic reconstruction of the Pantoea genus including the symbionts of Edessa

Maximum likelihood reconstruction using RAxML for 322 loci obtained from reciprocal best-hit BLAST searches. Support values are based on 50 bootstrap iterations. Colors indicate highly supported clades among all reconstructions. The scale bar corresponds to 0.05 substitutions per site.


Figure 3.4 Genome synteny for the Symbionts of Edessa

Dotplots showing synteny between the four strains (A), synteny between SoEE and P. carbekii (B), and synteny between non-host restricted strains of Pantoea and SoEE (C).


Figure 3.5 COG Profiles of SoE compared to other Pantoea

The proportion of genes belonging to each category is shown. Asterisks indicate significant differences in a two-tailed nested ANOVA.


Figure 3.6 Metabolic reconstruction of the symbionts of Edessa

Boxes indicate metabolites or products of bacterial metabolism. A solid black outline indicates that all SoE contain the genes in the canonical pathway for the synthesis of that product, a dashed black outline indicates that no SoE contains all canonical enzymes for the synthesis of that product, and a dashed orange outline indicates that some, but not all, SoE contain all canonical enzymes for the synthesis of that product. Genes along an outline are colored grey if present in all, orange if present in some, or red if absent in all. Genes in orange also indicate the SoE genomes in which they are present, absent, or pseudogenized. Large blue boxes indicate essential amino acids while small blue boxes indicate non-essential amino acids. Pink boxes indicate pyrimidines. Purple boxes indicate purines. Yellow boxes indicate vitamins and cofactors. Green boxes indicate other metabolites. 3PG - 3-phosphoglycerate; B6 - vitamin B6; BIO - biotin; CHSM - chorismate; dATP - deoxyadenosine triphosphate; dCTP - deoxycytidine triphosphate; dGTP - deoxyguanosine triphosphate; dTTP - deoxythymidine triphosphate; dUTP - deoxyuridine triphosphate; E4P - erythrose-4-phosphate; F6P - fructose-6-phosphate; FAD - flavin adenine dinucleotide; FMN - flavin mononucleotide; FOL - folate; GSH - glutathione; GTP - guanosine 5’ triphosphate; IMP - inosinic acid; M-acp - malonyl-ACP; OXA - oxaloacetate; PEP - phosphoenolpyruvate; PRPP - phosphoribosylpyrophosphate; PYR - pyruvate; R5P - ribose-5-phosphate; Rb5P - ribulose 5-phosphate; RBFL - riboflavin; SAM - S-adenosyl methionine; THM - thiamine; UGN - UDP-N-acetyl glucosamine; UMP - uridine monophosphate.


Chapter 4 Multiple stages of genome shrinkage across extracellular stink bug symbionts

Introduction

Animal-microbe associations are prevalent throughout the tree of life and can greatly benefit and expand the host’s ability to survive in a variety of environments. Hosts can assimilate resources produced by their bacterial symbionts, thereby benefitting from entire metabolic pathways that they are unlikely to develop (Sudakaran et al. 2017).

Insects, particularly hemipterans, have used these symbioses to expand their feeding strategies to include nutrient-poor or unbalanced resources (Giron et al. 2017). Resource provisioning-based mutualisms between insects and bacteria have been well documented in aphids (Hansen & Moran 2011), cockroaches (Sabree et al. 2009), armored scales (Sabree et al. 2013), cotton stainers (Salem et al. 2014), and many others. Bacterial partners can be members of diverse phyla, including Proteobacteria and Bacteroidetes, and converge functionally on supplementing essential amino acids and/or vitamins absent from the host’s diet (Moran et al. 2008). Host-microbial mutualisms often involve modifications to the partners that are evident at genomic, transcriptomic, and physiological scales; common features include the evolution of specialized host cells and organs to house symbionts (Okude et al. 2017; Moran & Bennett 2014; Nakabachi et al. 2005; Hirota et al. 2017) and genomic streamlining in the symbiont (Bennett & Moran 2015; Moran et al. 2008).


Some patterns of genomic modification in bacterial symbionts have emerged across a wide range of symbioses. When the genomes of these symbionts are compared with those of free-living close relatives, evolutionary pressures that may be unique to lifestyles that involve living within the host’s cells or tissues come into focus (McCutcheon & Moran 2011). Dramatic gene losses are often observed in bacterial symbionts, and this phenomenon is thought to be due, in part, to relatively stable environmental conditions within host tissues. Relaxed selective pressures on genes not essential for survival in the host, combined with genetic bottlenecks resulting from vertical transmission, contribute to genome shrinkage (Rispe & Moran 2000). For example, genes involved in cell wall biosynthesis and self-defense are consistently missing from symbiont genomes (Wilson & Duncan 2015). As a result, core symbiont genomes are consistently reduced relative to those of their free-living close relatives (Moran & Bennett 2014). Pseudogenization of genes experiencing relaxed selection often precedes their loss (Lo et al. 2016); however, it is rare to find examples of a single symbiosis at radically different stages of this process, which limits how directly different stages of reduction can be compared.

Members of the Pentatomomorpha often harbor an extracellular bacterial symbiont within the crypts of the posterior midgut (Kikuchi et al. 2011). While this symbiotic organ is fairly similar across a wide variety of families, other traits of the symbiosis can be quite variable, including the acquisition strategy (Kaiwa et al. 2014; Kikuchi et al. 2007; Olivier-Espejel et al. 2011), the degree of cophylogeny between host and symbiont (Karamipour et al. 2016), and the symbiont’s reliance on its host (Hosokawa et al. 2016).


In the Pentatomidae, symbionts are generally vertically inherited: the female deposits symbiont-rich secretions on top of the eggs, which the nymphs later consume. While symbionts can vary between cultivable and uncultivable (Scopel & Cônsoli 2018), they are nonetheless exposed to the outside environment. Some pentatomids can harbor both vertically and environmentally acquired symbionts under different circumstances, depending on which bacteria they were exposed to during their first instar (Oishi et al. 2019). For example, Plautia stali (Pentatomidae) can have up to six different symbiont ‘types’ (named A through F) that vary in origin and degree of host reliance (Hosokawa et al. 2016). However, the most common scenario is that of highly specific host-symbiont associations, where a single host species is consistently associated with a single symbiont (Tada et al. 2011; Otero-Bravo & Sabree 2018), and cophylogeny is detected at the genus level (Kikuchi et al. 2009) but not necessarily at higher taxonomic levels (Bistolas et al. 2014), which would indicate multiple associations at different points in time throughout the family.

Since the Pentatomidae have a conserved symbiotic organ but an imperfect and variable symbiont transmission mechanism and multiple symbiont acquisitions, taxonomically distant species are likely to contain symbionts at different stages of the symbiosis. Previously we identified two cases of genome-reduced symbionts in two distinct within the Pentatomidae, with different degrees of genome reduction that originated separately (Otero-Bravo et al. 2018; Kenyon et al. 2015). We identified this as a potentially valuable system to understand the of nutritional symbioses from the bacterial symbiont’s perspective (Otero-Bravo & Sabree 2015). Therefore, we increased the sampling of known symbionts in this family through genome sequencing in order to shed light on the diversity and variability of these symbionts. We identified a variety of genome sizes across different species of the Pentatomidae, as well as the core genome of these symbionts and the metabolic pathways predicted to be retained or lost at different stages of the reduction. Together, this allowed us to better understand the process of gene loss, revealing both consistency and variability in gene loss for different key pathways.

Methods

Genome sequencing and assembly

Wild-caught stink bugs belonging to the subfamilies Edessinae and Pentatominae were collected at La Selva Biological Station, Organization for Tropical Studies. These included Edessa oxcarti, E. sp, and Brachystethus rubromaculatus for the Edessinae, and Arvelius albopunctatus, Sibaria englemani, Taurocerus edessoides, sp., and Pentatomidae sp. for the Pentatominae. Additionally, individuals of the brown stink bug Euschistus servus and green stink bug Nezara viridula were donated by Michael Toews (from Tifton, GA) and individuals of the harlequin bug Murgantia histrionica were donated by Don Weber (from Beltsville, MD). Morphological identification was carried out with appropriate guides (Duzee 1904; Rolston & McDonald 1979; Rolston et al. 1980; Rolston & McDonald 1980, 1984; Barcellos & Grazia 2003; Do Nascimento et al. 2017) and confirmed by molecular analysis as described below. DNA extraction and genome sequencing were performed as in Otero-Bravo et al. (2018). Briefly, insects were surface-sterilized with three washes of 70% ethanol and the symbiotic organ was dissected. DNA was extracted from the whole organ using the DNeasy Blood and Tissue kit (Qiagen) with RNase treatment according to the manufacturer’s instructions. Paired-end 300 bp Nextera libraries were prepared and sequenced using Illumina MiSeq at the Ohio State MCIC. Raw reads were trimmed using Trimmomatic (Bolger et al. 2014) and assembly was performed using Unicycler 0.4.8 (Wick et al. 2017). Resulting contigs were filtered based on coverage level and scaffold linkage using Bandage 0.8.0 (Wick et al. 2015). Additionally, BLASTn (Camacho et al. 2009) searches were employed to separate host and symbiont sequences. Host mitochondrial genomes were obtained by searching the resulting contigs for the 13 genes of stink bug mitochondria or by performing de novo assembly on a subset of reads obtained by mapping to these contigs using BWA (Li & Durbin 2009) and SAMtools (Li et al. 2009) until a circularized mitochondrial genome was recovered.
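The coverage-based part of the contig filtering can be illustrated with a minimal Python sketch. This is not the procedure performed in Bandage, which is interactive and also uses scaffold linkage; the `Contig` record, the `split_by_depth` helper, the depth thresholds, and the toy values are all hypothetical and serve only to show the idea of separating a main symbiont component from very high-coverage (for example mitochondrial) and very low-coverage contigs.

```python
from typing import NamedTuple


class Contig(NamedTuple):
    """Hypothetical record for one assembled contig."""
    name: str
    length: int   # contig length in bp
    depth: float  # mean read depth reported by the assembler


def split_by_depth(contigs, min_depth, max_depth):
    """Split contigs into (kept, rejected) by an inclusive depth window."""
    kept, rejected = [], []
    for contig in contigs:
        if min_depth <= contig.depth <= max_depth:
            kept.append(contig)
        else:
            rejected.append(contig)
    return kept, rejected


# Toy data (assumed values): one symbiont-like contig, one very
# high-depth contig, and one low-depth contig.
example = [
    Contig("1", 800_000, 95.0),
    Contig("2", 15_000, 2100.0),
    Contig("3", 5_000, 3.2),
]
kept, rejected = split_by_depth(example, min_depth=30.0, max_depth=300.0)
```

In practice the depth window would be chosen per assembly after inspecting the depth distribution, and rejected contigs would still be checked by BLASTn before being assigned to host or symbiont.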

Annotation and genome comparisons

Genomes were annotated using Prokka (Seemann 2014), RAST (Aziz et al. 2008), and the NCBI’s PGAP (Tatusova et al. 2016; Haft et al. 2018), and annotations were visualized using Geneious 8.1.9 (https://www.geneious.com). Genome completeness was assessed with BUSCO 2.0.1 (Waterhouse et al. 2018) using the gammaproteobacterial ortholog set. Host mitochondria were annotated using the Geneious ‘Annotate’ feature with a database of available Pentatomomorpha mitochondria and manually curated. For analyses including core and pangenomes, Roary 3.11.0 (Page et al. 2015) was used on previously sequenced stink bug symbiont genomes and representative genomes of species of the genus Pantoea (Table S2). KEGG and COG codes were assigned using KAAS (Moriya et al. 2007) and 4.5.1 (Huerta-Cepas et al. 2016), respectively. Metabolic pathways were evaluated for completeness using the KEGG pathway mapper as well as the MetaCyc databases (Caspi et al. 2016).

Symbiont identification

It has been well documented that genome-reduced symbionts are difficult to place accurately in a phylogeny because their high mutation rates create long-branch attraction (Husník et al. 2011), which can lead to erroneous predictions in tree-based identification methods. Additionally, this fast mutation rate significantly decreases the sequence identity between genome-reduced symbionts and other bacteria. To identify the sequenced genomes we used SINA (Pruesse et al. 2012), which uses ribosomal RNA genes to identify the Least Common Ancestor from available sequences. Additionally, we used a modified method for estimating a for these species as described in Otero-Bravo et al. (2018), in which we estimated separate trees for each genome-reduced bacterium using FastTree 2.1.11 (Price et al. 2010), which would prevent some of the long-branch attraction artifacts, and evaluated the consensus of all trees. Alignments were produced from the 10 largest protein-coding genes common to all strains, identified by reciprocal best-hit BLAST and aligned using TranslatorX (Abascal et al. 2010).
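The reciprocal best-hit criterion used to find genes common to all strains can be sketched as follows. This is a minimal illustration, assuming BLAST results have already been reduced to (query, subject, bit score) tuples; the function names and the toy hits are hypothetical, and the actual analysis may have applied additional thresholds that are not described here.

```python
def best_hits(rows):
    """Map each query to the subject with the highest bit score.

    rows: iterable of (query, subject, bitscore) tuples.
    Ties keep the first subject encountered.
    """
    best = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_gene, b_gene) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in forward.items() if reverse.get(b) == a
    )


# Toy hits (assumed values): a2's best hit is b2, but b2's best hit is a1,
# so only the a1/b1 pair is reciprocal.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 80.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Applying the same test between a reference genome and every other strain, then keeping genes with a reciprocal partner in all of them, gives one simple way to arrive at a shared gene set.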


Host mitochondrial phylogeny

Cytochrome oxidase I (COX1) genes from the mitochondrial genomes sequenced were searched against the BOLD database (Ratnasingham & Hebert 2007) to corroborate morphological identification. Additionally, all 13 coding regions and two rRNA regions were aligned using MUSCLE (Edgar 2004). Protein sequences for the recovered mitochondrial genomes, other complete pentatomid mitochondria, and the outgroup Coptosoma bifaria (Pentatomoidea; Plataspidae, accession number EU427334.1) were aligned using the translation alignment feature from Geneious. Trees were reconstructed using RAxML (Stamatakis 2014) with separate partitions for each codon and the rDNA genes, and MrBayes (Ronquist et al. 2012) with 4 heated chains and 10,000,000 iterations, with burn-in of 1,000,000 and sampling every 1,000.
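As a quick check on what these MCMC settings imply, the number of posterior samples retained per run follows from simple arithmetic. The sketch below assumes that the burn-in is expressed in iterations and ignores the initial state that some programs also record; `retained_samples` is a hypothetical helper, not a MrBayes function.

```python
def retained_samples(iterations, sample_every, burn_in):
    """Count sampled states kept after discarding the burn-in.

    All three arguments are counts of MCMC iterations.
    """
    total = iterations // sample_every
    discarded = burn_in // sample_every
    return total - discarded


# Settings taken from the text: 10,000,000 iterations, sampling every
# 1,000, burn-in of 1,000,000 iterations.
kept_per_run = retained_samples(10_000_000, 1_000, 1_000_000)  # 9000
```

Under these assumptions 10,000 states are sampled, the first 1,000 are discarded, and 9,000 remain per run for summarizing the posterior.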

Results

Genome assembly

Sequencing resulted in libraries containing between 8 and 12 million paired reads for the different samples. After removing adapters and low-quality bases, individual assemblies were performed and each resulted in few to many scaffolds (ranging between 9 and 285 per assembly, see Table 4.1), forming a single connected component of ambiguously connected scaffolds ranging from 0.82 to 5.6 Mb with few or no dead ends. Separately, between one and three contigs belonging to the host mitochondria or to circular high-coverage plasmids were found. Scaffolds of all the main components had BLAST matches to bacteria from the Enterobacteriaceae, namely Pantoea, Erwinia, and other genome-reduced insect symbionts. Circularizing the genomes was not possible using only the Illumina data because most components converged multiple times on contigs containing the 5S, 16S, and 23S rRNA genes, which are repeated across the genome.

Symbiont genome annotation

The genomes of symbionts of stink bugs belonging to the Edessinae and Pentatominae subfamilies were sequenced and annotated. Symbionts of Edessinae stink bugs, which included the symbionts of six Edessa species (hereafter called SoE) and the symbiont of Brachystethus rubromaculatus (h.f. SoBr), had genomes that were <1 Mb, varied little in size, and were taxonomically categorized by SINA as ‘unclassified Enterobacteriaceae’ (see Table 4.1, S3). Symbionts of the Pentatominae stink bugs were larger than 1 Mb, ranging from 1 to 5.6 Mb, and were assigned to the Pantoea genus. Because long-branch attraction artifacts complicate phylogenetic reconstruction with genome-reduced species (Husník et al. 2011), traditional phylogenetic reconstruction was not used. However, using a modified approach in which we estimated separate trees for each genome-reduced bacterium and generated a consensus tree from the results, all genomes were placed in the genus Pantoea and stink bug symbionts were shown to be paraphyletic within the genus (Figure 4.1). Further phylogenetic analyses on these taxa are forthcoming.

The smallest genomes among the symbionts of the Pentatominae belong to the symbiont of the harlequin bug Murgantia histrionica (h.f. SoMh) at 1.02 Mb, followed by the symbiont of Arvelius albopunctatus (h.f. SoAa) at 1.14 Mb and the symbiont of the brown marmorated stink bug (Halyomorpha halys), Candidatus ‘Pantoea carbekii’ (Kenyon et al. 2015) (h.f. P. carbekii). The symbiont of the green stink bug, Nezara viridula (h.f. SoNv), showed a slightly larger size, at 1.42 Mb. This is followed by a much larger genome, that of the B-type symbiont of Plautia stali (h.f. SoPs-B), which at 2.4 Mb is the smallest symbiont genome identified for this host. Finally, the remaining symbionts contained genomes larger than 3.9 Mb, which is within the range of non-stink bug-associated strains of the Pantoea genus.

Metrics such as gene number, GC content, and genome completeness (Table 4.1, Figure 4.2b-c) were also in accordance with expectations for genomes of a given size: GC content showed little variation from the range between 53% and 57% among unreduced genomes and even the moderately reduced genome of SoPs-B (at 2.4 Mb). Reduced genomes, on the other hand, presented the low GC content characteristic of genome reduction, ranging from 25% to 30%, while the intermediately sized SoNv showed a slightly lesser skew, with a GC content of 40% (Table 4.1, Figure 4.2b). BUSCO completeness and number of total genes also decreased with greater genome reduction, with the intermediate genomes of SoPs-B and SoNv also having intermediate values (Table 4.1, Figure 4.2c-d). Coding density and pseudogene number followed a different trend, where both large and small genomes showed similar values (near 100% and 0, respectively), while intermediate-sized genomes deviated significantly from this (Figure 4.2e-f).
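For reference, GC content as discussed above is the percentage of G and C among the unambiguous bases of a sequence. The following minimal sketch shows one way to compute it; `gc_percent` is a hypothetical helper and the sequences are toy examples, not data from this study.

```python
def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguous characters such as N are ignored. Raises ValueError if the
    sequence contains no unambiguous bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A, C, G, or T bases")
    return 100.0 * gc / (gc + at)


# Toy sequences (assumed): a balanced sequence and an AT-rich one.
balanced = gc_percent("ATGCATGC")
at_rich = gc_percent("AATTAATTGC")
```

For a multi-contig assembly the counts would be summed over all contigs before taking the ratio, so that long contigs are weighted by their length.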


The core genome, or the subset of genes that is common to all strains, was obtained for the four species of the Pantoea genus with complete representative genomes available (which are not associated with stink bugs) and the symbiont genomes used for previous analyses (Table A.2). The size of the core genome plateaus for all genomes larger than 4 Mb, following the trend from the non-stink bug-associated strains. This subset (henceforth the ‘unreduced core genome’) of approximately 2450 genes represents the core genome of the genus Pantoea, to which most stink bug symbionts have been assigned (Duron & Noël 2016). However, the size of the core genome begins decreasing with the addition of the symbiont of Euschistus servus (h.f. SoEus), which despite a relatively large 3.9 Mb genome has lost a few of these conserved genes due to pseudogenization. Subsequently, the size of the core genome rapidly decays with the genomes of SoPs-B (2.4 Mb) and SoNv (1.4 Mb). The reduction between the next few genomes is modest due to their similar size of 1 Mb, followed by a larger decrease when the most reduced genomes of ~0.8 Mb are added. The size of this smaller core genome (henceforth the ‘reduced core genome’) stabilizes at 454 genes (Figure 4.2g).
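The way the core genome shrinks as progressively smaller genomes are added can be expressed as a running set intersection. The sketch below is a simplified illustration, not the Roary algorithm: it assumes each genome has already been reduced to a set of gene-family identifiers, and the function name, genome labels, and family names are hypothetical.

```python
def core_genome_sizes(genomes):
    """Track core genome size as genomes are added in the given order.

    genomes: list of (name, gene_families) pairs, where gene_families is
    an iterable of gene-family identifiers. Returns a list of
    (name, core_size) pairs, one per genome added.
    """
    sizes = []
    core = None
    for name, families in genomes:
        families = set(families)
        # The core is whatever is shared by every genome seen so far.
        core = families if core is None else core & families
        sizes.append((name, len(core)))
    return sizes


# Toy input (assumed), ordered from largest to smallest genome.
toy_genomes = [
    ("large_1", {"famA", "famB", "famC", "famD"}),
    ("large_2", {"famA", "famB", "famC", "famE"}),
    ("intermediate", {"famA", "famB"}),
    ("reduced", {"famA"}),
]
trajectory = core_genome_sizes(toy_genomes)
```

Because the intersection can only shrink or stay the same, a plateau in the trajectory indicates that the newly added genomes retain all genes shared by those already included.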

The differences between the unreduced core genome and the reduced core genome can yield insights into the change in requirements for the symbiotic lifestyle (Figure 4.2c). Primarily, the proportion of genes in the translation, ribosomal structure and biogenesis (J) and energy production (C) categories is much higher for the reduced core genome than for the unreduced core genome. This difference comes at the expense of a large loss of proteins with poorly characterized function, in the categories unknown function (S) and general function prediction only (R), or with no COG category assigned. Notably, the categories cell motility (N), defense mechanisms (V), and signal transduction mechanisms (T) are present in the unreduced core genome but completely absent in the reduced core genome.
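The comparison above rests on the proportion of genes assigned to each COG category in a gene set. A minimal sketch of that calculation follows, assuming one category letter per gene and a hypothetical `cog_proportions` helper; genes with several category letters or with none would need an explicit rule that is not specified here.

```python
from collections import Counter


def cog_proportions(categories):
    """Return the fraction of genes in each COG category.

    categories: iterable with one category letter per gene.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


# Toy assignments (assumed): four genes in categories J, J, C, and S.
toy_profile = cog_proportions(["J", "J", "C", "S"])
```

Computing this profile separately for the unreduced and reduced core genomes gives the per-category proportions that can then be compared.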

Host mitochondrial genome sequences and identification

Host mitochondrial scaffolds were recovered from the assembly as a single circular chromosome or, in some cases, as two or three scaffolds that mapped to other stink bug mitochondrial genomes, which allowed them to be circularized. All mitochondria ranged between 13 and 16 kbp in size and contained the 13 protein-coding genes found in previously described pentatomid mitochondria (Yuan et al. 2015), two rRNA genes, and 22 tRNAs. Gene order was conserved across all genomes. COX1 genes were searched against the BOLD database, confirming species identification for 9 samples, while for 6 specimens collected in the Neotropics a definitive match was not found, likely because the species are not present in the database (see Table A.1).

The tree reconstructions based on host mitochondria showed good support for the genus Edessa (Figures 4.3 and 4.4, Table A.3), which contains the smaller symbiont genomes sequenced. The genus Brachystethus, also in the subfamily Edessinae, contains a symbiont of similar size; however, the subfamily node was not well supported under this analysis (Figure 4.3). Additionally, despite similar sizes, the genome of SoBr has several key differences from those of the SoE (see below). For the second subfamily, the Pentatominae, we see a large variation in symbiont genome sizes, ranging from moderately reduced (~1 Mb) to showing no signs of reduction (> 5 Mb).

Branched chain amino acid biosynthesis pathway

The supplementation of essential amino acids not present in phytophagous pentatomid host diets is one of the likely advantages of these heritable symbionts (Otero-Bravo & Sabree 2015; Taylor et al. 2017). We identified the canonical biosynthetic pathways for these amino acids and found that all genomes contained full pathways for the biosynthesis of all essential amino acids, with the exception of the branched-chain amino acid biosynthesis pathway. The ilv operon encoding most of the pathway (Smith et al. 1976) is present in all genomes, with the particularity that the ilvE gene, which encodes the branched-chain-amino-acid aminotransferase (BCAT), the enzyme responsible for the final step in the valine, isoleucine, and leucine biosynthesis pathways, is missing in all symbionts with a genome under 2 Mb, with the exception of SoMh (Figure 4.5a). In SoNv, the ilvE gene is in the process of being pseudogenized: it is present as a small ORF that does not include any of the protein’s catalytic sites but retains nucleotide and protein identity to the functional genes in other genomes (Figure 4.5b). ilvG and ilvM, which when present are located adjacent to ilvE, are also missing from these genomes, and in SoPs-B ilvG is in a similar process of pseudogenization as ilvE in SoNv. While ilvE is present in the genome of SoMh, the only genome under 2 Mb to retain this gene, SoMh is also the only genome of those sampled to contain a pseudogenized ilvA gene, which is completely conserved in all others. ilvA encodes L-threonine deaminase, which catalyzes the first step of the synthesis of isoleucine from threonine; the remaining steps are shared with the other branched-chain amino acid synthesis pathways (Favre et al. 1974).
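The kind of presence/absence comparison described above can be expressed as a small lookup over per-genome gene states. This is a minimal sketch under stated assumptions: the three state labels (`intact`, `pseudogene`, `absent`), the function name, and the dictionary layout are hypothetical conventions chosen for illustration, and only the ilvA and ilvE states explicitly described in the text for SoNv and SoMh are encoded.

```python
from typing import Dict, List, Sequence

# Hypothetical encoding of gene states; only states described in the text are used.
GENE_STATUS: Dict[str, Dict[str, str]] = {
    "SoNv": {"ilvA": "intact", "ilvE": "pseudogene"},
    "SoMh": {"ilvA": "pseudogene", "ilvE": "intact"},
}


def missing_steps(gene_status: Dict[str, str], required: Sequence[str]) -> List[str]:
    """Return the required genes that are not intact (pseudogenized or absent)."""
    # Genes missing from the dictionary are treated as absent.
    return [gene for gene in required if gene_status.get(gene, "absent") != "intact"]


required_genes = ("ilvA", "ilvE")
gaps = {name: missing_steps(status, required_genes) for name, status in GENE_STATUS.items()}

for name, genes in gaps.items():
    print(f"{name}: non-functional steps -> {genes}")
```

Treating pseudogenes as non-functional but distinct from absent genes keeps the intermediate stages of decay visible in the underlying table, even though both count as a gap in the pathway.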

LPS and antigen biosynthesis gene loss during genome reduction

External cell wall components are responsible for protecting the cell in an outside environment but are also in direct contact with the host during symbiosis. We compared the genes and pathways involved in several important and well-conserved cell wall components: lipid A, peptidoglycan (PG), the O-antigen, and the enterobacterial common antigen (ECA). Lipid A is a precursor for the outer membrane lipopolysaccharide (LPS) present in Gram-negative bacteria, including the Enterobacteria (Wang & Quinn 2010).

All genomes contained the genes necessary for the production of UDP-N-acetyl-D-glucosamine (UDP-GlcNAc), pgi and the glm operon; UDP-GlcNAc is a necessary precursor for both lipid A and peptidoglycan (Heijenoort 2001; Opiyo et al. 2010). The genes necessary for the production of peptidoglycan, murABCDEFGIJ, ddl, and mraY, are also present in all genomes studied. The canonical pathway for the production of lipid A from UDP-GlcNAc includes the genes lpxACDHBK (for the conversion to lipid IVA), kdtA (for the addition of KDO), lpxLM (conversion to KDO-lipid A), and rfaCFGPQYBOJ (synonym waa; addition of sugars to lipid A). The pathway for the synthesis of lipid IVA is conserved in all genomes larger than 1 Mb but lost in smaller ones, with the exception of lpxA, which is found in some SoE and in SoBr, and lpxB, which is found in SoBr (Figure 4.6b). In some cases, these genes are lost without any change to the adjacent genes (as in the case of lpxC, see Figure A.1), while in others, regions containing several adjacent genes are lost (as in the case of lpxL, Figure 4.6c).

The genes lpxD, lpxA, and lpxB are located syntenically in the Pantoea genomes along with fabZ, rnhB, and dnaE in a conserved pattern (Figure 4.6b). This region is disrupted in the genome-reduced symbionts: the SoE lack lpxB and lpxD, and some (SoEO, SoEF) also lack lpxA. Despite this loss, the adjacent genes fabZ and rnhB are retained in these genomes. lpxC is found in all other genomes between the ftsZ and secA genes, but it is lost in the SoE, where these two genes are adjacent to each other with no ORFs in the intergenic space. lpxH is also lost in all SoE along with the adjacent gene ppiB. However, the genes flanking these two, cysS and purE, are conserved in the remaining genomes. The exception is P. carbekii, where ppiB and lpxH are retained but purEK, which are part of the purine biosynthesis pathway, are lost; this loss is also present in SoEO. lpxK is also absent in all SoE genomes, in a gap where several genes are missing. In other genome-reduced symbionts, lpxH is retained together with some adjacent genes that are absent in SoE. This region contains several gaps of multiple genes that are conserved in large genomes (Figure 4.6c).

Adjacent to this region, we see evidence of pseudogenization of another gene, comEC, in the larger genomes that have begun the reduction process. In SoEus and SoPs-B, with genomes of 3.9 and 2.4 Mb respectively, rpsA and ihfB sit upstream of comEC while msbA and lpxK sit downstream. While in smaller genomes such as SoNv only a small intergenic region between ihfB and msbA remains, in SoEus and SoPs-B the region is roughly the same length as in unreduced genomes and contains several small, interrupted ORFs, some similar to the full protein but likely not functional.

Another requirement for the production of lipid A is the set of kdsABCD genes, which are responsible for the production of KDO. The first gene in the pathway, kdsC, is the most conserved, being present in all genomes except some SoE (SoEL and SoEE retain it). In SoEus, kdsC is split into two ORFs, indicating possible ongoing pseudogenization, and it is unclear whether either of these retains catalytic function. Together, these results show that the SoE have a complete or almost complete loss of the lipid A biosynthesis pathway, while the others, including the similarly sized symbiont SoBr, conserve all genes.

The O-antigen consists of an oligosaccharide that can be attached to lipid A to produce a mature LPS. Several glycosyltransferases can be involved in this process; in the case of Pantoea, the rfbBCD genes encode enzymes involved in the production of dTDP-beta-L-rhamnose, a monomer of their O-antigen. This operon is absent in all genomes under 2 Mb, as well as in some of the larger genomes, including those of the non-host-associated Pantoea species. The enterobacterial common antigen, or ECA, is another antigen that can be found in the periplasmic space in a cyclic form, attached to PG, or attached to LPS. The genes for its production, wecABDEFG and rffG, as well as wzzE, the gene responsible for producing the cyclic form, are lost in all genomes smaller than SoPs-B. rfaL, which binds both the O-antigen and the ECA to the LPS, and wzxE and wzyE, which are responsible for the export of both antigens (Mitchell et al. 2018; Danese et al. 1998), are also lost in smaller genomes.

Discussion

Different stages of reduction among pentatomid stink bugs

There is a large variation in genome sizes among inhabitants of the V4 region of stink bugs in the family Pentatomidae. These symbionts are placed in the genus Pantoea, consistent with previously discovered symbionts (Duron & Noël 2016; Prado & Almeida 2009). Because the increased mutation rates of genome-reduced organisms generate long-branch attraction artifacts, phylogenetic reconstruction is not reliable, and the true relationships among these symbionts are very difficult to ascertain. However, the radically different genome sizes, the different placement of taxa in the absence of long branches, and the considerable differences in gene order between these organisms (Otero-Bravo et al. 2018) all indicate separate association events at different times. The SoE have almost identical genome size and gene order among each other while being considerably different from P. carbekii, indicating that these are separately evolving groups. However, P. carbekii, SoMh, and SoAa are not similar enough in gene order and independent phylogenetic placement to establish that they belong to the same clade, and their genome sizes are not different enough to claim the contrary. Therefore, while largely different genome sizes are likely an indication of separate clades, similar genome sizes should not be taken as evidence of the same clade, and comparisons must be cautious. For example, SoBr has a genome size similar to that of the SoE, and B. rubromaculatus is the only other member of the Edessinae, outside the genus Edessa, yet there are considerable differences between them, most prominently that SoBr retains the full pathway for the synthesis of lipid A. Gene order in SoBr is similar to that of the SoE, but less so than among the SoE themselves, in which gene order is nearly identical.

While some of these genomes are considerably small for extracellular bacteria, and undoubtedly at a late stage of genome reduction, examples such as SoPs-B and SoNv give us a glimpse of the intermediate stage of this process. Additionally, SoEus, SoPt, and SoPs-A, while retaining a total genome size similar to that of their non-host-associated relatives, show several characteristics of early-stage genome reduction, such as a considerable number of pseudogenes and a decrease in the coding density of the genome. This likely reflects a recent symbiont replacement event in which the host associated with a new clade, as SoEus is shown to be closely related to P. vagans and P. agglomerans while SoPs-A and SoPt are more closely related to P. dispersa. For SoPs-A, a shift in geography may have facilitated this replacement, as different populations on separate islands contain different associates, including the more genome-reduced SoPs-B (Hosokawa et al. 2016). The variation in genome sizes indicates that the replacement process can be relatively common and actively occurs in species of this family. Given the extracellular nature of these bacteria and the host transmission and acquisition system, it is not difficult for other bacteria to invade. However, the high symbiont titers of genome-reduced species show that colonization must be favored for closer associates. If genome reduction proceeds to an irreversible deletion within the symbiont that affects its utility to the host, it may be more advantageous for the host to replace it (Bennett & Moran 2015) with a large-genome bacterium and start the process anew.

A wide variety of genome-reduced insect symbionts have been described, covering the full range of genome sizes down to near-organelle levels and with great diversity in their association with their hosts. Several of these symbionts have been hypothesized to be in a transitional state towards a stable symbiosis (Clayton et al. 2016), while others have reduced their functions almost to a minimum, in the case of the small intracellular genomes reaching organelle status (Tamames et al. 2007). Many of these cases require the host to evolve traits such as methods for intracellular vertical transmission (Dan et al. 2017), complementation by a second symbiont (Bublitz et al. 2019), or even horizontal transfer of symbiont genes to the host (Husnik et al. 2013). However, most of these cases offer a limited view of an individual symbiosis without revealing the gradual steps required to achieve it, so that the process of symbiosis can only be understood by comparing sometimes distant hosts with different organs, lifestyles, and diets. Here, we show that within a single family of stink bugs, with little host change in the symbiotic organ, we can observe a range of steps in the development of the symbiosis from the bacterial perspective, owing to repeated establishment of the symbiosis.

While most of the extremely reduced genomes of symbionts are from intracellular symbionts, which are carefully protected within host cells, extracellular symbionts have additional constraints that likely impede further genome reduction. However, these can be overcome if there is significant investment from the host on structures that guarantee housing and transmission of its bacterium (Salem et al. 2017). Stink bug symbionts are housed in separate gut compartments (or crypts) developed with a complex morphogenetic process from birth to adulthood (Oishi et al. 2019). However, during transmission these symbionts are left exposed to the environment (Kenyon et al. 2015).

Some further strategies have been developed in other stink bug relatives, such as a symbiont capsule in the Plataspididae (Fukatsu & Hosokawa 2002) or transmission jelly in the Urostylidae (Hosokawa et al. 2012), which also allowed a further decrease in genome size. However, the SoE show a similarly sized genome to these without evidence of specialized transmission other than egg smearing or of change in the symbiont housing organ (Otero-Bravo et al. 2018; Bistolas et al. 2014). Additionally, some genome reduction can be observed exclusively with the development of host housing structures, as in the case of deep-sea anglerfish, which house a luminescent symbiont with a genome 50% the size of that of its nearest free-living relative (Hendry et al. 2018) that is acquired from persistent cells in the environment rather than by vertical or horizontal transmission (Baker et al. 2019).

Cases where a change in the physiology of the host is required to accommodate a symbiont require longer evolutionary time due to the difference in generation time and population size (Sudakaran et al. 2017). These traits can vary considerably, as with the different symbiotic organs of lygaeid bugs (Matsuura et al. 2012), which makes it unlikely for the exact same structure to appear convergently. Thus, comparison of genome reduction for these cases can only be done with distantly related taxa, if at all. In the case of stink bugs, the midgut crypts are common to most of the Pentatomoidea (except where they were subsequently lost, such as in carnivorous groups, or otherwise modified) (Kikuchi et al. 2011), and the nymphal probing behavior is also common across the superfamily (Hosokawa et al. 2008). Since the host traits that enable the symbiosis are almost identical across the group, yet the symbionts appear at radically different stages of association, this system is invaluable for understanding the effects of varying constraints on genome composition.

Core genome

The core genome obtained for the unreduced genomes consists of 2450 genes, representing between 46.6% and 61.4% of the total genes in each genome. Our estimate is slightly larger than those of previous core genome analyses of Pantoea (Wang et al. 2017; Palmer et al. 2018), which place it at 38.8–56% and 30% of the total genes, respectively. However, previous analyses included more distant strains of Pantoea and more samples for some species, which could contribute to their smaller core genomes. Nonetheless, this core genome was used as a baseline for comparison with the reduced core genome. When including all symbionts down to the SoE, the core genome reduces to 450 genes, which constitute between 47.7% and 62% of the genes in the smaller genomes. The true value is likely higher, given that these methods use sequence similarity to identify orthologous genes, which can result in many false negatives because the high mutation rate of genome-reduced bacteria significantly lowers the sequence identity of orthologous proteins. We observed some cases where groups were incorrectly separated due to insufficient sequence identity even though annotations and genome positions were identical. This is a considerable problem for organisms with such high mutation rates and should be carefully considered in comparative genomic frameworks.
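The percentages quoted above can be turned back into approximate per-genome gene totals, which makes the two core genomes easier to compare. The following is a small worked sketch that uses only the figures given in the text (2450 genes at 46.6% to 61.4%, and 450 genes at 47.7% to 62%); the function name is hypothetical, and the totals are back-calculated approximations, not reported gene counts.

```python
from typing import Tuple


def implied_gene_totals(core_size: int, min_percent: float, max_percent: float) -> Tuple[float, float]:
    """Back-calculate the range of total gene counts implied by a core genome share."""
    if not 0 < min_percent <= max_percent <= 100:
        raise ValueError("percentages must satisfy 0 < min <= max <= 100")
    # The largest share corresponds to the smallest genome, and vice versa.
    smallest_total = core_size / (max_percent / 100)
    largest_total = core_size / (min_percent / 100)
    return smallest_total, largest_total


unreduced_totals = implied_gene_totals(2450, 46.6, 61.4)
reduced_totals = implied_gene_totals(450, 47.7, 62.0)

print("Unreduced genomes: about %.0f to %.0f genes" % unreduced_totals)
print("Reduced genomes: about %.0f to %.0f genes" % reduced_totals)
```

Under these figures the unreduced genomes would carry roughly 4,000 to 5,300 genes and the smaller genomes roughly 730 to 940, so the core makes up a similar share of each genome at both ends of the size range.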

That the core genome shrinks between genome size classes while remaining consistent in size across multiple genomes of similar size indicates (if at least some of these symbionts are paraphyletic) that there are consistent stages of reduction. Under homogeneous and random gene loss between free-living bacteria and the smallest genome needed for survival, the expectation would be that the core genome quickly reduces to this minimum and plateaus. However, multiple plateaus among similarly sized genomes indicate that genes are selectively retained at each stage. While preliminary, these results indicate that the process of genome reduction may be more complex than previously considered.

Amino acid pathway loss

ilvE is the only gene in the valine, isoleucine, and leucine biosynthesis pathway lost between the large and small core genomes. It encodes BCAT, which catalyzes the last step in the production of valine, leucine, and isoleucine. This gene is missing from multiple other nutritional symbionts (Sabree et al. 2013; Wilson et al. 2010; Rio et al. 2012; McCutcheon et al. 2007). In Buchnera, it has been shown that the host aphid upregulates its own BCAT in its bacteriocytes (Hansen & Moran 2011), completing the pathway. BCAT is also present in the brown marmorated stink bug genome (LOC106691811), indicating that stink bugs may also be completing this pathway in the symbiosis. The loss of ilvE in these cases is accompanied by the loss of ilvG and ilvM, which encode the two subunits of acetolactate synthase, the enzyme that catalyzes the first step of the branched-chain amino acid biosynthesis pathway (Hill et al. 1997). However, there are three isozymes with this function in other Enterobacteria such as E. coli, among them IlvIH, encoded by ilvI and ilvH (Lawther et al. 1981), which is found in all genomes, including those of the SoE. The exception to the loss of ilvE is SoMh, which retains a full copy of the gene. However, it is unique among the stink bug symbionts in the loss of ilvA, a gene encoding L-threonine dehydratase [EC 4.3.1.19], which is required for an earlier step in the biosynthesis of isoleucine (Figure 4.4a). This loss is also present in Buchnera and Wigglesworthia, and in aphids the missing enzyme can be replaced by the host enzyme TcdB, which is expressed and marginally upregulated in the bacteriocytes (Hansen & Moran 2011). A similar protein is encoded in the closest available pentatomid genome (H. halys, LOC106681826). This could be evidence of an alternative evolution of a shared pathway between host and symbiont, regulating exclusively the isoleucine pathway instead of all three branched-chain amino acids. Such complementation is found in multiple genome-reduced symbionts and their hosts, and extends to other pathways, such as tyrosine supplementation in weevils (Anbutsu et al. 2017). The loss of a vital step of a biosynthesis pathway central to the symbiosis is likely beneficial for the host, as it facilitates increased control over the production of the required nutrient and the growth of the symbiotic bacteria (Ankrah et al. 2018).

This region also contains an example of progressive gene decay: SoPs-B has a gene order identical to that of large-genome symbionts but has lost ilvM, and its ilvG is in the process of pseudogenization; the next smaller genome, SoNv, has lost ilvG altogether and its ilvE is being pseudogenized; and finally, in SoAa and P. carbekii, ilvE is lost altogether. Since IlvGM is redundant, its genes are likely among the first to disappear, with ilvE or ilvA disappearing in the next stage (Figure 4.6c).

Cell membrane component loss during symbiosis

Lipopolysaccharides (LPS) are the main component of the outer membrane of Gram-negative bacteria, acting as a protective barrier for the cell. LPS is composed of lipid A, which binds to the cell membrane, a core oligosaccharide, and an external polysaccharide chain sometimes referred to as the O-antigen. There are both conserved regions and considerable variation in the composition of LPS, and the O-antigen region in particular is a target of host immune responses (Carlsson et al. 1998). We identified two major changes in the biosynthetic capability of stink bug symbionts with regard to LPS: the first is the loss of the addition of the O-antigen and ECA, and the second is the complete loss of LPS. We found that the genes responsible for the production and attachment of the ECA and O-antigens to lipid A are absent in stink bug symbionts with reduced genomes, which would render them unable to produce smooth-type (antigen-containing) LPS. In other species of the genus Pantoea and in symbionts with regular genome size, the rfb operon can be complete (P. rwandensis, SoEus, SoPs-A, SoSe and SoPd) or incomplete (P. ananatis, SoMo, SoTe, and other SoPs), which indicates that the addition of the O-antigen is not essential for the survival of the bacteria. E. coli mutants are able to survive without the addition of O-antigen; however, their membrane is considerably more permeable, making them hypersusceptible (Murata et al. 2007). The ECA behaves similarly: while it is not essential for the survival of strains in culture, the outer membrane permeability barrier is considerably weakened in its absence (Mitchell et al. 2018).

In the symbiosis between Burkholderia and the bean bug Riptortus pedestris, symbiont Burkholderia cells produce LPS with O-antigen when grown in culture media, but the O-antigen is absent in symbiotic cells. Additionally, symbiotic Burkholderia do not induce host antimicrobial peptides (AMPs), and the expression of host AMPs in the V4 region is lower than the basal expression in the fat bodies. It has been shown that cells without the O-antigen are much more susceptible to cell lysis and host immune responses. However, this downregulation of AMPs in the symbiotic organ likely allows the survival of a weakened symbiotic cell (Kim et al. 2016). It is likely that, as in the bean bug-Burkholderia symbiosis, the weakening of the membrane due to the loss of O-antigen is compensated by host protection and/or changes in the host’s immune reaction. The outright loss of these genes in stink bug symbionts may in turn be due to increased pressure to avoid activating the host’s immune system or to other adaptations to the host environment (Kobayashi et al. 2011). Since these symbionts need to travel through the host gut during its first instar in order to colonize, as well as proliferate throughout the host’s development, the loss of this antigen is likely helpful, if not necessary, for the consistent establishment of symbiont populations. However, there is a trade-off in the form of increased cell permeability, which may be related to the inability of genome-reduced symbionts to grow in vitro. Additionally, as with Burkholderia, bacteria with unreduced genomes may be able to stop production of these antigens when in a symbiotic state.

Furthermore, we found that all SoE lacked the complete pathway to produce lipid A (with the exception of the gene lpxA, encoding the UDP-GlcNAc acyltransferase that catalyzes the first step in this pathway, which was present in SoEL, SoEE, and SoOX). Under normal circumstances, the disruption of lipid A synthesis is fatal for most Gram-negative bacteria, with few exceptions (Steeghs et al. 1998; Zhang et al. 2013). While the SoE lacked the ability to produce lipid A, the production of UDP-GlcNAc remained intact in all genomes. The only other pathway with which this metabolite has been associated is the biosynthesis of peptidoglycan. All the genomes contained all genes necessary for the production of peptidoglycan, which is found in the periplasmic space. Only the most genome-reduced symbionts in other systems have lost the production of peptidoglycan, and these are restricted to intracellular symbionts and those that can rely on host production of peptidoglycan through horizontally transferred genes (Bublitz et al. 2019; Skidmore & Hansen 2017). Given that stink bug symbionts are extracellular bacteria, we hypothesize that maintenance of LPS in the outer membrane is not necessary for an extracellular symbiosis, but peptidoglycan production in the periplasmic space must be conserved, likely for cell wall integrity.

Conclusion

Genome reduction has been widely studied across insect symbionts, yet comparative genomic methods studying convergence in genome evolution have been limited to instances of symbiosis that are either too distant or too similar. Here we show how the extracellular symbionts of stink bugs can be used as a system of similar and convergent instances of genome reduction that also covers the range of symbiosis from free-living bacteria to highly specialized symbionts. We identify convergence in gene loss in multiple pathways associated with symbiosis, as well as different stages of loss, including partial gene losses and ongoing pseudogenization. We identify a convergence in the loss of a single gene in branched-chain amino acid biosynthesis, with one example displaying the loss of a different step in the pathway (ilvA as opposed to ilvE) that is also found in other nutritional symbionts, as well as the selective loss of genes involved in the production and attachment of antigens to the cell wall, which likely influences interactions with the host. Further study of the diversity and evolution of these symbionts will likely elucidate key factors in the process of genome reduction.

References

Abascal F, Zardoya R, Telford MJ. 2010. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 38:W7–W13. doi: 10.1093/nar/gkq291.

Anbutsu H et al. 2017. Small genome symbiont underlies cuticle hardness in . Proc. Natl. Acad. Sci. 201712857. doi: 10.1073/pnas.1712857114.

Ankrah NYD, Chouaia B, Douglas AE. 2018. The Cost of Metabolic Interactions in Symbioses between Insects and Bacteria with Reduced Genomes. MBio. 9. doi: 10.1128/mBio.01433-18.

Aziz RK et al. 2008. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 9:75. doi: 10.1186/1471-2164-9-75.

Baker LJ et al. 2019. Diverse deep-sea anglerfishes share a genetically reduced luminous symbiont that is acquired from the environment. Elife. 8. doi: 10.7554/eLife.47606.

Barcellos A, Grazia J. 2003. Revision of Brachystethus (Heteroptera, Pentatomidae, Edessinae). Iheringia. Série Zool. 93:413–446. doi: 10.1590/s0073-47212003000400008.

Bennett GM, Moran NA. 2015. Heritable symbiosis: The advantages and perils of an evolutionary rabbit hole. Proc. Natl. Acad. Sci. 2015:201421388. doi: 10.1073/pnas.1421388112.

Bistolas KSI, Sakamoto RI, Fernandes JAM, Goffredi SK. 2014. Symbiont polyphyly, co-evolution, and necessity in pentatomid stinkbugs from Costa Rica. Front. Microbiol. 5:1–15. doi: 10.3389/fmicb.2014.00349.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30:2114–2120. doi: 10.1093/bioinformatics/btu170.

Bublitz DC et al. 2019. Peptidoglycan Production by an Insect-Bacterial Mosaic. Cell. 179:703-712.e7. doi: 10.1016/J.CELL.2019.08.054.

Camacho C et al. 2009. BLAST+: architecture and applications. BMC Bioinformatics. 10:421. doi: 10.1186/1471-2105-10-421.

Carlsson A, Nystrom T, de Cock H, Bennich H. 1998. Attacin an insect immune protein binds LPS and triggers the specific inhibition of bacterial outer membrane protein synthesis. Microbiology. 144:2179–2188. https://www.microbiologyresearch.org/docserver/fulltext/micro/144/8/mic-144-8-2179.pdf?expires=1577821309&id=id&accname=sgid026576&checksum=08FAB03AC3617B71B86F9E1547E9DE16 (Accessed January 2, 2020).

Caspi R et al. 2016. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44:D471– D480. doi: 10.1093/nar/gkv1164.

Clayton AL, Jackson DG, Weiss RB, Dale C. 2016. Adaptation by Deletogenic Replication Slippage in a Nascent Symbiont. Mol. Biol. Evol. 33:1957–1966. doi: 10.1093/molbev/msw071.

Dan H, Ikeda N, Fujikami M, Nakabachi A. 2017. Behavior of bacteriome symbionts during transovarial transmission and development of the Asian psyllid Niedz, RP, editor. PLoS One. 12:e0189779. doi: 10.1371/journal.pone.0189779.

Danese PN et al. 1998. Accumulation of the enterobacterial common antigen lipid II biosynthetic intermediate stimulates degP transcription in Escherichia coli. J. Bacteriol. 180:5875–84. http://www.ncbi.nlm.nih.gov/pubmed/9811644 (Accessed January 30, 2020).

Duron O, Noël V. 2016. A wide diversity of Pantoea lineages are engaged in mutualistic symbiosis and cospeciation processes with stinkbugs. Environ. Microbiol. Rep. 8:715–727. doi: 10.1111/1758-2229.12432.

Duzee EP Van. 1904. Annotated List of the Pentatomidæ Recorded from America North of Mexico, with Descriptions of Some New Species. Trans. Am. Entomol. Soc. 30:1–80. doi: 10.2307/25076770.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–7. doi: 10.1093/nar/gkh340.

Favre R, Iaccarino M, Levinthal M. 1974. Complementation between different mutations in the ilvA gene of Escherichia coli K-12. J. Bacteriol. 119:1069–71. http://www.ncbi.nlm.nih.gov/pubmed/4604254 (Accessed January 6, 2020).

Fukatsu T, Hosokawa T. 2002. Capsule-transmitted gut symbiotic bacterium of the Japanese common plataspid stinkbug, Megacopta punctatissima. Appl. Environ. Microbiol. 68:389–396. doi: 10.1128/AEM.68.1.389-396.2002.

Giron D et al. 2017. Influence of Microbial Symbionts on Plant–Insect Interactions. Adv. Bot. Res. 81:225–257. doi: 10.1016/BS.ABR.2016.09.007.

Haft DH et al. 2018. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 46:D851–D860. doi: 10.1093/nar/gkx1068.

Hansen AK, Moran NA. 2011. Aphid genome expression reveals host-symbiont cooperation in the production of amino acids. Proc. Natl. Acad. Sci. U. S. A. 108:2849–54. doi: 10.1073/pnas.1013465108.

Heijenoort J v. 2001. Formation of the glycan chains in the synthesis of bacterial peptidoglycan. Glycobiology. 11:25R-36R. doi: 10.1093/glycob/11.3.25R.

Hendry TA et al. 2018. Ongoing Transposon-Mediated Genome Reduction in the Luminous Bacterial Symbionts of Deep-Sea Ceratioid Anglerfishes. MBio. 9:e01033-18. doi: 10.1128/mBio.01033-18.

Hirota B et al. 2017. A Novel, Extremely Elongated, and Endocellular Bacterial Symbiont Supports Cuticle Formation of a Grain Pest . MBio. 8:e01482-17. doi: 10.1128/mBio.01482-17.

Hosokawa T et al. 2012. Mothers never miss the moment: A fine-tuned mechanism for vertical symbiont transmission in a subsocial insect. Anim. Behav. 83:293–300. doi: 10.1016/j.anbehav.2011.11.006.

Hosokawa T et al. 2016. Obligate bacterial mutualists evolving from environmental bacteria in natural insect populations. Nat. Microbiol. 1:15011. doi: 10.1038/nmicrobiol.2015.11.

Hosokawa T, Kikuchi Y, Shimada M, Fukatsu T. 2008. Symbiont acquisition alters behaviour of stinkbug nymphs. Biol. Lett. 4:45–48. doi: 10.1098/rsbl.2007.0510.

Huerta-Cepas J et al. 2016. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44:D286–D293. doi: 10.1093/nar/gkv1248.

Husnik F et al. 2013. Horizontal Gene Transfer from Diverse Bacteria to an Insect Genome Enables a Tripartite Nested Mealybug Symbiosis. Cell. 153:1567–1578. doi: 10.1016/j.cell.2013.05.040.

Husník F, Chrudimský T, Hypša V. 2011. Multiple origins of endosymbiosis within the Enterobacteriaceae (γ-Proteobacteria): convergence of complex phylogenetic approaches. BMC Biol. 9:87. doi: 10.1186/1741-7007-9-87.

Kaiwa N et al. 2014. Symbiont-Supplemented Maternal Investment Underpinning Host’s Ecological Adaptation. Curr. Biol. 24:1–6. doi: 10.1016/j.cub.2014.08.065.

Karamipour N et al. 2016. Gammaproteobacteria as essential primary symbionts in the striped shield bug, Graphosoma lineatum (Hemiptera: Pentatomidae). Sci. Rep. 6:33168. doi: 10.1038/srep33168.

Kenyon LJ, Meulia T, Sabree ZL. 2015. Habitat Visualization and Genomic Analysis of ‘Candidatus Pantoea carbekii,’ the Primary Symbiont of the Brown Marmorated Stink Bug. Genome Biol. Evol. 7:620–635. doi: 10.1093/gbe/evv006.

Kikuchi Y et al. 2009. Host-symbiont co-speciation and reductive genome evolution in gut symbiotic bacteria of acanthosomatid stinkbugs. BMC Biol. 7:2. doi: 10.1186/1741-7007-7-2.

Kikuchi Y, Hosokawa T, Fukatsu T. 2011. An ancient but promiscuous host-symbiont association between Burkholderia gut symbionts and their heteropteran hosts. ISME J. 5:446–460. doi: 10.1038/ismej.2010.150.

Kikuchi Y, Hosokawa T, Fukatsu T. 2007. Insect-microbe mutualism without vertical transmission: A stinkbug acquires a beneficial gut symbiont from the environment every generation. Appl. Environ. Microbiol. 73:4308–4316. doi: 10.1128/AEM.00067-07.

Kim JK, Park HY, Lee BL. 2016. The symbiotic role of O-antigen of Burkholderia symbiont in association with host Riptortus pedestris. Dev. Comp. Immunol. 60:202–208. doi: 10.1016/J.DCI.2016.02.009.

Kobayashi H, Kawasaki K, Takeishi K, Noda H. 2011. Symbiont of the stink bug Plautia stali synthesizes rough-type lipopolysaccharide. Microbiol. Res. 167:48–54. doi: 10.1016/j.micres.2011.03.001.

Li H et al. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25:2078–2079. doi: 10.1093/bioinformatics/btp352.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25:1754–1760. doi: 10.1093/bioinformatics/btp324.

Lo W-S, Huang Y-Y, Kuo C-H. 2016. Winding paths to simplicity: genome evolution in facultative insect symbionts. FEMS Microbiol. Rev. 2:158–61. doi: 10.1093/femsre/fuw028.

Matsuura Y, Kikuchi Y, Meng XY, Koga R, Fukatsu T. 2012. Novel clade of alphaproteobacterial endosymbionts associated with stinkbugs and other arthropods. Appl. Environ. Microbiol. 78:4149–4156. doi: 10.1128/AEM.00673-12.

McCutcheon JP, Moran NA. 2011. Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10:13–26. doi: 10.1038/nrmicro2670.

McCutcheon JP, Moran NA, Greenberg EP. 2007. Parallel genomic evolution and metabolic interdependence in an ancient symbiosis. www.pnas.org/cgi/content/full/ (Accessed January 9, 2019).

Mitchell AM, Srikumar T, Silhavy TJ. 2018. Cyclic Enterobacterial Common Antigen Maintains the Outer Membrane Permeability Barrier of Escherichia coli in a Manner Controlled by YhdP. MBio. 9. doi: 10.1128/mBio.01321-18.

Moran NA, Bennett GM. 2014. The Tiniest Tiny Genomes. Annu. Rev. Microbiol. 195–215. doi: 10.1146/annurev-micro-091213-112901.

Moran NA, McCutcheon JP, Nakabachi A. 2008. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 42:165–190. doi: 10.1146/annurev.genet.41.110306.130119.

Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35:W182–W185. doi: 10.1093/nar/gkm321.

Murata T, Tseng W, Guina T, Miller SI, Nikaido H. 2007. PhoPQ-mediated regulation produces a more robust permeability barrier in the outer membrane of Salmonella enterica serovar typhimurium. J. Bacteriol. 189:7213–22. doi: 10.1128/JB.00973-07.

Nakabachi A et al. 2005. Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera. Proc. Natl. Acad. Sci. U. S. A. 102:5477–82. doi: 10.1073/pnas.0409034102.

Do Nascimento DA, Da Silva Mendonça MT, Fernandes JAM. 2017. Description of a new group of species of Edessa (Hemiptera: Pentatomidae: Edessinae). Zootaxa. 4254:136–150. doi: 10.11646/zootaxa.4254.1.10.

Oishi S, Moriyama M, Koga R, Fukatsu T. 2019. Morphogenesis and development of midgut symbiotic organ of the stinkbug Plautia stali (Hemiptera: Pentatomidae). Zool. Lett. 5:16. doi: 10.1186/s40851-019-0134-2.

Okude G et al. 2017. Novel bacteriocyte-associated pleomorphic symbiont of the grain pest beetle Rhyzopertha dominica (Coleoptera: Bostrichidae). Zool. Lett. 3:13. doi: 10.1186/s40851-017-0073-8.

Olivier-Espejel S, Sabree ZL, Noge K, Becerra JX. 2011. Gut microbiota in nymph and adults of the giant mesquite bug (Thasus neocalifornicus) (Heteroptera: Coreidae) is dominated by Burkholderia acquired de novo every generation. Environ. Entomol. 40:1102–1110.

Opiyo SO, Pardy RL, Moriyama H, Moriyama EN. 2010. Evolution of the Kdo2-lipid A biosynthesis in bacteria. BMC Evol. Biol. 10:362. doi: 10.1186/1471-2148-10-362.

Otero-Bravo A, Goffredi S, Sabree ZL. 2018. Cladogenesis and Genomic Streamlining in Extracellular Endosymbionts of Tropical Stink Bugs. Genome Biol. Evol. 10:680–693. doi: 10.1093/gbe/evy033.

Otero-Bravo A, Sabree ZL. 2018. Comparing the utility of host and primary endosymbiont loci for predicting global invasive insect genetic structuring and migration patterns. Biol. Control. 116:10–16. doi: 10.1016/j.biocontrol.2017.04.003.

Otero-Bravo A, Sabree ZL. 2015. Inside or out? Possible genomic consequences of extracellular transmission of crypt-dwelling stinkbug mutualists. Front. Ecol. Evol. 3:1–7. doi: 10.3389/fevo.2015.00064.

Page AJ et al. 2015. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 31:3691–3693. doi: 10.1093/bioinformatics/btv421.

Palmer M, Steenkamp ET, Coetzee MPA, Blom J, Venter SN. 2018. Genome-Based Characterization of Biological Processes That Differentiate Closely Related Bacteria. Front. Microbiol. 9:113. doi: 10.3389/fmicb.2018.00113.

Prado SS, Almeida RPP. 2009. Phylogenetic placement of pentatomid stink bug gut symbionts. Curr. Microbiol. 58:64–69. doi: 10.1007/s00284-008-9267-9.

Price MN, Dehal PS, Arkin AP. 2010. FastTree 2- Approximately maximum-likelihood trees for large alignments. PLoS One. 5:e9490. doi: 10.1371/journal.pone.0009490.

Pruesse E, Peplies J, Glöckner FO. 2012. SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics. 28:1823–1829. doi: 10.1093/bioinformatics/bts252.

Ratnasingham S, Hebert PDN. 2007. bold: The Barcode of Life Data System (http://www.barcodinglife.org). Mol. Ecol. Notes. 7:355–364. doi: 10.1111/j.1471-8286.2007.01678.x.

Rio RVM et al. 2012. Insight into the transmission biology and species-specific functional capabilities of tsetse (Diptera: glossinidae) obligate symbiont Wigglesworthia. MBio. 3:e00240-11. doi: 10.1128/mBio.00240-11.

Rispe C, Moran NA. 2000. Accumulation of Deleterious Mutations in Endosymbionts: Muller’s Ratchet with Two Levels of Selection. Am. Nat. 156:425–441. doi: 10.1086/303396.

Rolston LH, McDonald FJD. 1984. A Conspectus of Pentatomini of the Western Hemisphere. Part 3 (Hemiptera: Pentatomidae). J. New York Entomol. Soc. 92:69–86.

Rolston LH, McDonald FJD. 1980. Conspectus of Pentatomini Genera of the Western Hemisphere: Part 2 (Hemiptera: Pentatomidae). J. New York Entomol. Soc. 88:257–272.

Rolston LH, McDonald FJD. 1979. Keys and Diagnoses for the Families of Western Hemisphere Pentatomoidea, Subfamilies of Pentatomidae and Tribes of Pentatominae (Hemiptera). J. New York Entomol. Soc. 87:189–207.

Rolston LH, McDonald FJD, Thomas DBJ. 1980. A Conspectus of Pentatomini Genera of the Western Hemisphere. Part I (Hemiptera: Pentatomidae). J. New York Entomol. Soc. 88:120–132.

Ronquist F et al. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539–42. doi: 10.1093/sysbio/sys029.

Sabree ZL, Huang CY, Okusu A, Moran NA, Normark BB. 2013. The nutrient supplying capabilities of Uzinura, an endosymbiont of armoured scale insects. Environ. Microbiol. 15:1988–99. doi: 10.1111/1462-2920.12058.

Sabree ZL, Kambhampati S, Moran NA. 2009. Nitrogen recycling and nutritional provisioning by Blattabacterium, the cockroach endosymbiont. Proc. Natl. Acad. Sci. 106:19521–19526. doi: 10.1073/pnas.0907504106.

Salem H et al. 2017. Drastic Genome Reduction in an Herbivore’s Pectinolytic Symbiont. Cell. 1–12. doi: 10.1016/j.cell.2017.10.029.

Salem H et al. 2014. Vitamin supplementation by gut symbionts ensures metabolic homeostasis in an insect host. Proc. Biol. Sci. 281:20141838. doi: 10.1098/rspb.2014.1838.

Scopel W, Cônsoli FL. 2018. Culturable symbionts associated with the reproductive and digestive tissues of the Neotropical brown stinkbug Euschistus heros. Antonie Van Leeuwenhoek. 1–12. doi: 10.1007/s10482-018-1130-9.

Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30:2068–2069. doi: 10.1093/bioinformatics/btu153.

Skidmore IH, Hansen AK. 2017. The evolutionary development of plant-feeding insects and their nutritional endosymbionts. Insect Sci. 24:910–928. doi: 10.1111/1744-7917.12463.

Smith JM, Smolin DE, Umbarger HE. 1976. Polarity and the regulation of the ilv gene cluster in Escherichia coli strain K-12. MGG Mol. Gen. Genet. 148:111–124. doi: 10.1007/BF00268374.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30:1312–1313. doi: 10.1093/bioinformatics/btu033.

Steeghs L et al. 1998. Meningitis bacterium is viable without endotoxin. Nature. 392:449–449. doi: 10.1038/33046.

Sudakaran S, Kost C, Kaltenpoth M. 2017. Symbiont Acquisition and Replacement as a Source of Ecological Innovation. Trends Microbiol. doi: 10.1016/j.tim.2017.02.014.

Tada A et al. 2011. Obligate association with gut bacterial symbiont in Japanese populations of the southern green stinkbug Nezara viridula (Heteroptera: Pentatomidae). Appl. Entomol. Zool. 46:483–488. doi: 10.1007/s13355-011-0066-6.

Tamames J et al. 2007. The frontier between cell and organelle: genome analysis of Candidatus Carsonella ruddii. BMC Evol. Biol. 7:181. doi: 10.1186/1471-2148-7-181.

Tatusova T et al. 2016. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 44:6614–6624. doi: 10.1093/nar/gkw569.

Taylor C, Johnson V, Dively G. 2017. Assessing the use of antimicrobials to sterilize brown marmorated stink bug egg masses and prevent symbiont acquisition. J. Pest Sci. (2004). 90:1287–1294. doi: 10.1007/s10340-016-0814-z.

Wang L, Wang J, Jing C. 2017. Comparative Genomic Analysis Reveals Organization, Function and Evolution of ars Genes in Pantoea spp. Front. Microbiol. 8:471. doi: 10.3389/fmicb.2017.00471.

Wang X, Quinn PJ. 2010. Lipopolysaccharide: Biosynthetic pathway and structure modification. Prog. Lipid Res. 49:97–107. doi: 10.1016/J.PLIPRES.2009.06.002.

Waterhouse RM et al. 2018. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Mol. Biol. Evol. 35:543–548. doi: 10.1093/molbev/msx319.

Wick RR, Judd LM, Gorrie CL, Holt KE. 2017. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. Phillippy AM, editor. PLOS Comput. Biol. 13:e1005595. doi: 10.1371/journal.pcbi.1005595.

Wick RR, Schultz MB, Zobel J, Holt KE. 2015. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 31:3350–3352. doi: 10.1093/bioinformatics/btv383.

Wilson ACC et al. 2010. Genomic insight into the amino acid relations of the pea aphid, Acyrthosiphon pisum, with its symbiotic bacterium Buchnera aphidicola. Insect Mol. Biol. 19:249–258. doi: 10.1111/j.1365-2583.2009.00942.x.

Wilson ACC, Duncan RP. 2015. Signatures of host/symbiont genome coevolution in insect nutritional endosymbioses. Proc. Natl. Acad. Sci. 112:201423305. doi: 10.1073/pnas.1423305112.

Yuan M-L, Zhang Q-L, Guo Z-L, Wang J, Shen Y-Y. 2015. Comparative mitogenomic analysis of the superfamily Pentatomoidea (Insecta: Hemiptera: Heteroptera) and phylogenetic implications. BMC Genomics. 16:460. doi: 10.1186/s12864-015-1679-x.

Zhang G, Meredith TC, Kahne D. 2013. On the essentiality of lipopolysaccharide to Gram-negative bacteria. Curr. Opin. Microbiol. 16:779–85. doi: 10.1016/j.mib.2013.09.007.

Figures and Tables

Figure 4.1 Consensus cladogram of the Pantoea and Erwinia genera including stink bug symbionts

Individual FastTree reconstructions for the 10 longest protein sequences identified to be common for all taxa using reciprocal best hit BLAST. Accession numbers for genomes are shown in parentheses. Squares indicate a symbiont of a member of the Pentatominae while circles indicate symbionts of Edessinae. Taxa that were individually reconstructed are shown as polytomies in blue.

Figure 4.2 Genomic characteristics of pentatomid symbionts.

(a-f) Genomic characteristics as a function of genome size for stink bug symbionts. (a) Number of genes annotated, (b) GC % of the entire genome, (c) percentage of conserved genes found according to the BUSCO gammaproteobacterial set (the dotted line indicates the recommended 95% threshold of BUSCO completeness for new species descriptions), (d) tRNAs annotated, (e) coding density of the genome, (f) number of pseudogenes annotated by Prokka. (g) Number of conserved genes in the core genome after including each genome in reverse order of genome size. (h) COG composition of the core genome for non-symbiotic and large genome symbionts (dashed bar) and the core genome including genome reduced symbionts (solid line).

Figure 4.3 Stink bug mitochondrial tree with their respective symbiotic bacterial genome size.

Nodes with lower than 60% bootstrap support were collapsed. Numbers above nodes represent bootstrap support while numbers underneath represent posterior probability from Bayesian Inference using MrBayes. X indicates lower than 0.8 posterior probability.

Figure 4.4 Consensus of trees sampled in the posterior distribution from MrBayes for host mitochondrial genomes.

Figure 4.5 Presence of genes involved in branched chain amino acid biosynthesis pathway.

The presence of genes in each genome in the order of the pathway (a) and gene order and presence around the ilv operon (b). Numbers in parentheses show genome sizes in Mb.

Figure 4.6 Genes involved in the production of Lipid A and attachment of O-antigen

(a) Intermediate metabolites are represented in bold and the presence or absence of a gene is represented in the table, with a crossed box indicating not present in all members. (b) Progressive elimination of lpxA, lpxB, and lpxD in the SoE. (c) Loss of lpxK and kdsB, and adjacent genes. A double grey line indicates a region of approximately 15kb containing several genes not shown here. Genes labeled ‘hypoth.’ or unlabeled indicate hypothetical proteins or interrupted CDS. Asterisk indicates a fragmented CDS. Numbers in parentheses correspond to genome sizes in Mb.

Table 4.1 Genome characteristics of stink bug symbionts

| Name | SILVA Classification | Host | Subfamily | Genome Size (Mb) | GC % | Coding density (%) | Genes | Reference |
|---|---|---|---|---|---|---|---|---|
| SoEE | Enterobacterales | Edessa eburatula | Edessinae | 0.850 | 26.9 | 84.3 | 750 | Otero-Bravo & Sabree, 2018 |
| SoEL | Enterobacterales | Edessa loxdalii | Edessinae | 0.837 | 27.6 | 84.3 | 741 | Otero-Bravo & Sabree, 2018 |
| SoEO | Enterobacterales | Edessa bella | Edessinae | 0.836 | 26.6 | 81.6 | 721 | Otero-Bravo & Sabree, 2018 |
| SoET | Enterobacterales | Edessa sp. 2 | Edessinae | 0.841 | 27.8 | 82.7 | 733 | Otero-Bravo & Sabree, 2018 |
| SoEOx | Enterobacterales | Edessa oxcartii | Edessinae | 0.827 | 26.8 | 84.4 | 731 | This paper |
| SoEF | Enterobacterales | Edessa F. | Edessinae | 0.839 | 27.8 | 82.1 | 739 | This paper |
| SoBr | Enterobacterales | Brachystethus rubromaculatus | Edessinae | 0.843 | 25.5 | 80.7 | 761 | This paper |
| SoMh | Pantoea | Murgantia histrionica | Pentatominae | 1.025 | 27.1 | 70.8 | 794 | This paper |
| SoAa | Erwiniaceae | Arvelius albopunctatus | Pentatominae | 1.143 | 27.9 | 77.5 | 943 | This paper |
| SoNv | Pantoea | Nezara viridula | Pentatominae | 1.425 | 40.6 | 68.9 | 1045 | This paper |
| SoEus | Pantoea | Euschistus servus | Pentatominae | 3.928 | 55.2 | 81.8 | 3998 | This paper |
| SoPt | Pantoea | Pentatomidae sp. | Pentatominae | 4.314 | 57.7 | 86.7 | 4142 | This paper |
| SoSe | Pantoea | Sibaria englemanni | Pentatominae | 5.046 | 53.9 | 88.3 | 4764 | This paper |
| SoTe | Pantoea | Taurocerus edessoides | Pentatominae | 5.458 | 53.7 | 87.7 | 5099 | This paper |
| SoMo | Pantoea | Mormidea sp. | Pentatominae | 5.612 | 56.9 | 85.0 | 5497 | This paper |
| Can P. carbekii | Can ‘P. carbekii’ | Halyomorpha halys | Pentatominae | 1.151 | 30.4 | 69.1 | 877 | Kenyon et al, 2015 |
| SoPS-A | Pantoea | Plautia stali | Pentatominae | 4.093 | 57 | 84.8 | 5202 | Hosokawa et al, 2016 |
| SoPS-B | Pantoea | Plautia stali | Pentatominae | 2.429 | 55.9 | 72.5 | 3214 | Hosokawa et al, 2016 |
| SoPS-C | Pantoea | Plautia stali | Pentatominae | 5.138 | 57.4 | 86.7 | 4850 | Hosokawa et al, 2016 |
| SoPS-D | Pantoea | Plautia stali | Pentatominae | 5.531 | 53.8 | 86.4 | 5299 | Hosokawa et al, 2016 |
| SoPS-E | Pantoea | Plautia stali | Pentatominae | 5.406 | 53.7 | 87.2 | 5135 | Hosokawa et al, 2016 |
| SoPS-F | Pantoea | Plautia stali | Pentatominae | 4.665 | 56.7 | 86.2 | 4381 | Hosokawa et al, 2016 |

Chapter 5 Identifying independent origins of symbiosis in the presence of long branch attraction artifacts

Introduction

Genome sequencing has become a frequent practice and in some cases the first step to identifying and studying uncultivated biodiversity (Medina & Sachs 2010). Most bacteria remain to be cultivated and one of the first steps in their characterization is to identify the major taxonomic group they belong to and how they are related to previously identified organisms (Sachs et al. 2011). Additionally, new genomes are being described using metagenome-assembled genomes, or MAGs (Hugerth et al. 2015; Cabello-Yeves et al. 2017). However, many unculturable organisms also feature special life history traits that can make standard techniques for phylogenetic placement difficult to apply. Long branch attraction (LBA) is a prevalent artifact in phylogenetic tree reconstruction where distantly related taxa are artificially grouped together despite little shared evolutionary history. LBA particularly affects species that, when compared to other closely related taxa, have increased rates of substitution or base composition bias (Ruano-Rubio et al. 2007). Moreover, the confounding effect of these artifacts remains even if species-rich and/or whole-genome level datasets are used, and is usually accompanied by high support values (Pick et al. 2010), which would cause the evolutionary history of these bacteria to be incorrectly inferred and result in erroneous conclusions being drawn.

To counter this problem, the complexity of the sequence evolution model used for phylogenetic inference can be increased to better reflect specific characteristics of the data (Boussau & Gouy 2006; Qu et al. 2017). However, LBA is very rarely considered at the time of reconstruction, likely due to the need for specialized software, increased computational resources and time, or specific skills for the interpretation of results, as well as the requirement of larger datasets for these methods to be performed successfully. Adding taxa has also been shown to improve placement of taxa in the presence of LBA (Siddall & Whiting 1999; Hedtke et al. 2006; Pick et al. 2010); however, in some cases this is not possible due to a dramatic shift in mutation rate for which no intermediate taxa exist.

Additionally, it has been shown that including more taxa may even reduce accuracy and that adding more character states is most beneficial (Poe & Swofford 1999). Taxon selection is also frequently neglected, such that not enough clades are included to fully understand the source or relationships of these symbionts. Given that bacterial diversity in symbionts is relatively unexplored, lack of available sequences from close clades may also contribute to this problem. Alternatively, with increasing computational power and sequence data, fast methods can be applied to very large datasets that include all currently available sequences. However, while whole genome analyses exist, the greatest amount of newly found diversity is often represented in single or few loci, such as 16S rRNA genes, ribosomal proteins, or a few other genes that could be found in a specific metagenomic sample. These are frequently used in phylogenetic reconstructions as they require fewer computational resources and contain far fewer character states. LBA can arise in either case as increasing sequence length does not improve taxon placement and can even magnify the artifact (Hedtke et al. 2006).

One example of taxa particularly affected by LBA would be vertically-inherited bacterial symbionts, which are frequently unculturable by standard enrichment approaches and which share a range of traits that make their identification particularly difficult. While these symbionts have been known for decades (Buchner 1965), newer imaging and sequencing techniques facilitate their detection and the inference of their evolutionary history (Kenyon et al. 2015). Due to their close associations with their respective hosts, several traits required for survival outside of their hosts are no longer necessary, which generates conditions favorable for rapid gene loss as well as changes in genome size, content, and sequence characteristics (Degnan et al. 2009). Bacterial symbiont dependency upon their hosts can exist on a continuum, ranging from obligate intracellular symbionts that complete much of their life cycle within specialized tissues (e.g. Blattabacterium spp. in cockroaches, Sabree et al. 2009) to other vertically-inherited symbionts that, in addition to residing in modified gut tissues, must survive outside of their hosts for weeks during their transmission to offspring (e.g. Pantoea spp. in stink bugs; Tada et al. 2011; Hosokawa et al. 2013; Bistolas et al. 2014). Lack of contact with other bacterial species, strict vertical transmission and relaxed selection can result in magnification of the effects of genetic drift within these symbionts (Rispe & Moran 2000). Relatively high sequence evolution rates are often observed (Wernegreen 2015; Allen et al. 2009), which dramatically increase branch lengths and can result in saturation of informative sites in sequences, which can further confound these methods.

Bacterial symbionts of stink bugs are frequently affected by this genome reduction process and have also become associated with their hosts independently multiple times (Otero-Bravo et al. 2018; Hosokawa et al. 2012; Duron & Noël 2016). These characteristics make this group useful for evaluating LBA artifacts.

The long branch extraction method proposed by Siddall & Whiting (1999) consists of removing a long branch from a phylogenetic reconstruction and evaluating whether the other long branches shift their position in the tree; it was reportedly useful when applied to beetles and flies. However, identifying which branches are long enough to be attracted elsewhere is usually qualitative and relies on the identification of two or a few branches. Newer datasets are considerably larger, and the presence of a wide variety of branch lengths at different positions in the tree can complicate the inference of relationships.

In this paper, we describe Elaboorate, an easy-to-use Python pipeline which combines branch extraction with component analysis of summary statistics to determine independent origins of long branches in a tree. We evaluate this method using bacterial symbionts of insects, which are usually affected by LBA. Elaboorate uses alignment-based signatures such as residue composition or k-mer composition and k-distance from an outgroup, which can be used as a proxy for branch length. Combining these metrics using strategies such as PCA and separating taxa using a cutoff can determine which branches should be evaluated separately. We show that this method is able to identify taxa prone to LBA and their potential origin while also using other lines of evidence to address possible outcomes. Elaboorate allows for fast analysis using few loci and rapid tree construction methods or more thorough, multi-locus analysis by using different tree reconstruction and placement methods. This is intended to serve as a tool to evaluate if LBA is present in a dataset before further analyses are performed. Additionally, we provide guidelines for identifying possible artifacts in tree reconstructions such as the case of reduced genome symbionts or symbionts with genome reduction without host restriction (Dufresne et al. 2005; Giovannoni et al. 2005).

Methods

Stink bug symbiont genome sequencing, assembly and annotation

Wild-caught stink bugs were collected at La Selva Biological Station, Organization for Tropical Studies. These included Arvelius albopunctatus, Sibaria englemani, Mormidea sp., Pentatomidae sp. and Taurocerus edessoides. Additionally, individuals of the Brown Stink Bug Euschistus servus and Green Stink Bug Nezara viridula were donated by Michael Toews and individuals of the Harlequin Bug Murgantia histrionica were donated by Don Weber. DNA extraction and genome sequencing were performed as in Otero-Bravo et al. (2018). Briefly, insects were surface-sterilized and the symbiotic organ was dissected. DNA was extracted from the whole organ using the DNeasy kit according to the manufacturer's instructions. Paired 300 bp Nextera libraries were prepared and sequenced at the Ohio State MCIC. Assembly was performed with SPAdes v3.10 (Bankevich et al. 2012) and resulting contigs were filtered based on coverage level, scaffold linkage using Bandage (Wick et al. 2015) and BLAST (Camacho et al. 2009) searches to separate host and symbiont sequences. Genomes were annotated using RAST (Aziz et al. 2008) and the NCBI PGAP (Tatusova et al. 2016; Haft et al. 2018) and submitted to GenBank (SAMN11586498-SAMN11586504). A detailed assembly summary can be found in Supplementary Material S1.

Ortholog discovery and alignment

Genomes from the Pantoea and Erwinia genera were downloaded from GenBank using the Entrez toolkit. The genome of E. coli (NC_016579) was also downloaded to be used as the outgroup. For each genome, ORFs were identified using Prodigal (Hyatt et al. 2010) and a reciprocal best-hit BLAST was performed between each pair of genomes, keeping only the best hits. The set of genes common to all genomes was kept, aligned using translatorX (Abascal et al. 2010) and MUSCLE (Edgar 2004), and cleaned with GBLOCKS (Castresana 2000). Alignments were inspected and concatenated in Geneious 8 (Kearse et al. 2012). In order to reduce the runtime of tree inference software, we removed genomes that were >99% identical to each other using custom scripts. We also used an alignment from a previous analysis of long branch attraction in bacterial symbionts of insects (Husnik et al. 2013), which consists of 69 orthologous genes from 55 taxa including 14 with reduced genomes.
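The reciprocal best-hit step can be sketched as follows. This is a minimal illustration and not the scripts used in this work: it assumes each BLAST search has already been reduced to a dictionary mapping every query gene to its single best-scoring subject, it compares every genome against one reference rather than all pairs, and the function names and gene identifiers are hypothetical.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Return {gene_a: gene_b} for pairs that are each other's best hit."""
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


def core_orthologs(reference, best_hits):
    """Keep reference genes with a reciprocal best hit in every other genome.

    best_hits[(x, y)] holds the best hit in genome y for each gene of genome x.
    Assumption: a star-shaped comparison against one reference genome, which
    is simpler than the all-pairs comparison described in the text.
    """
    others = sorted({g for pair in best_hits for g in pair} - {reference})
    partners = {}
    core = None
    for other in others:
        rbh = reciprocal_best_hits(best_hits[(reference, other)],
                                   best_hits[(other, reference)])
        partners[other] = rbh
        core = set(rbh) if core is None else core & set(rbh)
    return {gene: {g: partners[g][gene] for g in others}
            for gene in sorted(core or [])}


# Toy best-hit tables for three genomes (identifiers are made up).
toy_hits = {
    ("ref", "symA"): {"r1": "a1", "r2": "a2", "r3": "a3"},
    ("symA", "ref"): {"a1": "r1", "a2": "r2", "a3": "r1"},
    ("ref", "symB"): {"r1": "b1", "r2": "b9"},
    ("symB", "ref"): {"b1": "r1", "b9": "r3"},
}
core = core_orthologs("ref", toy_hits)
print(core)
```

In this toy case only `r1` survives, because `r2` and `r3` each lack a reciprocal partner in at least one genome; this is the same filtering that shrinks the shared gene set as more reduced genomes are added.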

Tree reconstruction using branch extraction

In order to avoid long branch attraction artifacts, Elaboorate uses the long branch extraction method described by Siddall & Whiting (1999), in which long branches are individually assessed while removing other branches to which they could be attracted.

For this, we identified taxa potentially prone to LBA using a principal component analysis of the k-distance from each sequence to an outgroup taxon, calculated using the pairwise distance function of Biopython (Cock et al. 2009), together with the frequencies of each character and of each dimer in the sequence. These variables were summarized by their first principal component (PC1). The distribution of PC1 was evaluated for bimodality and a cutoff was selected based on the valley of the distribution. Then, we created a base tree excluding all taxa identified as prone to LBA.
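The summary statistics described above can be illustrated with a short, standard-library-only sketch. It is not the Elaboorate implementation: the proportion of differing sites stands in for the Biopython distance function, variables are centered but not rescaled, PC1 is obtained by power iteration, and the sequences and function names are invented for illustration.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def p_distance(seq, outgroup):
    """Proportion of differing sites, skipping columns that contain a gap."""
    pairs = [(a, b) for a, b in zip(seq, outgroup) if "-" not in (a, b)]
    return sum(a != b for a, b in pairs) / len(pairs)


def features(seq, outgroup):
    """Distance to the outgroup followed by monomer and dimer frequencies."""
    ungapped = seq.replace("-", "")
    mono = [ungapped.count(c) / len(ungapped) for c in ALPHABET]
    dimers = [ungapped[i:i + 2] for i in range(len(ungapped) - 1)]
    di = [dimers.count(x + y) / len(dimers)
          for x, y in product(ALPHABET, repeat=2)]
    return [p_distance(seq, outgroup)] + mono + di


def pc1_scores(rows, iterations=200):
    """Project each row onto the first principal component.

    PC1 is found by power iteration on the covariance matrix; its sign is
    arbitrary, so only the separation between groups is meaningful.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    vec = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return [sum(c[j] * vec[j] for j in range(m)) for c in centered]


# Toy alignment (made-up sequences): three conserved taxa, two AT-rich ones.
outgroup = "ACGTACGTACGTACGTACGT"
toy = {
    "free1": "ACGTACGTACGTACGAACGT",
    "free2": "ACGTACGTACCTACGTACGT",
    "free3": "ACGTACGGACGTACGTACGT",
    "symA": "ATATAAATATTTACATAAAT",
    "symB": "AAATATTTATATAAGTATAT",
}
scores = dict(zip(toy, pc1_scores([features(s, outgroup) for s in toy.values()])))
for name, score in scores.items():
    print(f"{name}\t{score:+.3f}")
```

In this toy case the divergent, AT-rich sequences fall on the opposite side of PC1 from the conserved ones, which is the kind of separation a cutoff would then exploit.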

Tree reconstruction was performed using RAxML (Stamatakis 2014) using the GTR model with CAT approximation for nucleotide sequence data and the DAYHOFF matrices for protein sequence data. Preliminary analyses were performed using FastTree (Price et al. 2010). For the branch extraction method, a separate alignment was generated for each taxon prone to LBA at the exclusion of all others. Tree reconstruction was performed for each of these in the same way as for the base ML tree. Finally, best trees for each of the long branch taxa were combined using a modified majority rule plus consensus algorithm (Jansson et al. 2018) allowing the inclusion of trees with differing taxa sets. Separately, the Evolutionary Placement Algorithm (Berger et al. 2011) was used to calculate the best placement of each taxon on the base tree. All trees were visualized using Dendroscope (Huson & Scornavacca 2012).
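The alignment-splitting part of branch extraction can be sketched as below. This is an illustrative outline only: the alignment is assumed to be a dictionary of taxon name to aligned sequence, the helper names are hypothetical, and the tree inference and consensus steps (RAxML, FastTree, and the consensus algorithm) are deliberately left out.

```python
def branch_extraction_alignments(alignment, flagged):
    """Yield (label, sub_alignment) for the base set and each flagged taxon.

    alignment: dict mapping taxon name to aligned sequence.
    flagged: names of taxa suspected of long branch attraction.
    The base set excludes every flagged taxon; each further set adds back
    exactly one flagged taxon, so long branches never share an alignment.
    """
    flagged = set(flagged)
    base = {name: seq for name, seq in alignment.items() if name not in flagged}
    yield "base", base
    for taxon in sorted(flagged):
        sub = dict(base)
        sub[taxon] = alignment[taxon]
        yield taxon, sub


def to_fasta(sub_alignment):
    """Serialize a sub-alignment as FASTA text for a tree inference program."""
    return "".join(f">{name}\n{seq}\n" for name, seq in sub_alignment.items())


# Toy alignment with made-up names and sequences.
toy_alignment = {
    "outgroup": "ACGTACGT",
    "free1": "ACGTACGA",
    "free2": "ACGAACGT",
    "symA": "ATATAAAT",
    "symB": "AAATATTT",
}
datasets = dict(branch_extraction_alignments(toy_alignment, ["symA", "symB"]))
for label, sub in datasets.items():
    print(label, sorted(sub))
```

Each resulting FASTA file would then be passed to the same tree reconstruction settings as the base tree, and the per-taxon trees compared or combined afterwards.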

Results

Summary statistics discriminate genome reduction even with small alignments

Forty-nine orthologous sequences were recovered from 40 genomes in the Pantoea + Erwinia genera, of which 13 corresponded to genome-reduced symbionts of stink bugs. After trimming with Gblocks, the protein alignment spanned 19125 residues and the nucleotide alignment spanned 57375 bases. The genomes of the symbionts of E. servus and M. histrionica, which, unlike the other stink bug symbionts, were not reduced in genome size, were found to be over 99% identical to previously identified Pantoea species but were not excluded from the analyses.

We obtained the first principal component (PC1) of the alignment statistics, which explained 91% of the variance in the values for the concatenated alignment. The distribution of PC1 consisted of several values in a small, negative range, and a number of large positive values. When comparing PC1 with the size of each bacterium's genome, we were able to distinguish genome-reduced bacteria from the others (Figure 5.1). When performing the same analysis using a single coding gene (rpoB) we noted the same trend: genome-reduced sequences clearly belonged to a separate distribution, allowing an easy cutoff to determine the difference between the groups.
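One simple way to place such a cutoff is sketched below. It is a stand-in heuristic and not necessarily the procedure used here: instead of estimating the valley of a density, it puts the cutoff in the middle of the widest gap between sorted PC1 values. The scores are invented, and the sketch assumes that long-branch taxa lie on the positive side of PC1, as observed above.

```python
def largest_gap_cutoff(scores):
    """Return the midpoint of the widest gap between consecutive sorted scores."""
    ordered = sorted(scores)
    gaps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    _, midpoint = max(gaps)
    return midpoint


# Made-up PC1 scores: a tight negative cluster and a few large positive values.
toy_pc1 = {
    "free1": -0.42,
    "free2": -0.40,
    "free3": -0.37,
    "symA": 0.55,
    "symB": 0.64,
}
cutoff = largest_gap_cutoff(toy_pc1.values())
flagged = sorted(name for name, score in toy_pc1.items() if score > cutoff)
print(round(cutoff, 2), flagged)
```

With a clearly bimodal distribution this heuristic and a density valley should agree closely; with overlapping groups a kernel density estimate or a manual inspection of the histogram would be a safer choice.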

Branch extraction indicates separate origin for genome reduced symbionts of stink bugs

Branch extraction performed on a maximum likelihood tree reconstruction revealed all but Candidatus ‘Erwinia pseudotaxifoliae’, an aphid symbiont (Manzano-Marín et al. 2017), to be in the same clade (Figure 5.2). Several of the nodes within this clade had high support. Upon identifying taxa with extreme PC1 values, these flagged taxa were evaluated to determine if they were particularly affected by LBA using a branch extraction approach. A separate alignment was created for each of the LBA-prone taxa at the exclusion of all other LBA-prone taxa, and a tree was reconstructed from each alignment. A consensus tree of the individual reconstructions was created to compare the placement of each taxon. By this approach, genome-reduced taxa, which are frequently prone to LBA, were placed in separate clades (Figure 5.3). While a few of these were placed in clades adjacent to the outgroup, all others were distributed across four clades in the consensus tree. The genome-reduced E. pseudotaxifoliae showed identical placements between the regular ML reconstruction and the branch extraction method. On the other hand, stink bug symbionts were roughly grouped by their genome size, with the symbionts of Edessa stink bugs being placed with Pantoea alhagi and Pantoea sp. PSNIH2, the symbiont of Nezara viridula placed with Pantoea septica, and the symbionts of M. histrionica, A. albopunctatus, and H. halys placed near a clade including P. dispersa, P. rwandensis, and P. eucrina. Results from RAxML and MrBayes did not differ significantly in the placement of these taxa or the main clades in the base tree.

Genome synteny analysis supports some placements

Genome synteny is highly conserved in reduced-genome symbionts with tight associations to their hosts (Tamas et al. 2002). When complete or nearly complete genomes are available, their synteny can reveal whether they derive from the same or from different symbiotic events. The synteny of reduced genomes was compared to corroborate the placement of their corresponding taxa within the tree. In the case of stink bug symbionts, none of the genomes between 1 and 2 Mbp in size exhibited complete synteny (Figure 5.4a-b). However, all of the <1 Mbp genomes exhibited nearly perfect synteny (Figure 5.4c) (Otero-Bravo et al. 2018). Additionally, symbionts of more distantly related stink bugs such as T. gelatinosa exhibited little synteny with P. carbekii (Figure 5.4d) or with other symbiotic and non-symbiotic Pantoea, which suggests a more distant origin and explains their proximity to the outgroup. The genomes of the more derived symbionts (T. gelatinosa, I. capsulata, and E. pseudotaxifoliae) showed a greater level of genomic rearrangement, even when compared to non-reduced genome taxa near their inferred placement.
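One simple way to put a number on synteny between two genomes is the fraction of neighboring ortholog pairs that stay neighbors in the other genome. The sketch below is only an illustration of that idea and is not the method used to produce the synteny plots in Figure 5.4; the function name `adjacency_conservation` and the single-letter gene labels are hypothetical, and each genome is assumed to be given as an ordered list of ortholog identifiers.

```python
def adjacency_conservation(order_a, order_b, circular=True):
    """Fraction of gene adjacencies in genome A that are also adjacent in B.

    Only orthologs present in both genomes are considered, and adjacency
    ignores orientation, so a segment that is inverted as a block loses
    only the two adjacencies at its breakpoints.
    """
    shared = set(order_a) & set(order_b)
    a = [gene for gene in order_a if gene in shared]
    b = [gene for gene in order_b if gene in shared]

    def adjacencies(order):
        n = len(order)
        last = n if circular else n - 1  # circular genomes wrap around
        return {frozenset((order[i], order[(i + 1) % n])) for i in range(last)}

    adj_a, adj_b = adjacencies(a), adjacencies(b)
    if not adj_a:
        return 0.0
    return len(adj_a & adj_b) / len(adj_a)


# Toy gene orders; one genome carries a single inversion of the C-D-E block.
reference = list("ABCDEFGH")
one_inversion = list("ABEDCFGH")
print(adjacency_conservation(reference, reference))      # 1.0
print(adjacency_conservation(reference, one_inversion))  # 0.75
```

Under this summary, genomes related by a single small inversion score close to 1, while heavily rearranged genomes score much lower.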

Branch extraction produces results similar to those of more complex models of phylogenetic inference

Sequence recoding methods and complex models of sequence evolution that seek to adjust for heterogeneity in the dataset both attempt to disentangle long branches (Lartillot et al. 2007; Rota-Stabelli et al. 2013; Sheffield et al. 2009). Recoding sequences by translation to a protein sequence can remove convergence in third codon positions, which are likely saturated in extremely reduced genomes. Identification of fast-evolving sites that are likely saturated and decrease inference accuracy has also been used (Kostka et al. 2008; Song et al. 2016). Husník et al. (2011) showed that the evolutionary history of Enterobacteriaceae symbionts of insects was most accurately inferred when amino acid sequence recoding and non-homogeneous branch models were used, and this dataset was used to compare the results of the branch extraction method with those of their model. Branch extraction recovered almost the same groupings as were found by Husník et al. (Figures 5.5, 5.6). The main difference was that some Buchnera genomes were placed near the base of the tree while others were placed near the Pantoea clade. However, other groupings such as Wigglesworthia + Baumannia + Blochmannia + Sodalis, Hamiltonella + Regiella, and Riesia + Arsenophonus were recovered and placed in the same or nearby nodes. Additionally, Ishikawaella and Buchnera were not placed in the same node.
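The effect of recoding on third codon positions can be shown with a small sketch. It assumes an in-frame, codon-aligned nucleotide sequence and uses the standard codon assignments; the function names `translate_codon_alignment` and `drop_third_positions` are hypothetical and do not correspond to any tool cited here.

```python
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon_alignment(seq):
    """Translate an in-frame aligned sequence; '---' becomes '-', unknowns 'X'."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon == "---":
            protein.append("-")
        else:
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


def drop_third_positions(seq):
    """Remove every third codon position from an in-frame sequence."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


# Two toy sequences that differ only at synonymous third positions.
seq_a = "ATGGCTCTGAAA"
seq_b = "ATGGCCCTTAAG"
print(translate_codon_alignment(seq_a), translate_codon_alignment(seq_b))
print(drop_third_positions(seq_a), drop_third_positions(seq_b))
```

In this toy case both sequences translate to the same protein and become identical once third positions are removed, which is the kind of noise from saturated synonymous sites that recoding is meant to suppress.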

Discussion

Identification of problematic taxa and branch extraction method

The life history traits that result in increased rates of molecular evolution also magnify transition-transversion bias, which is reflected even in short alignments. The correlation between transition-transversion bias and sequence identity with a common taxon such as an outgroup has been shown to reliably identify taxa with genomes that have undergone significant recent reduction (Hosokawa et al. 2016). Integrating these factors and summarizing them with dimensional reduction techniques facilitates the identification of taxa that are likely to violate common assumptions of tree reconstruction methods. Once taxa with highly reduced genomes are identified, steps can be taken to account for known artifacts that affect datasets including these taxa.
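As a rough sketch of this screening idea, the code below computes two per-taxon statistics against an outgroup (sequence identity and the fraction of differences that are transitions) and projects them onto their first principal component. This is an illustration under stated assumptions, not the Elaboorate implementation: the function names, the choice of exactly these two statistics, the use of a transition fraction instead of a ratio (to avoid division by zero), and the toy sequences are all hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pairwise_metrics(seq, ref):
    """Return (identity, transition fraction) for two aligned sequences."""
    same = ts = tv = 0
    for a, b in zip(seq.upper(), ref.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        if a == b:
            same += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1
    compared = same + ts + tv
    identity = same / compared if compared else 0.0
    ts_fraction = ts / (ts + tv) if (ts + tv) else 0.0
    return identity, ts_fraction


def first_principal_component(rows, iterations=200):
    """Scores on PC1 of standardized data, found by power iteration."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) ** 0.5 or 1.0
           for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    cov = [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(k)]
           for i in range(k)]
    vec = [1.0] + [0.5] * (k - 1)  # arbitrary non-symmetric starting vector
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in new) ** 0.5
        if norm == 0:
            break
        vec = [x / norm for x in new]
    return [sum(row[j] * vec[j] for j in range(k)) for row in z]


# Toy alignment; labels and sequences are illustrative only.
toy = {
    "outgroup": "ACGTACGTACGTACGTACGT",
    "free_living_1": "CCGTACGTACGTACGTACGT",
    "free_living_2": "ACGTGAGTACGTACGTACGT",
    "free_living_3": "ACGTACGTGGGTACGTACGT",
    "reduced_symbiont": "GTACGTAAACGTACGTACGT",
}
names = [name for name in toy if name != "outgroup"]
metrics = [pairwise_metrics(toy[name], toy["outgroup"]) for name in names]
pc1 = dict(zip(names, first_principal_component(metrics)))
# The sign of PC1 is arbitrary, so taxa are ranked by absolute score.
flagged = max(pc1, key=lambda name: abs(pc1[name]))
print(flagged)
```

In practice more alignment statistics and many more taxa would be used, and a threshold on PC1 rather than a single maximum would decide which taxa are set aside for branch extraction.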

Within an Enterobacteriaceae dataset that included endosymbionts of stink bugs, all but one of the taxa identified as potentially prone to LBA were placed in the same node (Figure 5.2). The exception was the genome-reduced Erwinia symbiont of Cinara aphids, which is a secondary symbiont (Meseguer et al. 2017). Despite having a genome size very similar to that of some of the stink bug symbionts, it likely retained enough information to relate it to the genus Erwinia, in spite of its sequence bias and a long branch that could be prone to attraction. This exception provides evidence that when the branch extraction method is applied to long-branch taxa that are not affected by LBA, the same placement is recovered.

The remaining reduced-genome symbionts were placed in separate nodes of the tree. The first group included the smallest genomes used and the longest branches. The symbionts of T. gelatinosa, I. capsulata, E. eburatula, and E. sp. were placed adjacent to the outgroup taxon. This is likely because the outgroup itself is a long branch, so that long branches that cannot be reliably placed in the tree can end up attracted to the outgroup (Bergsten 2005). Synteny analysis also shows a very degenerated genome for T. gelatinosa and I. capsulata, indicating longer periods of host association and therefore more saturation in their sequences, and possibly a different source genus. However, outgroup selection is critical, as it allows us to generate a consistent branch length with which to compare taxa, as well as to visualize correct relationships between the taxa.

Additionally, since the outgroup itself is a long branch that we are certain is not related to the clades in question, it can still attract sequences to the base of the tree, producing an incorrect placement. If long-branch taxa are found near the root of the tree, this should be interpreted as a problem with the taxa selected for the analysis or with the size of the dataset, and should be investigated further.

The other two symbionts of Edessa stink bugs were placed on a separate node near basal Pantoea species. This placement is not consistent with the previous placement obtained with a larger alignment including more Pantoea species (Otero-Bravo et al. 2018).

However, both species that were not placed near the outgroup were placed on the same node, which indicates some consistency in the placement method. Genome synteny in this group is identical (with a single small inversion) and gene content is also remarkably similar, pointing to a single symbiotic event (Otero-Bravo et al. 2018). Nevertheless, this group shows how even very similar sequences can be placed in radically different places in the tree, even with the branch extraction method. This is likely due to the heuristic nature of tree searching: when the appropriate area of tree space is not evaluated, the taxa will be attracted to the longest branch, in this case the outgroup. It also shows how the use of different loci in the alignment and different taxa in the base tree can yield different placements, even for taxa not attracted to the outgroup.

Detailed evolution models and branch extraction

In the broader family-level phylogeny, we see results similar to those above. Four Buchnera genomes were used in this analysis; two of them were placed near the outgroup while the other two were placed near the Pantoea group. This second placement is consistent with the one recovered using non-homogeneous PhyML and amino acid recoding (Husník et al. 2011). This is likely the same phenomenon observed with the symbionts of Edessa in the previous example, where exploration of the tree space was not able to identify the correct placement for some of these Buchnera. An interesting difference is that while the more complex model recovered a Buchnera + Ishikawaella group, the branch extraction method placed them in different yet close nodes. These two symbionts are unlikely to be closely related, as their hosts and times of symbiosis are radically different (Tamas et al. 2002; Hosokawa et al. 2006; Clark et al. 1999).

Sophisticated methods that rely on removal of fast sites, specialized substitution matrices, or special models of sequence evolution such as non-homogeneous branch models can be used to counter some of the artifacts shown here. However, despite the existence of such methods, their use has been limited. This could be due to the technical requirements for installing or running them, or to the complexity of the models that must be specified. The algorithms we present here can be performed with regular tools likely to be already in most researchers’ toolboxes. The branch extraction method is an easier approach to adopt, as it has the same requirements as the commonly used methods. Additionally, we developed the Python pipeline Elaboorate to facilitate this process, freely available at github.com/oterobravo/elaboo. It is also important to consider that even the long branch extraction method may incorrectly specify the position of a clade (Parks & Goldman 2014) and that movement of branches is not necessarily the only test for the presence of LBA. Therefore, it is important to consider alternative metrics that inform on the monophyly of groups, such as the genome synteny analysis we have shown here.

Taxa selection and branch extraction

Taxon selection is probably the single most important step in reconstructing the phylogeny of a group, yet it is often dismissed. Using groups that are too distantly related, or ignoring closely related ones, can severely affect the placement of one or more groups in any analysis. Here we show that the inclusion or exclusion of taxa can result in markedly different placements for a single group. Being able to systematically identify taxa prone to LBA is therefore very useful at this stage of the process, since it indicates areas where the taxon selection may need to be refined. In this case, we recommend using sequence similarity to identify the closest taxonomic delimitation that contains all taxa prone to LBA, and then either including a couple of members of several genera within that delimitation or including sibling groups to broaden the scope of the analysis. The trade-off is an increased computational load; however, less intensive preliminary methods can be used at this step before moving on to more intensive yet more accurate methods. Another option is to reduce the total amount of data supplied to the more intensive methods.

Genome synteny as evidence of placement

For highly derived, vertically inherited symbionts, high genomic stasis is common even across hundreds of millions of years, despite increased mutation rates (Tamas et al. 2002). While it has been suggested that this could be due to the loss of genes associated with recombination, some cases have shown this to be unlikely (Sloan & Moran 2013; McCutcheon & von Dohlen 2011). Nevertheless, a high degree of genomic synteny is observed for most symbionts that are derived from a single symbiotic event (Tamas et al. 2002; Otero-Bravo et al. 2018). Small inversions are evident in some of these symbionts, but they are rare. While convergence in gene content is normal in symbiont replacement, genomic synteny is unlikely to converge because it is static once a symbiosis is established. Therefore, genome synteny could be used as evidence that particular symbionts derive from the same or from different symbiotic events. However, how quickly a symbiont achieves this genomic stasis is still unknown, and small differences in genomic synteny are difficult to interpret. Finally, other symbionts such as T. gelatinosa, I. capsulata, and E. pseudotaxifoliae show very little synteny with all close genera, which could indicate that some genome reductions involve dramatic rearrangements while others, such as those of the symbionts of Edessa stink bugs, do not, despite similar genome sizes.

Importance/Implications of single vs multiple events of symbiosis

Phylogenies of insect symbionts are made to test hypotheses about the origin of a particular, highly derived clade. Frequently, these tests result in a single origin of insect endosymbiosis, often between symbionts inhabiting deeply diverged insect species. These hypotheses are usually incorrect in that different symbionts likely originated from natural populations of free-living or (McCutcheon et al. 2019) instead of previously identified symbionts, which would imply a transition in hosts, lifestyles (intracellular, extracellular), and host relationships (nutrient supplementation, host defense, etc.). Symbiont phylogenies have been used to inform host phylogenies (Nováková et al. 2013), the appearance of host traits (Belcaid et al. 2019), or even host ecological traits (Nieberding & Olivieri 2007), which highlights the importance of a careful consideration of these trees. Because endosymbionts have convergently developed molecular traits as a result of their population structure, and because convergent traits can confound phylogenetic methods, this convergence represents an additional challenge to understanding which traits are important or necessary for a symbiosis to occur.

Conclusion

While there is still no perfect method to account for genome reduction when inferring the origin of some symbionts, Elaboorate allows for a rapid evaluation of the presence of LBA artifacts and evaluates placements for taxa likely affected by it.

Genome synteny can be useful in organisms such as host-restricted symbionts but requires at least nearly complete genomes to yield informative results. Other organisms may not be subject to the same degree of genomic stasis, and therefore other lines of evidence should be used. Here, we present a framework to identify possible artifacts that confound the life history of some organisms, and propose excluding long branches and generating a consensus tree from separate phylogenetic inferences, each of which excludes the other long branches, when assessing the origin of a new derived symbiont. The identification of such branches is a critical step for avoiding erroneous conclusions and can be performed reliably with principal component analysis on simple sequence metrics, even if an entire genome is not used.

Correct differentiation of long branches is critical for studies on the evolution of traits such as symbiosis, where a possibly convergent trait can confound commonly used methods and generate inflated confidence in their conclusions. Additionally, studies of symbioses such as these can benefit from correctly identifying multiple instances of independent symbiotic events, as this can shed light on host traits, the specific requirements of a particular symbiont, or the minimum requirements for a cell to survive in particular environments.

References

Abascal F, Zardoya R, Telford MJ. 2010. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 38:W7–W13. doi: 10.1093/nar/gkq291.

Allen JM, Light JE, Perotti MA, Braig HR, Reed DL. 2009. Mutational Meltdown in Primary Endosymbionts: Selection Limits Muller’s Ratchet Tregenza, T, editor. PLoS One. 4:e4969. doi: 10.1371/journal.pone.0004969.

Aziz RK et al. 2008. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 9:75. doi: 10.1186/1471-2164-9-75.

Bankevich A et al. 2012. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 19:455–477. doi: 10.1089/cmb.2012.0021.

Belcaid M et al. 2019. Symbiotic organs shaped by distinct modes of genome evolution in cephalopods. Proc. Natl. Acad. Sci. U. S. A. 116:3030–3035. doi: 10.1073/pnas.1817322116.

Berger SA, Krompass D, Stamatakis A. 2011. Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood. Syst. Biol. 60:291–302. doi: 10.1093/sysbio/syr010.

Bergsten J. 2005. A review of long-branch attraction. Cladistics. 21:163–193. doi: 10.1111/j.1096-0031.2005.00059.x.

Bistolas KSI, Sakamoto RI, Fernandes J a M, Goffredi SK. 2014. Symbiont polyphyly, co-evolution, and necessity in pentatomid stinkbugs from Costa Rica. Front. Microbiol. 5:1–15. doi: 10.3389/fmicb.2014.00349.

Boussau B, Gouy M. 2006. Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55:756–768. doi: 10.1080/10635150600975218.

Buchner P. 1965. Endosymbiosis of animals with plant microorganisms. Interscience Publishers: New York.

Cabello-Yeves PJ et al. 2017. Novel Synechococcus Genomes Reconstructed from Freshwater Reservoirs. Front. Microbiol. 8:1151. doi: 10.3389/fmicb.2017.01151.

Camacho C et al. 2009. BLAST+: architecture and applications. BMC Bioinformatics. 10:421. doi: 10.1186/1471-2105-10-421.

Castresana J. 2000. Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. Mol. Biol. Evol. 17:540–552. doi: 10.1093/oxfordjournals.molbev.a026334.

Clark MA, Moran NA, Baumann P. 1999. Sequence evolution in bacterial endosymbionts having extreme base compositions. Mol. Biol. Evol. 16:1586–1598. doi: 10.1093/oxfordjournals.molbev.a026071.

Cock PJA et al. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25:1422–1423. doi: 10.1093/bioinformatics/btp163.

Degnan PH et al. 2009. Dynamics of genome evolution in facultative symbionts of aphids. Environ. Microbiol. 12:2060–2069. doi: 10.1111/j.1462-2920.2009.02085.x.

Dufresne A, Garczarek L, Partensky F. 2005. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 6:R14. doi: 10.1186/gb-2005-6-2-r14.

Duron O, Noël V. 2016. A wide diversity of Pantoea lineages are engaged in mutualistic symbiosis and cospeciation processes with stinkbugs. Environ. Microbiol. Rep. 8:715–727. doi: 10.1111/1758-2229.12432.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–7. doi: 10.1093/nar/gkh340.

Giovannoni SJ et al. 2005. Genome streamlining in a cosmopolitan oceanic bacterium. Science. 309:1242–5. doi: 10.1126/science.1114057.

Haft DH et al. 2018. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 46:D851–D860. doi: 10.1093/nar/gkx1068.

Hedtke SM, Townsend TM, Hillis DM. 2006. Resolution of Phylogenetic Conflict in Large Data Sets by Increased Taxon Sampling Collins, T, editor. Syst. Biol. 55:522–529. doi: 10.1080/10635150600697358.

Hosokawa T et al. 2013. Diverse Strategies for Vertical Symbiont Transmission among Subsocial Stinkbugs. PLoS One. 8:4–11. doi: 10.1371/journal.pone.0065081.

Hosokawa T, Kikuchi Y, Nikoh N, Fukatsu T. 2012. Polyphyly of gut symbionts in stinkbugs of the family cydnidae. Appl. Environ. Microbiol. 78:4758–4761. doi: 10.1128/AEM.00867-12.

Hosokawa T, Kikuchi Y, Nikoh N, Shimada M, Fukatsu T. 2006. Strict host-symbiont cospeciation and reductive genome evolution in insect gut bacteria. PLoS Biol. 4:1841–1851. doi: 10.1371/journal.pbio.0040337.

Hosokawa T, Matsuura Y, Kikuchi Y, Fukatsu T. 2016. Recurrent evolution of gut symbiotic bacteria in pentatomid stinkbugs. Zool. Lett. 2:24. doi: 10.1186/s40851-016-0061-4.

Hugerth LW et al. 2015. Metagenome-assembled genomes uncover a global brackish microbiome. Genome Biol. 16:279. doi: 10.1186/s13059-015-0834-7.

Husník F, Chrudimský T, Hypša V. 2011. Multiple origins of endosymbiosis within the Enterobacteriaceae (γ-Proteobacteria): convergence of complex phylogenetic approaches. BMC Biol. 9:87. doi: 10.1186/1741-7007-9-87.

Huson DH, Scornavacca C. 2012. Dendroscope 3: An Interactive Tool for Rooted Phylogenetic Trees and Networks. Syst. Biol. 61:1061–1067. doi: 10.1093/sysbio/sys062.

Hyatt D et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11:119. doi: 10.1186/1471-2105-11-119.

Jansson J, Rajaby R, Shen C, Sung W-K. 2018. Algorithms for the Majority Rule (+) Consensus Tree and the Frequency Difference Consensus Tree. IEEE/ACM Trans. Comput. Biol. Bioinforma. 15:15–26. doi: 10.1109/TCBB.2016.2609923.

Kearse M et al. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 28:1647–1649. doi: 10.1093/bioinformatics/bts199.

Kenyon LJ, Meulia T, Sabree ZL. 2015. Habitat Visualization and Genomic Analysis of ‘Candidatus Pantoea carbekii,’ the Primary Symbiont of the Brown Marmorated Stink Bug. Genome Biol. Evol. 7:620–635. doi: 10.1093/gbe/evv006.

Kostka M, Uzlikova M, Cepicka I, Flegr J. 2008. SlowFaster, a user-friendly program for slow-fast analysis and its application on phylogeny of Blastocystis. BMC Bioinformatics. 9:341. doi: 10.1186/1471-2105-9-341.

Lartillot N, Brinkmann H, Philippe H. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7:S4. doi: 10.1186/1471-2148-7-S1-S4.

Manzano-Marín A, Szabó G, Simon J, Horn M, Latorre A. 2017. Happens in the best of subfamilies: establishment and repeated replacements of co-obligate secondary endosymbionts within aphids. Environ. Microbiol. 19:393–408. doi: 10.1111/1462-2920.13633.

McCutcheon JP, Boyd BM, Dale C. 2019. The Life of an Insect Endosymbiont from the Cradle to the Grave. doi: 10.1016/j.cub.2019.03.032.

McCutcheon JP, von Dohlen CD. 2011. An Interdependent Metabolic Patchwork in the Nested Symbiosis of . Curr. Biol. 21:1366–1372. doi: 10.1016/J.CUB.2011.06.051.

Medina M, Sachs JL. 2010. Symbiont genomics, our new tangled bank. Genomics. 95:129–137. doi: 10.1016/J.YGENO.2009.12.004.

Meseguer AS et al. 2017. Buchnera has changed flatmate but the repeated replacement of co-obligate symbionts is not associated with the ecological expansions of their aphid hosts. Mol. Ecol. 26:2363–2378. doi: 10.1111/mec.13910.

Nieberding CM, Olivieri I. 2007. Parasites: proxies for host genealogy and ecology? Trends Ecol. Evol. 22:156–165. doi: 10.1016/j.tree.2006.11.012.

Nováková E et al. 2013. Reconstructing the phylogeny of aphids (Hemiptera: Aphididae) using DNA of the obligate symbiont Buchnera aphidicola. Mol. Phylogenet. Evol. 68:42–54. doi: 10.1016/j.ympev.2013.03.016.

Otero-Bravo A, Goffredi S, Sabree ZL. 2018. Cladogenesis and Genomic Streamlining in Extracellular Endosymbionts of Tropical Stink Bugs. Genome Biol. Evol. 10:680–693. doi: 10.1093/gbe/evy033.

Parks SL, Goldman N. 2014. Maximum Likelihood Inference of Small Trees in the Presence of Long Branches. Syst. Biol. 63:798–811. doi: 10.1093/sysbio/syu044.

Pick KS et al. 2010. Improved Phylogenomic Taxon Sampling Noticeably Affects Nonbilaterian Relationships. Mol. Biol. Evol. 27:1983–1987. doi: 10.1093/molbev/msq089.

Poe S, Swofford DL. 1999. Taxon sampling revisited. Nature. 398:299–300. doi: 10.1038/18592.

Price MN, Dehal PS, Arkin AP. 2010. FastTree 2- Approximately maximum-likelihood trees for large alignments. PLoS One. 5:e9490. doi: 10.1371/journal.pone.0009490.

Qu X-J, Jin J-J, Chaw S-M, Li D-Z, Yi T-S. 2017. Multiple measures could alleviate long-branch attraction in phylogenomic reconstruction of Cupressoideae (Cupressaceae). Sci. Rep. 7:41005. doi: 10.1038/srep41005.

Rispe C, Moran NA. 2000. Accumulation of Deleterious Mutations in Endosymbionts: Muller’s Ratchet with Two Levels of Selection. Am. Nat. 156:425–441. doi: 10.1086/303396.

Ronquist F et al. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539–42. doi: 10.1093/sysbio/sys029.

Rota-Stabelli O, Lartillot N, Philippe H, Pisani D. 2013. Serine Codon-Usage Bias in Deep Phylogenomics: Pancrustacean Relationships as a Case Study. Syst. Biol. 62:121–133. doi: 10.1093/sysbio/sys077.

Ruano-Rubio V, Fares MA, Collins T. 2007. Artifactual Phylogenies Caused by Correlated Distribution of Substitution Rates among Sites and Lineages: The Good, the Bad, and the Ugly. Syst. Biol. 56:68–82. doi: 10.1080/10635150601175578.

Sabree ZL, Kambhampati S, Moran NA. 2009. Nitrogen recycling and nutritional provisioning by Blattabacterium, the cockroach endosymbiont. Proc. Natl. Acad. Sci. 106:19521–19526. doi: 10.1073/pnas.0907504106.

Sheffield NC, Song H, Cameron SL, Whiting MF. 2009. Nonstationary Evolution and Compositional Heterogeneity in Beetle Mitochondrial Phylogenomics. Syst. Biol. 58:381–394. doi: 10.1093/sysbio/syp037.

Siddall ME, Whiting MF. 1999. Long-Branch Abstractions. Cladistics. 15:9–24. http://www.idealibrary.com (Accessed June 5, 2018).

Sloan DB, Moran NA. 2013. The evolution of genomic instability in the obligate endosymbionts of whiteflies. Genome Biol. Evol. 5:783–93. doi: 10.1093/gbe/evt044.

Song F et al. 2016. Capturing the Phylogeny of Holometabola with Mitochondrial Genome Data and Bayesian Site-Heterogeneous Mixture Models. Genome Biol. Evol. 8:1411–1426. doi: 10.1093/gbe/evw086.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30:1312–1313. doi: 10.1093/bioinformatics/btu033.

Tada A et al. 2011. Obligate association with gut bacterial symbiont in Japanese populations of the southern green stinkbug Nezara viridula (Heteroptera: Pentatomidae). Appl. Entomol. Zool. 46:483–488. doi: 10.1007/s13355-011-0066-6.

Tamas I et al. 2002. 50 Million Years of Genomic Stasis in Endosymbiotic Bacteria. Science (80). 296. http://science.sciencemag.org/content/296/5577/2376.full (Accessed August 3, 2017).

Tatusova T et al. 2016. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 44:6614–6624. doi: 10.1093/nar/gkw569.

Wernegreen JJ. 2015. Endosymbiont evolution: predictions from theory and surprises from genomes. Ann. N. Y. Acad. Sci. 1360:16–35. doi: 10.1111/nyas.12740.

Wick RR, Schultz MB, Zobel J, Holt KE. 2015. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 31:3350–3352. doi: 10.1093/bioinformatics/btv383.

Figures and Tables

Figure 5.1 Distribution of alignment statistic principal component in relation to genome size.

a) PC1 of a single coding gene (rpoB) for 40 species of Pantoea and Erwinia used, and b) PC1 of alignment statistics for 49 concatenated coding sequences.

Figure 5.2 Maximum likelihood reconstruction of Pantoea, Erwinia, and stink bug symbionts.

Underlined taxa indicate symbionts of insects. Bolded taxa indicate genome reduced symbionts.

Figure 5.3 Branch extraction consensus tree for Pantoea, Erwinia, and stink bug symbionts.

Underlined taxa indicate symbionts of insects. Bolded taxa indicate genome reduced symbionts.

Figure 5.4 Select genome synteny plots for symbionts of stink bugs.

Vertical lines indicate breaks between scaffolds in non-circularized genomes. Very similar, but not identical, synteny can be observed between mid-size genome symbionts (a, b). Identical synteny can be observed between symbionts of Edessa with smaller genomes (c). Lack of synteny can be observed between symbionts of more distantly related hosts (d).

Figure 5.5 Husnik et al. 2011 dataset reconstructed using Maximum Likelihood.

Bootstrap support is indicated at nodes and colors correspond to groups identified in the previously mentioned work.

Figure 5.6 Husnik et al. dataset using Elaboorate with the EPA

References

Abascal F, Zardoya R, Telford MJ. 2010. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res. 38:W7–W13. doi: 10.1093/nar/gkq291.

Abe Y, Mishiro K, Takanashi M. 1995. Symbiont of Brown-Winged Green Bug, Plautia stali SCOTT. Japanese J. Appl. Entomol. Zool. 39:109–115. doi: 10.1303/jjaez.39.109.

Adeolu M, Alnajar S, Naushad S, Gupta RS. 2016. Genome-based phylogeny and taxonomy of the ‘Enterobacteriales’: Proposal for enterobacterales ord. nov. divided into the families Enterobacteriaceae, Erwiniaceae fam. nov., Pectobacteriaceae fam. nov., Yersiniaceae fam. nov., Hafniaceae fam. nov., Morgane. Int. J. Syst. Evol. Microbiol. 66:5575–5599. doi: 10.1099/ijsem.0.001485.

Allen JM, Light JE, Perotti MA, Braig HR, Reed DL. 2009. Mutational Meltdown in Primary Endosymbionts: Selection Limits Muller’s Ratchet Tregenza, T, editor. PLoS One. 4:e4969. doi: 10.1371/journal.pone.0004969.

Anbutsu H et al. 2017. Small genome symbiont underlies cuticle hardness in beetles. Proc. Natl. Acad. Sci. 201712857. doi: 10.1073/pnas.1712857114.

Anderson B, Olivieri I, Lourmas M, Stewart BA. 2004. Comparative population genetic structure and local adaptation of two mutualists. Evolution (N. Y). 58:1730–1747.

Anderson KE, Russell JA, Moreau CS, Kautz S, Sullam KE, Hu Y, Basinger U, Mott BM, Buck N, Wheeler DE. 2012. Highly similar microbial communities are shared among related and trophically similar ant species. Mol. Ecol. 21:2282–2296. doi: 10.1111/j.1365-294X.2011.05464.x.

Ankrah NYD, Chouaia B, Douglas AE. 2018. The Cost of Metabolic Interactions in Symbioses between Insects and Bacteria with Reduced Genomes. MBio. 9. doi: 10.1128/mBio.01433-18.

Aziz RK et al. 2008. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics. 9:75. doi: 10.1186/1471-2164-9-75.

Baker LJ et al. 2019. Diverse deep-sea anglerfishes share a genetically reduced luminous symbiont that is acquired from the environment. Elife. 8. doi: 10.7554/eLife.47606.

Bankevich A et al. 2012. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 19:455–477. doi: 10.1089/cmb.2012.0021.

Bansal R, Michel A, Sabree Z. 2014. The Crypt-Dwelling Primary Bacterial Symbiont of the Polyphagous Pentatomid Pest Halyomorpha halys (Hemiptera: Pentatomidae). Environ. Entomol. 617–625.

Barcellos A, Grazia J. 2003. Revision of Brachystethus (Heteroptera, Pentatomidae, Edessinae). Iheringia. Série Zool. 93:413–446. doi: 10.1590/s0073-47212003000400008.

Batut B, Knibbe C, Marais G, Daubin V. 2014. Reductive genome evolution at both ends of the bacterial population size spectrum. Nat. Rev. Microbiol. 12:841–850. doi: 10.1038/nrmicro3331.

Bauer E, Salem H, Marz M, Vogel H, Kaltenpoth M. 2014. Transcriptomic immune response of the cotton stainer Dysdercus fasciatus to experimental elimination of vitamin-supplementing intestinal symbionts. PLoS One. 9:e114865. doi: 10.1371/journal.pone.0114865.

Belcaid M et al. 2019. Symbiotic organs shaped by distinct modes of genome evolution in cephalopods. Proc. Natl. Acad. Sci. U. S. A. 116:3030–3035. doi: 10.1073/pnas.1817322116.

Bell WJ, Cardé RT. 1984. Chemical Ecology of Insects. Springer US. doi: 10.1016/0305-1978(86)90124-9.

Bennett GM, Moran N a. 2015. Heritable symbiosis: The advantages and perils of an evolutionary rabbit hole. Proc. Natl. Acad. Sci. 2015:201421388. doi: 10.1073/pnas.1421388112.

Berger SA, Krompass D, Stamatakis A. 2011. Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood. Syst. Biol. 60:291–302. doi: 10.1093/sysbio/syr010.

Bergsten J. 2005. A review of long-branch attraction. Cladistics. 21:163–193. doi: 10.1111/j.1096-0031.2005.00059.x.

Bistolas KSI, Sakamoto RI, Fernandes J a M, Goffredi SK. 2014. Symbiont polyphyly, co-evolution, and necessity in pentatomid stinkbugs from Costa Rica. Front. Microbiol. 5:1–15. doi: 10.3389/fmicb.2014.00349.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30:2114–2120. doi: 10.1093/bioinformatics/btu170.

Boussau B, Gouy M. 2006. Efficient likelihood computations with nonreversible models of evolution. Syst. Biol. 55:756–768. doi: 10.1080/10635150600975218.

Bublitz DC et al. 2019. Peptidoglycan Production by an Insect-Bacterial Mosaic. Cell. 179:703-712.e7. doi: 10.1016/J.CELL.2019.08.054.

Buchner P. 1965. Endosymbiosis of animals with plant microorganisms. Interscience Publishers: New York.

Cabello-Yeves PJ et al. 2017. Novel Synechococcus Genomes Reconstructed from Freshwater Reservoirs. Front. Microbiol. 8:1151. doi: 10.3389/fmicb.2017.01151.

Camacho C et al. 2009. BLAST+: architecture and applications. BMC Bioinformatics. 10:421. doi: 10.1186/1471-2105-10-421.

Campbell MA et al. 2015. Genome expansion via lineage splitting and genome reduction in the cicada endosymbiont Hodgkinia. Proc. Natl. Acad. Sci. U. S. A. 112:10192–9. doi: 10.1073/pnas.1421386112.

Carlsson A, Nystrom T, de Cock H, Bennich H. 1998. Attacin an insect immune protein binds LPS and triggers the specific inhibition of bacterial outer membrane protein synthesis. Microbiology. 2179–2188. https://www.microbiologyresearch.org/docserver/fulltext/micro/144/8/mic-144-8-2179.pdf?expires=1577821309&id=id&accname=sgid026576&checksum=08FAB03AC3617B71B86F9E1547E9DE16 (Accessed January 2, 2020).

Caspi R et al. 2016. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 44:D471–D480. doi: 10.1093/nar/gkv1164.

Castresana J. 2000. Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. Mol. Biol. Evol. 17:540–552. doi: 10.1093/oxfordjournals.molbev.a026334.

Cesari, M., Maistrello, L., Ganzerli, F., Dioli, P., Rebecchi, L., Guidetti, R., 2015. A pest alien invasion in progress: potential pathways of origin of the brown marmorated stink bug Halyomorpha halys populations in Italy. J. Pest Sci. (2004). 88, 1–7. doi:10.1007/s10340-014-0634-y

Clark MA, Moran NA, Baumann P. 1999. Sequence evolution in bacterial endosymbionts having extreme base compositions. Mol. Biol. Evol. 16:1586–1598. doi: 10.1093/oxfordjournals.molbev.a026071.

Clayton AL, Jackson DG, Weiss RB, Dale C. 2016. Adaptation by Deletogenic Replication Slippage in a Nascent Symbiont. Mol. Biol. Evol. 33:1957–1966. doi: 10.1093/molbev/msw071.

Cock PJA et al. 2009. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 25:1422–1423. doi: 10.1093/bioinformatics/btp163.

Consortium U. 2017. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45:D158–D169. doi: 10.1093/nar/gkw1099.

Costliow ZA, Degnan PH. 2017. Thiamine Acquisition Strategies Impact Metabolism and Competition in the Gut Microbe Bacteroides thetaiotaomicron. Gilbert JA, editor. mSystems. 2:e00116-17. doi: 10.1128/mSystems.00116-17.

Criscione, C.D., Cooper, B., Blouin, M.S., 2006. Parasite genotypes identify source populations of migratory fish more accurately than fish genotypes. Ecology 87, 823–828. doi:[823:PGISPO]2.0.CO;2

Dan H, Ikeda N, Fujikami M, Nakabachi A. 2017. Behavior of bacteriome symbionts during transovarial transmission and development of the Asian citrus psyllid. Niedz RP, editor. PLoS One. 12:e0189779. doi: 10.1371/journal.pone.0189779.

Danese PN et al. 1998. Accumulation of the enterobacterial common antigen lipid II biosynthetic intermediate stimulates degP transcription in Escherichia coli. J. Bacteriol. 180:5875–84. http://www.ncbi.nlm.nih.gov/pubmed/9811644 (Accessed January 30, 2020).

Degnan PH et al. 2009. Dynamics of genome evolution in facultative symbionts of aphids. Environ. Microbiol. 12:2060–2069. doi: 10.1111/j.1462-2920.2009.02085.x.

Degnan PH, Lazarus AB, Wernegreen JJ. 2005. Genome sequence of Blochmannia pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects. Genome Res. 15:1023–33. doi: 10.1101/gr.3771305.

Degnan, P., Lazarus, A., Brock, C., Wernegreen, J., 2004. Host-Symbiont Stability and Fast Evolutionary Rates in an Ant-Bacterium Association: Cospeciation of Camponotus Species and Their Endosymbionts, Candidatus Blochmannia. Syst. Biol. 53, 95–110. doi:10.1080/10635150490264842

Do Nascimento DA, Da Silva Mendonça MT, Fernandes JAM. 2017. Description of a new group of species of Edessa (Hemiptera: Pentatomidae: Edessinae). Zootaxa. 4254:136–150. doi: 10.11646/zootaxa.4254.1.10.

Dufresne A, Garczarek L, Partensky F. 2005. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 6:R14. doi: 10.1186/gb-2005-6-2-r14.

Duron O, Noël V. 2016. A wide diversity of Pantoea lineages are engaged in mutualistic symbiosis and cospeciation processes with stinkbugs. Environ. Microbiol. Rep. 8:715–727. doi: 10.1111/1758-2229.12432.

Duzee EP Van. 1904. Annotated List of the Pentatomidæ Recorded from America North of Mexico, with Descriptions of Some New Species. Trans. Am. Entomol. Soc. 30:1–80. doi: 10.2307/25076770.

Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32:1792–7. doi: 10.1093/nar/gkh340.

Eisen JA, Heidelberg JF, White O, Salzberg SL. 2000. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 1:research0011.1. doi: 10.1186/gb-2000-1-6-research0011.

Espíndola, A., Carstens, B.C., Alvarez, N., 2014. Comparative phylogeography of mutualists and the effect of the host on the genetic structure of its partners. Biol. J. Linn. Soc. 113, 1021–1035. doi:10.1111/bij.12393

Esquivel JF, Medrano EG. 2014. Ingestion of a Marked Bacterial Pathogen of Cotton Conclusively Demonstrates Feeding by First Instar Southern Green Stink Bug (Hemiptera: Pentatomidae). Environ. Entomol. 43:110–115. doi: 10.1603/EN13051.

Favre R, Iaccarino M, Levinthal M. 1974. Complementation between different mutations in the ilvA gene of Escherichia coli K-12. J. Bacteriol. 119:1069–71. http://www.ncbi.nlm.nih.gov/pubmed/4604254 (Accessed January 6, 2020).

Fernandes Marin JA, Juliete Da Silva V, Oliveira Correia A, Mendes Nunes B. 2015. New species of Edessa Fabricius, 1803 (Hemiptera: Pentatomidae) from Costa Rica. Zootaxa. 3999:511–536.

Filippi, L., Baba, N., Inadomi, K., Yanagi, T., Hironaka, M., and Nomakuchi, S. 2009. Pre- and post-hatch trophic egg production in the subsocial burrower bug, Canthophorus niveimarginatus (Heteroptera: Cydnidae). Naturwissenschaften 96, 201–211. doi:10.1007/s00114-008-0463-z.

Finn RD et al. 2014. Pfam: the protein families database. Nucleic Acids Res. 42:D222–D230. doi: 10.1093/nar/gkt1223.

Fukatsu T, Hosokawa T. 2002. Capsule-transmitted gut symbiotic bacterium of the Japanese common plataspid stinkbug, Megacopta punctatissima. Appl. Environ. Microbiol. 68:389–396. doi: 10.1128/AEM.68.1.389-396.2002.

Funk, D.J., Helbling, L., Wernegreen, J.J., Moran, N.A., 2000. Intraspecific phylogenetic congruence among multiple symbiont genomes. Proc. Biol. Sci. 267, 2517–21. doi:10.1098/rspb.2000.1314

Futahashi, R., Tanaka, K., Tanahashi, M., Nikoh, N., Kikuchi, Y., Lee, B. L., and Fukatsu, T. 2013. Gene expression in gut symbiotic organ of stinkbug affected by extracellular bacterial symbiont. PLoS One 8. doi:10.1371/journal.pone.0064557.

Gabelli SB et al. 2007. Structure and Function of the E. coli Dihydroneopterin Triphosphate Pyrophosphatase: A Nudix Enzyme Involved in Folate Biosynthesis. Structure. 15:1014–1022. doi: 10.1016/j.str.2007.06.018.

Galperin MY, Makarova KS, Wolf YI, Koonin E V. 2015. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 43:D261–D269. doi: 10.1093/nar/gku1223.

Gariepy, T.D., Bruin, A., Haye, T., Milonas, P., Vétek, G., 2015. Occurrence and genetic diversity of new populations of Halyomorpha halys in Europe. J. Pest Sci. (2004). 88, 451–460. doi:10.1007/s10340-015-0672-0

Gariepy, T.D., Haye, T., Fraser, H., Zhang, J., 2013. Occurrence, genetic diversity, and potential pathways of entry of Halyomorpha halys in newly invaded areas of Canada and Switzerland. J. Pest Sci. (2004). 87, 17–28. doi:10.1007/s10340-013-0529-3

Gil R et al. 2017. Tremblaya phenacola PPER: an evolutionary beta-gammaproteobacterium collage. ISME J. doi: 10.1038/ismej.2017.144.

Giovannoni SJ et al. 2005. Genome streamlining in a cosmopolitan oceanic bacterium. Science. 309:1242–5. doi: 10.1126/science.1114057.

Giron D et al. 2017. Influence of Microbial Symbionts on Plant–Insect Interactions. Adv. Bot. Res. 81:225–257. doi: 10.1016/BS.ABR.2016.09.007.

Gordon ERL, McFrederick Q, Weirauch C. 2016. Phylogenetic Evidence for Ancient and Persistent Environmental Symbiont Reacquisition in Largidae (Hemiptera: Heteroptera). Stabb EV, editor. Appl. Environ. Microbiol. 82:7123–7133. doi: 10.1128/AEM.02114-16.

Hafner, M., Sudman, P., Villablanca, F., Spradling, T., Demastes, J., Nadler, S., 1994. Disparate rates of molecular evolution in cospeciating hosts and parasites. Science 265, 1087–1090. doi:10.1126/science.8066445

Haft DH et al. 2018. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res. 46:D851–D860. doi: 10.1093/nar/gkx1068.

Hansen AK, Moran NA. 2011. Aphid genome expression reveals host-symbiont cooperation in the production of amino acids. Proc. Natl. Acad. Sci. U. S. A. 108:2849–54. doi: 10.1073/pnas.1013465108.

Harris RS. 2007. Improved pairwise alignment of genomic DNA. PhD Thesis.

Hayashi, T., Hosokawa, T., Meng, X.-Y., Koga, R., and Fukatsu, T. 2015. Female-specific specialization of a posterior end region of the midgut symbiotic organ in Plautia splendens and allied stinkbugs. Appl. Environ. Microbiol. 81, AEM.04057–14. doi:10.1128/AEM.04057-14.

Hedtke SM, Townsend TM, Hillis DM. 2006. Resolution of Phylogenetic Conflict in Large Data Sets by Increased Taxon Sampling. Collins T, editor. Syst. Biol. 55:522–529. doi: 10.1080/10635150600697358.

Heijenoort J v. 2001. Formation of the glycan chains in the synthesis of bacterial peptidoglycan. Glycobiology. 11:25R-36R. doi: 10.1093/glycob/11.3.25R.

Hendry TA et al. 2018. Ongoing Transposon-Mediated Genome Reduction in the Luminous Bacterial Symbionts of Deep-Sea Ceratioid Anglerfishes. MBio. 9:e01033-18. doi: 10.1128/mBio.01033-18.

Hirota B et al. 2017. A Novel, Extremely Elongated, and Endocellular Bacterial Symbiont Supports Cuticle Formation of a Grain Pest Beetle. MBio. 8:e01482-17. doi: 10.1128/mBio.01482-17.

Hoebeke, E.R., Carter, M.E., 2003. Halyomorpha halys (Stål) (Heteroptera: Pentatomidae): a polyphagous plant pest from Asia newly detected in North America. Proc. Entomol. Soc. Washingt. 105, 225–237.

Hosokawa T, Kikuchi Y, Nikoh N, Fukatsu T. 2012. Polyphyly of gut symbionts in stinkbugs of the family Cydnidae. Appl. Environ. Microbiol. 78:4758–4761. doi: 10.1128/AEM.00867-12.

Hosokawa T, Kikuchi Y, Nikoh N, Shimada M, Fukatsu T. 2006. Strict host-symbiont cospeciation and reductive genome evolution in insect gut bacteria. PLoS Biol. 4:1841–1851. doi: 10.1371/journal.pbio.0040337.

Hosokawa T, Kikuchi Y, Shimada M, Fukatsu T. 2008. Symbiont acquisition alters behaviour of stinkbug nymphs. Biol. Lett. 4:45–48. doi: 10.1098/rsbl.2007.0510.

Hosokawa T, Matsuura Y, Kikuchi Y, Fukatsu T. 2016. Recurrent evolution of gut symbiotic bacteria in pentatomid stinkbugs. Zool. Lett. 2:24. doi: 10.1186/s40851-016-0061-4.

Hosokawa, T., Hironaka, M., Inadomi, K., Mukai, H., Nikoh, N., and Fukatsu, T. 2013. Diverse strategies for vertical symbiont transmission among subsocial stinkbugs. PLoS One 8, 4–11. doi:10.1371/journal.pone.0065081.

Hosokawa, T., Hironaka, M., Mukai, H., Inadomi, K., Suzuki, N., and Fukatsu, T. 2012. Mothers never miss the moment: A fine-tuned mechanism for vertical symbiont transmission in a subsocial insect. Anim. Behav. 83, 293–300. doi:10.1016/j.anbehav.2011.11.006.

Hosokawa, T., Kikuchi, Y., Nikoh, N., Meng, X. Y., Hironaka, M., and Fukatsu, T. 2010. Phylogenetic position and peculiar genetic traits of a midgut bacterial symbiont of the stinkbug Parastrachia japonensis. Appl. Environ. Microbiol. 76, 4130–4135. doi:10.1128/AEM.00616-10.

Hosokawa, T., Kikuchi, Y., Shimada, M., and Fukatsu, T. 2007. Obligate symbiont involved in pest status of host insect. Proc. Biol. Sci. 274, 1979–1984. doi:10.1098/rspb.2007.0620.

Hosokawa, T., Kikuchi, Y., Meng, X. Y., and Fukatsu, T. 2005. The making of symbiont capsule in the plataspid stinkbug Megacopta punctatissima. FEMS Microbiol. Ecol. 54, 471–477. doi:10.1016/j.femsec.2005.06.002.

Huerta-Cepas J et al. 2016. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44:D286–D293. doi: 10.1093/nar/gkv1248.

Hugerth LW et al. 2015. Metagenome-assembled genomes uncover a global brackish microbiome. Genome Biol. 16:279. doi: 10.1186/s13059-015-0834-7.

Husník F, Chrudimský T, Hypša V. 2011. Multiple origins of endosymbiosis within the Enterobacteriaceae (γ-Proteobacteria): convergence of complex phylogenetic approaches. BMC Biol. 9:87. doi: 10.1186/1741-7007-9-87.

Huson DH, Scornavacca C. 2012. Dendroscope 3: An Interactive Tool for Rooted Phylogenetic Trees and Networks. Syst. Biol. 61:1061–1067. doi: 10.1093/sysbio/sys062.

Hyatt D et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 11:119. doi: 10.1186/1471-2105-11-119.

Jansson J, Rajaby R, Shen C, Sung W-K. 2018. Algorithms for the Majority Rule (+) Consensus Tree and the Frequency Difference Consensus Tree. IEEE/ACM Trans. Comput. Biol. Bioinforma. 15:15–26. doi: 10.1109/TCBB.2016.2609923.

Kaiwa N et al. 2010. Primary gut symbiont and secondary, Sodalis-allied symbiont of the scutellerid stink bug Cantao ocellatus. Appl. Environ. Microbiol. 76:3486–3494. doi: 10.1128/AEM.00421-10.

Kaiwa N et al. 2014. Symbiont-Supplemented Maternal Investment Underpinning Host’s Ecological Adaptation. Curr. Biol. 24:1–6. doi: 10.1016/j.cub.2014.08.065.

Kaiwa, N., Hosokawa, T., Kikuchi, Y., Nikoh, N., Meng, X. Y., Kimura, N., Ito, M., and Fukatsu, T. 2011. Bacterial symbionts of the giant jewel stinkbug Eucorysses grandis (Hemiptera: Scutelleridae). Zool. Sci. 28, 169-174. doi: 10.2108/zsj.28.169

Kaltenpoth, M., Winter, S. A., and Kleinhammer, A. 2009. Localization and transmission route of Coriobacterium glomerans, the endosymbiont of pyrrhocorid bugs. FEMS Microbiol. Ecol. 69, 373–383. doi:10.1111/j.1574-6941.2009.00722.x.

Kanehisa M et al. 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44:D457–D462. doi: 10.1093/nar/gkv1070.

Karamipour N et al. 2016. Gammaproteobacteria as essential primary symbionts in the striped shield bug, Graphosoma lineatum (Hemiptera: Pentatomidae). Sci. Rep. 6:33168. doi: 10.1038/srep33168.

Kashima, T., Nakamura, T., and Tojo, S. 2006. Uric acid recycling in the shield bug, Parastrachia japonensis (Hemiptera: Parastrachiidae), during diapause. J. Insect Physiol. 52, 816-825. doi: 10.1016/j.jinsphys.2006.05.003.

Kearse M et al. 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 28:1647–1649. doi: 10.1093/bioinformatics/bts199.

Kenyon LJ, Meulia T, Sabree ZL. 2015. Habitat Visualization and Genomic Analysis of ‘Candidatus Pantoea carbekii,’ the Primary Symbiont of the Brown Marmorated Stink Bug. Genome Biol. Evol. 7:620–635. doi: 10.1093/gbe/evv006.

Kenyon, L. J., and Sabree, Z. L. 2014. Obligate insect endosymbionts exhibit increased ortholog length variation and loss of large accessory proteins concurrent with genome shrinkage. Genome Biol. Evol. 6, 763–75. doi:10.1093/gbe/evu055.

Kikuchi Y et al. 2009. Host-symbiont co-speciation and reductive genome evolution in gut symbiotic bacteria of acanthosomatid stinkbugs. BMC Biol. 7:2. doi: 10.1186/1741-7007-7-2.

Kikuchi Y, Hosokawa T, Fukatsu T. 2007. Insect-microbe mutualism without vertical transmission: A stinkbug acquires a beneficial gut symbiont from the environment every generation. Appl. Environ. Microbiol. 73:4308–4316. doi: 10.1128/AEM.00067-07.

Kikuchi Y, Hosokawa T, Fukatsu T. 2011. An ancient but promiscuous host-symbiont association between Burkholderia gut symbionts and their heteropteran hosts. ISME J. 5:446–460. doi: 10.1038/ismej.2010.150.

Kikuchi, Y., Hosokawa, T., Nikoh, N., and Fukatsu, T. 2012. Gut symbiotic bacteria in the cabbage bugs Eurydema rugosa and Eurydema dominulus (Heteroptera: Pentatomidae). Appl. Entomol. Zool. 47, 1–8. doi:10.1007/s13355-011-0081-7.

Kim J, Copley SD. 2007. Why Metabolic Enzymes Are Essential or Nonessential for Growth of Escherichia coli K12 on Glucose. Biochemistry. 46:12501–12511. doi: 10.1021/bi7014629.

Kim JK, Park HY, Lee BL. 2016. The symbiotic role of O-antigen of Burkholderia symbiont in association with host Riptortus pedestris. Dev. Comp. Immunol. 60:202–208. doi: 10.1016/J.DCI.2016.02.009.

Kobayashi H, Kawasaki K, Takeishi K, Noda H. 2011. Symbiont of the stink bug Plautia stali synthesizes rough-type lipopolysaccharide. Microbiol. Res. 167:48–54. doi: 10.1016/j.micres.2011.03.001.

Koga R, Moran NA. 2014. Swapping symbionts in spittlebugs: evolutionary replacement of a reduced genome symbiont. ISME J. 8:1237–46. doi: 10.1038/ismej.2013.235.

Kolbe DL, Eddy SR. 2011. Fast filtering for RNA homology search. Bioinformatics. 27:3102–3109. doi: 10.1093/bioinformatics/btr545.

Konstantinidis KT, Rosselló-Móra R, Amann R. 2017. Uncultivated microbes in need of their own taxonomy. ISME J. doi: 10.1038/ismej.2017.113.

Kostka M, Uzlikova M, Cepicka I, Flegr J. 2008. SlowFaster, a user-friendly program for slow-fast analysis and its application on phylogeny of Blastocystis. BMC Bioinformatics. 9:341. doi: 10.1186/1471-2105-9-341.

Lagesen K et al. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35:3100–3108. doi: 10.1093/nar/gkm160.

Lartillot N, Brinkmann H, Philippe H. 2007. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7:S4. doi: 10.1186/1471-2148-7-S1-S4.

Laslett D, Canback B. 2004. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 32:11–16. doi: 10.1093/nar/gkh152.

Lee, D.-H., Short, B.D., Joseph, S. V, Bergh, J.C., Leskey, T.C., 2013. Review of the biology, ecology, and management of Halyomorpha halys (Hemiptera: Pentatomidae) in China, Japan, and the Republic of Korea. Environ. Entomol. 42, 627–641.

Leigh, J.W., Bryant, D., 2015. popart: full-feature software for haplotype network construction. Methods Ecol. Evol. 6, 1110–1116. doi:10.1111/2041-210X.12410

Leskey, T.C., Hamilton, G.C., Nielsen, A.L., Polk, D.F., Rodriguez-Saona, C., Christopher Bergh, J., Ames Herbert, D., Kuhar, T.P., Pfeiffer, D., Dively, G.P., Hooks, C.R.R., Raupp, M.J., Shrewsbury, P.M., Krawczyk, G., Shearer, P.W., Whalen, J., Koplinka-Loehr, C., Myers, E., Inkley, D., Hoelmer, K.A., Lee, D.H., Wright, S.E., 2012. Pest status of the brown marmorated stink bug, Halyomorpha halys in the USA. Outlooks Pest Manag. 23, 218–226. doi:10.1564/23oct07

Li H et al. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25:2078–2079. doi: 10.1093/bioinformatics/btp352.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25:1754–1760. doi: 10.1093/bioinformatics/btp324.

Librado, P., Rozas, J., 2009. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–2. doi:10.1093/bioinformatics/btp187

Lo W-S, Huang Y-Y, Kuo C-H. 2016. Winding paths to simplicity: genome evolution in facultative insect symbionts. FEMS Microbiol. Rev. 2:158–61. doi: 10.1093/femsre/fuw028.

Luan J-B et al. 2015. Metabolic Coevolution in the Bacterial Symbiosis of Whiteflies and Related Plant Sap-Feeding Insects. Genome Biol. Evol. 7:2635–2647. doi: 10.1093/gbe/evv170.

Manzano-Marín A, Szabó G, Simon J, Horn M, Latorre A. 2017. Happens in the best of subfamilies: establishment and repeated replacements of co-obligate secondary endosymbionts within Lachninae aphids. Environ. Microbiol. 19:393–408. doi: 10.1111/1462-2920.13633.

Matsuura Y, Kikuchi Y, Meng XY, Koga R, Fukatsu T. 2012. Novel clade of alphaproteobacterial endosymbionts associated with stinkbugs and other arthropods. Appl. Environ. Microbiol. 78:4149–4156. doi: 10.1128/AEM.00673-12.

Matsuura, Y., Hosokawa, T., Serracin, M., Tulgetske, G. M., Miller, T. A., and Fukatsu, T. 2014. Bacterial symbionts of a devastating coffee plant pest, the stinkbug Antestiopsis thunbergii (Hemiptera: Pentatomidae). Appl. Environ. Microbiol. 80, 3769–3775. doi:10.1128/AEM.00554-14.

Matsuura, Y., Kikuchi, Y., Hosokawa, T., Koga, R., Meng, X.-Y., Kamagata, Y., Nikoh, N., and Fukatsu, T. 2012. Evolution of symbiotic organs and endosymbionts in lygaeid stinkbugs. ISME J. 6, 397–409. doi:10.1038/ismej.2011.103.

McCutcheon JP, Boyd BM, Dale C. 2019. The Life of an Insect Endosymbiont from the Cradle to the Grave. doi: 10.1016/j.cub.2019.03.032.

McCutcheon JP, Moran NA. 2011. Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10:13–26. doi: 10.1038/nrmicro2670.

McCutcheon JP, Moran NA, Greenberg EP. 2007. Parallel genomic evolution and metabolic interdependence in an ancient symbiosis. www.pnas.org/cgi/content/full/ (Accessed January 9, 2019).

McCutcheon JP, von Dohlen CD. 2011. An Interdependent Metabolic Patchwork in the Nested Symbiosis of Mealybugs. Curr. Biol. 21:1366–1372. doi: 10.1016/J.CUB.2011.06.051.

Medina M, Sachs JL. 2010. Symbiont genomics, our new tangled bank. Genomics. 95:129–137. doi: 10.1016/J.YGENO.2009.12.004.

Meseguer AS et al. 2017. Buchnera has changed flatmate but the repeated replacement of co-obligate symbionts is not associated with the ecological expansions of their aphid hosts. Mol. Ecol. 26:2363–2378. doi: 10.1111/mec.13910.

Mira, A., Ochman, H., and Moran, N. A. 2001. Deletional bias and the evolution of bacterial genomes. Trends Genet. 17, 589–596. doi:10.1016/S0168-9525(01)02447-7.

Mitchell AM, Srikumar T, Silhavy TJ. 2018. Cyclic Enterobacterial Common Antigen Maintains the Outer Membrane Permeability Barrier of Escherichia coli in a Manner Controlled by YhdP. MBio. 9. doi: 10.1128/mBio.01321-18.

Moran NA, McCutcheon JP, Nakabachi A. 2008. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 42:165–190. doi: 10.1146/annurev.genet.41.110306.130119.

Moran NA, Bennett GM. 2014. The Tiniest Tiny Genomes. Annu. Rev. Microbiol. 195–215. doi: 10.1146/annurev-micro-091213-112901.

Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. 2007. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35:W182–W185. doi: 10.1093/nar/gkm321.

Murata T, Tseng W, Guina T, Miller SI, Nikaido H. 2007. PhoPQ-mediated regulation produces a more robust permeability barrier in the outer membrane of Salmonella enterica serovar typhimurium. J. Bacteriol. 189:7213–22. doi: 10.1128/JB.00973-07.

Nakabachi A et al. 2005. Transcriptome analysis of the aphid bacteriocyte, the symbiotic host cell that harbors an endocellular mutualistic bacterium, Buchnera. Proc. Natl. Acad. Sci. U. S. A. 102:5477–82. doi: 10.1073/pnas.0409034102.

Newton ILG, Bordenstein SR. 2011. Correlations Between Bacterial Ecology and Mobile DNA. Curr. Microbiol. 62:198–208. doi: 10.1007/s00284-010-9693-3.

Nieberding CM, Olivieri I. 2007. Parasites: proxies for host genealogy and ecology? Trends Ecol. Evol. 22:156–165. doi: 10.1016/j.tree.2006.11.012.

Nikoh N et al. 2014. Evolutionary origin of insect-Wolbachia nutritional mutualism. Proc. Natl. Acad. Sci. 111:10257–62. doi: 10.1073/pnas.1409284111.

Nikoh N, Hosokawa T, Oshima K, Hattori M, Fukatsu T. 2011. Reductive evolution of bacterial genome in insect gut environment. Genome Biol. Evol. 3:702–714. doi: 10.1093/gbe/evr064.

Nováková E et al. 2013. Reconstructing the phylogeny of aphids (Hemiptera: Aphididae) using DNA of the obligate symbiont Buchnera aphidicola. Mol. Phylogenet. Evol. 68:42–54. doi: 10.1016/j.ympev.2013.03.016.

Ohbayashi T et al. 2015. Insect’s intestinal organ for symbiont sorting. Proc. Natl. Acad. Sci. 201511454. doi: 10.1073/pnas.1511454112.

Ohkuma, M. 2008. Symbioses of flagellates and prokaryotes in the gut of lower termites. Trends Microbiol. 16, 345–352. doi:10.1016/j.tim.2008.04.004.

Oishi S, Moriyama M, Koga R, Fukatsu T. 2019. Morphogenesis and development of midgut symbiotic organ of the stinkbug Plautia stali (Hemiptera: Pentatomidae). Zool. Lett. 5:16. doi: 10.1186/s40851-019-0134-2.

Okude G et al. 2017. Novel bacteriocyte-associated pleomorphic symbiont of the grain pest beetle Rhyzopertha dominica (Coleoptera: Bostrichidae). Zool. Lett. 3:13. doi: 10.1186/s40851-017-0073-8.

Olivier-Espejel S, Sabree ZL, Noge K, Becerra JX. 2011. Gut microbiota in nymph and adults of the giant mesquite bug (Thasus neocalifornicus) (Heteroptera: Coreidae) is dominated by Burkholderia acquired de novo every generation. Environ. Entomol. 40:1102–1110.

Opiyo SO, Pardy RL, Moriyama H, Moriyama EN. 2010. Evolution of the Kdo2-lipid A biosynthesis in bacteria. BMC Evol. Biol. 10:362. doi: 10.1186/1471-2148-10-362.

Otero-Bravo A, Goffredi S, Sabree ZL. 2018. Cladogenesis and Genomic Streamlining in Extracellular Endosymbionts of Tropical Stink Bugs. Genome Biol. Evol. 10:680–693. doi: 10.1093/gbe/evy033.

Otero-Bravo A, Sabree ZL. 2015. Inside or out? Possible genomic consequences of extracellular transmission of crypt-dwelling stink bug mutualists. Front. Ecol. Evol. 3:1–7. doi: 10.3389/fevo.2015.00064.

Otero-Bravo A, Sabree ZL. 2018. Comparing the utility of host and primary endosymbiont loci for predicting global invasive insect genetic structuring and migration patterns. Biol. Control. 116:10–16. doi: 10.1016/j.biocontrol.2017.04.003.

Page AJ et al. 2015. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 31:3691–3693. doi: 10.1093/bioinformatics/btv421.

Palmer M, Steenkamp ET, Coetzee MPA, Blom J, Venter SN. 2018. Genome-Based Characterization of Biological Processes That Differentiate Closely Related Bacteria. Front. Microbiol. 9:113. doi: 10.3389/fmicb.2018.00113.

Panizzi AR, Grazia J. 2015. True bugs (Heteroptera) of the neotropics. Springer Netherlands: Dordrecht doi: 10.1007/978-94-017-9861-7.

Panizzi AR, Machado-Neto E. 1992. Development of Nymphs and Feeding Habits of Nymphal and Adult Edessa meditabunda (Heteroptera: Pentatomidae) on Soybean and Sunflower. Ann. Entomol. Soc. Am. 85:477–481. doi: 10.1093/aesa/85.4.477.

Parks SL, Goldman N. 2014. Maximum Likelihood Inference of Small Trees in the Presence of Long Branches. Syst. Biol. 63:798–811. doi: 10.1093/sysbio/syu044.

Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Meth. 8:785–786. http://dx.doi.org/10.1038/nmeth.1701.

Pick KS et al. 2010. Improved Phylogenomic Taxon Sampling Noticeably Affects Nonbilaterian Relationships. Mol. Biol. Evol. 27:1983–1987. doi: 10.1093/molbev/msq089.

Pinto-Carbó M et al. 2016. Evidence of horizontal gene transfer between obligate leaf nodule symbionts. ISME J. 10:2092–2105. doi: 10.1038/ismej.2016.27.

Poe S, Swofford DL. 1999. Taxon sampling revisited. Nature. 398:299–300. doi: 10.1038/18592.

Prado SS, Almeida RPP. 2009. Phylogenetic placement of pentatomid stink bug gut symbionts. Curr. Microbiol. 58:64–69. doi: 10.1007/s00284-008-9267-9.

Prado SS, Rubinoff D, Almeida RPP. 2006. Vertical Transmission of a Pentatomid Caeca-Associated Symbiont. Ann. Entomol. Soc. Am. 99:577–585. doi: 10.1603/0013-8746(2006)99[577:VTOAPC]2.0.CO;2.

Prado, S. S., and Almeida, R. P. P. (2009b). Role of symbiotic gut bacteria in the development of Acrosternum hilare and Murgantia histrionica. Entomol. Exp. Appl. 132, 21–29. doi:10.1111/j.1570-7458.2009.00863.x.

Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLoS One. 5:e9490. doi: 10.1371/journal.pone.0009490.

Pruesse E, Peplies J, Glöckner FO. 2012. SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics. 28:1823–1829. doi: 10.1093/bioinformatics/bts252.

Qu X-J, Jin J-J, Chaw S-M, Li D-Z, Yi T-S. 2017. Multiple measures could alleviate long-branch attraction in phylogenomic reconstruction of Cupressoideae (Cupressaceae). Sci. Rep. 7:41005. doi: 10.1038/srep41005.

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., Glöckner, F.O., 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596. doi:10.1093/nar/gks1219

R Core Team. 2017. R: A Language and Environment for Statistical Computing. https://www.r-project.org.

Ratnasingham S, Hebert PDN. 2007. bold: The Barcode of Life Data System (http://www.barcodinglife.org). Mol. Ecol. Notes. 7:355–364. doi: 10.1111/j.1471-8286.2007.01678.x.

Requena, G. S., Nazareth, T. M., Schwertner, C. F., and Machado, G. 2010. First cases of exclusive paternal care in stink bugs (Hemiptera: Pentatomidae). Zool. (Curitiba) 27, 1018–1021. doi:10.1590/S1984-46702010000600026.

Richter M, Rosselló-Móra R. 2009. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. U. S. A. 106:19126–31. doi: 10.1073/pnas.0906412106.

Rio RVM et al. 2012. Insight into the transmission biology and species-specific functional capabilities of tsetse (Diptera: Glossinidae) obligate symbiont Wigglesworthia. MBio. 3:e00240-11. doi: 10.1128/mBio.00240-11.

Rispe C, Moran NA. 2000. Accumulation of Deleterious Mutations in Endosymbionts: Muller’s Ratchet with Two Levels of Selection. Am. Nat. 156:425–441. doi: 10.1086/303396.

Rizzo HF, Saini ED. 1987. Aspectos morfologicos y biologicos de Edessa rufomarginata (De Geer) (Hemiptera, Pentatomidae). Rev. Fac. Agron. 8:51–63.

Rolston LH, McDonald FJD, Thomas DBJ. 1980. A Conspectus of Pentatomini Genera of the Western Hemisphere. Part I (Hemiptera: Pentatomidae). J. New York Entomol. Soc. 88:120–132.

Rolston LH, McDonald FJD. 1979. Keys and Diagnoses for the Families of Western Hemisphere Pentatomoidea , Subfamilies of Pentatomidae and Tribes of Pentatominae (Hemiptera). J. New York Entomol. Soc. 87:189–207.

Rolston LH, McDonald FJD. 1980. Conspectus of Pentatomini Genera of the Western Hemisphere: Part 2 (Hemiptera: Pentatomidae). J. New York Entomol. Soc. 88:257–272.

Rolston LH, McDonald FJD. 1984. A Conspectus of Pentatomini of the Western Hemisphere. Part 3 (Hemiptera: Pentatomidae). J. New York Entomol. Soc. 92:69–86.

Ronquist F et al. 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61:539–42. doi: 10.1093/sysbio/sys029.

Rota-Stabelli O, Lartillot N, Philippe H, Pisani D. 2013. Serine Codon-Usage Bias in Deep Phylogenomics: Pancrustacean Relationships as a Case Study. Syst. Biol. 62:121–133. doi: 10.1093/sysbio/sys077.

Ruano-Rubio V, Fares MA, Collins T. 2007. Artifactual Phylogenies Caused by Correlated Distribution of Substitution Rates among Sites and Lineages: The Good, the Bad, and the Ugly. Syst. Biol. 56:68–82. doi: 10.1080/10635150601175578.

Sabree ZL, Degnan PH, Moran NA. 2010. Chromosome stability and gene loss in cockroach endosymbionts. Appl. Environ. Microbiol. 76:4076–9. doi: 10.1128/AEM.00291-10.

Sabree ZL, Huang CY, Okusu A, Moran NA, Normark BB. 2013. The nutrient supplying capabilities of Uzinura, an endosymbiont of armoured scale insects. Environ. Microbiol. 15:1988–99. doi: 10.1111/1462-2920.12058.

Sabree ZL, Kambhampati S, Moran NA. 2009. Nitrogen recycling and nutritional provisioning by Blattabacterium, the cockroach endosymbiont. Proc. Natl. Acad. Sci. 106:19521–19526. doi: 10.1073/pnas.0907504106.

Sabree, Z. L., Huang, Y. C., Arakawa, G., Tokuda, G., Lo, N., Watanabe, H., and Moran, N.A. 2012. Genome shrinkage and loss of nutrient-providing potential in the obligate symbiont of the primitive termite Mastotermes darwiniensis. Appl. Environ. Microbiol. 78, 204-210. doi: 10.1128/AEM.06540-11.

Salem H et al. 2014. Vitamin supplementation by gut symbionts ensures metabolic homeostasis in an insect host. Proc. Biol. Sci. 281:20141838. doi: 10.1098/rspb.2014.1838.

Salem H et al. 2017. Drastic Genome Reduction in an Herbivore’s Pectinolytic Symbiont. Cell. 1–12. doi: 10.1016/j.cell.2017.10.029.

Salem H, Florez L, Gerardo N, Kaltenpoth M. 2015. An out-of-body experience: the extracellular dimension for the transmission of mutualistic bacteria in insects. Proc. Biol. Sci. 282:20142957. doi: 10.1098/rspb.2014.2957.

Scopel W, Cônsoli FL. 2018. Culturable symbionts associated with the reproductive and digestive tissues of the Neotropical brown stinkbug Euschistus heros. Antonie Van Leeuwenhoek. 1–12. doi: 10.1007/s10482-018-1130-9.

Seemann T. 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30:2068–2069. doi: 10.1093/bioinformatics/btu153.

Selengut JD et al. 2007. TIGRFAMs and Genome Properties: tools for the assignment of molecular function and biological process in prokaryotic genomes. Nucleic Acids Res. 35:D260–D264. doi: 10.1093/nar/gkl1043.

Sheffield NC, Song H, Cameron SL, Whiting MF. 2009. Nonstationary Evolution and Compositional Heterogeneity in Beetle Mitochondrial Phylogenomics. Syst. Biol. 58:381–394. doi: 10.1093/sysbio/syp037.

Siddall ME, Whiting MF. 1999. Long-Branch Abstractions. Cladistics. 15:9–24. http://www.idealibrary.com (Accessed June 5, 2018).

Silva DP, Oliveira PS. 2010. Field Biology of Edessa rufomarginata (Hemiptera: Pentatomidae): Phenology, Behavior, and Patterns of Host Plant Use. Environ. Entomol. 39:1903–1910. doi: 10.1603/EN10129.

Skidmore IH, Hansen AK. 2017. The evolutionary development of plant-feeding insects and their nutritional endosymbionts. Insect Sci. 24:910–928. doi: 10.1111/1744-7917.12463.

Sloan DB, Moran NA. 2013. The evolution of genomic instability in the obligate endosymbionts of whiteflies. Genome Biol. Evol. 5:783–93. doi: 10.1093/gbe/evt044.

Smith JM, Smolin DE, Umbarger HE. 1976. Polarity and the regulation of the ilv gene cluster in Escherichia coli strain K-12. MGG Mol. Gen. Genet. 148:111–124. doi: 10.1007/BF00268374.

Song F et al. 2016. Capturing the Phylogeny of Holometabola with Mitochondrial Genome Data and Bayesian Site-Heterogeneous Mixture Models. Genome Biol. Evol. 8:1411–1426. doi: 10.1093/gbe/evw086.

Stackebrandt E et al. 2002. Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int. J. Syst. Evol. Microbiol. 52:1043–1047. doi: 10.1099/00207713-52-3-1043.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30:1312–1313. doi: 10.1093/bioinformatics/btu033.

Steeghs L et al. 1998. Meningitis bacterium is viable without endotoxin. Nature. 392:449–449. doi: 10.1038/33046.

Sudakaran S, Kost C, Kaltenpoth M. 2017. Symbiont Acquisition and Replacement as a Source of Ecological Innovation. Trends Microbiol. doi: 10.1016/j.tim.2017.02.014.

Tada A et al. 2011. Obligate association with gut bacterial symbiont in Japanese populations of the southern green stinkbug Nezara viridula (Heteroptera: Pentatomidae). Appl. Entomol. Zool. 46:483–488. doi: 10.1007/s13355-011-0066-6.

Tajima F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics. 135:599–607.

Tamames J et al. 2007. The frontier between cell and organelle: genome analysis of Candidatus Carsonella ruddii. BMC Evol. Biol. 7:181. doi: 10.1186/1471-2148-7-181.

Tamas I et al. 2002. 50 Million Years of Genomic Stasis in Endosymbiotic Bacteria. Science. 296. http://science.sciencemag.org/content/296/5577/2376.full (Accessed August 3, 2017).

Tatusova T et al. 2016. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res. 44:6614–6624. doi: 10.1093/nar/gkw569.

Taylor C, Johnson V, Dively G. 2017. Assessing the use of antimicrobials to sterilize brown marmorated stink bug egg masses and prevent symbiont acquisition. J. Pest Sci. 90:1287–1294. doi: 10.1007/s10340-016-0814-z.

Taylor CM, Coffey PL, DeLay BD, Dively GP. 2014. The importance of gut symbionts in the development of the brown marmorated stink bug, Halyomorpha halys (Stål). PLoS One. 9. doi: 10.1371/journal.pone.0090312.

Wang L, Wang J, Jing C. 2017. Comparative Genomic Analysis Reveals Organization, Function and Evolution of ars Genes in Pantoea spp. Front. Microbiol. 8:471. doi: 10.3389/fmicb.2017.00471.

Wang X, Quinn PJ. 2010. Lipopolysaccharide: Biosynthetic pathway and structure modification. Prog. Lipid Res. 49:97–107. doi: 10.1016/J.PLIPRES.2009.06.002.

Watanabe K, Yukuhiro F, Matsuura Y, Fukatsu T, Noda H. 2014. Intrasperm vertical symbiont transmission. Proc. Natl. Acad. Sci. U. S. A. 111:7433–7. doi: 10.1073/pnas.1402476111.

Waterhouse RM et al. 2018. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Mol. Biol. Evol. 35:543–548. doi: 10.1093/molbev/msx319.

Wernegreen JJ. 2015. Endosymbiont evolution: predictions from theory and surprises from genomes. Ann. N. Y. Acad. Sci. 1360:16–35. doi: 10.1111/nyas.12740.

Wernegreen JJ. 2002. Genome evolution in bacterial endosymbionts of insects. Nat. Rev. Genet. 3:850–861. doi: 10.1038/nrg931.

Wick RR, Judd LM, Gorrie CL, Holt KE. 2017. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. Phillippy AM, editor. PLOS Comput. Biol. 13:e1005595. doi: 10.1371/journal.pcbi.1005595.

Wick RR, Schultz MB, Zobel J, Holt KE. 2015. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics. 31:3350–3352. doi: 10.1093/bioinformatics/btv383.

Wilson ACC et al. 2010. Genomic insight into the amino acid relations of the pea aphid, Acyrthosiphon pisum, with its symbiotic bacterium Buchnera aphidicola. Insect Mol. Biol. 19:249–258. doi: 10.1111/j.1365-2583.2009.00942.x.

Wilson ACC, Duncan RP. 2015. Signatures of host/symbiont genome coevolution in insect nutritional endosymbioses. Proc. Natl. Acad. Sci. 112:201423305. doi: 10.1073/pnas.1423305112.

Xu J, Fonseca DM, Hamilton GC, Hoelmer KA, Nielsen AL. 2014. Tracing the origin of US brown marmorated stink bugs, Halyomorpha halys. Biol. Invasions. 16:153–166. doi: 10.1007/s10530-013-0510-3.

Yang Z. 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24:1586–1591. doi: 10.1093/molbev/msm088.

Yuan M-L, Zhang Q-L, Guo Z-L, Wang J, Shen Y-Y. 2015. Comparative mitogenomic analysis of the superfamily Pentatomoidea (Insecta: Hemiptera: Heteroptera) and phylogenetic implications. BMC Genomics. 16:460. doi: 10.1186/s12864-015-1679-x.

Zhang G, Meredith TC, Kahne D. 2013. On the essentiality of lipopolysaccharide to Gram-negative bacteria. Curr. Opin. Microbiol. 16:779–85. doi: 10.1016/j.mib.2013.09.007.

Zhu G-P, Ye Z, Du J, Zhang D-L, Zhen Y-H, Zheng C-G, Zhao L, Li M, Bu W-J. 2016. Range wide molecular data and niche modeling revealed the Pleistocene history of a global invader (Halyomorpha halys). Sci. Rep. 6:23192. doi: 10.1038/srep23192.

Appendix A: Additional Tables

Table A.1. BOLD ID matches for the COX1 gene of sequenced stink bugs

ACC        ID                            BOLD ID                       Note
MN783643   Arvelius albopunctatus        -
MN783644   Brachystethus rubromaculatus  Brachystethus rubromaculatus
MN783645   Edessa bella/sp nov 1         Edessa bugabensis
MN783646   Edessa eburatula              Edessa eburatula              Other similar sequences found
MN783647   Edessa F.                     -                             Closest match to Edessa jugata (98.3%)
MN783648   Edessa loxdalii               -
MN783649   Edessa oxcarti                -
MN783650   Edessa sp. CR2                -                             Closest match to Edessa jugata (98.15%)
MN783651   Euschistus servus             Euschistus servus             Other similar sequences found
MN783652   Mormidea sp.                  Mormidea sp.                  Match to M. collaris (99.07%) and M. ypsilon (98.92%)
MN783653   Murgantia histrionica         Murgantia histrionica
MN783654   Nezara viridula               Nezara viridula
MN783655   Pentatomidae sp.              -
MN783656   Sibaria englemanni            Sibaria englemanni
MN783657   Taurocerus edessoides         Taurocerus edessoides

Table A.2. Accession numbers used for core and pangenome analysis

Bacteria                            ACC
Arvelius_albopunctatus_symbiont     SZZU00000000
Brachystethus_symbiont              VOQV00000000
Edessa_F_symbiont                   VOQW00000000
Edessa_oxcarti_symbiont             SZZT00000000
Euschistus_servus_symbiont          SZZY00000000
Mormidea_symbiont                   VOQX00000000
Murgantia_histrionica_symbiont      SZZX00000000
Nezara_viridula_symbiont            SZZW00000000
Plautia stali symbiont A            BBNZ00000000
Plautia stali symbiont B            BBOA00000000
Plautia stali symbiont C            BBOB00000000
Plautia stali symbiont D            BBOC00000000
Plautia stali symbiont E            BBOD00000000
Plautia stali symbiont F            BBOE00000000
Pantoea agglomerans                 GCA_001709315.1
Pantoea ananatis                    GCA_000025405.2
Candidatus Pantoea carbekii         GCA_000971765.1
Pentatomidae_sp_symbiont            VOQY00000000
Pantoea rwandensis                  GCA_000759475.1
Pantoea stewartii                   GCA_002082215.1
Sibaria_englemani_symbiont          SZZV00000000
Edessa eburatula symbiont           PDKT00000000
Edessa loxdalii symbiont            PDKU00000000
Edessa sp nov 1 symbiont            PDKR00000000
Edessa sp. 2 symbiont               PDKS00000000
Taurocerus_edessoides               SZZZ00000000

Table A.3. SINA best match of 16S gene sequence against the SILVA database.

Species                                 SILVA score  identity  Order             Family        Genus
Arvelius albopunctatus symbiont         0.937867     94.3005   Enterobacterales  Erwiniaceae
Brachystethus rubromaculatus symbiont   0.926469     93.391    Enterobacterales
Edessa_F symbiont                       0.910574     92.5432   Enterobacterales
Edessa oxcarti symbiont                 0.918434     91.9216   Enterobacterales
Euschistus servus symbiont              0.995991     99.6601   Enterobacterales  Erwiniaceae   Pantoea
Mormidea sp symbiont                    0.995271     100       Enterobacterales  Erwiniaceae   Pantoea
Murgantia histrionica symbiont          0.946696     93.8472   Enterobacterales  Erwiniaceae   Pantoea
Nezara viridula symbiont                0.999383     100       Enterobacterales  Erwiniaceae   Pantoea
Pentatomidae sp symbiont                0.994315     99.6288   Enterobacterales  Erwiniaceae   Pantoea
Sibaria englemani symbiont              0.999432     99.3662   Enterobacterales  Erwiniaceae   Pantoea
Taurocerus sp symbiont                  0.999421     98.9699   Enterobacterales  Erwiniaceae   Pantoea
