California State University, San Bernardino CSUSB ScholarWorks

Electronic Theses, Projects, and Dissertations Office of aduateGr Studies

8-2021

INVESTIGATION INTO THE EVOLUTION OF HETEROGENEITY WITHIN SECONDARY REPLICONS AND THEIR MAINTENANCE IN GENUS

Christopher Ne Ville California State University - San Bernardino

Follow this and additional works at: https://scholarworks.lib.csusb.edu/etd

Part of the Bioinformatics Commons, Biology Commons, Biotechnology Commons, and the Evolution Commons

Recommended Citation Ne Ville, Christopher, "INVESTIGATION INTO THE EVOLUTION OF HETEROGENEITY WITHIN SECONDARY REPLICONS AND THEIR MAINTENANCE IN GENUS VARIOVORAX" (2021). Electronic Theses, Projects, and Dissertations. 1309. https://scholarworks.lib.csusb.edu/etd/1309

This Thesis is brought to you for free and open access by the Office of aduateGr Studies at CSUSB ScholarWorks. It has been accepted for inclusion in Electronic Theses, Projects, and Dissertations by an authorized administrator of CSUSB ScholarWorks. For more information, please contact [email protected]. INVESTIGATION INTO THE EVOLUTION OF HETEROGENEITY WITHIN

SECONDARY REPLICONS AND THEIR MAINTENANCE IN GENUS

VARIOVORAX

A Thesis

Presented to the

Faculty of

California State University,

San Bernardino

In Partial Fulfillment

of the Requirements for the Degree

Master of Science

in

Biology

by

Christopher Ne Ville

August 2021 INVESTIGATION INTO THE EVOLUTION OF HETEROGENEITY WITHIN

SECONDARY REPLICONS AND THEIR MAINTENANCE IN GENUS

VARIOVORAX

A Thesis

Presented to the

Faculty of

California State University,

San Bernardino

by

Christopher Ne Ville

August 2021

Approved by:

Dr. Paul Orwin, Committee Chair, Biology

Dr. Jeremy Dodsworth, Committee Member

Dr. Daniel Nickerson, Committee Member © 2021 Christopher Ne Ville

ABSTRACT

Approximately 10% of all bacterial genomes sequenced thus far contain a secondary replicon. This considerable genetic reservoir contains many potentially mobilizable elements, allowing for the formation of many unique secondary replicons. This property of bacterial populations vastly increases the genomic diversity available to species that effectively take up and maintain these replicons. Members of the genus Variovorax have extensive heterogeneity in genome architecture, including sequenced isolates containing plasmids, megaplasmids, and chromids. Using available Illumina data on the NCBI database, we have completed these assemblies using 3rd generation sequencing methods on 17 members of this genus. We have sequenced, assembled, and evaluated these now complete Variovorax genomes to examine the diversity of elements. From these assemblies 9 chromids, 7 megaplasmids, 2 plasmids, and an integrated megaplasmid were identified using genomic frequency characteristics. We observed in the genomic characteristics data that there is a pattern of evolution where a plasmid is picked up by the organism and over evolutionary time, there is expansion of the replicon in size through acquisition of mobile elements and interreplicon transfer. As this process goes on, the secondary replicons’ genomic frequency characteristics regress toward the chromosome. Given the evolutionary pattern seen, we were interested to find out if the high levels of heterogeneity present in Variovorax are due to factors controlling secondary replicon maintenance present within the primary

iii

chromosome. Using two strains of strains, VAI-C

(multipartite genome) and EPS (single chromosome) and two plasmids pRU1105 and pBBR5pemIKpBAD (which contains a Toxin-Antitoxin system), were used to evaluate the maintenance of plasmids in the absence of selection. Surprisingly, the Toxin-Antitoxin system had a deleterious effect and showed no additional phenotype in either organism.

iv

ACKNOWLEDGEMENTS

We would like to thank Cal State University, San Bernardino’s Office of

Student Research and the office of Associated Students Incorporated their funding of this project. We also thank the Vital and Expanded Technologies

Initiative at CSU San Bernardino for funding of classroom-based sequencing.

Without their support much of the sequencing done in this project would not have been possible. We would also like to thank our wonderful collaborators Dr. Jeff

Dangl at UNC-Chapel Hill, Dr. Jared Ledbetter at Cal Tech, Dr. Kostas

Konstantinidis at Georgia Institute of Technology and the Noble research group for providing the Variovorax strains and Illumina data used in this project. Starting out on this project we were a bit unsure of ourselves if we were actually going to be able to make this project work, without your support early on it would have never got off the ground. We thank Jose de la Torre for generously providing the

MiSeq short read sequencing of V. paradoxus CSUSB.

I personally would like to thank Dr. Paul M. Orwin, for taking me under his wing as an undergraduate and through my graduate studies. Without his unwavering support and dedication, I would not be here today. I spend much of my first year in his office (I am fairly certain he thought I was a pain in his rear) because I felt so overwhelmed, and behind in my studies. It is through his guidance and relentless challenging of me that I was able to catch up in my studies and graduate with both my degrees. I personally would also like to thank

Dr. Dan P. Nickerson, that first year in molecular biology I was a slacker full of

v

self-doubt that had failed too many times. It was your early belief and assistance that allowed me to finally start clawing my way back into my scholastic studies.

Thank you for guidance throughout my experience at CSUSB. Lastly, I would like to thank Dr. Jeremy Dodsworth it was his passion for genomics in my final year of my undergraduate studies which helped alter the course of my studies. It was from those two years I got to TA with you that I learned a lot about scientific education and working with students. Not to mention our many random hallway chats that helped me develop much of the DNA analysis in my thesis turning it into what it became. I would like to thank all 3 of these professors without their ceaseless dedication I was able to grow as a scientist, a scholar and grow as a person.

I would like to thank my partner Monique Quinn for her amazing support throughout my graduate degree, there were many nights where I was tired and on the verge of a total mental collapse. Without your support during this time, I do not know what I would have done or what would have happened. It was your support during the CoViD-19 crisis that helped me get through it and focus on my thesis rather than the world falling apart around us. I also thank my cat Rex for being the Fluffiest lambkin kitty and helping to relieve stress with the going got rough. Finally, I would like to thank my family, it was because of your support that

I was able to make this far. To my grandparents thank you for your moral support and the many hours you spent listening to me, as well as your financial support during my education it is due to your assistance, I was able to afford to go to

vi

university. To my loving parents I would like to thank you for their moral encouragement, financial assistance as well their spiritual support in my college education. Without whom I would have never been able to enjoy the many opportunities I have been given. It is unfortunate that my father passed in my 2nd year of my master’s thesis and never got to see me finish but I am certain he would be proud. Thank you to my mother, I know that last couple years have been rough, but you have always supported and been there for me. You have always been there for me and there are no words to thank you for all that you have done for me.

vii

TABLE OF CONTENTS ABSTRACT ...... iii

ACKNOWLEDGEMENTS ...... v

LIST OF TABLES ...... xi

LIST OF FIGURES ...... xii

CHAPTER ONE: INTRODUCTION ...... 1

Types of Replicons ...... 2

How Secondary Replicons Arise ...... 3

Genomic Signatures and What They Tell Us ...... 11

Mobilization of Large Secondary Replicons and Purpose ...... 16

Toxin Antitoxin Systems ...... 21

Project Focus ...... 23

Thesis Statement ...... 30

Specific Aims ...... 32

Aim 1 ...... 32

Aim 2...... 33

CHAPTER TWO: MATERIALS AND METHODS ...... 35

Specific Aim 1 ...... 35

Genomic DNA Isolation ...... 37

Library Preparation for Illumina Sequencing ...... 39

Sample Quantification and Illumina, iSeq 100 Library Preparation 40

Library Preparation for Oxford Nanopore Sequencing ...... 40

MinION Sequencing ...... 40

Bioinformatics Pipeline ...... 41

viii

Data Availability ...... 43

Core Gene Identification ...... 43

Dotplot ...... 44

Secondary Replicon Identification: ...... 44

Percent G+C Content ...... 44

Size of Secondary Replicons ...... 45

Dinucleotide Relative Abundance ...... 45

Synonymous Codon Usage Order (SCUO) ...... 46

ANI ...... 46

Marker Based Phylogenetic tree ...... 46

Specific Aim 2 ...... 47

Plasmid Verification ...... 49

Experimental Evolution Passaging ...... 50

OD546 Reading ...... 52

Plasmid Retention ...... 53

CHAPTER THREE: RESULTS AND DISCUSSION ...... 54

Aim 1...... 54

Replicon Identification ...... 54

Identification of Integrated Megaplasmid ...... 56

Percent G+C Content ...... 58

Dinucleotide Relative Abundance Distance ...... 60

Synonymous Codon Usage Order ...... 62

Secondary Replicon Sizes ...... 63

Average Nucleotide Idenity ...... 65

ix

Phylogenetics ...... 68

Aim 2...... 68

Growth over 16 days ...... 70

Plasmid Retention over Time ...... 72

CHAPTER FOUR: CONCLUSION ...... 73

REFERENCES ...... 74

x

LIST OF TABLES

Table 1. Variovorax Sequencing Project ...... 35

Table 2. Assembled Genomes and Core Genes ...... 55

xi

LIST OF FIGURES

Figure 1. Plasmid Invasion Theory ...... 6

Figure 2. Comparison of G+C Content in Secondary Replicons in Domain ...... 13

Figure 3. Phylogenetic Distribution of Secondary Replicons in Domain Bacteria...... 23

Figure 4. Phylogenetic Dispersion of Chromids and Megaplasmids in Order ...... 25

Figure 5. 16s Phylogenetic Tree of Variovorax ...... 28

Figure 6. Oxford Nanopore Sequencing Run...... 41

Figure 7. Experimental Evolution Plasmid Verification ...... 50

Figure 8. Overview of Experimental Evolution Project ...... 52

Figure 9. NFACC27 Megaplasmid Integration Event ...... 56

Figure 10. Percent G+C Differences between Replicons ...... 58

Figure 11. Dinucleotide Relative Abundance Distance between Replicons ...... 60

Figure 12. Synonymous Codon Usage Order between Replicons...... 62

Figure 13. Length of Secondary Replicon Comparison ...... 63

Figure 14. ANI Cluster Matrix ...... 65

Figure 15. ANI Cluster Map ...... 66

Figure 16. Marker Based Phylogenetic Tree Built using Maximum Likelihood ... 68

Figure 17. Optical Density 546nm over a Period of 16 days ...... 70

Figure 18. Percentage of Population Retaining the Plasmids over 16 days ...... 72

xii

CHAPTER ONE

INTRODUCTION

John Cairns published the first DNA autoradiography from Escherichia coli in 1963 showing that the bacteria genome consisted of a single circular chromosome (Cairns 1963; diCenzo and Finan 2017). Studies in Bacillus subtilis

(Wake 1973; diCenzo and Finan 2017) and Mycoplasma hominis (9,15) led to the general view of bacteria having a single primary chromosome, and sometimes a smaller, nonessential, circular plasmid. A linear plasmid was first identified in 1979 in Streptomyces sp. (Hayakawa et al. 1979; diCenzo and Finan

2017), followed by the identification of a linear chromosome in 1989 in Borrelia burgdorferi (Baril et al. 1989; Ferdows and Barbour 1989; diCenzo and Finan

2017). This in fact demonstrated that bacterial chromosomes do not need to be circular. The symbiosis megaplasmid in Sinorhizobium meliloti was identified in

1981 as an important contributor to nodulation and nitrogen fixation (diCenzo and

Finan 2017). The discovery of these elements changed the perception that the entire genome is contained within the chromosome (diCenzo and Finan 2017).

An even larger secondary replicon, containing essential genes, was discovered in Rhodobacter sphaeroides in 1989, and this was termed a secondary chromosome because it demonstrated that essential cell functions can be encoded by multiple replicons in prokaryotes (Suwanto and Kaplan 1989; diCenzo and Finan 2017). In 2010, due in part to the large amount of bacterial sequence data the term Chromid was introduced into nomenclature. The

1

Chromid is a replicon that carries core genes and likely has been co evolving with the Chromosome for a lengthy amount of evolutionary time (Harrison et al.

2010).

Types of Replicons

A single, usually circular, chromosome is the primary replicon in bacteria and is always the largest replicon containing the majority of core/essential genes

(diCenzo and Finan 2017). About 10% of all bacteria for which data is available have taken up a piece of DNA that is maintained as a secondary replicon

(diCenzo and Finan 2017). There are four established types of secondary replicons: plasmids, megaplasmids, chromids, and secondary chromosomes. A plasmid is a small secondary replicon defined functionally to carry no core genes and exist as a mobile genetic element. Plasmids are nonessential and thus can be lost without a serious impact to cell viability in most circumstances (diCenzo and Finan 2017). A megaplasmid is a secondary replicon that is currently defined by its size, although there is no established boundary for the size cutoff, they are generally larger than >350kb-460k kb, with the same functionality as the plasmid.

The term megaplasmid was coined to reflect the importance of genes contained within the replicon and the large size when compared to previously identified plasmids. The new plasmid nomenclature refers to large secondary replicons, with a molecular mass of <300x 10^6 Da (~460kb), as megaplasmids.

(Rosenberg et al. 1981; diCenzo and Finan 2017). A chromid is a relatively new

2

subdivision defined by its functional role to carry at least one core gene which is essential for cell viability under most circumstances. The loss of this replicon would result in cell death (Harrison et al. 2010; diCenzo and Finan 2017). The term chromid is a combination of chromosome and a plasmid referring to how a chromid replication is an intermediate between a plasmid and chromosome(diCenzo and Finan 2017). A secondary chromosome shares the same functional role as a chromid but is defined by its unique evolutionary origin, as the result of a fracture of the primary chromosome into two independently maintained replicons (diCenzo and Finan 2017). Historically the term secondary chromosome has been used to describe what is now called a chromid (Harrison et al. 2010), but in this proposal I will use the two separate terms rigorously to distinguish these different types of replicons.

How Secondary Replicons Arise

The presence of secondary replicons in 10% of currently sequenced organisms raises the question of what factors influence the distribution of these elements. Do these bacteria have within their primary chromosome genes or mechanisms aiding in the uptake and maintenance of a secondary replicons, or do secondary replicons contain genes that drive their distribution either through phenotypic effects or “selfish gene” mechanisms? Both mechanisms may contribute to the distribution, and we must also consider the null hypothesis in

3

which these secondary elements are distributed randomly in populations, and the patterns we see are simple inheritance without selection.

Plasmids are typically acquired by a given host species through horizontal gene transfer (HGT), either by conjugation or natural competence. Conjugation is a mechanism that requires direct cell to cell contact and can only be seen as direct transfer of genetic information (Mora et al. 2020 Dec 27). Natural competence is the direct uptake of DNA from the environment, which could have implications for nutrition as well as genetics (Finkel and Kolter 2001). Natural competence has recently been shown to occur widely in bacteria, under the right conditions (Hülter et al. 2017). Evolutionary considerations around plasmid uptake focus on the phenotypic effect of gained functions which may alter the host’s fitness, or the selfish gene idea which focuses on plasmid-borne gene fitness. Plasmid encoded genes often include important accessory factors such as antimicrobial resistance, heavy metal resistance which may be required for growth in some environments, toxins, or genes required for expansion to a niche

(diCenzo and Finan 2017). Many of these mobile elements contain self- promoting features aiding in the ability for the host to survive or adapt to its environment. By facilitating its host’s ability to survive, the host will be able to replicate and pass on the plasmid to its daughter cells. The secondary replicon’s accessory genes are likely a reservoir of functionality that can be transferred to form new and unique combinations, thus creating replicons that are varied and diverse (Hülter et al. 2017). Others can grasp selfish elements holding the cell

4

hostage through Plasmid Addiction Systems (PAS). Plasmid addiction systems can force its host to keep a plasmid through genes like hok/sok (host killing/suppressor of host killing), and other Toxin/Antitoxin systems (TA system) consisting of a long-lived toxin and a short-lived antitoxin (Hülter et al. 2017).

When an organism has plasmid with a TA system, when it goes to divide if a daughter cell does not inherit the TA system it is likely to die. Due to the presence of the stable long live toxin proteins it likely inherited without a means to detoxify it. This is an example of a selection pressure which may have helped selected for defense mechanisms against mobile element/plasmid invasion such as restriction enzymes or the CRISPR-cas system (Hülter et al. 2017).

5

Figure 1. Plasmid Invasion Theory

This figure depicts plasmid invasion theory (image from N. Hülter et al. 2017)

Figure 1 depicts a plausible chain of events when a plasmid invades a host or is picked up through HGT, resulting in persistence of a plasmid that can lead to the formation of a megaplasmid/chromid. The plasmid and the host will go through a series of several adaptations in response to the interaction. The first stage is called invasion where the plasmid is picked up/invades the host cells

6

and is thought to occur via HGT (Hülter et al. 2017). The second stage is when genome evolution begins after gene transfer ensues between the host chromosome and the newly acquired plasmid (Hülter et al. 2017). This can include SNPs, deletions, insertions, rearrangements, or the integration of new genetic information from other mobile sources such as other plasmids, phages, and transposons (Hülter et al. 2017). Gene transfer between the host chromosome and plasmid can increase or decrease the plasmid’s significance for its host based on the nature of the genes that are mobilized and the direction of transfer (Hülter et al. 2017). The third stage is expansion of the host range or the gain of new phenotypes from the genes acquired from the plasmid (Hülter et al. 2017). The fourth stage is persistence, the ability of the plasmid to persist in the bacterial host population by carrying genes allowing the host to thrive or the cost of losing the plasmid being too high (Hülter et al. 2017). Given enough time, the final transition stage can occur based on the transfer of essential genes from the host chromosome to the plasmid, leading to a transition from a plasmid/megaplasmid to a chromid (Hülter et al. 2017).

The difference between a megaplasmid and plasmid is based on size, so conversion from a plasmid to a megaplasmid occurs by expansion of the plasmid. Expansion of a plasmid can occur via the accumulation of transposon insertions, mobile element uptake, interreplicon gene transfers, or duplication events (diCenzo and Finan 2017; Hülter et al. 2017). The transition from megaplasmid to chromid is based on a functional definition and requires two

7

major changes. The first is the acquisition of core-essential genes followed by the regression of the mean megaplasmid genomic signature to that of the chromosome (diCenzo and Finan 2017). These genomic signatures are frequency characteristics that include G+C content, dinucleotide relative abundance, and codon usage. Two possible mechanisms for this process are interreplicon transfer of essential genes from the chromosome to the secondary replicon (diCenzo and Finan 2017), and redundant core genes on the chromosome and the chromid which could result from interreplicon duplication of a chromosomal gene or acquisition of an ortholog via HGT (diCenzo and Finan

2017). If the chromosomal copy became a deleterious or nonfunctional mutant, the copy on the chromid would be the only effective copy of the gene, thus transferring a core gene function to the chromid (diCenzo and Finan 2017).

Chromids are usually larger than plasmids/megaplasmids but smaller than the chromosome and carry many genes with a restricted or sporadic phylogenetic distribution (Harrison et al. 2010). Chromids on average are twice as large as megaplasmids (diCenzo and Finan 2017), so the evolution of a megaplasmid to a chromid likely involves the accumulation of a significant number of genes (diCenzo and Finan 2017). It has been thought that most genes acquired from HGT are lost as the cost outweighs the benefits (diCenzo and

Finan 2017). The high variability of megaplasmids supports this, with most of the genes acquired by megaplasmids being lost as they provide little to no benefit and/or the cost of carrying the gene are too high (diCenzo and Finan 2017).

8

Genes that are acquired from HGT will undergo rapid evolution (diCenzo and

Finan 2017) which can include the modification of genes and regulatory elements though amelioration of codon usage, and promoter modifications to integrate the new genes into existing transcriptional networks (diCenzo and Finan 2017). The gain of essential genes from the chromosome via interreplicon transfer contributes to increased stability of the secondary replicon by reducing gene loss and reducing the likelihood of loss of the entire replicon (diCenzo and Finan

2017). Thus, interreplicon gene transfer aids in the stability of the secondary replicon which in turn aids in the incorporation of the secondary replicon into the cell’s genome (diCenzo and Finan 2017).

Chromids and secondary chromosomes are thought to arise by two possible mechanisms, the schism hypothesis and the plasmid hypothesis (Egan et al. 2005; Chain et al. 2006; Prozorov 2008; Fricke et al. 2009; Choudhary et al.

2012; diCenzo and Finan 2017). The schism hypothesis explains how a secondary chromosome could arise, where an organism having a single chromosome splits into two self-replicating chromosomes. Currently there is little evidence for this occurring in nature except for extremely rare circumstances

(diCenzo and Finan 2017). The resulting chromosomes are expected to have similar coding densities, gene functional biases (COG analyses), evolutionary rates, gene content, and size as they are both derived from a single parent chromosome (Nereng and Kaplan 1999; Mackenzie et al. 2001; Choudhary et al.

2007; Kontur et al. 2012; diCenzo and Finan 2017). The plasmid hypothesis

9

explains how a chromid would arise evolving from a megaplasmid or plasmid origin. Long term co-evolution of the chromosome and chromid is predicted to regress the genomic signatures of the chromid to that of the chromosome. This theory differs from the schism hypothesis in several ways. If the secondary replicon originated from a plasmid, there would be evidence such as the origin of replication (Ori), differences in replication and partitioning machinery such as the repABC operon, the presence of relaxase genes, and likely conjugation machinery (Harrison et al. 2010). Furthermore, if the secondary replicon arose from a megaplasmid, the expectation would be to see few core-essential genes.

The ones that are present, would likely have evidence of interreplicon transfer such as flanking insertion sequences (Harrison et al. 2010). The replication systems of chromids should be similar to plasmids, but likely with additional regulatory controls that integrate their replication into the cell cycle (Agnoli et al.

2012; diCenzo and Finan 2017; Orlova et al. 2017; Ramachandran et al. 2017).

Often, chromids carry plasmid-type maintenance and replications systems while also carrying core genes that are found on the chromosome in other species

(diCenzo and Finan 2017). As mentioned previously, by carrying at least one core gene vital to the survival of the cell, the chromid can create a selective pressure against loss of the replicon.

10

Genomic Signatures and What They Tell Us

Genomic signatures are frequency characteristics of the organism’s DNA, and phylogenetically related organisms often carry similar genomic signatures

(Harrison et al. 2010). Frequency characteristics are measurements of fluctuations in properties of the DNA such as G+C content, dinucleotide relative abundance, and codon usage. In eukaryotic nuclear genomes, these properties do not differ much from chromosome to chromosome or to the degree in which homologs of genes are present on the same chromosome in closely related species (Harrison et al. 2010). In bacterial genomes there are consistently large degrees of variation in these signatures between the secondary replicons and primary chromosomes (Harrison et al. 2010).

Codon bias is the ratio of synonymous codon usage within a genome

(diCenzo and Finan 2017) and is correlated with the level of gene expression.

The translation of multiple synonymous codons by a single tRNA occurs by wobble base pairing at the third position (Crick 1966; Söll et al. 1966; Quax et al.

2015). A more highly expressed gene is more likely to have a codon usage reflective of the relative tRNA abundance in the organism (Quax et al. 2015).

Studies of tRNA content from all domains of life show that a full set of tRNAs with anticodons that complement the 61 codons is rarely present (Quax et al. 2015).

In Escherichia coli there are 39 tRNAs, 35 in Sulfolobus solfataricus, and 45 in

Homo sapiens (Quax et al. 2015).

11

The second genomic signature is G+C content which is the overall percentage of guanine and cytosine present in the genome. G+C content varies widely in prokaryotes and can be influenced by selective pressures like environmental adaptation or by processes such as genetic recombination

(Foerstner et al. 2005; Mann and Chen 2010; Lassalle et al. 2015; diCenzo and

Finan 2017). Recent data in bacterial sequence data has shown there is a positive correlation between G+C content and genome size. With the larger genomes having higher G+C contents but that other factors/forces have a much larger influence (Almpanis et al. 2018). Selection toward increased translational efficiency or mutational bias present within the host organism cellular machinery can drive conversion of thymine/cytosine to adenine/guanine or vice versa

(Hershberg and Petrov 2010; Hildebrand et al. 2010; diCenzo and Finan 2017).

In a recent survey of sequenced multipartite genomes, the G+C content of 89.4% of megaplasmids and 78.5% of plasmids was lower than the G+C content of the corresponding chromosome (diCenzo and Finan 2017). When this analysis was extended to chromids, 58.2% of chromids had lower G+C contents and 41.2% had higher overall G+C contents (diCenzo and Finan 2017). Though the G+C contents of chromids usually only differ by ~1% from the chromosome (Harrison et al. 2010), megaplasmids typically differ by >1% (diCenzo and Finan 2017).

The less pronounced difference in G+C% between chromid and chromosome than plasmids or megaplasmids might be an indication of a convergence or homogenization (Cooper et al. 2010; diCenzo and Finan 2017). Looking at

12

Figure 2. each individual element has a separate normal distribution around the mean with the G+C contents of chromids centered around zero to ~1% (diCenzo and Finan 2017). This pattern is consistent with the process of transition from plasmid to chromid outlined above (diCenzo and Finan 2017).

Figure 2. Comparison of G+C Content in Secondary Replicons in Domain Bacteria.

Showing the difference in G+C contents of individual replicons in comparison to the chromosome. (Image credit C. diCenzo et al. 2017)

The last genomic signature is dinucleotide relative abundance (DRA) which is the deviation of dinucleotides from its expected frequency based on the

13

genome’s nucleotide frequency (Karlin et al. 1994; diCenzo and Finan 2017).

Dinucleotide composition has been shown to be important in DNA structural preferences such as slides, rolls, propeller twists, and helical twists (Karlin et al.

1994). Thus, DRA deviations could impact changes on DNA structure including duplex curvature, supercoiling, and other higher-order DNA structural features

(Karlin et al. 1994). Since some DNA repair enzymes can recognize DNA secondary structures, we can presume that DRA compositions locally are influencing the structure of DNA (Karlin et al. 1994). Thus, the ability of the repair enzymes to recognize the DNA (Karlin et al. 1994). These DRA frequencies may be important in the process of replication and repair (Karlin et al. 1994). DRA frequencies can also affect Nucleosome positioning, some DNA binding protein interactions, and ribosomal binding of mRNA (Karlin et al. 1994). Average DRA distance within a bacterial genome is relatively small. While on a phylogenetic scale interspecies average DRA distances between species are generally larger than intraspecies differences. The differences between genomes indicate that some factors impose limits upon variation within any particular genome (Karlin et al. 1994). This indicates that DRA values within a genome provide a robust signature of that genome. Dinucleotide relative abundance profiles for bacterial genomes have been shown to be reflective of bacterial phylogeny (Karlin and

Burge 1995; van Passel et al. 2006; diCenzo and Finan 2017). Studies in

Sinorhizobium meliloti indicate that dinucleotide relative abundances can be used to differentiate chromosomes from chromids and plasmids (Wong et al. 2002).

14

Genomic signatures like codon usage and dinucleotide composition can be shaped by their environment and often can be adaptive (Wong et al. 2002;

Carbone et al. 2003). Using this measurement, chromids are the most like the chromosomes (DRA: mean=0.21 +/-0.06), with the megaplasmids (DRA: mean=

0.50 +/- 0.28) showing intermediate divergence, and plasmids (DRA: mean=

0.91+/- 0.5) having the most difference (diCenzo and Finan 2017). An explanation could be that megaplasmids are less mobile than plasmids and successful megaplasmid transfer to phylogenetically distant organisms occurs less frequently than plasmids (diCenzo and Finan 2017). Alternatively, this is consistent with a model where megaplasmids develop through acquisition of smaller plasmids, followed by transfer of genes from the chromosome to the mobile element. A reasonable hypothesis to explain this outcome is that the explanation for these genome signature differences is simply time since HGT.

Genomic signatures are shaped by several facts which can have an adaptive advantage. Thus, the amelioration of genomics signatures, we can hypothesize that they are being driven by specific forces selecting for genomic signatures to regress to that of the chromosome (diCenzo and Finan 2017). This can be done for example for improved translation efficiency or mutation biases of cellular machinery (Wong et al. 2002; Carbone et al. 2003; diCenzo and Finan

2017). By improving translation efficiency, we can presume that selection forces are reducing the energic cost to the cell, reducing the likelihood of bulk DNA loss of the secondary replicon. Whatever the reason for the selection the divergence

15

of certain genomic frequencies of the two could be a signal to how recently the element was picked up.

Mobilization of Large Secondary Replicons and Purpose

The propensity for a genomic element to mobilize to another host cell should be considered when discussing genomic signatures as it relates to the frequency with which the genomic element will be seen. Plasmids are highly mobile and diverse, many that are detected in genome sequencing and metagenome sequencing are likely relatively new uptakes on an evolutionary time scale (diCenzo and Finan 2017). While Megaplasmids and chromids, possibly due to poor maintenance following conjugation, an inability to conjugate, or the inability of the host cell to maintain it (diCenzo and Finan 2017). This last constraint might be due either to a high energetic cost or lack of required machinery (diCenzo and Finan 2017). Overall, this pattern suggests that megaplasmids and chromids that are detected in sequencing are likely less recently acquired than plasmids, therefore have had more evolutionary time to regress their genomic signatures to that of the chromosome (diCenzo and Finan

2017). Some megaplasmids do retain conjugative machinery and their transfer between closely related organisms has been observed in nature (Brom et al.

2002; diCenzo and Finan 2017). Probably due to their evolutionary history as megaplasmids, some chromids also retain their conjugative machinery. No direct

16

evidence of successful HGT of a chromid in nature has yet been published

(Harrison et al. 2010).

Successful transfer of a chromid is believed to be extremely rare event but there is no evidence to support the transfer in natural systems (diCenzo and

Finan 2017). The transfer of these large secondary replicons in the lab is also found to be rare but possible if they have retained the machinery necessary for conjugation (diCenzo and Finan 2017). The S. meliloti pSymB chromid was transferred to A. tumefaciens in an experimental setup with low frequency (Finan et al. 1986; diCenzo and Finan 2017). The replicon was shown to be unstable and the transconjugants would lose the replicon under non-selective conditions

(Finan et al. 1986; diCenzo and Finan 2017). A potential decrease in transconjugant fitness was observed based on smaller colonies on agar plates, and reduced ability to compete with the untransformed host strain (Romanchuk et al. 2014; diCenzo and Finan 2017). This reduced fitness in a nutrient-rich lab environment was restored with the loss of the secondary replicon, even though the megaplasmid affected several phenotypes including biofilm formation, antibiotic resistance, and thermal tolerance (Dougherty et al. 2014; diCenzo and

Finan 2017). This shows that under the lab conditions utilized the transconjugant had a reduced ability to compete but gained several functions that could be beneficial under some circumstances.

These two lab experiments demonstrated that transfer of these large secondary replicons was possible, but at a high cost to the host cell. By losing

17

the secondary replicons, the cell could reduce its metabolic costs associated with

DNA replication and replicon gene expression (Mauchline et al. 2006; Morton et al. 2013; Romanchuk et al. 2014; diCenzo and Finan 2017). This would decrease the number of non-essential mRNA transcripts allowing ribosomes to translate core proteins, as well as decreasing the total number of gene products

(Romanchuk et al. 2014; diCenzo and Finan 2017). An additional complication of secondary replicon fitness is the potential for negative epistatic interactions

(diCenzo and Finan 2017). If the secondary replicon has a gene that encodes a product and negatively interacts with a host cells vital pathway, this could promote the loss of the secondary replicon. These negative interactions could be directly between the proteins, or regulatory interactions at the transcriptional level

(Morton et al. 2013; Romanchuk et al. 2014; diCenzo and Finan 2017).

In lab experiments, transfer of these large secondary replicons was shown to come at a high cost even though they exist in nature. The acquisition of a large secondary replicon could be a strategy for increasing genome size. Multipartite genomes on average are larger than their single chromosome counterparts, usually due to the presence of the secondary replicon, while the chromosomes are roughly equal size (diCenzo and Finan 2017). However, of the 50 largest genomes, only 3 of them are multipartite (diCenzo and Finan 2017). This multipartite architecture is thought to allow the two individual pieces to replicate faster than a single chromosome. Some of the species with the highest maximal replication rates are in the genus Vibrio, and the multipartite nature of the

18

genome is thought to play a role in this (diCenzo and Finan 2017). Further experimentation conducted on 214 species across the phylogenetic tree did not identify any correlation between genome size and minimal generational time

(diCenzo and Finan 2017).

Localization of related genes on the same replicon allows for coordinated gene regulation (diCenzo and Finan 2017). The replicon that a gene is carried on can influence gene dosage as individual replicons can be over or underrepresented in genes that are up/down regulated in different environments

(diCenzo and Finan 2017). In several Vibrio species the initiation of replication of the chromosome occurs before the chromid resulting in a higher-than-average gene dosage and transcription/translation of chromosomal genes (diCenzo and

Finan 2017). Genes that are clustered together on a replicon allow for coordinated regulation via transcription factors, via a bias of transcription factors to regulate genes found on the same replicon (diCenzo and Finan 2017). Another factor is the unequal distribution of transcription machinery throughout the cells, especially the colocalization of transcription factors with the genes they regulate

(diCenzo and Finan 2017). An in-silico analysis of S. meliloti showed that the distribution of transcription factors is biased toward having them on the same replicon with the genes they regulate (diCenzo and Finan 2017). A transcriptome analysis of V. cholerae, comparing lab conditions to intestinal growth conditions, showed that more of the chromid genes are expressed under intestinal conditions than in a laboratory environment (diCenzo and Finan 2017). This

19

indicates that some replicons are enriched in genes which are differentially regulated during niche adaptation (diCenzo and Finan 2017).

The uptake of these secondary pieces of DNA may allow for the colonization of new environments or niche expansion in its current environment resulting in a higher fitness. Studies on several megaplasmids/chromids have identified replicon specific regulation patterns, functional biases, and distinct evolutionary patterns supporting the idea that these replicons help in adaptation to new/unique environments (Chain et al. 2006; Galardini et al. 2013; diCenzo et al. 2014; diCenzo and Finan 2017). This supports the inference that the invasion or uptake of a secondary element via HGT allows the host to expand to new niches. The maintenance of the secondary replicon comes down to an argument based on fitness; if the gain of niche specific traits leads to increased benefits that outweighs the fitness cost than it is more likely to be kept and vice versa

(diCenzo and Finan 2017).

The secondary replicon often carries genes that are critical for host colonization and functions as a symbiosis or virulence factor for soil microbes to interact with eukaryotic species (diCenzo and Finan 2017). This pattern has been observed in many genera with strong associations with eukaryotic hosts, including Sinorhizobium, Rhizobium, Agrobacterium, Burkholderia, Cupriavidus,

Vibrio, Pseudoalteromonas, Azospirillum, Ralstonia, and Prevotella (diCenzo and

Finan 2017). In-silico work suggests that the genome expansion within lineages in the Alphaproteobacteria, especially within the Rhizobiales, were linked with

20

plants and the evolution of symbiosis (diCenzo and Finan 2017). The genome expansions in these genera consisted of the acquisition of transcriptional, transport and metabolic functions on secondary replicons (diCenzo and Finan

2017). This allowed for association with a novel niche or eukaryotic organism, followed by environmental adaptation through further gene flow to the secondary replicon (diCenzo and Finan 2017). One obstacle to drawing conclusions for this sort of comparative analysis is the potential for sampling bias in what species are chosen for analysis, and what environmental samples are studied using metagenomic tools (diCenzo and Finan 2017). This can lead to a false emphasis on a relatively narrow set of organisms (diCenzo and Finan 2017).

Toxin Antitoxin Systems

Often found on secondary replicons, Toxin Antitoxin systems (TAS) are often referred to as “Addiction systems” and have been shown to stabilize secondary elements (Magnuson 2007; Unterholzner et al. 2013; Fraikin et al.

2020). They were initially discovered in E. Coli and have been shown to promote the post segregational killing of cells which do not contain the element (Gerdes et al. 1986). There are multiple forms of TAS but the general notion is that they produce a stable long lived toxin protein and an unstable short lived antitoxin

(Hall et al. 2017). Currently there are six different classes of TAS which are differentiated based on the method it uses to inactivate the toxin (Hall et al.

2017). The most common type of TAS is a type II system where the antitoxin

21

binds directly to the toxin to inhibit its function/activity (Fraikin et al. 2020). An example of this is the pemIK system which was discovered in a plasmid carried by Staphylococcus aureus (Janczak et al. 2020). Several studies into TAS have shown their roles in phage defense, Stress response, Antibiotics tolerance, virulence, persisted cell formation, anti-addiction, and maintenance of integrated mobile genetic elements (Saavedra De Bast et al. 2008; Wozniak and Waldor

2009; Page and Peti 2016; Wood and Wood 2016; Song and Wood 2018; Ma et al. 2019; LeRoux et al. 2020; Song and Wood 2020). Given their large presence in mobile genetic elements it is likely their true purpose is in HGT (Yamaguchi et al. 2011).

22

Project Focus

Figure 3. Phylogenetic Distribution of Secondary Replicons in Domain Bacteria. Depiction of 1,708 genomes plotted based on genome architecture. Red coloration indicates no megaplasmid(s) or chromid(s), green for megaplasmid(s) but no chromid(s), blue for species with chromid(s) but no megaplasmid(s), purple for species with both a megaplasmid and chromid(s) (Image provided by diCenzo and Finan 2017)

23

The explosion in sequence data available in the last few years due to high-capacity short read sequencing, and 3rd generation long read sequences produced by technologies like Oxford Nanopore and PacBio, allows for whole genomes to assembled much more easily. As seen in Figure 3, it appears that

~10% of bacteria have multipartite genomes where the bacterial genome is split between two or more large DNA fragments (diCenzo and Finan 2017). This architecture is prevalent in the genera Brucella, Vibrio, and Burkholderia and is important to many organisms including plant symbionts like nitrogen-fixing rhizobia, and pathogens that colonize plants, animals, and humans (diCenzo and

Finan 2017).

24

Figure 4. Phylogenetic Dispersion of Chromids and Megaplasmids in Order Burkholderiales Showing the presence of secondary replicons within the Order Burkholderiales. Of particular interest to this study is the outlier Variovorax paradoxus. (Image provided by diCenzo et al. 2014)

Multipartite genomes are distributed across the domain Bacteria but there are many clear clusters within a number of phylogenetic groups, including the

Burkholderiales as seen in Figure 4 (diCenzo et al. 2014). An outlier in Figure 4

25

is Variovorax paradoxus which is a metabolically diverse, aerobic, ubiquitous gram negative (Satola et al. 2013). This species has been studied because it contains isolates that are hydrogen oxidizers and plant growth promoters, as well as strains involved in quorum quenching and bioremediation

(Satola et al. 2013). The G+C content of their genomes is between 66.5-69.4% with optimal growth temperatures of around 30°C (Satola et al. 2013). The genus

Variovorax belongs to a group of bacteria referred to as plant growth promoting rhizobacteria (PGPR) (Belimov et al. 2005; Belimov et al. 2009; Han et al. 2011).

Rhizosphere Variovorax use mechanisms of auxin synthesis and ethylene degradation to promote root development (Finkel et al. 2019 May 23) and are key determinants of bacteria-plant communication networks in complex model communities (Finkel et al. 2019 May 23). Members of this genus are common inhabitants of soil and water with several strains isolated from polluted environments due to their ability to resist toxins and degrade complex organic compounds (Satola et al. 2013).

Multiple members of this genus have taken up and maintained large pieces of DNA, creating an extensive heterogeneity within the clade (Carbajal-

Rodríguez et al. 2011; Han et al. 2011). The chromosomes of these species show high synteny when aligned on a dotplot but their secondary replicons show little to none (Ne Ville et al. 2019 Jul 12). This implies that the primary chromosomes are highly conserved, and that the secondary replicons have been up taken in multiple uptake events and share little similarity. A large secondary

26

replicon is present within V. paradoxus S110 and V. paradoxus B4 (Carbajal-

Rodríguez et al. 2011; Han et al. 2011) with several indicators it may be a chromid. V. paradoxus S110 was isolated from the interior of a potato plant grown and enriched based on its ability to degrade acyl-homoserine lactones

(AHLs), bacterial signaling molecules used in quorum sensing (Han et al. 2011).

An isolate identified as V. paradoxus B4, from the soil under mesophilic, aerobic conditions, was enriched due to its ability to degrade mercaptosuccinate

(Carbajal-Rodríguez et al. 2011). Both of these strains of Variovorax paradoxus have taken up and maintained a secondary replicon which the authors identified to be a second chromosome, but is likely a chromid based on recently adopted nomenclature (Harrison et al. 2010).

27

Figure 5. 16s Phylogenetic Tree of Variovorax

16S phylogenetic tree depicting 6 classifications of Variovorax. (Image provided by Barbara Satola et al 2013)

The taking up of a secondary piece of DNA is a rare and represents a

change in phylogenetic characteristics. The genus Variovorax is currently

classified into 6 species, as seen in Figure 5 (Satola et al. 2013), each

possessing diverse metabolic abilities (Satola et al. 2013). Looking at the genus

Variovorax, there is some contention on the NCBI and GTDB databases in the

identification of these organisms classified as V. paradoxus versus those that are

not classified to the species level. Classification is usually based on one or more

of the following: 16S sequence comparison, genome wide conserved gene

28

comparison, Average Nucleotide Identity (ANI), or in-silico DNA-DNA hybridization.

The method shown in Figure 5 is a ribosomal 16S tree still commonly used today, but it has a limitation based on the assumption that changes in this sequence are consistent with larger changes in genome architecture such as the uptake of a plasmid or large recombination events. The 16S sequence is most commonly found on the primary chromosome, is often found in multiple copies, and is even sometimes variable in sequence within a single isolate (Větrovský and Baldrian 2013). Utilizing only this gene for phylogeny can obscure changes found elsewhere in the genome, and limits the amount of phylogenetic distance that can be examined.

Average Nucleotide Identity (ANI) is another method that can be used to investigate phylogenetic relationships (Jain et al. 2018). ANI is a measure of nucleotide similarity between the coding regions of two genomes; an ANI score of 95-96% is considered the same species (Jain et al. 2018). As genome sequencing has become easier and cheaper, an issue on the NCBI database has developed where approximately 40% of all organisms are lacking species designations. The GTDB database tried to correct this when it was created (Tang

2020) by clustering 194,600 genomes into 31,910 clusters primarily using commonly accepted ANI criteria to set bounds on species and to propose species clusters (https://gtdb.ecogenomic.org/) (Tang 2020). On the GTDB database Variovorax paradoxus is divided into 5 different species subclusters,

29

suggesting that Variovorax paradoxus phylogeny may need to be reevaluated

(Tang 2020). The diversity of secondary replicons in this group may be a complicating factor in this analysis.

Another type of phylogenetic tree uses multiple highly conserved genes which are usually only found as a single copy, often referred to as a marker based phylogenetics. A tree of this type was constructed using the available assemblies in the genus Variovorax. This tree constructed using this method is consistent with the GTDB database construct that suggests Variovorax paradoxus is not one homogeneous species. Many of the isolates that are classified as V. paradoxus are scattered throughout the phylogenetic tree and do not clade well together. Within Variovorax there are isolates that have a single chromosome and others with a multipartite genome, and several of those that contain the multipartite genomes seem to clade together. This observation leads to the question of whether the genus Variovorax can be subdivided based on the nature of the secondary replicons present.

Thesis Statement

Several strains of Variovorax have taken up large pieces of DNA and maintained them as secondary chromids. These replicons show evidence of acquisition through horizontal gene transfer as either a plasmid or megaplasmid, including the presence of a plasmid origin of replication (ori), plasmid gene markers, differential GC content, divergent synonymous codon usage (indicative

30

of optimal codon usage), tetranucleotide frequency, and dinucleotide relative abundance. The secondary chromids have allowed for new phenotypes to develop and may have been actively selected for by niche expansion. Taking up large pieces of DNA is a rare event and represents a change in phylogenetic characteristics, which is overlooked when creating a phylogeny based around the

16S gene that is often on the primary chromosome, or when using only conserved genes for phylogeny. I hypothesize that multiple plasmids/megaplasmids have been taken up by Variovorax and fixed within the population through gene flow of core genes from the primary chromosome to plasmid, requiring the cell to maintain the secondary chromid to survive. I further hypothesize that the ability to take up extrachromosomal elements is variable between strains, and that it is dependent on factors within the primary chromosome. I divided my project into two specific aims. Aim 1 was to sequence and assemble 17 strains of

Variovorax to build an accurate phylogenetic tree based around whole genome analysis instead of a single gene (Figure 15), and annotate the genomes to characterize the genes that are on the secondary chromid. With many Variovorax

Illumina draft genomes available on the NCBI database, we generated 3rd generation long read sequences to complete these genomes using hybrid assembly. Aim 2 evaluated the ability of the organism to maintain secondary replicons between isolates that have a single chromosome and ones that maintain a chromid, using a broad host range indicator plasmid. If my hypothesis

31

was correct, the plasmid would have be maintained at a higher frequency in the absence of selection in the strain that contains a second natural replicon, even if this strain has a larger total genome.

Specific Aims

Aim 1

Investigations into bacterial genome structure have shown that 10% of all bacteria contain a secondary replicon, providing a genetic reservoir involved in widespread influence of mobile elements. The genus Variovorax has shown an extensive heterogeneity in genome architecture with several isolates containing combinations of plasmids, megaplasmids, and chromids. In the classic models of multipartite genomes, such as Burkholderia and Vibrio, genomes when compared to one another have high synteny in both replicons, showing that the uptake and maintenance of a large secondary replicon is rare. Variovorax has high synteny in its chromosome, but its secondary replicon shows limited synteny, suggesting multiple uptake events and a relatively high rate of invasion.

Data comparing these replicons form clusters when plotted (Figure 5 vs Figure

15, and Figure 16), suggesting that the uptake of a secondary replicon is a phylogenetically important event in this group. Using available Illumina data on the NCBI database, we have completed these assemblies using 3rd generation sequencing methods on 17 members of this genus, we have sequenced, assembled, and evaluated these now complete Variovorax genomes to examine

32

the diversity of elements. Using several in house scripts we extracted G+C contents, SCOU, DRA distance and replicon size (Figures 10-13) from each individual replicon assembled in this project. Using this data 9 chromids, 7 megaplasmids, 2 plasmids (Table 2), and an integrated megaplasmid (Figure 9) were identified using genomic frequency characteristics. This was followed by

ANI analysis of the completed genomes vs each other, allowing us to build an

ANI cluster map from the ANI matrix (Figure 14, Figure 15). A marker based phylogenetic tree was then constructed using maximum-likelihood (Figure 16).

Aim 2.

Given the evolutionary pattern seen in Aim 1, we were interested to find out if the high levels of heterogeneity present in Variovorax are due to factors controlling secondary replicon maintenance present within the primary chromosome. We used an experimental approach to test for a differential ability to maintain secondary elements between Variovorax isolates. To study this, we utilized a pRU1105 and pBBR5pemIKpBAD vector both of which carry an antibiotic resistance marker (gentamycin) and an unused pBAD expression system while the pBBR5pemIKpBad plasmid also carries a pemIK TA system.

This vector was introduced into two well characterized Variovorax isolates (EPS and VAI-C) with different genome architectures (EPS: Single Chromosome, VAI-

C: Chromosome, Chromid and Plasmid) via electroporation and selected on plates containing the appropriate antibiotic concentration. This simulated the process of plasmid invasion. We then passage these strains in the presence or

33

the absence of antibiotic selection and passaged over 16 days under differential selection to study the population(s) ability to maintain it (Figure 18).

34

CHAPTER TWO

MATERIALS AND METHODS

Specific Aim 1

Nine Variovorax isolates were acquired from Jeff Dangl at the University of

North Carolina at Chapel Hill, two from Jared Leadbetter at Caltech, One from

Kostas Konstaninidis at Georgia Tech, and four from the Noble Research group

(Table 1). Three additional isolates were already present in the Orwin lab collection (Table 1).

Table 1. Variovorax Sequencing Project

Species name Strain Name Acquired from Illumina Data Nanopore Data Variovorax MF-004 J. Dangl Lab Obtained from Data paradoxus Dangl Lab obtained 4MFCol3.1 as part of this project Variovorax MF-110 J. Dangl Lab Obtained from Data paradoxus 110B Dangl Lab obtained as part of this project Variovorax sp. MF-160 J. Dangl Lab Obtained from Data 160MFSha2.1 Dangl Lab obtained as part of this project Variovorax sp. MF-278 J. Dangl Lab Obtained from Data 278MFTsu5.1 Dangl Lab obtained as part of

35

this project Variovorax MF-295 J. Dangl Lab Obtained from Data paradoxus Dangl Lab obtained 295MFChir4.1 as part of this project Variovorax MF-349 J. Dangl Lab Obtained from Data paradoxus Dangl Lab obtained 349MFTsu5.1 as part of this project Variovorax sp. MF-350 J. Dangl Lab Obtained from Data 350MFTsu5.1 Dangl Lab obtained as part of this project Variovorax MF-369 J. Dangl Lab Obtained from Data paradoxus Dangl Lab obtained 369MFTsu5.1 as part of this project Variovorax sp. MF-375 J. Dangl Lab Obtained from Data 375MFSha3.1 Dangl Lab obtained as part of this project Variovorax EPS P. Orwin Lab Full Sequence Data paradoxus EPS Already obtained Published as part of this project Variovorax CSUSB P. Orwin Lab Sequenced Data paradoxus Data obtained obtained CSUSB as part of this as part of project this project Variovorax VAI-C Jared Leadbetter Sequenced Data paradoxus VAI-C Data obtained obtained as part of this as part of project this project

36

Variovorax ATCC-17713 Jared Leadbetter Sequenced Data paradoxus ATCC- Data obtained obtained 17713 as part of this as part of project this project Variovorax sp. KK3 Kostas Data Data KK3 Konstantinidis Downloaded obtained from SRA as part of this project Variovorax sp. NFACC26 Noble Research Data Data NFACC26 group Downloaded obtained from SRA as part of this project Variovorax sp. NFACC27 Noble Research Data Data NFACC27 group Downloaded obtained from SRA as part of this project Variovorax sp. NFACC28 Noble Research Data Data NFACC28 group Downloaded obtained from SRA as part of this project Variovorax sp. NFACC29 Noble Research Data Data NFACC29 group Downloaded obtained from SRA as part of this project

Genomic DNA Isolation

The protocol is based on the Bacterial genomic CTAB DNA isolation protocol suggested by the Department of Energy’s (DoE) Joint Genome Institute

(JGI) and Loman/Quick protocol (William et al. 2012; Quick and Loman 2018).

37

Cultures were streaked out on an agar plate (1.5% w/v) containing 5 g/L yeast extract (YE) from a -80oC glycerol stock and allowed to grow for 2 days in an incubator at 27oC. An isolated colony was then used to inoculate a culture in 10 mL YE broth. The cells were grown into log phase on a shaking incubator at

o 30 C as determined by optical density (OD600=~0.4-0.7). A 5 mL aliquot of the cells were centrifuged at 17,000 x g for 10 minutes and the culture fluid was decanted and discarded. The cell pellets were placed in a -80 oC freezer until it was time for DNA extraction. Cell pellets were thawed on ice and resuspended in

1x TE (10mM Tris and 1mM EDTA, pH 8). The JGI-DOE bacterial genomic DNA isolation using CTAB protocol (William et al. 2012) was followed with the following modifications. After incubation for 30 minutes at 37 oC, 20 µL RNAse was added to the solution. In order to denature the copious exopolysaccharide produced by many Variovorax isolates, after the addition of the 10% SDS and

Proteinase K the cells were incubated at 57 oC overnight. The salt precipitated lysate was extracted with two phenol:chloroform:isoamyl alcohol, and the extracted again with two chloroform:isoamyl alcohol extractions. After the second extraction the aqueous layer was removed and placed into a tube on ice containing ice-cold of absolute ethanol and 1/10th the volume of 3M ammonium acetate. The tube was stored at -20°C overnight to allow for DNA precipitation.

The next day the tip of a glass capillary tube was melted to form a hook and the precipitated DNA was removed from solution. The DNA was submerged in 70% ethanol until it constricted around the capillary tube into a whitish pellet. The

38

pellet was scraped into a 1.7mL centrifuge tube along with 1 mL of freshly made

70% ethanol followed by centrifugation at 4°C at 17,000 x g for 10 minutes. The supernatant was removed, and the tube was dried in a heated DNA speed-vac set to medium speed for 10 minutes. The isolated genomic DNA was resuspended in 100 µL of 1x Tris (10mM Tris pH 8.0) and heated at 37°C for 10 minutes to resolubilize for sequencing. When necessary, DNA pellets were stored in a -80°C freezer as a precipitate under absolute ethanol and 1/10th the volume of 3M ammonium acetate until sequencing. Prior to library preparation the DNA was analyzed on a Nanodrop (Fisher) to ensure that the 260/280 ratio was between 1.8-2.0 and the 260/230 ration was between 2.0-2.2. A 26-gauge needle was then used to shear the DNA into approximately 4kb fragments before library preparation. The same genomic DNA preparation was used to generate a

250-300 bp library with Nextera DNA Flex LPK kit, which was sequenced in the

Illumina iSeq platform (2x150 bp) for Variovorax paradoxus VAI-C, and

Variovorax paradoxus ATCC-17713. Short read data for Variovorax sp. CSUSB was obtained from a separate genomic DNA preparation using the FastDNA Spin kit for Soil (Solon, OH) and sequenced on the Illumina MiSeq platform (500-600

Library fragment size using the Nextera XT kit, 2x250 reads).

Library Preparation for Illumina Sequencing

Nextera DNA Flex Library protocol was then followed with no deviations from the protocol.

39

Sample Quantification and Illumina, iSeq 100 Library Preparation

Samples were quantified on a Qubit using a dsDNA fluorescent dye methodology (dsDNA HS assay). The samples were run on the iSeq 100 following standard Illumina setup and loading protocols with the addition of a 5% phiX spike in as a control. To see which samples where sequenced using this method see Table 1.

Library Preparation for Oxford Nanopore Sequencing

The standard SQK-RBK004 library preparation protocol was followed with no deviations from the protocol. To see which samples where sequenced using this method see Table 1.

MinION Sequencing

The MinION with a MIN-106 flowcell was connected to a computer via

USB 3.0 and the MinKnow GUI started. The 9.4.1 flow cell was then selected along with the kit being used (SQK-RBK004) followed by our parameters for directory size. Active pores were checked in the MUX scan and at the end of the

Flow Cell Check. Once the temperature of the device had warmed up to 34°C the sequencing run was started. During the initial few hours of the run the pore occupancy, read length histogram, and quality score was monitored as seen in

Figure 6. The sequencing run was left to run for 24 hours after which it was stopped, and the flow cell was cleaned and stored as recommended by the manufacturer.

40

Figure 6. Oxford Nanopore Sequencing Run Showing the start of a MinION run, in the MinKnow software package (Linux). The image on the left shows an active run monitoring active pores, while the image on the right shows pore availability over the course of a run.

Bioinformatics Pipeline

The raw Nanopore data was basecalled using Guppy (v2.3.1) configuration r9.4.1 Flip Flop (later High-Accuracy Basecalling), running on standard configuration. MinION reads were demultiplexed in Deepbinner (Wick et al. 2018) and Guppy with the agreed upon reads adapters removed with

Porechop (v0.2.4) (Wick et al. 2017a). Quality control of the long reads was done by running the reads in Filtlong (v0.2.0) (GitHub - rrwick/Filtlong) using the

Illumina data as an external reference to throw out low identity reads and barcodes that slipped through the demultiplexing process. For the Illumina data

41

FastQC (v0.11.8) was used for quality assessment of this data (Andrews) and trimming was performed in Trimmomatic (0.38.0) (Bolger et al. 2014). With this tool the IlluminaCLIP procedure was used to remove any remaining sequence adaptors/barcodes, the first 20 nucleotides were cropped off each read

(HEADCROP), and the last five were also removed (TRAILING), leaving a mean length distribution of 126 bp. Short read data from the Dangl lab strains was provided by Dangl Lab University of North Carolina at Chapel Hill, while short read data pertaining to KK3 and the Noble group strains were obtained from publicly available records on the JGI website. Assemblies were created using a hybrid approach in Unicycler (v0.4.8.0) (Wick et al. 2017b) utilizing default parameters without references on the North America Galaxy Hub

(http://usegalaxy.org) (Afgan et al. 2018). For any assembly which did not circularize in Unicycler, Nanopore reads were aligned to the area of discontinuity using Minimap2 (v2.20) (H 2018) to create a .sam file. Samtools (v1.12) (Li et al.

2009; Danecek et al. 2021) was used to sort/index the file into a .bam file which could be viewed in IGV (v2.10.0) (Thorvaldsdóttir et al. 2013). Using Samtools reads which aligned to the area of discontinuity were extracted to a fastq file.

This fastq file was then used in Bandage to create a BLAST bin to confirm they spanned the gap. This bin was used to create a consensus read in ALFRED

(v.0.2.3) (Rausch et al. 2019).

Initial annotation was performed by uploading the assembly to Rast.org and using the RASTtk pipeline (http://rast.nmpdr.org) (Brettin et al. 2015). The

42

assemblies along with the raw data were then submitted to the NCBI database which uses DOE-JGI Microbial Genome Annotation Pipeline prior to making the assemblies available to the public.

Data Availability

The assemblies and raw data have been uploaded to the NCBI database.

Variovorax sp. CSUSB can be found at PRJNA593854, SAMN13494298, assembly CP046622. Raw read data can be found at SRR10662237-40.

Variovorax paradoxus VAI-C can be found under BioProject number

PRJNA667957, BioSample number SAMN16392950, and assembly numbers

CP063166 through CP063168. The read data can be found under SRA accession numbers SRX9260397 and SRX9260396, including demultiplexed fastQ files with the barcodes removed for the MinION runs and paired fastQ files for the Illumina iSeq. Fastq and Fast5 for the other samples are available on request but will be released on final publication.

Core Gene Identification

Core genes in the secondary replicon were identified by taking

Betaproteobacteria 203 gene .hmm file from GtoTree (v.1.5.44) (Edgar 2004;

Capella-Gutiérrez et al. 2009; Eddy 2011; Hyatt et al. 2012; Tange 2018; Lee

2019) and using it as a base for a HMMER3 (v.3.3.2) (Eddy 2011) search. The genes that were identified in each secondary replicon are listed in below in Table

2.

43

Dotplot

The dotplot used in Figure 9 was constructed on rast.org (Brettin et al.

2015) using the nucleotide comparison tools by aligning the two genomes

NFACC27 and NFACC29.

Secondary Replicon Identification:

Secondary replicons found within the genome assemblies were classified as a chromid: if the distance in percent G+C were within 1% of the chromosome, and the DRA distance was within 0.4% of the chromosome. While SCOU was also calculated there were no current standards found in current literature for differentiating between replicon types based on this data. Replicons that did not meet this requirement were classified as either megaplasmid or plasmid based on current standards of size.

Percent G+C Content

G+C content in Figure 10 was calculated for all completed assemblies using an in-house script which utilized the SeqinR package (v4.2-8) (Charif and

Lobry 2007) in R which counted the frequency of all bases on the Chromosome and the secondary replicon. The Secondary replicon was then subtracted from the Chromosome. Variovorax has relatively high G+C contents so all

Chromosomes had higher G+C contents than the secondary replicons. Graphing of this data was done in R in the package ggplots2 (v3.3.4) (Wickham 2009).

44

Size of Secondary Replicons

Sizes of the secondary replicons in Figure 13 was calculated for all completed assemblies using an in-house script which utilized the SeqinR package (v4.2-8) (Charif and Lobry 2007) in R which counted available bases on the secondary replicon and then sum all bases together. Graphing of this data was done in R in the package ggplots2 (v3.3.4) (Wickham 2009).

Dinucleotide Relative Abundance

Dinucleotide relative abundance (DRA) in Figure 11 was calculated for all completed assemblies using an in-house script which utilized the SeqinR package (v4.2-8) (Charif and Lobry 2007) in R. It used a two-frame sliding window to count all 16 DRA combinations present in the Chromosome and secondary replicons. The 16 DRA frequency counts where then divided by the total number of times they occurred on that genomic piece. Using the count function, calculate the frequency of A, T, G, and C that occurred for the same replicon. This data was used to calculate the expected frequency for each dinucleotide by multiplying the applicable frequency. The observed value was then divided by the expected values for each replicon. The absolute difference between the 16 dinucleotides between the Chromosome and the Secondary replicon was calculated. The sum of those differences was used to represent the final DRA distance. Graphing of this data was done in R in the package ggplots2

(v3.3.4) (Wickham 2009).

45

Synonymous Codon Usage Order (SCUO)

Synonymous Codon Usage Order (SCUO) in Figure 12 was calculated for all assemblies using the R package coRdon (v1.8.0) (Elek et al. 2021). The calculated values for the Chromosome was subtracted from the secondary replicon to the SCUO Distance. Graphing of this data was done in R in the package ggplots2 (v3.3.4) (Wickham 2009).

ANI

Average Nucleotide Identity (ANI) in Figure 15 is a measure of nucleotide genomics similarity between 1000 bp regions of two genomes (Jain et al. 2018).

To calculate ANI Best Bidirectional Hits (BBHs) are calculated between genome pairs pairwise, the highest hits on nSimScan having 70% identity and at least

70% coverage of the shorter of the two genes (Varghese et al. 2015). Since this method replies on the identification of areas of similarity between genomes, gene fragmentation caused by errors or sequencing artifacts can distort both values

(Varghese et al. 2015). The ANI calculator used was from the Kostas lab from

Georgia Tech (Rodriguez-R and Konstantinidis 2016) and the Figure 15 was built from ANI data in Figtree (v1.4) (Rambaut et al. 2018) using the data from the ANI cluster matrix.

Marker Based Phylogenetic tree

To construct Figure 16 all strains were ran through GoTtree (v.1.5.54)

(Edgar 2004; Capella-Gutiérrez et al. 2009; Eddy 2011; Hyatt et al. 2012; Tange

2018; Lee 2019), where the program looked for 203 single copy genes for the

46

class Betaproteobacteria in each genome. It ran an alignment for each one of these genes across all genomes, the alignment file generated from this program was used to construct a Phylogenetic tree in FastTree (v2.1) (Price et al. 2010) using the maximum-likelihood, Figure 16 was then modified in Figtree (v1.4)

(Rambaut et al. 2018).

Specific Aim 2

This protocol is based loosely around Richard Lenski’s LTEE as a framework for this experiment (Lenski et al. 1991). E. coli stocks containing the plasmids pRU1105 and pBBR5pemIKpBAD were streak out from a -80 oC stock.

They were grown in a 37 oC incubator for 1 day on LBgent10 plates (10 µg/mL gentamicin). A single colony from each plate was used to inoculate 5mL of LB broth and grown over night in a shacking incubator at 37 oC. Using the Promega

Wizard™ Plus SV Minipreps DNA Purification System both plasmids were extracted. The DNA was analyzed on a Nanodrop (Fisher) to ensure that the

260/280 ratio was between 1.8-2.0 and the 260/230 ration was between 2.0-2.2 then were stored at -20 oC. Variovorax paradoxus EPS and Variovorax paradoxus VAI-C were streaked out on an agar plate (1.5% w/v) containing 5 g/L yeast extract (YE) from a -80oC glycerol stock and allowed to grow for 2 days in an incubator at 27oC. An isolated colony from each was then used to inoculate a culture in 10 mL YE50 (2.5 g/L YE) broth. The cells were grown into log phase on a shaking incubator at 30oC as determined by optical density (OD546=~0.4-0.7).

47

These tubes were used to inoculate a 100mL broth flask of YE50 which was placed on a shaking incubator at 30oC. These cells were grown until the OD546 was between 0.3-0.5, after the cells were split into 50mL falcon tubes and centrifuged at 7000 x g for 15 mins at 4 oC. The supernatant was decanted and discarded while cells were resuspended in 40mL of stile ice cold 300mM sucrose. The cells were centrifuged as previously described with the supernatant decanted again but resuspended in 20mL of stile ice cold 300mM sucrose. This process was repeated once more but with the final resuspension being in 400µL of stile ice cold 300mM sucrose. During this process if the cells were not being centrifuged or worked with, they were kept on ice the entire time to keep them cold. The cells were then placed on ice for 30minutes along with EC2 gap cuvettes (200mM). The stocks of both pRU1105/pBBR55pemIKpBAD plasmids were brought out from the -20 oC to thaw. While the cells were on ice the

Electroporation apparatus was warmed up with the machine being set to EC2 and the voltage set to 1.80kV. For both EPS and VAI-C 100uL of cells were placed in EC2 gap cuvettes with 2 µL of either pRU1105/pBBR55pemIKpBAD.

The samples were mixed by gentle pipetting followed by gentle flicking to ensure the samples were mixed properly. The contact points on the EC2 gap cuvettes were wiped down with a Kimwipe to remove any ice or water. The EC2 cuvettes were then placed into the Electroporation apparatus followed by activating the machine to deliver the pulse. The cells were then placed in 400µL of YE50 broth and incubated in a shaking incubator for 2 hours at 30oC. Aliquots of the

48

electroporation mixture was spread onto YE + 50 µg/mL or 70 µg/mL Gentamycin

(YE-Gm) (for EPS 70 µg/mL was used and 50 µg/mL for VAI-C) plates and incubated at 27 oC for 2 days.

Plasmid Verification

After 2 days of grown, several colonies from each plate (EPS+ pRU1105,

EPS+pBBR5pemIKpBAD, VAIC+pRU1105, VAIC+ pBBR5pemIKpBAD) were selected at random and used to inoculate 5mL of YE broth. These were placed in a 30 oC shaking incubator overnight. Using the Promega Wizard™ Plus SV

Minipreps DNA Purification System all 4 samples plasmids were extracted, cut with SpeI and 10µL of each of the linearized samples plus 2µL of loading dye ran on a .8% (w/v) agarose electrophoresis gel. Organisms verified to contain the plasmid (see Figure 7), were grown overnight in 5mL of YE broth to create -80 oC base stocks to start the experiment.

49

Figure 7. Experimental Evolution Plasmid Verification Variovorax paradoxus VAI-C (left) and Variovorax paradoxus EPS (right) showing the plasmids extract are the right size/length, in Lane 1: 1KB ladder, Lane 2: pRu1105 plasmid control, Lanes 3-5: pRU1105 Experimental plasmids extract from Variovorax, Lanes 6: pBBR5pemIKpBAD control, Lanes 7-9 pBBR5pemIKpBAD Experimental plasmids extract from Variovorax

Experimental Evolution Passaging

Base stocks for Variovorax paradoxus EPS (+pRU1105,

+pBBR5pemIKpBAD) were streaked out on an YE-Gm plates and incubated at

27oC for 2 days as EPS showed a higher innate resistance to gentamicin. Base stocks for Variovorax paradoxus VAI-C (+pRU1105, +pBBR5pemIKpBAD) were streaked out on YE-Gm plates and incubated at 27oC for 2 days. An isolated colony from each was then used to inoculate a culture of appropriate YE-Gm that was incubated in a shaking incubator at 30 oC overnight. The overnight was to inoculate 5 tubes for each of the 4 experimental groups: 1 control containing 5mL

50

YE-Gm and 4 experimental tubes of 5mL YE. The 20 tubes were placed in a shaking incubator at 30 oC overnight, the next day each tube had its OD546 measured. This was followed by taking 100µL from each tube and inoculating

5mL of fresh YE broth (for controls YE-Gm broth was used), this was done daily for each of the 20 tubes daily for 16 day(s). On day(s) 1,4,7,10,13,16 serial dilutions (1/10) were made for each tube in a 96 well plate. A 10 µL multichannel pipette was used to create replica plates by taking 6 of the dilutions (for example

EPS+pRU1105 dilutions 3-8) and plate them on a YE and a YE-Gm plate. On each replica plate 5µL droplets were placed in 5 rows of 6 dilutions each. These plates were placed in a 27 oC incubator for 2 days then the lowest dilution that contained between 1-30 colonies was counted and its dilution recorded. On days

0, 5,10 and 16, 1.5mL were taken from the each of the 20 tubes and used to create -80 oC stocks (see Figure 8 for an overview).

51

Figure 8. Overview of Experimental Evolution Project An overview of the Experimental Evolution Passaging for the 4 samples. Showing how the samples were handled after electroporation, including the 16- day passaging and up to the every 3rd day playing on YE/ YE-Gm

OD546 Reading

During the 16-day passaging each day the OD546 of each experimental/control was taken. Graphing of Figure 17 was done in R in the package ggplots2 (v3.3.4) (Wickham 2009).

52

Plasmid Retention

On days 1,4,7,10,13 and 16 the controls/experimentals were replica plated on both YE and YE-Gm. The counts from the YE were divided by the ones from

YE-Gm to get the percent plasmid retention for each group. Graphing of Figure

18 was done in R in the package ggplots2 (v3.3.4) (Wickham 2009).

53

CHAPTER THREE

RESULTS AND DISCUSSION

Aim 1

Replicon Identification

For the 17 completed assemblies, the fasta files for each individual replicon was extracted for further analysis. Included in our data analysis were the genomes of Variovorax paradoxus B4 and Variovorax paradoxus S110 due to changes in nomenclature since they were published to the NCBI database.

Based on replicon size and genomic signatures, all secondary replicons were identified in Table 2.

Included in the information in Table 2, are the core genes identified on the secondary replicons. Strain Variovorax paradoxus 110B was identified via genomic frequency characteristics as being a chromid, but no core genes were identified on the secondary replicon. This might be because this replicon has more recently transitioned into a chromid, and a core gene has not been transferred over yet. An alternative explanation is that the annotation software at the time was unable to detect the core gene(s) present on the secondary replicon. In all of the genomes surveyed, none of the core genes found on the secondary replicon were missing from the primary chromosome.

54

Table 2. Assembled Genomes and Core Genes Species name Replicon Type Name of Core Gene Present on secondary replicon Variovorax paradoxus Chromid 1. DHQ synthase 4MFCol3.1 2. OMPdecase 3. Peptidase A8 Variovorax paradoxus 110B Chromid None Variovorax sp. 160MFSha2.1 Megaplasmid 1. Peptidase A8 Variovorax sp. 278MFTsu5.1 Megaplasmid 1. OmpH 2. Peptidase A8 Variovorax paradoxus Chromid 1. DHQ synthase 295MFChir4.1 2. Peptidase A8

Variovorax paradoxus Chromid 1. DHQ synthase 349MFTsu5.1 2. Peptidase A8

Variovorax sp. 350MFTsu5.1 Megaplasmid None

Variovorax paradoxus Chromid 1. DHQ synthase 369MFTsu5.1 2. Peptidase A8 Variovorax sp. 375MFSha3.1 Megaplasmid 1. OmpH 2. Peptidase A8 Variovorax paradoxus CSUSB Single Chromosome Variovorax paradoxus VAI-C 1 Chromid and 1 Chromid: Plasmid 1. Histidinol dh 2. MltA 3. Peptidase A8 Plasmid: 1. RmuC Variovorax paradoxus ATCC- Chromid 1.Peptidase A8 17713 Variovorax sp. KK3 Plasmid 1.Porphobil deam Variovorax sp. NFACC26 Megaplasmid 1. Histidinol dh

Variovorax sp. NFACC27* Single *Likely integration event of a Chromosome Megaplasmid into the Chromosome Variovorax sp. NFACC28 Megaplasmid 1. Histidinol dh

Variovorax sp. NFACC29 Megaplasmid 1. Histidinol dh

55

Identification of Integrated Megaplasmid

Figure 9. NFACC27 Megaplasmid Integration Event Depicts the nucleotide alignment between NFACC27 (Y axis) and NFACC29 (X axis). This illustrates the integration of the NFACC27 megaplasmid at approximately 4M BP .

From the assembled genomes there were 18 secondary replicons identified as 9 chromids, 7 megaplasmids, and 2 plasmids. While Variovorax sp.

56

NFACC27 did not contain an obvious secondary replicon, there is evidence to suggest an integrated megaplasmid. The integration site was identified by using closely related organism’s (NFACC26, NFACC27, NFACC29) megaplasmids to

BLASTn search the chromosome. To verify the integration site, both Nanopore and Illumina reads were aligned to see if there were reads that spanned from the chromosome, over the integration site and to the megaplasmid. There were short and long reads that spanned from the chromosome onto the megaplasmid, suggesting that this was not an artifact of sequencing or an assembly error. To further show this point the dotplot in Figure 9 was constructed between

NFACC27 and NFACC29, we can see the inversion designating the megaplasmid right around 4Mbp (boxed).

57

Percent G+C Content

Figure 10. Percent G+C Differences between Replicons Depicts the difference in percent G+C content of replicons sequenced in this work. For each type of secondary replicon, percent G+C was compared against the primary chromosome of the organism. The sample size for this graph is 9 chromids (red), 7 megaplasmids (green) and 2 plasmids (blue).

The ΔG+C contents of plasmids (mean= 6.07 +/- 2.89) are more broadly distributed, while the megaplasmids (mean=2.75 +/- 0.18) and chromids

(mean=.80 +/-.28) were distributed more narrowly (Figure 10). The chromids and megaplasmids cluster together with themselves, but there is still stubstanial divergance between the two. In Variovorax, the G+C contents of the

58

chromosomes where always higher than those of the secondary replicons. This is consistent with larger studies where the G+C contents of plasmids are usually lower than the chromosomes, even in gammaproteobacteria and spirochaetes which has overall very low G+C contents (Harrison et al. 2010). This does imply that the lower G+C content is not just because they are foreign DNA or more/less recently aquired (26). It is likely that the secondary replicons have an intrinsically different equilibrium base which will never fully ameliorate (Harrison et al. 2010).

59

Dinucleotide Relative Abundance Distance

Figure 11. Dinucleotide Relative Abundance Distance between Replicons Depicts the DRA distance for each replicon in this work. DRA was calculated, between the chromosome and the secondary replicon and differences between the two are shown. Lower DRA distance indicates that the secondary replicon is more similar to that of the chromosome. The sample size for this graph consisted of 9 chromids (red), 7 megaplasmids (green) and 2 plasmids (blue).

DRA values have been shown to be reflective of bacterial phylogeny, and thus comparing the DRA values of secondary and primary replicons can correlate to evolutionary distance. Within Figure 11, the DRA distance follows a similar pattern to that of G+C content. With the plasmids (mean=.72 +/- .229) having a

60

broad distribution while the megaplasmids (mean=.39 +/-0.017) and chromids

(mean=.33+/-.05) cluster together more narrowly. The difference between plasmids and chromosomes is considerably larger than that of the chromid/megaplasmid. While the DRA values of the chromids/megaplasmids are more similar to that of the chromosome. There is some overlap between that of the chromids and megaplasmids. This shows that over evolutionary time the secondary replicons are ameliorating their DRA values to that of the chromosome. When transitioning to megaplasmids followed by chromids, the secondary replicons are becoming more closely associated with the chromosome.

61

Synonymous Codon Usage Order

Figure 12. Synonymous Codon Usage Order between Replicons Depicts the difference in SCUO between the secondary replicon and chromosome. Higher SCOU values indicate high codon usage biases. The sample size for this graph consisted of 9 chromids (red), 7 megaplasmids (green) and 2 plasmids (blue).

Synonymous codon usage order values can be influenced by both evolutionary history and environmental factors related to the process of translation. The SCUO values observed follow the previous two patterns (G+C and DRA) with the plasmids (mean=.05 +/- .019), having a broad distribution

62

while the megaplasmid (mean=.027 +/- .002) and chromid (mean= .0099 +/-.002) cluster together (Figure 12). This indicates that the chromids and megaplasmids are more closely associated with the chromosome, implying that they have spent a longer time ameliorating within their host. While we see a larger divergence with megaplasmids, we do see partial amelioration with the chromosomal values.

Some of the plasmids though have a much larger divergence than either chromids or megaplasmids suggesting they have spent less time within their host.

Secondary Replicon Sizes

Figure 13. Length of Secondary Replicon Comparison Depicts the size distribution of the secondary replicons sequenced in this work. For each type of replicon, the count() feature was used to count the number of “A”, “T”, “G” and “C” present in the secondary replicons. The sample size for this graph consisted of 9 chromids (red), 7 megaplasmids (green) and 2 plasmids (blue).

63

The size distribution of the secondary replicons is shown in Figure 13.

The smallest chromid is at least twice the size of the largest megaplasmid. This supports the idea that there is a large expansion of the megaplasmid as it transitions into a chromid. While the plasmids and megaplasmids cluster together, there is a considerable gap in size between the largest megaplasmid and the smallest chromids. Looking at the patterns shown in Figures 10-12, we observe that as the replicon increases in size, the G+C content, DRA and SCOU start to ameliorate with that of the chromosome, this is a pattern that is consistent across the 3 genomic frequencies characteristics.

What we have observed in this genomic frequency signature data that there is a pattern of evolution where a plasmid is picked up by the organism and over evolutionary time, there is expansion of the replicon in size through acquisition of mobile elements and interreplicon transfer. As this process goes on, the secondary replicons’ genomic frequency characteristics regress toward the primary replicon. The mechanisms underlying this process of amelioration to the primary chromosome likely include interreplicon transfer, selection for improved translation efficiency and mutation biases of cellular machinery. Some of the regression we are seeing may be the result of selection forces acting to reduce the cost of the secondary replicon to the cell thus reducing the likelihood

64

of bulk DNA loss of the secondary replicon. Helping to integrate the new functions the secondary replicon brings into existing transcriptional/biochemical pathways.

Average Nucleotide Idenity

Figure 14. ANI Cluster Matrix The ANI of the genomes assembled in this project along with several other key Variovorax. Included in this matrix is Variovorax Boronicumulans J1 (species outgroup) Comamonas (genus outgroup) which both function as outgroups. This data is complete genomes vs complete genomes for all samples.

65

Figure 15. ANI Cluster Map ANI cluster matrix built from the ANI data matrix with cells containing chromids (red), megaplasmids (green), plasmids (blue), and both a chromid/plasmid (purple). It is important to note that an ANI cluster matrix is not that same as a phylogenetic tree. Those not highlighted do not contain a secondary replicon or function as an outgroup as is the case with Boronicumulans J1 (species outgroup) Comamonas (genus outgroup).

We used pairwise ANI across all of our strains to evaluate relatedness

(Figure 14). There are several organisms classified as V. paradoxus that fall below the species delineation cutoff at 95-96% when compared to other organisms classified as being in this species (EPS, VAI-C, B4). The cluster map built from this data (Figure 15) further reinforces the idea that species delineation

66

within the genus is murky. Figure 15 shows that many of the organisms which were previously listed do not cluster well with the others of their species. We further observe from this data that the organisms carrying similar replicon types do cluster together. Figure 15 shows that the genomic architectural diversity observed is likely from multiple different HGT events and not a single HGT event and diversification. These last two points are further reinforced by the marker based phylogenetic tree in Figure 16, where there is similar clustering in the phylogenetic tree to that of ANI cluster map in Figure 15. This suggests as we theorized that there is a likely connection between the uptake of a secondary replicon and phylogeny.

67

Phylogenetics

Figure 16. Marker Based Phylogenetic Tree Built using Maximum Likelihood Marker based phylogenetic tree was built with 203 genes from a Betaproteobacteria .HMM3 file. With cells containing a chromids (red), megaplasmids (green), plasmids (blue), and both a chromid/plasmid (purple). Those not highlighted do not contain a secondary replicon or function as an outgroup as is the case with Boronicumulans J1 (species outgroup) Comamonas (genus outgroup).

Aim 2

Given the evolutionary pattern seen in Aim 1, I was interested to find out if the high levels of heterogeneity present in Variovorax are due to factors controlling secondary replicon maintenance present within the primary

68

chromosome. Using two strains of Variovorax paradoxus strains, VAI-C

(multipartite genome: chromosome, chromid, and plasmid) and EPS (single chromosome) and two plasmids pRU1105 and pBBR5pemIKpBAD, we evaluated the maintenance of plasmids in the absence of selection. Both plasmids contain a gentamicin resistance marker, but the pBBR5pemIKpbad plasmid (Burbank and Wei 2019) was selected because it has been shown to be maintained in some environmental isolates without selection. Our expectation was that this plasmid would be maintained because of the toxin-antitoxin system activity of the pemIK locus gene products.

69

Growth over 16 days

Figure 17. Optical Density 546nm over a Period of 16 days Optical Density 546nm readings representing growth over the 16 day experiment from each group. In part (A.) EPS+pR51105, (B.) VAI-C+pRu1105, (C.) EPS+pBBR5pemIKpBAD, (D.) VAI-C+ pBBR5pemIKpBAD. For each organism, the control (red), exp. #1 (blue), exp. #2 (green), exp. #3 (purple), exp. #4 (brown) are depicted. This data is at the limits of the spectrophotometer, do not interpret literally.

The growth characteristics of the four strain constructs are shown in

Figure 17. The organisms that contained the pBBR5pemIKpBAD plasmid had trouble growing for the first few days of the experiment. This may be attributed to the presence of the pemIK TA system, requiring both organisms to adapt which slowed down growth. After a few days to adapt all four strains where able to

70

reach saturation. Starting at day 10 organisms, that contained the pemIK TA system started to show new colony morphologies (not shown). From day 10 to day 16 two additional colony morphology phenotypes were detected and recorded. The plasmid maintenance data illustrates that there are differences in this phenotype based both on the strain and the plasmid (Figure 18). Neither plasmid is well maintained in strain VAI-C, while EPS was able to maintain both of them at a higher percentage of the population for a longer period of time after selection was removed. Surprisingly, the TA system had a deleterious effect and showed no additional phenotype in either organism (Figure 18 C and D). Neither organism was able to maintain the pBBR5pemIKpBAD plasmid to a high degree, the percentage of the population retaining the plasmid dropping to less than a fraction of a percent almost immediately. This, combined with the growth issues seen in the early days of the experiment suggest that the pemIK TA system in

Variovorax is deleterious. This suggests that there are genomic determinants of plasmid stability, and that the presence of TA system genes on the secondary replicons can be determinants of stability. It remains to be determined if the plasmid stabilization effect is related to genome size, incompatibility factors, or other genetic determinants of replication, segregation, or plasmid maintenance.

71

Plasmid Retention over Time

Figure 18. Percentage of Population Retaining the Plasmids over 16 days Showing the percentage of the population retaining each plasmid over the 16-day experiment. These graphs represent the colony counts done every 3 days, the percentage population was done by colony counts (YE-Gm/YE) x 100. In part (A.) EPS+pR51105, (B.) VAI-C+pRu1105, (C.) EPS+pBBR5pemIKpBAD, (D.) VAI-C+ pBBR5pemIKpBAD. For each organism, the control (red), exp. #1 (blue), exp. #2 (green), exp. #3 (purple), exp. #4 (brown) are depicted.

72

CHAPTER FOUR

CONCLUSION

With these 17 genomes assemble we report here, illustrate the importance of completing genome assemblies and not just leave them as draft contigs on the

NCBI database. Given the low cost and ease of use of new 3rd generation sequencers, we have shown that there is valuable information to be extracted from complete assemblies. With the 17 assembled Variovorax genomes we have shown that there is extensive heterogeneity which suggests an evolutionary pattern. This pattern we observed is consist with plasmid acquisition and expansion over evolutionary time. During this expansion, the mean genomic frequencies are regressing to that of the chromosome. Uptake of a secondary replicon seems to be a rare event but as we have shown it does suggest a likely connection between the uptake of a secondary replicon and phylogeny. The experimental evolution data showed that EPS, the single chromosome species, did maintain the plasmids better than multipartite VAI-C, but this data does not addres the mechanism of that difference. There are many factors that could have influence this data and further research is required in this area. This data suggest that the deleterious effect the TA system had on both organisms. It does suggest that while TA systems can be genomic determinatants of plasmid stability, the effect might be related to other factors found throughout the genome. The data suggested here is consistent with genomic architecture being repeated to environment, chromosomal determinants and secondary replicons factors.

73

REFERENCES

Afgan E, Baker D, Batut B, van den Beek M, Bouvier D, Cech M, Chilton J, Clements D,

Coraor N, Grüning BA, et al. 2018. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 46(W1):W537–

W544. doi:10.1093/nar/gky379.

Agnoli K, Schwager S, Uehlinger S, Vergunst A, Viteri DF, Nguyen DT, Sokol PA, Carlier

A, Eberl L. 2012. Exposing the third chromosome of Burkholderia cepacia complex strains as a virulence plasmid. Mol Microbiol. 83(2):362–378. doi:10.1111/j.1365-

2958.2011.07937.x.

Almpanis A, Swain M, Gatherer D, McEwan N. 2018. Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages.

Microb Genomics. 4(4):e000168. doi:10.1099/mgen.0.000168.

Andrews S. Babraham Bioinformatics - FastQC A Quality Control tool for High

Throughput Sequence Data. [accessed 2021 Jul 21]. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.

Baril C, Richaud C, Baranton G, Saint Girons IS. 1989. Linear chromosome of Borrelia burgdorferi. Res Microbiol. 140(8):507–516. doi:10.1016/0923-2508(89)90083-1.

Belimov AA, Dodd IC, Hontzeas N, Theobald JC, Safronova VI, Davies WJ. 2009.

Rhizosphere bacteria containing 1-aminocyclopropane-1-carboxylate deaminase increase yield of plants grown in drying soil via both local and systemic hormone signalling. New Phytol. 181(2):413–423. doi:10.1111/j.1469-8137.2008.02657.x.

74

Belimov AA, Hontzeas N, Safronova VI, Demchinskaya SV, Piluzza G, Bullitta S, Glick

BR. 2005. Cadmium-tolerant plant growth-promoting bacteria associated with the roots of Indian mustard (Brassica juncea L. Czern.). Soil Biol Biochem. 37(2):241–250. doi:10.1016/j.soilbio.2004.07.033.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30(15):2114–2120. doi:10.1093/bioinformatics/btu170.

Brettin T, Davis JJ, Disz T, Edwards RA, Gerdes S, Olsen GJ, Olson R, Overbeek R,

Parrello B, Pusch GD, et al. 2015. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci Rep. 5(1):8365. doi:10.1038/srep08365.

Brom S, Girard L, García-de los Santos A, Sanjuan-Pinilla JM, Olivares J, Sanjuan J.

2002. Conservation of Plasmid-Encoded Traits among Bean-Nodulating Rhizobium

Species. Appl Environ Microbiol. 68(5):2555–2561. doi:10.1128/AEM.68.5.2555-

2561.2002.

Burbank L, Wei W. 2019. Broad-Host-Range Plasmids for Constitutive and Inducible

Gene Expression in the Absence of Antibiotic Selection. Microbiol Resour Announc.

8(36):e00769-19. doi:10.1128/MRA.00769-19.

Cairns J. 1963. The bacterial chromosome and its manner of replication as seen by autoradiography. J Mol Biol. 6:208–213. doi:10.1016/s0022-2836(63)80070-4.

75

Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 25(15):1972–

1973. doi:10.1093/bioinformatics/btp348.

Carbajal-Rodríguez I, Stöveken N, Satola B, Wübbeler JH, Steinbüchel A. 2011. Aerobic

Degradation of Mercaptosuccinate by the Gram-Negative Bacterium Variovorax paradoxus Strain B4. J Bacteriol. 193(2):527–539. doi:10.1128/JB.00793-10.

Carbone A, Zinovyev A, Képès F. 2003. Codon adaptation index as a measure of dominating codon bias. Bioinforma Oxf Engl. 19(16):2005–2015. doi:10.1093/bioinformatics/btg272.

Chain PSG, Denef VJ, Konstantinidis KT, Vergez LM, Agulló L, Reyes VL, Hauser L,

Córdova M, Gómez L, González M, et al. 2006. Burkholderia xenovorans LB400 harbors a multi-replicon, 9.73-Mbp genome shaped for versatility. Proc Natl Acad Sci.

103(42):15280–15287. doi:10.1073/pnas.0606924103.

Charif D, Lobry JR. 2007. SeqinR 1.0-2: A Contributed Package to the R Project for

Statistical Computing Devoted to Biological Sequences Retrieval and Analysis. In:

Bastolla U, Porto M, Roman HE, Vendruscolo M, editors. Structural Approaches to

Sequence Evolution: Molecules, Networks, Populations. Berlin, Heidelberg: Springer.

(Biological and Medical Physics, Biomedical Engineering). p. 207–232. [accessed 2021

Jul 21]. https://doi.org/10.1007/978-3-540-35306-5_10.

Choudhary M, Cho H, Bavishi A, Trahan C, Myagmarjav B-E. 2012. Evolution of

Multipartite Genomes in Prokaryotes. In: Pontarotti P, editor. Evolutionary Biology:

76

Mechanisms and Trends. Berlin, Heidelberg: Springer. p. 301–323. [accessed 2021 Jul

21]. https://doi.org/10.1007/978-3-642-30425-5_17.

Choudhary M, Zanhua X, Fu YX, Kaplan S. 2007. Genome Analyses of Three Strains of

Rhodobacter sphaeroides: Evidence of Rapid Evolution of Chromosome II. J Bacteriol.

189(5):1914–1921. doi:10.1128/JB.01498-06.

Cooper VS, Vohr SH, Wrocklage SC, Hatcher PJ. 2010. Why Genes Evolve Faster on

Secondary Chromosomes in Bacteria. PLoS Comput Biol. 6(4):e1000732. doi:10.1371/journal.pcbi.1000732.

Crick FHC. 1966. Codon—anticodon pairing: The wobble hypothesis. J Mol Biol.

19(2):548–555. doi:10.1016/S0022-2836(66)80022-0.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane

T, McCarthy SA, Davies RM, et al. 2021. Twelve years of SAMtools and BCFtools.

GigaScience. 10(2). doi:10.1093/gigascience/giab008. [accessed 2021 Jul 21]. https://doi.org/10.1093/gigascience/giab008. diCenzo GC, Finan TM. 2017. The Divided Bacterial Genome: Structure, Function, and

Evolution. Microbiol Mol Biol Rev MMBR. 81(3):e00019-17. doi:10.1128/MMBR.00019-

17. diCenzo GC, MacLean AM, Milunovic B, Golding GB, Finan TM. 2014. Examination of

Prokaryotic Multipartite Genome Evolution through Experimental Genome Reduction.

PLOS Genet. 10(10):e1004742. doi:10.1371/journal.pgen.1004742.

77

Dougherty K, Smith BA, Moore AF, Maitland S, Fanger C, Murillo R, Baltrus DA. 2014.

Multiple phenotypic changes associated with large-scale horizontal gene transfer. PloS

One. 9(7):e102170. doi:10.1371/journal.pone.0102170.

Eddy SR. 2011. Accelerated Profile HMM Searches. PLOS Comput Biol.

7(10):e1002195. doi:10.1371/journal.pcbi.1002195.

Edgar RC. 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 5:113. doi:10.1186/1471-2105-5-113.

Egan ES, Fogel MA, Waldor MK. 2005. Divided genomes: negotiating the cell cycle in prokaryotes with multiple chromosomes. Mol Microbiol. 56(5):1129–1138. doi:10.1111/j.1365-2958.2005.04622.x.

Elek A, Kuzman M, Vlahovicek k. 2021. coRdon. Bioconductor. http://bioconductor.org/packages/coRdon/.

Ferdows MS, Barbour AG. 1989. Megabase-sized linear DNA in the bacterium Borrelia burgdorferi, the Lyme disease agent. Proc Natl Acad Sci U S A. 86(15):5969–5973. doi:10.1073/pnas.86.15.5969.

Finan TM, Kunkel B, De Vos GF, Signer ER. 1986. Second symbiotic megaplasmid in

Rhizobium meliloti carrying exopolysaccharide and thiamine synthesis genes. J

Bacteriol. 167(1):66–72.

Finkel OM, Salas-González I, Castrillo G, Law TF, Conway JM, Jones CD, Dangl JL.

2019 May 23. Root development is maintained by specific bacteria-bacteria interactions within a complex microbiome. bioRxiv.:645655. doi:10.1101/645655.

78

Finkel SE, Kolter R. 2001. DNA as a nutrient: novel role for bacterial competence gene homologs. J Bacteriol. 183(21):6288–6293. doi:10.1128/JB.183.21.6288-6293.2001.

Foerstner KU, von Mering C, Hooper SD, Bork P. 2005. Environments shape the nucleotide composition of genomes. EMBO Rep. 6(12):1208–1213. doi:10.1038/sj.embor.7400538.

Fraikin N, Goormaghtigh F, Van Melderen L. 2020. Type II Toxin-Antitoxin Systems:

Evolution and Revolutions. J Bacteriol. 202(7):e00763-19. doi:10.1128/JB.00763-19.

Fricke WF, Kusian B, Bowien B. 2009. The genome organization of Ralstonia eutropha strain H16 and related species of the Burkholderiaceae. J Mol Microbiol Biotechnol.

16(1–2):124–135. doi:10.1159/000142899.

Galardini M, Pini F, Bazzicalupo M, Biondi EG, Mengoni A. 2013. Replicon-dependent bacterial genome evolution: the case of Sinorhizobium meliloti. Genome Biol Evol.

5(3):542–558. doi:10.1093/gbe/evt027.

Gerdes K, Rasmussen PB, Molin S. 1986. Unique type of plasmid maintenance function: postsegregational killing of plasmid-free cells. Proc Natl Acad Sci U S A. 83(10):3116–

3120.

GitHub - rrwick/Filtlong: quality filtering tool for long reads. GitHub. [accessed 2021 Jul

21]. https://github.com/rrwick/Filtlong.

H L. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinforma Oxf Engl.

34(18). doi:10.1093/bioinformatics/bty191. [accessed 2021 Jul 21]. https://pubmed.ncbi.nlm.nih.gov/29750242/.

79

Hall AM, Gollan B, Helaine S. 2017. Toxin–antitoxin systems: reversible toxicity. Curr

Opin Microbiol. 36:102–110. doi:10.1016/j.mib.2017.02.003.

Han J-I, Choi H-K, Lee S-W, Orwin PM, Kim J, Laroe SL, Kim T-G, O’Neil J, Leadbetter

JR, Lee SY, et al. 2011. Complete genome sequence of the metabolically versatile plant growth-promoting endophyte Variovorax paradoxus S110. J Bacteriol. 193(5):1183–

1190. doi:10.1128/JB.00925-10.

Harrison PW, Lower RPJ, Kim NKD, Young JPW. 2010. Introducing the bacterial

‘chromid’: not a chromosome, not a plasmid. Trends Microbiol. 18(4):141–148. doi:10.1016/j.tim.2009.12.010.

Hayakawa T, Tanaka T, Sakaguchi K, Otake N, Yonehara H. 1979. A LINEAR

PLASMID-LIKE DNA IN STREPTOMYCES SP. PRODUCING LANKACIDIN GROUP

ANTIBIOTICS. J Gen Appl Microbiol. 25(4):255–260. doi:10.2323/jgam.25.255.

Hershberg R, Petrov DA. 2010. Evidence That Mutation Is Universally Biased towards

AT in Bacteria. PLoS Genet. 6(9):e1001115. doi:10.1371/journal.pgen.1001115.

Hildebrand F, Meyer A, Eyre-Walker A. 2010. Evidence of Selection upon Genomic GC-

Content in Bacteria. PLOS Genet. 6(9):e1001107. doi:10.1371/journal.pgen.1001107.

Hülter N, Ilhan J, Wein T, Kadibalban AS, Hammerschmidt K, Dagan T. 2017. An evolutionary perspective on plasmid lifestyle modes. Curr Opin Microbiol. 38:74–80. doi:10.1016/j.mib.2017.05.001.

80

Hyatt D, LoCascio PF, Hauser LJ, Uberbacher EC. 2012. Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics. 28(17):2223–2230. doi:10.1093/bioinformatics/bts429.

Jain C, Rodriguez-R LM, Phillippy AM, Konstantinidis KT, Aluru S. 2018. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries.

Nat Commun. 9(1):5114. doi:10.1038/s41467-018-07641-9.

Janczak M, Hyz K, Bukowski M, Lyzen R, Hydzik M, Wegrzyn G, Szalewska-Palasz A,

Grudnik P, Dubin G, Wladyka B. 2020. Chromosomal localization of PemIK toxin- antitoxin system results in the loss of toxicity - Characterization of pemIKSa1-Sp from

Staphylococcus pseudintermedius. Microbiol Res. 240:126529. doi:10.1016/j.micres.2020.126529.

Karlin S, Burge C. 1995. Dinucleotide relative abundance extremes: a genomic signature. Trends Genet TIG. 11(7):283–290. doi:10.1016/s0168-9525(00)89076-9.

Karlin S, Ladunga I, Blaisdell BE. 1994. Heterogeneity of genomes: measures and values. Proc Natl Acad Sci U S A. 91(26):12837–12841. doi:10.1073/pnas.91.26.12837.

Kontur WS, Schackwitz WS, Ivanova N, Martin J, Labutti K, Deshpande S, Tice HN,

Pennacchio C, Sodergren E, Weinstock GM, et al. 2012. Revised sequence and annotation of the Rhodobacter sphaeroides 2.4.1 genome. J Bacteriol. 194(24):7016–

7017. doi:10.1128/JB.01214-12.

81

Lassalle F, Périan S, Bataillon T, Nesme X, Duret L, Daubin V. 2015. GC-Content

Evolution in Bacterial Genomes: The Biased Gene Conversion Hypothesis Expands.

PLOS Genet. 11(2):e1004941. doi:10.1371/journal.pgen.1004941.

Lee MD. 2019. GToTree: a user-friendly workflow for phylogenomics. Bioinformatics.

35(20):4162–4164. doi:10.1093/bioinformatics/btz188.

Lenski RE, Rose MR, Simpson SC, Tadler SC. 1991. Long-Term Experimental Evolution in Escherichia coli. I. Adaptation and Divergence During 2,000 Generations. Am Nat.

138(6):1315–1341.

LeRoux M, Culviner PH, Liu YJ, Littlehale ML, Laub MT. 2020. Stress Can Induce

Transcription of Toxin-Antitoxin Systems without Activating Toxin. Mol Cell. 79(2):280-

292.e8. doi:10.1016/j.molcel.2020.05.028.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G,

Durbin R, 1000 Genome Project Data Processing Subgroup. 2009. The Sequence

Alignment/Map format and SAMtools. Bioinforma Oxf Engl. 25(16):2078–2079. doi:10.1093/bioinformatics/btp352.

Ma D, Mandell JB, Donegan NP, Cheung AL, Ma W, Rothenberger S, Shanks RMQ,

Richardson AR, Urish KL. 2019. The Toxin-Antitoxin MazEF Drives Staphylococcus aureus Biofilm Formation, Antibiotic Tolerance, and Chronic Infection. mBio.

10(6):e01658-19. doi:10.1128/mBio.01658-19.

Mackenzie C, Choudhary M, Larimer FW, Predki PF, Stilwagen S, Armitage JP, Barber

RD, Donohue TJ, Hosler JP, Newman JE, et al. 2001. The home stretch, a first analysis

82

of the nearly completed genome of Rhodobacter sphaeroides 2.4.1. Photosynth Res.

70(1):19–41. doi:10.1023/A:1013831823701.

Magnuson RD. 2007. Hypothetical Functions of Toxin-Antitoxin Systems. J Bacteriol.

189(17):6089–6092. doi:10.1128/JB.00958-07.

Mann S, Chen Y-PP. 2010. Bacterial genomic G+C composition-eliciting environmental adaptation. Genomics. 95(1):7–15. doi:10.1016/j.ygeno.2009.09.002.

Mauchline TH, Fowler JE, East AK, Sartor AL, Zaheer R, Hosie AHF, Poole PS, Finan

TM. 2006. Mapping the Sinorhizobium meliloti 1021 solute-binding protein-dependent transportome. Proc Natl Acad Sci. 103(47):17933–17938. doi:10.1073/pnas.0606673103.

Mora M, Mell JC, Ehrlich GD, Ehrlich RL, Redfield RJ. 2020 Dec 27. Genome-wide analysis of DNA uptake across the outer membrane of naturally competent Haemophilus influenzae. bioRxiv.:866426. doi:10.1101/866426.

Morton ER, Merritt PM, Bever JD, Fuqua C. 2013. Large Deletions in the pAtC58

Megaplasmid of Agrobacterium tumefaciens Can Confer Reduced Carriage Cost and

Increased Expression of Virulence Genes. Genome Biol Evol. 5(7):1353–1364. doi:10.1093/gbe/evt095.

Ne Ville C, Dodsworth J, Orwin P. 2019 Jul 12. Three High G+C soil proteobacterial genomes assembled using a hybrid approach. Nanopore Tech. https://nanoporetech.com/resource-centre/three-high-gc-soil-proteobacterial-genomes- assembled-using-hybrid-approach.

83

Nereng KS, Kaplan S. 1999. Genomic complexity among strains of the facultative photoheterotrophic bacterium Rhodobacter sphaeroides. J Bacteriol. 181(5):1684–1688. doi:10.1128/JB.181.5.1684-1688.1999.

Orlova N, Gerding M, Ivashkiv O, Olinares PDB, Chait BT, Waldor MK, Jeruzalmi D.

2017. The replication initiator of the cholera pathogen’s second chromosome shows structural similarity to plasmid initiators. Nucleic Acids Res. 45(7):3724–3737. doi:10.1093/nar/gkw1288.

Page R, Peti W. 2016. Toxin-antitoxin systems in bacterial growth arrest and persistence. Nat Chem Biol. 12(4):208–214. doi:10.1038/nchembio.2044. van Passel MW, Bart A, Luyf AC, van Kampen AH, van der Ende A. 2006.

Compositional discordance between prokaryotic plasmids and host chromosomes. BMC

Genomics. 7:26. doi:10.1186/1471-2164-7-26.

Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 – Approximately Maximum-Likelihood

Trees for Large Alignments. PLOS ONE. 5(3):e9490. doi:10.1371/journal.pone.0009490.

Prozorov AA. 2008. Additional chromosomes in bacteria: Properties and origin.

Microbiology. 77(4):385–394. doi:10.1134/S0026261708040012.

Quax TEF, Claassens NJ, Söll D, van der Oost J. 2015. Codon Bias as a Means to Fine-

Tune Gene Expression. Mol Cell. 59(2):149–161. doi:10.1016/j.molcel.2015.05.035.

Quick J, Loman NJ. 2018. Nanopore Sequencing Book: DNA extraction and purification methods.

84

Ramachandran R, Jha J, Paulsson J, Chattoraj D. 2017. Random versus Cell Cycle-

Regulated Replication Initiation in Bacteria: Insights from Studying Vibrio cholerae

Chromosome 2. Microbiol Mol Biol Rev MMBR. 81(1):e00033-16. doi:10.1128/MMBR.00033-16.

Rambaut A, Suchard M, Nenarokov S, Klötzl F. 2018. FigTree. https://github.com/rambaut/figtree.

Rausch T, Hsi-Yang Fritz M, Korbel JO, Benes V. 2019. Alfred: interactive multi-sample

BAM alignment statistics, feature counting and feature annotation for long- and short- read sequencing. Bioinforma Oxf Engl. 35(14):2489–2491. doi:10.1093/bioinformatics/bty1007.

Rodriguez-R LM, Konstantinidis KT. 2016. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ Inc. Report No.: e1900v1. [accessed 2021 Jul 21]. https://peerj.com/preprints/1900.

Romanchuk A, Jones CD, Karkare K, Moore A, Smith BA, Jones C, Dougherty K,

Baltrus DA. 2014. Bigger is not always better: Transmission and fitness burden of ∼1MB

Pseudomonas syringae megaplasmid pMPPla107. Plasmid. 73:16–25. doi:10.1016/j.plasmid.2014.04.002.

Rosenberg C, Boistard P, Dénarié J, Casse-Delbart F. 1981. Genes controlling early and late functions in symbiosis are located on a megaplasmid in Rhizobium meliloti. Mol Gen

Genet MGG. 184(2):326–333. doi:10.1007/BF00272926.

85

Saavedra De Bast M, Mine N, Van Melderen L. 2008. Chromosomal Toxin-Antitoxin

Systems May Act as Antiaddiction Modules. J Bacteriol. 190(13):4603–4609. doi:10.1128/JB.00357-08.

Satola B, Wübbeler JH, Steinbüchel A. 2013. Metabolic characteristics of the species

Variovorax paradoxus. Appl Microbiol Biotechnol. 97(2):541–560. doi:10.1007/s00253-

012-4585-z.

Söll D, Jones DS, Ohtsuka E, Faulkner RD, Lohrmann R, Hayatsu H, Khorana HG,

Cherayil JD, Hampel A, Bock RM. 1966. Specificity of sRNA for recognition of codons as studied by the ribosomal binding technique. J Mol Biol. 19(2):556–573. doi:10.1016/S0022-2836(66)80023-2.

Song S, Wood TK. 2018. Post-segregational Killing and Phage Inhibition Are Not

Mediated by Cell Death Through Toxin/Antitoxin Systems. Front Microbiol. 9:814. doi:10.3389/fmicb.2018.00814.

Song S, Wood TK. 2020. A Primary Physiological Role of Toxin/Antitoxin Systems Is

Phage Inhibition. Front Microbiol. 11:1895. doi:10.3389/fmicb.2020.01895.

Suwanto A, Kaplan S. 1989. Physical and genetic mapping of the Rhodobacter sphaeroides 2.4.1 genome: presence of two unique circular chromosomes. J Bacteriol.

171(11):5850–5859. doi:10.1128/jb.171.11.5850-5859.1989.

Tang L. 2020. of Bacteria and Archaea. Nat Methods. 17(6):562–562. doi:10.1038/s41592-020-0863-3.

Tange O. 2018. GNU Parallel 2018|. Lulu.com.

86

Thorvaldsdóttir H, Robinson JT, Mesirov JP. 2013. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform.

14(2):178–192. doi:10.1093/bib/bbs017.

Unterholzner SJ, Poppenberger B, Rozhon W. 2013. Toxin–antitoxin systems. Mob

Genet Elem. 3(5):e26219. doi:10.4161/mge.26219.

Varghese NJ, Mukherjee S, Ivanova N, Konstantinidis KT, Mavrommatis K, Kyrpides

NC, Pati A. 2015. Microbial species delineation using whole genome sequences. Nucleic

Acids Res. 43(14):6761–6771. doi:10.1093/nar/gkv657.

Větrovský T, Baldrian P. 2013. The Variability of the 16S rRNA Gene in Bacterial

Genomes and Its Consequences for Bacterial Community Analyses. PLOS ONE.

8(2):e57923. doi:10.1371/journal.pone.0057923.

Wake RG. 1973. Circularity of the Bacillus subtilis chromosome and further studies on its bidirectional replication. J Mol Biol. 77(4):569–575. doi:10.1016/0022-2836(73)90223-4.

Wick RR, Judd LM, Gorrie CL, Holt KE. 2017a. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb Genomics. 3(10):e000132. doi:10.1099/mgen.0.000132.

Wick RR, Judd LM, Gorrie CL, Holt KE. 2017b. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLoS Comput Biol. 13(6):e1005595. doi:10.1371/journal.pcbi.1005595.

87

Wick RR, Judd LM, Holt KE. 2018. Deepbinner: Demultiplexing barcoded Oxford

Nanopore reads with deep convolutional neural networks. PLOS Comput Biol.

14(11):e1006583. doi:10.1371/journal.pcbi.1006583.

Wickham H. 2009. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-

Verlag (Use R!). [accessed 2021 Jul 21]. https://www.springer.com/gp/book/9780387981413.

William S, Feil H, Copeland A. 2012. Bacterial genomic DNA isolation using CTAB. http://1ofdmq2n8tc36m6i46scovo2e.wpengine.netdna-cdn.com/wp- content/uploads/2014/02/JGI-Bacterial-DNA-isolation-CTAB-Protocol-2012.pdf.

Wong K, Finan TM, Golding GB. 2002. Dinucleotide compositional analysis of

Sinorhizobium meliloti using the genome signature: distinguishing chromosomes and plasmids. Funct Integr Genomics. 2(6):274–281. doi:10.1007/s10142-002-0068-0.

Wood TL, Wood TK. 2016. The HigB/HigA toxin/antitoxin system of Pseudomonas aeruginosa influences the virulence factors pyochelin, pyocyanin, and biofilm formation.

MicrobiologyOpen. 5(3):499–511. doi:10.1002/mbo3.346.

Wozniak RAF, Waldor MK. 2009. A Toxin–Antitoxin System Promotes the Maintenance of an Integrative Conjugative Element. PLOS Genet. 5(3):e1000439. doi:10.1371/journal.pgen.1000439.

Yamaguchi Y, Park J-H, Inouye M. 2011. Toxin-antitoxin systems in bacteria and archaea. Annu Rev Genet. 45:61–79. doi:10.1146/annurev-genet-110410-132412.

88

.

89