Astrobiology Science Conference 2015 (2015) 7059.pdf

GENE TRANSFER AGENTS: THEIR EVOLUTION AND POTENTIAL ROLE IN MICROBIAL COOPERATION. M. Shakya1, D. P. Birnbaum1,2, T. B. Neely1, and O. Zhaxybayeva1,3, 1Department of Biological Sciences, Dartmouth College, Hanover, NH 03755 2Present address: Broad Institute, Cambridge, MA 02142 3Department of Computer Science, Dartmouth College, Hanover, NH 03755 ([email protected])

The existence of cooperation is one of the greatest parts, traditional sequence similarity search methods puzzles in evolutionary biology. Cooperation allows (for example, BLAST) might not be able to differenti- individual organisms (or cells) to build complex multi- ate between the two types. Therefore, some of the de- cellular consortia that communicate with each other tected GTA homologs may simply represent prophages and share resources, and is likely to have evolved early (i.e., embedded in the host genomes). To ad- in Earth’s history. Genes could be viewed as a valuable dress this problem, we are developing a machine learn- to share, because they may encode traits that ing method that would be able to classify -like could increase a recipient's under certain envi- sequences in a prokaryotic genome as either "GTA" or ronmental conditions. Both and (pro- "prophage". Our approach is to use amino acid k-mers karyotes) routinely exchange genes via horizontal gene from known GTA genes and their viral homologs as transfer (HGT). However, if and how HGT affects a features (so-called "bag of words" model [2]) and a population of cooperating individuals is not known. support vector machine as a classification algorithm One specific mechanism of HGT, mediated by Gene [3]. Here, we are presenting the results of the method's Transfer Agents (GTAs), promises to provide insights cross-validation, as well as classification tests of into the evolution of cooperation. known α-proteobacterial prophages. GTAs are virus-like entities encoded within a host Given the wide distribution of RcGTA gene homo- genome, and are controlled by the host and environ- logs across α-proteobacterial orders, how early did mental factors (reviewed in [1]). However, unlike vi- GTA arise in the evolution of this bacterial class? In ruses, GTAs package seemingly random pieces of host other words, did the α-proteobacterial last common DNA. Moreover, GTA particles are too small to incor- ancestor already encode a GTA? Or is RcGTA a recent porate all of the genes necessary for their production, invention that has spread within the class via HGT? To precluding GTAs from easily propagating themselves address these questions, we are reconstructing evolu- across a population. When GTAs are produced, host tionary histories of GTA gene homologs throughout α- cell lyses and GTA particles are released. The particles proteobacteria, and comparing them to the histories of can deliver the packaged DNA to other cells, which α-proteobacterial core genes. In addition, we are inves- may help propagating novel adaptive traits throughout tigating biases in GC content, dinucleotide frequencies, the population. and codon usage of GTA genes in comparison to the α- To date, genetically unrelated GTAs have been proteobacterial core genes. found only in four divergent taxonomic groups of pro- If a GTA is advantageous to the host, its genes are karyotes: bacterial classes of α-proteobacteria, δ- expected to be under selective pressure. Moreover, this proteobacteria and Spirochaetia, and archaeal class pressure would be distinct from that acting on the pro- Methanococci. However, genes encoding GTAs have phage genes. We are examining α-proteobacterial GTA indisputable homology to the viral counterparts. More- genes for signatures of selection and comparing the over, the majority of prokaryotes harbor at least one inferred patterns to those observed in the conserved viral-like region in their genomes. This raises the pos- housekeeping genes, as well as in genes from α- sibility that GTAs might be more widespread among proteobacterial viruses and prophages. prokaryotes than currently perceived. Our work is bringing insights into the mode of Here, we trace origin and evolution of GTAs using propagation of GTA genes within α-proteobacteria, bacterial class of α-proteobacteria as a “model” sys- and ultimately it will help us understand how GTAs tem. One of the best-studied GTAs, RcGTA, is hosted originated and why they are maintained. by a free-living marine α-proteobacterium Rhodobac- ter capsulatus. A sequence similarity survey of 258 α- References: proteobacterial genomes revealed that many members [1] Lang, A. S., Zhaxybayeva, O., and Beatty, J. T. (2012) of α-proteobacterial orders have RcGTA gene homo- Nat Rev Micro, 10, 472-482. [2] Forman, G. (2003) The Journal of Machine Learning logs, but are not known to produce GTAs. Therefore, Research, 3, 1289-1305. GTAs may be more widespread than currently per- [3] Hearst, M. A., et al. (1998) Intelligent Systems and Their ceived. Alternatively, due to the shared ancestry be- Applications, IEEE, 13, 18-28. tween bona fide GTA genes and their viral counter-