Multi-Locus Sequence Analysis: Taking Prokaryotic Systematics to the Next Level11
Total Page:16
File Type:pdf, Size:1020Kb
CHAPTER Multi-locus Sequence Analysis: Taking Prokaryotic Systematics to the Next Level11 Xiaoying Rong, Ying Huang1 State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing, P.R. China 1Corresponding author: e-mail address: [email protected] 1 INTRODUCTION The study of genetic variation among prokaryotes is fundamental for understanding their evolution, and untangling their ecology and evolutionary theory-based system- atics. Recently, advances in DNA sequencing technology, the exploration of neglected habitats and the discovery of new species and products have forced micro- bial systematists to undertake comprehensive analyses of genetic variation within and between species and determine its impact on evolutionary relationships. 16S rRNA gene sequence analyses have enhanced our understanding of prokaryotic diversity and phylogeny, including that of non-culturable microorganisms (Doolittle, 1999; Head, Saunders, & Pickup, 1998; Mason et al., 2009). However, 16S rRNA gene phy- logenetic analyses have major drawbacks, notably providing insufficient resolution, especially for distinguishing between closely related species. In addition, there are well-documented cases in which individual prokaryotic organisms have been found to contain divergent 16S rRNA genes, thereby suggesting the potential for horizontal exchange of rRNA genes; such problems pose a challenge for reconstructing the evolutionary history of prokaryotic species and complicate efforts to classify them (Acinas, Marcelino, Klepac-Ceraj, & Polz, 2004; Cilia, Lafay, & Christen, 1996; Michon et al., 2010; Pei et al., 2010; Rokas, Williams, King, & Carroll, 2003). Multi-locus sequence analysis/typing (MLSA/MLST) is a nucleotide sequence- based approach for the unambiguous characterization of prokaryotes via the Internet, which directly characterizes DNA sequence variations in a set of housekeeping genes and evaluates relationships between strains based on their unique allelic profiles or sequences (Maiden, 2006). The method has been extensively used in the classification and identification of diverse bacteria, to determine the extent of gene exchange within and between species and to establish the relative importance of recombination in pop- ulation genetics (Doroghazi & Buckley, 2010; Alvarez-Perez, de Vega, & Herrera, 2013; Freel, Millan-Aguinaga, & Jensen, 2013). MLSA is providing new opportuni- ties to evaluate the status of bacterial taxa using patterns of genetic variation. Methods in Microbiology, Volume 41, ISSN 0580-9517, http://dx.doi.org/10.1016/bs.mim.2014.10.001 221 © 2014 Elsevier Ltd. All rights reserved. 222 CHAPTER 11 Multi-locus Sequence Analysis Actinobacteria are well known for their ability to produce diverse bioactive me- tabolites of commercial significance and to produce signalling molecules of ecolog- ical value (Be´rdy, 2005, 2012; Vetsigian, Jajoo, & Kishony, 2011). Although once considered primarily as soil bacteria, it is now known that actinobacteria are widely distributed in diverse natural habitats where they show a high degree of species and functional diversity (Bull, 2010). Since the discovery of streptomycin from Strepto- myces griseus (Schatz, Bugie, & Waksman, 1944), actinobacteria, notably strepto- mycetes, have been isolated from innumerable habitats and screened for bioactive compounds. In the case of the genus Streptomyces, such studies have led to the pub- lication of over 600 validly named species that pose significant taxonomic chal- lenges (Ka¨mpfer, 2012; Labeda et al., 2012). In this chapter, we summarize the practical procedures and the implementation of MLSA to species assignments of representative taxa within the genus Streptomyces of the phylum Actinobacteria, and how MLSA results are compared with and vali- dated against the current benchmark of DNA–DNA hybridization (DDH) values. We also describe how MLSA could be used to construct a sound framework that bridges taxonomic relationships and ecotype-level functional diversity within prokaryotic species. 2 MULTI-LOCUS SEQUENCE ANALYSIS 2.1 UNDERLYING CONCEPTS MLSA was developed from the application of the MLST method in order to recon- struct evolutionary relationships between prokaryotes (Sawabe, Kita-Tsukamoto, & Thompson, 2007; Vinuesa et al., 2008; Wagner, Varghese, Hemme, & Wiegel, 2013). Comparative MLST is based on sequencing 450–500 bp fragments of five to seven housekeeping genes that provide information on the spread of nucleotide divergence across the chromosomes of sampled populations. Sequences that differ by even a single nucleotide for each gene are assigned as different alleles, thus mak- ing MLSA highly suitable for detecting genetic changes within and between species. Phylogenetic analysis using multi-locus sequences involves several steps: (1) se- lection of strains and housekeeping genes, (2) generation of sequences (polymerase chain reaction (PCR) amplification and DNA sequencing), (3) data analysis to iden- tify homologous sites within each gene and (4) MLSA using concatenated sequences or allelic profiles (the latter are usually used for population genetics). In step 1, the most important factors to consider are the clock-like behaviour and phylogenetic range of the gene sequences; the availability of complete genomes makes it easier to establish new MLSA schemes. The amplicon of each gene needs to be based on accurate sequence information from both strands hence high-quality sequencing chromatograms are at a premium. In data analysis, definitive identification of vari- ation is obtained by nucleotide sequence determination of gene fragments, with no weight given to the number of nucleotide differences between alleles when assigning 2 Multi-Locus Sequence Analysis 223 sequence types (STs), as it is not possible to determine whether differences at mul- tiple nucleotide sites are due to multiple point mutations or to single recombination events (Maiden, 2006). Finally, relatedness between strains in MLSA is obtained by comparing concatenated sequences or allelic profiles (also STs). Those undertaking population genetics or phylogenetic adaptation studies may compare expanded can- didate functional gene sequences (Tankouo-Sandjong et al., 2007; Parker, Havird, & De La Fuente, 2012; Stefanic et al., 2012). 2.2 SELECTION OF GENE LOCI The housekeeping genes selected for MLSA analyses should be single copies, ortho- logous and ubiquitous among all of the sampled strains. They should also be highly conserved, have no linkage disequilibrium on the chromosome but should contain sufficient varying nucleotide positions to accurately establish relationships between closely related strains. To strike a balance among acceptable identification power, time and cost for strain typing, about five to seven housekeeping genes are com- monly used. The five housekeeping genes that we have used in our Streptomyces MLSA scheme are atpD (ATP synthase F1, b-subunit), gyrB (DNA gyrase, B subunit), recA (recombinase A), rpoB (RNA polymerase, b-subunit) and trpB (tryptophan synthase, b-subunit); the physical distance between any two of these genes is greater than 30 kb in a single Streptomyces genome, thereby avoiding hitchhiking effects (Guo, Zheng, Rong, & Huang, 2008; Rong & Huang, 2012). However, it is not uncommon to use up to 10 housekeeping genes, as exemplified in the case of the genus Nocardia where 14 protein-coding genes were examined (Tamura et al., 2012). Thus, both the number and type of housekeeping genes inter- rogated by MLSA may differ from genus to genus. 2.3 GENERATING SEQUENCES The PCR is commonly used to generate sequence fragments. Conserved profiles of protein-coding genes provide highly conserved islands that can be used to design am- plification and sequencing primers that have broad specificity with respect to phy- logenetic diversity. Commonly used primer design tools include Primer Premier (Premier Biosoft International) and OLIGO Primer Analysis software (Molecular Biology Insights), web tools such as Primer3 (http://biotools.umassmed.edu/bioapps/ primer3_www.cgi)andBLAST(http://www.ncbi.nlm.nih.gov/tools/primer-blast/)for finding primers specific to PCR templates. DANMAN software (Molecular Biology Insights) can be used to evaluate the resultant primers, e.g. by checking the minimum free energy of the primer structure and if dimers (including self- and cross-dimers), hairpin and GC clamps exist. It is essential that dimers are situated well away from the 30 ends of the primers and continuous cross-dimers between the forward and reverse primers are to be avoided. The genes and primers used in our streptomycete MLSA scheme are shown in Table 1. The suggested primer set for each gene covers more than the 400 bp internal Table 1 Genes and Primers Used in the Streptomyces MLSA Scheme Chromosomal Gene Function Direction Primer Sequence (50–30)a Positionb Locationb atpD Amplification F atpDPF-GTCGGCGACTTCACCAAGGGCAAGGTGTTC 283–318 5842699-5844135 AACACC R atpDPR-GTGAACTGCTTGGCGACGTGGGTGTTCTGG 1243–1280 GACAGGAA Sequencing F atpDF-ACCAAGGGCAAGGTGTTCAA 295–314 R atpDR-GCCGGGTAGATGCCCTTCTC 1027–1046 gyrB Amplification F gyrBPFA-CTCGAGGGTCTGGACGCGGTCCGCAAGCGA 121–161 c4265438-4263378 CCCGGTATGTA R gyrBPRA-GAAGGTCTTCACCTCGGTGTTGCCCAGCTT 1150–1185 CGTCTT Sequencing F gyrBFA-GCAAGCGACCCGGTATGTAC 143–162 R gyrBRA-GAGGTTGTCGTCCTTCTCGC 1052–1071 recA Amplification* F recAPFA-GABSCCGCDCTCGCNCAGATYGARCG 31–56 6306654-6307778 R recAPRA-GGAAGTTGCGSGCGTTCTCCTTGC 902–925 Sequencing F recAF-ACAGATTGAACGGCAATTCG