<<

GBE

Complete Genome of the Starch-Degrading Myxobacteria Sandaracinus amylolyticus DSM 53668T

Gaurav Sharma, Indu Khatri, and Srikrishna Subramanian* Protein Science and Engineering, CSIR-Institute of Microbial Technology, Sector-39A, Chandigarh, India

*Corresponding author: E-mail: [email protected]. Accepted: June 14, 2016 Data deposition: The complete genome sequence of Sandaracinus amylolyticus DSM 53668T and its annotations are deposited at DDBJ/ENA/ GenBank under accession number CP011125.

Abstract Myxobacteria are members of d- and are typified by large genomes, well-coordinated social behavior, gliding motility, and starvation-induced fruiting body formation. Here, we report the 10.33 Mb whole genome of a starch-degrading myxobacterium Sandaracinus amylolyticus DSM 53668T that encodes 8,962 proteins, 56 tRNA, and two rRNA operons. Phylogenetic analysis, in silico DNA-DNA hybridization and average nucleotide identity reveal its divergence from other myxobacterial species and support its taxonomic characterization into a separate family Sandaracinaceae, within the suborder Sorangiineae. Sequence similarity searches using the Carbohydrate-active (CAZyme) database help identify the repertoire of S. amylolyticus involved in starch, agar, chitin, and cellulose degradation. We identified 16 a- and two g-amylases in the S. amylolyticus genome that likely play a role in starch degradation. While many of the amylases are seen conserved in other d-proteobacteria, we notice several novel amylases acquired via horizontal transfer from members belonging to phylum -Thermus, , and . No agar degrading enzyme(s) were identified in the S. amylolyticus genome. Interestingly, several putative b-gluco- sidases and endoglucanases proteins involved in cellulose degradation were identified. However, the absence of cellobiohydrolases/ exoglucanases corroborates with the lack of cellulose degradation by this . Key words: CAZyme, , agarase, , methylome, phylogeny.

Introduction strain So0157-2 (14.78 Mb) (Han et al. 2013)andS. cellulo- Myxobacteria [order Myxococcales] are Gram-negative and sum strain Soce56 (13.03 Mb) (Schneiker et al. 2007)have aerobic (except Anaeromyxobacter) d-proteobacteria. These been reported from the suborder Sorangiineae. These mem- bacteria possess large genomes and a complex life cycle bers are known to degrade diverse, complex polysaccharides wherein they coordinate within swarms and hunt other mi- such as cellulose, starch, casein, agar, chitin, gelatin, dextrin, crobes (Shimkets 1990;Whitworth 2008). Myxobacterial spe- xylan, agarose, etc. (Garcia and Mu¨ ller 2014). They have also cies have been of industrial interest due to their significant role been reported as microbial predators to organisms such as in secondary metabolites production (Weissman and Muller Escherichia coli, Bacillus subtilis, Mycobacterium chitae, 2010; Steinmetz et al. 2012).OrderMyxococcalesatpresent Cyanobacteria, yeast, etc. (Garcia and Mu¨ ller 2014). Among comprises of more than 27 taxonomically classified genera the four characterized families of the suborder Sorangiineae, and 60 species. This order has been classified in three subor- genomic data are available only for Sorangium and ders, Cystobacterineae (four families), Nannocystineae (two Chondromyces that belong to Polyangiaceae family which families), and Sorangiineae (four families). The members of are the largest source of myxobacterial secondary metabolites suborder Sorangiineae are cellulolytic, bacteriolytic, and pro- (Schneiker et al. 2007;Weissman and Muller 2010;Han et al. teolytic as compared with other family members and form a 2013). separate clade in myxobacterial 16S rRNA-based phylogeny Within suborder Sorangiineae, Sandaracinus amylolyticus (Reichenbach 2005; Garcia and Mu¨ ller 2014). Till date the strain DSM 53668T is the type strain of S. amylolyticus gen. largest complete bacterial genomes Sorangium cellulosum nov.,sp. nov., proposed as a novel species in the family

ß The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

2520 Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 Complete Genome of a Starch-Degrading Myxobacteria GBE

Sandaracinaceae within order Myxococcales (Mohr et al. templates were annealed and then loaded onto four SMRT 2012). Morphologically and biochemically the strain is a cells and sequenced using P6C4 chemistry with 180-min Gram-negative, aerobic, rod-shaped during vegetative movie times. Sequencing yielded a total of 241,674 reads states, -negative, oxidase-positive, and pigment form- with a mean read length of 4.67 kb, totaling ing myxobacteria (Mohr et al. 2012). This strain was isolated in 1,129,177,817 bp (~85Â coverage). De novo assembly was 1996 from a gray-brown soil sample (having plant residues) carried out using the hierarchical genome assembly process collection by Helmholtz Zentrums fu¨ r Infektionsforschung (HGAP) protocol in SMRT Analysis v2.0, including consensus [HZI] during 1995 from a botanical garden in Lucknow, polishing with Quiver (Chin et al. 2013). Since the Uttar Pradesh, India (Reichenbach and Dworkin 1992; Mohr BluePippinTM Size Selection (>20,000) protocol filters out et al. 2012). Sandaracinus amylolyticus possesses typical myx- small DNA fragments such as , Illumina paired-end obacterial characteristics including gliding motility, secondary (PE) data were also obtained using the Illumina HiSeq1000 metabolite production (Steinmetz et al. 2012), and spore gen- sequencing platform at the C-CAMP, Bangalore, India. eration during harsh environmental conditions. Here, we Sequencing resulted in 13,103,763 PE reads (insert size: 350 report the complete genome of S. amylolyticus DSM 53668T bp and read length: 101 bp) out of which 11,663,870 PE with special emphasis on the starch degrading properties reads were selected after quality filtering [PHRED quality along with the protein repertoire involved in starch, agarose, score: 20; minimum read length: 70] using NGSQC Toolkit chitin, and cellulose degradation. v2.2.3 (Patel and Jain 2012). To trace the presence of any , the filtered Illumina reads were mapped using Materials and Methods CLCbio wb8.0 (www.clcbio.com, last accessed July 8, 2016) to the bacterial plasmid database (http://www.ebi.ac.uk/ge- Culturing and Genomic DNA Isolation nomes/plasmid.html, last accessed July 8, 2016). Actively growing plate cultures of S. amylolyticus was pro- Gene prediction and functional annotation were per- cured from Deutsche Sammlung von Mikroorganismen und formed by Rapid Annotation using Subsystem Technology Zellkulturen (DSMZ) as strain number DSM 53668T [NOSO-4T]. (RAST) (Aziz et al. 2008). RNAmmer 1.2 (Lagesen et al. Sandaracinus amylolyticus DSM 53668T was grown aerobi- 2007) and tRNAscan-SE-1.23 (Lowe and Eddy 1997)were cally on CY agar plate culture (DSM Medium: 67, as prescribed used to predict rRNA and tRNA genes, respectively. by DSMZ culture collection) at 28–30 C for a minimum of 5 days. The culture was yellow-orange in color attributed to the Phylogenetic Analysis, Genome-Genome Distance, and presence of carotenoids (Mohr et al. 2012). DNA for sequenc- Average Nucleotide Identity ing was obtained using both the Zymogen Research Bacterial/ 16S rRNA genes from various strains of suborder Sorangiineae fungal DNA isolation kit and Phenol–Chloroform–Isoamyl along with neighboring families were extracted from the Alcohol-based manual method. The quantity and quality of National Centre for Biotechnology Information (NCBI), and the extraction were checked by gel electrophoresis along with were aligned using ClustalW module of BIOEDIT sequence Nanodrop method and followed by Qubit quantification. alignment tool (version 7.1.3.0; Hall 1999). The resulting align- ment was used as an input in Molecular Evolutionary Genetics Genome Sequencing, Assembly, and Annotation Analysis (MEGA) v6.06 (Tamura et al. 2013) to generate a max- imum likelihood (ML) tree [model: Tamura 3-param; bootstrap: Sandaracinus amylolyticus was sequenced on four single mol- 100]. Initial tree(s) for the heuristic search were obtained by ecule real-time (SMRT) cells using P6 polymerase and C4 applying the Neighbor-Joining method to a matrix of pairwise chemistry (P6C4) on a Pacific Biosciences RSII instrument at distances estimated using the Maximum Composite Likelihood McGill University and Genome Quebec Innovation Center, approach. In silico DNA-DNA hybridization (DDH) values among Montre´al (Que´bec), Canada. DNA sample was sheared and the members of suborder Sorangiineae and other species mem- concentrated using AMPure magnetic beads and treated by bers of order Myxococcales were calculated using Genome-To- ExoVII to remove single stranded ends. SMRTbell long libraries Genome Distance Calculator (GGDC) server (Auch et al. 2010). were constructed with 10 mg whole genomic DNA using Average Nucleotide Identity (ANI) matrix was also generated “Procedure and Checklist–20 kb Template Preparation Using among the genomes studied (Richter and Rossello-Mora 2009). BluePippinTM Size Selection System protocol” (http://www. pacb.com/wp-content/uploads/2015/09/Shared-Protocol-20- kb-Template-Preparation-Using-BluePippin-Size - Selection- Genome and Proteome Analysis System-15-kb-Size-Cutoff.pdf, last accessed July 5, 2016). Sandaracinus amylolyticus proteome was subjected to Blunt ligation reactions were prepared, and SMRTbell tem- hmmscan (Eddy 2011) against Hidden Markov Model plates were purified using AMPure magnetic beads. (HMM) profiles of Carbohydrate-Active enZYmes (CAZY) BluePippinTM size selection was performed to retain long database (dbCAN HMMs 3.0; Yin et al. 2012;Lombard et al. reads (>20,000) for sequencing. The size-selected SMRTbell 2014). The functional domains and other known sequence

Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 2521 Sharma et al. GBE

motifs involved in starch, cellulose, chitin, and agar degrada- This genome encodes two sequentially identical 16S rRNA tion were retrieved on the basis of EC number for each which shows 100% identity with previously reported 16S rRNA enzyme category. Also, the proteins of S. amylolyticus were sequence of S. amylolyticus strain NOSO-4T [NCBI Reference mapped against the CAZY protein database using Basic-Local Sequence: NR_118001.1] followed by 87.98% identity with Alignment Search Tool (BLASTp) (E-value cutoff of 1eÀ5;min- Chondromyces apiculatus strain Cm a14 [NR_025344.1]. imum query coverage of 50% and minimum percent identity Among the myxobacteria with sequenced genomes, it shows of 35%) (Altschul et al. 1990). NCBI BLAST + (v 2.2.26+) and the maximum identity (87.89%) to the 16S rRNA gene of C. nonredundant protein (NR) database downloaded on apiculatus DSM 436 (strain Cm a2) [GenBank: M94274.2] fol- September 6, 2014 was used throughout the study. lowed by 87.5% identity to S. cellulosum 16S rRNA (Soce0157- Clustered regularly interspaced short palindromic repeat 2 and Soce56). The 16S rRNA-based phylogeny of (CRISPR) elements and insertion elements were identified Sandaracinus and related myxobacteria suggest its classification using CRISPRfinder (Grissa et al. 2007) and ISfinder (Siguier as a separate family within the suborder Sorangiineae (supple- et al. 2006), respectively. mentary fig. S1, Supplementary Material online). Sandaracinus amylolyticus 16S rRNA forms a separate branch distinct from other families in the suborder Sorangiineae phylogenetic tree, Sandaracinus amylolyticus Amylases representing family Sandaracinaceae. We also calculated in The a-amylases and g-amylases from S. amylolyticus were sub- silico DDH values for all available myxobacterial genomes jected to the BLASTp against amylase sequences from the using the GGDC server (Auch et al. 2010). Sandaracinus CAZY database (clustered at 70% sequence identity) and amylolyticus is not closely related to other Myxococcales as top hits were retrieved. The amylases from S. amylolyticus the maximum DDH value was identified to be 18.1 with S. were cut into the domains and were aligned using MUltiple cellulosum followed by 17.4 with C. apiculatus DSM 436 sug- Sequence Comparison by Log-Expectation (MUSCLE) (Edgar gesting a distant relation to Polyangiaceae members, which 2004) incorporated in MEGA v6.06. Further the evolutionary further confirms its placement in a separate family, tree was generated by ML method [model: Jones, Taylor, and Sandaracinaceae. ANI calculations likewise support the above Thorton; bootstrap: 100] using MEGA v6.06. Initial tree(s) for conclusions (supplementary table S2, Supplementary Material the heuristic search were obtained by applying the Maximum online). Parsimony method to a matrix of pairwise distances estimated The members of suborder Sorangiineae such as the S. cel- using the Maximum Composite Likelihood approach. Further, lulosum species have been well characterized for degrading the S. amylolyticus GH13 amylases along with the closely re- cellulose and other complex polysaccharides (Schneiker et al. lated members of each phylogenetic clade were aligned using 2007; Han et al. 2013). Similarly, S. amylolyticus has been PROfile Multiple Alignment with predicted Local Structures reported as the only myxobacteria capable of degrading and 3D constraints (PROMALS3D) with Taka-Amylase A of starch (Mohr et al. 2012). Thus in order to gain insights into Aspergillus oryzae [2TAA] (Matsuura et al. 1984) as a struc- the genes involved in the degradation of complex polysaccha- tural template (Pei et al. 2008). rides, especially starch the complete genome of S. amylolyticus was determined and investigated for the presence of carbo- hydrate active enzymes. Results and Discussion Sandaracinus amylolyticus Genome Properties Sandaracinus amylolyticus CAZome Sandaracinus amylolyticus genome was assembled de novo by Two-hundred and forty-three proteins (~2.7% of total prote- HGAP v2.0 (Chin et al. 2013) of the SMRT portal as a single ome) of S. amylolyticus were annotated with diverse combi- contig of 10.33 Mb with no plasmid and representing the nations of CAZY domains that consist of 72 GT domains, 72 complete genome with approximately 85Â coverage and CE domains, 58 GH domains, 19 AA domains, 4 PL domains, 72% G+C content (fig. 1). No complete plasmid could be and 26 CBM domains (fig. 2A). Fourteen out of these 243 retrieved from the mapping of Illumina PE reads against the proteins contained more than one CAZY , whereas bacterial plasmid database. Plasmids are rare in the order the rest had only a single CAZY domain. In multiple CAZY- Myxococcales and till date only one plasmid pMF1 from domain containing proteins, various carbohydrate binding Myxococcus fulvus 124B02 has been reported (Zhao et al. modules such as CBM48, CBM50, and CBM21 are pre- 2008). RAST-based annotation predicted 8,962 coding sent along with other CAZY category proteins for assisting in genes, out of which 4,939 proteins (55.16%) were function- their functions. The CAZyme distribution observed in ally annotated while the remaining (44.84%) were hypothet- S. amylolyticus indicates its ability to degrade various ical proteins (supplementary table S1, Supplementary Material types of biomass. Here, we examine the S. amylolyticus pro- online). Sandaracinus amylolyticus genome has two rRNA op- teome involved in starch, agar, chitin, and cellulose erons and 56 aminoacyl-tRNA synthetase genes. degradation.

2522 Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 Complete Genome of a Starch-Degrading Myxobacteria GBE

Sandaracinus amylolyticus DSM 53668T CP011125 10,328,378 bp

T FIG.1.—Circular representation of the Sandaracinus amylolyticus DSM 53668 complete genome. Circles (from inside to outside) 1and2(GC content; black line and GC skew; magenta and green lines), circle 3 (CAZome: Proteins annotated as carbohydrate active enzymes; Red); circle 4, 5, and 6 (Amylases, , and ; Black); circle 7 (encoded RNA, Magenta); circle 8 (CDS present on negative strand; Blue); circle 9 (CDS present on positive strand; Green); circle 10 (S. amylolyticus DSM 53668T complete genome; Red). BRIG 0.95 was used to build the circular representation (Alikhan et al. 2011).

Evolutionary History of S. amylolyticus Amylases GH57, GH119, GH126), b-amylase (EC:3.2.1.2; GH14), and Amylases, belonging to glycoside (GH) enzymatic -amylase (EC:3.2.1.3; GH15, GH97). Sandaracinus category, degrade starch into sugar moieties by the amylolyticus genome contains both a-and-amylases hydrolysis of a-1,4- and a-1,6-glycosidic bond (Van der whereas no b-amylases could be found. Among the 4 a-am- Maarel et al. 2002). Based on the site of action, they are di- ylases GH families, 15 GH13 proteins, and 1 GH57 protein vided into three categories; a-amylase (EC:3.2.1.1; GH13, were found. We could identify two -amylases belonging to

Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 2523 Sharma et al. GBE

AB80 AKF03605 GH13 72 72 AKF03642 GH13

70 AKF03796 CBM48 GH13

AKF03860 CBM48 GH13 58 60 AKF03861 GH13 AKF03862 GH13 50 AKF04017 GH13 AKF06951 GH57

40 AKF06952 CBM48 GH13

AKF07406 GH13 30 26 AKF07413 CBM48 GH13 AKF07527 GH13 Number of proteins 19 20 AKF09634 GH13 AKF09643 GH13 10 AKF10340 GH15 4 AKF10487 GH15

0 AKF11765 CBM48 GH13 AA CBM CE GH GT PL AKF11772 GH13

T T FIG.2.—Sandaracinus amylolyticus DSM 53668 CAZome distribution. (A) Sandaracinus amylolyticus DSM 53668 CAZY families presented in a bar graph. x axis represents the CAZY categories and y axis represents the number of proteins. (B) a-andg-amylases along with their CAZome domain architecture. The domains have been colored as GH13: sky blue; GH15: dark red; GH57: green; and CBM48: orange.

GH15 category (fig. 2B). The starch degrading proteins have residues were conserved in all S. amylolyticus GH13 amylases been reported to have CBM20, CBM21, CBM25, CBM26, (supplementary figs. S2 and S3, Supplementary Material CBM34, CBM41, CBM45, CBM48, CBM53, and CBM58 online). Similarly, S. amylolyticus GH57 (AKF06951) was modules that help in glycogen/starch binding, truncation of aligned with its homologs using PROMALS3D based on the which reduces its enzymatic activity (Noguchi et al. 2011; 4-a-glucanotransferase structure of Thermococcus litoralis Janecek et al. 2014). GH15 and GH57 domains are present (PDBid: 1K1W). In the GH57 alignment, all five CSRs and as single domain proteins in S. amylolyticus. Interestingly, in 5 both catalytic residues were conserved (supplementary fig.S4, outof15GH13S. amylolyticus proteins (AKF03796, Supplementary Material online). AKF03860, AKF06952, AKF07413, and AKF11765), the a- In order to understand the provenance of these a-andg- amylase GH13 domain was present along with a CBM48 amylases, they were subjected to BLASTp against the NR data- module at the N-termini. The CBM48 module assists in the base. To include amylases across the entire bacterial phylum, GH13 function by serving as starch-binding domains. the CAZY-NR database for GH13 and GH57 domains were The GH13 family include the a-amylase domains with an (b/ retrieved and subjected to BLASTClust at 70% identity and a)8 barrel catalytic domain. These proteins have 4–7 conserved 70% length coverage that reduced the domain sequence sequence regions (CSRs) and a specific consisting space from 19,590 to 4,038 domains. Further, amylase do- of a catalytic nucleophile (aspartic acid) on strand b4, proton mains from S. amylolyticus were subjected to BLASTp against donor (glutamic acid) on strand b5, and a transition state sta- these 4,038 domains, and their top hits were retrieved and bilizer (aspartic acid) on strand b7(Janecek et al. 2014). GH13 included in the phylogenetic analysis. All these GH13 and members show extremely low sequence identity with their ho- GH57 domains along with S. amylolyticus amylase domains mologs, but the catalytic triad is highly conserved. GH57 family were aligned using MUSCLE as incorporated in MEGAv6.0.6, a-amylaseshavea(b/a)7 barrel structure and have five CSRs as and ML-based phylogeny with 100 bootstrap values was compared with the GH13 family. However, only two of the drawn (fig. 3). The ML tree obtained was divided into three catalytic residues, viz., the nucleophile on strand b4andthe major clades where the clade at root composed of all the GH57 proton donor on strand b7areconserved(Blesak and Janecek domains and the other two major clades further composed of 2012). Irrespective of these differences, GH13, and GH57 both all the GH13 sequences. The representative bacterial amylases employ the same retaining mechanism of action (Janecek and from various clades show close relations with amylases of di- Kuchtova 2012). Sandaracinus amylolyticus GH13 proteins verse bacterial groups suggesting the multiple gain and loss of were retrieved, and the structure of Taka amylase of A. these domains. As all the amylase domains of S. amylolyticus oryzae (PDBid: 2TAA) was used to perform a structure-based are scattered across the tree, the phylogenetic position of these alignment using PROMALS3D (Matsuura et al. 1984; Pei et al. domains could provide us insights about their provenance and 2008). All the seven CSR regions and the three catalytic reveal possible cases of horizontal transfers.

2524 Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 Complete Genome of a Starch-Degrading Myxobacteria GBE

TCC 19860 -1

A GH13 ZP 02732055.1 Gemmata obscuriglobus UQM 2246

8 GH13 GH13

6 GH13 . UW GH13 ZP 6 ATCC 33913

GH13

3 GH13 str

GH13 GH13 GH13 GH13 5

GH13 ZP GH13

YP

YP TCC 17025

M GH13 AKF04017 Sandaracinus amylolyticus DSM 53668

S

GH13 427382.1 Rhodospirillum rubrum YP 002138495.1 Geobacter bemidjiensis Bem GH13 GH13

YP 0 D 170

YP YP 003705735.1 YP 01094030.1 Blastopirellula marina DSM 3645 YP 0 1 GH13 ZP 03631034.1 Pedosphaera parvula Ellin514 10 GH13 ZP YP s 003268944.1 Haliangium ochraceum DSM 14365 1 GH13 ZP u 001212136.1 Pelotomaculum thermopropionicum SI 002787899.1 Deinococcus deserti VCD1 YP 2 6

c GH13 YP 0 001613031.1 Sorangium cellulosum So ce56 i TCC 25259 4

1 0

t YP YP YP 0 A

GH13 62738967 6 7 GH13 GH13 y TCC 49957 Y25 0021361 GH13 NP 2 l N TCC 1 -27 8 3 A . campestris str.

004178166.1 Isosphaera pallida 628810.1 Myxococcus xanthus DK 1622 5 o F T

l A 003953271.1 Stigmatella aurantiaca 6 1

3 B-59395 1 y

9 8 GH13 GH13 C 07016950.1 Desulfonatronospira thiodismutans 8

i 02731815.1 Gemmata obscuriglobus UQM 2246 . 9 l m GH57 GH13 YPGH13 ADX84572.1 Sulfolobus islandicus REY15A 6 1

t YP YP 3 001634781.1 J-10-fl a 8

e

S 5

3

s

.

002536676.1 Geobacter daltonii FRC-32 o

003570708.1 Salinibacter ruber M8 . 1 13.1 953405.1 Geobacter sulfurreducens PCA u YP YP 004184234.1 m 1

r

n AB3 str. UPII9-5 a T u D GH57 YP i i

GH13 ZP 06370208.1 Desulfovibrio sp. FW1012B G GH57 ZP 06391561.1 Dethiosulfovibrio peptidovoransGH13 ZP 589320.1 DSM Candidatus 1 Koribacter versatilis Ellin345 ruepera radiovictrix DSM 17093 n c

b GH57 e g GH57 CAJ73817.1 Candidatus Kuenenia stuttgartiensis e 003196918.1 Desulfohalobium retbaense DSM 5692 a s o 873132.1 s Anaeromyxobacter sp. K r

i o

z u u

i a

b Acidovorax avenae subsp. avenae GH57 YP m l

h f d

a

o

n ariovorax paradoxus S1 . Hall’ R YP c v

c 004369289.1 Desulfobacca acetoxidans DSM 1

a GH57 ZP t

i

e

e 1 03128791.1 Chthoniobacter flavus Ellin428 b Accumulibacter phosphatis clade IIA

846466.1 Syntrophobacter fumaroxidans MPOB . S l

r YP r GH57 l

0 u i

4 d o

Acidothermus cellulolyticus 1 7 l Thiobacillus denitrificans o 3 002463416.1 Chloroflexus aggregans DSM 9485 a

7 GH57 YP T s s 6 l

t p erriglobus saanensis SP1PR4 0 A u o 06439127.1 Anaerobaculum hydrogeniformans 9

AKF06951 Sandaracinus amylolyticus DSM 53668 . 7 167979.1 Rhodobacter sphaeroidesA

TCC 11 m n GH57 0 155328.1 Oceanicola granulosus HTCC2516

F GH57 YP 003553106.1 4

i

GH57 F

i 001818894.1 Opitutus terrae PB90-1 W 003154808.1 Brachybacterium faecium DSM 4810

S 004234793.1 437816.1 Sinorhizobium meliloti 1021

F K

o

1 001440575.1 Cronobacter sakazakiiATCC BAA-894

R YP 003806467.1 Desulfarculus baarsii DSM 2075 A 01440199.1 Fulvimarina pelagi HTCC2506 YP TCC 29328 YP YP 002951636.1 Desulfovibrio magneticus RS-1 933302.1 Azoarcus sp. BH72 A 0 YP

c YP C A YP 001 314931.1 TCC 43644 1 GH57 YP e 425597.1 Rhodospirillum rubrum 170

- 2 YP 003052329.1 Methylovorus glucosetrophus SIP3-4 5 GH57 ZP 06370207.1 Desulfovibrio sp. FW1012B 3 001378953.1 Anaeromyxobacter sp. Fw109-5 YP 002943895.1 V 06898163.1 Roseomonas cervicalis B YP 6 2 744570.1 Granulibacter bethesdensis CGDNIH1 TCC 49176 15 635804.1 Xanthomonas campestris pv GH13 YP 821516.1 Candidatus Solibacter usitatus Ellin6076 YP GH13 GH13 YP A GH13 GH13YP 001820541.1 Opitutus terrae PB90-1 GH13 ZP 03130066.1 Chthoniobacter flavus Ellin428 GH13 003188581.1 Acetobacter pasteurianus IFO 3283-01 GH13 GH57 GH13 GH13 NP YP 003964374.1 Ketogulonicigenium vulgare GH13 ZP 01 GH13 002762222.1 Gemmatimonas aurantiaca GH57 GH13 YP 001430947.1 Roseiflexus castenholzii DSM 13941 GH13 ZP YP GH13 GH13 001820718.1 Opitutus terrae PB90-1 GH13 YP 593779.1 Candidatus Koribacter versatilis Ellin345 GH13 YP 003167286.1 Candidatus GH57 GH13 GH57 ADL25994.1 Fibrobacter succinogenes subsp.Aminobacterium succinogenes colombiense S85 DSM 12261 GH13 ZP YP YP GH13 ASO3-1 GH13 NP YP GH57 001047176.1 Methanoculleus marisnigri JR1 GH13 AKF03860 Sandaracinus 003475102.1 amylolyticus Clostridiales DSM genomosp. 53668 BV YP 1B GH13 YP AKF06952 Sandaracinus 004171624.1 amylolyticus Deinococcus DSM maricopensis 53668 DSM 2121 GH13 YP 022845.1 Picrophilus torridus DSM 9790 001995852.1GH57 NP Chloroherpeton thalassium GH13 ZP 08206710.1 Gordonia neofelifaecisYP NRRL GH13 YP YP GH13 YP 003144075.1 Slackia heliotrinireducens DSM 20476 GH13 502032.1 Methanospirillum hungatei JF-1 YP 004309380.1 02025300.1 Clostridium Eubacterium lentocellum ventriosum DSM 5427 ATCC 27560 GH13 YP AnaerostipesAbiotrophia hadrus defectiva GH57 YP 845422.1 Syntrophobacter fumaroxidans MPOB GH13 515657.1 Chlamydophila felis GH57 GH57 YP 953406.2 Geobacter sulfurreducens PCA GH13 1 GH13 YP GH57 ZP 07029279.1 Granulicella mallensis MP5ACTX8 109 GH13 001692347.1 Finegoldia magna GH13 YP 1002 07526221.1 Peptostreptococcus stomatis DSM 17678 ATCC BAA-1850 GH13 ZP YP 002754947.1 477387.1 Synechococcus sp. JA-2-3B GH13 1971.1 Clostridium beijerinckii NCIMB 8052 GH57 GH13 ZP 06299217.1 Parachlamydia 04451931.1 acanthamoebae str GH13 CBL15412.1 Ruminococcus 05391277.1 bromii ClostridiumL2-63 carboxidivorans P7 GH13 00131 GH57 GH57 GH13 ZP YP 001874926.1 Elusimicrobium minutum Pei191 003966833.1 Ilyobacter polytropus DSM 2926 GH57 YP 002523133.1 Acidobacterium capsulatum GH13 CBL37815.1YP 170 YP 003317062.1 ThermanaerovibrioYP acidaminovorans DSM 6589 GH13 ZP YP 001396860.1 Clostridium kluyveri DSM 555 004178167.1 Isosphaera pallida GH13 ZP GH13 04433184.1 Bacillus coagulans 36D1 GH57 CBL28650.1 Fretibacterium fastidiosum ATCC 11 ATCC 351 GH13 001988131.1 Lactobacillus casei BL23 GH13 YP YP 002459656.1 Desulfitobacterium hafniense DCB-2 T7901 GH57 GH13 ZP 02212845.1 Clostridium bartlettii DSM 16795 YP 003947078.1 Paenibacillus polymyxa SC2 GH13 ZP GH57 YPYP ATCC 17029 Thermomicrobium roseum DSM 5159 10 GH13 004184796.1 GH13 003011000.1 Paenibacillus sp. JDR-2 001869522.1 Nostoc punctiforme PCC 73102ATCC 51 GH13 YP GH13 CAJ38414.1 Anaerobranca gottschalkii AMB-1 GH13 YP 04577281.1 Oxalobactereredinibacter formigenes HOxBLSturnerae T A T erriglobus saanensis SP1PR4TCC 43644196 GH13 NP 691327.1 003385621.1 Oceanobacillus Spirosoma iheyensis linguale HTE831 DSM 74 GH13 YP 002560185.1 Macrococcus caseolyticus JCSC5402 GH13 ZP YP 425596.1 Rhodospirillum rubrum GH13 YP AEA65056.1 Burkholderia gladioli BSR3 GH13 YP 003072536.1 ATCC 51599 GH13 003050418.1 Methylovorus glucosetrophus SIP3-4 TCC 19860 YP 001042785.1 Rhodobacter sphaeroides GH13 YP GH13 07589067.1 Sinorhizobium meliloti BL225C GH13 YP 422427.1 Magnetospirillum magneticum GH13 ZP 533535.1 Rhodopseudomonas palustris BisB18 TCC 49030 GH13 YP 865429.1 Magnetococcus marinus MC-1 A YP errucomicrobium spinosum DSM 4136 GH13 08019095.1 Lautropia mirabilis GH13 ZP GH13 004255400.1 DeinococcusAcidovorax proteolyticus avenae subsp. MRP avenae A GH13 ZPYP GH13 YP 06751604.1 Parascardovia denticolens F0305 GH13 004234794.1 YP 003250004.1 FibrobacterGH13 succinogenes ZP subsp. succinogenes S85 GH13 ZP 02930055.1 001639019.1 V Methylobacterium extorquens PA1 GH13 YP Arthrobacter sp. FB24 GH13 02163221.1 Kordia algicida OT GH13 str. UW-1 YP 003570904.1 Salinibacter ruber M8 830231.1 olumonas auensis DSM 9187 GH13 ZP 06806048.1 BrevibacteriumT mcbrellneri 004065457.1 Pseudoalteromonas sp. SM9913 GH13 AKF03862GH13 Sandaracinus amylolyticus DSM 53668 GH13 YPYP YP 003953780.1 Stigmatella aurantiaca GH13 YP 002893874.1 661213.1 Pseudoalteromonas atlantica T6c GH13 GH13 -1 GH13 YP errucomicrobiumAccumulibacter spinosum DSM phosphatis 4136 clade IIA YP 003167877.1 CandidatusGH13 CAJ73823.1 YP 022846.1 Candidatus Picrophilus Kuenenia torridus stuttgartiensis DSM 9790 GH13 YP 828366.1 Candidatus Solibacter usitatus Ellin6076 GH13 Accumulibacter phosphatis clade IIA GH13 ZPYP 02930592.1 003165543.1 V Candidatus GH13 YP 981341.1 Polaromonas naphthalenivorans CJ2 GH13 ZP 02930626.1 V GH13 CAX84126.1 uncultured bacterium GH13 AKF07413 Sandaracinus amylolyticus DSM 53668 str. UW GH13 002373220.1 Cyanothece sp. PCC 8801 GH13 ZP errucomicrobium spinosum DSM 4136-1 YP 03132132.1 Chthoniobacter flavus Ellin428 GH13 001568766.1 Petrotoga mobilis SJ95 GH13 ZP GH13 YP A1 01875979.1 Lentisphaera araneosa HTCC2155 YP 256058.1 Sulfolobus acidocaldarius DSM 639 GH13 GH13 YP 003389435.1 Spirosoma linguale DSM 74 GH13 AKF03796 Sandaracinus amylolyticus DSM 53668 GH13 YP 003775732.1 Herbaspirillum seropedicae SmR1 GH13 EFS35729.1 Propionibacterium acnes HL013P YP 003382750.1 Kribbella flavida DSM 17836 GH13 NP 773406.1 Bradyrhizobium diazoefficiens USDA GH13 1.1 Nocardioides sp. JS614 190 GH13 YP 92301 GH13 YP 110 AEA23968.1 Pseudonocardia dioxanivorans CB1 426691.1 Rhodospirillum rubrum A GH13 Actinosynnema mirum DSM 43827 TCC 11170 YP 003103720.1 GH13 ZP 03507950.1 Rhizobium etli Brasil 5 GH13 GH13 AKF07406 Sandaracinus amylolyticus DSM 53668 1109 GH13 EFR58263.1 Alistipes sp. HGB5 GH13 YP 004369288.1 Desulfobacca acetoxidans DSM 1 GH13 CBK62785.1 Alistipes shahii WAL 8301 GH13 ZP 06370206.1 Desulfovibrio sp. FW1012B GH13 EFR58714.1 Alistipes sp. HGB5 GH13 YP 846467.1 Syntrophobacter fumaroxidans MPOB GH13 YP 473881.1 Synechococcus sp. JA-3-3Ab GH13 YP 001817757.1 Opitutus terrae PB90-1 GH13 YP 002371611.1 Cyanothece sp. PCC 8801 GH13 ZP 01041735.1 Erythrobacter sp. NAP1 GH13 YP 321945.1 Anabaena variabilis GH13 YP 757956.1 Maricaulis maris MCS10 GH13 YP 003319347.1 Sphaerobacter thermophilusATCC 29413 DSM 20745 GH13 YP 003705736.1 Truepera radiovictrix DSM 17093 GH13 ZP 03703035.1 Flavobacteria bacterium MS024-2A GH13 YP 412097.1 Nitrosospira multiformis YP 004096759.1 Bacillus cellulosilyticus DSM 2522 GH13 GH13 YP 643107.1 Rubrobacter xylanophilus DSM 9941 GH13 XP 002535253.1 Ricinus communis GH13 YP ATCC 25196 GH13 002943896.1 V 003291085.1 Rhodothermus marinus DSM 4252 YP 591838.1 Candidatusariovorax Koribacter paradoxus versatilis S1 Ellin345 GH13 YP GH13 ZP 07016947.1 Desulfonatronospira thiodismutans ASO3-1 193561.1 FlavobacteriumTuricibacter johnsoniae sanguinis UW101 PC909 GH13 10 YP 001 YP GH13 06623132.1 ATCC 51271 GH13 ZP 07031501.1004184235.1 Granulicella mallensis MP5ACTX8 GH13 ZP GH13 Terriglobus saanensis SP1PR4 05793069.1 Butyrivibrio crossotus DSM 2876 YP 003450636.1 Azospirillum sp. B510 GH13 ZP GH13 04449974.1 Catonella morbi YP 821518.1 Candidatus Solibacter usitatus Ellin6076 GH13 ZP GH13 BAD38983.1 Rhodobacter sphaeroides f. sp. denitrificans GH13 YP GH13 EGE53424.1GH13 Streptococcus 196049865 Lactobacillusparauberis NCFD Plantarum 2020 GH13 YP 317560.1 Nitrobacter winogradskyi Nb-255 GH13 744568.1 Granulibacter bethesdensis CGDNIH1 004219626.1 Granulicella tundricola MP5ACTX9 GH13 YPYP 001630976.1 002252735.1 Bordetella Ralstonia petriisolanacearum DSM 12804 MolK2 YP YP 003534630.1 Haloferax volcaniiATCC DS243049 GH13 GH13 GH13 YP 003909800.1 Burkholderia sp. CCGE1003 GH13 ZP 02490366.1 Burkholderia pseudomallei NCTC 13177 GH13 YP 002565216.1 Halorubrum lacusprofundi ATCC 49239 GH13 08043674.1 Haladaptatus paucihalophilusATCC DX253 49239 003775906.1 Herbaspirillum seropedicae SmR1 GH13 YP GH13 YP 136876.1 Haloarcula 003535754.1 marismortui Haloferax volcanii DS2 545525.1 Methylobacillus flagellatus KT GH13 ZP YP GH13 YP 001348504.1 Pseudomonas aeruginosa PA7 003535754.1 Haloferax volcanii DS2 GH13 YP GH13 YP GH13 YP 001600832.1 Gluconacetobacter diazotrophicus P GH13 GH13 AEB58485.1 Pseudomonas mendocina NK-01 GH13 YP ATCC BAA-798 GH13 ZP 002799647.1 002566162.1 Halorubrum lacusprofundi YP YP Anaeromyxobacter sp. K GH13 003426283.1 Bacillus pseudofirmus OF4 003404201.1 Haloterrigena turkmenica DSM 5511 GH13 GH13 YP YP 002427966.1 07685431.1 DesulfurococcusOscillochloris trichoides kamchatkensis DG-6 1221n GH13 GH13 ZP Azotobacter vinelandii DJ GH13 YP 001613703.1 002135342.1 Sorangium cellulosum So ce56 GH13 YPYP AKF09643YP Sandaracinus amylolyticus DSM 53668 003841424.1 Caldicellulosiruptor obsidiansis OB47 GH13 04851786.1 Paenibacillus sp. oral taxon 786 str GH13 GH13 GH13 ZP 003477450.1 Thermoanaerobacter italicus Ab9 GH13 ZP YP0191 003430180.1 Streptococcus gallolyticus UCN34 YP 003387924.1 Spirosoma linguale DSM 74 GH13 ZP A1 5 07818404.1 Eremococcus coleocola ACS-139-V-Col8 003322566.1GH13 Thermobaculum terrenum GH13 YP YP ATCC 27029 GH13 ZP 01201568.1 Flavobacteria bacterium BBFL7 003085052.1 Dyadobacter fermentans DSM 18053 036771 GH13 003125267.1 Chitinophaga 03516644.1 pinensis Rhizobium DSM etli 2588 IE4771 GH13 1332.1 Plesiocystis pacifica SIR-1 GH13 YP 003715784.1 Croceibacter atlanticus HTCC2559 GH13 YP ATCC 17025 GH13 YP AKF11765 Sandaracinus12.1 Bacteroides amylolyticus cellulosilyticus DSM 53668 DSM 14838 GH13 YPAKF03642GH13 Sandaracinus 003695476.1 ZP amylolyticus Starkeya novella DSM DSM53668 506 GH13 ZP 01911337.1 Plesiocystis pacifica SIR-1 YP GH13 631881.1 Myxococcus xanthus DK 1622 GH13 GH13 ZP GH13 001617496.1 Sorangium cellulosum So ce56 A7 GH13 ZP YP 002136601.1 GH13 CBK91 . D14 GH13 ZP 03127313.1 Chthoniobacter flavus Ellin428 GH13 GH13 CBL15040.1 Ruminococcus 01460454.1 Stigmatella bromii L2-63 aurantiaca YP 004097725.1 Intrasporangium calvum DSM 43043 A1501 GH13 ZP 02181788.1 08005100.1 Flavobacteriales Bacillus bacterium sp. 2 ALC-1 GH13 YP 03502892.1 Rhizobium etli Kim 5TCC BAA-894 GH13 002936977.1 Eubacterium rectale A GH13 003838265.1 004097729.1 Micromonospora Intrasporangium aurantiaca calvum DSM 43043 A GH13 ZP 127.1 Eubacterium rectale DSM 17629 YP YP GH13 Anaeromyxobacter sp. K 003638306.1 Cellulomonas flavigena DSM 20109 A3 YP 003196251.1 Robiginitalea biformata HTCC2501 GH13 YP YP YP 158090.1 Oceanicola 676219.1 granulosus Chelativorans HTCC2516 sp. BNC1 GH13 ZP 07335152.1 Desulfovibrio fructosovorans JJ GH13 GH13 01 GH13 ZP 10P GH13 004053192.1 Marivirga tractuosa DSM 4126 YP GH13 GH13 03631559.1 Pedosphaera parvula Ellin514 YP 001169169.1 Rhodobacter sphaeroides GH13 001660479.1 Microcystis aeruginosa NIES-843 GH13 ZP 001312779.1 Sinorhizobium medicae WSM419 004341683.1 GH13 ZP GH13 002376886.1YP Cyanothece sp. PCC 7424 GH13 AKF11772 Sandaracinus amylolyticus DSM 53668 GH13 YP YP GH13 ZP AKF07527 Sandaracinus amylolyticus DSM 53668 002238351.1 Klebsiella pneumoniae 342 GH13 YP YP A GH13 ZP 57 CT2 YP ermamoeba vermiformis GH13 YP 003060842.1 Hirschia baltica A GH13 GH13 YP 001618429.1 Sorangium cellulosum So ce56 GH13 GH13 01911339.1 Plesiocystis pacifica SIR-1 TCC 700975TCC 33030 001348465.1 Pseudomonas aeruginosa P GH13 ZP 660955.1 Pseudoalteromonas atlantica 171037.1 Pseudomonas stutzeri GH13 GH13 011 YP ATCC 49239 GH13 GH13 ZP 267734.1 Colwellia psychrerythraea 34H Archaeoglobus veneficus SNP6 001 03505050.1 Rhizobium etli Brasil 5 ATCC 8503 GH13 YP 01 004319011.1 Sphingobacterium sp. 21 GH13 CAJ81030.1 YP GH13 YP 12645.1 Reinekea blandensis MED297 1.1 Petrotoga mobilis SJ95 GH13 TCC 33656 GH13 GH13 728851 Streptomyces lividans YP 757963.1 Maricaulis1 maris MCS10 ATCC 33806TCC 27759 GH13 BAJ28154.1 Kitasatospora setae KM-6054 GH13 ZP YP 001685712.1 Caulobacter12646.1 sp. ReinekeaK31 blandensis MED297 GH13 NP ABD46558.1 V A GH13 01301673.1 Sphingomonas sp. SKA58 GH13 YP GH13 AEB46245.1 Verrucosispora 003819834.1 maris AB-18-032 Brevundimonas subvibrioides ATCC 15264 GH13 GH13 GH13 ZP GH13 497164.1 Novosphingobium aromaticivorans DSM 12444 GH13 YP GH13 GH13 YP GH13 00993856.1 Janibacter sp. HTCC2649 GH13 YP GH13 NP GH13 YP004171 GH13 001972408.1 Stenotrophomonas maltophilia K279a GH13 YP 002755549.1 Acidobacterium capsulatum GH13 YP 1 i

1 i GH13 9 YP A80195.1 Polysphondylium pallidum PN500 - 921429.1 Nocardioides sp. JS614 k

GH13 YP 001437978.1 Cronobacter sakazakii X l 003339020.1 Streptosporangium roseum DSM 43021 YP

T1b G

a T YP ARSEF 23 YP 03925926.1 Actinomyces coleocanis DSM 15436 S YP

YP 00156881 h YP 605210.1 Deinococcus geothermalis DSM 11300 AKF03605 Sandaracinus amylolyticus DSM 53668 938740.1 Corynebacterium diphtheriae NCTC 13129 C YP 002785382.1 Deinococcus deserti VCD1 003697614.1 YP 022847.1 Picrophilus torridus DSM 9790 . YP GH13 c YP 004099942.1 Intrasporangium calvum DSM 43043 003098970.1 A 003637608.1 Cellulomonas flavigena DSM 20109 p s TBF 19.5.1 004171581.1 Deinococcus maricopensis DSM 2121 B-1491 t 003703817.1

5 s t

295195.1 Deinococcus radiodurans R1 GH13 EF 633524.1 Myxococcus xanthus DK 1622

4 o P GH13 s Actinoplanes sp.

6 GH13 g u

M l

2

l

i GH13 EFT25501.1 Propionibacterium acnes HL1 a 1

a

c c l

4

a n o

9

c

a B

AKF03861 Sandaracinus amylolyticus DSM 53668 i 1

. r 001227636.1 Synechococcus sp. RCC307

r 00050865.1 Magnetospirillum magnetotacticum MS-1 1

6

b 1

d

. 6

S Arcanobacterium haemolyticum DSM 20595 YP o

n

1 r . Actinosynnema mirum DSM 43827

y TCC 49814 1

u

6 ibrio alginolyticus 12G01 e GH13 t T n

T6c 9 a D ruepera radiovictrix DSM 17093 GH13 99031621 Halothermothrix orenii t 07090550.1 Corynebacterium genitalium A a 1 r n

l

e GH13 ZP 267649.1 Lactococcus lactis subsp. lactis Il1403 l o GH13 6 A

i GH13 AAD05199.1 Paenibacillus polymyxa e p n 8

c YP 002565259.1 Halorubrum lacusprofundi h 118 Thermoactinomyces vulgaris i o 003699960.1 Bacillus selenitireducens MLS10GH13 BAF98616.1 Hyphopichia burtonii 1 001302320.1 Parabacteroides distasonis l 02207998.1 Coprococcus eutactus u

c 0

YP 002834100.1 Corynebacterium aurimucosum A u

s GH13 BAA07401.1 Xanthomonas campestris Anoxybacillus flavithermus WK1 07707498.1 Bacillus sp. m3-13 o YP n

004052835.1 Marivirga tractuosa DSM 4126 c

a

GH13 ZP a 001908940.1 Podospora anserina S mat+ c

r GH13 03709545.1 Corynebacterium matruchotii c GH13 NP u YP 001568454.1 Petrotoga mobilis SJ95 i

G d GH13 s GH13 41016824 Schwanniomyces occidentalis i 01258420.1 V t

1

W32490.1 GH13 GH13 YP m r GH13 ZP YP .

594401.1 Schizosaccharomyces pombe 972h- o

8 588205.1 Schizosaccharomyces pombe 972h- a GH13 1091 01170936.1 Bacillus sp. NRRL p

GH13 ZP

AA GH13 ZP 6 002886871.1 Exiguobacterium sp.A r

GH13 h 002940188.1 Kosmotoga olearia i 0 GH13 XP c GH13 ZP i

c

6 YP 003252634.1 Geobacillus sp. Y412MC61 o

YP

u

GH13 1 06970575.1 Ktedonobacter racemifer DSMYP 44963 p GH13 EFY95778.1 Metarhizium robertsii 002316532.1 s

2 e

GH13 ZP

4 GH13 n 001545735.1 Herpetosiphon aurantiacus DSM 785 S GH13 NP

0 YP s GH13 NP B GH13 ZP GH13 0 i s

YP GH13

15 GH13

YP 001813251.1 Exiguobacterium sibiricum 255-15 D ATCC 51

YP

S GH13 ZP

GH13 M

GH13

2

GH13

1

GH13

2 1

1 196

1

FIG.3.—Phylogeny of Sandaracinus amylolyticus GH13 and GH57 a-amylases. Sandaracinus amylolyticus a-amylases (GH13 and GH57) are colored in pink along with their domain architecture displayed as Domain: Color Shape [GH13: Red, rectangle; GH57: Black, rectangle; GH77: dark pink, rectangle; CBM48: Dark blue, circle; CBM20: green, circle; CBM41: cyan, circle]. Myxobacterial homologs are colored fluorescent green. Bootstrap values correspond- ing to the tree nodes are provided.

Taking into account the GH57 clade, the sole GH57 other myxobacterial species such as S. cellulosum Soce56 and domain of S. amylolyticus (GenBank ID: AKF06951) shares Plesiocystis pacifica SIR-1 and are further related to the clade the clade with anaerobic Deltaproteobacteria. Among the containing Gammaproteobacteria. Apart from these two am- GH13 domains of S. amylolyticus, AKF03862, AKF07527, ylases AKF03860 and AKF06952 are similar to each other and AKF09643, AKF11765, and AKF11772 shares clade with group with Candidatus accumulibacter phosphtis amylase other myxobacteria. Among these, AKF11772 and (YP_0031672861) from the phylum Acidobacteria. AKF07527 share sister branches and form sister clades with Interestingly, AKF03860 and AKF06952 group with members

Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 2525 Sharma et al. GBE

of the phyla Betaproteobacteria suggestive of a possible hor- izontal transfer. AKF04017 and AKF03605 have close homologs in the ex- tremophile Deinococcus desserti VCD115 and shares clade with certain myxobacterial amylases belonging to

Myxococcus xanthus, S. cellulosum,andHaliangium ocraceum 13

which further share clades with Deinococcus amylases. TCC 19860 A AKF04017 and AKF03605 share the highest sequence identity 003705927.1 Truepera radiovictrix DSM 17093 003705927.1 Truepera

003761138.1 Nitrosococcus watsonii C-1

with D. desserti VCD115 plasmid protein (YP_002787899; YP

YP

Acidovorax avenae subsp. avenae GH15

GH15YP 002478443.1Arthrobacter chlorophenolicusA6 plasmid

GH15AKF10487 Sandaracinus amylolyticus DSM 53668

GH15

GH15 YP 003323761.1 Thermobaculum terrenum ATCC BAA-798

GH15 YP 003899615.1 Cyanothece sp. PCC 7822 plasmid

GH15 BAB60480.1 Thermoplasma volcanium GSS1 53.3%) and chromosomal protein (YP_002785382; 45.6%), GH15 YP 023376.1 Picrophilus torridus DSM 9790 003393358.1 Conexibacter woesei DSM 14684 GH15 ZP 05571311.1 Ferroplasma acidarmanus fer1 YP GH15 ZP 06971349.1 Ktedonobacter racemifer DSM 44963

GH15 ZP 06976049.1 KtedonobacterYP 004236291.1 racemifer DSM 44963 03627327.1 Pedosphaera parvula Ellin514 GH15

GH15 respectively, that might point to its plasmid-borne horizontal YP 001618700.1 Sorangium cellulosum So ce56 GH15 ZP 01459561.1 Stigmatella aurantiaca DW4-3-1

GH15

GH15 ZP 644038.1 Rubrobacter xylanophilus DSM 9941 GH15 ZP 05038177.1YP Synechococcus sp. PCC 7335 transfer across the clade (fig. 3). The sequence conservation of 06474665.1 Frankia symbiont of Datisca glomerata GH15

GH15 ZP 06965326.1 Ktedonobacter racemifer DSM 44963 CSRs and the high percentage identity of these amylases to GH15 ZP 06974404.1 Ktedonobacter racemifer DSM 44963 GH15 ZP

GH15 ZP 06966138.1 Ktedonobacter racemifer DSM 44963 Deinococcus support the inference for the putative horizontal GH15 NP 733605.1 Streptomyces coelicolor A3-2 GH15 YP 003528299.1 Nitrosococcus halophilus Nc 4

GH15 YP 004367612.1 Marinithermus hydrothermalis DSM 14884 004368781.1 Marinithermus hydrothermalis DSM 14884 YP GH15 transfers among these members. AKF09634 does not share its 003404287.1 Haloterrigena turkmenica DSM 5511 GH15 YP

GH15 ZP 08046100.1 Haladaptatus paucihalophilus DX253 sister branch with any member, but the clade contains mem- GH15 YP 003570336.1 Salinibacter ruber M8 002524140.1 Thermomicrobium roseum DSM 5159 plasmid GH15 YP 003321444.1 Sphaerobacter thermophilus DSM 20745 bers from diverse phyla, that is, Rhizobium from alphaproteo- GH15 YP Acidimicrobium ferrooxidans DSM 10331 GH15 YP 003109135.1

GH15 AKF10340 Sandaracinus amylolyticus DSM 53668 bacteria, Opitutus from Chlamydae, Chthonibacter from GH15 YP 634299.1 Myxococcus xanthus DK 1622

GH15 YP 001109219.1 Saccharopolyspora erythraea NRRL 2338 GH15 YP suggesting the multiple gain and loss of 001104705.1 Saccharopolyspora erythraea NRRL GH15 2338 YP 001641939.1 Methylobacterium extorquens PA1 GH15 YP 525264.1 Rhodoferax ferrireducens T118 this amylase among . Similarly, AKF07413 shares GH15 YP 004181 GH15 ZP 166.1 Terriglobus saanensis SP1PR4 03133328.1 Chthoniobacter flavus Ellin428 GH15 the clade with members of Cyanobacteria; AKF07406 shares YP 002489270.1 GH15 CCA53726.1 Streptomyces venezuelae Arthrobacter chlorophenolicus A6 GH15 YP the clade with members of Deltaproteobacteria; AKF03861 GH15 YP 001506562.1 Frankia sp. EAN1pec GH15 001820209.1 Opitutus terrae PB90-1 YP 872901.1 Acidothermus cellulolyticus 11B ATCC 10712 GH15 YP with members of Alphaproteobacteria and AKF03642 with GH15 YP 002492909.1 001379559.1 GH15

GH15 CBJ38709.1YP 635512.1 Ralstonia Myxococcus solanacearumAnaeromyxobacter xanthus CMR15 DK 1622 sp. Fw109-5

GH15 YP members of . Taken together, we can suggest Anaeromyxobacter dehalogenans 2CP-1 GH15 318915.1 Nitrobacter winogradskyi Nb-255 GH15 YP 004011578.1 Rhodomicrobium vannielii YP 001234401.1 Acidiphilium cryptum JF-5 GH15 YP

GH15 that many of the S. amylolyticus amylases have been acquired GH15 AEA61621.1 Burkholderia gladioli BSR3 001834431.1 Beijerinckia indica subsp. indica ATCC 9039 GH15 YP 002756025.1 Acidobacterium capsulatum ATCC 51196

GH15

GH15 YP 571630.1 Nitrobacter hamburgensis X14 plasmid YP GH15 ZP 001603280.1 Gluconacetobacter diazotrophicus PA1 5 GH15 YP 412992.1 Nitrosospira multiformis A YP

GH15 EFV84177.1 745137.1 Granulibacter bethesdensis CGDNIH1 via horizontal gene transfer events from other bacterial species GH15 ZP

01226081.1

01 belonging to the phylum Deinococcus-Thermus, 126734.1 Nitrococcus mobilis Nb-231

Aurantimonas manganoxydans SI85-9A1 ATCC 17100 Acidobacteria, Cyanobacteria, and Alphaproteobacteria. Achromobacter xylosoxidans C54 To further support these horizontal transfer events, we

TCC 25196 checked the sequence conservation between S. amylolyticus amylases and closely related members in the phylogenetic tree and aligned using PROMALS-3D with Taka-Amylase A of A. oryzae (PDBid 2TAA) as a structural template. We also per- formed synteny studies analyzing proteins present in the vicin- ity of all identified S. amylolyticus GH13 and GH57 amylases and their closely related homologs as revealed by phylogenetic FIG.4.—Phylogeny of Sandaracinus amylolyticus GH15 g-amylases. studies. We could not trace any conserved syntenic regions in Sandaracinus amylolyticus g-amylases are colored dark pink and myxobac- these genomes whereas in the case of AKF03862 and terial homologs are colored fluorescent green. Bootstrap values corre- AKF09634 we identified glycogen debranching enzymes in sponding to the tree nodes are provided. Stigmatella aurantiaca DW4/3-1 [56% identity and 98% query coverage of AKF03860 with WP_013376091.1] and Opitutus terrae PB90-1 [57% identity and 97% query cover- horizontal transfer events from diverse bacterial phylum such age of AKF09635 YP_001820542.1] which further support as Deinococcus-Thermus, Cyanobacteria, Alphaproteobac- their putative horizontal transfer in S. amylolyticus. teria, Gammaproteobacteria, and Acidobacteria. Similarly, the phylogenetic inference was drawn from two representative domains of g-amylases using the same meth- odology as followed for a-amylases (fig. 4). We found that Agar Degrading Enzymes one of the g-amylases AKF10340 lies closer to M. xanthus Some bacteria are known to degrade agar and use it as a amylase whereas the other AKF10487 shares its sister carbon source and possess agarase enzymes that help in hy- branch with Nitrosococcus watsonii, a Gammaproteobacteria. drolyzing agar in oligosaccharides (Chi et al. 2012). Depending This suggests that while some of the a-andg-amylases onthesiteofaction,theycanbeoftwotypes;a-agarase are present in many myxobacterial species, a majority of the (agarose 3-glycanohydrolase; 3.2.1.158; GH96) and a-andg-amylases are present only in S. amylolyticus, and not b-agarase (agarose 4-glycanohydrolase; 3.2.1.81: GH16, in other myxobacterial genomes, possibly due to the result of GH50, GH86, GH118) (table 1). Both families of agarases

2526 Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 Complete Genome of a Starch-Degrading Myxobacteria GBE

Table 1 Cellulose, Starch, Chitin, and Agar Degrading Enzymes and Their Representative CAZY Domains Distribution Substrate Enzyme Category Enzyme Name EC Number CAZY Families (protein repertoire in brackets) Starch Amylase a-Amylase 3.2.1.1 GH13 (15), GH57 (1), GH119, GH126 Amylase b-Amylase 3.2.1.2 GH14 Amylase g-Amylase 3.2.1.3 GH15 (2), GH97 Chitin Chitinase 3.2.1.14 GH18, GH19 (2), GH23 (5), GH48 Agar a-Agarase Agarose 3-glycanohydrolase 3.2.1.158 GH96 b-Agarase Agarose 4-glycanohydrolase 3.2.1.81 GH16, GH50, GH86, GH118 Cellulose Cellulase b-Glucosidase 3.2.1.21 GH1 (3), GH3 (2), GH5 (2), GH9, GH30, GH116 Cellulase Cellobiohydrolase or Exoglucanase 3.2.1.176 GH48, GH6, GH7, GH9 Cellulase Endoglucanase 3.2.1.4 GH5 (2), GH6, GH7, GH8, GH9, GH12, GH44, GH45, GH48, GH51, GH74 (3), GH124

NOTE.—EC number, enzyme category and CAZY families for proteins involved in starch, chitin, agar, and cellulose degradation are shown. The number of Sandaracinus amylolyticus proteins identified having CAZY domains are represented in bold fonts in respective families.

were absent in S. amylolyticus genome which could be related in , prokaryotes, and (Funkhouser and to its agar nondegrading property (Mohr et al. 2012)assup- Aronson 2007). GH18 chitinases were not found in S. amylo- ported by colony morphology where agar was not penetrated lyticus genome while members of GH19 (AKF03006 and or liquefied by the bacteria (supplementary fig. S5, AKF10423) and GH23 (AKF04575, AKF05533, AKF06346, Supplementary Material online). AKF08835, and AKF11377) could be identified. GH19 family chitinases are mostly reported to be present in plants Cellulose Degrading Enzymes with only a few members in prokaryotes (Kasprzewska 2003; We identified only GH5 and GH74 in S. amylolyticus proteome Funkhouser and Aronson 2007). Among the two identified in two and three proteins, respectively. GH5 domain was pre- GH19 homologs, we found that AKF10423 is approximately sent as a single unit in AKF04963 and AKF09194 whereas 65% identical (~68% query coverage) with cyanobacteria GH74 was identified in AKF04859, AKF06963, and such as Nodosilinea nodulosa and Limnoraphis robusta AKF09775 associated with a GH74 domain. CAZY domains along with another myxobacterium Chondromyces crocatus. for cellobiohydrolase/exoglucanases (represented by GH6, Besides these homologs, we also found that AKF10423 shows GH7, GH9, and GH48) were not present in S. amylolyticus. maximum similarity with eukaryotic GH19 chitinases counter- Cellobiohydrolase/exoglucanases plays a significant role in cel- parts as present in Acacia mangium, Festuca arundinacea, lulose degradation as they cut two to four sugar subunits from Triticum aestivum,andLeucaena leucocephala (55–59% iden- the exposed ends formed by the endoglucanases, which are tity with 70% query coverage). We identified that another used as initial substrates for b-glucosidase (Mba Medie et al. chitinase, AKF06346, has GH23 domain along with three 2012). The absence of cellobiohydrolase or exoglucanases copies of CBM50, which is generally present in the enzymes could be a possible reason for the lack of cellulose degradation cleaving either chitin or peptidoglycan. in S. amylolyticus. Out of b-glucosidase representative CAZY families GH1, GH3, GH5, GH9, GH30, and GH116, only GH1, GH3, and GH5 families were found in three, two and two Conclusion proteins in S. amylolyticus, respectively. AKF03843, AKF0 The complete 10.33 Mb circular genome of S. amylolyticus 6028, and AKF10711 have a GH1 domain whereas type strain DSM 53668T was determined. CAZy analysis re- AKF09078, and AKF10875 have only the GH3 domain. vealed the presence of 72 GT domains, 72 CE domains, 58 GH domains, 19 AA domains, 4 PL domains, and 26 CBM do- Chitin Degrading Enzymes mains. Sixteen S. amylolyticus proteins were identified of Chitinases (EC:3.2.1.14) are involved in degradation of chitin, which 15 are a-amylases of class GH13 and 1 GH57. Two a component of the cell wall of fungi and exoskeleton of g-amylases of GH15 family were also identified. Phylogeny arthropods. GH18, GH19, GH23, and GH48 families have analysis revealed that ten GH13 a-amylases and one GH15 been reported in the CAZY database as chitinases that hydro- g-amylase have been aquired via horizontal gene transfer lyse b-1,4 linkages between N-acetylglucosamine units. GH18 from other such as Deinococcus-Thermus, is a chitinase family with a characteristic “[LIVM Cyanobacteria, Alphaproteobacteria, Gammaproteobacteria, FY][DN]G[LIVMF][DN][LIVMF][DN]E” motif and is expressed and Acidobacteria.

Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 2527 Sharma et al. GBE

Supplementary Material Han K, et al. 2013. Extraordinary expansion of a Sorangium cellulosum genome from an alkaline milieu. Sci Rep. 3:2101. Supplementary figures S1–S5 and tables S1 and S2 are avail- Janecek S, Kuchtova A. 2012. In silico identification of catalytic residues able at Genome Biology and Evolution online (http://www. and domain fold of the family GH119 sharing the catalytic machinery gbe.oxfordjournals.org/). with the alpha-amylase family GH57. FEBS Lett. 586:3360–3366. Janecek S, Svensson B, Macgregor EA. 2014. alpha-Amylase: an enzyme specificity found in various families of glycoside . Cell Mol Acknowledgments Life Sci. 71:1149–1170. Kasprzewska A. 2003. Plant chitinases–regulation and function. Cell Mol This work is supported by a project “Expansion and modern- Biol Lett. 8:809–824. ization of Microbial Type Culture Collection and Gene Bank Lagesen K, et al. 2007. RNAmmer: consistent and rapid annotation of (MTCC) jointly supported by Council of Scientific and ribosomal RNA genes. Nucleic Acids Res. 35:3100–3108. Industrial Research (CSIR) Grant No. BSC0402 and Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B. 2014. The carbohydrate-active enzymes database (CAZy) in 2013. Department of Biotechnology (DBT) Govt. of India Grant No. Nucleic Acids Res. 42:D490–D495. BT/PR7368/INF/22/177/2012.” The authors thank MTCC for Lowe TM, Eddy SR. 1997. tRNAscan-SE: a program for improved detection providing help in procuring the cultures from DSMZ. They of transfer RNA genes in genomic sequence. Nucleic Acids Res. acknowledge Dr Ramya TNC for providing workbench for cul- 25:955–964. turing myxobacteria. G.S. and I.K. acknowledge the Council Matsuura Y, Kusunoki M, Harada W, Kakudo M. 1984. Structure of Scientific and Industrial Research (CSIR) and University and possible catalytic residues of Taka-amylase A. J Biochem. 95:697–702. Grant Commission (UGC) for the research fellowship, respec- Mba Medie F, Davies GJ, Drancourt M, Henrissat B. 2012. Genome anal- tively. They also thank the Ge´nome Que´bec Innovation yses highlight the different biological roles of cellulases. Nat Rev Centre, McGill University, Canada and C-CAMP next-genera- Microbiol. 10:227–234. tion genomics facility, Bangalore, India for help in obtaining Mohr KI, Garcia RO, Gerth K, Irschik H, Muller R. 2012. PacBio and Illumina NGS data, respectively. Sandaracinus amylolyticus gen. nov., sp. nov., a starch-degrading soil myxobacterium, and description of Sandaracinaceae fam. nov. Int J Syst Evol Microbiol. 62:1191–1198. Literature Cited Noguchi J, et al. 2011. Crystal structure of the branching enzyme I Alikhan NF, Petty NK, Ben Zakour NL, Beatson SA. 2011. BLAST Ring (BEI) from Oryza sativa L with implications for catalysis and sub- Image Generator (BRIG): simple genome comparisons. strate binding. Glycobiology 21:1108–1116. BMC Genomics 12:402. Patel RK, Jain M. 2012. NGS QC Toolkit: a toolkit for quality control of next Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local generation sequencing data. PLoS One 7:e30619. alignment search tool. J Mol Biol. 215:403–410. Pei J, Kim BH, Grishin NV. PROMALS3D: a tool for multiple protein se- Auch AF, Von Jan M, Klenk HP, Goker M. 2010. Digital DNA-DNA hybrid- quence and structure alignments. Nucleic Acids Res. 36: 2295–2300. ization for microbial species delineation by means of genome-to- Reichenbach H, Dworkin M. 1992. The myxobacteria. In: Balows A, genome sequence comparison. Stand Genomic Sci. 2:117–134. Tru¨ per HG, Dworkin M, Harder W, Schleifer KH, editors. The pro- Aziz RK, et al. 2008. The RAST Server: rapid annotations using subsystems karyotes: a handbook on the biology of bacteria: ecophysiology, technology. BMC Genomics 9:75. isolation, identification, applications. New York: Springer New Blesak K, Janecek S. 2012. Sequence fingerprints of enzyme specificities York. p. 3416–3487. from the family GH57. 16:497– Reichenbach H. 2005. “Order VIII. Myxococcales” Bergey’s Manual of 506. Systematic Bacteriology. In: Brenner DJ, Krieg NR, Staley JT, Garrity Chi WJ, Chang YK, Hong SK. 2012. Agar degradation by microorganisms GM, editors. Vol. 2, 2nd ed part C, New York: Springer-Verlag New and agar-degrading enzymes. Appl Microbiol Biotechnol. 94:917– York Inc. pp. 1074–1079. 930. Richter M, Rossello-Mora R. 2009. Shifting the genomic gold standard for Chin CS, et al. 2013. Nonhybrid, finished microbial genome assemblies the prokaryotic species definition. Proc Natl Acad Sci U S A. 106: from long-read SMRT sequencing data. Nat Methods. 10:563–569. 19126–19131. Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol. Schneiker S, et al. 2007. Complete genome sequence of the myxobacter- 7:e1002195. ium Sorangium cellulosum. Nat Biotechnol. 25: 1281–1289. Edgar RC. 2004. MUSCLE: multiple sequence alignment with high accu- Shimkets LJ. 1990. Social and developmental biology of the myxobacteria. racy and high throughput. Nucleic Acids Res. 32:1792–1797. Microbiol Rev. 54: 473–501. Funkhouser JD, Aronson NN Jr. 2007. Chitinase family GH18: evolutionary Siguier P, Perochon J, Lestrade L, Mahillon J, Chandler M. 2006. ISfinder: insights from the genomic history of a diverse . BMC Evol the reference centre for bacterial insertion sequences. Nucleic Acids Biol. 7:96. Res. 34: D32–D36. Garcia R, Mu¨ ller R. 2014. The family Polyangiaceae. In: Rosenberg E, Steinmetz H, et al. 2012. Indiacens A and B: prenyl indoles from the Delong E, Lory S, Stackebrandt E, Thompson F, editors. The prokary- myxobacterium Sandaracinus amylolyticus. J Nat Prod. 75: 1803– otes. Berlin, Heidelberg: Springer. p. 247–279. 1805. Grissa I, Vergnaud G, Pourcel C. 2007. CRISPRFinder: a web tool to identify Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: clustered regularly interspaced short palindromic repeats. Nucleic molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. Acids Res. 35:W52–W57. 30: 2725–2729. Hall TA. 1999. BioEdit: a user-friendly biological sequence alignment editor Van Der Maarel MJ, Van Der Veen B, Uitdehaag JC, Leemhuis H, Dijkhuizen and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. L. 2002. Properties and applications of starch-converting enzymes of 41:95–98. the alpha-amylase family. J Biotechnol. 94: 137–155.

2528 Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 Complete Genome of a Starch-Degrading Myxobacteria GBE

Weissman KJ, Muller R. 2010. Myxobacterial secondary metabolites: Zhao JY, et al. 2008. Discovery of the autonomously replicating plasmid bioactivities and modes-of-action. Nat Prod Rep. 27: 1276–1295. pMF1 from Myxococcus fulvus and development of a gene cloning Whitworth DE. 2008. Myxobacteria multicellularity and differentiation. system in Myxococcus xanthus. Appl Environ Microbiol. 74: 1980– Washington, DC: ASM Press, American Society for Microbiology. 1987. Yin Y, et al. 2012. dbCAN: a web resource for automated carbohydrate- active enzyme annotation. Nucleic Acids Res. 40: W445–W451. Associate editor: Howard Ochman

Genome Biol. Evol. 8(8):2520–2529. doi:10.1093/gbe/evw151 Advance Access publication June 29, 2016 2529