Visualizing the Superfamily of Metallo-Β-Lactamases Through Sequence Similarity Network Neighborhood Connectivity Analysis
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2020.04.16.045138; this version posted September 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 1 Visualizing the superfamily of metallo-β-lactamases through sequence similarity network neighborhood connectivity analysis Javier M. González Instituto de Bionanotecnología del NOA (INBIONATEC), Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional de Santiago del Estero (CONICET-UNSE), G4206XCP Santiago del Estero, Argentina. Email: [email protected] http://orcid.org/0000-0002-3298-2235 Keywords: metallo-lactamase, protein superfamily, tanglegram, sequence similarity network, neighborhood connectivity Abbreviations: MBL (metallo-β-lactamase), MSA (multiple sequence alignment), SSN (sequence similarity network), PSDO (persulfide dioxygenase), FDP (flavo diiron protein), QQL (quorum quenching lactonase), OPH (organophosphorus hydrolase), ZBL (zinc-β-lactamase), CPSF (cleavage and polyadenylation specificity factor), CC (connected component), NC (neighborhood connectivity). ABSTRACT The superfamily of metallo-β-lactamases (MBL) comprises an ancient group of proteins found in all domains of life, sharing a characteristic αββα fold and a histidine-rich motif for binding of transition metal ions, with the ability to catalyze a variety of hydrolysis and redox reactions. Herein, structural homology and sequence similarity network (SSN) analysis are used to assist the phylogenetic reconstruction of the MBL superfamily, introducing tanglegrams to evaluate structure-function relationships. SSN neighborhood connectivity is applied for spotting protein families within SSN clusters, showing that 98 % of the superfamily remains to be explored experimentally. Further SSN research is suggested in order to determine their topological properties, which will be instrumental for the improvement of automated sequence annotation methods. The metallo-β-lactamase (MBL) superfamily Despite its low resolution, the atomic model comprises an ancient group of proteins found in disclosed the new αββα fold and a single Zn(II) all domains of life, sharing a characteristic αββα ion bound to a three-histidine motif, resembling fold and a histidine-rich motif for binding of the active site of carbonic anhydrases. Thus, transition metal ions. The name was coined after BcII and ZBLs in general were believed to use a the first superfamily members to be single Zn(II) ion to activate a water molecule for characterized; a group of zinc-dependent hydrolysis, paralleling the mechanism by which hydrolases produced by bacteria resistant to β- carbonic anhydrases catalyze carbon dioxide lactam antibiotics. These zinc-β-lactamases hydration. This hypothesis was soon questioned (ZBLs) hydrolyze the amide bond present in all when the structure of ZBL CcrA from β-lactams and thus render them ineffective. The Bacteroides fragilis was published, disclosing a first X-ray crystallographic report of a ZBL was bimetallic zinc center, with the second zinc that of BcII from Bacillus cereus 569/H/9 [1]. being coordinated to nearby Asp, Cys and His bioRxiv preprint doi: https://doi.org/10.1101/2020.04.16.045138; this version posted September 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 2 residues [2]. Besides, the second zinc was later InterPro 77.0 database [11] entry IPR001279 for found in B. cereus ZBL too [3-5], starting a the MBL superfamily includes about half a decade-long controversy regarding the role of million members. Indeed, the MBL superfamily each zinc ion. Later on, it was found that has grown astoundingly over the past 30 years, monometallic ZBLs are rather exceptional and and an integrative revision is long overdue. In the hydrolysis reaction generally requires two this report, the Pfam and Protein Data Bank Zn(II) ions [6, 7]. It is important to note that, databases are interrogated in order to obtain an while ZBLs hydrolyze antibiotics by means of a updated picture of the distribution of proteins metal-activated water molecule, most β- throughout the MBL superfamily and to get lactamases use a conserved serine residue in a insights into their structure-function completely different protein scaffold. In other relationships. It is shown that the most words, the majority of β-lactamases are not widespread MBLs are those acting on metallic, and referring to ZBLs and MBLs in phosphoesters, such as nucleic acids, general simply as “β-lactamases” should be nucleotides, phospholipids and phosphonates; avoided, particularly when annotating these whereas the more specialized enzymes are less proteins in public databases. Besides, even distributed, such as zinc-β-lactamases, flavo- though most members of the superfamily are diiron oxidoreductases, and MBLs involved in devoid of β-lactamase activity, the acronym secondary metabolism, with the important MBL has been adopted to annotate most exception of the ubiquitous glyoxalases II. In members of the superfamily. The same addition, the structural diversity of membrane- convention will be followed here to define any associated MBL proteins has not been explored protein with at least one characteristic MBL by experimental structural approaches, despite domain, leaving the acronym ZBL to describe being involved in fundamental processes like metallo-β-lactamases themselves. natural transformation and horizontal gene transfer. A great diversity of proteins evolved in the MBL superfamily by combining catalytic MBL MATERIALS AND METHODS domains and substrate recognition domains in a modular fashion. Besides, subtle changes in the metal coordinating residue networks expand this Structural data harvesting and tanglegram diversity by enabling the coordination of calculation different transition metals, particularly Zn(II), All MBL protein sequences with available Mn(II), and Fe(II)/Fe(III) (Figure 1). Early experimentally determined three-dimensional attempts to build a systematic classification of structure were retrieved from the Protein Data the MBL superfamily were conducted by L. Bank (PDB) with the Dali Lite server [12], Aravind [8], as some of the very first using structures PDB 2gmn and PDB 3i13 as applications of the PSI-Blast algorithm [9], who queries. A set of 105 high-resolution structures showed that many proteins other than ZBLs was obtained after applying a 90 % sequence comprise the characteristic fold and histidine- similarity cutoff. As well, an unrooted structural rich metal-binding motif of MBLs, mapping key dendrogram was obtained for this set with the residues onto the structure of B. cereus ZBL. Dali Lite server all-against-all comparison tool, These observations were updated in 2001 by which calculates a distance matrix of Z-scores Daiyasu et al., when additional crystal structures by aligning the structures all-against-all and of MBL superfamily members were available outputs a dendrogram derived with the average [10]. At present, more than a hundred proteins linkage clustering method [12]. Next, the full have been shown to contain αββα domains amino acid sequence corresponding to each of through X-ray crystallography, whereas the these 105 structures were retrieved from the bioRxiv preprint doi: https://doi.org/10.1101/2020.04.16.045138; this version posted September 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 3 UniProt database [13], in order to avoid The resulting MSA consisted of 55,076 sequence artifacts like mutations and missing sequences and 143 columns. Next, the full residues often found in PDB files. A structure- sequences present in this MSA set were guided multiple sequence alignment (MSA) was retrieved from the UniProt 2019-10 database calculated with Promals3D [14]. This MSA was [13] and reduced to a final set of 32,418 manually edited with Jalview 2.9 [15] to discard sequences, by applying a 70 % similarity cutoff highly gapped regions, by applying a 50 % with CD-Hit [22] and ensuring that all 105 alignment quality cutoff. The resulting MSA, sequences present in the tanglegram were comprising 105 sequences and 204 columns, included. A sequence similarity network (SSN) was used to calculate a maximum likelihood [23] was then calculated with this 32,418- cladogram with RAxML [16], running at the sequence dataset, using the EFI-EST online tool Cipres server [17]. A best-scoring bootstrapped [24]. The obtained representative node network tree was obtained after 1002 replicates, using comprised 15,292 nodes at 40 % sequence the WAG substitution matrix as evolutionary similarity, and 762,784 edges at 10−20 Blast model [18], and was displayed as a consensus pairwise similarity threshold. Topology network cladogram by applying the 50 % majority rule. analysis was performed with NetworkAnalyzer Finally, in order to compare the consensus 2.7 [25], as implemented in Cytoscape 3.7.1 sequence-based cladogram with the distance- [26]. Network statistics plots were prepared based dendrogram topologies, a tanglegram with SigmaPlot