Aix-Marseille Université Faculté de Médecine de Marseille Ecole Doctorale des Sciences de la Vie et de la Santé

DOCTORAL THESIS Presented by

Vivek KESHRI Date and place of birth: 20 October 1985, India

Evolutionary Analysis of the β-lactamase Families (Analyse évolutive des familles de β-lactamase)

Thesis defended on 5 July 2018 for the degree of Doctor of Aix-Marseille Université

Members of the thesis jury

Pr Didier RAOULT, Thesis Director
Pr Max MAURIN, Reviewer
Dr Patricia RENESTO, Reviewer
Pr Pierre-Edouard FOURNIER, Examiner

Host laboratory

IHU - Méditerranée Infection, 19-21 Boulevard Jean Moulin, Marseille, France

Contents

Abstract

Résumé

Chapter-1: Introduction

Chapter-2 (A): Phylogenomic analysis of β-lactamase in archaea and bacteria enables the identification of putative new members

Chapter-2 (B): Metallo-β-lactamase enzymes in humans and archaea

Chapter-3: Functional convergence of antibiotic resistance in β-lactamase enzymes is not conferred by simple convergent amino acid substitution

Chapter-4: An integrative database of β-lactamase enzymes: sequences, structures, functions and phylogenetic trees

Conclusions

Future Perspective

Acknowledgements

Abstract

This thesis describes three phases of research. The first is a phylogenomic analysis of β-lactamases in archaea and bacteria, which enabled the identification of putative new members, followed by a search for hidden sources of metallo-β-lactamases. The second addresses the functional convergent evolution of β-lactamases, and the third the development of an integrative β-lactamase database.

β-lactam antibiotics are among the oldest and most widely used antimicrobial drugs. The bacterial enzyme β-lactamase, which falls into four molecular classes (A, B, C, and D), hydrolyzes β-lactam antibiotics by breaking their core structure, the β-lactam ring. Antibiotic resistance genes (ARGs) are widely distributed in the natural environment. To identify novel β-lactamases, a comprehensive search was performed in several biological databases: the Human Microbiome Project (reference and metagenomic), environmental proteins (env_nr), and NCBI's non-redundant database (nr). To identify putative new sequences and uncover their diversity, ancestral sequences and HMM profiles of extant β-lactamases were inferred and used to find potential homologous sequences in these databases. The analysis revealed that searches with putative ancestral sequences and HMM profiles played a significant role in identifying remote homologues, and uncovered β-lactamase enzymes that exist in the metagenomic databases as "dark matter". Comprehensive phylogenetic analyses of extant and newly identified β-lactamases showed novel clades in the trees. Further, the β-lactam hydrolysis activity of newly identified sequences (from archaea and human) was investigated in the laboratory, and these sequences showed β-lactamase activity.

The second phase of the investigation examined the functional evolution of β-lactamases. First, 1155 β-lactamase protein sequences were retrieved from the ARG-ANNOT database, and Minimum Inhibitory Concentration (MIC) values, the measure of antibiotic function, were collected from the corresponding literature. A molecular phylogenetic framework was applied to examine the evolution of β-lactamase functional activity, in particular the evolutionary association of molecular and phenotypic characteristics. The results revealed that the functional activity of β-lactamases evolved convergently within molecular classes (intra-class level).

The third phase of this thesis presents the development of an integrative β-lactamase database. The existing public β-lactamase databases contain limited information; therefore, an integrative database was developed. For each sequence, the database provides the primary amino acid sequence, the closest structural information in the external structure database (PDB), the functional profile, and phylogenetic trees, all in a single resource. The functional profiles are accessible in the form of Minimum Inhibitory Concentration (MIC) values and kinetic parameters. As of now, 1155 antibiotic resistance sequences are available in the database. The database will support the research community working in the field of β-lactam antibiotic resistance.

Key words: β-lactam antibiotics, β-lactamase enzymes, ancestral sequence, HMM profile, convergent evolution, β-lactamase database.

Résumé

This thesis describes three phases of research. The first is a phylogenomic analysis of β-lactamases in archaea and bacteria, enabling the identification of putative new members, followed by hidden sources of metallo-β-lactamases; the second is the convergent functional evolution of β-lactamases; and the third is the development of an integrative β-lactamase database. β-lactam antibiotics are among the oldest and most widely used antimicrobial drugs. The bacterial enzyme β-lactamase (divided into four molecular classes A, B, C and D) hydrolyzes the β-lactam antibiotic by breaking its core structure, the β-lactam ring. Antibiotic resistance genes (ARGs) are widely distributed in the natural environment. To identify new β-lactamases, a comprehensive study was carried out in various biological databases such as the Human Microbiome Project (reference and metagenomic), environmental proteins (env_nr) and NCBI's non-redundant database (nr). To identify putative new sequences and uncover their diversity, the ancestral sequences and HMM profiles of extant β-lactamases were inferred, which makes it possible to identify potential homologous sequences in biological databases. The analysis revealed that putative ancestral sequences and HMM profile searches played an important role in identifying remote homologues and uncovered existing β-lactamase enzymes in the metagenomic databases as dark matter. Broad phylogenetic analyses of extant and newly identified β-lactamases show the new clades in the trees. In addition, the β-lactam hydrolysis activity of newly identified sequences (from archaea and humans) was studied in the laboratory and showed β-lactamase activity.

The second phase of the study examined the functional evolution of β-lactamases. First, 1155 β-lactamase protein sequences were retrieved from the ARG-ANNOT database, together with Minimum Inhibitory Concentration (MIC) values, the measure of antibiotic function, from the corresponding literature. A molecular phylogenetic framework was applied to examine the evolution of the functional activity of β-lactamases, in particular the evolutionary association of molecular and phenotypic characteristics. The results revealed that the functional activity of β-lactamases evolved convergently within the molecular class (intra-class level).

The third phase of this thesis presents the development of an integrative β-lactamase database. The current public β-lactamase databases contain limited information; consequently, an integrative database was developed. The database provides general information on the sequences: primary amino acid sequences, the closest structural information in the external structure database (PDB), the functional profile and phylogenetic trees, all in a single database. The functional profiles are accessible in the form of Minimum Inhibitory Concentration (MIC) values and kinetic parameters. To date, 1155 antibiotic resistance gene sequences are available in the database. The database will facilitate research in the field of β-lactam resistance.

Key words: β-lactam antibiotics, β-lactamase enzymes, ancestral sequence, HMM profile, convergent evolution, β-lactamase database.

Foreword

The format of this thesis follows a recommendation of the Genomics and Bioinformatics specialty within the Master of Life and Health Sciences, which is attached to the Ecole Doctorale des Sciences de la Vie de Marseille. The candidate is required to follow prescribed rules, including a thesis format used in Northern Europe that allows better organization than traditional theses. In addition, the introduction and bibliography section is replaced by a review submitted to a journal, so that the quality of the review can be evaluated externally and the student can begin an exhaustive bibliography on the field of the thesis as early as possible. The thesis is also presented as articles that are published, accepted, or submitted, each accompanied by a brief commentary giving the general sense of the work. This form of presentation appeared better suited to the demands of international competition and makes it possible to concentrate on work that will benefit from international dissemination.

Pr Didier Raoult

Chapter-1 Introduction

Introduction

An antibiotic is an antimicrobial agent used in the treatment and prevention of bacterial infections. One broad class of antimicrobial agents is the β-lactams, which encompasses penicillins, cephalosporins, carbapenems and monobactams (Holten and Onusko 2000). The common structural feature of these antibiotics is a four-membered β-lactam ring (Kong et al. 2010). Antimicrobial resistance (AR) is one of the most important public health issues and has been reported across the world. AR challenges our ability to treat infectious diseases and undermines many other advances in health and medicine. There are three major ways in which bacteria become resistant to antibiotics: (i) production of β-lactamase enzymes, (ii) altered penicillin-binding proteins (PBPs), and (iii) diminished expression of outer membrane proteins (Babic et al. 2006).

β-lactamase enzymes confer resistance to β-lactam antibiotics by breaking the four-membered β-lactam ring in their core structure. These bacterial enzymes continue to expand in number, and more than 1300 natural β-lactamases are known (Bush 2013). Two major classification systems exist for β-lactamases. (1) The Ambler molecular classification is based on amino acid sequence and divides all β-lactamases into four molecular classes, A through D (Ambler 1980). (2) The Bush-Jacoby-Medeiros classification is based on the catalytic properties of the enzymes and distinguishes the functional groups 1, 1e, 2a, 2b, 2be, 2br, 2ber, 2c, 2ce, 2e, 2f, 3a, 3b, 2d, 2de, and 2df (Bebrone 2007; Bush 2013). Molecular classes A, C and D are serine β-lactamases, which contain a serine residue in their active sites, whereas class B enzymes (metallo-β-lactamases) require one or two zinc ions to hydrolyze the β-lactam ring (Ambler 1980; Bush 2013; Hall and Barlow 2004; Lamotte-Brasseur et al. 1994).

Sequence information for a large number of β-lactamase and PBP encoding genes is now available in public databases, and these sequences can be used to illustrate evolutionary relationships. β-lactamases are structurally related to PBPs and may have evolved from these β-lactam-binding enzymes of cell wall biosynthesis (Massova and Mobashery 1998; Poole 2004). Evolutionary analyses indicate that the serine β-lactamases are ancient enzymes that originated two billion years ago (Hall and Barlow 2004). Based on a structural phylogenetic tree, it has been concluded that the class C enzymes predate the divergence of the other two classes of serine β-lactamases (classes A and D) from a common ancestor (Hall and Barlow 2004).

Antibiotic resistance genes are widespread in the natural environment, but environmental and organismal reservoirs (metagenomic databases) have not yet been adequately explored. Hence, to understand the diversity of these enzymes, the first part of this thesis undertook a comprehensive investigation of these reservoirs. Previous work by Sharma et al. and Eddy et al. showed that novel sequences can be identified in distinct biological databases through most probable ancestral sequences (Sharma et al. 2014) and hidden Markov model (HMM) profile searches (Eddy et al. 2011). Accordingly, the present study searched for novel β-lactamase sequences in biological databases such as environmental proteins (env_nr), the Human Microbiome Project (metagenomic and reference genomes) and NCBI's non-redundant database.

Ancestral sequence reconstruction (ASR) is the estimation of the most probable ancestral protein sequences from extant protein sequences, taking their phylogenetic tree into account. The inference of ancestral sequences can be separated into four steps: (i) multiple alignment of the extant sequences; (ii) refinement of the alignment by discarding poorly aligned regions; (iii) reconstruction of the substitution history using a maximum-likelihood approach; and (iv) phylogenetic tree reconstruction and prediction of the ancestral sequence at every node of the tree. Different methods have been proposed for reconstructing ancestral sequences, but parsimony and maximum-likelihood methods are the most widely used (Cunningham et al. 1998; Elias and Tuller 2007; Koshi and Goldstein 1996; Yang et al. 1995), and maximum-likelihood algorithms appear to work better than parsimony (Zhang and Nei 1997).
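As a minimal illustration of ancestral state inference at a single alignment column, the sketch below applies Fitch's small-parsimony algorithm to a toy rooted binary tree. It uses parsimony rather than the maximum-likelihood approach favored above, only because parsimony fits in a few lines; the tree shape, leaf names and residues are hypothetical and are not taken from the thesis data.

```python
# Fitch small parsimony for one alignment column on a rooted binary tree.
# A tree is either a leaf name (str) or a tuple (left_subtree, right_subtree).
# All names and residues below are hypothetical toy data.

def fitch_sets(tree, leaf_states):
    """Bottom-up pass: return (candidate states at this node, substitutions so far)."""
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch_sets(tree[0], leaf_states)
    right_set, right_cost = fitch_sets(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra substitution needed.
        return common, left_cost + right_cost
    # Children disagree: keep the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

toy_tree = (("seq1", "seq2"), ("seq3", "seq4"))
column = {"seq1": "S", "seq2": "S", "seq3": "S", "seq4": "T"}
root_states, n_changes = fitch_sets(toy_tree, column)
print(root_states, n_changes)
```

In this toy column the root is assigned serine with one substitution on the tree. A maximum-likelihood method would instead weigh each candidate state by its probability under a substitution model and branch lengths.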
An HMM profile describes the consensus of a multiple sequence alignment and uses a position-specific scoring system to capture information about the degree of conservation at different positions in the alignment. The HMM profiles of the four classes of β-lactamases were queried against different biological databases.

The human microbiome is also a reservoir of antimicrobial resistance (Penders et al. 2013); the digestive tract, for example, is a complex ecosystem in which communities of microorganisms interact with each other and with their host. In the current study, the human microbiome from different body sites (anterior nares, attached keratinized gingiva, buccal mucosa, hard palate, left retroauricular crease, mid-vagina, palatine tonsil, posterior fornix, right retroauricular crease, saliva, subgingival plaque, supragingival plaque, throat, tongue dorsum and vaginal introitus) was investigated. This analysis detected numerous β-lactamases in metagenomic reservoirs (environmental and human microbiome), and the large-scale phylogenetic trees revealed novel clades of β-lactamases, as discussed in Chapter 2(A).

The most probable ancestral sequences and HMM profiles of extant bacterial β-lactamases are able to detect evolutionarily distant β-lactamase-like homologues in other domains, such as archaea, eukaryotes and viruses. Using these two approaches, metallo-β-lactamases were identified and characterized in archaea, whose genes were apparently transferred to bacteria. β-lactamase domains were also identified in giant viruses. The details of this study are described in Chapter 2(B).

The second part of the thesis addresses the functional evolution of β-lactamase enzymes within Ambler molecular classes (the intra-class scale). In evolutionary biology, convergent evolution is the process whereby organisms that are not closely related independently evolve similar traits as a result of adapting to similar environments or ecological niches. The term "convergence" has been used in different contexts, such as functional, mechanistic and structural convergence (Doolittle 1994). Several enzymes have evolved the ability to catalyze the same reaction on many separate occasions, such as those involved in the hydrolysis of peptide bonds. The structure of a protein's active site determines its biochemical function, but enzymes within the same structural class may have different roles or varying levels of activity. The current study therefore concentrated on functional convergence, i.e. the independent evolution of a given molecular function, in the present case the magnitude of β-lactamase activity.
Only a few studies have reported convergent evolution in β-lactamases (Hibbert-Rogers et al. 1994). The present study provides a comprehensive analysis of functional convergence in β-lactamase enzymes. In this investigation, the Minimum Inhibitory Concentration (MIC), a standard measure used in diagnostic laboratories to determine the susceptibility of organisms to antimicrobial agents (Andrews 2001), was used as the measure of the functional activity of β-lactamase enzymes. The details are described in Chapter 3.

The third part of this thesis concerns the β-lactamase database. A few β-lactamase databases are accessible in the public domain: the Comprehensive β-lactamase Molecular Annotation Resource (CBMAR) (Srivastava et al. 2014), the Lahey Clinic database (www.lahey.org/Studies/), the Antibiotic Resistance Genes Database (ARDB) (Liu and Pop 2009), the Lactamase Engineering Database (LacED) (Thai et al. 2009), the Comprehensive Antibiotic Resistance Database (CARD) (McArthur et al. 2013), and a comprehensive database of widely circulated β-lactamases (BLAD) (Danishuddin et al. 2013). CBMAR provides information on the four classes of β-lactamases. It offers phylogenetic trees for single groups of enzymes, but no information on the relationships among all enzymes of a particular class, and it lacks information on all possible homologous structures of a given enzyme. The Lahey Clinic database provides primary sequences, mutational information, and functional group classification. ARDB is not updated regularly (last updated on July 3, 2009) and has no information on phylogenetic trees or three-dimensional structures. LacED provides three-dimensional structural information, but with limitations: only a few structures are currently available for each enzyme, not all possible homologous structures, and it does not cover class C and D β-lactamases. CARD provides ontology information but no phylogenetic trees or three-dimensional structures. The content of these databases is valuable, but none of them provides complete information in a single resource; each offers partial information and limited functions. A centralized and up-to-date β-lactamase resource is therefore required.

Accordingly, a new database was designed that unifies the primary amino acid sequences, three-dimensional protein structures in an external structure database (PDB), phylogenetic trees, and functional profiles in a single database. The functional profiles are available in the form of Minimum Inhibitory Concentration (MIC) values (Andrews 2001) and kinetic parameters (kcat, Km, and kcat/Km) (Koshland 2002). The database contents and their availability are described in Chapter 4.
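The kinetic parameters named above are related through the Michaelis-Menten rate law. The sketch below is illustrative only: the function names and all numerical values are hypothetical and are not taken from the database.

```python
# Michaelis-Menten relations between kcat, Km and kcat/Km.
# Units assumed here: kcat in 1/s, Km and concentrations in mol/L (M).

def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial rate v = kcat * [E] * [S] / (Km + [S]), in M/s."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)

def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km, in 1/(M*s)."""
    return kcat / km

# Hypothetical enzyme, for illustration only.
kcat = 10.0        # 1/s
km = 50e-6         # M
enzyme = 1e-9      # M

efficiency = catalytic_efficiency(kcat, km)
# When [S] equals Km, the rate is half of the maximal rate kcat * [E].
rate_at_km = michaelis_menten_rate(kcat, km, enzyme, km)
print(efficiency, rate_at_km)
```

Storing kcat and Km separately, as well as their ratio, keeps the turnover rate and the substrate affinity distinguishable for enzymes that happen to share a similar efficiency.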

Chapter-2 (A)

Phylogenomic analysis of β-lactamase in archaea and bacteria enables the identification of putative new members

Supplementary Materials

Supplementary File S1:

Homologous sequence identification of extant β-lactamases

The extant β-lactamase sequences were used as blastp queries against four biological databases, with thresholds of ≥30% sequence identity, ≥50% query coverage, and an e-value of 1e-10. We then selected those sequences that contained β-lactamase domains according to searches of Pfam and NCBI's Conserved Domain Database.
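The three thresholds above can be applied as a simple filter over parsed BLAST hits. The sketch assumes each hit has already been read into a dictionary; the field names (`pident`, `qcov`, `evalue`) and the example hits are hypothetical, and treating the e-value cutoff as inclusive is an assumption.

```python
# Filter BLAST hits by identity, query coverage and e-value.
# Field names and example hits are hypothetical.

MIN_IDENTITY = 30.0    # percent sequence identity
MIN_QUERY_COV = 50.0   # percent query coverage
MAX_EVALUE = 1e-10     # e-value threshold (assumed inclusive)

def passes_thresholds(hit):
    """Return True if a hit satisfies all three cutoffs."""
    return (hit["pident"] >= MIN_IDENTITY
            and hit["qcov"] >= MIN_QUERY_COV
            and hit["evalue"] <= MAX_EVALUE)

hits = [
    {"subject": "hitA", "pident": 45.2, "qcov": 88.0, "evalue": 3e-40},
    {"subject": "hitB", "pident": 28.0, "qcov": 95.0, "evalue": 1e-15},  # identity too low
    {"subject": "hitC", "pident": 60.0, "qcov": 40.0, "evalue": 1e-30},  # coverage too low
    {"subject": "hitD", "pident": 35.0, "qcov": 70.0, "evalue": 1e-5},   # e-value too high
]

kept = [h["subject"] for h in hits if passes_thresholds(h)]
print(kept)
```

Hits that pass this filter would still need the domain check described above before being counted as putative β-lactamases.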

Ancestral Sequence Reconstruction

Ancestral sequence reconstruction (ASR) is an estimation of the most probable ancient protein sequences using extant protein sequences from modern organisms. To date, different methods have been proposed for the reconstruction of ancestral sequences; parsimony and maximum-likelihood methods are the most widely used (Elias & Tuller 2007; Koshi & Goldstein 1996; Cunningham et al. 1998; Yang et al. 1995). The most promising use of ASR methods is that ancestral sequences can be used to detect remote homology (Collins et al. 2003; Wilke et al. 2005; Sharma et al. 2014). Ancestral sequences are less divergent than modern-day sequences, providing a greater chance of detecting homologous sequences in distant species (Collins et al. 2003). Therefore, in this study, we hypothesized that reconstructed putative ancestral sequences of β-lactamases would increase the sensitivity for detecting similarities with distant and unknown sequences in biological databases.

Phylogenomic studies have grouped β-lactamases according to their Ambler classes, suggesting that sequences are similar within a class; however, there was no detectable sequence similarity between different classes (Medeiros 1997). Thus, the “pieces” or “clustering” approach in the CD-HIT Suite web server was adopted (Huang et al. 2010). In the clustering approach, input sequences were split into a set of clusters based on a 30% sequence identity cut-off. Next, multiple sequence alignment was performed using MUSCLE as implemented in the MEGA6.05 software (Tamura et al. 2013) with default parameters, and low-quality alignment regions were adjusted automatically. Maximum-likelihood trees were constructed using the default JTT model of protein evolution (Jones et al. 1992), and the most probable ancestral sequences were inferred. The ancestral sequences of each cluster (corresponding to each Ambler class) were carried forward to create their own ancestral sequences (“ancestral sequence of ancestral sequence”). Finally, we obtained the most probable ancestral sequence of each class (A-D).
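The clustering step can be pictured with a greedy scheme in the spirit of CD-HIT: each sequence joins the first cluster whose representative it matches at or above the identity cut-off, and otherwise founds a new cluster. This sketch is not the CD-HIT algorithm itself; it assumes pre-aligned, equal-length toy sequences and a naive identity measure, all of which are hypothetical.

```python
# Greedy identity-based clustering on toy, pre-aligned sequences.
# This is a simplified stand-in for a clustering tool, not its actual algorithm.

def percent_identity(a, b):
    """Naive identity over two equal-length aligned sequences; gap columns never match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)

def greedy_cluster(seqs, cutoff=30.0):
    """seqs maps name -> aligned sequence. Returns a list of clusters (lists of names).

    The first name in each cluster is its representative.
    """
    # Sequences with fewer gaps are considered first, so they become representatives.
    order = sorted(seqs, key=lambda name: seqs[name].count("-"))
    clusters = []
    for name in order:
        for cluster in clusters:
            if percent_identity(seqs[name], seqs[cluster[0]]) >= cutoff:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

toy = {
    "a1": "MKTAYIAKQR",
    "a2": "MKTAYLAKQR",
    "b1": "GHWPLDNSCV",
}
clusters = greedy_cluster(toy)
print(clusters)
```

With the toy input, `a1` and `a2` fall into one cluster and `b1` forms its own, mirroring how sequences with no detectable similarity end up in separate clusters before alignment and tree building.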

Homologous sequence identification of reconstructed ancestral sequences

The homologous sequences were screened from the four selected biological databases (mentioned in the Materials and methods section). Here, we compare the results for extant and ancestral homologues of class C enzymes. The blast search with extant class C β-lactamases identified a total of 2,468 significant homologous sequences with a functional β-lactamase domain (env_nr: 22, HMP metagenome: 9, HMP reference genome: 68, NCBI nr: 2,369; Table S1). We executed the same blast search for class A, B, and D extant enzymes (not all data are presented here, to simplify the results). In contrast, the blast search with class C ancestral sequences retrieved a total of 4,287 homologous sequences containing β-lactamase functional domains (env_nr: 113, HMP metagenome: 25, HMP reference genome: 75, NCBI nr: 4,074; listed in Table S1). All significant homologues of extant enzymes were also recovered by the ancestral sequence blast search. The ancestral search thus found more significant putative β-lactamases than the extant search for class C enzymes, which suggests that ancestral sequences are more sensitive for the detection of significant homologous sequences. Similarly, the reconstructed ancestral sequences of classes A, B and D identified a greater number of significant homologous sequences than the extant enzymes (data not shown).
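As a quick arithmetic check of the class C counts reported above, the per-database hits can be summed and compared. The numbers come from the text; the variable names are illustrative.

```python
# Per-database class C hit counts as reported in the text.
extant = {"env_nr": 22, "HMP metagenome": 9, "HMP reference genome": 68, "NCBI nr": 2369}
ancestral = {"env_nr": 113, "HMP metagenome": 25, "HMP reference genome": 75, "NCBI nr": 4074}

extant_total = sum(extant.values())
ancestral_total = sum(ancestral.values())

# Ratio of ancestral-query hits to extant-query hits, and the gain per database.
fold_increase = ancestral_total / extant_total
gain_per_db = {db: ancestral[db] - extant[db] for db in extant}

print(extant_total, ancestral_total, round(fold_increase, 2))
print(gain_per_db)
```

The totals match the reported 2,468 and 4,287, a ratio of about 1.74, with most of the additional hits coming from NCBI nr.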

Fig. S1. The phylogenetic tree of class A extant enzymes: sequences were aligned using MUSCLE, and poorly aligned regions were automatically trimmed with trimAl. The phylogenetic tree was built using FastTree, based on the approximate maximum-likelihood method, and visualized in FigTree with midpoint rooting and nodes in increasing order. Nodes are well supported with a high confidence value. A total of 126 sequences were selected (labeled in red) as representatives of the corresponding clades for the large-scale phylogenetic tree in Fig 2.

Fig. S2. The phylogenetic tree of class B extant enzymes: sequences were aligned using MUSCLE, and poorly aligned regions were automatically trimmed with trimAl. The phylogenetic tree was built using FastTree, based on the JTT model, and visualized in FigTree with midpoint rooting and nodes in increasing order. Nodes are well supported with a high confidence value. A total of 30 sequences were selected (labeled in red) as representatives of the corresponding clades for the large-scale phylogenetic tree in Fig 3.

Fig. S3. The phylogenetic tree of class C extant enzymes: sequences were aligned using MUSCLE, and poorly aligned regions were automatically trimmed with trimAl. The phylogenetic tree was built using FastTree, based on the JTT model, and visualized in FigTree with midpoint rooting and nodes in increasing order. Nodes are well supported with a high confidence value. A total of 27 sequences were selected (labeled in red) as representatives of the corresponding clades for the large-scale phylogenetic tree in Fig 4.

Fig. S4. The phylogenetic tree of class D extant enzymes: sequences were aligned using MUSCLE, and poorly aligned regions were automatically trimmed with trimAl. The phylogenetic tree was built using FastTree, based on the JTT model, and visualized in FigTree with midpoint rooting and nodes in increasing order. Nodes are well supported with a high confidence value. Representative sequences (labeled in red) of the corresponding clades were selected for the large-scale phylogenetic tree in Fig 5.

References:

Collins LJ, Poole AM, Penny D. 2003. Using ancestral sequences to uncover potential gene homologues. Applied Bioinformatics. 2:S85–95.
Cunningham CW, Omland KE, Oakley TH. 1998. Reconstructing ancestral character states: a critical reappraisal. Trends in Ecology & Evolution. 13:361–6.
Elias I, Tuller T. 2007. Reconstruction of ancestral genomic sequences using likelihood. Journal of Computational Biology. 14:216–37.
Huang Y, Niu B, Gao Y, Fu L, Li W. 2010. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 26:680–682.
Jones DT, Taylor WR, Thornton JM. 1992. The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences: CABIOS. 8:275–82.
Koshi JM, Goldstein RA. 1996. Probabilistic reconstruction of ancestral protein sequences. Journal of Molecular Evolution. 42:313–20.
Medeiros AA. 1997. Evolution and dissemination of beta-lactamases accelerated by generations of beta-lactam antibiotics. Clinical Infectious Diseases. 24:S19–S45.
Sharma V, Colson P, Giorgi R, Pontarotti P, Raoult D. 2014. DNA-dependent RNA polymerase detects hidden giant viruses in published databanks. Genome Biology and Evolution. 6:1603–10.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: Molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution. 30:2725–2729.
Wilke MS, Lovering AL, Strynadka NC. 2005. Beta-lactam antibiotic resistance: a current structural perspective. Current Opinion in Microbiology. 8:525–533.
Yang Z, Kumar S, Nei M. 1995. A new method of inference of ancestral nucleotide and amino acid sequences. Genetics. 141:1641–50.

Supplementary File S2-S5: Available online

Table S1: HMM profile search results and blastp results based on ancestral sequences and extant enzymes (http://www.mediterranee-infection.com/article.php?laref=922)

Chapter-2 (B) Metallo-β-lactamase enzymes in humans and archaea

Metallo-β-lactamase enzymes in humans and Archaea

Seydina M. Diene1**, Lucile Pinault2**, Vivek Keshri1-3, Nicholas Armstrong2, Saber Khelaifia3, Eric Chabrière1, Gustavo Caetano-Anolles4, Philippe Colson1-2, Bernard La Scola1-2, Jean-Marc Rolain1-2, Pierre Pontarotti1-5, Didier Raoult1-2*

1. Aix Marseille Université, MEPHI, IHU-Méditerranée Infection, Marseille, France.

2. Assistance Publique-Hôpitaux de Marseille (AP-HM), IHU-Méditerranée Infection, Marseille, France.

3. IHU-Méditerranée Infection, Marseille, France.

4. Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

5. CNRS, Marseille, France.

** These authors contributed equally to this manuscript.

* Corresponding author: Prof. Didier Raoult

Email: [email protected]

Phone: (+33) 4 13 73 24 01

Fax: (+33) 4 13 73 24 02

Keywords: Metallo-β-lactamase (MBL) superfamily proteins, archaea, human, horizontal gene transfer.

(Submitted to Nature Communications)

Abstract

Nonribosomal peptides and polyketide synthases are archaic assemblages, including antibiotics, of canonical amino acids and other molecules. β-lactamases are enzymes that destroy nonribosomal peptides, in particular β-lactams, the most widely used antibiotics. Because β-lactams were assumed to act exclusively on the bacterial cell wall, β-lactamase activity in other domains (i.e. archaeal, eukaryotic, and viral) has been neglected. Here, we identify and characterize a functional metallo-β-lactamase enzyme in a group of archaea, whose genes were apparently transferred to bacteria. Moreover, eighteen enzymes have been annotated as metallo-β-lactamases in humans. We found that human cells digested penicillin and that one expressed human enzyme cleaved penicillin. Finally, we identified enzymes with β-lactamase motifs in giant viruses. β-lactamases are therefore common, widely distributed and archaic, and have a much wider spectrum of activity than the bacterial cell wall alone. This opens the way to a reevaluation of the role of β-lactams and β-lactamases.

Nonribosomal peptides and polyketide synthases are assemblages of amino acids and other molecules that may reflect the pre-ribosomal synthesis of peptides. First identified in fungi and bacteria and then used as antibiotics or anticancer drugs, they can now be identified in most living organisms; even springtail hexapods secrete penicillin [1]. As part of our work on antibiotic resistance, we have previously demonstrated that ancient β-lactamases reconstituted from metagenomic datasets can be effective on current antibiotics such as penicillin [2]. More than 1,600 bacterial β-lactamases have now been described and classified into four molecular classes, labelled A, B, C and D [3]. While classes A, C, and D use a catalytically active serine residue to inactivate the β-lactam drug, class B (metallo-enzymes), which requires zinc as a cofactor for its catalytic activity, is the most heterogeneous class and exhibits highly conserved motifs in its catalytic sites. Class B enzymes belong to the large superfamily of metallo-β-lactamase fold proteins. This superfamily includes more than 34,000 proteins with diverse functions, including β-lactamases, flavoproteins, glyoxalase II, arylsulfatases, cyclases, alkylsulfatases, CMP-NeuAc hydroxylases, cAMP phosphodiesterases, DNA cross-link repair proteins, cleavage and polyadenylation specific factors, phosphonate metabolism proteins, ribonucleases and choline binding proteins [4-7]. Class B enzymes, which present an archaic fold, are widely distributed in nature and have been described in all domains of life, including bacteria, archaea, and eukaryotes [5]. Looking for active β-lactamases outside the bacterial world thus became a priority for us.

Results

Archaeal Class B metallo-β-lactamase. An archaeal β-lactamase appeared highly conserved in several classes of archaea including Archaeoglobi, Methanomicrobia, Methanobacteria, Thermococci, Methanococci, Thermoplasmata and Thermoprotei (Fig. 1a; Supplementary Table 1 and Supplementary Fig. 2). The presence of β-lactamases in archaea is surprising since peptidoglycan, the β-lactam target in bacteria, is absent in archaea 8, 9. To evaluate the activity of these archaeal enzymes, the protein from Methanosarcina barkeri (gi|851225341; 213 aa; 25.5 kDa) (Fig. 1 and Supplementary Table 1) was experimentally tested. As expected, this enzyme exhibited significant hydrolysis activity on nitrocefin (Fig. 1c and Supplementary Table 2), with kinetic parameters kcat = 18.2×10⁻³ s⁻¹, KM = 820 µM and a resulting kcat/KM = 22.19 s⁻¹.M⁻¹, and on penicillin G, which was completely degraded towards a single metabolite (Fig. 1d). Antibiotic susceptibility testing of a recombinant E. coli mutant containing this archaeal β-lactamase revealed a reduced susceptibility to penicillin (from 1 µg/ml to 4 µg/ml) (data not shown). Interestingly, these archaeal β-lactamases appear to be closely related to bacterial enzymes known as “GOB” (AF090141), which are fully functional in vivo and present in a single bacterial genus, namely Elizabethkingia 10, 11 (Fig. 1a and Supplementary Fig. 1). However, the MBL protein sequences of this bacterial genus show low similarity to those of archaea (less than 36%), which suggests an ancient horizontal gene transfer (HGT) from an archaic phylum to this bacterial group, which furthermore exhibited β-lactam hydrolysis activity, previously considered to be fairly atypical for a bacterium (Supplementary Table 3). Indeed, because archaea are naturally resistant to β-lactams, the role of these β-lactamases in these microorganisms remains to be clarified.

Human class B metallo-β-lactamase.
Metallo-β-lactamases in eukaryotic cells, including human cells, were annotated a long time ago 4. They appeared as part of polycistronic genes, and studies of their activities in humans have thus far mostly focused on their anti-ribonuclease activities, which contribute to the rapid degradation of unnecessary messenger RNAs, on DNA/RNA interacting mechanisms, or on biochemical mechanisms, especially in mitochondria 7, 12, 13. However, despite the fact that eighteen human metallo-β-lactamase enzymes have been reported in the literature 7, and that some of them (e.g., SNM1 enzymes) are active on chemotherapeutic agents such as cisplatin or mitomycin 7, their activities on β-lactam antibiotics have, to the best of our knowledge, never been investigated. Our analyses revealed that six of these 18 human MBLs (hMBLs) exhibited the conserved bacterial MBL motif “HxHxDH” and histidine residues (H196 and H263) (Supplementary Figs. 1 and 2). Two of these six hMBLs, LactB2, which acts as an endoribonuclease 13, and MBLAC2, which is involved in B-cell exosome biogenesis 14, were analyzed by three-dimensional (3D) comparison against the Phyre2 investigator database. For the MBLAC2 enzyme, this comparison revealed 100% confidence and more than 92% coverage with the crystal structure of the β-lactamase domain protein from Burkholderia ambifaria (Phyre2 ID: C5i0pB), while LactB2 is already characterized in this database as a human β-lactamase-like protein (Fig. 2a). In the phylogenetic tree analysis, these two enzymes appeared to be more closely related to archaeal enzymes than to those of bacteria (Supplementary Fig. 1). The functionality of human metallo-β-lactamases on β-lactam drugs appears to be a highly significant question, since penicillin has been described as being inactive against obligate intracellular bacteria such as Coxiella spp., Rickettsia spp., Anaplasma spp. and Ehrlichia spp. 15-17. This has been considered to be the consequence of either a limited uptake of the drug because of passive penetration through cell membranes or antibiotic degradation by cellular enzymes 15, 18. Furthermore, some β-lactams such as third-generation cephalosporins, which resist most β-lactamases, are more active on intracellular C. burnetii 19. In this context, we evaluated the degradation of penicillin G by THP1 and MRC5 human cell cultures. At a concentration of 10⁶ cells/mL, THP1 human cell lysates were able to significantly hydrolyze penicillin G (Supplementary Figs.
3e and 3f), nitrocefin, and ceftriaxone (Fig. 3). Moreover, of the six hMBLs with signatures similar to those of bacterial β-lactamases, two (LactB2 and MBLAC2) were selected as candidates for experimental tests. They were cloned and expressed in Escherichia coli and the purified proteins were tested. While no β-lactamase activity was detected for the LactB2 enzyme, β-lactam hydrolysis activity was detected for the MBLAC2 enzyme on nitrocefin (Fig. 2b) and penicillin G (Fig. 2c). However, the enzymatic characterization, with data fitting to the Michaelis-Menten equation (R² = 0.967), gave kinetic parameters kcat = 4.15×10⁻⁴ s⁻¹, KM = 370.5 µM and a resulting kcat/KM = 1.12 s⁻¹.M⁻¹ (Supplementary Table 2); optimizing the reaction conditions for these β-lactamases could therefore improve their hydrolysis activity in vitro.

Class B metallo-β-lactamases in giant viruses. The presence of genes annotated as β-lactamases was further investigated in the genomes of giant viruses 20. Their genomes contain homologs of bacterial, archaeal and eukaryotic genes. A gene product annotated as ‘β-lactamase’ was found for Klosneuvirus (ARF11780.1), Hokovirus (ARF10786.1), and Catovirus (ARF09172.1), which are relatives of mimivirus (proposed subfamily Klosneuvirinae) whose genomes were assembled from environmental metagenomes, as well as for Tupanvirus deep ocean (AUL78925.1) 21, a tail-harboring mimivirus, and two iridoviruses (ACF42261.1 and AHA80917.1) 22. These findings might further broaden the range of organisms harboring putative β-lactamases. In this study, we showed that some of these enzymes, probably extremely archaic, have retained part of their activity against β-lactams.

Discussion

The source and ancestrality of β-lactamase enzymes are important points to consider. Indeed, β-lactams, like most antibiotics, are part of the arsenal that microorganisms use in their struggle to master microbial ecosystems 23. Most of these antibiotics are metabolic products of nonribosomal peptide synthetases and polyketide synthetases, the only translation systems outside the ribosome which have structural motifs and which appear to be among the oldest in the living world 24, 25. Most living organisms have nonribosomal peptides, polyketide synthases, or hybrids 26, 27, which may be destroyed by metallo-β-lactamases. Furthermore, most antibiotics have a specific anti-ribosomal activity. Thus, it is possible that the selection of β-lactamases was either a very archaic or a contemporaneous selective response to the activity of β-lactams on the ribosome, to prevent the activity of these nonribosomal peptides. In any case, investigation of the putative ancestrality of β-lactamase motifs by Caetano-Anolles et al. shows that these are the oldest enzymatic motifs to have appeared on Earth 24. The existence of natural β-lactamases in eukaryota, bacteria and archaea highlights the fact that β-lactam medicines may have multiple roles as well as a broad activity that may target nonribosomal peptides from different organisms. These roles should be investigated, especially in humans, to determine whether modifications to human cell transcription can occur in the presence of β-lactam drugs. In conclusion, β-lactams and β-lactamases may be witnesses to an archaic battle that fits the Red Queen hypothesis 28, which describes an arms race between nonribosomal peptides (i.e. antibiotics), often directed against ribosomes, and the defenses that hydrolyze these peptides in order to neutralize them.


Methods

Sequence analysis: A common ancestral sequence of bacterial class B β-lactamases was inferred from a total of 174 sequences from the ARG-ANNOT database 29 using the approximate maximum-likelihood method and MEGA7 software 30 (Supplementary Fig. 4). This sequence was then used as the query in a BlastP search against the NCBI-nr archaea database. The approximate maximum-likelihood method was also used for the phylogenetic tree analyses performed with MEGA7 software. Protein alignment was first performed using MEGA7 software, and alignment gaps were removed using the trimAl program 31 to keep the conserved sequence domains and residues. The uncharacterized MBLAC2 (NP_981951) enzyme, reported as being involved in B-cell exosome biogenesis, and the LactB2 (Q53H82) enzyme, described as an endoribonuclease, were selected to investigate their putative hydrolysis activity on β-lactam antibiotics. The nucleotide sequences of the selected genes, including that from archaea (gi|851225341) and those from human cells (LactB2 and MBLAC2), were synthesized and cloned into the pET24a(+) expression plasmid. Moreover, β-lactam hydrolysis activity was investigated directly in two human cell lines: THP1, a human monocytic cell line, and MRC5, a human fibroblast cell line.

β-lactamase activity evaluation from human cells: To evaluate β-lactamase activity in human cells, 10⁶ cells/ml of THP1 cells were incubated in R10+2 medium with 2 µg/ml of ampicillin for three days. The supernatant and pellet of the THP1 cell culture were then deposited as spots on MH medium. A highly ampicillin-susceptible Streptococcus pneumoniae strain was plated on the MH medium and incubated for 24 hours. β-lactamase activity was evaluated by estimating the inhibition diameter of S. pneumoniae growth around the cell culture spot. Degradation of ampicillin was evaluated by the ability of the highly ampicillin-susceptible S. pneumoniae isolate to grow around the cell culture spot.
Spectrophotometry assay for detection of β-lactamase activity in human cells: THP1 and MRC5 cell pellets at a concentration of 10⁶ cells/ml were resuspended in 500 µL PBS buffer and flash-frozen in liquid nitrogen. After three freeze-thaw cycles, the cells were disrupted by three sonication steps (20 seconds, amplitude 50) performed on a Q700 sonicator system equipped with a Cup Horn (QSonica). Cell debris was discarded following a centrifugation step (15,000 g, 10 minutes). Degradation of the nitrocefin substrate was monitored using a Synergy HT microplate reader (BioTek, USA). Reactions were performed at 25°C in a 96-well plate in PBS buffer, 5% DMSO with a final volume of 100 µL for each well. The time course hydrolysis of nitrocefin (0.5 mM) was monitored for 30 minutes after adding 50 µL of cell lysate, following absorbance variations at 486 nm.

β-lactam degradation by human cell lysates monitored using MALDI-TOF MS: Previously obtained THP1 and MRC5 cell lysates were incubated with a solution of ceftriaxone and oxytetracycline at final concentrations of 2.5 mg/mL and 0.5 mg/mL, respectively. PBS buffer incubated with the same amounts of those molecules was used as a negative control. Oxytetracycline was used as a control for intensity normalization since this drug is not a substrate for β-lactamase enzymes. 1 µL of each mixture was deposited on a polished steel target after incubation times of 0 minutes, 90 minutes and 24 hours. Each spot was then covered with 1 µL of a saturated HCCA (Sigma Aldrich) matrix solution (α-cyano-4-hydroxy-cinnamic acid in 50% acetonitrile and 2.5% trifluoroacetic acid). Analysis was performed using an AutoFlex™ (Bruker Daltonik GmbH, Bremen, Germany) in the linear positive mode. The average spectra obtained resulted from the accumulation of between 9,000 and 12,000 spectra. Peak intensities were measured using flexAnalysis software (Bruker).

β-lactam antibiotic degradation monitored by liquid chromatography-mass spectrometry (LC-MS): Sample preparation: Water and acetonitrile solvents were ULC-MS grade (Biosolve). A penicillin G stock solution at 10 mg/mL was freshly prepared in water from the corresponding high purity sodium salt (Sigma Aldrich). A 1X phosphate-buffered saline (PBS) solution at pH 7.4 was prepared in water from a commercial salt mixture (bioMerieux). Pure enzyme (MBLAC2 and class B M. barkeri) solutions were buffer-exchanged in PBS and their concentration was adjusted to 1 mg/mL. 100 μL of each tested enzyme solution was then spiked with penicillin G at a final concentration of 10 μg/mL.
Several solutions were prepared in order to measure metabolites at 0, 0.5, 1, 2 and 3 hours of incubation at room temperature. Each time point corresponded to duplicate preparations. Penicillin G spiked into PBS was also prepared as a negative control.

Analysis of penicillin G and its metabolites by LC/MS: 100 μL of acetonitrile was added to each sample and tubes were then vortexed for 10 minutes at 16,000 g in order to precipitate proteins. The clear supernatant was collected for analysis using an Acquity I-Class chromatography system connected to a Vion IMS Qtof ion mobility-quadrupole-time of flight mass spectrometer. 5 μL of each sample stored at 4°C was injected into a reverse phase column (BEH C18 1.7 μm 2.1x100 mm) maintained at 50°C. Compounds were eluted at 0.5 mL/min using water and acetonitrile solvents with 0.1% formic acid. The following composition gradient was used: 10-70% acetonitrile within three minutes, a one-minute wash step at 95% acetonitrile, and a return to the initial composition for one minute. Compounds were ionized in the positive mode in a Zspray electrospray ion source with the following parameters: capillary 3 kV, cone 80 V, source/desolvation temperatures 120/450°C. Ions were then monitored with a High Definition MS(E) data independent acquisition method with the following settings: 50-1000 m/z, 0.1 s scan time, 6 eV low energy ion transfer, 20-40 eV high energy for collision induced dissociation of all ions (low/high energy alternate scans). Mass calibration was adjusted within each run using a lock mass correction (leucine enkephalin, 556.2766 m/z). 4D peaks were then collected from raw data using the UNIFI software (version 1.8, Waters). A list of known structures, including penicillin G and its metabolites 32, 33, was targeted with the following parameters: 0.1 minute retention time window, 5% collision cross section (CCS) tolerance, 5 ppm m/z tolerance on parent adducts (H+ and Na+) and 10 mDa m/z tolerance on predicted fragments. Retention times and CCS values were previously measured from penicillin G degradation experiments at pH 2 and pH 10.

Antibiotic susceptibility testing of archaeal strains: Antibiotic susceptibility testing was performed on 15 antibiotics including ampicillin, ampicillin/sulbactam, penicillin, piperacillin, piperacillin/tazobactam, cefoxitin, ceftriaxone, ceftazidime, imipenem, meropenem, aztreonam, gentamicin, ciprofloxacin, amikacin, and trimethoprim-sulfamethoxazole (I2a, SirScan Discs, France). A filtered aqueous solution of each antibiotic was prepared anaerobically in a sterilized Hungate tube at a concentration of 5 mg/ml. Subsequently, 0.1 ml of each of these solutions was added to a freshly inoculated culture tube containing 4.9 ml of the tested strain in order to obtain a final concentration of 100 µg/ml for each antibiotic tested.
The antibiotic mixture, to which archaeal cultures were added, was then incubated at 37°C and archaeal growth was observed after five and ten days’ incubation, depending on the tested strain. Control cultures without an antibiotic were also incubated in the same conditions to assess strain growth, and non-inoculated culture tubes were used as a negative control.
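The final antibiotic concentration quoted above follows from a simple dilution calculation (C1·V1 = C2·V2). The sketch below is only an illustrative check of that arithmetic; the function name is hypothetical and is not part of the original protocol.

```python
def final_concentration_ug_per_ml(stock_mg_per_ml, stock_volume_ml, culture_volume_ml):
    """Concentration after adding a stock aliquot to a culture (C1*V1 = C2*V2)."""
    total_volume_ml = stock_volume_ml + culture_volume_ml
    final_mg_per_ml = stock_mg_per_ml * stock_volume_ml / total_volume_ml
    return final_mg_per_ml * 1000.0  # convert mg/mL to µg/mL


if __name__ == "__main__":
    # Values stated in the protocol: 0.1 mL of a 5 mg/mL stock added to 4.9 mL of culture.
    print(final_concentration_ug_per_ml(5.0, 0.1, 4.9))
```

With the stated volumes, 0.5 mg of antibiotic ends up in 5 ml of culture, which corresponds to the 100 µg/ml given in the text.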

In vitro activity tests

Protein expression and purification: The selected genes encoding MBL enzymes from archaea and humans were optimized for protein expression in Escherichia coli and synthesized by GenScript (Piscataway, NJ, USA). The optimized genes were cloned into the pET24a(+) expression plasmid. Recombinant β-lactamases were expressed in E. coli BL21(DE3)-pGro7/GroEL (TaKaRa) using ZYP-5052 media. Each culture was grown at 37°C until it reached an OD600 nm of 0.8, followed by the addition of L-arabinose (0.2% m/v) and induction, with a temperature transition to 16°C over 20 hours. Cells were harvested by centrifugation (5000 g, 30 minutes, 4°C) and the resulting pellets were resuspended in wash buffer (50 mM Tris pH 8, 300 mM NaCl) and stored at -80°C overnight. Frozen cells were thawed and incubated on ice for one hour after adding lysozyme, DNAse I and PMSF (phenylmethylsulfonyl fluoride) to final concentrations of 0.25 mg/mL, 10 µg/mL and 0.1 mM, respectively. Partially lysed cells were disrupted by three consecutive cycles of sonication (30 seconds, amplitude 45) performed on a Q700 sonicator system (QSonica). Cell debris was discarded following a centrifugation step (10,000 g, 20 minutes, 4°C). Recombinant β-lactamases were purified using Strep-tag affinity chromatography (wash buffer: 50 mM Tris pH 8, 300 mM NaCl; elution buffer: 50 mM Tris pH 8, 300 mM NaCl, 2.5 mM desthiobiotin) on a 5 mL StrepTrap HP column (GE Healthcare). Fractions containing each protein of interest were pooled. Protein expression and purity were assessed using a 10% SDS-PAGE analysis (Coomassie stain). Protein concentrations were measured using a Nanodrop 2000c spectrophotometer (Thermo Scientific).

β-lactamase detection: The purified recombinant β-lactamases were submitted to a BBL™ Cefinase™ paper disc test 34 (Becton Dickinson). All protein samples were adjusted to a final concentration of 2 mg/mL. 15 µL of each recombinant β-lactamase was deposited onto a paper disc impregnated with nitrocefin and incubated at room temperature. 15 µL of proteins extracted from induced BL21(DE3)-pGro7/GroEL strains that did not contain any β-lactamase gene was used as a negative control.
When a change of color from yellow to red was noticeable within 30 minutes of incubation, corresponding to hydrolysis of the amide bond in the β-lactam ring of nitrocefin, we considered that the fraction tested contained an active β-lactamase enzyme. Degradation of the nitrocefin substrate was also monitored using a Synergy HT microplate reader (BioTek, USA). Reactions were performed at 25°C in a 96-well plate in PBS buffer, 5% DMSO with a final volume of 100 µL for each well. The time course hydrolysis of nitrocefin (0.5 mM) was monitored for 10 minutes after adding 50 µL of the previously prepared protein sample, following absorbance variations at 486 nm.

β-lactamase kinetic characterization: Kinetic assays were monitored using a Synergy HT microplate reader (BioTek, USA). Reactions were performed at 25°C in a 96-well plate (6.2 mm path length cell) in buffer containing 50 mM Tris pH 8, 300 mM NaCl and 5% DMSO, with a final volume of 100 µL for each well. The time course hydrolysis of nitrocefin (ε at 486 nm = 20,500 M⁻¹.cm⁻¹), with final concentrations varying between 0.05 and 1.5 mM, was monitored for 10 minutes following absorbance variations at 486 nm, corresponding to the appearance of a red product. Both enzymes were kept at a final concentration of 0.3 mg/mL for kinetic studies. For each substrate concentration, the initial velocity was evaluated by Gen5.1 software. The mean values obtained were fitted using the Michaelis-Menten equation in GraphPad Prism 5 software in order to determine catalytic parameters.
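To make the kinetic workflow above concrete, the following minimal sketch converts an absorbance slope into a rate with the Beer-Lambert law (using the extinction coefficient and 6.2 mm path length stated above), estimates Vmax and KM, and computes kcat/KM. It is not the authors' procedure: the study used Gen5.1 and a nonlinear fit in GraphPad Prism 5, whereas this sketch substitutes a Hanes-Woolf linearization so that it needs only the standard library. All function names and the synthetic velocities are assumptions made for illustration.

```python
def velocity_from_slope(abs_per_s, epsilon=20500.0, path_cm=0.62):
    """Beer-Lambert: absorbance slope (A486 per second) -> rate in M/s."""
    return abs_per_s / (epsilon * path_cm)


def fit_hanes_woolf(substrate, velocity):
    """Fit [S]/v = [S]/Vmax + KM/Vmax by ordinary least squares.

    Returns (Vmax, KM) in the units of the inputs.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(xs, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals KM / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax


def catalytic_efficiency(kcat, km):
    """kcat/KM in s^-1.M^-1 when kcat is in s^-1 and KM is in M."""
    return kcat / km


if __name__ == "__main__":
    # Hypothetical, noise-free initial velocities generated from assumed parameters.
    s_values = [0.05e-3, 0.1e-3, 0.25e-3, 0.5e-3, 1.0e-3, 1.5e-3]  # M
    assumed_vmax, assumed_km = 2.0e-7, 8.2e-4  # M/s and M, illustrative only
    v_values = [assumed_vmax * s / (assumed_km + s) for s in s_values]
    print(fit_hanes_woolf(s_values, v_values))
    # Values reported in the Results for the M. barkeri enzyme.
    print(catalytic_efficiency(18.2e-3, 820e-6))
```

With the values reported in the Results, kcat/KM evaluates to about 22.2 s⁻¹.M⁻¹ for the M. barkeri enzyme and about 1.12 s⁻¹.M⁻¹ for MBLAC2 (4.15×10⁻⁴ s⁻¹ divided by 370.5 µM), in agreement with the figures given there. A linearized fit weights experimental noise differently from a nonlinear fit, so on real data the two approaches can give slightly different estimates.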


References

1. Suring, W. et al. Evolutionary ecology of beta-lactam gene clusters in animals. Mol. Ecol. 26, 3217-3229 (2017).
2. Rascovan, N., Telke, A., Raoult, D., Rolain, J.M., & Desnues, C. Exploring divergent antibiotic resistance genes in ancient metagenomes and discovery of a novel beta-lactamase family. Environ. Microbiol. Rep. (2016).
3. Bush, K. The ABCD's of beta-lactamase nomenclature. J. Infect. Chemother. 19, 549-559 (2013).
4. Aravind, L. An evolutionary classification of the metallo-beta-lactamase fold proteins. In Silico Biol. 1, 69-91 (1999).
5. Daiyasu, H., Osaka, K., Ishino, Y., & Toh, H. Expansion of the zinc metallo-hydrolase family of the beta-lactamase fold. FEBS Lett. 503, 1-6 (2001).
6. Bebrone, C. Metallo-beta-lactamases (classification, activity, genetic organization, structure, zinc coordination) and their superfamily. Biochem. Pharmacol. 74, 1686-1701 (2007).
7. Pettinati, I., Brem, J., Lee, S.Y., McHugh, P.J., & Schofield, C.J. The Chemical Biology of Human Metallo-beta-Lactamase Fold Proteins. Trends Biochem. Sci. 41, 338-355 (2016).
8. Visweswaran, G.R., Dijkstra, B.W., & Kok, J. Murein and pseudomurein cell wall binding domains of bacteria and archaea--a comparative view. Appl. Microbiol. Biotechnol. 92, 921-928 (2011).
9. Dridi, B., Fardeau, M.L., Ollivier, B., Raoult, D., & Drancourt, M. The antimicrobial resistance pattern of cultured human methanogens reflects the unique phylogenetic position of archaea. J. Antimicrob. Chemother. 66, 2038-2044 (2011).
10. Opota, O. et al. Genome of the carbapenemase-producing clinical isolate Elizabethkingia miricola EM_CHUV and comparative genomics with Elizabethkingia meningoseptica and Elizabethkingia anophelis: evidence for intrinsic multidrug resistance trait of emerging pathogens. Int. J. Antimicrob. Agents 49, 93-97 (2017).
11. Horsfall, L.E. et al. Broad antibiotic resistance profile of the subclass B3 metallo-beta-lactamase GOB-1, a di-zinc enzyme. FEBS J. 278, 1252-1263 (2011).
12. Callebaut, I., Moshous, D., Mornon, J.P., & de Villartay, J.P. Metallo-beta-lactamase fold within nucleic acids processing enzymes: the beta-CASP family. Nucleic Acids Res. 30, 3592-3601 (2002).
13. Levy, S. et al. Identification of LACTB2, a metallo-beta-lactamase protein, as a human mitochondrial endoribonuclease. Nucleic Acids Res. 44, 1813-1832 (2016).
14. Buschow, S.I. et al. MHC class II-associated proteins in B-cell exosomes and potential functional implications for exosome biogenesis. Immunol. Cell Biol. 88, 851-856 (2010).
15. Raoult, D. Treatment of Q fever. Antimicrob. Agents Chemother. 37, 1733-1736 (1993).
16. Maurin, M., Bakken, J.S., & Dumler, J.S. Antibiotic susceptibilities of Anaplasma (Ehrlichia) phagocytophilum strains from various geographic areas in the United States. Antimicrob. Agents Chemother. 47, 413-415 (2003).
17. Rolain, J.M., Maurin, M., Vestris, G., & Raoult, D. In vitro susceptibilities of 27 rickettsiae to 13 antimicrobials. Antimicrob. Agents Chemother. 42, 1537-1541 (1998).
18. Maurin, M. & Raoult, D. Antibiotic penetration within Eukaryotic cells in Antimicrobial Agents and Intracellular Pathogens (ed. Raoult, D.) 21-37 (1993).
19. Torres, H. & Raoult, D. In vitro activities of ceftriaxone and fusidic acid against 13 isolates of Coxiella burnetii, determined using the Shell Vial Assay. Antimicrob. Agents Chemother. 37, 491-494 (1993).
20. Colson, P., La Scola, B., Levasseur, A., Caetano-Anolles, G., & Raoult, D. Mimivirus: leading the way in the discovery of giant viruses of amoebae. Nat. Rev. Microbiol. 15, 243-254 (2017).
21. Abrahao, J. et al. Tailed giant Tupanvirus possesses the most complete translational apparatus of the known virosphere. Nat. Commun. 9, 749 (2018).
22. Schulz, F. et al. Giant viruses with an expanded complement of translation system components. Science 356, 82-85 (2017).
23. Hibbing, M.E., Fuqua, C., Parsek, M.R., & Peterson, S.B. Bacterial competition: surviving and thriving in the microbial jungle. Nat. Rev. Microbiol. 8, 15-25 (2010).
24. Caetano-Anolles, G., Kim, K.M., & Caetano-Anolles, D. The phylogenomic roots of modern biochemistry: origins of proteins, cofactors and protein biosynthesis. J. Mol. Evol. 74, 1-34 (2012).
25. Caetano-Anolles, D., Kim, K.M., Mittenthal, J.E., & Caetano-Anolles, G. Proteome evolution and the metabolic origins of translation and cellular life. J. Mol. Evol. 72, 14-33 (2011).
26. Wang, H., Fewer, D.P., Holm, L., Rouhiainen, L., & Sivonen, K. Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes. Proc. Natl. Acad. Sci. U. S. A. 111, 9259-9264 (2014).
27. Tiburzi, F., Visca, P., & Imperi, F. Do nonribosomal peptide synthetases occur in higher eukaryotes? IUBMB Life 59, 730-733 (2007).
28. Van Valen, L. The new evolutionary law. Evolutionary Theory 1, 1-30 (1973).
29. Gupta, S.K. et al. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob. Agents Chemother. 58, 212-220 (2014).
30. Kumar, S., Stecher, G., & Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 33, 1870-1874 (2016).
31. Capella-Gutierrez, S., Silla-Martinez, J.M., & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973 (2009).
32. Deshpande, A.D., Baheti, K.G., & Chatterjee, N.R. Degradation of β-lactam antibiotics. Curr. Sci. 87, 1684-1695 (2004).
33. Aldeek, F. et al. Identification of Penicillin G Metabolites under Various Environmental Conditions Using UHPLC-MS/MS. J. Agric. Food Chem. 64, 6100-6107 (2016).
34. Qadri, S.M., Perryman, F., Flournoy, D.J., & de Silva, M.I. Detection of beta lactamase producing bacteria by Cefinase: comparison with three other methods. Zentralbl. Bakteriol. Mikrobiol. Hyg. A 255, 489-493 (1983).


Acknowledgements: Financial support from the IHU Méditerranée-Infection, Marseille, France and CookieTrad for English corrections of the manuscript are gratefully acknowledged.

Author contributions: D.R., P.P., conceived and designed the study. S.M.D., L.P., V.K., N.A, P.C, S.K., P.P., G.C-A., J.-M.R., B.L, D.R., analyzed and interpreted data. S.M.D., L.P., V.K., N.A, P.C, S.K., G.C-A., P.P., J.-M.R., B.L, and D.R., drafted the manuscript and/or critical revisions. All authors read and approved the final manuscript.

Funding: This work was supported by the French Government under the “Investments for the Future” program managed by the National Agency for Research (ANR), Méditerranée-Infection 10-IAHU-03.

Competing interests: The authors declare no conflict of interest.


Figure 1: Characterization of the archaeal class B MBL identified in Methanosarcina barkeri. (a): Phylogenetic tree of class B β-lactamases from archaea and bacteria; (b): Three-dimensional comparison of the M. barkeri class B MBL protein in the Phyre2 investigator database; (c): β-lactamase activity of the M. barkeri class B MBL enzyme on a chromogenic cephalosporin substrate (nitrocefin) using the BBL™ Cefinase™ paper disc test. Appearance of a red-colored product after 15 minutes indicates positive β-lactamase activity, while a persistent yellow color indicates a negative result; (d): LC/MS average relative response of screened metabolite compounds of penicillin G in the presence of the M. barkeri class B MBL enzyme, monitored for three hours. Penicillin G (in orange) refers to the intact form of the antibiotic, while penilloic acid (in purple) and penillic acid (in light blue) refer to the penicillin G metabolites. Penicillin G in PBS did not show any degradation towards any metabolite (data not shown).


Figure 2: Characterization of the two hMBL enzymes. (a): Three-dimensional comparison of the hMBL proteins in the Phyre2 investigator database; (b): Spectrophotometric assay of nitrocefin degradation in the presence of the MBLAC2 and LactB2 enzymes, showing hydrolysis activity for MBLAC2 whereas LactB2 exhibits no activity on this substrate; (c): LC/MS average relative response of screened metabolite compounds of penicillin G in the presence of the MBLAC2 hMBL enzyme, monitored for three hours. Penicillin G (in orange) refers to the intact form of the antibiotic, while penilloic acid (in purple) and penillic acid (in light blue) refer to the penicillin G metabolites. Penicillin G in PBS did not show any degradation towards any metabolite (data not shown).


Figure 3: β-lactamase activity detection from human cell cultures. (a): Nitrocefin solutions with added THP1 and MRC5 cell lysates or PBS (used as a negative control), incubated at 25°C and monitored for 30 minutes. Appearance of a red-colored product indicates positive β-lactamase activity, while a constant yellow color indicates a negative result; (b): Spectrophotometric assay of nitrocefin degradation in the presence of THP1 and MRC5 cell lysates for 30 minutes; (c): Peak intensity ratios of ceftriaxone/oxytetracycline monitored by MALDI-TOF MS for 24 hours in contact with human THP1 and MRC5 cell lysates. Oxytetracycline is used here as a control since this drug is not a substrate for β-lactamase enzymes.


Supplementary Figure 1: Approximate maximum-likelihood phylogenetic tree of representative MBL enzymes from all three domains of life, including bacteria (red), eukaryota (green) and archaea (blue). The enzymes indicated by arrows were experimentally tested for β-lactamase activity.


Supplementary Figure 2: Alignment of all MBL protein sequences analysed here, from bacteria (red), archaea (blue) and eukaryotes (green). The conserved residues/motifs are highlighted in yellow. The three sequences highlighted in grey are the enzymes that were synthesized and experimentally tested in this study.
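As an illustration of how the conserved “HxHxDH” motif mentioned in the Results could be located in a protein sequence, the minimal sketch below scans a sequence with a regular expression in which each “x” matches any residue. The function name and the toy sequence are hypothetical and are not taken from the study; for rows of a gapped alignment, gap characters would need to be removed first.

```python
import re

# "x" in HxHxDH stands for any residue, which maps to "." in a regular expression.
# The lookahead lets overlapping occurrences be reported as well.
MBL_MOTIF = re.compile(r"(?=H.H.DH)")


def find_mbl_motif(sequence):
    """Return the 1-based start position of every HxHxDH occurrence."""
    return [match.start() + 1 for match in MBL_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "MKTLAHAHGDHSGG"  # invented example, not a real MBL sequence
    print(find_mbl_motif(toy_sequence))
```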


Supplementary Figure 3: Evaluation of ampicillin degradation by human THP1 and MRC5 cell cultures. Antibiotic degradation is evaluated here by the growth of a highly ampicillin-susceptible S. pneumoniae strain around the spot containing a pellet or supernatant of human cells with 2 µg/ml of ampicillin. Antibiotic degradation is confirmed when the S. pneumoniae strain grows in contact with the spot, as shown in panels e and f after 24 hours.


Supplementary Figure 4: Phylogenetic tree of class B β-lactamases: The phylogenetic tree was inferred using the approximate Maximum Likelihood method under the JTT matrix-based model. The analysis involved 174 amino acid sequences from class B β-lactamases. Evolutionary analysis was conducted in FastTree and visualized in FigTree.


Supplementary Table 1: BLAST hits of the ancestral β-lactamase sequence against the NCBI Archaea database

Query id (all rows): Ancestor class B β-lactamase

NCBI hit gi; Organism; Size (bp); Description; % Identity; E-value; % Q_Cov
gi|564600023; Methanolobus tindarius; 639; Zn-dependent hydrolase; 34.62; 5.26E-09; 51.17
gi|503410340; Methanobacterium lacus; 645; Hydrolase; 34.23; 0.018; 57.75
gi|504218036; Methanocella conradii; 624; Hydrolase glyoxylase; 33.58; 1.17E-06; 53.99
gi|145282681; Pyrobaculum arsenaticum DSM 13514; 621; Beta-lactamase domain protein; 33.33; 2.14E-07; 52.58
gi|375160102; Pyrobaculum oguniense TE7; 621; Zn-dependent hydrolase glyoxylase; 32.52; 8.80E-05; 52.58
gi|2495897; Methanocaldococcus jannaschii SM2661; 618; Probable metallo-hydrolase MJ0296; 32.17; 1.76E-04; 50.70
gi|1008838121; Thermoplasmatales archaeon SM1-50; 651; Hypothetical protein; 32.14; 0.28; 54.46
gi|82617404; Uncultured archaeon; 645; Hypothetical protein; 32.12; 1.82E-05; 53.05
gi|1008853468; Thermoplasmatales archaeon DG-70; 606; Hypothetical protein; 31.78; 0.011; 51.17
gi|851372756; Methanocaldococcus bathoardescens; 567; MBL fold metallo-hydrolase; 31.58; 1.05E-04; 50.70
gi|851225341; Methanosarcina barkeri; 641; MBL fold hydrolase; 31.54; 4.88E-05; 51.17
gi|851312700; Methanosarcina barkeri; 641; MBL fold hydrolase; 31.54; 2.85E-05; 51.17
gi|490731539; Methanocaldococcus villosus; 540; Hypothetical protein; 31.53; 1.87E-05; 50.70
gi|1001921197; Methanolobus sp. T82-4; 672; Hypothetical protein AWU59_1880; 31.39; 2.10E-06; 53.99
gi|505138611; Methanomethylovorans hollandica; 645; Zn-dependent hydrolase glyoxylase; 31.39; 6.56E-08; 54.46
gi|501690724; Methanosphaerula palustris; 696; MBL fold hydrolase; 31.37; 0.002; 56.34
gi|18160175; Pyrobaculum aerophilum str. IM2; 624; Possibly metallo-beta-lactamase superfamily; 30.89; 2.21E-06; 52.58
gi|494814289; Candidatus Nitrosoarchaeum koreensis; 611; Zn-dependent hydrolase; 30.83; 0.029; 54.46
gi|504866623; Methanolobus psychrophilus; 641; MBL fold hydrolase; 30.77; 1.73E-05; 51.17
gi|816389003; Lokiarchaeum sp. GC14_75; 648; Metallo-beta-lactamase L1; 30.77; 3.64E-08; 59.62
gi|851262085; Methanosarcina horonobensis; 641; MBL fold hydrolase; 30.77; 1.56E-04; 51.17
gi|502745672; Methanocaldococcus sp. FS406-22; 558; MBL fold metallo-hydrolase; 30.70; 5.98E-05; 50.70
gi|170934313; Pyrobaculum neutrophilum V24Sta; 615; Beta-lactamase domain protein; 30.65; 1.60E-06; 53.05
gi|757124828; Thermococcus paralvinellae; 717; Zn-dependent hydrolase; 30.61; 0.002; 55.87
gi|756792592; Candidatus Nitrosopumilus piranensis; 1404; Rhodanese domain-containing protein; 30.56; 2.45E-05; 63.38
gi|851287001; Palaeococcus ferrophilus; 609; Glyoxalase; 30.51; 8.05E-04; 50.70
gi|973113610; Methanocalculus sp. 52_23; 603; Beta-lactamase domain protein; 30.40; 8.24E-06; 51.17
gi|973162189; Methanomicrobiales archaeon 53_19; 603; Beta-lactamase domain protein; 30.40; 6.11E-06; 51.17
gi|524837456; Methanoculleus sp. CAG:1088; 641; Putative uncharacterized protein; 30.37; 0.035; 50.70
gi|851257745; Methanosarcina; 600; Hypothetical protein; 30.37; 0.13; 59.62
gi|700303882; Thermococcus eurythermalis; 681; Hydrolase; 30.28; 2.08E-04; 52.11
gi|735015437; Archaeon GW2011_AR11; 714; Beta-lactamase protein; 30.25; 0.23; 50.70
gi|15623131; Sulfolobus tokodaii str. 7; 600; Putative hydrolase; 30.23; 7.15E-06; 54.46
gi|500271928; Metallosphaera sedula; 606; MBL fold metallo-hydrolase; 30.23; 6.05E-04; 53.52
gi|919520712; Sulfolobus tokodaii; 597; Hypothetical protein; 30.23; 6.99E-06; 54.46
gi|851163782; Geoglobus acetivorans; 615; Zn-dependent hydrolase; 30.17; 1.29E-04; 50.70
gi|851219309; Candidatus Methanoplasma termitum; 576; MBL fold metallo-hydrolase; 30.08; 0.032; 52.58
gi|494104154; Methanotorris formicicus; 618; Hypothetical protein; 30.08; 4.44E-06; 53.05
gi|329138039; Candidatus Nitrosoarchaeum limnia; 1398; Rhodanese domain-containing protein; 30.07; 6.43E-07; 63.38
gi|500766631; Methanoregula boonei; 696; MBL fold metallo-hydrolase; 30.07; 0.055; 56.34


Supplementary Table 2: Kinetic values of enzymatic activity of described β-lactamases compared to the archaeal class B β-lactamase

Substrate: nitrocefin*

Enzyme; kcat (s-1); Km (µM); kcat/Km (M-1 s-1)

MBL subclass B1:
BcII; 45; 70; 6.4E+05
CcrA; 200; 20; 1E+07
BlaB; 20; 70; 2.8E+05

Acquired B1:
IMP-1; 30; 30; 2E+06
VIM-1; 100; 20; 5E+06
SPM-1; 0.05; 4; 1.2E+05
GIM-1; 6; 10; 6E+05

MBL subclass B2:
CphA; 0.003; 1200; 2.5
ImiS; 0.06; 20; 3E+03

MBL subclass B3:
L1; 20; 10; 2E+06
FEZ-1; 90; 100; 9E+05
GOB-1; -; -; -
Mbl1b; 1800; 100; 1.8E+07

Methanosarcina barkeri MBL class B; 0.018; 820.5; 22.19
Human metallo-β-lactamase MBLAC2; 0.0004; 370.5; 1.12

* These values are obtained from Bebrone et al.6
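The catalytic efficiency column in the table above follows from the other two columns once Km is converted from µM to M. The short Python sketch below illustrates that arithmetic for two rows of the table; the function name `catalytic_efficiency` is an assumption introduced here for illustration and is not part of the original analysis.

```python
def catalytic_efficiency(kcat_per_s: float, km_micromolar: float) -> float:
    """Return kcat/Km in M^-1 s^-1 from kcat (s^-1) and Km (µM)."""
    if km_micromolar <= 0:
        raise ValueError("Km must be positive")
    km_molar = km_micromolar * 1e-6  # convert µM to M
    return kcat_per_s / km_molar


# kcat and Km values copied from Supplementary Table 2.
bcii = catalytic_efficiency(45, 70)              # roughly 6.4E+05
m_barkeri = catalytic_efficiency(0.018, 820.5)   # roughly 22
```

Small differences from the tabulated values (for example about 21.9 versus 22.19 for the M. barkeri enzyme) are to be expected if the published figures were computed from unrounded kcat and Km values.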


Supplementary Table 3: Antibiotic resistance pattern of Methanosarcina compared to Elizabethkingia GOB-15

Antibiotic; M. barkeri class B; Elizabethkingia GOB-13*

β-lactams:
Ampicillin; R; R
Ampicillin/sulbactam; R; R
Penicillin; R; R
Piperacillin; R; R
Piperacillin/tazobactam; R; R
Cefoxitin; R; R
Ceftriaxone; R; R
Ceftazidime; R; R
Imipenem; R; R
Meropenem; R; R
Aztreonam; R; R

Non-β-lactams:
Gentamicin; R; S
Ciprofloxacin; R; R
Amikacin; R; S
Trimethoprim-sulfamethoxazole; R; R

Nd - not determined. R, resistance; S, Susceptible. * From Opota et al. 10


Supplementary Table 4: Beta-lactamase sequences identified in Giant virus from the NCBI database

NCBI accession number; Giant virus; Size (aa); Predicted function
ARF11780; Klosneuvirus KNV1; 291; Beta-lactamase superfamily domain
ARF10786; Hokovirus HKV1; 586; Metallo-beta-lactamase superfamily protein
ARF09172; Catovirus CTV1; 294; Beta-lactamase superfamily domain
AUL78925; Tupanvirus deep ocean; 322; Beta-lactamase superfamily domain
ACF42261; Soft-shelled turtle iridovirus; 116; Putative hydrolase of the metallo-beta-lactamase
AHA80917; Chinese giant salamander iridovirus; 108; Putative metallo-beta-lactamase hydrolase


Chapter-3

Functional convergence of antibiotic resistance in β-lactamase enzymes is not conferred by simple convergent amino acid substitution



Vivek Keshri1,2, Kevin Arbuckle3,4, Jean-Marc Rolain2, Didier Raoult2, Pierre Pontarotti1,5,*

1. Aix-Marseille Université, I2M, UMR-CNRS 7373, Evolution Biologique et Modélisation, Marseille, France
2. Aix-Marseille Université, UM63, CNRS 7278, IRD 198, INSERM 1095, IHU - Méditerranée Infection, 19-21 Boulevard Jean Moulin, Marseille, France
3. Department of Evolution, Ecology and Behaviour, Biosciences Building, Crown Street, University of Liverpool, L69 7ZB, UK
4. Department of Biosciences, College of Science, Swansea University, Swansea, SA2 8PP, UK
5. Aix-Marseille Université, CNRS, IRD, APHM, MEPHI, IHU Méditerranée Infection (Evolutionary Biology team), Marseille, France

*Corresponding author: Pierre Pontarotti, e-mail: [email protected]

(to be submitted to PNAS)


Abstract

Bacterial resistance to antibiotics is a serious medical and public health concern globally. Such resistance is conferred by a variety of mechanisms, and extensive variability in levels of resistance across bacteria is a common finding. Understanding the evolutionary processes governing this functional variation in antibiotic resistance is important, as it may allow the development of appropriate strategies to improve treatment options for bacterial infections. The main objective of this study was to examine the functional evolution of β-lactamases, a common enzymatic resistance mechanism that inactivates the widely used β-lactam class of antibiotics. We first obtained β-lactamase protein sequences and Minimum Inhibitory Concentration (MIC) values, a measure of functional activity against antibiotics, from previously published literature. We then used a molecular phylogenetic framework to examine the evolution of β-lactamase functional activity. We find that the functional activity of β-lactamase-mediated antibiotic resistance evolved convergently within the molecular classes, but that this is not associated with any single amino acid substitution. This suggests that the dynamics of convergent evolution in this system vary between functional and molecular (sequence) levels. Such dissociation may hamper bioinformatics approaches to the determination of antibiotic resistance and underscores the need for (less efficient but more effective) activity assays as an essential step in evaluating resistance in a given case.

Keywords: β-lactamase, β-lactam antibiotics, convergent evolution, antibiotic resistance

Introduction

Antibiotics are medicines that fight bacterial infections. Antibiotic resistance, in which bacteria become able to withstand the effects of antibiotics, is a growing problem. Antibiotic resistance genes spread in the environment via different sources and mechanisms. For instance, humans use a range of antibiotics in agriculture and fish farming as standard prophylactic measures, which leads to the widespread evolution of antibiotic resistance genes (1). One common enzyme-based resistance mechanism used by several groups of bacteria is the production of β-lactamase enzymes, which inactivate β-lactam antibiotics. There are over 1300 natural β-lactamases and, according to the Ambler molecular classification scheme, these enzymes are categorized into four classes (A, B, C, and D) based on amino acid sequence motifs (2). The active site of class A, C and D enzymes contains a serine residue, while that of class B contains histidine along with a zinc ion (3). Class B (metallo-β-lactamase) enzymes are further divided into three subclasses, B1, B2, and B3 (3, 4). Based on substrate and inhibitor profiles, β-lactamases are also categorized into functional subgroups such as 2a, 2b, 2be, and 2c (5–8). Functional class B β-lactamases first arose more than two billion years ago, suggesting that resistance to antibiotics predates their use in medicine (9). The functional classification of β-lactamases and its correlation with their distinct molecular structures has been described by Bush et al. (10), and the functional evolution of the class B β-lactamase superfamily has been examined by Alderson et al. (11). However, Alderson et al. (11) were unable to demonstrate unambiguously whether functional class B enzymes have one or two separate evolutionary origins (Figure S7) (11).

The antibiotic resistance conferred by β-lactamases spreads quickly (for example, through horizontal gene transfer); understanding the evolution of this resistance may therefore help inform strategies for managing bacterial infections in humans, livestock, or other animals before they develop into a major epidemiological event. The number of known β-lactamase enzymes is steadily increasing, but their functional evolution remains unclear.

Convergent evolution refers to similarity that has evolved independently rather than being inherited from a common ancestor. “Convergence” is often used in a very general sense, but can apply to several types of characteristics or levels, such as function and structure (12, 13). Several enzymes have evolved the ability to catalyze the same reactions on separate occasions, such as those involved in the hydrolysis of peptide bonds. The structure of a protein’s active site(s) determines its biochemical function, but enzymes within the same structural class may have different roles or varying levels of activity. Doolittle (12) has described convergence as a common phenomenon. Functional convergent evolution occurs when a molecular functional activity arises independently on more than one occasion. For example, the hydrolysis of peptide bonds has evolved many times, and serine proteases have evolved at least three times (12).

The terms “iso-convergent” and “allo-convergent” can be applied to the study of convergent evolution to gain a finer-scale understanding of the dynamics of convergence (14). Iso-convergent traits have converged from the same ancestral state (traditionally ‘parallel evolution’), whereas allo-convergent traits have converged from different ancestral states. The allo-convergent evolution of enzyme function can be seen in two distinct but sometimes joint effects: (i) non-homologous enzymes deliver the same transformation, as expressed by the same four-digit Enzyme Commission (EC) number; (ii) the same (four-digit EC) or a related (same three-digit EC) enzyme transformation is effected by a similar disposition of residues in the active site. These two situations are not exclusive, because two enzymes are assigned to both classes if they perform exactly the same overall reaction with the same mechanism (12, 15).

Independent evolution of the same function, most often using different mechanisms but occasionally using different catalytic machineries for essentially the same mechanism, is well documented for proteins from different non-homologous enzyme families (15). However, this phenomenon is rarer within homologous superfamilies. Iso-convergence has been described in another protein family, the HAD (haloacid dehalogenase) superfamily (16). Burroughs et al. inferred the evolutionary history of the HAD superfamily and showed that it contains 33 major families distributed across the three domains of life, and that its archaeal, bacterial and eukaryotic members diversified from a common HAD superfamily ancestor.

The present study concentrates on functional convergence, the independent evolution of a given molecular function, in this case the magnitude of β-lactamase activity. We aim to investigate whether, and to what extent, convergent functional evolution occurs within a β-lactamase family/class. We use the Minimum Inhibitory Concentration (MIC) as a measure of the functional activity of β-lactamase enzymes; its determination is a standard technique in diagnostic laboratories for assessing the susceptibility of organisms to antimicrobials (17). The MIC is the lowest concentration of an antimicrobial that inhibits the visible growth of a microorganism after overnight incubation. Hence, higher MICs correspond to greater resistance to that antibiotic.
A few studies have reported convergent evolution in β-lactamases, but these were based on similarity of molecular sequences and of the structure of the active site of the proteins rather than on functional antibiotic resistance, and were typically small-scale studies using a few enzymes (18, 19). Moreover, the phylogenetic relationships within the molecular classes of β-lactamases remain poorly understood. Therefore, the current study focuses on convergent functional evolution within each superfamily/class.
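The MIC definition used in this Introduction (the lowest antimicrobial concentration that inhibits visible growth) can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the readings are assumptions made for illustration only; they are not taken from the study or from any laboratory protocol.

```python
def minimum_inhibitory_concentration(visible_growth):
    """Return the lowest tested concentration without visible growth.

    visible_growth maps each tested concentration (same unit throughout)
    to True if growth was visible after incubation, otherwise False.
    Returns None when growth was visible at every tested concentration.
    """
    inhibitory = [conc for conc, grew in visible_growth.items() if not grew]
    return min(inhibitory) if inhibitory else None


# Illustrative two-fold dilution series (made-up readings, not study data).
readings = {0.5: True, 1.0: True, 2.0: True, 4.0: False, 8.0: False}
mic = minimum_inhibitory_concentration(readings)  # 4.0
```

A higher returned value means that more antibiotic was needed to stop growth, which is why higher MICs are read as greater resistance.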


Material and Methods

Data collection

The β-lactamase amino acid sequences were retrieved from the Antibiotic Resistance Gene-ANNOTation (ARG-ANNOT) database (20). A total of 1155 sequences were recovered, belonging to four molecular classes (A-D). All sequences were checked in the NCBI PubMed database (http://www.ncbi.nlm.nih.gov/pubmed/) to obtain the publication in which the sequence was described, from which we collected Minimum Inhibitory Concentration (MIC) values of the respective β-lactamase enzymes and β-lactam antibiotics, whenever this information was available. Two types of MIC value were retrieved, clinical (acquired β-lactamase) and microbiological (reference, non-acquired β-lactamase), and we then calculated the fold-change between them. A total of 218 sequences with both clinical and microbiological MICs against different β-lactam antibiotics were selected. The remaining sequences were discarded from the analysis because they had been published on the basis of sequence homology alone, had been characterized only by kinetic parameters, or lacked reported clinical or microbiological MIC values.
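The fold-change step described above can be sketched in Python as follows. The text does not state the direction of the ratio, so this sketch assumes clinical MIC divided by microbiological (reference) MIC; the function names, enzyme labels and MIC values are hypothetical and serve only as an illustration.

```python
def mic_fold_change(clinical_mic, reference_mic):
    """Fold-change of the clinical MIC relative to the reference MIC.

    Assumption of this sketch: ratio = clinical / reference, both in the
    same unit (for example µg/mL).
    """
    if clinical_mic <= 0 or reference_mic <= 0:
        raise ValueError("MIC values must be positive")
    return clinical_mic / reference_mic


def fold_changes(records):
    """Compute fold-changes, skipping enzymes that lack either MIC value."""
    result = {}
    for name, (clinical, reference) in records.items():
        if clinical is None or reference is None:
            continue  # mirrors discarding sequences without both MICs
        result[name] = mic_fold_change(clinical, reference)
    return result


# Hypothetical records: name -> (clinical MIC, microbiological MIC).
example = {
    "enzyme_1": (64.0, 2.0),
    "enzyme_2": (None, 0.5),
    "enzyme_3": (0.25, 0.5),
}
changes = fold_changes(example)  # {'enzyme_1': 32.0, 'enzyme_3': 0.5}
```

Under this assumed direction, values above 1 indicate a higher MIC in the clinical (acquired) setting than in the reference setting, and values below 1 indicate the opposite.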

Multiple sequence alignment and phylogenetic inference

Multiple sequence alignment and phylogenetic tree reconstruction were conducted in MEGA 6.05 (21). Sequences were aligned using the MUSCLE algorithm (22) with default settings. The phylogenetic trees were constructed with the maximum-likelihood method under a WAG+G+I amino acid substitution model (estimated to be the best substitution model by maximum-likelihood model comparison, also in MEGA 6.05). Phylogenetic trees were midpoint rooted, and branch lengths were converted to relative time (with the total length of the tree equal to 1) using maximum penalised likelihood as implemented in the chronos function in the ape package of R 3.3.0 (23). We refer to such trees as ‘dated’ in this paper for convenience, while acknowledging that no absolute dates were estimated. The chronos function allows different types of clock models, including strict, correlated and relaxed clocks, and discrete-rate models. We fitted each of these models, including discrete-rate models with the number of rates ranging from a single rate for the tree (equivalent to a strict clock model) to a separate rate for each branch in the tree. In all cases, the relaxed clock model provided the best fit, as indicated by the highest penalized log-likelihood, and so this was used for the dating of all the trees herein.
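Midpoint rooting, as mentioned above, places the root halfway along the longest tip-to-tip path of the tree. The following Python sketch illustrates that idea on a toy tree. It is not the implementation used by MEGA; the edge-list representation, the function name `midpoint`, the node labels and the branch lengths are all assumptions made for illustration.

```python
from collections import defaultdict


def midpoint(edges):
    """Locate the midpoint of the longest tip-to-tip path of an unrooted tree.

    edges: iterable of (node_a, node_b, branch_length).
    Returns (u, v, offset): the root lies on edge u-v, `offset` away from u.
    """
    adjacency = defaultdict(dict)
    for a, b, length in edges:
        adjacency[a][b] = length
        adjacency[b][a] = length
    tips = [node for node, nbrs in adjacency.items() if len(nbrs) == 1]

    def paths_from(start):
        # Depth-first walk recording the path and distance to every node.
        found = {start: ([start], 0.0)}
        stack = [start]
        while stack:
            node = stack.pop()
            path, dist = found[node]
            for nbr, length in adjacency[node].items():
                if nbr not in found:
                    found[nbr] = (path + [nbr], dist + length)
                    stack.append(nbr)
        return found

    # Find the pair of tips separated by the greatest path length.
    best_path, best_dist = None, -1.0
    for tip in tips:
        reach = paths_from(tip)
        for other in tips:
            path, dist = reach[other]
            if dist > best_dist:
                best_path, best_dist = path, dist

    # Walk along that path until half of its length has been covered.
    half, walked = best_dist / 2.0, 0.0
    for u, v in zip(best_path, best_path[1:]):
        length = adjacency[u][v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    return None


# Toy tree with tips A, B, C and internal nodes n1, n2 (made-up lengths).
example_edges = [("A", "n1", 1.0), ("B", "n1", 1.0),
                 ("n1", "n2", 1.0), ("C", "n2", 4.0)]
root_position = midpoint(example_edges)  # ('n2', 'C', 1.0)
```

In the toy tree the longest path runs from A to C (length 6), so the root is placed 3 units from A, which falls on the branch leading to C, 1 unit away from node n2.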

Identifying convergent evolution

We estimated the ancestral functional activity over the tree using maximum likelihood (ML) estimation via the contMap function in the phytools package (24). Before conducting the analysis, we log-transformed the functional activity (MIC fold-change) values, since the raw values varied over orders of magnitude, which can cause ancestral state estimation to perform poorly. Our results as presented are therefore based on estimated ancestral values of log (MIC fold-change) rather than the raw activity values. A Python script (provided in Supplementary Material) was used to identify amino acid residues that are potentially responsible for functional convergence. The script first detects signature amino acid residues within the antibiotic-resistant and antibiotic-susceptible enzymes, and then identifies the residues that differ between the resistant and susceptible groups.
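The two-step residue comparison described above (signature residues within each group, then differences between groups) can be sketched as follows. This is not the script from the Supplementary Material, which is not reproduced here; it is a minimal illustration that assumes a "signature" residue is one shared by every aligned sequence in a group. The function name and the toy sequences are hypothetical.

```python
def signature_differences(resistant, susceptible, gap="-"):
    """Find alignment columns conserved within each group but different between them.

    Both arguments are lists of aligned sequences of equal length.
    Returns a list of (column_index, resistant_residue, susceptible_residue),
    with 0-based column indices.
    """
    if not resistant or not susceptible:
        raise ValueError("both groups need at least one sequence")
    length = len(resistant[0])
    if any(len(seq) != length for seq in resistant + susceptible):
        raise ValueError("sequences must come from the same alignment")

    hits = []
    for col in range(length):
        res_column = {seq[col] for seq in resistant}
        sus_column = {seq[col] for seq in susceptible}
        # Step 1: a signature residue is shared by every member of a group.
        if len(res_column) == 1 and len(sus_column) == 1:
            res_aa, sus_aa = res_column.pop(), sus_column.pop()
            # Step 2: keep columns where the two signatures differ.
            if res_aa != sus_aa and gap not in (res_aa, sus_aa):
                hits.append((col, res_aa, sus_aa))
    return hits


# Toy aligned sequences (made up for illustration, not real enzymes).
resistant_group = ["MKSLA", "MKSLG"]
susceptible_group = ["MRSLT", "MRSLT"]
candidates = signature_differences(resistant_group, susceptible_group)
# candidates == [(1, 'K', 'R')]
```

Column 4 is not reported in the toy example because the resistant group is not conserved there (A versus G), even though both residues differ from the susceptible T.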

Results and Discussion

Ancestral state estimation

Based on our ML estimation of ancestral enzyme activity, we observed convergent evolution of antibiotic resistance in each of the molecular classes A-D, which we discuss in detail for each class below.

Class A

The phylogenetic tree of class A β-lactamases shows evidence for convergent evolution of function with respect to the antibiotics cefepime (Figure 1) and cephalothin (Figure 2). The ancestral activity was estimated as moderately resistant (green color) to cefepime and cephalothin; however, this could in part be due to the tendency of methods for estimating ancestral states from continuous data to return near-average values at the root unless there is strong evidence against this. During evolution, some enzymes (VEB-1; GES-2, 18; TEM-134, 30, 141; SHV-72, 48, 49; AST-1, KPC-4, SME-1, BIC-1 and RAHN-2 for cefepime, and TEM-125, 187, 30, 59 and SHV-72 for cephalothin) shifted from moderate resistance to susceptibility. On the other hand, the activity of CTX-M-19, 8, 40, 72; TEM-149, 131, 87, 71-72 and CRAB-10 against cefepime has not changed, and these enzymes show a functional affinity (moderate resistance to cefepime) similar to that of their ancestors. Similarly, in the case of cephalothin, the CTX-M, GES and PER groups of enzymes have a level of activity similar to that of their ancestor. Phylogenetically, these enzymes are distributed in distinct clades, which indicates independent functional evolution. Most of the modern enzymes show moderate resistance (green color) to cephalothin (Figure 2). CTX-M-32, 33 and SFO-1 show strong resistance to cefepime, and this resistance appears to have evolved independently in each of those three enzymes (Figure 1). SED-1 also shows a high level of resistance to cephalothin, and smaller increases in resistance have independently evolved in additional lineages.

Class B

Metallo-β-lactamase enzymes show evidence for convergent evolution of both high levels of susceptibility and resistance to the antibiotic piperacillin (Figure 3). For instance, the GOB, B, TUS, MUS and IMP groups (except IMP-44) and VIM-7, 13, 34, 31, 15, 18, 24 and 16 show increased susceptibility to piperacillin (decreasing functional affinity), while the NDM-4, VIM-38, 26 and IMP-44 enzymes have independently evolved to be moderately resistant (green) to piperacillin. The IMIH enzyme is highly resistant (red) to piperacillin.

The estimated ancestral state shows resistance to imipenem (Figure 4), and this activity is retained in some enzymes, such as IMP-10, IND-3 and VIM-19, 38, 26. By contrast, the functional affinity of the B group (B-1, 3, 9-13) differs markedly from that of its most recent ancestor, and these enzymes show lower functional affinity (susceptibility) to imipenem. The IMIH β-lactamase shows strong resistance (strong functional affinity) to imipenem, indicated by the red branch color, compared with the ancestral state.

Convergent evolution is also apparent for meropenem (Figure 5). The ancestral activity appears weakly susceptible to meropenem, while modern enzymes such as GOB, B, and VIM-16, 24, 18, 15, 31, 13 and 7 are highly susceptible (blue color indicates a lower MIC fold-change). The IMIH enzymes show high resistance to meropenem, while VIM-26, 38, IMP-44 and NDM-4 are less resistant than IMIH. These enzymes are distributed in different clades of the tree, which indicates independent evolution of their functions.

The functional evolution of subclasses B1 and B3 was also estimated separately. Subclass B3 (Figures S4-S6) shows no functional convergence, whereas subclass B1 shows functional convergence for imipenem, meropenem and piperacillin (Figures S1-S3). Increasing resistance was estimated in the NDM and IMP enzymes with respect to imipenem and meropenem. For piperacillin, BlaB, MUS and IMP show lower activity than the moderately resistant ancestral state, while DIM, NDM-4 and VIM show strong resistance; the function of these enzymes has also evolved in a convergent manner.

Class C

Convergent functional evolution is also seen in class C β-lactamases. For cefoxitin (Figure 6), the ancestral function was narrow/medium resistance (green color), and most of the extant enzymes (such as FOX-3, 5, 8; TRU-1; CMY-1, 9; MIR-1; CMY-16, 32) have a similar functional affinity. During evolution, however, the ACC-1, 2, 4, MOR and DHA-2 enzymes independently became susceptible to cefoxitin (decreasing resistance activity). In addition, distantly related enzymes such as FOX-2, CMY-37 and CMY-15 gained resistance compared with their ancestors. These functional shifts are observed in distinct clades of the tree, which is another example of convergent functional evolution.

Piperacillin-TZB is a combination of the antibiotic piperacillin and the inhibitor tazobactam. The evolutionary relationships among the enzymes and the evolution of their function are shown in Figure 7. This analysis was conducted on 16 β-lactamase enzymes whose estimated ancestral function was resistance to piperacillin-TZB. The modern enzymes CMY-9, DHA-2, ACT-1 and CMY-32 are susceptible. In the evolutionary tree these enzymes are located in distinct clades yet have similar functional features, which indicates that they evolved independently. Compared with the ancestral function, the modern enzymes FOX-5, CMY-19, ACC-4, ACC-1 and CMY-36 are strongly resistant to piperacillin-TZB, and the evolutionary tree indicates that this functional affinity evolved in a convergent manner.

Class D

The antibiotic resistance function of class D (oxacillinase) β-lactamases with respect to imipenem has evolved convergently in at least two lineages (Figure 8). The inferred ancestral state indicates that the ancestral enzymes were susceptible to the carbapenem imipenem, which is still the case for the vast majority of forms of the enzyme, for example OXA-42, 43, 114, 22, 18, 31, 47, 63, 85, 161, 15, 32, 46, 37, 198, 14, 35, 145 and 147. However, some enzymes (OXA-24, 160, 133, 58, 60, 20, 48, 163, 54 and 55) have lost this ancestral characteristic and evolved resistance to imipenem. The enzyme OXA-133 is highly resistant (red color) to imipenem. These resistant enzymes are distributed in three different clades within the evolutionary tree and differ in function (resistant) from their ancestor (susceptible), which indicates convergent functional evolution.

Convergent amino acid residue identification

Convergent functional evolution was observed at the intra-class scale of the β-lactamase superfamilies. Using a Python script, we attempted to identify amino acid residues that may be responsible for the functional changes; the script looked for convergent mutations that co-occurred on the phylogeny with each similar change in functional activity. However, we were unable to identify any strong candidate amino acid residue. We believe that the approach may have been overly simplified and would have missed any influence of changes to aspects such as (i) the 3D structure of the enzymes; (ii) combinatorial effects of multiple amino acid changes; (iii) properties that are shared amongst multiple different amino acids, for instance if changes in charge were important regardless of the specific amino acid that caused the change; and (iv) homologous sequence identity/similarity threshold criteria. Therefore, future work to pinpoint the molecular changes responsible for altering the functional activity of these enzymes will require a more directed approach that gives full consideration to the range of molecular mechanisms of protein evolution. For our present purposes, we refrain from discussing this result further, other than stating that no single amino acid change is responsible for any of the functional changes we document herein.
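Point (iii) above suggests that a shared property such as charge might matter more than residue identity. The Python sketch below shows one way such a check could look: residues are recoded into coarse charge classes before the groups are compared. This analysis was not performed in the study. The charge classes (with histidine treated as positive, a simplification), the function names and the toy sequences are all assumptions made for illustration.

```python
# Simplified charge classes at roughly neutral pH (an assumption of this
# sketch): p = positive, n = negative, u = uncharged.
CHARGE_CLASS = {"K": "p", "R": "p", "H": "p", "D": "n", "E": "n"}


def charge_profile(sequence, gap="-"):
    """Recode an aligned sequence as p/n/u symbols, keeping gap characters."""
    return "".join(gap if aa == gap else CHARGE_CLASS.get(aa, "u")
                   for aa in sequence)


def charge_shift_columns(resistant, susceptible, gap="-"):
    """Columns where each group has one charge class and the classes differ.

    Returns a list of (column_index, resistant_class, susceptible_class).
    """
    res_profiles = [charge_profile(seq, gap) for seq in resistant]
    sus_profiles = [charge_profile(seq, gap) for seq in susceptible]
    length = len(res_profiles[0])
    if any(len(p) != length for p in res_profiles + sus_profiles):
        raise ValueError("sequences must come from the same alignment")

    hits = []
    for col in range(length):
        res_classes = {p[col] for p in res_profiles}
        sus_classes = {p[col] for p in sus_profiles}
        if len(res_classes) == 1 and len(sus_classes) == 1:
            res_c, sus_c = res_classes.pop(), sus_classes.pop()
            if res_c != sus_c and gap not in (res_c, sus_c):
                hits.append((col, res_c, sus_c))
    return hits


# Toy aligned sequences (made up for illustration, not real enzymes).
shifts = charge_shift_columns(["AKD", "ARD"], ["AED", "ADD"])
# shifts == [(1, 'p', 'n')]
```

In the toy example, column 1 holds K or R in one group and E or D in the other, so a comparison of exact residues would find no conserved signature there, whereas the charge-class comparison reports a positive-to-negative shift.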

Conclusion

The current work presents evidence of convergent functional evolution of β-lactamases conferring resistance to a broad range of commonly used antibiotics. The functional variation and convergent similarities occur in distantly related enzymes, indicating that β-lactamase-mediated antibiotic resistance cannot necessarily be inferred from sequence similarity or phylogenetic relatedness alone. We therefore suggest that although bioinformatics tools that mine information on antibiotic resistance from amino acid sequences are, in principle, more efficient than laboratory functional assays, they are less effective, and functional assays remain essential for assessing antibiotic resistance. Functional convergent evolution appears to be a common pattern in antibiotic resistance (at least for β-lactamases), and may therefore provide a useful comparative system for further work. For instance, testing 3D molecular structures for structural similarity, and analysing biological reaction pathways that reflect a functional relationship, may open up future avenues for predicting antibiotic resistance. In any event, β-lactamase evolution demonstrates that the dynamics of convergent evolution can differ across different levels of life (e.g. from primary sequence structure to function) and does so in a medically important system, presenting opportunities for applying our evolutionary understanding to the management of infections.

Acknowledgement

This work was funded by “Infectiopole Sud”; the support of Méditerranée Infection is gratefully acknowledged. We thank Mr. Oliver Chabrol for the Python script.


References

1. Economou V, Gousia P (2015) Agriculture and food animals as a source of antimicrobial-resistant bacteria. Infect Drug Resist 8:49–61.
2. Ambler R (1980) The structure of β-lactamases. Philos Trans R Soc Lond B Biol Sci 289(1036):321–331.
3. Bebrone C (2007) Metallo-β-lactamases (classification, activity, genetic organization, structure, zinc coordination) and their superfamily. Biochem Pharmacol 74(12):1686–1701.
4. Galleni M, et al. (2001) Standard numbering scheme for class B beta-lactamases. Antimicrob Agents Chemother 45(3):660–3.
5. Bush K, Jacoby GA (2010) Updated functional classification of β-lactamases. Antimicrob Agents Chemother 54(3):969–976.
6. Bush K, Jacoby GA, Medeiros AA (1995) A functional classification scheme for beta-lactamases and its correlation with molecular structure. Antimicrob Agents Chemother 39(6):1211–1233.
7. Bush K (1989) Classification of β-lactamases: Groups 1, 2a, 2b, and 2b’. Antimicrob Agents Chemother 33(3):264–270.
8. Bush K (1989) Classification of beta-lactamases: Groups 2c, 2d, 2e, 3 and 4. Microbiology 33(3):271–276.
9. D’Costa VM, et al. (2011) Antibiotic resistance is ancient. Nature 477(7365):457–461.
10. Bush K, Jacoby G, Medeiros A (1995) A functional classification scheme for β-lactamases and its correlation with molecular structure. Antimicrob Agents Chemother 39(6):1211–1233.
11. Alderson RG, Barker D, Mitchell JBO (2014) One origin for metallo-β-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees. J Mol Evol 79(3–4):117–29.
12. Doolittle RF (1994) Convergent evolution: The need to be explicit. Trends Biochem Sci 19(1):15–18.
13. Speed MP, Arbuckle K (2016) Quantification provides a conceptual basis for convergent evolution. 44. doi:10.1111/brv.12257.
14. Pontarotti P, Hue I (2016) Road Map to Study Convergent Evolution: A Proposition for Evolutionary Systems Biology Approaches. Evolutionary Biology (Springer International Publishing, Cham), pp 3–21.
15. Gherardini PF, Wass MN, Helmer-Citterich M, Sternberg MJE (2007) Convergent Evolution of Enzyme Active Sites Is not a Rare Phenomenon. J Mol Biol 372(3):817–845.
16. Burroughs AM, Allen KN, Dunaway-Mariano D, Aravind L (2006) Evolutionary Genomics of the HAD Superfamily: Understanding the Structural Adaptations and Catalytic Diversity in a Superfamily of Phosphoesterases and Allied Enzymes. J Mol Biol 361(5):1003–1034.
17. Andrews JM (2001) Determination of minimum inhibitory concentrations. J Antimicrob Chemother 48 Suppl 1:5–16.
18. L V Peixe, J C Sousa, J C Perez-Diaz and FB (1997) A bla(TEM-1b)-derived TEM-6 β-lactamase: a case of convergent evolution. Antimicrob Agents Chemother 41(5):1206.
19. Hibbert-Rogers LCF, Heritage J, Todd N, Hawkey PM (1994) Convergent evolution of TEM-26, a β-lactamase with extended-spectrum activity. J Antimicrob Chemother:707–720.
20. Gupta SK, et al. (2014) ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob Agents Chemother 58(1):212–220.
21. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30(12):2725–2729.
22. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797.
23. Paradis E, Claude J, Strimmer K (2004) APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20(2):289–90.
24. Revell LJ (2012) phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3(2):217–223.


Fig. 1: Class A (Cefepime): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method based on the multiple sequence alignment. Bootstrap values are shown on each node. The phylogenetic tree contains class A β-lactamases. The color of the branch (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic cefepime, ranging from blue (susceptible) to red (resistant).


Fig. 2: Class A (Cephalothin): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method based on the multiple sequence alignment. Bootstrap values are shown on each node. The phylogenetic tree contains class A β-lactamases. The color of the branch (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic cephalothin. Color scheme and annotations are as in Fig. 1.


Fig. 3: Class B (Piperacillin): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method based on the alignment. Bootstrap values are shown on each node. The phylogenetic tree contains class B β-lactamases. The color of the branch (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic piperacillin. Color scheme and annotations are as in Fig. 1.


Fig. 4: Class B (Imipenem): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method based on the alignment. Bootstrap values are shown on each node. The phylogenetic tree contains class B β-lactamases. The color of the branch (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic imipenem. Color scheme and annotations are as in Fig. 1.


Fig. 5: Class B (Meropenem): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method based on the alignment. Bootstrap values are shown on each node. The phylogenetic tree contains class B β-lactamases. The color of the branch (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic meropenem. Color scheme and annotations are as in Fig. 1.


Fig. 6: Class C (Cefoxitin): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method based on the alignment. Bootstrap values are shown on each node. The phylogenetic tree contains class C β-lactamases. The color of the branch (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic cefoxitin, ranging from blue (susceptible) to red (resistant).

Fig. 7: Class C (Piperacillin-TZB): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains class C β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Piperacillin-TZB. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. 8: Class D (Imipenem): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains class D β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Imipenem. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. S1: Class B (B1, Imipenem): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains subgroup B1 β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Imipenem. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. S2: Class B (B1, Meropenem): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains subgroup B1 β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Meropenem. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. S3: Class B (B1, Piperacillin): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains subgroup B1 β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Piperacillin. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. S4: Class B (B3, Imipenem): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains subgroup B3 β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Imipenem. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. S5: Class B (B3, Meropenem): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains subgroup B3 β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Meropenem. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. S6: Class B (B3, Piperacillin): The midpoint-rooted phylogenetic tree was constructed by the maximum-likelihood method from the alignment. Bootstrap values are shown at each node. The tree contains subgroup B3 β-lactamases. Branch color (see also the scale bar) indicates the functional activity of the enzymes against the β-lactam antibiotic Piperacillin. Colors from blue through red indicate activity ranging from susceptible to resistant: blue indicates susceptible, green moderately resistant, and red highly resistant.

Fig. S7: Phylogenetic tree of class B β-lactamases (11). Enzyme groups are color-coded by function as follows: magenta, pale pink and orange indicate subclass B1, B2 and B3 metallo-β-lactamases, respectively; red indicates ribonucleases, cyan glyoxalase IIs, green A-type flavoproteins, and black proteins with no assigned function.

Python Script to identify the convergent amino acid residue

#!/usr/bin/python
import re
import sys


def checkArguments():
    # Exactly four command-line arguments are expected.
    if len(sys.argv) - 1 != 4:
        sys.stderr.write("differentAA.py [paml rst file] [convergent species list] [target list] [output fileName]\n")
        sys.stderr.write("Detect AA present in convergent species and not in the other\n")
        sys.stderr.write("[email protected]\n")
        sys.exit(1)


def readPamlFile(pamlFile, convergentList, targetList, outFileName):
    fastas = {}
    try:
        fichierPaml = open(pamlFile, "r")
        fichierSortie = open(outFileName + ".convergent.txt", "w")
        fichierSortieCommon = open(outFileName + ".common.txt", "w")
        log = open(outFileName + ".logs.txt", "w")
    except IOError as e:
        print("I/O error({0}): {1}".format(e.errno, e.strerror))
        sys.exit(1)

    startReadTree = 0
    startReadSeq = 0
    seqLineNumber = 0
    tree = ""
    for line in fichierPaml:
        line = line.rstrip("\n")
        # Tree detection
        if line.find("TreeView") != -1:
            startReadTree = 1
        elif startReadTree == 1 and line == "":
            startReadTree = 0
        if startReadTree == 1:
            line = re.sub(r"\) ", ")", line)
            line = re.sub(r" \)", ")", line)
            line = re.sub(r" ,", ",", line)
            # PAML prefixes each sequence name with the node number,
            # e.g. 1_HOMO; this [0-9]+_ prefix must be removed.
            line = re.sub(r"[0-9]+_", "", line)
            tree = line
        if line.find("List of extant and reconstructed sequences") != -1:
            startReadSeq = 1
        elif startReadSeq == 1:
            seqLineNumber += 1
            if line.find("Overall") != -1:
                startReadSeq = 0
            elif line != "" and seqLineNumber > 3:
                # Assumes that two or more consecutive spaces separate the
                # name from the sequence, while single spaces only split
                # the sequence into blocks.
                line = re.sub(r" {2,}", "*", line)
                line = re.sub(r" ", "", line)
                line = re.sub(r"[*]+", " ", line)
                fastaHeader = line[0:line.find(" ")]
                # PAML reindexes the tree nodes in its own way and puts
                # node#[0-9]* before the node name in the reconstructed
                # ancestral sequence; this prefix must be removed.
                if fastaHeader.find("node#") == 0:
                    fastaHeader = fastaHeader[5:]
                fasta = line[line.find(" ") + 1:]
                if fastaHeader != "":
                    fastas[fastaHeader] = fasta

    # Find the residues shared by the convergent genes.
    firstGeneFasta = fastas[convergentList[0]]
    fastaSize = len(firstGeneFasta)
    for i in range(0, fastaSize):
        char = firstGeneFasta[i]
        difConv = False
        difOther = False
        dif = False
        for k in fastas:
            if k != convergentList[0]:
                # The gene under consideration should be convergent.
                if k in convergentList:
                    if fastas[k][i] != char:
                        difConv = True
                        log.write("At position " + str(i) + " the AA is different\n")
                        break
                elif k in targetList:
                    if fastas[k][i] == char:
                        difOther = True
                        log.write("At position " + str(i) + " the AA of gene " + k + " is the same as the first convergent one (" + convergentList[0] + ")\n")
                        break
        # Convergent output: residue shared by the convergent genes and
        # absent from every target gene; "*" otherwise.
        if difConv or difOther:
            fichierSortie.write("*")
        else:
            fichierSortie.write(char)
        # Common output: residue shared by all convergent genes.
        for g in range(1, len(convergentList)):
            if fastas[convergentList[g]][i] != char:
                dif = True
        if dif:
            fichierSortieCommon.write("*")
        else:
            fichierSortieCommon.write(char)

    fichierPaml.close()
    fichierSortie.close()
    fichierSortieCommon.close()
    log.close()
    print(outFileName + ".convergent.txt generated")
    print(outFileName + ".common.txt generated")
    print(outFileName + ".logs.txt generated")


if __name__ == '__main__':
    checkArguments()
    readPamlFile(sys.argv[1], sys.argv[2].split(","), sys.argv[3].split(","), sys.argv[4])
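The per-site rule applied by the script can be illustrated on a toy alignment. The sketch below is a simplified restatement of that rule, not the script itself: the function name `convergent_mask` and the sequence names and residues are hypothetical, and the sequences are assumed to be aligned and of equal length.

```python
def convergent_mask(seqs, convergent, targets):
    """Keep a residue only where all convergent sequences share it and
    no target sequence carries it; write '*' at every other site."""
    reference = seqs[convergent[0]]
    out = []
    for i, aa in enumerate(reference):
        shared = all(seqs[name][i] == aa for name in convergent)
        absent = all(seqs[name][i] != aa for name in targets)
        out.append(aa if shared and absent else "*")
    return "".join(out)


# Toy aligned sequences (hypothetical names and residues).
toy = {
    "convA": "MKTAL",
    "convB": "MRTAL",
    "other1": "MKSGL",
    "other2": "MKSAV",
}
mask = convergent_mask(toy, ["convA", "convB"], ["other1", "other2"])
print(mask)
```

Only the third site is retained here, because it is the only position where both convergent sequences agree and neither of the other sequences has the same residue.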

Chapter-4

An integrative database of β-lactamase enzymes: sequences, structures, functions and phylogenetic trees

An integrative database of β-lactamase enzymes: sequences, structures, functions and phylogenetic trees

Vivek Keshri1#, Seydina M. Diene1#*, Adrien Estienne2, Justine Dardaillon3, Olivier Chabrol2, Laurent Tichit4, Jean-Marc Rolain1, Didier Raoult1, Pierre Pontarotti1,5

1. Aix Marseille Univ, IRD, APHM, MEPHI, IHU-Méditerranée Infection, Marseille, France
2. Aix-Marseille Université, I2M, UMR-CNRS 7373, Evolution Biologique et Modélisation, 13331 Marseille, France
3. Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR5237, CNRS-Université de Montpellier, 1919 route de Mende, F-34090 Montpellier, France
4. Institut de Mathématiques de Marseille, Aix Marseille Université, I2M Centrale Marseille, CNRS UMR 7373, 13453, Marseille, France
5. CNRS, Marseille, France

# These authors contributed equally to this manuscript
* Corresponding author: Email: [email protected]; Tel: +33 4 91 83 56 49; Fax: +33 4 13 73 24 02; MEPHI, IHU-Méditerranée Infection, 19-21 Boulevard Jean Moulin, 13005 Marseille, France

Keywords: Online database, β-lactamase genes, gene characterization

(Submitted to Antimicrobial Agents and Chemotherapy)

Abstract

β-lactamase enzymes have attracted substantial attention from researchers and clinicians because of their clinical, ecological, and evolutionary interest. Here, we present a comprehensive online database of β-lactamase enzymes. The database is manually curated and incorporates the primary amino acid sequences, the closest structures in an external structure database (Protein Data Bank), and the functional profiles and phylogenetic trees of the four molecular classes (A, B, C, and D) of β-lactamases. The functional profiles are presented as minimum inhibitory concentrations and kinetic parameters, which makes them more useful for investigators. A total of 1,155 β-lactam resistance genes are analysed and described in the database. The database is implemented in MySQL, and the associated website was developed with Zend Framework 2 on an Apache server and supports all major web browsers. Users can easily retrieve and visualize biologically important information through a set of efficient queries from a graphical interface. The database is freely accessible at http://ifr48.timone.univ-mrs.fr/beta-lactamase/public/.

Introduction

Countless lives are saved by antibiotics, including the β-lactams, which remain a leading resource for the treatment of bacterial infections. β-lactamase enzymes are the primary cause of β-lactam resistance because they inactivate these antibiotics by cleaving their four-membered β-lactam ring (1). More than 1,300 natural β-lactamase enzymes have been documented to date (2), and their classification into families helps investigators to illustrate how the genes are related to each other. Genes of the same family group together into clusters, and these clusters can be used to predict the function of novel genes. The molecular classification of β-lactamase enzymes was initially introduced by Ambler, who classified them into four distinct molecular classes (i.e. class A, B, C, and D) (3, 4). The existing publicly available β-lactamase databases offer partial information and have limited functions, such as sequence search by BLAST analysis. Therefore, we designed a new database that contains the primary amino acid sequences, the three-dimensional protein structures in an external structure database (Protein Data Bank), phylogenetic trees, and functional profiles.

The functional profiles are available in the form of Minimum Inhibitory Concentrations (MICs) (5) and kinetic parameters (kcat, Km, and kcat/Km) (6). A few β-lactamase databases are available in the public domain, such as A Comprehensive β-lactamase Molecular Annotation Resource (CBMAR) (7), The Lahey Clinic Database (www.lahey.org/Studies/), the Antibiotic Resistance Genes Database (ARDB) (8), the Lactamase Engineering Database (LacED) (9), The Comprehensive Antibiotic Resistance Database (CARD) (10), and A comprehensive database of widely circulated β-lactamases (BLAD) (11). The content of these databases is very useful, but none of them individually provides a complete set of information containing the phylogenetic trees of each molecular class of enzymes, the closest three-dimensional protein structures and the functional profiles. Therefore, a centralized database with updated resources on β-lactamases is required. This article presents the first release of an integrative β-lactamase database, which contains sequence, structure, function and phylogenetic information on β-lactamase enzymes. Our motivation in creating this database is to provide a centralized collection of information on these antibiotic resistance genes. The database offers a user-friendly interface and returns query results that can be reused in future studies.

Data collection and analysis

Input sequences and classification: A total of 1,155 amino acid sequences of β-lactamases were downloaded from the ARG-ANNOT database (12). These sequences were then classified into the four Ambler molecular classes. Classes A, B, C, and D include 620, 174, 151, and 210 sequences, respectively.
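As a quick consistency check, the share of each class in the data set can be recomputed from the counts given above. This is a minimal sketch; the variable names are illustrative only.

```python
# Class sizes as reported in the text.
counts = {"A": 620, "B": 174, "C": 151, "D": 210}

total = sum(counts.values())
# Percentage of the data set in each Ambler class, rounded to two decimals.
shares = {cls: round(100 * n / total, 2) for cls, n in counts.items()}

print(total)
for cls, pct in shares.items():
    print(cls, pct)
```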

Phylogenetic tree construction: The amino acid sequences of each class were aligned with the MUSCLE algorithm (13), and poorly aligned regions were then trimmed with the trimAl program (14). Subsequently, a phylogenetic tree was constructed for each class with FastTree (15) using the approximate maximum-likelihood method. FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/) was used to visualize the phylogenetic trees.
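The three steps above can be sketched as a small command builder. This is a hedged illustration, not the commands actually used in this work: the file names are hypothetical, the MUSCLE call assumes the 3.x command-line syntax, and the trimAl mode `-automated1` is an assumption, since the trimming settings are not stated.

```python
import subprocess


def build_tree_commands(fasta, prefix):
    """Return (argv, stdout_path) pairs for alignment, trimming and tree building."""
    aligned = prefix + ".aln.fasta"
    trimmed = prefix + ".trimmed.fasta"
    tree = prefix + ".nwk"
    return [
        # MUSCLE 3.x syntax (assumed version): align the sequences of one class.
        (["muscle", "-in", fasta, "-out", aligned], None),
        # trimAl: remove poorly aligned columns; -automated1 is an assumed mode.
        (["trimal", "-in", aligned, "-out", trimmed, "-automated1"], None),
        # FastTree writes the Newick tree to standard output.
        (["FastTree", trimmed], tree),
    ]


def run_pipeline(fasta, prefix):
    """Run the steps in order; requires the three tools to be installed on PATH."""
    for argv, stdout_path in build_tree_commands(fasta, prefix):
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)


# Print the commands for a hypothetical class A input file without running them.
commands = build_tree_commands("classA.fasta", "classA")
for argv, stdout_path in commands:
    print(" ".join(argv) + (" > " + stdout_path if stdout_path else ""))
```

Keeping command construction separate from execution makes it possible to inspect the commands before any external tool is run.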

Three-dimensional structures available in the external Protein Data Bank (PDB): A total of 81,117 non-redundant three-dimensional protein structures were downloaded from the PDB (17). The 1,155 β-lactamase sequences were used as queries in a BLAST search (16) against these non-redundant structures. The protein BLAST search was run with default settings, and the results are available in the database in tabular format, listing the significant hits with their structure IDs, identity, alignment regions, E-values and bit scores.
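Selecting the closest structure for each query from such a table can be sketched as follows. The example assumes the standard 12-column BLAST tabular layout (`-outfmt 6` in BLAST+); the query and subject identifiers and all numbers in the sample are invented placeholders, not results from this work.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return, for each query, the hit with the highest bit score."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        for key in ("pident", "evalue", "bitscore"):
            hit[key] = float(hit[key])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Placeholder rows: two hits for query1 and one for query2.
sample = (
    "query1\tpdbX_A\t45.2\t250\t130\t4\t10\t259\t5\t250\t1e-50\t180\n"
    "query1\tpdbY_A\t98.9\t263\t3\t0\t24\t286\t1\t263\t1e-150\t520\n"
    "query2\tpdbZ_B\t62.0\t240\t90\t2\t1\t240\t3\t242\t1e-90\t310\n"
)
closest = best_hits(sample)
for query, hit in closest.items():
    print(query, hit["sseqid"], hit["pident"], hit["bitscore"])
```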

Collection of MIC and kinetic data from the published literature: The Minimum Inhibitory Concentration (MIC) is the lowest concentration of an antibiotic that inhibits the visible growth of a microorganism after overnight incubation (5). The available clinical and microbiological MICs (18) of each enzyme were collected from the most relevant published literature, and the fold change between them was then calculated. Microbiological breakpoints, sometimes called wild-type breakpoints, distinguish wild-type populations from those with acquired or selected resistance mechanisms. The kinetic parameters (kcat, Km, and kcat/Km) were also collected for the antibiotics for which they were available.
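The fold-change calculation mentioned above can be sketched as a simple ratio. The direction of the ratio (clinical MIC divided by microbiological MIC) is an assumption, since the text does not specify it, and the values below are illustrative only.

```python
import math


def mic_fold_change(clinical_mic, microbiological_mic):
    """Fold change between two MIC values given in the same unit (e.g. mg/L).

    Assumed direction: clinical MIC divided by microbiological MIC.
    """
    if clinical_mic <= 0 or microbiological_mic <= 0:
        raise ValueError("MIC values must be positive")
    return clinical_mic / microbiological_mic


# Illustrative values, not taken from the database.
fold = mic_fold_change(64.0, 0.5)
# MICs are usually measured in two-fold dilution steps, so log2 is a natural scale.
print(fold, math.log2(fold))
```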

Database design and implementation: The database is implemented in MySQL 5.6 and is designed to be readily searchable by different criteria. The online release works in all major browsers and is compatible with all operating systems. The web interface was developed with Zend Framework 2.5 using PHP 5.5 on an Apache 2.4 web server running on Ubuntu Linux 15.10. Figure 1 depicts the database workflow, from the collection of data from published journal articles to the retrieval of information through the web interface.

How to access the data: The home page of the database offers different sections for searching the data, such as a particular β-lactamase enzyme, the Ambler molecular classes, phylogenetic trees, and literature references for individual proteins. Two types of search are available: text search and advanced search. A text search is performed on the home page by entering a word or phrase in the search box, for instance the enzyme name “TEM”. The text search covers all data structures. The advanced search is more targeted, since the user can search for β-lactamase enzymes by Ambler class and retrieve the complete information.

Selecting a group of β-lactamases: An easy way to obtain information on enzymes is to browse by name, for instance by typing “TEM” in the search box and clicking on the green button “Find β-lactamase”. If the user prefers to search by another criterion, such as Ambler class, this can be selected on the home page (fig. 2). Once a particular data set has been selected, the user can access all the available information in the database along with further details such as the β-lactamase class, the accession number of the gene sequence and the CDS region or location on the supporting DNA sequence (fig. 3). It is also possible to download a CSV file containing the name, description, accession number, and sequence of the β-lactamase of interest.
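The name search and CSV export described above can be mimicked with a small in-memory example. This sketch uses SQLite from the Python standard library as a stand-in for MySQL; the table name, column names, accession values and sequences are all hypothetical placeholders, because the actual schema of the database is not described here.

```python
import csv
import io
import sqlite3

# In-memory stand-in for the real database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enzyme (name TEXT, description TEXT, accession TEXT, "
    "ambler_class TEXT, sequence TEXT)"
)
conn.executemany(
    "INSERT INTO enzyme VALUES (?, ?, ?, ?, ?)",
    [
        ("TEM-1", "placeholder description", "ACC_0001", "A", "PLACEHOLDERSEQ"),
        ("TEM-2", "placeholder description", "ACC_0002", "A", "PLACEHOLDERSEQ"),
        ("OXA-1", "placeholder description", "ACC_0003", "D", "PLACEHOLDERSEQ"),
    ],
)


def search_by_name(connection, text):
    """Substring search on the enzyme name (LIKE ignores ASCII case in SQLite)."""
    cursor = connection.execute(
        "SELECT name, description, accession, sequence FROM enzyme "
        "WHERE name LIKE ? ORDER BY name",
        ("%" + text + "%",),
    )
    return cursor.fetchall()


def to_csv(rows):
    """Serialize the selected rows with the four columns of the CSV download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "description", "accession", "sequence"])
    writer.writerows(rows)
    return buffer.getvalue()


tem_rows = search_by_name(conn, "Tem")
print(to_csv(tem_rows))
```

The query is parameterized rather than built by string concatenation, which is the usual way to keep user-entered search text from being interpreted as SQL.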

Data links, downloads, and interface: Basic information such as the amino acid sequences, the closest 3D structure in the Protein Data Bank (PDB), the kinetic parameters and the Minimum Inhibitory Concentrations (MICs) is accessible via the download links provided at the top of the page (fig. 3). The primary sequence of each gene is available in CSV format. Users can also access the closest three-dimensional structure present in the external database. The structure page shows the PDB IDs of the structures along with their sequence similarity to the respective known β-lactamase sequences. The percent identity, query coverage, query sequence, alignment length, total number of mismatches, gaps, E-value and bit score are listed in separate columns. The external database link provides easy access to the complete three-dimensional structure (fig. 4). The functional profile of each β-lactamase is accessible as two data sets: Minimum Inhibitory Concentrations (MICs) and kinetic parameters. These data were gathered from the published literature and manually curated. Users can access the list of enzymes, the clinical MICs, the microbiological MICs and the fold change between clinical and microbiological MICs (fig. 5A). For the kinetic parameters, the kcat, Km and kcat/Km values are tabulated and readily available to users (fig. 5B). These parameters represent, respectively, the catalytic constant for the conversion of substrate to product, the substrate concentration at which the initial rate is one-half of the maximum velocity, and the specificity constant or catalytic efficiency. The phylogenetic trees are available in different file formats including Newick, Nexus, PNG, phyloxml, PDB, and EPS (fig. 6). The user can download all database resources from the download links. These include:

1. Primary amino acid sequences of all 1,155 β-lactamases.
2. Phylogenetic trees of Ambler molecular classes A, B, C, and D.
3. Homologous protein structures from the external PDB database.
4. MIC and kinetic parameters.
5. Published scientific literature or references.
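The meaning of the kinetic parameters described above can be made concrete with the Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]). The sketch below checks that the rate is half of the maximum velocity when the substrate concentration equals Km, and computes the specificity constant kcat/Km. All numerical values are illustrative and are not taken from the database.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """Specificity constant kcat/Km."""
    return kcat / km


# Illustrative values: Km = 50 (concentration units), Vmax = 10 (rate units).
km, vmax = 50.0, 10.0
rate_at_km = michaelis_menten_rate(km, vmax, km)
print(rate_at_km, vmax / 2)

# Illustrative kcat = 100 per second with the same Km.
efficiency = catalytic_efficiency(100.0, km)
print(efficiency)
```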

Data volume summary: This section describes the total data volume available to users. The current release of the database presents four phylogenetic trees, one for each Ambler class. The trees of classes A, B, C, and D contain 620, 174, 151, and 210 sequences, respectively. Three-dimensional structural information is essential for designing targeted therapy against any protein target; we therefore link the β-lactamases to their available three-dimensional crystal structures. The BLAST search of the selected sequences against the protein structure database (Protein Data Bank), described above, revealed multiple significant hits. Currently, the database provides external links to 1,516 unique three-dimensional structures in the PDB. The functional information extracted from published journal articles is accessible in the form of MICs and kinetic parameters: MIC data are available for 218 β-lactamase enzymes and kinetic data for 137.

Discussion

Here, we describe a database of β-lactamases that brings together most of the available β-lactam resistance genes with their phylogenetic trees, structural information and functional profiles. It provides a readily accessible set of information on β-lactamases to complement and support the further study of these enzymes. The database compiles data from existing resources, such as the published literature and public-domain databases, and analyzes them to extract meaningful information. A set of 1,155 amino acid sequences was selected for the collection and processing of biological information: 53.68% of the sequences belong to class A, 15.06% to class B, 13.07% to class C, and 18.18% to class D. Because some data are unpublished and information is lacking in the literature, we were unable to obtain experimental data for all enzymes. The present version of the database therefore offers fewer functional profiles (MIC and kinetic parameters) than the 1,155 β-lactamase sequences it contains.

β-lactamase enzymes play a crucial role in β-lactam antibiotic resistance, and there is an urgent need to understand the role and mechanism of enzymes in antibiotic resistance, especially in β-lactam resistance. The β-lactamase databases developed earlier compiled information that was insufficient to understand the evolutionary relationships of the enzymes together with their functional or structural properties. ARGO (19), ARDB (8), CBMAR (7), CARD (10), BLAD (11), and ARG-ANNOT (12) are popular antibiotic resistance databases. They cover many aspects of β-lactamase enzymes, such as protein and nucleotide sequences, subfamily-wise phylogenetic trees and mutational information. However, they lack functional information (MICs and kinetic parameters), large-scale phylogenetic trees of the molecular classes and the full set of homologous enzymes or protein 3D structures. The phylogenetic trees provided in the current database will help researchers to identify the closest enzymes and the functional profiles that describe their hydrolyzing properties. Because of the huge diversity of β-lactamase enzymes and the rapid identification of novel enzymes from different environments, this database aims to become the first index of all available information on the topic and will continue to be updated over time. Future efforts will also strive to include the full diversity of β-lactamase enzymes present in the environmental and human microbiomes, as well as newly reported β-lactamases. Currently, the database contains only protein 3D structure IDs and their similarities to the core input set of query sequences; we will incorporate three-dimensional coordinates in a future version.

In conclusion, the current database is an online resource for studying β-lactam resistance enzymes. It contains comprehensive information, and researchers can use and share its contents without any restriction. The database is the first substantial effort to integrate the sequences, structures, functions and phylogenetic trees of β-lactamases. This combination of primary amino acid sequences, structures, phylogenetic trees, and functional information distinguishes it from other existing databases and makes it a useful resource for researchers. With the vast amount of sequence and three-dimensional structural data becoming available, future improvements of the database will focus on expanding the available information. With this release, we are committed to being proactive in understanding the needs of all our users and will provide a continuously updated version of this database.

Acknowledgments: We gratefully acknowledge TradOnline for English correction of the manuscript, and Mr. Olivier Chabrol and Sylvain Buffet for uploading the database and maintaining the website.

Funding: This work was supported by the French Government under the “Investments for the Future” program managed by the National Agency for Research (ANR), Méditerranée Infection 10-IAHU-03.

Conflict of interest: We have no conflicts of interest to declare.

References

1. Holten KB, Onusko EM. 2000. Appropriate prescribing of oral beta-lactam antibiotics. Am Fam Physician 62:611–620.
2. Bush K. 2013. The ABCD’s of β-lactamase nomenclature. J Infect Chemother 19:549–559.
3. Ambler RP, Coulson AF, Frère JM, Ghuysen JM, Joris B, Forsman M, Levesque RC, Tiraby G, Waley SG. 1991. A standard numbering scheme for the class A beta-lactamases. Biochem J 276(Pt 1):269–270.
4. Ambler R. 1980. The structure of β-lactamases. Philos Trans R Soc Lond B Biol Sci 289:321–331.
5. Andrews JM. 2001. Determination of minimum inhibitory concentrations. J Antimicrob Chemother 48 Suppl 1:5–16.
6. Koshland DE. 2002. The Application and Usefulness of the Ratio kcat/KM. Bioorg Chem 30:211–213.
7. Srivastava A, Singhal N, Goel M, Virdi JS, Kumar M. 2014. CBMAR: a comprehensive β-lactamase molecular annotation resource. Database 2014:bau111.
8. Liu B, Pop M. 2009. ARDB – Antibiotic Resistance Genes Database. Nucleic Acids Res 37:443–447.
9. Thai QK, Bös F, Pleiss J. 2009. The Lactamase Engineering Database: a critical survey of TEM sequences in public databases. BMC Genomics 10:390.
10. McArthur AG, Waglechner N, Nizam F, Yan A, Azad MA, Baylay AJ, Bhullar K, Canova MJ, De Pascale G, Ejim L, Kalan L, King AM, Koteva K, Morar M, Mulvey MR, O’Brien JS, Pawlowski AC, Piddock LJV, Spanogiannopoulos P, Sutherland AD, Tang I, Taylor PL, Thaker M, Wang W, Yan M, Yu T, Wright GD. 2013. The comprehensive antibiotic resistance database. Antimicrob Agents Chemother 57:3348–3357.
11. Danishuddin M, Baig MH, Kaushal L, Khan AU. 2013. BLAD: A comprehensive database of widely circulated beta-lactamases. Bioinformatics 29:2515–2516.
12. Gupta SK, Padmanabhan BR, Diene SM, Lopez-Rojas R, Kempf M, Landraud L, Rolain J-M. 2014. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob Agents Chemother 58:212–220.
13. Edgar RC. 2004. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797.
14. Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25:1972–1973.
15. Price MN, Dehal PS, Arkin AP. 2010. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490.
16. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic Local Alignment Search Tool. J Mol Biol 215:403–410.
17. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. 2000. The Protein Data Bank. Nucleic Acids Res 28:235–242.
18. Turnidge J, Paterson DL. 2007. Setting and revising antibacterial susceptibility breakpoints. Clin Microbiol Rev 20:391–408.
19. Scaria J, Chandramouli U, Verma SK. 2005. Antibiotic Resistance Genes Online (ARGO): a Database on vancomycin and beta-lactam resistance genes. Bioinformation 1:5–7.

Figures

Fig. 1. Database scheme: The figure shows the data collection and literature review steps, followed by the retrieval of information.

Fig. 2. Snapshot of the database interface: the home page and the β-lactamase search engine. Queries use a β-lactamase enzyme name, for example “TEM” or “Tem”. The home page also allows all β-lactamase enzymes to be searched by Ambler molecular class, i.e. serine β-lactamases (classes A, C, and D) and metallo-β-lactamases (class B). The bottom section of the page offers the phylogenetic tree of all enzymes of the respective molecular class. The references for each enzyme (if published in a scientific journal) are also available in the reference section.

Fig. 3. Graphical user interface of the sequence results: The database contains the primary amino acid sequences. This interface provides the sequence information, such as name, molecular class, accession number, and CDS region or location.

Fig. 4. Graphical user interface of the structural information: The database incorporates the primary sequences of β-lactamases. This interface lists the most similar non-redundant three-dimensional structures available in the structure database. The snapshot presents, as an example, the closest structure of TEM-1D in the external PDB database.

Fig. 5. Graphical user interface of the functional profile: (A) shows the clinical and microbiological MICs (for the respective β-lactam antibiotic) along with the fold change between them. (B) shows the enzymatic activity of β-lactamases against their corresponding β-lactam antibiotics in the form of kcat, Km, and kcat/Km values.

Fig. 6. Graphical user interface of a phylogenetic tree: The figure shows the evolutionary relationships among β-lactamases within a class.

Conclusions

In summary, this thesis explored the β-lactam antibiotic-hydrolyzing enzymes, the β-lactamases. It contributes to the understanding of the diversity of β-lactamases in the natural environment and of their functional evolution, and it provides a database that lays a foundation for future work.

• The most probable ancestral sequences and the HMM profiles of extant β-lactamases were reconstructed. Their broad sequence characteristics make it possible to uncover distantly homologous β-lactamases from bacteria, archaea, eukaryotes, viruses and metagenomes. The analysis revealed numerous cryptic β-lactamase-like sequences present as dark matter in biological databases such as the environmental database (env_nr), the human microbiome metagenomic database (HMP-HMASM), the human microbiome reference genome database (HMP-HMRGD) and the NCBI non-redundant database (nr). It also revealed novel β-lactamase sequences in the metagenomic database, and these enzymes appear to be similar to the four molecular classes (A, B, C, and D). The wet-lab experiments confirmed that the putative sequences have β-lactamase activity.

• The study demonstrated functional convergent evolution of β-lactamases within the Ambler molecular classes (intra-class scale). Functional similarities and variations occur in distantly related enzymes, which reveals that the function of a β-lactamase cannot be inferred from sequence similarity or from the phylogenetic tree alone. Understanding the causes of this convergent evolution is therefore essential and could be the next step of the investigation.

• The newly developed database contains comprehensive information on β-lactamases and unifies sequences, structures, functions and phylogenetic information in a single resource. It overcomes the limitations of other existing public databases by linking sequence, structure and functional information, which makes it a useful resource for researchers. The database will be continually updated in the future.

Future Perspective

• The reconstructed ancestral sequences and HMM profiles of β-lactamases can facilitate the future identification of novel β-lactamase enzymes and large-scale phylogenetic (diversity) analyses, and could be useful for gaining deeper insight into their origin and evolution. The antibiotic resistance activity of all the β-lactamases in the database needs to be validated experimentally, and the enzymes can be further classified based on their substrate and inhibitor profiles.

• Functional convergent evolution can be used as a benchmark for testing analyses of molecular structure, conservation of protein functional sites, and biological pathways that reflect functional relationships.

• The newly designed β-lactamase database can be extended with newly reported enzymes as they become available.




Acknowledgements

First and foremost, I want to thank my principal advisor, Prof. Didier Raoult. It has been an honor to be his Ph.D. student. I appreciate all his contributions of time, ideas, and funding, which made my Ph.D. experience productive and stimulating. The joy and enthusiasm he has for his research were contagious and motivating for me, even during tough times in the Ph.D. pursuit. Without his suggestions and guidance it would have been impossible to complete my doctoral degree.

Special thanks go to Dr. Pierre Pontarotti for his continuous support throughout the Ph.D. program. Dr. Pontarotti was always there to listen and to give advice. He taught me how to ask questions and express my ideas. He showed me different ways to approach a research problem and the need to be persistent to accomplish any goal.

I am heartily thankful to Prof. Jean-Marc Rolain for being an excellent adviser; he guided me, taught me a great deal, and played a major role in my learning about the complexity of β-lactamase enzymes. I would like to thank Prof. Seydina M. Diene for suggestions and improvements to the scientific manuscript.

I am thankful to my thesis committee for agreeing to read my thesis manuscript and assess my work: thanks to Prof. Pierre E. Fournier, Prof. Max Maurin and Dr. Patricia Renesto.

I would also like to thank all of my friends (Vikas Sharma, Dhamodharan Ramasamy, Senthil Sankar, Sourabh Jain, Jaishriram Rathored, Sweta Nidhi, Ganesh Warthi, Raja Duraisamy, Arup Panda and Charbel Aboukhater) for their support in all situations.

Last, but not least, I thank my family: my parents, Mr. Gopal Das Gupta and Mrs. Ananti Gupta, for giving me life in the first place and for all their love and encouragement. My elder brothers, Dr. Sanjeev Kumar Gupta and Mr. Rajeev Kumar Gupta, raised me with a love of science, supported me in all my pursuits, and reminded me that my research should always be useful and serve good purposes for the whole scientific community.
