
Structural Dynamics and Catalytic Mechanism of

Hydroxymethylbilane Synthase

Thesis submitted in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy in Computational Natural Sciences

by

Navneet Bung 201366683

International Institute of Information Technology, Hyderabad

(Deemed to be University)

Hyderabad – 500032, INDIA

November 2019

Copyright © Navneet Bung, 2019

All Rights Reserved

International Institute of Information Technology Hyderabad, India

CERTIFICATE

It is certified that the work embodied in this thesis, entitled “Structural Dynamics and Catalytic Mechanism of Hydroxymethylbilane Synthase”, has been carried out by Navneet Bung at Tata Consultancy Services Limited, Hyderabad, India and at the Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India under our supervision, and that it will not be submitted elsewhere for a degree.

______Date Dr. U. Deva Priyakumar

______Date Dr. Gopalakrishnan Bulusu


I would like to dedicate this thesis to my grandparents, Late Sri. Satyanarayanji Bung and Smt. Jethi Bung, for their constant love and support.


Acknowledgments

During my Ph.D., I realized that the most important components of research are time, guidance, infrastructure, and moral and financial support. It is a pleasure to acknowledge all the people who contributed and helped me during the course of my research.

While working for a company and being married, it is very difficult to manage time, which is essential for a long-term commitment. I thank Tata Consultancy Services Ltd. (TCS) and my supervisor Dr. Gopalakrishnan Bulusu for giving me enough freedom to pursue my research.

I would also like to thank my family, especially my wife Pooja, for taking care of most of my responsibilities on the personal front, which gave me adequate time to focus on my research.

Along with time, proper guidance is extremely important for pursuing research. I am grateful to Dr. U. Deva Priyakumar and Dr. Gopalakrishnan Bulusu for their highly knowledgeable and insightful scientific expertise, and for guiding me and providing constant support throughout my journey towards the Ph.D. They have been a source of inspiration throughout my research work. I thank them for their willingness to help and to troubleshoot most of the problems that I faced during my research.

I would like to thank Dr. Arijit Roy (TCS) and Dr. N. V. Suresh Kumar (IIIT-H) for helping me with the concepts of quantum chemistry and molecular mechanics. I am grateful to Dr. Semparithi Aravindan for help with the high-performance computing facility. The master's program at IIIT-H helped me develop a strong foundation in the basic concepts. I would like to thank Dr. Abhijit Mitra for strengthening my concepts in chemistry, Dr. Nita Parekh for introducing me to statistics and bioinformatics, and Dr. Prabhakar Bhimalapuram for teaching me the concepts of classical mechanics. Special thanks to the faculty of the center, Dr. Marimuthu Krishnan, Dr. Harjinder Singh and Dr. Tapan Kumar Sau, for their constant support and encouragement and for helping me enhance my knowledge in various fields.


I am grateful to TCS for partially funding my Ph.D. I would also like to thank TCS and IIIT-H for providing me with the infrastructure required to pursue my research.

On such a long journey towards a doctorate, friends and colleagues, who were always there in times of despair, have played an important role. I would like to thank my group members at IIIT-H, Dr. Shampa, Dr. Swati, Tanashree, Shruthi, Rohit and Chinmayee, for the lively discussions and group meetings that always inspired me to explore new ideas during my research work. I would like to thank my batchmates (a long list ...) for the sweet memories that we share and for making my stay on campus comfortable. I would also like to thank the other members of the lab, Bipin, Mohan, Broto, Preethi, Prashanti, Nishta, Kartheek, Sohini and Sandhya, who have inspired me to pursue research.

I would also like to thank my friends at TCS, Siladitya, Shyam, Meenakshi, Harini, Dibyajyoti, Poulami, Akriti, Sutapa and Sowmya, for their moral support, encouragement, fruitful discussions and random chit-chat sessions, which have always motivated me. I would like to thank Umesh and company for maintaining the cleanliness of the lab and providing a healthy environment to work in.

I thank my grandparents for their constant support throughout my education and for believing in me. I would also like to thank my parents and the other members of my family for their support in all circumstances. I would like to thank my wife, Pooja, and our two wonderful kids, Aadya and Atharv, who are the driving force in my journey towards the Ph.D.


Abstract

Heme, the second most abundant , serves as the cofactor for proteins involved in respiration and metabolism. Heme is synthesized through a well-conserved and established heme biosynthetic pathway in all eukaryotes and most prokaryotes. Hydroxymethylbilane synthase (HMBS), also known as porphobilinogen deaminase (PBGD; EC 2.5.1.61), is the third enzyme in the heme biosynthetic pathway. It catalyzes the stepwise polymerization of four molecules of porphobilinogen (PBG) into the linear tetrapyrrole 1-hydroxymethylbilane (HMB). In yeast, HMBS, in addition to 5-aminolevulinic acid dehydratase (ALAD), has been proposed to play a rate-limiting role in the heme biosynthetic pathway. In humans, mutations in HMBS have been linked to acute intermittent porphyria (AIP). In vitro studies on HMBS have identified certain residues of catalytic importance, but their specific roles in catalysis and chain elongation remain unclear. In this thesis, classical and quantum mechanical calculations have been used to understand the structural dynamics and catalytic mechanism of HMBS. Molecular dynamics (MD) simulations of E. coli HMBS through the different stages of chain elongation highlighted the importance of domain movements and active-site loop movement in the polymerization of four PBG units.

However, in human HMBS (hHMBS), an additional 29-residue insert wedged between domains 1 and 3 prevents the domain motions. In hHMBS, the cofactor turn movement, along with minor domain motions, provides space for the addition of the first two PBG moieties to the dipyrromethane cofactor, while the movement of the active-site loop away from the active-site region facilitates the accommodation of the next two PBG moieties. Based on MD simulations and earlier hypotheses, residues R26, D99 and R167 are proposed to be important for catalysis. The findings from the MD simulations provide a basis for studying the catalytic mechanism using quantum mechanical (QM) and QM/MM calculations. The QM calculations were performed on a cluster model of the hHMBS active site. The addition of one molecule of PBG to the cofactor proceeds in four steps: (1) protonation of the PBG substrate; (2) deamination of PBG; (3) electrophilic addition of the deaminated substrate to the terminal pyrrole ring of the enzyme-bound cofactor; and (4) deprotonation at the α-position carbon of the penultimate ring. Deamination of PBG was found to be the rate-limiting step of the complete mechanism. The QM/MM calculations demonstrated the importance of the protein environment for obtaining accurate energies for the catalytic mechanism. The findings of this study, obtained using multi-scale modeling, provide a detailed understanding of the chain elongation mechanism and would assist future work aimed at modulating the activity of HMBS.


Contents

ACKNOWLEDGMENTS

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

1 INTRODUCTION

1.1 PIGMENTS OF LIFE

1.2 HEME BIOSYNTHESIS PATHWAY

1.3

1.3.1 Acute Intermittent Porphyria

1.4 HMBS

1.4.1 Sequence comparison of HMBS from different organisms

1.4.2 Structure of HMBS

1.4.3 Oxidation state of cofactor

1.4.4 Catalytic mechanism

1.4.5 Hypothesis for polypyrrole accommodation

1.4.6 Dual functionality of HMBS

1.5 IN SILICO STUDIES TO UNDERSTAND ENZYME DYNAMICS AND CATALYTIC MECHANISM

1.6 OVERVIEW OF THE THESIS

2 METHODS

2.1 CLASSICAL MECHANICS

2.1.1 Molecular dynamics

2.1.1.1 Integration algorithms

2.1.1.2 Periodic boundary conditions

2.1.1.3 Ensembles

2.1.2 Enhanced Sampling methods

2.1.2.1 Steered molecular dynamics (SMD)

2.1.2.2 Random acceleration molecular dynamics (RAMD)

2.2 QUANTUM MECHANICS

2.2.1 Born-Oppenheimer approximation

2.2.2 Hartree-Fock (HF) approximation

2.2.3 Density functional theory

2.2.4 Basis set

2.3 QM/MM CALCULATIONS

2.3.1 Subtractive Scheme

2.3.2 Additive Scheme

2.3.2.1 Mechanical Embedding

2.3.2.2 Electrostatic Embedding

2.3.3 QM-MM boundary

3 STRUCTURAL DYNAMICS OF E. COLI HMBS

3.1 BACKGROUND

3.2 SIMULATION DETAILS

3.2.1 Preparation of starting structures

3.2.2 Intermediate steps of pyrrole chain elongation

3.2.3 Exit Mechanism

3.2.4 Protein Relaxation

3.2.5 Trajectory and structural analyses

3.3 RESULTS

3.3.1 Pyrrole Chain Elongation

3.3.1.1 Active site loop dynamics

3.3.1.2 Structural changes in the protein during the chain elongation process

3.3.1.3 Principal component analysis

3.3.1.4 Volume of active site cavity during tetrapyrrole elongation

3.3.1.5 Accommodation of the growing pyrrole chain

3.3.1.6 Role of active site residues in the catalytic mechanism

3.3.2 Exit Mechanism

3.3.3 Protein Relaxation after exit of product

3.3.3.1 Regaining compactness post relaxation

3.3.3.2 Restoration of correlations during relaxation

3.4 DISCUSSION

3.5 CONCLUSION

4 STRUCTURAL DYNAMICS OF HUMAN HMBS

4.1 BACKGROUND

4.2 SIMULATION DETAILS

4.2.1 Preparation of starting structures

4.2.2 Pyrrole chain elongation

4.2.3 Random acceleration molecular dynamics

4.2.4 Trajectory and structural analyses

4.3 RESULTS

4.3.1 Structural fluctuations in hHMBS during polypyrrole assembly

4.3.2 Cofactor turn movement assists in polypyrrole accommodation within the catalytic site

4.3.3 Active site loop opens to accommodate the polypyrrole

4.3.4 Charged and hydrophilic active site residues stabilize the growing pyrrole chain

4.3.5 Water-mediated interactions stabilize the polypyrrole

4.3.6 R26 and R167 are critical residues for enzymatic catalysis

4.3.7 R167 dynamics are important for HMB exit

4.3.8 MD simulations of hHMBS mutations that impaired chain elongation

4.3.9 Structural analysis of Non-active site mutations related to AIP

4.4 DISCUSSION

4.5 CONCLUSION

5 CATALYTIC MECHANISM OF HUMAN HMBS: A QM STUDY

5.1 BACKGROUND

5.2 SIMULATION DETAILS

5.2.1 Initial structure preparation

5.2.2 MD simulations with PBG and cofactor

5.2.3 QM calculations on the model system

5.3 RESULTS AND DISCUSSION

5.3.1 QM calculations using cluster model 2

5.3.1.1 Protonation of PBG via the arginine sidechain

5.3.1.2 Deamination of PBG

5.3.1.3 Electrophilic addition: Covalent bond formation between the cofactor and the substrate

5.3.1.4 Deprotonation at the α-position carbon of the second ring of DPM

5.3.2 Effect of dielectric constants on the cluster model

5.3.3 R26 and D99 are crucial for catalysis

5.4 CONCLUSIONS

6 CATALYTIC MECHANISM OF HUMAN HMBS: A QM/MM STUDY

6.1 BACKGROUND

6.2 SIMULATION DETAILS

6.2.1 Initial structure preparation

6.2.2 QM/MM calculations

6.3 RESULTS

6.3.1 Protonation

6.3.2 Deamination of PBG molecule

6.3.3 Electrophilic addition

6.3.4 Deprotonation

6.3.5 Alternative mechanisms

6.3.5.1 Water as the proton donor

6.3.5.2 Water mediated proton transfer from R167

6.3.6 Comparison of energies with the QM cluster model

6.4 DISCUSSION

6.5 SUMMARY

7 CONCLUSIONS

7.1 STRUCTURAL DYNAMICS

7.2 ROLE OF ACTIVE SITE RESIDUES AND WATER MOLECULES

7.3 EXIT MECHANISM

7.4 AIP-RELATED MUTATIONS

7.5 CATALYTIC MECHANISM

7.6 FUTURE WORK

8 REFERENCES

APPENDIX A

APPENDIX B

LIST OF PUBLICATIONS


List of Figures

Figure 1.1: Overview of tetrapyrrole synthesis in plants and animals.

Figure 1.2: Schematic representation of the reaction catalyzed by HMBS.

Figure 1.3: A. Structure of E. coli HMBS (EcHMBS). Domains 1, 2 and 3 are colored blue, red and green, respectively. The DPM cofactor is shown as orange sticks. The active site loop and hinge regions are shown in cyan and black, respectively. B. Schematic representation of the secondary structure elements of HMBS. The color scheme is the same as in the 3D structure. The DPM cofactor is shown as an orange octagon.

Figure 1.4: Structural comparison of E. coli (brown), A. thaliana (pink) and human (blue) HMBS. The major structural differences are highlighted with dashed ellipses.

Figure 1.5: Different forms of the cofactor: A. Dipyrromethane (reduced), B. Dipyrromethene (oxidized), C. Dipyrromethenone (oxidized). D. 3D structure of the cofactor in the dipyrromethane (brown, 2YPN), dipyrromethene (cyan, 1PDA) and dipyrromethenone (magenta, 4HTG) forms.

Figure 1.6: Schematic representation of the catalytic mechanism based on earlier hypotheses (Louie et al., 1996; Roberts et al., 2013; Song et al., 2009).

Figure 1.7: Schematic representation of the enzymatic and non-enzymatic products of 1-hydroxymethylbilane.

Figure 2.1: Schematic representation of constant-velocity SMD (figure adapted from the NAMD tutorial).

Figure 2.2: Subtractive coupling scheme to calculate the energy in QM/MM calculations.

Figure 3.1: Stereo view of the active site residues R11, D84, R131, R132, R149, R155, R176 and R232, along with the conserved residues K55 and K59 (in green) and the DPM cofactor (in pink), shown as sticks.


Figure 3.2: Schematic representation of the mechanism of tetrapyrrole chain elongation catalyzed by HMBS. The figure shows (a) protonation of PBG, (b) deamination of PBG to form MePy, and (c) nucleophilic attack by ring B of DPM on MePy, forming an intermediate that (d) undergoes deprotonation to form a tripyrrole moiety (P3M). Subsequent additions of PBG elongate the chain to form tetrapyrrole (P4M), pentapyrrole (P5M) and hexapyrrole (P6M) moieties. At the end of the last step, the tetrapyrrole product HMB is released by hydrolysis, leaving the DPM cofactor attached to the protein. The rings of the elongating pyrrole chain are labeled A, B, C, D, E and F, starting from the pyrrole ring covalently attached to the cysteine residue. The acetate and propionate side groups are denoted by “-Ac” and “-Pr”, respectively.

Figure 3.3: Structure of HMBS showing the three possible exit channels (C1, F1 and F2) for HMB detected by CAVER.

Figure 3.4: RMSD of the protein backbone, with respect to the loop-modeled and energy-minimized 2YPN structure, at each stage of chain elongation.

Figure 3.5: A. Heat map showing the residue-wise contribution to the RMSD of the protein through the different stages of simulation, with values indicated by the color bar on the right. The simulation stages are denoted by a color bar along the abscissa. B. RMSF plot of the protein from the DPM to P6M stages. The color bar at the bottom corresponds to the domain demarcation (domain 1 – blue, domain 2 – red, domain 3 – green, active site loop – cyan, hinge regions – black). C. Solvent accessible surface area (SASA) and radius of gyration (Rgyr) values show the loss of compactness of the protein on the addition of each PBG molecule through the stages of simulation from DPM to P6M. Error bars represent the standard deviation from the mean.

Figure 3.6: A. Plot of the distance between the centers of mass of the loop residues (42-60) and the active site residues (11, 19, 84, 131, 132, 155, 176, 242), tracking the loop movement in the different stages of the simulation. B. Interactions of K55 with E88 (open loop conformation, blue) and with V306, E305 and Q243 (closed loop conformation, green) regulate the loop movement in the DPM stage. C. Distance graphs depicting the interaction of D50 with R149 that regulates loop movement (along with the K55 interactions) during the DPM stage; the interactions of D50 with R149 (black), D50 with G150 (red) and K55 with E88 (green) involved in loop movement during the P3M stage; and the interactions of K55 with E239 (black) and G60 with E88 (red) involved in loop movement during the P4M stage.

Figure 3.7: Structural changes observed in the protein in the DPM and P6M stages. In the P6M stage, the β-strands and helices in domain 1 shorten (red arrow), a shorter helix is observed in domain 3 (green arrow), and the hinge region between domains 1 and 2 uncoils (pink arrow).

Figure 3.8: Dominant motions of the protein along the top principal components (PC1, PC2 and PC3) for the A. DPM, B. P3M, C. P4M, D. P5M and E. P6M stages of chain elongation. The red arrows show the direction and magnitude of the motion corresponding to each principal component. The HMBS protein is shown as a green tube.

Figure 3.9: RMSF plot corresponding to principal components 1 and 2 for each stage of chain elongation, with domain demarcation along the abscissa.

Figure 3.10: A. The volume of the active site and the cumulative SASA of the active site residues (as reported in Table 1) increase from the DPM to the P6M stage with the addition of each PBG molecule. B. Increase in the separation between domains 1 and 2 during the catalytic stages of HMBS. C. Polypyrrole accommodation within the active site cleft as a result of major domain movements during chain elongation; only snapshots of the DPM and P6M stages are shown. D. Interactions of W18 and R176 with ring B of the pyrrole chain in the P3M stage and with ring C in the P4M stage, showing the shift in interactions that accommodates the polypyrrole chain.


Figure 3.11: A. A closer view of the stacking interaction of R11 with F62, located at the base of the active site loop, measured as the distance between the CZ atom of R11 and the center of mass of the phenyl ring of F62 in the DPM stage. B. In the P4M stage, D84 interacts with R11, disrupting the stacking of R11 with F62. C. The stacking interaction of R11 with F62 holds the active site loop in a position that facilitates its movement during the DPM and P3M stages, shown as a distance graph between the CZ atom of R11 and the center of mass of the phenyl ring of F62 in the DPM, P3M and P4M stages. D. Distance graph depicting the interaction of R11 with D84 in the P4M stage, which breaks the stacking between R11 and F62. E. Distance graphs depicting the interaction of R176 with the B (black) and C (red) rings of the polypyrrole chain during the stages of chain elongation.

Figure 3.12: Conformation of 1-hydroxymethylbilane during the HMB stage simulation. Relative to the initial conformation of HMB (green), the C and F rings are displaced by 3.5 Å and 5.3 Å, respectively, in the structure at the end of the simulation (cyan).

Figure 3.13: Exit mechanism of HMB from HMBS. A. Structure of HMBS showing the probable exit directions, from either the C or the F ring of the HMB unit, considered for the SMD simulations: C1 (from the center of mass of the C ring of HMB towards the interface of domains 1 and 2), F1 (from the center of mass of the F ring of HMB towards the active site loop) and F2 (from the center of mass of the F ring of HMB towards the interface between the active site loop and domain 1). B. Surface representation of HMBS showing the most probable exit path of HMB, through the space between domain 1, domain 2 and the active site loop (Video S3). C. Force as a function of time during the SMD runs along the three exit paths C1, F1 and F2. D. Graphs showing the interactions of R11, Q19 and R176 with HMB during the SMD calculations along the C1, F1 and F2 paths, indicating a possible role of these catalytically important residues in product exit.

Figure 3.14: A. Probability distribution of the radius of gyration of the protein in the DPM, P6M and no-HMB stages, showing that in the no-HMB stage the Rgyr falls back close to that of the DPM stage. DCCM plots of the B. no-HMB, C. DPM and D. P6M stages, showing the differences and similarities in correlations relative to the DPM stage as the protein relaxes in the no-HMB stage. The marked regions in the no-HMB stage more closely resemble the DPM stage during protein relaxation.

Figure 4.1: Structure of hHMBS (PDB ID: 3ECR) with the missing residues modeled, showing domains 1, 2 and 3 (blue, red and green, respectively), the hinge regions (115-119, 213-218 and 237-240; pink), the additional 29-residue insert (296-324; orange) and the active site loop (56-76; cyan).

Figure 4.2: A. RMSD of the protein, along with domain-wise RMSD, with reference to the DPM stage structure, during the stages of chain elongation. B. Root mean square fluctuation of the Cα atoms of hHMBS at each stage of chain elongation, from DPM to P6M. The encircled regions show high fluctuations in the active site loop and the 29-residue insert. C. Distance between the centers of mass of the active site loop and the active site residues (R26, Q34, D99, R149 and R150) as a function of time along the stages of chain elongation, emphasizing the role of the loop in polypyrrole accommodation. Standard errors are shown as error bars along the Y axis. D. The cofactor turn shifts in the P4M stage (green) compared with the DPM stage (cyan) to accommodate the growing pyrrole chain during chain elongation.

Figure 4.3: A. Characterization of the active site loop conformation as a function of (i) the distance between the centers of mass of the active site residues (R26, Q34, D99, R149 and R150) and the active site loop (X-axis) and (ii) the RMSD of the active site loop (Y-axis) along the concatenated trajectory. B. Conformations of the active site loop at bins 1, 2 and 3, shown in red, blue and green, respectively; DPM is shown in yellow and P6M in green sticks. C. Interactions of T58 and D61 with R26, along with F77, hold the active site loop (red) close to the catalytic cleft in the DPM stage. D. Loss of the interactions of T58 and D61 with R26 moves the active site loop away from the catalytic cleft in the P4M stage; instead, T58 stabilizes the sidechain of ring D.

Figure 4.4: Interactions of the residues lining the active site with the growing pyrrole chain at the A. DPM, B. P3M, C. P4M, D. P5M and E. P6M stages. Interactions are shown as dotted lines. The cofactor is shown in yellow and water molecules are represented as spheres. The carboxylate oxygens (O) of the acetate/propionate sidechains and the pyrrole nitrogen (N) of each pyrrole ring are labeled. The numeric suffixes (1-4) indicate the positions of the oxygen on the pyrrole, and the letter suffixes (A-F) indicate the pyrrole rings to which the carboxylate oxygens are attached.

Figure 4.5: A. Total number of water molecules interacting with the polypyrrole (black) and total number of water-mediated interactions between the growing polypyrrole and the protein (red) during the stages of chain elongation. B. N169 formed hydrogen bonds with the acetate sidechain (O1D and O2D) of ring D in the P4M stage. In the P5M and P6M stages, the direct interaction between N169 and ring D was lost and was instead mediated by water molecules that persisted for over 90% of the simulation time. A water-mediated interaction between D99 and the nitrogen atom (NF) of ring F was also observed.

Figure 4.6: Positions of the probable proton donors R26, Q34, R167 and R195 and of PBG in the A. P4M and B. P5M stages, obtained from docking studies. R167 is the most probable proton donor in these stages.


Figure 4.7: Stereograms of PBG docked in the active site of hHMBS at the A. DPM, B. P3M, C. P4M and D. P5M stages, showing the role of R26 and R167 in proton donation to the incoming PBG. All measurements are in angstroms. PBG, the polypyrrole, the arginine residues (R26 and R167) and the aspartate residue (D99) are shown in cyan, green, orange and violet, respectively.

Figure 4.8: Comparison of the DPM cofactor (pink and orange) and HMB (green and cyan) conformations before and after a 50 ns MD simulation. Only minor changes in the conformation of HMB were observed after 50 ns of MD simulation.

Figure 4.9: A. Three possible exit paths for HMB, depicted as blue spheres. HMB is shown as yellow sticks, and the arrows indicate the probable exit paths A, B and C. B. Exit path A. C. Exit path B. Domains 1, 2 and 3 are shown in blue, red and green, respectively, and the active site loop in cyan. R167 and HMB (green and yellow, respectively) are shown at the stage where HMB begins to exit the protein; R167' and HMB' (orange and pink, respectively) represent the stage where HMB is almost outside the protein.

Figure 4.10: The D99G mutation shifts the cofactor by 3.5 Å towards the active site loop. The pyrrole nitrogens also come closer to each other than in the DPM cofactor of the wild-type protein.

Figure 4.11: Changes in the polypyrrole conformation in the P3M stage due to the R26C mutation place ring C in a position that hinders further substrate addition. Conformation of the growing pyrrole chain in the A. wild-type and B. R26C mutant enzyme. The polypyrrole and D99 are shown as cyan and yellow sticks, respectively.

Figure 4.12: Non-active site mutations responsible for AIP destabilize the PBGD structure. A. Hydrogen bond between T269 and E250; B. Salt bridge between E250 and R116; C. Residues in secondary structure elements such as helices; and D. Residue L245 in the hydrophobic core formed by V301, P324 and L244.

Figure 4.13: A representative structure of the DPM stage in A. EcHMBS and B. hHMBS. Domains 1, 2 and 3 are colored blue, red and green, respectively. The 29-residue insert is shown in magenta. Schematic representation of the domain dynamics in C. EcHMBS and D. hHMBS. The 29-residue insert prevents the motion of both domains 1 and 2. The length of the black arrows in C and D is proportional to the extent of the domain motions observed in EcHMBS and hHMBS, respectively.

Figure 4.14: Classification of amino acid mutations that alter A. cofactor binding, B. the binding sites of the incoming PBG for pyrrole chain elongation, C. polypyrrole charge stabilization and D. HMB release.

Figure 5.1: Schematic representation of the catalytic mechanism proposed by Song et al. (Song et al., 2009).

Figure 5.2: Schematic representation of the catalytic mechanism proposed by Roberts et al. (Roberts et al., 2013).

Figure 5.3: Schematic representation of A. cluster model 1 and B. cluster model 2, used to study the catalytic mechanism. Important distances (d1, d2, d3 and d4) used as reaction coordinates for the potential energy scans are shown as red dashed lines. Identifiers of important atoms are shown in black.

Figure 5.4: Stereoview of the representative structure obtained from a 10 ns MD simulation. The substrate, PBG, is shown in blue sticks, while the active site residues are shown in magenta sticks.

Figure 5.5: Optimized structures of the reactant, transition states and intermediates for the protonation, deamination and nucleophilic attack steps obtained from QM calculations on cluster model 1. The cofactor, substrate, and residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. The transition states for the protonation and deamination steps are shown as 2D representations. All measurements are in Angstroms (Å).

Figure 5.6: The two possible mechanisms for the deprotonation step of catalysis: A. deprotonation through the bridging carbon atom between rings B and C; B. deprotonation through the carboxylate sidechain of D99. The cofactor, substrate, and residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. All measurements are in Angstroms (Å).

Figure 5.7: Energy profile for the catalytic mechanism of HMBS using cluster model 1. The energy profile for the first three steps of catalysis is shown in green. Two mechanisms are proposed (blue and orange) for the final deprotonation step.

Figure 5.8: Schematic representation of the reactant, transition states and intermediates observed during the addition of a PBG molecule to the DPM cofactor. Residues T25, S28 and N169, and the acetate and propionate sidechains of PBG, which are part of cluster model 2, are not shown in the schematic. Important distances for each structure are reported in Angstroms (red).

Figure 5.9: Optimized structures of the reactant, transition states and intermediates for the protonation, deamination and nucleophilic attack steps obtained from QM calculations on cluster model 2. The cofactor and residues T25, S28 and N169 are shown as green sticks, while the substrate and residues R26 and D99 are shown as magenta, pink and blue sticks, respectively. All measurements are in Angstroms (Å).

Figure 5.10: Energy profile for the catalytic mechanism of HMBS using cluster model 2.

Figure 5.11: 2D scan for the protonation and deamination steps of the HMBS catalytic mechanism. In the protonation step, the distance between the arginine proton and the nitrogen atom of PBG decreases from 2.4 to 0.9 Å (Y coordinate), while in the deamination step the distance between the carbon and the nitrogen of PBG increases from 1.5 to 2.5 Å (X coordinate).

Figure 5.12: Optimized structures of the intermediates and transition states for the deprotonation 1 and 2 steps. The cofactor and residues T25, S28 and N169 are shown as green sticks, while the substrate and residues R26 and D99 are shown as magenta, pink and blue sticks, respectively. All measurements are in Angstroms (Å).

Figure 5.13: Comparison of energy profiles obtained with different dielectric constants of the medium.

Figure 5.14: Schematic representation of the most probable catalytic mechanism for the addition of one unit of the substrate to the DPM cofactor.

Figure 6.1: Stereoview of the residues included in the QM region for A. Model 1 and B. Model 2. The DPM cofactor, PBG, R26 and D99 are shown as green, magenta, pink and blue sticks, respectively. All other residues in the QM region are shown as cyan sticks. C. Energy profiles for QM/MM Models 1 (orange) and 2 (blue) used to study the catalytic mechanism of HMBS.

Figure 6.2: Optimized structures of the reactant, transition states and intermediates for protonation, deamination and electrophilic attack steps obtained from QM/MM calculations. The protein is shown in blue cartoon representation. The DPM, PBG, and residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. All the measurements are in Angstroms (Å)...... 123

Figure 6.3: Optimized structures of the intermediates, transition states and the product for the deprotonation steps obtained from QM/MM calculations. The protein is shown in blue cartoon representation. The DPM, PBG, and residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. All the measurements are in Angstroms (Å). .. 125


Figure 6.4: Energy profile corresponding to the potential energy scan for the transfer of a proton from the water molecule to the amino group of the PBG moiety. The distance between the hydrogen of the water molecule and the amino group of PBG decreases from 1.8 to 1.0 Å. The structures corresponding to the reactant and intermediate are overlaid on the plot. The DPM, PBG, R26 and D99 residues are shown in green, magenta, pink and blue sticks, respectively. The distance shown in red is measured in Angstroms (Å)...... 127

Figure 6.5: Optimized structures of A. Reactant and B. Intermediate corresponding to the transfer of the proton from R167 to the amino group of PBG via a water molecule. The DPM, PBG, R26, D99 and R167 residues are shown in green, magenta, pink, blue and purple sticks, respectively. All the measurements are in Angstroms (Å). C. Energy profile corresponding to the potential energy scan for the transfer of the proton from R167 to the amino group of PBG moiety via a water molecule...... 128

Figure 6.6: Comparison of energy profile obtained from QM/MM calculations (blue) and QM cluster model calculations (magenta) at M06 level of theory...... 130


List of Tables

Table 1.1: Pairwise sequence percent identity matrix (obtained from Clustal2.1) for HMBS sequences from six different organisms...... 5

Table 1.2: Structures of HMBS available in Protein Data Bank (PDB)...... 7

Table 3.1: Residues interacting with the pyrrole chain and its terminal ring at each stage of chain elongation...... 50

Table 3.2: Interaction of protein residues with HMB moiety during its exit through each path; numbers indicating the % occupancy of interaction...... 56

Table 4.1: Persistence of interactions between oxygen atoms of the polypyrrole sidechain and polar amino acid residues/backbone atoms and water molecules...... 76

Table 5.1: Natural charges on the atoms important for the catalytic mechanism, calculated using natural population analysis as implemented in Gaussian09 at the M06/6-311++G(d,p) level. The atom numbering used in the table is shown in Fig. 5.2B...... 111


Chapter 1

Introduction

1.1 Pigments of life

Tetrapyrrole biosynthesis is essential for virtually all forms of life. Tetrapyrroles serve as precursors for heme, siroheme, chlorophyll and cobalamin (i.e., vitamin B12) and are involved in various biological processes (Battersby, 2000; Layer et al., 2010). They are also known as 'pigments of life' because they give leaves their green color and blood its red color (Leeper, 1985, 1989). The color of blood is primarily determined by the optical properties of the heme group and the associated metal ion (Kienle et al., 1996). Heme is the prosthetic group at the core of the oxygen-carrier metalloprotein hemoglobin (Layer et al., 2010). It also plays an important role in carrying carbon monoxide and nitric oxide (White and Marletta, 1992). Apart from the transport of these diatomic gases, heme acts as a cofactor for cytochromes, catalases, peroxidases, nitric oxide synthase (White and Marletta, 1992), nitric oxide reductase and various other enzymes (Layer et al., 2010). Heme consists of an iron ion bound to a tetrapyrrole, protoporphyrin IX, in which four pyrrole rings are linked through methine bridges. Heme is synthesized by the heme biosynthetic pathway, which is common to all eukaryotes and most prokaryotes (Heinemann et al., 2008).

1.2 Heme biosynthesis pathway

Heme is synthesized through a well-conserved pathway in most organisms. The first common precursor of heme, 5-aminolevulinic acid (ALA), is formed through two independent routes, namely the Shemin and C5 pathways (Fig. 1.1). In the Shemin pathway, glycine and succinyl-CoA are condensed to form ALA by the enzyme 5-aminolevulinic acid synthase (ALAS) (Shemin and Rittenberg, 1945; Ferreira and Gong, 1995). The Shemin pathway is primarily found in mammals, fungi and α-proteobacteria (Fig. 1.1). In the C5 pathway, ALA is formed through a two-step process that uses glutamyl-tRNA as the precursor (Beale et al., 1975; Huang et al., 1984). The C5 pathway is found in plants, archaea and most bacteria (Meller and Gassman, 1982). In Euglena gracilis, genes related to both the C5 and Shemin pathways have been found, which is extremely rare (Jahn and Heinz, 2009; Layer et al., 2010). In the next step, δ-aminolevulinic acid dehydratase (ALAD) uses two molecules of ALA to form porphobilinogen (PBG) (Jaffe, 2004; Jordan and Gibbs, 1985; Erskine et al., 2003). Four units of PBG are polymerized by the enzyme hydroxymethylbilane synthase (HMBS) to form a linear tetrapyrrole, 1-hydroxymethylbilane (HMB) or preuroporphyrinogen (Anderson and Desnick, 1980; Battersby et al., 1976; Warren and Scott, 1990) (Fig. 1.2). HMBS is also known as porphobilinogen deaminase (PBGD).

The linear form of HMB is unstable and, in the absence of further enzymatic action, rapidly cyclizes to give uroporphyrinogen I (the non-enzymatic product). In the presence of uroporphyrinogen III synthase (UROS), HMB is cyclized to form uroporphyrinogen III (Shoolingin-Jordan, 1995). The steps from ALA to the formation of the cyclic tetrapyrrole uroporphyrinogen III are conserved in all organisms (Layer et al., 2010) (Fig. 1.1). The next three enzymes, uroporphyrinogen III decarboxylase (UROD) (Mauzerall and Granick, 1958), coproporphyrinogen III oxidase (CPO) (Yoshinaga and Sano, 1980) and protoporphyrinogen IX oxidase (PPO) (Siepker et al., 1987), yield protoporphyrin IX, in which the four pyrrole rings form a fully conjugated macrocycle (Ajioka et al., 2006). In the last step, the enzyme ferrochelatase inserts an iron ion into protoporphyrin IX to yield heme (Dailey et al., 2000). Most of the enzymes of the heme biosynthetic pathway are well characterized, and their structures are available in the Protein Data Bank (Layer et al., 2010). A few archaea and bacteria synthesize heme through slightly different pathways (Ishida et al., 1998; Buchenau et al., 2006).

Figure 1.1: Overview of tetrapyrrole synthesis in plants and animals.

In humans, the heme biosynthetic pathway is distributed between the mitochondria and the cytosol (Ajioka et al., 2006). The first step of heme biosynthesis takes place in the mitochondria. The product, ALA, is transported to the cytosol, where it is converted to coproporphyrinogen III by the action of four enzymes, ALAD, HMBS, UROS and UROD. The last three enzymes of the pathway are localized to the mitochondria. In addition, the end product, heme, acts as a feedback inhibitor of the first enzyme of the pathway, ALAS (Hayashi et al., 1980).

In erythroid cells, heme is mainly used for the synthesis of hemoglobin, while in hepatic cells it is mainly incorporated into cytochrome P450 enzymes (Kauppinen, 2005).

1.3 Porphyrias

The porphyrias are a group of disorders that result from the accumulation of porphyrins or their precursors in the human body (Anderson, 2001). They are classified as hepatic or erythropoietic based on their anatomical origin. The porphyrias can be further divided into two categories, acute and cutaneous, based on their clinical impact. Acute porphyrias affect the nervous system (Albers and Fink, 2004), while cutaneous porphyrias affect the skin (Elder, 1990). Mutations in any of the genes that encode enzymes of the heme biosynthetic pathway can lead to porphyria (Anderson, 2001; Layer et al., 2010; Ajioka et al., 2006). Most of these mutations are inherited, following autosomal dominant, autosomal recessive or X-linked dominant patterns of inheritance (Anderson, 2001; Layer et al., 2010).

1.3.1 Acute Intermittent Porphyria

In humans, mutations in HMBS cause acute intermittent porphyria (AIP), an autosomal-dominant inborn disorder characterized by life-threatening acute neurovisceral attacks (Chen et al., 1994, 2018; De Siervi et al., 1999; Whatley and Badminton, 1993). AIP results in elevated levels of the heme precursors ALA and PBG in urine. The prevalence of the disease ranges from 0.5 to 10 per 100,000 people (Anderson, 2001). Over 400 mutations for 93 residues have been reported for the HMBS enzyme (Stenson et al., 2009). The reported mutations include missense/nonsense, splicing and regulatory mutations, as well as insertions and deletions.

1.4 HMBS

HMBS is the third enzyme in the Shemin heme biosynthetic pathway. It has a unique cofactor, dipyrromethane (DPM), which is covalently attached to a conserved cysteine through a thioether bond (Fig. 1.3A). The cofactor acts as a primer for the sequential addition of four PBG molecules (Fig. 1.3A) (Louie et al., 1992; Jordan and Warren, 1987; Warren and Jordan, 1988). At the end of the polymerization, the bond linking the tetrapyrrole to the cofactor is hydrolyzed, releasing the tetrapyrrole and leaving the cofactor attached to the cysteine residue for the next catalytic cycle.


Figure 1.2: Schematic representation of the reaction catalyzed by HMBS.

1.4.1 Sequence comparison of HMBS from different organisms

The HMBS gene is well characterized in many organisms. A comparison of protein sequences from Escherichia coli (E. coli), Arabidopsis thaliana (A. thaliana), Plasmodium falciparum (P. falciparum), Homo sapiens (H. sapiens), Vibrio cholerae (V. cholerae) and Bacillus megaterium (B. megaterium) showed 53 conserved residues (Appendix A).

The multiple sequence alignment (MSA) was performed using Clustal Omega (Sievers and Higgins, 2014). Pairwise alignment of all the sequences considered in the MSA showed a sequence identity of at least 30% (Table 1.1), indicating that HMBS is conserved across species. The highest sequence identity (74.92%) was observed between the E. coli and V. cholerae homologs.

Table 1.1: Pairwise sequence percent identity matrix (obtained from Clustal2.1) for HMBS sequences from six different organisms.

                 P. falciparum   A. thaliana   H. sapiens   B. megaterium   E. coli   V. cholerae
P. falciparum    100.0           -             -            -               -         -
A. thaliana       34.1           100.0         -            -               -         -
H. sapiens        30.7            36.9         100.0        -               -         -
B. megaterium     31.7            40.7          45.0        100.0           -         -
E. coli           32.4            45.8          46.3         46.3           100.0     -
V. cholerae       31.7            43.2          42.6         47.7            74.9     100.0
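As an illustration of how pairwise values like those in Table 1.1 can be derived from an existing alignment, the Python sketch below computes percent identity for pre-aligned sequences. It assumes one common convention (identical residues divided by the number of columns in which neither sequence has a gap); Clustal and other tools may use different denominators, so this sketch is not claimed to reproduce the Clustal2.1 matrix exactly. The function names and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residues / columns where neither sequence
    has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences have a residue.
    paired = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not paired:
        return 0.0
    identical = sum(1 for a, b in paired if a.upper() == b.upper())
    return 100.0 * identical / len(paired)


def identity_matrix(alignment: dict) -> dict:
    """Lower-triangular {(name_i, name_j): % identity}, laid out like Table 1.1."""
    names = list(alignment)
    matrix = {}
    for i, name_i in enumerate(names):
        for name_j in names[: i + 1]:
            value = percent_identity(alignment[name_i], alignment[name_j])
            matrix[(name_i, name_j)] = round(value, 1)
    return matrix


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real HMBS sequences).
    toy = {"seq1": "MKIL-VGTR", "seq2": "MKVLAVGSR", "seq3": "MRIL-IGTK"}
    for pair, value in identity_matrix(toy).items():
        print(pair, value)
```

For the toy input, seq1 and seq2 share 6 of their 8 gap-free columns, giving 75.0%.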

Figure 1.3: A. Structure of E. coli HMBS (EcHMBS). The domains 1, 2 and 3 are colored in blue, red and green, respectively. The DPM cofactor is shown in orange sticks. The active site loop and hinge regions are shown in cyan and black, respectively. B. Schematic representation of secondary structure elements of HMBS. The color scheme is similar to the 3D structure. The DPM cofactor is shown as an orange octagon.


1.4.2 Structure of HMBS

To date, X-ray crystal structures of HMBS with the DPM cofactor have been reported for E. coli (Louie et al., 1996), human (Song et al., 2009; Gill et al., 2009; Pluta et al., 2018a), A. thaliana (Roberts et al., 2013), B. megaterium (Azim et al., 2014; Guo et al., 2017) and V. cholerae (Uchida et al., 2018) (Table 1.2).

Table 1.2: Structures of HMBS available in Protein Data Bank (PDB).

S.No   Organism        Number of structures   PDB ID
1      E. coli         5                      1YPN (K59Q) (Helliwell et al., 1998), 2YPN (Nieh et al., 1999), 1AH5 (selenomethionine labeled) (Hädener et al., 1999), 1PDA (Louie et al., 1992), 1GTK (Helliwell et al., 2003)
2      B. megaterium   5                      4MLQ (Azim et al., 2014), 4MLV (Azim et al., 2014), 5OV4 (D82A) (Guo et al., 2017), 5OV5 (D82E) (Guo et al., 2017), 5OV6 (D82N) (Guo et al., 2017)
3      H. sapiens      4                      3ECR (Song et al., 2009), 3EQ1 (R167Q) (Gill et al., 2009), 5M6R (with reaction intermediate) (Pluta et al., 2018b), 5M6F (Pluta et al., 2018b)
4      A. thaliana     1                      4HTG (Roberts et al., 2013)
5      V. cholerae     1                      5H6O

Structural alignment of all the crystal structures shows that they share a similar fold. The structure of HMBS (Fig. 1.3A) consists of three domains of the α/β class. Domains 1 and 2 share a topology similar to that of transferrins and periplasmic binding proteins (Louie et al., 1992), while domain 3 differs, having a three-stranded antiparallel β-sheet with one of its faces covered by three α-helices (Fig. 1.3B). The domains are connected by three hinge regions. Domain 3 interacts with domains 1 and 2 and with the inter-domain hinge regions primarily through polar interactions (Louie et al., 1992). DPM is linked by a thioether bond to the cysteine residue on a flexible cofactor turn and lies in a cleft between domains 1 and 2. In the 1PDA and 4HTG crystal structures, the cofactor is in the oxidized form (Louie et al., 1992; Roberts et al., 2013), while in all the other structures it is in the active, reduced form. The coordinates of most of the residues in the flexible loop region, also known as the 'active site loop', are missing from the available crystal structures because they could not be resolved. However, they were resolved in the crystal structure of A. thaliana HMBS (Fig. 1.4) (Roberts et al., 2013). The human homolog has a 29-residue insert in domain 3, at the interface between domains 1 and 3, which is absent in other homologs of HMBS (Fig. 1.4). Most of the charged residues that interact with the acetate and propionate sidechains of the DPM cofactor are conserved, as is the aspartate residue that interacts with the pyrrole nitrogens.

Figure 1.4: Structure comparison of E. coli (brown), A. thaliana (pink) and human (blue) HMBS. The major differences in the structure are highlighted using dashed ellipses.

1.4.3 Oxidation state of cofactor

The crystal structures of HMBS show that the cofactor occurs in both oxidized and reduced forms. In the X-ray structures of E. coli HMBS, well-defined electron density is reported for both the oxidized and the reduced forms of the cofactor. The reduced form of the cofactor (dipyrromethane) is found in the 2YPN crystal structure, while the oxidized form (dipyrromethene) is found in 1PDA (Fig. 1.5A, 1.5B). The two forms differ in the conformation about the bridging carbon joining the two rings (A and B) of the cofactor, which is sp2-hybridized in the oxidized form and sp3-hybridized in the reduced form. In the oxidized form, the two rings of the dipyrrole cofactor are nearly coplanar, with an inter-planar angle of ~11°, while in the reduced form they are bent, with an inter-planar angle of ~59° (Fig. 1.5D) (Louie et al., 1996).

In both cases, the pyrrole nitrogens interact with residue D84, and the acetate and propionate sidechains interact with the positively charged and polar residues in domain 2.
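The inter-planar angles quoted above can be estimated by fitting a mean plane to each pyrrole ring and measuring the angle between the two plane normals. The Python sketch below uses Newell's method for the ring normal; this is an assumption for illustration (least-squares plane fits are an equally common choice, and the cited analysis may have used a different procedure). The coordinates are hypothetical idealized pentagons, not atoms taken from 2YPN or 1PDA.

```python
import math


def newell_normal(ring):
    """Unit normal of a ring's mean plane by Newell's method.

    `ring` is a list of (x, y, z) atom coordinates given in ring order.
    """
    nx = ny = nz = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1, z1 = ring[i]
        x2, y2, z2 = ring[(i + 1) % n]  # next atom, wrapping around the ring
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


def interplanar_angle(ring_a, ring_b):
    """Angle in degrees (0-90) between the mean planes of two rings."""
    na, nb = newell_normal(ring_a), newell_normal(ring_b)
    # A plane normal has no preferred sign, so use |cos(theta)|.
    cos_t = abs(sum(p * q for p, q in zip(na, nb)))
    return math.degrees(math.acos(min(1.0, cos_t)))


def toy_pentagon(radius=1.1, tilt_deg=0.0, shift=(0.0, 0.0, 0.0)):
    """Hypothetical idealized five-membered ring, tilted about the x-axis."""
    t = math.radians(tilt_deg)
    ring = []
    for k in range(5):
        phi = 2.0 * math.pi * k / 5
        x, y = radius * math.cos(phi), radius * math.sin(phi)
        ring.append((x + shift[0], y * math.cos(t) + shift[1], y * math.sin(t) + shift[2]))
    return ring


if __name__ == "__main__":
    ring_a = toy_pentagon()
    ring_b = toy_pentagon(tilt_deg=59.0, shift=(2.5, 0.0, 0.0))
    print(round(interplanar_angle(ring_a, ring_b), 1))  # 59.0
```

Because the normal is computed from the ring's vector area, the result does not depend on where the second ring sits in space, only on its orientation.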

In the X-ray structure of A. thaliana HMBS, the cofactor is in an oxidized form, dipyrromethenone (Roberts et al., 2013). The dipyrromethenone has an oxygen atom double bonded to the free α-position carbon of ring B (Fig. 1.5C, 1.5D). However, the reduced form of the cofactor is known to be the active, native form (Jordan and Woodcock, 1991).

Figure 1.5: Different isoforms of cofactor A. Dipyrromethane (reduced), B. Dipyrromethene (oxidized), C. Dipyrromethenone (oxidized). D. The 3D structure of cofactor in dipyrromethane (brown, 2YPN), dipyrromethene (cyan, 1PDA) and dipyrromethenone (magenta, 4HTG) form.

1.4.4 Catalytic mechanism

Although 16 structures of HMBS are available in the Protein Data Bank, the exact mechanism of pyrrole addition remains unclear. Most of the structures in the PDB correspond to the holo form of the enzyme, with a DPM cofactor covalently attached to the cysteine residue. Since little is known about the intermediate stages of chain elongation, various hypotheses for the catalytic mechanism have been proposed based on the crystal structures of HMBS from E. coli (Louie et al., 1996), human (Song et al., 2009) and A. thaliana (Roberts et al., 2013). All the hypotheses reported in the literature propose that the catalytic mechanism is carried out in four steps: 1) Protonation, 2) Deamination, 3) Electrophilic attack and 4) Deprotonation (Fig. 1.6). However, the hypotheses differ in the residues proposed to act as the proton donor and to stabilize the charge on the intermediates.

Louie et al. (Louie et al., 1996) have proposed that the D84 residue in EcHMBS (D99 in human HMBS) may be protonated initially and may act as a proton donor to the PBG for the removal of an ammonium ion. They proposed that the carboxylate group of D84 plays an important role in the covalent bond formation between the DPM cofactor and the PBG moiety and removes a proton from the α-position of the second ring to complete the catalytic mechanism.

Another hypothesis based on human HMBS (hHMBS) (Song et al., 2009) stated that residues R26, Q34, or R195, which are positioned close to the second ring of the DPM cofactor, might be potential proton donors for the incoming PBG instead of D99 (D84 in

EcHMBS) as proposed earlier (Louie et al., 1996). The positive charge on the deaminated intermediate could be stabilized by the propionate sidechain on the second ring of the

DPM cofactor. The D99, on the other hand, stabilizes the intermediates formed during the covalent bond formation between the DPM cofactor and the PBG moiety.

In the third hypothesis, based on the structure of A. thaliana HMBS (AtHMBS), D95 (D99 in hHMBS) was proposed to stabilize the positive charge on the methylene pyrrolenine (MePy) intermediate (Roberts et al., 2013). D95 also facilitates the nucleophilic attack of the free α-position carbon of the second ring of the cofactor on the MePy intermediate. In the mechanism proposed by Roberts et al. (Roberts et al., 2013), only one carboxylate group (D99 in hHMBS) is involved in catalysis, in contrast to the mechanism proposed by Song et al. (Song et al., 2009).

Figure 1.6: Schematic representation of catalytic mechanism based on earlier hypotheses (Louie et al., 1996; Roberts et al., 2013; Song et al., 2009).

1.4.5 Hypothesis for polypyrrole accommodation

Because the DPM cofactor (two pyrrole rings) is extended by four PBG units before the product is released, the active site must be large enough to accommodate up to six pyrrole units. Louie et al. proposed that the active site could accommodate 3.5 pyrrole units without major changes (Louie et al., 1996), while Roberts et al. showed that the active site of AtHMBS can accommodate four pyrrole units with minimal changes in the overall conformation of the enzyme (Roberts et al., 2013). Since the active site is not large enough to accommodate six pyrrole units, Louie et al. proposed two mechanisms by which the elongating polypyrrole chain could be accommodated within the active site of HMBS:

(1) Sliding active site model: In the sliding active site model, the elongating chain is accommodated in the cavity within the protein and the domains adjust themselves to juxtapose the binding site and catalytic site residues near the terminal pyrrole ring (Louie et al., 1996). Based on the structure of hHMBS, three residues (S96, H120 and L238) in the hinge region have been identified that could facilitate the movement of domains to accommodate the polypyrrole chain (Song et al., 2009).

(2) Moving chain model: In the second model, the elongating chain is progressively pulled past the catalytic site by rotation of the cysteine residue sidechain, placing the penultimate and terminal rings in the substrate binding site (Louie et al., 1996). Roberts et al. proposed that rotation of the bond linking the cofactor and the conserved cysteine would cause the A and B rings to vacate their position, allowing incoming pyrrole rings to bind at the same catalytic site (Roberts et al., 2013).

1.4.6 Dual functionality of HMBS

The dual functionality of HMBS has been reported in Leptospira interrogans (Guégan et al., 2003) and has also been hypothesized in P. falciparum (Nagaraj et al., 2009). In these organisms, HMBS also cyclizes the linear preuroporphyrinogen to uroporphyrinogen III. No gene encoding UROS was found in L. interrogans, and the L. interrogans HMBS gene, when expressed in an E. coli strain lacking the UROS gene, restored UROS activity (Guégan et al., 2003). P. falciparum is known to synthesize heme de novo, despite acquiring heme from the host red cell hemoglobin in the intraerythrocytic stage. P. falciparum HMBS (PfHMBS) has a leucine (L116) in place of an otherwise conserved lysine (K55 in EcHMBS). The L116K mutant enzyme has higher activity than the wild-type PfHMBS and produces both uroporphyrinogen I (the non-enzymatic product) and uroporphyrinogen III, the product of wild-type HMBS (Nagaraj et al., 2008) (Fig. 1.7). Inhibition of the heme biosynthetic pathway of the parasite leads to its death (Bonday et al., 1997; Padmanaban et al., 2007), emphasizing the importance of the enzyme for the parasite.

A detailed description of the biochemical studies on E. coli and human HMBS is given in Chapters 3, 4 and 5.

Figure 1.7: Schematic representation of the enzymatic and non-enzymatic products of 1-hydroxymethylbilane.

1.5 In silico studies to understand enzyme dynamics and catalytic mechanism

To understand the function of biomolecules, it is crucial to know their three-dimensional (3D) structures. X-ray crystallography has been widely used to obtain the 3D coordinates of biomolecules. Its major limitation is that it provides only static information about the protein structure. Nuclear magnetic resonance (NMR) spectroscopy provides an ensemble of probable conformations of a protein in solution. However, the structures in such an ensemble often vary considerably in regions for which experimental data are lacking (Hospital et al., 2015). Other methods, such as small-angle X-ray scattering (SAXS), provide low-resolution envelopes that can be used to calculate the radius of gyration (Rg) and the maximum linear dimension (Dmax) of the protein. More recently, cryo-electron microscopy has been used to obtain the 3D coordinates of large biomolecules and their complexes (protein-protein or protein-nucleic acid) (Kühlbrandt, 2014). Although these experimental techniques provide an understanding of protein structure, they offer limited insight into the catalytic mechanism and conformational dynamics of proteins. Theoretical approaches such as molecular dynamics (MD) simulations and enhanced sampling methods help to understand the structural dynamics of proteins, while quantum mechanical (QM) calculations help to decipher catalytic mechanisms. The first computer simulation of a protein, bovine pancreatic trypsin inhibitor, was performed by McCammon, Gelin and Karplus (McCammon et al., 1977).
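Since the radius of gyration mentioned above is a simple function of atomic coordinates, it can also be computed directly from a crystal structure or an MD snapshot and compared with a SAXS estimate. The Python sketch below is a minimal version, assuming coordinates in Å and optional atomic masses as weights; the function name is hypothetical. SAXS-derived Rg also reflects the hydration shell, so agreement with a coordinate-based value is only expected to be approximate.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Rg = sqrt( sum_i m_i |r_i - r_com|^2 / sum_i m_i ).

    coords: list of (x, y, z) in Å; masses: optional weights (unweighted if None).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Centre of mass (the geometric centre when all weights are equal).
    com = [sum(m * r[k] for m, r in zip(masses, coords)) / total for k in range(3)]
    sq = sum(
        m * sum((r[k] - com[k]) ** 2 for k in range(3))
        for m, r in zip(masses, coords)
    )
    return math.sqrt(sq / total)


if __name__ == "__main__":
    print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))  # 1.0
```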

The field of biomolecular simulations has evolved over the last four decades with advances in simulation algorithms and in the force fields that guide the simulations (Jing et al., 2019; Dasetty et al., 2019; Huang et al., 2017; Perez et al., 2016; De Vivo et al., 2016; Hospital et al., 2015; Perilla et al., 2015a). In hybrid quantum mechanics/molecular mechanics (QM/MM) calculations, a small region of interest, such as the enzyme active site, is treated at the quantum mechanical level, while the rest of the protein is treated using a molecular mechanics force field. This provides a balance between accuracy and computational cost. For enzymes such as HMBS, where product formation takes place in a four-step process and the intermediates are short-lived, it is extremely difficult to obtain atomic details using experimental methods. Under such circumstances, it is essential to model the structures of the intermediates and study their dynamics using computer simulations. In this thesis, MD simulations and enhanced sampling simulations were used to understand the structural dynamics of the HMBS enzyme. To understand the function of a protein in detail, it is also necessary to understand the reaction it catalyzes. To model the catalytic mechanism, both QM cluster models and QM/MM calculations are used. In summary, in silico methods such as MD simulations, quantum mechanical and hybrid calculations are employed to gain insights into the structural dynamics and catalytic mechanism of HMBS.

1.6 Overview of the thesis

Chapter 1 provides an introduction to the importance of tetrapyrroles in living organisms, with emphasis on the heme biosynthetic pathway, and describes the significance of the problems addressed in this thesis. Chapter 2 gives an overview of the principles of MD simulations, enhanced sampling methods and quantum chemical calculations. Chapter 3 describes the structural changes that are necessary to polymerize four units of PBG in the active site of EcHMBS, and discusses the exit path of the product, HMB, from the protein. Chapter 4 describes the dynamics of human HMBS and compares it with the E. coli homolog. Chapter 5 describes the catalytic mechanism of HMBS using quantum calculations performed on models generated from the active site of the hHMBS enzyme. Chapter 6 describes the use of QM/MM calculations to study the catalytic mechanism. Chapter 7 summarizes the thesis and outlines directions for future work.

Chapter 2

2 Methods

Computational modeling of biomolecules is widely used to understand their structure and function. Biomolecules are modeled based on the principles of classical mechanics, quantum mechanics and hybrid methods such as QM/MM. This chapter discusses the underlying principles of the classical mechanics based methods, such as molecular dynamics (MD), and of the quantum mechanical and hybrid methods that have been used in this thesis.

2.1 Classical Mechanics

Classical mechanics, also referred to as Newtonian mechanics, is a branch of physics that deals with the dynamics of macroscopic objects. With the help of Newton's laws, the future or past state of an object can be predicted, provided its present state is known with certainty. Thus, classical mechanics is deterministic and reproducible in nature.

2.1.1 Molecular dynamics

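MD is an application of classical mechanics which is governed by Newton's laws of motion. For a system with N atoms, the force acting on each of the atoms is given by:

$$\mathbf{f}_i = m_i \ddot{\mathbf{r}}_i = m_i \frac{d^{2}\mathbf{r}_i}{dt^{2}}, \qquad i = 1, 2, \ldots, N \qquad (2.1)$$

$$\mathbf{f}_i = -\frac{\partial U}{\partial \mathbf{r}_i} \qquad (2.2)$$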

where fi is the force acting on atom i with mass mi, ri is the position of the atom and U is the potential energy of the system. The forces acting on the atoms are calculated from the potential energy function, which can be simple or complex depending on its functional form and the parameters included. The most widely used form of the force field equation is:

$$U = U_{\text{bonded}} + U_{\text{nonbonded}} \qquad (2.3)$$

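The bonded term, Ubonded, is represented as the sum of the bond-stretching, angle-bending and torsional terms of the system:

$$U_{\text{bonded}} = \sum_{\text{bonds}} k_b \left(b - b_0\right)^{2} + \sum_{\text{angles}} k_\theta \left(\theta - \theta_0\right)^{2} + \sum_{\text{dihedrals}} k_\phi \left[1 + \cos\left(n\phi - \delta\right)\right] \qquad (2.4)$$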

where, b0 and θ0 are the equilibrium bond length and bond angle, respectively. The notations b, θ and φ represent the instantaneous values associated with the bond, angle and torsion term, respectively, while kb, kθ and kφ correspond to their force constants. The n denotes the multiplicity of the function and δ denotes the phase shift.
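The three bonded terms of Eq. 2.4 map directly onto small functions. The Python sketch below evaluates them for lists of per-term parameters; it follows the functional form written above (no factor of 1/2, which some force fields include in the force constant), and the function names and numerical parameters in the usage note are hypothetical rather than taken from any particular force field.

```python
import math


def bond_energy(b, b0, kb):
    """Harmonic bond-stretching term kb * (b - b0)^2."""
    return kb * (b - b0) ** 2


def angle_energy(theta, theta0, ktheta):
    """Harmonic angle-bending term ktheta * (theta - theta0)^2 (radians)."""
    return ktheta * (theta - theta0) ** 2


def dihedral_energy(phi, kphi, n, delta):
    """Periodic torsion term kphi * [1 + cos(n*phi - delta)] (radians)."""
    return kphi * (1.0 + math.cos(n * phi - delta))


def bonded_energy(bonds, angles, dihedrals):
    """Sum of all Eq. 2.4 terms; each argument is a list of parameter tuples
    in the same order as the corresponding function's arguments."""
    return (
        sum(bond_energy(*p) for p in bonds)
        + sum(angle_energy(*p) for p in angles)
        + sum(dihedral_energy(*p) for p in dihedrals)
    )
```

For example, a bond stretched from a hypothetical b0 = 1.53 Å to b = 1.55 Å with kb = 300 contributes 300 × 0.02² = 0.12 energy units, and a threefold torsion (n = 3, δ = 0) is at its maximum, 2kφ, at φ = 0 and at its minimum, zero, at φ = 60°.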

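The van der Waals term is represented using the 12-6 Lennard-Jones (LJ) potential and the electrostatic term using the Coulombic potential:

$$U_{\text{nonbonded}} = \sum_{i<j} 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0\,\varepsilon\, r_{ij}} \qquad (2.5)$$

where rij is the distance between a pair of atoms i and j; εij is the depth of the LJ potential well and σij is the distance at which the LJ potential is zero; qi and qj are the partial atomic charges on the atoms; ε0 is the vacuum permittivity and ε is the dielectric constant of the medium (Andrew, 2001).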
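To connect Eq. 2.5 with the force definition in Eq. 2.2, the Python sketch below evaluates the LJ and Coulomb terms for a single atom pair and the corresponding radial force -dU/dr. It assumes the unit system kcal/mol, Å and elementary charges, in which 1/(4πε0) is approximately 332.06 kcal·Å/(mol·e²); all function names and parameter values are hypothetical, and a production MD code would additionally apply cutoffs, exclusions and long-range electrostatics.

```python
# 1/(4*pi*eps0) in kcal*Å/(mol*e^2); assumed unit system (kcal/mol, Å, e).
COULOMB_CONST = 332.0637


def lj_energy(r, eps, sigma):
    """12-6 Lennard-Jones term: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def coulomb_energy(r, qi, qj, dielectric=1.0):
    """Coulomb term qi*qj / (4*pi*eps0*eps*r)."""
    return COULOMB_CONST * qi * qj / (dielectric * r)


def pair_energy(r, eps, sigma, qi, qj, dielectric=1.0):
    """Nonbonded energy of one atom pair (one term of each sum in Eq. 2.5)."""
    return lj_energy(r, eps, sigma) + coulomb_energy(r, qi, qj, dielectric)


def pair_force(r, eps, sigma, qi, qj, dielectric=1.0):
    """Radial force -dU/dr for the pair (Eq. 2.2); positive means repulsive."""
    sr6 = (sigma / r) ** 6
    f_lj = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r
    f_coul = COULOMB_CONST * qi * qj / (dielectric * r * r)
    return f_lj + f_coul


if __name__ == "__main__":
    # Hypothetical parameters; compare the analytic force with a
    # central finite difference of the energy.
    eps, sigma, qi, qj = 0.24, 3.4, 0.4, -0.4
    r, h = 3.0, 1e-6
    numeric = -(pair_energy(r + h, eps, sigma, qi, qj)
                - pair_energy(r - h, eps, sigma, qi, qj)) / (2 * h)
    print(numeric, pair_force(r, eps, sigma, qi, qj))
```

The LJ term reaches its minimum, -εij, at rij = 2^(1/6)σij, where its force is zero; checking an analytic force against a finite difference of the energy, as in the usage block, is a common sanity test for force-field code.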

The force field contains all the parameters required to compute the potential energy of the system. Various force fields have been developed, with parameters derived either from experimental data or from extensive quantum mechanical calculations. The choice of force field depends on the problem being studied and on the computational resources available. Force fields such as GROMOS 53A6 (Oostenbrink et al., 2004), AMBER (Maier et al., 2015) and CHARMM (Huang et al., 2017) are widely used to study the dynamics of biomolecules.

2.1.1.1 Integration algorithms

In principle, the time evolution of the atoms is obtained by solving Newton's equations of motion. However, because each particle experiences forces from all the other particles, an MD simulation is a many-body problem for which the equations of motion cannot be solved analytically. Hence, numerical integration approaches are used to obtain the trajectories of the atoms. The integration algorithms discussed here use the finite difference technique, approximating the positions, velocities and accelerations of the particles at every time step using Taylor series expansions.

The initial coordinates of the system of interest can be obtained either from experimental techniques (such as X-ray crystallography, NMR and cryo-electron microscopy) or from modeling protocols. Initial velocities are assigned randomly from the Maxwell-Boltzmann distribution at the desired temperature:

$$p(v_i) = \left(\frac{m_i}{2\pi k_B T}\right)^{1/2} \exp\left(-\frac{m_i v_i^{2}}{2 k_B T}\right) \qquad (2.6)$$

where p(vi) is the probability that atom i of mass mi has velocity vi (along a given Cartesian direction) at temperature T, and kB is the Boltzmann constant.
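Because Eq. 2.6 is a Gaussian with zero mean and variance kBT/mi for each Cartesian component, initial velocities can be drawn with a standard normal random number generator. The Python sketch below does this in SI units and checks the result through the instantaneous temperature; the function names and the choice of a carbon-mass test particle are hypothetical, and MD packages typically also remove the centre-of-mass motion and may rescale the velocities to hit the target temperature exactly.

```python
import math
import random

KB = 1.380649e-23           # Boltzmann constant, J/K
AMU = 1.66053906660e-27     # atomic mass unit, kg
CARBON_MASS = 12.011 * AMU  # illustrative atom mass, kg


def sample_velocity(mass, temperature, rng=random):
    """Draw (vx, vy, vz) in m/s from the Maxwell-Boltzmann distribution (Eq. 2.6).

    Each Cartesian component is Gaussian with mean 0 and variance kB*T/m.
    """
    sigma = math.sqrt(KB * temperature / mass)
    return tuple(rng.gauss(0.0, sigma) for _ in range(3))


def instantaneous_temperature(masses, velocities):
    """T = sum(m v^2) / (3 N kB); ignores constraints and centre-of-mass removal."""
    two_ke = sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    return two_ke / (3 * len(masses) * KB)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    masses = [CARBON_MASS] * 20000
    velocities = [sample_velocity(m, 300.0, rng) for m in masses]
    print(instantaneous_temperature(masses, velocities))  # close to 300 K
```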

Given the initial positions and velocities of the atoms, the force on each atom is obtained as the negative gradient of the potential energy function, and the acceleration of each atom is obtained by dividing the force by its mass. Integration algorithms are then used to obtain a numerical solution to the equations of motion. The most popular integration algorithms in use are the Verlet (Verlet, 1967), leap-frog (Frenkel and Smit, 2001) and velocity Verlet (Swope et al., 1982) algorithms. The Verlet algorithm is the most widely used integration algorithm. To calculate the positions at time t+δt, r(t+δt), it uses the positions and accelerations at time t and the positions from the previous time step, r(t-δt). These follow from Taylor expansions of the positions forward and backward in time; adding the two expansions eliminates the velocity term:

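$$\mathbf{r}(t \pm \delta t) = \mathbf{r}(t) \pm \mathbf{v}(t)\,\delta t + \tfrac{1}{2}\mathbf{a}(t)\,\delta t^{2} \pm \ldots \qquad (2.7)$$

$$\mathbf{r}(t + \delta t) = 2\mathbf{r}(t) - \mathbf{r}(t - \delta t) + \mathbf{a}(t)\,\delta t^{2} \qquad (2.8)$$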

Although the velocities do not appear explicitly in the Verlet algorithm, they can be calculated as follows:

$$\mathbf{v}(t) = \frac{\mathbf{r}(t + \delta t) - \mathbf{r}(t - \delta t)}{2\,\delta t} \qquad (2.9)$$
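The Verlet scheme can be illustrated on a one-dimensional harmonic oscillator, whose exact solution is known. The Python sketch below advances positions with Eq. 2.8 and recovers velocities with Eq. 2.9; because Eq. 2.8 needs two previous positions, the first step is bootstrapped from the truncated forward Taylor expansion of Eq. 2.7, which is one common choice rather than the only one. The function name and the test system are hypothetical.

```python
import math


def verlet(x0, v0, accel, dt, n_steps):
    """1D position Verlet: Eq. 2.8 for positions, Eq. 2.9 for velocities.

    Eq. 2.8 needs two previous positions, so the first step is taken from the
    truncated forward Taylor expansion (Eq. 2.7).
    """
    xs = [x0, x0 + v0 * dt + 0.5 * accel(x0) * dt * dt]
    for _ in range(n_steps - 1):
        x_prev, x_curr = xs[-2], xs[-1]
        xs.append(2.0 * x_curr - x_prev + accel(x_curr) * dt * dt)
    # Central-difference velocities exist only at interior points:
    # vs[k] is the velocity at time (k + 1) * dt.
    vs = [(xs[i + 1] - xs[i - 1]) / (2.0 * dt) for i in range(1, len(xs) - 1)]
    return xs, vs


if __name__ == "__main__":
    # Harmonic oscillator with omega = 1: exact solution x(t) = cos(t).
    xs, vs = verlet(1.0, 0.0, lambda x: -x, dt=0.01, n_steps=200)
    print(xs[100], math.cos(1.0))
```

Note that the velocity at the current step is only available after the next position has been computed, which is one of the drawbacks that the variants below address.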

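The leap-frog and velocity Verlet algorithms are variants of the Verlet algorithm. The leap-frog algorithm uses the following equations to integrate Newton's equations of motion:

$$\mathbf{r}(t + \delta t) = \mathbf{r}(t) + \mathbf{v}\left(t + \tfrac{1}{2}\delta t\right)\delta t \qquad (2.10)$$

$$\mathbf{v}\left(t + \tfrac{1}{2}\delta t\right) = \mathbf{v}\left(t - \tfrac{1}{2}\delta t\right) + \mathbf{a}(t)\,\delta t \qquad (2.11)$$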

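The name 'leap-frog' comes from the fact that the velocities are evaluated at times offset by ½δt from the positions, so that velocities and positions leap over each other. The velocity Verlet algorithm gives the positions, velocities and accelerations at the same time step without compromising precision, unlike the previous two approaches:

$$\mathbf{r}(t + \delta t) = \mathbf{r}(t) + \mathbf{v}(t)\,\delta t + \tfrac{1}{2}\mathbf{a}(t)\,\delta t^{2} \qquad (2.12)$$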


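$$\mathbf{v}(t + \delta t) = \mathbf{v}(t) + \tfrac{1}{2}\left[\mathbf{a}(t) + \mathbf{a}(t + \delta t)\right]\delta t \qquad (2.13)$$

In practice, Eq. 2.13 is applied in stages: the velocities are first advanced by half a time step, v(t+½δt) = v(t) + ½a(t)δt, and the forces, and hence a(t+δt), are recomputed at the new positions. In the final step, the velocities are computed using the following equation:

$$\mathbf{v}(t + \delta t) = \mathbf{v}\left(t + \tfrac{1}{2}\delta t\right) + \tfrac{1}{2}\mathbf{a}(t + \delta t)\,\delta t \qquad (2.14)$$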
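The staged form of the velocity Verlet algorithm translates into a short loop with one force evaluation per step. The Python sketch below integrates a one-dimensional harmonic oscillator (a hypothetical test system) and reports the spread of the total energy, which for a symplectic integrator such as velocity Verlet stays small and bounded rather than drifting; the function name is an assumption for illustration.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """1D velocity Verlet (Eqs. 2.12-2.14), returning [(x, v), ...] per step."""
    a = force(x) / mass
    traj = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * a * dt   # v(t + dt/2)
        x = x + v_half * dt         # equivalent to Eq. 2.12
        a = force(x) / mass         # a(t + dt) from forces at the new positions
        v = v_half + 0.5 * a * dt   # Eq. 2.14
        traj.append((x, v))
    return traj


if __name__ == "__main__":
    k, m = 1.0, 1.0  # hypothetical harmonic spring and unit mass
    traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, m, dt=0.01, n_steps=10000)
    energies = [0.5 * m * v * v + 0.5 * k * x * x for x, v in traj]
    print(max(energies) - min(energies))  # small, with no systematic drift
    print(traj[100][0], math.cos(1.0))    # position at t = 1 vs. exact cos(t)
```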

2.1.1.2 Periodic boundary conditions

Periodic boundary conditions are of primary importance in a simulation setup because they enable the properties of a macroscopic system to be calculated with a relatively small number of particles. Periodic boundary conditions work with a periodic array of cells, each containing exactly the same number of particles placed in the same initial positions (Andrew, 2001). This can be pictured as a repetition of a central cell of particles in all three dimensions: if a particle leaves the central cell through one boundary, its image re-enters through the opposite boundary, so that the total number of particles inside the central cell remains constant throughout the simulation. The coordinates of the particles in the image boxes are computed by adding or subtracting integral multiples of the sides of the central box. When implementing periodic boundary conditions, the cell shape must be chosen such that copies of the cell fill all of three-dimensional space by translation operations alone. Five major box shapes satisfy this criterion: the cube, hexagonal prism, truncated octahedron, rhombic dodecahedron and elongated dodecahedron. The choice of box shape depends on the macromolecule being simulated and on whether the dynamics of the solvent molecules significantly influence the calculated thermodynamic properties of the macromolecule.

Another consideration when using periodic boundary conditions is that the box must be large enough that a molecule does not interact with its own periodic image in a neighbouring box.
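For the simplest, cubic case, wrapping coordinates back into the central box and finding the vector to the nearest periodic image of another particle (the minimum image convention) can be sketched as follows. The sketch assumes a cubic box of edge length L, and the function names are illustrative; the other box shapes listed above require the corresponding box vectors. Under the minimum image convention, the non-bonded cutoff should not exceed L/2, which is one practical way of expressing the box-size requirement above.

```python
import numpy as np

def wrap_into_box(coords, box_length):
    """Map coordinates back into the central cubic box [0, L)."""
    return np.mod(coords, box_length)

def minimum_image_vector(r_i, r_j, box_length):
    """Vector from particle i to the nearest periodic image of particle j (cubic box)."""
    d = np.asarray(r_j, dtype=float) - np.asarray(r_i, dtype=float)
    return d - box_length * np.round(d / box_length)
```

For example, in a box of edge 10, a particle at x = 9 is only 2 units from a particle at x = 1 through the periodic boundary, not 8, and `minimum_image_vector([1, 0, 0], [9, 0, 0], 10.0)` returns the vector (−2, 0, 0).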

2.1.1.3 Ensembles

Ensembles refer to the collection of all systems that have the same thermodynamic (macroscopic) state but different microscopic states. Usually, a few properties, such as the number of particles, the pressure and the temperature of the simulated system, are kept constant (Andrew, 2001; Nosé, 1984). The most commonly used ensembles are:

a) Canonical ensemble: Also called the NVT ensemble. The system consists of a constant number of particles in a fixed volume at a fixed temperature.

b) Microcanonical ensemble: Also called the NVE ensemble. The number of particles and the volume are fixed, and the energy of the system remains constant throughout the simulation.

c) Grand canonical ensemble: Also called the μVT ensemble, where μ refers to a fixed chemical potential, together with a fixed volume and temperature.

d) Isothermal–isobaric ensemble: Also called the NPT ensemble. The system has a constant number of particles at a fixed pressure and temperature.

2.1.2 Enhanced Sampling methods

Many biological processes, such as protein unfolding, large-scale domain motions, ion transport and the exit of a product from an enzyme, occur on time scales longer than those that can be reached in simulations. Although advances in computing have greatly extended the time scales over which biomolecules can be simulated, it remains a challenge to sample all the relevant states of such processes using regular MD simulations. To study the exit of the product from the enzyme, enhanced sampling methods, namely steered molecular dynamics and random acceleration molecular dynamics, are used in the current thesis.

2.1.2.1 Steered molecular dynamics (SMD)

In SMD simulations, an external force is applied along a selected reaction coordinate in order to sample conformations that are difficult to reach in regular MD simulations because of large energy barriers along that coordinate (Izrailev et al., 1999; Liu et al., 2008). SMD simulations can be performed either at a constant pulling velocity or under a constant pulling force. In the current study, constant-velocity SMD simulations were used to understand the egress of the product.

Figure 2.1: Schematic representation of constant velocity SMD. (Figure adapted from NAMD tutorial).


In constant-velocity SMD simulations, a dummy atom (blue in Fig. 2.1) is attached to the SMD atom (orange in Fig. 2.1) by a harmonic spring. When the dummy atom is moved at a constant velocity in a predefined direction, a force builds up in the spring connecting the dummy and SMD atoms. If the force in the spring is sufficient to move the SMD atom, the SMD atom is displaced in the direction of the external force. The potential developed in the spring during the simulation is given by the equation below.

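$$ U(\vec{r}, t) = \frac{1}{2}k\left[vt - \left(\vec{r}(t) - \vec{r}_0\right)\cdot\vec{n}\right]^2 \tag{2.15} $$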

where k is the force constant, v is the pulling velocity, $\vec{r}(t)$ and $\vec{r}_0$ are the positions of the SMD atom at time t and at the start of pulling (t0), respectively, and $\vec{n}$ is the unit vector along the pulling direction.

The negative gradient of the potential gives the force applied on the SMD atom.

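$$ \vec{F} = -\nabla U = k\left[vt - \left(\vec{r}(t) - \vec{r}_0\right)\cdot\vec{n}\right]\vec{n} \tag{2.16} $$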

If the force applied is sufficient to move the SMD atom, it is displaced in the direction of the velocity, else more force gets accumulated in the spring when the dummy atom is moved in the next time step as shown in figure 2.1.
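Eqs. 2.15 and 2.16 translate directly into code. The sketch below evaluates the spring potential and force for a single SMD atom. Units are assumed to be nm, ps and kJ mol⁻¹ (GROMACS-style), and the function name `smd_spring` is illustrative rather than taken from NAMD or Gromacs. When the SMD atom has not moved, the force grows linearly in time as k·v·t, which is the force accumulation described above.

```python
import numpy as np

def smd_spring(r, r0, n, k, v, t):
    """Constant-velocity SMD spring (Eqs. 2.15 and 2.16); returns (U, F)."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)                                    # unit pulling direction
    displacement = np.asarray(r, dtype=float) - np.asarray(r0, dtype=float)
    extension = v * t - displacement @ n                         # lead of the dummy atom over the SMD atom
    potential = 0.5 * k * extension ** 2                         # Eq. 2.15
    force = k * extension * n                                    # F = -grad U (Eq. 2.16)
    return potential, force

# SMD atom still at its start position after 10 ps of pulling at 0.01 nm/ps
U, F = smd_spring(r=[0, 0, 0], r0=[0, 0, 0], n=[1, 0, 0], k=1000.0, v=0.01, t=10.0)
```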

2.1.2.2 Random acceleration molecular dynamics (RAMD)

In RAMD simulations (Winn et al., 2002; Long et al., 2009), an external force (Fext) is applied to the center of mass of the ligand in a random direction for a predetermined number of steps (n).

$$ \vec{F}_{\mathrm{ext}} = f_0\,\hat{r} \tag{2.17} $$

where f0 is the magnitude of the constant force acting on the ligand and $\hat{r}$ is a unit vector in the randomly chosen direction. In practice, a random acceleration is provided as input, from which the force is obtained using the mass of the ligand. The ligand is pushed in the same direction for the chosen number of steps, n. After these n steps, the distance between the centers of mass of the ligand and the protein is computed. If the ligand has moved by less than the user-defined minimum distance (dmin) during these steps, a new random direction is chosen; otherwise, the current direction is retained.
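The direction-update rule described above can be sketched as follows. This is a minimal illustration, not the implementation of any RAMD package: it assumes that "moved" means the increase in the ligand–protein center-of-mass separation over the last n steps, that random directions are drawn isotropically from a normalized 3-D Gaussian vector, and all function names and numerical values are hypothetical.

```python
import numpy as np

def random_unit_vector(rng):
    """Isotropic random direction: a normalized 3-D Gaussian vector."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)

def ramd_next_direction(direction, sep_before, sep_after, d_min, rng):
    """Keep the direction if the ligand-protein COM separation grew by at least
    d_min over the last n steps; otherwise draw a new random direction."""
    if sep_after - sep_before < d_min:
        return random_unit_vector(rng)    # ligand stuck: try another direction
    return direction                      # ligand moving: keep pushing the same way

def ramd_force(f0, direction):
    """Eq. 2.17: F_ext = f0 * r_hat, applied to the ligand center of mass."""
    return f0 * direction
```

In use, `ramd_next_direction` would be called once every n steps with the separations measured at the start and end of that block, and `ramd_force` would be applied at every step in between.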

2.2 Quantum Mechanics

The previous section gave a brief overview of classical mechanics and its applications to biomolecules. While classical mechanics can be used to study the large-scale motions of biomolecules, it cannot describe the electronic rearrangements in the system, especially when bonds are formed or broken during a catalytic reaction. Such processes require the motions of both the electrons and the nuclei to be taken into account. Quantum mechanics describes a system using a wave function, obtained by solving the Schrödinger equation (SE):

$$ \hat{H}\Psi = E\Psi \tag{2.18} $$

where $\hat{H}$ is the Hamiltonian operator, Ψ is the wave function and E is the energy of the system. The Hamiltonian can be written as the sum of kinetic and potential energy operators. For a system consisting of N nuclei and n electrons, the Hamiltonian is written as (Atkins and Friedman, 2011):

$$ \hat{H} = T_n + T_e + V_{nn} + V_{ee} + V_{ne} \tag{2.19} $$

where Tn and Te are the kinetic energy operators of the nuclei and electrons, respectively; Vnn and Vee are the Coulombic nucleus–nucleus and electron–electron repulsions; and Vne represents the attractive interaction between the nuclei and the electrons. The SE can be solved exactly only for a limited number of one-electron systems. For biological systems with many atoms, it is impossible to solve the SE analytically because the variables describing the interactions between the elementary particles of the system cannot be separated. Therefore, approximations are made to solve the SE of many-electron systems (Jensen, 2017). The two basic approximations are 1) the Born-Oppenheimer approximation and 2) the Hartree-Fock (HF) approximation.
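Although the many-body SE has no analytical solution, Eq. 2.18 for a single particle in one dimension can be solved numerically. The sketch below is an illustrative example, not a method used in this thesis: it discretizes the Hamiltonian of a one-dimensional harmonic oscillator in atomic units (ħ = m = ω = 1) on a grid using a three-point finite-difference stencil and diagonalizes the resulting matrix. The grid size and range are assumptions; the lowest eigenvalues should approach the analytical values 0.5, 1.5 and 2.5.

```python
import numpy as np

def harmonic_levels(n_levels=3, x_max=8.0, n_points=1001):
    """Lowest eigenvalues of H = -1/2 d^2/dx^2 + 1/2 x^2 by finite differences (a.u.)."""
    x = np.linspace(-x_max, x_max, n_points)
    dx = x[1] - x[0]
    # Kinetic term -1/2 d^2/dx^2: +1/dx^2 on the diagonal, -1/(2 dx^2) off the diagonal
    main = np.full(n_points, 1.0 / dx ** 2) + 0.5 * x ** 2   # plus the potential 1/2 x^2
    off = np.full(n_points - 1, -0.5 / dx ** 2)
    H = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.eigvalsh(H)[:n_levels]   # eigenvalues in ascending order

levels = harmonic_levels()
```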

2.2.1 Born-Oppenheimer approximation

The Born-Oppenheimer approximation allows the motion of the electrons to be decoupled from that of the nuclei. Because the nuclei are much heavier than the electrons, the electrons adjust almost instantaneously to any change in the nuclear positions, so the coupling between electronic and nuclear motion can be neglected.

The total energy of the system is then the electronic energy plus the internuclear repulsion. While solving for the electronic energy, the positions of the nuclei are fixed, so the Tn term in Eq. 2.19 can be omitted and Vnn becomes a constant. The resulting potential energy surface, that is, the energy as a function of the nuclear positions, forms the basis for treating the nuclear motion. Although the Born-Oppenheimer approximation reduces the complexity of the problem, further approximations for the electron-electron repulsion are needed before it can be applied to systems consisting of many atoms.
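Written out in the usual textbook form, fixing the nuclear coordinates R turns the problem into an electronic SE in the electronic coordinates r, in which Vnn contributes only a constant shift for a given R:

$$ \hat{H}_{el} = T_e + V_{ne} + V_{ee} + V_{nn}, \qquad \hat{H}_{el}\,\psi_{el}(\mathbf{r};\mathbf{R}) = E_{el}(\mathbf{R})\,\psi_{el}(\mathbf{r};\mathbf{R}) $$

Solving this equation for many nuclear configurations R maps out $E_{el}(\mathbf{R})$, which is the potential energy surface on which the nuclei move.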

2.2.2 Hartree-Fock (HF) approximation

Solving the electronic SE for a given set of nuclear positions is still computationally intensive. The HF approximation reduces the complexity of estimating the electron-electron repulsion by treating each electron as moving in the average potential created by all the other electrons. The one-electron wave functions (orbitals) are obtained by solving the HF equations iteratively using the self-consistent field (SCF) method. The overall wave function is a Slater determinant of the one-electron wave functions. The Slater determinant enforces the Pauli exclusion principle and ensures that the overall wave function is antisymmetric with respect to the exchange of any two electrons.
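The antisymmetry property can be checked numerically. The sketch below builds a two-electron Slater determinant from two illustrative one-dimensional orbitals (harmonic-oscillator functions chosen only for this example, with spin ignored so that both electrons are taken to have the same spin). Swapping the two electron coordinates flips the sign of the wave function, and placing both electrons at the same point, or in the same orbital, makes it vanish, which is the Pauli exclusion principle.

```python
import math
import numpy as np

# Illustrative 1-D orbitals: harmonic-oscillator n = 0 and n = 1 functions (atomic units)
def phi0(x):
    return math.pi ** -0.25 * math.exp(-x ** 2 / 2)

def phi1(x):
    return math.pi ** -0.25 * math.sqrt(2.0) * x * math.exp(-x ** 2 / 2)

def slater_determinant(orbitals, coords):
    """Normalized Slater determinant det[phi_j(x_i)] / sqrt(N!)."""
    matrix = np.array([[phi(x) for phi in orbitals] for x in coords])
    return np.linalg.det(matrix) / math.sqrt(math.factorial(len(coords)))

psi_12 = slater_determinant([phi0, phi1], [0.3, -0.7])
psi_21 = slater_determinant([phi0, phi1], [-0.7, 0.3])        # electrons exchanged
same_point = slater_determinant([phi0, phi1], [0.4, 0.4])     # both electrons at one point
same_orbital = slater_determinant([phi0, phi0], [0.3, -0.7])  # both electrons in one orbital
```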


The HF method is the branching point that leads both to semi-empirical methods and to methods that approach the exact solution (Jensen, 2017). The HF method is the simplest ab initio method, since it uses a single Slater determinant to build the wave function. Semi-empirical methods are derived from HF by making additional approximations, whereas adding more Slater determinants leads to post-HF (electron correlation) methods, such as configuration interaction, that approach the exact solution.

The approximations in semi-empirical methods are compensated for by empirical corrections, parameterized against experimental data, to improve their performance. AM1 (Dewar and Storch, 1985), MNDO (Dewar and Thiel, 1977) and PM3 (Stewart, 1989) are a few of the semi-empirical methods that are widely used to study biological systems.

Density functional theory (DFT) methods are an alternative to ab initio methods. DFT uses the electron density, rather than the many-electron wave function, as the central quantity, and accounts for many-body electron correlation through the exchange-correlation functional. At a similar computational cost, DFT generally gives much better results than HF.

2.2.3 Density functional theory

Density Functional Theory (DFT) is based on the premise that the properties of a system, including its ground-state wave function, can be determined from the spatially dependent ground-state electron density ρ(r). DFT was introduced by Hohenberg and Kohn (Hohenberg and Kohn, 1964) and further developed by Kohn and Sham (Kohn and Sham, 1965). The electron density is defined as the integral of the squared wave function over the spin coordinates of all electrons and over all but one of the spatial variables (r):

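$$ \rho(\vec{r}_1) = N \int \cdots \int \left|\Psi(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N)\right|^2 \, ds_1\, d\vec{x}_2 \cdots d\vec{x}_N \tag{2.20} $$

where N is the number of electrons and $\vec{x}_i$ combines the spatial coordinates $\vec{r}_i$ and the spin coordinate $s_i$ of electron i.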


The electron density ρ(r) is a non-negative function that vanishes at infinity and integrates to the total number of electrons:

$$ \int \rho(\vec{r})\, d\vec{r} = N \tag{2.21} $$
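As a quick numerical illustration of Eq. 2.21, the exact ground-state density of the hydrogen atom, ρ(r) = e^(−2r)/π in atomic units, integrates to one electron. For a spherically symmetric density, the volume element is 4πr² dr, so the integral can be evaluated on a radial grid; the grid and the trapezoidal rule used below are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

def hydrogen_1s_density(r):
    """Exact ground-state density of the hydrogen atom in atomic units."""
    return np.exp(-2.0 * r) / np.pi

def integrate_spherical_density(rho_func, r):
    """Integrate rho over all space assuming spherical symmetry (d^3r = 4*pi*r^2 dr)."""
    integrand = 4.0 * np.pi * r ** 2 * rho_func(r)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))   # trapezoidal rule

r = np.linspace(0.0, 30.0, 30001)      # radial grid in bohr
n_electrons = integrate_spherical_density(hydrogen_1s_density, r)
```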

According to the Hohenberg–Kohn existence theorem, the ground-state energy is a functional of the ground-state density. It can be written as the interaction of the density with the external potential of the nuclei, $V_{\mathrm{ext}}$, plus a universal functional F[ρ] that contains the kinetic and electron–electron interaction energies:

$$ E_0[\rho] = \int \rho(\vec{r})\, V_{\mathrm{ext}}(\vec{r})\, d\vec{r} + F[\rho] \tag{2.22} $$

$$ F[\rho] = T[\rho] + E_{ee}[\rho] \tag{2.23} $$

The functional F[ρ] has an unknown form; it contains the kinetic energy and electron–electron interaction energy functionals. Kohn and Sham introduced a fictitious, completely non-interacting system with the same electron density ρ, which allows F[ρ] to be written as the sum of the kinetic energy of the non-interacting system, $T_s[\rho]$, the classical Coulomb (Hartree) energy, $J[\rho]$, and a single term, $E_{XC}[\rho]$, into which all the unknown contributions are collected. The term $E_{XC}[\rho]$ corresponds to the electron exchange and correlation contributions. One of the major challenges in DFT is to obtain a reasonable approximation of this exchange-correlation functional, and various exchange-correlation functionals have been designed. The simplest is the Local Density Approximation (LDA), in which the functional depends only on the local density (ρ). In LDA, the electron densities are more diffuse, resulting in overestimated binding energies and a poor description of band gaps. The Generalized Gradient Approximation (GGA) overcomes some of the shortcomings of LDA by including the gradient of the density (∇ρ) in addition to the density itself. In meta-GGA methods, the functional also depends on the second derivative of the electron density (∇²ρ). All the above methods give a poor description of localized electronic states because of self-interaction error. To overcome this limitation, a fraction of exact exchange from HF theory is included, resulting in hybrid functionals. B3LYP and PBE0 are examples of hybrid GGA functionals, while M06 and M06-2X are examples of hybrid meta-GGA functionals.
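To show what "depends only on the local density" means in practice, the sketch below evaluates the LDA (Dirac–Slater) exchange energy, E_x = −C_x ∫ρ^(4/3) dr with C_x = (3/4)(3/π)^(1/3), for the spin-unpolarized hydrogen 1s density in atomic units. The example, its radial grid and the function name are illustrative assumptions. The result, about −0.213 hartree, is well short of the exact exchange energy of the hydrogen atom (−0.3125 hartree, which exactly cancels the electron's Coulomb self-repulsion), an instance of the self-interaction problem mentioned above.

```python
import numpy as np

# Dirac/Slater exchange constant for a spin-unpolarized density (atomic units)
C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)

def lda_exchange_energy(rho, r):
    """E_x^LDA = -C_x * integral of rho^(4/3) d^3r for a spherical density on a radial grid."""
    # Only the local density enters at each grid point: no gradients, no orbitals
    integrand = 4.0 * np.pi * r ** 2 * rho ** (4.0 / 3.0)
    return -C_X * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))

r = np.linspace(0.0, 30.0, 30001)      # radial grid in bohr
rho = np.exp(-2.0 * r) / np.pi         # hydrogen 1s density
e_x = lda_exchange_energy(rho, r)      # about -0.213 hartree
```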

2.2.4 Basis set

A basis set is the set of one-particle functions used to describe the molecular orbitals, which can be represented as linear combinations of atomic orbitals. The atomic orbitals are described either by Slater-type or by Gaussian-type orbitals; the latter are more widely used because the integrals involving them are much easier to compute (Atkins and Friedman, 2011; Jensen, 2017, 2013). The most common basis sets are described below:

a) Minimal basis set: In a minimal basis set, one basis function is used for each atomic orbital in the atom. For example, the minimal basis set for a hydrogen atom (1s¹) consists of 1 basis function, while the minimal basis set for a carbon atom (1s² 2s² 2p²) contains 5 basis functions, one each for the 1s, 2s, 2px, 2py and 2pz atomic orbitals.

b) Double zeta and triple zeta basis sets: In a double zeta basis set, each atomic orbital in the atom is described by two basis functions. Similarly, in a triple zeta basis set, three basis functions are used for each atomic orbital.

c) Split valence basis set: In a split valence basis set, the core orbitals are described with a minimal basis, while the valence orbitals are described using a double or triple zeta basis. This is because it is mainly the valence electrons, rather than the core electrons, that take part in chemical reactions. Such basis sets are denoted, for example, as 6-31G (double zeta) and 6-311G (triple zeta) (Ditchfield et al., 1971).

d) Polarization and diffuse basis functions: Atoms polarize under the influence of an external charge, either from another atom or from an applied electric field. To accommodate such changes, p functions are added to hydrogen atoms and d functions are added to heavy atoms. Polarization functions are specified using * or parentheses. For example, 3-21G(d,p) (also denoted as 3-21G**) represents a double zeta basis set with p-type polarization functions on hydrogen atoms and d-type polarization functions on all non-hydrogen atoms. Diffuse functions, which describe electron density far from the nucleus (for example, in anions), are denoted by +. For example, 3-21++G represents a double zeta basis set with diffuse functions on both hydrogen and heavy atoms.
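The bookkeeping in items a) to c) can be checked with a short script, here for methane (CH4) using s and p functions only. The per-atom counts are written out by hand from the definitions above (not read from a basis set library), and the names are illustrative. The second part evaluates the norm of the standard STO-3G contraction for hydrogen 1s (ζ = 1.24), in which three Gaussian primitives approximate one Slater-type orbital. Polarization functions are left out because the number of d functions added (five spherical or six Cartesian) depends on the program's convention.

```python
import math

# Basis functions per atom with s and p functions only (hand-written from the definitions above)
BASIS_FUNCTIONS = {
    "minimal":     {"H": 1, "C": 5},    # H: 1s; C: 1s, 2s, 2px, 2py, 2pz
    "double_zeta": {"H": 2, "C": 10},   # every atomic orbital described by two functions
    "6-31G":       {"H": 2, "C": 9},    # C: one core 1s function, valence 2s/2p doubled
}

def count_basis_functions(atoms, scheme):
    """Total number of basis functions for a list of element symbols."""
    return sum(BASIS_FUNCTIONS[scheme][atom] for atom in atoms)

methane = ["C", "H", "H", "H", "H"]
counts = {scheme: count_basis_functions(methane, scheme) for scheme in BASIS_FUNCTIONS}

# STO-3G hydrogen 1s: Gaussian exponents and contraction coefficients
ALPHAS = [3.42525091, 0.62391373, 0.16885540]
COEFFS = [0.15432897, 0.53532814, 0.44463454]

def primitive_overlap(a, b):
    """Overlap of two normalized s-type Gaussians on the same centre."""
    return (2.0 * math.sqrt(a * b) / (a + b)) ** 1.5

def contracted_norm(alphas, coeffs):
    """<phi|phi> for the contracted function phi = sum_i c_i g_i(alpha_i)."""
    return sum(ci * cj * primitive_overlap(ai, aj)
               for ai, ci in zip(alphas, coeffs)
               for aj, cj in zip(alphas, coeffs))
```

For methane, `counts` gives 9 (minimal), 18 (full double zeta) and 17 (6-31G), which shows how a split valence basis saves functions on the core; the contracted STO-3G function is normalized to within about 10⁻⁵.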

2.3 QM/MM calculations

QM calculations help in understanding the electronic rearrangements and the bond-making and bond-breaking steps involved in a catalytic mechanism (Gao and Truhlar, 2002), but they are limited in the number of atoms that can be treated. MM calculations, on the other hand, can be used to study the motion of a large number of particles (Adcock and McCammon, 2006; Karplus and McCammon, 2002; Perilla et al., 2015b; Scheraga et al., 2007), but they cannot model the catalytic mechanism. To overcome the limitations of both approaches, hybrid QM/MM calculations are widely used (Acevedo and Jorgensen, 2009; Duarte et al., 2015; Jindal and Warshel, 2016; Warshel and Levitt, 1976). QM/MM calculations combine the strengths of the QM and MM approaches to study catalytic mechanisms. In a QM/MM calculation, the biomolecule is divided into two regions: (1) the region involved in the catalytic mechanism, which is treated at the quantum mechanical level (QM region), and (2) the rest of the biomolecule, which is modeled using molecular mechanics (MM region).

2.3.1 Subtractive Scheme

In the subtractive scheme of potential energy calculation, the energy of the QM region at the MM level is subtracted from the sum of the energy of the complete system (QM+MM) at the MM level and the energy of the QM region at the QM level (Fig. 2.2, eq. 2.24). Subtracting the energy of the QM region at the MM level is essential to avoid counting the energy of the QM region twice.

Figure 2.2: Subtractive coupling scheme to calculate the energy in QM/MM calculations

The potential energy in the subtractive scheme is computed as follows (Maseras and Morokuma, 1995; Svensson et al., 1996):

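$$ U_{\mathrm{QM/MM}} = U_{\mathrm{MM}}(\mathrm{System}) + U_{\mathrm{QM}}(\mathrm{QM}) - U_{\mathrm{MM}}(\mathrm{QM}) \tag{2.24} $$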

where UMM(System) is the energy of the complete system at the MM level, UQM(QM) is the energy of the QM region at the QM level and UMM(QM) is the energy of the QM region at the MM level.
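A toy calculation shows why the subtraction in Eq. 2.24 removes double counting. In the sketch below, the "MM" energy is a hypothetical pairwise 1/r term (not a real force field), and the QM energy of the QM region is simply a number supplied by the caller, standing in for an actual QM calculation; the function names are illustrative. Because the MM energy is a sum over pairs, subtracting UMM(QM) cancels exactly the pairs inside the QM region, leaving the QM energy plus all MM-level terms that involve at least one MM atom.

```python
import itertools
import math

def toy_pair_energy(ri, rj):
    """Hypothetical pairwise 'MM' term (a bare 1/r repulsion), for illustration only."""
    return 1.0 / math.dist(ri, rj)

def mm_energy(coords, atoms):
    """Sum of toy pair terms over all pairs within the given subset of atoms."""
    return sum(toy_pair_energy(coords[i], coords[j])
               for i, j in itertools.combinations(sorted(atoms), 2))

def subtractive_energy(coords, qm_atoms, e_qm):
    """Eq. 2.24: U_MM(System) + U_QM(QM) - U_MM(QM)."""
    return mm_energy(coords, range(len(coords))) + e_qm - mm_energy(coords, qm_atoms)
```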

One of the major drawbacks of the subtractive scheme is that force field parameters are required for the QM region at every step of the catalytic mechanism (reactants, intermediates and products), because UMM(QM) must be evaluated for each of them.

2.3.2 Additive Scheme

In the additive scheme, the energies of the QM and MM regions are computed at the QM and MM levels, respectively, while the interaction between the QM and MM regions is treated partly at the QM level and partly at the MM level (Groenhof, 2013). The potential energy function can be written as:

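$$ U_{\mathrm{QM/MM}} = U_{\mathrm{MM}}(\mathrm{MM}) + U_{\mathrm{QM}}(\mathrm{QM}) + U_{\mathrm{QM-MM}}(\mathrm{QM{+}MM}) \tag{2.25} $$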

where UMM(MM) is the energy of the MM region at the MM level, UQM(QM) is the energy of the QM region at the QM level and UQM-MM(QM+MM) is the interaction energy between the QM and MM regions.

The interaction between the QM and MM regions in additive scheme is modeled using mechanical or electrostatic embedding.

2.3.2.1 Mechanical Embedding

In mechanical embedding, the MM charges do not enter the QM calculation. The last term of eq. 2.25 can be written as UMM(QM+MM), in which all interactions across the QM-MM interface are computed at the MM level. Because the interface energy is calculated at the MM level, the QM subsystem is treated as an isolated system that is not polarized by the MM environment. The electrostatic energy between the QM and MM subsystems is calculated using the Coulomb potential, where the charges on the QM atoms can either be taken from the force field or be updated based on the QM calculation at each step.
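The classical Coulomb term used for the QM–MM electrostatics in mechanical embedding can be sketched as a double sum over point charges. The sketch assumes GROMACS-style units (nm, elementary charges, kJ mol⁻¹) with 1/(4πε0) ≈ 138.935 kJ mol⁻¹ nm e⁻², and omits any cutoff or periodic treatment. The function name is illustrative, and the QM-atom charges are whatever the user supplies (force-field charges or charges derived from the QM calculation).

```python
import math

# 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
COULOMB_KJ_NM = 138.935458

def qm_mm_coulomb(qm_charges, qm_coords, mm_charges, mm_coords):
    """QM-MM electrostatic energy from fixed point charges (coordinates in nm, charges in e)."""
    energy = 0.0
    for qi, ri in zip(qm_charges, qm_coords):
        for qj, rj in zip(mm_charges, mm_coords):
            energy += COULOMB_KJ_NM * qi * qj / math.dist(ri, rj)
    return energy
```

Two opposite unit charges 1 nm apart give about −138.9 kJ mol⁻¹; interactions within the QM region or within the MM region are deliberately excluded because they belong to the other terms of eq. 2.25.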

2.3.2.2 Electrostatic Embedding

In electrostatic embedding, the MM point charges are included in the QM calculation, so the QM electron density is polarized by the MM environment. The interaction between the QM and MM regions is therefore handled during the calculation of the electronic wave function. The charges of the MM atoms are not updated during the calculation, which can sometimes lead to overpolarization of the QM region: MM atoms at the boundary may attract or repel the QM region too strongly because their charges are fixed. To overcome this issue, a polarizable embedding scheme has been proposed in which the MM atoms are in turn polarized by the QM region. The charge polarization of the MM atoms can be modeled using either the charge-on-a-spring model (Lamoureux and Roux, 2003) or the induced dipole model (Warshel et al., 2006). Because of the large computational cost of the polarizable embedding scheme, electrostatic embedding is the most widely used scheme for QM/MM calculations.


2.3.3 QM-MM boundary

When defining the boundary between the QM and MM regions, care must be taken in how covalent bonds are cut (Senn and Thiel, 2007). To keep the QM region small, it is often necessary to place the boundary across a covalent bond, so that one atom lies in the QM region and the other in the MM region. In such cases, it is important to ensure that no unpaired electrons are left in the QM region. In the simplest approach, link atoms (pseudo atoms, usually hydrogen) are introduced at the boundary to cap the QM region. To avoid overpolarization of the QM region, the charge at the boundary is shifted to nearby atoms. During geometry optimization, the forces on the link atoms are redistributed to the original atoms in the QM and MM regions.
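A common way to place a hydrogen link atom is along the cut QM–MM bond, at a fixed distance from the QM atom. The sketch below assumes a fixed C–H distance of 1.09 Å and uses an illustrative function name; QM/MM programs differ (some instead scale the QM–MM bond length by a constant ratio), so the value and the placement rule are assumptions rather than the scheme of any particular package.

```python
import numpy as np

def place_link_atom(r_qm, r_mm, bond_length=1.09):
    """Put a hydrogen link atom on the QM->MM bond vector, bond_length (Å) from the QM atom."""
    r_qm = np.asarray(r_qm, dtype=float)
    r_mm = np.asarray(r_mm, dtype=float)
    u = (r_mm - r_qm) / np.linalg.norm(r_mm - r_qm)   # unit vector along the cut bond
    return r_qm + bond_length * u
```

For a C–C bond of 1.53 Å cut between a QM carbon at the origin and an MM carbon on the x axis, the link atom lands at (1.09, 0, 0), that is, on the bond and closer to the QM atom than the MM atom it replaces.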

Although QM/MM calculations are helpful, care must be taken while setting up the system to avoid any artifacts during the calculations (Thiel, 2009).

The MD simulation and enhanced sampling methods described in this chapter are used to study the conformational dynamics of the HMBS enzyme at different stages of the catalytic cycle (Chapters 3 and 4). The QM and QM/MM methods are used to understand the catalytic mechanism of HMBS (Chapters 5 and 6). The parameters and software used to perform the calculations are described in the respective chapters.


Chapter 3

3 Structural dynamics of E. coli HMBS

3.1 Background

Hydroxymethylbilane synthase (HMBS) catalyzes the stepwise polymerization of four molecules of porphobilinogen (PBG) into the linear tetrapyrrole 1-hydroxymethylbilane (HMB). Among all the HMBS structures available in the PDB, the E. coli HMBS (EcHMBS) structures were the first to be determined by X-ray crystallography. Five crystal structures of EcHMBS have been reported in the PDB: 1GTK (Helliwell et al., 2003), 1YPN (K59Q mutant) (Helliwell et al., 1998), 2YPN (Nieh et al., 1999), 1AH5 (Hädener et al., 1999) and 1PDA (Louie et al., 1992). The structure of EcHMBS consists of three domains of the α/β class: domain 1 (residues 1–99 and 200–217), domain 2 (105–193) and domain 3 (222–313) (Fig. 3.1). The domains are connected by three hinge regions (100–104, 194–199 and 218–221). Domain 3 interacts with domain 1, domain 2 and the inter-domain hinge regions primarily through polar interactions, although it also has a hydrophobic interface with domains 1 and 2 (Louie et al., 1992). The dipyrromethane (DPM) cofactor is linked by a thioether bond to C242 on a flexible cofactor turn (residues 240–243) and lies in a cleft between domains 1 and 2 (Louie et al., 1996).

Structural and biochemical studies have indicated a single catalytic site in EcHMBS (Louie et al., 1992). Mutational studies have suggested that several arginines (Fig. 3.1B), conserved in HMBS across species, may be involved in catalysis (Jordan and Woodcock, 1991). The mutations R11H, R149H, R155H, R176H and R232H have a detrimental effect on the activity of the enzyme (Jordan and Woodcock, 1991), while R131L and R132L impair the ability of the protein to bind the DPM cofactor, because the interactions of these arginines with the acetate (-Ac) and propionate (-Pr) side groups of the cofactor are lost (Lander et al., 1991a), making the protein catalytically inactive. The mutations K55Q and K59Q also affect the catalytic activity of the enzyme (Lander et al., 1991a). D84 has been suggested to play a key role in stabilizing the positive charges on the pyrrole rings during chain elongation. The D84E mutation reduces the activity of the enzyme, whereas the D84A and D84N mutations make the enzyme inactive (Woodcock and Jordan, 1994).

All the hypotheses proposed for the catalytic mechanism of HMBS are based on structures of the DPM stage alone, because the structures of the subsequent catalytic stages of HMBS are unknown. There are two major hypotheses on how the growing polypyrrole is accommodated within the active site of HMBS: 1) the sliding active site model and 2) the moving chain model (see section 1.4.5). Although experimental studies have provided insights into the role of catalytically important residues of HMBS, the structural, conformational and domain dynamics of the various catalytic stages of HMBS, and the role of individual amino acids in these processes, are not known. Hence, molecular dynamics (MD) simulations were carried out to understand the structural changes that the enzyme undergoes while catalyzing this reaction. The study is divided into three parts: 1) the changes in the enzyme when porphobilinogen units are attached to the dipyrromethane cofactor, forming a polypyrrole chain; 2) the exit of the product from the active site of the enzyme, studied with steered molecular dynamics (SMD); and 3) the relaxation of the enzyme to its initial state to resume the catalytic cycle. MD simulations of the different stages of pyrrole chain elongation are expected to provide insights into the motions of the domains and the active site loop, and into the role of conserved active site residues in accommodating the growing polypyrrole chain. In addition, a possible exit path for the product has been proposed, and the relaxation of the enzyme after the exit of the product has also been studied.

Figure 3.1: Stereo view of the active site residues R11, D84, R131, R132, R149, R155, R176 and R232 along with conserved residues K55 and K59 (in green) and the DPM cofactor (in pink) shown as sticks.

3.2 Simulation Details

3.2.1 Preparation of starting structures

The EcHMBS structure 2YPN (Nieh et al., 1999) was used to study the protein dynamics through the stages of chain elongation, exit of the product and subsequent relaxation. Modeller 9v8 (Fiser and Sali, 2003) was used to model the missing residues 43–59 of the active site loop region (residues 42–60). The stages of pyrrole chain elongation (Fig. 3.2) studied in this work are: DPM (HMBS with the DPM cofactor), P3M (HMBS after the first catalytic addition of PBG to DPM, i.e., with a tripyrrole (P3M) moiety), P4M (HMBS with a tetrapyrrole (P4M) moiety), P5M (HMBS with a pentapyrrole (P5M) moiety) and P6M (HMBS with a hexapyrrole (P6M) moiety). These stages have also been referred to as E, ES, ES2, ES3 and ES4, respectively, in previous reports (Anderson et al., 1981; Anderson and Desnick, 1980; Jordan and Warren, 1987; Louie et al., 1996). The starting structure of the protein for each stage was prepared by docking a methylene pyrrolenine (MePy) moiety (see Fig. 3.2), the substrate intermediate, into the active site cavity of the protein and covalently attaching it to the polypyrrole cofactor.

Figure 3.2: Schematic representation of the mechanism of tetrapyrrole chain elongation catalyzed by HMBS. The figure shows (a) protonation of PBG, (b) deamination of PBG to form MePy, and (c) nucleophilic attack by ring B of DPM on MePy, forming an intermediate that (d) undergoes deprotonation to form a tripyrrole moiety (P3M). Subsequent additions of PBG elongate the chain to form tetrapyrrole (P4M), pentapyrrole (P5M), and hexapyrrole (P6M) moieties. In the last step, the tetrapyrrole product HMB is released by hydrolysis, leaving the DPM cofactor attached to the protein. The rings of the elongating pyrrole chain are labeled A, B, C, D, E, and F, starting from the pyrrole ring covalently attached to the cysteine residue. The acetate and propionate side groups of the pyrroles are denoted by "-Ac" and "-Pr," respectively.


3.2.2 Intermediate steps of pyrrole chain elongation

Explicit-solvent MD simulations of the different stages of HMBS were performed using Gromacs 4.5.5 (Hess et al., 2008) with the GROMOS 53A6 (G53a6) united-atom force field (Oostenbrink et al., 2004). Force field parameters for the covalently attached DPM cofactor and the subsequent chain extensions, P3M, P4M, P5M and P6M, were obtained from the ATB server (Malde et al., 2011). The systems were solvated in an octahedral box with a 9 Å layer of SPC/E water (Berendsen et al., 1987), and the charge of the protein was neutralized by adding sodium ions. The systems were then energy minimized using the steepest descent method until convergence was reached or for 7000 steps. Each system was then equilibrated in the NVT and NPT ensembles for 200 ps and 1 ns, respectively, with V-rescale temperature coupling (Bussi et al., 2007) and Parrinello-Rahman pressure coupling (Parrinello and Rahman, 1981), applied separately to the protein and non-protein groups.

During equilibration, the heavy atoms of the protein were position restrained. The temperature was gradually raised from 0 K to 300 K at a rate of 3 K/ps. Bond lengths were constrained using the LINCS algorithm (Hess et al., 1997). Periodic boundary conditions were applied to minimize edge effects, and long-range electrostatic interactions were computed using the particle mesh Ewald method (Darden et al., 1993; Essmann et al., 1995) with an interpolation order of 4 and a Fourier spacing of 1.6 Å. Each intermediate stage of chain elongation was simulated for 35 ns.

3.2.3 Exit Mechanism

After the P6M stage, the hexapyrrole was cleaved at the bridging carbon atom between B and C rings of the moiety, and an -OH group was attached to the linker carbon atom forming the tetrapyrrole product, 1-hydroxymethylbilane, leaving the DPM cofactor attached to HMBS (HMB stage). The system was simulated for 60 ns.


Possible channels for the exit of HMB from HMBS were identified using CAVER, a PyMOL plugin (Chovancova et al., 2012). The HMBS structure obtained after the HMB-stage simulation was used as input for CAVER, with the center of mass of HMB taken as the starting point for channel detection. CAVER predicted three possible exit channels: two (F1 and F2) near the F ring and a third (C1) close to the C ring of HMB (Fig. 3.3).

The HMBS structure from the HMB stage was used as the starting structure for the SMD simulations. Trial SMD runs at different pulling rates (50 Å ns⁻¹ and 5 Å ns⁻¹) were performed to pull HMB out of HMBS in each of the three directions. The center of mass of the protein was used as the reference group, and the center of mass of either the C ring of the HMB moiety (C1 direction) or the F ring of the HMB moiety (F1 and F2 directions) was pulled at a constant rate of 10 nm ns⁻¹. A force constant of 1000 kJ mol⁻¹ nm⁻² was used for the pulling simulations.

Figure 3.3: Structure of HMBS showing the three possible channels (C1, F1 and F2) for the exit of HMB from HMBS detected by CAVER.
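To give a feel for the magnitude of the pulling force, consider an illustrative case in which the pulled ring has not yet moved (a simplifying assumption, not a result from these simulations). With the force constant and pulling rate stated above, Eqs. 2.15 and 2.16 give, after 10 ps:

$$ v = 10\ \mathrm{nm\,ns^{-1}} = 0.01\ \mathrm{nm\,ps^{-1}}, \qquad \Delta x = v\,t = 0.01\ \mathrm{nm\,ps^{-1}} \times 10\ \mathrm{ps} = 0.1\ \mathrm{nm} $$

$$ F = k\,\Delta x = 1000\ \mathrm{kJ\,mol^{-1}\,nm^{-2}} \times 0.1\ \mathrm{nm} = 100\ \mathrm{kJ\,mol^{-1}\,nm^{-1}} \approx 166\ \mathrm{pN} $$

using 1 kJ mol⁻¹ nm⁻¹ ≈ 1.66 pN. Once the accumulated force is large enough for the ring to move, the spring extension, and hence the force, no longer grows linearly, so the actual force profile depends on the trajectory.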

3.2.4 Protein Relaxation

The ligand (HMB) was then removed after simulating the HMB stage for 60 ns. The system in the absence of HMB was simulated for 150 ns to observe the protein relaxation process (no-HMB stage).

3.2.5 Trajectory and structural analyses

Trajectories were analyzed using VMD 1.9.1 (Humphrey et al., 1996) and Gromacs 4.5.5. VMD was used to calculate the hydrogen bond interactions and the root mean square deviations (RMSDs) of the backbone atoms of the protein for all stages of pyrrole chain elongation. The loop-modeled and energy-minimized 2YPN structure was used as the reference for the RMSD calculations. DisRg, a VMD plugin, was used to calculate the radius of gyration (Rgyr). Gromacs was used to calculate the root mean square fluctuations (RMSFs) of the Cα atoms and the solvent accessible surface area (SASA) of the entire protein. Principal component analysis (PCA) was performed using NMWiz, a VMD plugin (Bakan et al., 2011).
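RMSD, Rgyr and RMSF follow standard definitions, which the NumPy sketch below spells out. It is not the VMD or Gromacs implementation: it assumes coordinates are already loaded as arrays, performs the RMSD superposition with the Kabsch algorithm, and treats the RMSF input as a trajectory that has already been fitted to a reference. The function names are illustrative; masses are optional in the radius of gyration, and without them each atom is weighted equally.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between coordinate sets P and Q (N x 3) after optimal superposition (Kabsch)."""
    P = P - P.mean(axis=0)                        # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)             # SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T       # optimal rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def radius_of_gyration(X, masses=None):
    """Rgyr of an N x 3 coordinate set, optionally mass-weighted."""
    w = np.ones(len(X)) if masses is None else np.asarray(masses, dtype=float)
    com = (w[:, None] * X).sum(axis=0) / w.sum()
    return float(np.sqrt((w * ((X - com) ** 2).sum(axis=1)).sum() / w.sum()))

def rmsf(traj):
    """Per-atom RMSF from a (frames x N x 3) trajectory that is already superposed."""
    mean = traj.mean(axis=0)
    return np.sqrt(((traj - mean) ** 2).sum(axis=2).mean(axis=0))
```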

Interactions among residues, and between residues and the cofactor, were studied by calculating the minimum distance between possible donor and acceptor atoms of the respective residues and each ring of the polypyrrole chain. To track domain movements, the distance between domains 1 and 2 was computed from the centers of mass of interface residues of domain 1 (residues 14 to 19) and domain 2 (residues 150 to 152 and 173 to 177). The volume of the active site cavity along each trajectory was calculated using POVME (Durrant et al., 2011).
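The two distance measures described above can be sketched as follows, assuming per-atom coordinates, masses and residue numbers are available as NumPy arrays (for example, exported from a trajectory frame). The function names are hypothetical; the residue ranges are those given in the text.

```python
import numpy as np

DOMAIN1_INTERFACE = list(range(14, 20))                              # residues 14-19
DOMAIN2_INTERFACE = list(range(150, 153)) + list(range(173, 178))    # residues 150-152, 173-177

def center_of_mass(coords, masses, resids, selection):
    """Center of mass of all atoms whose residue number is in `selection`."""
    mask = np.isin(resids, selection)
    w = masses[mask]
    return (w[:, None] * coords[mask]).sum(axis=0) / w.sum()

def interdomain_distance(coords, masses, resids):
    """Distance between the centers of mass of the domain 1 and domain 2 interface residues."""
    c1 = center_of_mass(coords, masses, resids, DOMAIN1_INTERFACE)
    c2 = center_of_mass(coords, masses, resids, DOMAIN2_INTERFACE)
    return float(np.linalg.norm(c1 - c2))

def minimum_distance(donors, acceptors):
    """Smallest distance between two atom sets (N x 3 and M x 3 arrays)."""
    d = np.linalg.norm(donors[:, None, :] - acceptors[None, :, :], axis=-1)
    return float(d.min())
```

Evaluating `interdomain_distance` frame by frame gives a time series of the domain separation, and `minimum_distance` applied to donor and acceptor atom sets gives the interaction distances used to judge whether a contact is formed.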


3.3 Results

3.3.1 Pyrrole Chain Elongation

The process of pyrrole chain elongation was studied by simulating HMBS at each of the four stages of the tetrapolymerization of PBG (after the addition of each PBG unit) to understand the concomitant structural changes in the protein. The RMSD of the protein backbone at each stage of chain elongation relative to the EcHMBS reference structure is shown in Figure 3.4. The RMSD increased from the DPM to the P5M stage, indicating structural changes upon the addition of each PBG unit, whereas it changed only marginally from the P5M to the P6M stage.

Figure 3.4: RMSD of the protein backbone, with respect to the loop modeled and energy minimized 2YPN structure, at each stage of chain elongation.

The heat map in Figure 3.5A shows the residue-wise contribution to the RMSD. The active site loop residues (42–60) and the domain 2 region contribute significantly to the observed structural deviations. High RMSD is observed for the active site loop from the DPM stage, and for domain 2 from the P4M stage onwards. The RMSF plot (Fig. 3.5B) shows that the fluctuations of the active site loop region are high in the DPM and P4M stages, while those of domain 2 are low. The opposite is observed in the P3M stage, where the fluctuations are high in parts of domain 2 and low for the active site loop. Fluctuations of ~2 Å were observed in the P5M and P6M stages, which are small compared to those in the other stages.

Figure 3.5: A. HeatMap showing the residue-wise contribution to RMSD of the protein through the different stages of simulation indicated by a color bar at the right. The simulation stages are denoted by a color bar along the abscissa. B. RMSF plot of the protein from DPM to P6M stages. The color bar at the bottom corresponds to the domain demarcation (domain 1 – blue, domain 2 – red and domain 3 – green, active site loop – cyan, hinge regions – black). C. Solvent Accessible Surface Area (SASA) and Radius of gyration (Rgyr) values show the loss of compactness of the protein on addition of each PBG molecule through the stages of simulation from DPM to P6M. The error bars shown in the figure represent the standard deviation of the data from the mean.

These observations indicate that, in order to accommodate the growing pyrrole chain, either the active site loop or domain 2 readjusts to widen the active site cleft. The SASA and Rgyr values (Fig. 3.5C) are also in accordance with these observations, indicating that after the P4M stage the entire pyrrole chain is accommodated within the expanded active site with minimal structural changes in the protein.
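The RMSF in Figure 3.5B measures per-atom flexibility about the average structure, and Rgyr in Figure 3.5C measures compactness. A minimal stdlib sketch of both is given below, assuming frames that are already superposed on a common reference and coordinates in Å; the function names are hypothetical. Passing only backbone coordinates gives a backbone Rgyr. SASA requires a molecular-surface algorithm and is not covered by this sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def rmsf(traj: Sequence[Sequence[Vec]]) -> List[float]:
    """Per-atom RMSF over frames that are already superposed on a common reference."""
    n_frames = len(traj)
    n_atoms = len(traj[0])
    values = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean = [sum(frame[i][k] for frame in traj) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from that average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in traj
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def radius_of_gyration(
    coords: Sequence[Vec], masses: Optional[Sequence[float]] = None
) -> float:
    """Mass-weighted radius of gyration; unit masses are used if none are given."""
    weights = list(masses) if masses is not None else [1.0] * len(coords)
    total = sum(weights)
    com = [sum(w * c[k] for w, c in zip(weights, coords)) / total for k in range(3)]
    spread = sum(
        w * sum((c[k] - com[k]) ** 2 for k in range(3))
        for w, c in zip(weights, coords)
    )
    return math.sqrt(spread / total)


# Usage: atom 0 oscillates by ±1 Å along x while atom 1 stays still.
traj = [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
print(rmsf(traj))                                                # [1.0, 0.0]
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))  # 1.0
```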

3.3.1.1 Active site loop dynamics

The dynamics of the active site loop relative to the active site residues was tracked for all stages of PBG polymerization (Fig. 3.6A). The loop fluctuates during the DPM stage, remains near the active site in the P3M stage and then moves away from the active site in the P4M, P5M and P6M stages. In the DPM stage, the active site loop moves back and forth across the active site cleft. During this movement, loop residue D50 interacts with R149, and K55 interacts with Q243, E305 and V306 with an occupancy of 41% along the trajectory. As a result of these interactions, the loop remains close to the active site (closed conformation of the loop). When the loop stays away from the active site (open conformation of the loop), K55 interacts with E88 (Fig. 3.6B). In the P3M stage, the loop remains close to the active site: D50 interacts with R149 throughout the trajectory while K55 interacts with E88 (Fig. 3.6C), and D50 also interacts with the backbone atoms of G150. In the P4M stage, the loop starts moving away from the active site, and the interaction of D50 with R149 is replaced by the interaction of the G60 backbone with E88 (Fig. 3.6C). K55 also forms a salt-bridge with E239. No interactions are observed between the loop residues and the active site residues in the P5M and P6M stages, as the loop stays away from the active site, with K55 interacting with the backbone of the C-terminal residue N308 and forming a β-bridge. Louie et al. proposed that the active site loop plays an important role during enzyme catalysis (Louie et al., 1996). The conformational dynamics of the loop, guided by strong interactions, is an indication of its possible role in the enzyme catalysis.

Figure 3.6: A. Distance between the centers of mass of the loop residues (42-60) and the active site residues (11, 19, 84, 131, 132, 155, 176, 242), tracking the loop movement in the different stages of the simulation. B. Interactions of K55 with E88 (open loop conformation, blue) and with V306, E305 and Q243 (closed loop conformation, green) regulate the loop movement in the DPM stage. C. Distance graphs depicting the interaction of D50 with R149 that regulates loop movement (along with the K55 interactions) during the DPM stage; the interactions of D50 with R149 (black), D50 with G150 (red) and K55 with E88 (green) involved in loop movement during the P3M stage; and the interactions of K55 with E239 (black) and G60 with E88 (red) involved in loop movement during the P4M stage.
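The loop movement in Figure 3.6A is tracked as a distance between centers of mass, and interaction percentages such as the 41% occupancy above are fractions of frames in which a contact persists. The sketch below shows both under stated assumptions: unit masses (i.e. geometric centroids, which also gives a ring centroid such as the F62 phenyl center used for R11 stacking), and a hypothetical 4.0 Å distance cutoff for counting a contact. The contact criterion actually used in this work is not stated in this section, and the helper names are illustrative.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def center_of_mass(coords: Sequence[Vec], masses: Optional[Sequence[float]] = None) -> Vec:
    """Mass-weighted center; with unit masses this is the geometric centroid."""
    weights = list(masses) if masses is not None else [1.0] * len(coords)
    total = sum(weights)
    x, y, z = (sum(w * c[k] for w, c in zip(weights, coords)) / total for k in range(3))
    return (x, y, z)


def com_distance(group_a: Sequence[Vec], group_b: Sequence[Vec]) -> float:
    """Distance between the unit-mass centers of two atom groups (e.g. loop vs. active site)."""
    return math.dist(center_of_mass(group_a), center_of_mass(group_b))


def occupancy(distances: Sequence[float], cutoff: float = 4.0) -> float:
    """Percentage of frames in which a pair distance is within the cutoff.

    The 4.0 Å default is an assumed salt-bridge criterion, not the thesis's setting.
    """
    if not distances:
        return 0.0
    within = sum(1 for d in distances if d <= cutoff)
    return 100.0 * within / len(distances)


# Usage: a per-frame distance series, e.g. between K55 and E88 side-chain atoms.
print(com_distance([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [(1.0, 3.0, 0.0)]))  # 3.0
print(occupancy([3.0, 5.0, 3.5, 4.5]))  # 50.0
```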


3.3.1.2 Structural changes in the protein during the chain elongation process

Comparison of the average structures of HMBS from the initial DPM stage and the final P6M stage showed noticeable differences in their secondary structures. As seen in Fig. 3.7, the β-sheets and helices in domain 1 are shorter in the P6M stage than in the DPM stage. A helix in domain 3 (P280-N296) is shortened (N285-N296), and a short helix in the hinge region joining domains 1 and 2 (E190-L193) uncoils. These structural changes result from the widening of the gap between domains 1 and 2 during polypyrrole chain elongation.

Figure 3.7: Structural changes observed in the protein in DPM and P6M stages. The length of beta sheets and helices in domain 1 shortens (red arrow) in P6M stage, a shorter helix is observed in domain 3 (green arrow) and the hinge region between domain 1 and domain 2 uncoils (pink arrow) in the P6M stage.

3.3.1.3 Principal component analysis

Principal component analysis (PCA), also known as essential dynamics, was performed on the individual trajectories of each of the intermediate stages of chain elongation. PCA identifies the major structural movements that take place in the protein to accommodate the polypyrrole during the stages of chain elongation. All principal components (PCs) contributing more than 5% of the total variance were considered. For the DPM and P5M stages, three PCs contributed more than 5% of the variance, while for the P3M, P4M and P6M stages the contribution of the third PC was less than 5%. The first two PCs together accounted for around 50% of the overall variance. For the DPM stage, PC1 (Fig. 3.8A, 3.9A) showed major movements in the active site loop and the connecting helix, while PC2 showed movement of the active site loop (Fig. 3.9B). In the P3M stage, the highest fluctuations were observed in parts of domain 2 and in the region close to the active site loop (Fig. 3.8B, 3.9).

The active site loop dynamics (Section 3.3.1.1) show that the loop stays close to the active site cavity in the P3M stage. In the P4M stage, both PC1 and PC2 showed major fluctuations in the active site loop, which accommodate the fourth pyrrole moiety within the active site cavity (Fig. 3.8C, 3.9). It has been speculated that the active site cavity is large enough to accommodate 3.5 pyrrole units (Louie et al., 1996), implying that structural changes are necessary to accommodate four pyrrole units. Apart from minor fluctuations in the active site loop, PC1 in the P5M stage showed fluctuations in the hinge region connecting domains 1 and 3, while no major movements were observed along PC2 (Fig. 3.8D, 3.9). In the P6M stage, PC1 showed fluctuations in the active site loop and parts of domain 3 (Fig. 3.8E, 3.9).
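The PCA described above diagonalizes the covariance matrix of atomic fluctuations and ranks the eigenvectors (PCs) by the fraction of total variance they explain, keeping those above 5%. A minimal stdlib sketch follows, assuming flattened 3N coordinate vectors from frames already superposed on a reference. It uses power iteration with deflation purely to stay dependency-free; the helper names are hypothetical, and for a full protein one would instead use the MD package's covariance tools or `numpy.linalg.eigh`.

```python
import math
import random
from typing import List, Sequence, Tuple


def covariance_matrix(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Covariance of flattened 3N coordinates; frames must already be superposed."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    dev = [[f[j] - mean[j] for j in range(d)] for f in frames]
    return [[sum(r[i] * r[j] for r in dev) / n for j in range(d)] for i in range(d)]


def top_components(
    cov: List[List[float]], k: int = 2, iters: int = 500, seed: int = 0
) -> List[Tuple[float, List[float]]]:
    """Leading k eigenpairs of a symmetric matrix by power iteration plus deflation."""
    d = len(cov)
    work = [row[:] for row in cov]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        vec = [rng.random() + 0.1 for _ in range(d)]
        norm0 = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm0 for x in vec]
        for _ in range(iters):
            nxt = [sum(work[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the unit vector = variance along this PC.
        lam = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(d)) for i in range(d))
        pairs.append((lam, vec))
        # Deflation removes the PC just found so the next pass finds the next one.
        work = [[work[i][j] - lam * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return pairs


def variance_fractions(
    cov: List[List[float]], pairs: List[Tuple[float, List[float]]]
) -> List[float]:
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = total variance
    return [lam / total for lam, _ in pairs]


# Usage: one "atom" moving mostly along x, less along y.
frames = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, -0.5, 0.0]]
cov = covariance_matrix(frames)
pcs = top_components(cov, k=2)
fractions = variance_fractions(cov, pcs)
print([round(f, 3) for f in fractions])  # [0.8, 0.2]
kept = [i + 1 for i, f in enumerate(fractions) if f > 0.05]  # PCs above 5% of variance
print(kept)  # [1, 2]
```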


Figure 3.8: The dominant motions of the protein along the top principal components (PC1, PC2 and PC3) for A. DPM, B. P3M, C. P4M, D. P5M and E. P6M stages of chain elongation. The red arrows show the direction and magnitude of the motion corresponding to the principal components. The HMBS protein is shown in green tube.


Figure 3.9: RMSF plot corresponding to the principal components 1 and 2 for each of the stages of chain elongation with domain demarcation along the abscissa.

3.3.1.4 Volume of active site cavity during tetrapyrrole elongation

The volume of the active site cavity increases from the DPM to the P6M stage, mainly as a result of the active site loop and domain movements (Fig. 3.10A). The active site loop remained in a closed conformation in the P3M stage, and the distance between domains 1 and 2 increased from 9.5 Å in the DPM stage to 12 Å in the P3M stage, with a concomitant expansion of the active site volume to accommodate the third pyrrole unit (Fig. 3.10B). In the P4M stage, the loop moved away from the active site cavity (Fig. 3.6A), and the distance between domains 1 and 2 increased to 16 Å (Fig. 3.10B). This increased the active site volume to 1497 ± 140 Å³ (Fig. 3.10A), considerably larger than the 910 ± 110 Å³ observed in the P3M stage.

The space created for the polypyrrole chain in the P4M stage is also sufficient to accommodate another pyrrole unit in the P5M stage with minimal rearrangement of the side chains of the surrounding residues. Thus, the change in volume from the P4M to the P5M stage is negligible. In the P6M stage, the active site loop moved further away from the cavity (Fig. 3.6A), thereby accommodating another pyrrole unit and creating sufficient space for the exit of the product, HMB. These results are in accordance with the cumulative SASA values of the active site residues that interact with the polypyrrole during the stages of chain elongation (Fig. 3.10A).
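The cavity volumes quoted above come from the analysis shown in Fig. 3.10A, whose tool and cavity definition are not restated in this section. Purely as a generic illustration of how such a volume can be estimated, the sketch below counts grid points inside an assumed spherical inclusion region that fall outside every atomic van der Waals sphere. The inclusion sphere, grid spacing, radii and the name `cavity_volume` are all hypothetical, and numbers from this sketch are not comparable to the values reported here.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]
Atom = Tuple[Vec, float]  # (position in Å, van der Waals radius in Å)


def cavity_volume(
    center: Vec, radius: float, atoms: Sequence[Atom], spacing: float = 0.5
) -> float:
    """Grid estimate of empty volume (Å^3) inside a spherical inclusion region.

    A grid point counts as cavity if it lies within `radius` of `center` and
    outside every atomic sphere; volume = number of such points * spacing**3.
    """
    n = int(math.ceil(radius / spacing))
    cx, cy, cz = center
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                if (i * i + j * j + k * k) * spacing ** 2 > radius ** 2:
                    continue  # outside the inclusion sphere
                p = (cx + i * spacing, cy + j * spacing, cz + k * spacing)
                if any(math.dist(p, pos) < r for pos, r in atoms):
                    continue  # point is occupied by an atom
                count += 1
    return count * spacing ** 3


# Usage: an empty 5 Å sphere approaches 4/3 * pi * 5**3 (about 523.6 Å^3);
# placing a 3 Å "atom" at its center removes roughly 113 Å^3.
print(round(cavity_volume((0.0, 0.0, 0.0), 5.0, []), 1))
print(round(cavity_volume((0.0, 0.0, 0.0), 5.0, [((0.0, 0.0, 0.0), 3.0)]), 1))
```

A finer grid reduces the discretization error at the cost of more points (the count scales as 1/spacing³).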

3.3.1.5 Accommodation of the growing pyrrole chain

The dynamics of the growing pyrrole chain and its accommodation within the active site (Fig. 3.10C) are best understood by studying the interactions between the growing chain and the surrounding protein residues (Table 3.1). Fig. 3.10D shows that the interactions of W18 and R176 shift from the B ring in P3M to the C ring in P4M during chain elongation. This pulls the polypyrrole in the P4M stage so that the chain inclines towards domain 2 (Pradhan et al., 2013), leaving space for the additional pyrrole rings in the P5M and P6M stages. The polypyrrole chain is accommodated within the active site in a completely curled conformation. The pyrrole rings interact with most of the active site residues in domains 1 and 2 during all stages of elongation (Table 3.1). The terminal pyrrole ring interacts with a similar set of residues in each stage (Table 3.1), thus providing a suitable environment for catalytic action on the incoming PBG molecule.


Figure 3.10: A. Volume of the active site and cumulative SASA of the active site residues (as reported in Table 3.1), showing an increase from the DPM to the P6M stage with the addition of each PBG molecule. B. Increase in the separation between domains 1 and 2 during the catalytic stages of HMBS. C. Polypyrrole accommodation within the active site cleft as a result of major domain movements during chain elongation; snapshots of only the DPM and P6M stages are shown. D. Interaction of W18 and R176 with the B ring of the pyrrole chain in the P3M stage and with the C ring in the P4M stage, showing the shift in interactions that accommodates the polypyrrole chain.

Table 3.1: Residues interacting with the pyrrole chain and its terminal ring at each stage of chain elongation.

Stage | Residues interacting with the elongating pyrrole chain | Residues interacting with the terminal ring
DPM | S81, K83, D84, T127, S128, S129, R132, R155, L169, A170*, Q198 and G199* | S81, K83, D84, R132, R155, L169*, A170*, Q198 and G199*
P3M | W18, Q19, P52, S81, M82*, K83, D84, S128, S129, R131, R132, R155, A170*, G173, Q198 and G199* | Q19, S81, M82*, D84, Q198 and G199*
P4M | R11, L15*, A16*, W18, Q19, S81, M82*, K83*, D84*, S128, S129, R131, R132, R155, A170*, V171*, A172*, R176, G173* and Q198 | R11, L15*, A16*, Q19, S81, M82*, K83* and D84*
P5M | R11, L15*, W18, Q19, T51*, P52*, H80, S81, K83, D84*, V85*, S128, S129, R131, R132, N151, R155, V171*, A172*, R176, Q198 and G199* | R11, Q19, H80, K83, D84* and V85*
P6M | R11, S13, W18, Q19, S81, K83, D84*, V85*, S129, R131, R149, R155, V171*, A172*, R176 and Q198 | R11, S13, S81, K83, D84*, V85* and R149

An asterisk (*) denotes interaction with the backbone atoms of the residue.

From the P3M stage onwards, the terminal ring interacts with R11 or Q19 (Table 3.1), which supports the hypothesis of Song et al. that R11 and Q19 are involved in the deamination of the incoming PBG units (Song et al., 2009). The terminal pyrrole ring is expected to be located close to the entry point of the PBG units and proximal to R11 or Q19, which could facilitate the deamination of the incoming PBG units.


3.3.1.6 Role of active site residues in the catalytic mechanism

Active site residues R11, D84 and R176 appear to be involved in controlling crucial steps during tetrapyrrole formation. In the DPM stage (the E stage in the nomenclature of Jordan and Woodcock (Jordan and Woodcock, 1991)), R11 stacks with F62, present at the base of the active site loop (Fig. 3.11A), throughout the trajectory (Fig. 3.11C), while D50 interacts with R149 and K55 interacts with Q243, E305 and V306 (Fig. 3.6B, 3.6C) to regulate the loop movement. In the P3M stage (ES stage), the loop remains close to the active site; D50 interacts with both R149 and G150 (Fig. 3.6C), while K55 interacts with E88. The stacking of R11 with F62 continues (Fig. 3.11C), holding the active site loop near the active site, which may aid catalysis during the initial stages of pyrrole chain elongation. This is supported by biochemical studies: the R11H mutation affects the E to ES step, and therefore the binding and attachment of the first substrate, so that no ES complex is formed (Jordan and Woodcock, 1991).

In the P4M stage (ES2 stage), R11 interacts with D84 throughout the trajectory (Fig. 3.11D). As the pyrrole chain elongates, D84 starts interacting with R11, breaking the stacking of R11 with F62 that persisted until the P3M stage (Fig. 3.11B) and causing the flexible loop to slide away from the active site. Therefore, in the absence of D84, the stacking of F62 with R11 would be expected to keep the loop close to the active site, possibly obstructing further catalysis. The mutations D84A and D84N caused the enzyme to stall in the P4M stage (ES2) (Woodcock and Jordan, 1994), supporting the role of D84 in catalysis.

Song et al. suggested that D99 in hHMBS (analogous to D84 in EcHMBS) could play a role in the nucleophilic attack and in the deprotonation after the formation of the new C-C bond between the polypyrrole chain and the incoming pyrrole ring (Song et al., 2009). They also hypothesized that other residues in the vicinity, R26, Q34 and R195, could play a role in the protonation of the incoming PBG molecule. Roberts et al. hypothesized that D95 in AtHMBS (analogous to D84 in EcHMBS) takes part in all the catalytic steps (Roberts et al., 2013); in particular, D95 is implicated in deamination and nucleophilic attack. The present study suggests that positively charged residues lying close to the elongating pyrrole chain could take part in the protonation of the incoming PBG molecule, while negatively charged residues could assist in the deamination and nucleophilic attack.

Figure 3.11: A. Close-up view of the stacking interaction of R11 with F62, present at the base of the active site loop, measured as the distance between the CZ atom of R11 and the center of mass of the phenyl ring of F62 in the DPM stage. B. In the P4M stage, D84 interacts with R11, disrupting the stacking of R11 with F62. C. The stacking interaction of R11 with F62 holds the active site loop in a position that facilitates its movement during the DPM and P3M stages, shown as the distance between the CZ atom of R11 and the center of mass of the phenyl ring of F62 in the DPM, P3M and P4M stages. D. Distance graph depicting the interaction of R11 with D84 in the P4M stage, which causes the stacking between R11 and F62 to break. E. Distance graphs depicting the interaction of R176 with the B (black) and C (red) rings of the polypyrrole chain during the stages of chain elongation.

In the P3M stage (ES), R176 exerts a pulling effect on the B ring of the pyrrole chain (Fig. 3.11E). This effect continues in the P4M stage (ES2), where the B and C rings of the pyrrole chain interact with R176 throughout the trajectory, aiding the accommodation of the pyrrole chain during elongation. The inclination of the pyrrole chain towards domain 2 continues in the P5M (ES3) and P6M (ES4) stages as well. In the absence of R176, there would be no pull on the pyrrole chain towards domain 2, which may cause steric hindrance for the incoming PBG units during chain elongation. This is in accordance with biochemical studies showing that the R176H mutation affects the progression from ES (P3M) to ES2 (P4M) and from ES2 (P4M) to ES3 (P5M) (Jordan and Woodcock, 1991).

3.3.2 Exit Mechanism

After studying the dynamics of HMBS during the tetrapolymerization process, the hexapyrrole was cleaved to form the tetrapyrrole product, HMB, leaving the DPM cofactor attached to HMBS (HMB stage). The HMB stage was simulated to study the exit mechanism of the product, HMB, from the protein. During the simulation, the C ring of HMB moved slightly towards the opening between domains 1 and 2, and the F ring moved towards the space formed by the loop and domain 2 (Fig. 3.12). As a much longer simulation time would be required for the product to exit HMBS spontaneously, an external force was used to overcome the energy barriers. CAVER was used to detect channels for the exit of the product (Fig. 3.3). Based on the results of the HMB stage MD simulations and CAVER, SMD was performed to study the exit of HMB from HMBS. Several trial runs with varying pulling rates, as well as the final SMD runs, were performed along the C1, F1 and F2 directions (Fig. 3.13A). The magnitude of the SMD expulsion forces is comparable to the unbinding forces of other protein-ligand systems (Lüdemann et al., 2000).

Figure 3.12: Conformation of 1-hydroxymethylbilane during the HMB stage simulation. With reference to the initial conformation of HMB (green), the C and F rings are displaced by 3.5 Å and 5.3 Å, respectively, in the structure at the end of the simulation (cyan).
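The force profiles in Fig. 3.13C arise from pulling HMB along each path. Assuming a constant-velocity SMD protocol with a harmonic spring (a common SMD scheme; the actual spring constant, pulling rates and engine settings are not restated in this section), the force along the pulling direction n is F(t) = k [v t - (x(t) - x(0))·n], where x(t) is the ligand's center of mass. The sketch below evaluates this force and the accumulated work from a center-of-mass trajectory; all parameter values and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def cv_smd_forces(
    times: Sequence[float],
    com_positions: Sequence[Vec],
    direction: Vec,
    k: float,
    v: float,
) -> List[float]:
    """Spring force along the pulling direction in constant-velocity SMD.

    F(t) = k * (v * t - (x(t) - x(0)) . n), with n the unit pulling direction.
    Units must be consistent (e.g. nm, ps and kJ mol^-1 nm^-2 give kJ mol^-1 nm^-1).
    """
    length = sum(c * c for c in direction) ** 0.5
    n = tuple(c / length for c in direction)
    x0 = com_positions[0]
    forces = []
    for t, x in zip(times, com_positions):
        displacement = sum((x[i] - x0[i]) * n[i] for i in range(3))
        forces.append(k * (v * t - displacement))
    return forces


def pulling_work(times: Sequence[float], forces: Sequence[float], v: float) -> float:
    """Work done by the moving restraint, W = integral of F * v dt (trapezoidal rule)."""
    return sum(
        0.5 * (forces[i] + forces[i + 1]) * v * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )


# Usage: a ligand that stays put while the restraint moves makes the force rise linearly.
times = [0.0, 1.0, 2.0]
stuck = [(0.0, 0.0, 0.0)] * 3
f = cv_smd_forces(times, stuck, direction=(2.0, 0.0, 0.0), k=10.0, v=0.5)
print(f)                            # [0.0, 5.0, 10.0]
print(pulling_work(times, f, 0.5))  # 5.0
```

In this picture, a force peak corresponds to the ligand lagging behind the moving restraint while interactions (such as those with R176 or K83) hold it back, and the force drops once those interactions break.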

In SMD along the C1 path, the E, C and D rings of HMB interact with R11, Y22 and Q198, respectively, until about 11 ns. An increase in the pull force is observed around 13 ns, when these interactions start breaking (Fig. 3.13C). The loss of the interactions of the C ring with R176, and of the E-ring side chains with K83, D84, V85 and Q19, causes the fluctuations in the force profile until the product exits at around 26 ns (Table 3.2).


In SMD along the F1 path, pull force peaks are observed around 11 ns and 15 ns (Fig. 3.13C). The electrostatic interactions of the E ring of HMB with S81, D84 and V85, and of the D ring with Q198, are lost during the peak at 11 ns as the product starts to exit. The peak around 15 ns corresponds to the loss of the interactions of the C ring with N151 and R176 and of the E ring with R11, Q19 and K83.

Table 3.2: Interaction of protein residues with HMB moiety during its exit through each path; numbers indicating the % occupancy of interaction.

Residue | C1 path | F1 path | F2 path
R11 (b) | 49.2 | 69.5 | 78.2
Q19 (b) | 47.5 | 45.7 | 66.3
Y22 | 21.23 | - | 53.7
S81 | 2.1 | 31.5 | 4.6
K83 (a) | 55.5 | 65.6 | 89.3
D84 (a) | 52.7 | 38.9 | 6.8
V85 (a) | 48.1 | 37.1 | 6.2
R131 | 26.5 | 1.5 | 6.4
N151 (a) | 30.1 | 46.6 | 16.0
R176 (b) | 92.2 | 60.0 | 65.3
Q198 | 37.5 | 34.2 | 14.9

(a) Residues that interact for more than 30% occupancy of the simulation time.
(b) Residues that are suggested to be catalytically important.

SMD along the F2 path has the lowest force profile (Fig. 3.13C), with force peaks around 8 ns, 15 ns and 20 ns. The first peak corresponds to the alignment of the F ring of the tetrapyrrole chain perpendicular to the α11 helix in domain 1, while the second peak, around 15 ns, corresponds to the loss of the interaction of the C ring with R176. The last peak, around 20 ns, corresponds to the successive breaking of the interactions of the C ring with W18, Y22 and K175, of the D and C rings with Q19, of the E ring with R11, and of the E, D and C rings with K83, until the HMB moiety exits the protein at around 32 ns.


Figure 3.13: Exit mechanism of HMB from HMBS. A. Structure of HMBS showing the probable exit directions, from either the C or the F ring of HMB, considered for the SMD simulations: C1 (from the center of mass of the C ring of HMB towards the interface of domains 1 and 2), F1 (from the center of mass of the F ring of HMB towards the active site loop) and F2 (from the center of mass of the F ring of HMB towards the interface between the active site loop and domain 1). B. Surface representation of HMBS showing the most probable path predicted for the exit of HMB, through the space between domain 1, domain 2 and the active site loop (Video S3). C. Force as a function of time during the SMD runs along the three exit paths C1, F1 and F2. D. Interactions of R11, Q19 and R176 with HMB during the SMD runs along the C1, F1 and F2 paths, indicating the possible role of these catalytically important residues in the exit of the product.

R11, Q19, K83 and R176 are involved in each of the exit paths considered, of which R11, Q19 and R176 have been suggested to be involved in the catalytic mechanism of the enzyme. At least one of these residues (R11, Q19 and R176) interacts with HMB until its exit from HMBS (Fig. 3.13D), indicating their potential role in the exit of the product.

Based on the SMD analysis, the favorable path for the product exit is through the space between domain 1, domain 2 and the active site loop (Fig. 3.13B).

3.3.3 Protein Relaxation after exit of product

The active site cavity of HMBS enlarges to accommodate the tetrapyrrole product HMB, as seen from the SASA and radius of gyration data of the protein (Fig. 3.5C). After identifying a possible exit path for the product, protein relaxation was investigated to determine whether HMBS regains its initial conformation for the next catalytic cycle.

3.3.3.1 Regaining compactness post relaxation

The probability distribution of the radius of gyration, computed over the backbone atoms of the protein, shows that Rgyr decreases from 20.2 ± 0.1 Å in the P6M stage to 19.6 ± 0.1 Å in the no-HMB stage, which is comparable to that observed in the DPM stage of chain elongation (19.4 ± 0.1 Å) (Fig. 3.14A).
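The Rgyr comparison above uses two summaries of a per-frame time series: a normalized probability distribution (Fig. 3.14A) and a mean ± standard deviation. A minimal sketch of both is given below, assuming a list of per-frame backbone Rgyr values in Å. The bin width, the use of the sample standard deviation (`statistics.stdev`), the function names and the example values are assumptions for illustration only.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def probability_distribution(
    values: Sequence[float], bin_width: float
) -> List[Tuple[float, float]]:
    """(bin center, probability density) pairs, normalized so sum(density * bin_width) == 1."""
    lo = math.floor(min(values) / bin_width) * bin_width
    counts: Dict[int, int] = {}
    for x in values:
        b = int((x - lo) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    n = len(values)
    return [
        (lo + (b + 0.5) * bin_width, c / (n * bin_width))
        for b, c in sorted(counts.items())
    ]


def mean_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, as used for 'mean ± SD' summaries."""
    return statistics.mean(values), statistics.stdev(values)


rgyr = [19.25, 19.5, 19.75, 20.0]  # hypothetical per-frame Rgyr values (Å)
print(probability_distribution(rgyr, 0.5))  # [(19.25, 0.5), (19.75, 1.0), (20.25, 0.5)]
m, sd = mean_sd(rgyr)
print(f"{m:.2f} ± {sd:.2f} Å")
```

Normalizing to a density rather than raw counts lets trajectories of different lengths (for example, the P6M and the 150 ns no-HMB runs) be overlaid on the same axis.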

3.3.3.2 Restoration of correlations during relaxation

The dynamic cross-correlation matrix (DCCM) plots of the DPM, P6M and no-HMB stages (the latter computed over the last 50 ns of the 150 ns trajectory) show a few significant changes that support protein relaxation (Fig. 3.14B, 3.14C, 3.14D). Residues 50-60 and 240-260, which are positively correlated in the P6M stage, gradually become negatively correlated in the no-HMB stage, resembling the DPM stage. Residues 80-100 become more positively correlated with domain 3 going from the P6M to the no-HMB stage, as in the DPM stage. Also, domain 2, which is more negatively correlated with domain 1 in the P6M stage, gradually gains a few positively correlated regions in the no-HMB stage, resembling the DPM stage.
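Each DCCM element compares the fluctuations of two atoms i and j about their mean positions: C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩), ranging from -1 (anti-correlated) to +1 (correlated motion). A minimal stdlib sketch follows, assuming frames that are already superposed on a reference and one atom per residue (for example, Cα atoms taken from an analysis window such as the last 50 ns); the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def dccm(traj: Sequence[Sequence[Vec]]) -> List[List[float]]:
    """Dynamic cross-correlation matrix C_ij = <dr_i . dr_j> / sqrt(<dr_i^2><dr_j^2>).

    traj: frames x atoms (e.g. C-alpha atoms), already superposed on a reference.
    """
    n_frames, n_atoms = len(traj), len(traj[0])
    mean = [
        [sum(frame[i][k] for frame in traj) / n_frames for k in range(3)]
        for i in range(n_atoms)
    ]
    # Displacement of every atom from its time-averaged position, per frame.
    disp = [
        [[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
        for frame in traj
    ]

    def avg_dot(i: int, j: int) -> float:
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames

    msf = [avg_dot(i, i) for i in range(n_atoms)]  # mean square fluctuations
    matrix = []
    for i in range(n_atoms):
        row = []
        for j in range(n_atoms):
            denom = math.sqrt(msf[i] * msf[j])
            row.append(avg_dot(i, j) / denom if denom > 0 else 0.0)
        matrix.append(row)
    return matrix


# Usage: atoms 0 and 1 move together along x; atom 2 moves opposite to them.
traj = [
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
c = dccm(traj)
print(c[0][1], c[0][2])  # 1.0 -1.0
```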

Figure 3.14: A. Probability distribution of the radius of gyration of the protein in the DPM, P6M and no-HMB stages, showing that in the no-HMB stage the Rgyr falls back close to that of the DPM stage. DCCM plots of the B. no-HMB, C. DPM and D. P6M stages, showing the differences and similarities in correlation relative to the DPM stage as the protein relaxes in the no-HMB stage. The marked regions in the no-HMB stage more closely resemble the DPM stage during protein relaxation.


3.4 Discussion

All the hypotheses and speculations proposed on the catalytic mechanism of HMBS prior to this study were based on crystallographic structures of the DPM stage alone (Gill et al., 2009; Louie et al., 1992; Roberts et al., 2013; Song et al., 2009), as the structures of the subsequent stages of catalysis are unknown. Computational studies were therefore essential to understand the protein dynamics through the stages of chain elongation. MD studies on HMBS provided important insights into its structural changes as it catalyzes the formation of HMB from four units of PBG, covering polypyrrole elongation, exit of the product and relaxation of the protein.

The flexibility of the active site loop is significant in modulating step-wise substrate binding. Yan and Ji made similar observations in their NMR study of the bisubstrate enzyme 6-hydroxymethyl-7,8-dihydropterin pyrophosphokinase (HPPK) (Yan and Ji, 2011). The active site loop dynamics and the domain movements are important in polypyrrole formation. Conserved residues D50, K55 and R149 modulate the dynamics of the active site loop. Most of the conserved arginine residues in the active site contribute to stabilizing the polypyrrole chain through salt-bridge interactions with its acetate and propionate side chains.

There has been some speculation on the domain movements in HMBS during polypyrrole accommodation, based on the dynamics of proteins with similar topology (Gerstein et al., 1993; Keskin et al., 2000). Gerstein et al. studied lactoferrins, which have two domains similar to those in HMBS, and observed a see-saw movement in the hinge region between the lactoferrin domains (Gerstein et al., 1993). Using Gaussian network models, Keskin et al. studied the dynamics of a few substrate-binding proteins whose domain architecture resembles the Rossmann fold. They suggested that domains 1 and 2 in HMBS could undergo 'cooperative opposite direction fluctuations' about a hinge region, similar to the dynamics of the lysine/arginine/ornithine-binding protein (LAO) (Keskin et al., 2000). Louie et al. studied the EcHMBS crystal structure and suggested that domains 1 and 2 might twist open about the hinge between them to increase the volume of the active site cleft, with domain 3 moving away from domains 1 and 2 (Louie et al., 1996).

Song et al. studied the hHMBS crystal structure and suggested similar movements of domains 1 and 2 (Song et al., 2009). From our MD simulations and essential dynamics analysis of the different stages of the protein, it can be hypothesized that domains 1 and 2 move apart about the hinge region to create space for the growing pyrrole chain, while the cofactor turn is inclined towards domain 2.

The accommodation of the polypyrrole within the active site is another question addressed in this study. Roberts et al. hypothesized that the C1 ring (A ring) of the polypyrrole moves from its position so that the C2 ring (B ring) can occupy the C1 position, allowing the incoming PBG to occupy the vacated C2 position for catalysis (Roberts et al., 2013). However, in our MD study, the polypyrrole adjusts itself by inclining towards domain 2 within the active site, such that its terminal ring is exposed to the active site residues that have a role in the catalytic mechanism of the tetrapolymerization of PBG.

The mechanism by which the final product (HMB) is formed has not been discussed in the literature. Our simulations show that the C and D rings of the hexapyrrole are exposed to the solvent. Hence, it can be speculated that solvent accessibility of the methylene linker between the B and C rings may facilitate the hydrolysis that forms HMB.

After the tetrapyrrole product is cleaved, it has to exit the active site of the protein in order to bind to the next enzyme in the heme biosynthesis pathway. The most probable exit path for the product, HMB, was proposed using SMD, based on the expulsion force profiles, the accessibility of the paths from the active site and the minimal structural changes along each path. The proposed path for the exit of HMB is through the space between domain 1, domain 2 and the active site loop.

Active site residues R11, Q19 and R176, which play a role in the catalytic mechanism (Jordan, 1991; Lander et al., 1991b), are also observed to be involved in the exit of the product. Similarly, Choutko and van Gunsteren proposed that residues crucial for catalysis may also play a role in the exit of the product (Choutko and van Gunsteren, 2012).

The exit of the product from the active site is expected to allow the distended protein to relax to its initial structure. In this study, a 150 ns long MD simulation was performed to observe the relaxation of HMBS after catalysis and product exit. The radius of gyration computations show that, upon exit of the product, the protein regains the compactness that was lost in accommodating the tetrapyrrole chain. The residue-wise correlations also regain a trend similar to that in the DPM stage, indicating that HMBS can resume its next catalytic cycle.

3.5 Conclusion

MD studies of the enzyme HMBS have provided important insights into its structural changes as it catalyzes the formation of the product 1-hydroxymethylbilane from four units of porphobilinogen. The study of the chain elongation process revealed the importance of the active site loop and the domain movements in accommodating the polypyrrole chain. Domains 1 and 2 move apart and the cofactor turn moves towards domain 2 to accommodate the growing pyrrole chain in the active site cleft. Conserved residues D50, K55 and R149 modulate the dynamics of the active site loop, while R11, D84 and R176 play a role in the catalytic mechanism, corroborating previous biochemical studies (Jordan, 1991; Lander et al., 1991b). SMD simulations of the exit of the product, HMB, were used to propose the most probable exit path. Based on the expulsion force profiles and minimal structural changes, the proposed path for the exit of HMB is through the space formed between domain 1, domain 2 and the active site loop. Active site residues R11, Q19 and R176, reported as catalytically important, are also involved in the exit of the product. Upon exit of the product, the distended HMBS gradually relaxes towards its initial structure to resume its catalytic cycle. The questions of how the substrate PBG diffuses into the active site, and of the process by which it is covalently linked to the terminal ring of the cofactor to synthesize the tetrapyrrole product, are yet to be answered.


Chapter 4

4 Structural dynamics of human HMBS

4.1 Background

In the previous chapter, the step-wise mechanism of EcHMBS was studied using MD simulations. Although the structures of HMBS from different homologs share a similar fold, there are differences between the structure of human HMBS (hHMBS) and that of its E. coli homolog. The major difference is the presence in hHMBS of a 29-residue insert in domain 3 at the interface between domains 1 and 3 (Gill et al., 2009; Song et al., 2009), the role of which is unknown (see Fig. 1.4). The active site loop (residues 56-76) is disordered in the crystal structures of hHMBS and its coordinates are missing, although this flexible loop has been modeled in the crystal structure of A. thaliana HMBS (AtHMBS), in which it effectively covers the active site region (Roberts et al., 2013). In addition, water molecules have been reported in the high-resolution crystal structures of AtHMBS (Roberts et al., 2013), EcHMBS (Louie et al., 1996) and hHMBS (Song et al., 2009). These water molecules are trapped within the active site cleft and stabilize the negatively charged cofactor. However, the role of these water molecules in HMBS catalysis is largely uncharacterized. The hHMBS enzyme is of particular interest, as it is associated with acute intermittent porphyria (AIP), an autosomal dominant inborn error of heme biosynthesis characterized by life-threatening acute neurovisceral attacks (Anderson et al., 1981).

The crystal structures of the hHMBS housekeeping isoform are available in the Protein Data Bank (PDB: 3EQ1 (Gill et al., 2009), 3ECR (Song et al., 2009)). hHMBS has three distinct domains, domain 1 (1-114, 219-236), domain 2 (120-212) and domain 3 (241-361), connected by inter-domain hinge regions (Fig. 4.1). As in EcHMBS, the dipyrromethane cofactor lies in a cleft between domains 1 and 2, where PBG binding and polypyrrole elongation reactions occur (Gill et al., 2009; Song et al., 2009).

Figure 4.1: Structure of hHMBS (PDB ID: 3ECR) with modeled missing residues, showing the domains (1, 2 and 3 in blue, red and green, respectively), the hinge regions (115-119, 213-218 and 237-240, pink), the additional 29-residue insert (296-324, orange) and the active site loop (56-76, cyan).

Previous studies of EcHMBS characterized the importance of twelve conserved arginine residues in different steps of pyrrole assembly (Jordan and Woodcock, 1991) and identified a few key residues that may take part in the HMBS catalytic mechanism. However, in the absence of any structural data for hHMBS with the hexapyrrole, or for each intermediate stage of pyrrole polymerization, the exact roles of the hHMBS residues involved in polypyrrole chain elongation remain largely unknown.

In this chapter, MD simulations of hHMBS at each intermediate stage of polymerization were performed to address the following questions about this key enzyme. (1) What structural changes of the protein assist polypyrrole assembly? (2) How do the active site residues stabilize the highly negatively charged polypyrrole chain? (3) Which residues are critical for the proton transfer and deamination steps at each stage? (4) Do water molecules play any role in the mechanism? Answers to these questions can considerably improve our understanding of the mechanism of HMBS.

4.2 Simulation Details

4.2.1 Preparation of starting structures

The structure of wild-type hHMBS (PDB ID: 3ECR (Song et al., 2009)) was used as the basis for molecular modeling. The missing coordinates of residues 60-78 (part of the active site loop and the adjoining region) were modeled based on the structure of AtHMBS (PDB ID: 4HTG (Roberts et al., 2013)) using MODELLER 9v8 (Fiser and Sali, 2003). The missing residues at the N-terminus (1-20) and the C-terminus (358-361) were also modeled.

4.2.2 Pyrrole chain elongation

The holoenzyme and the intermediate stages of chain elongation were simulated using Gromacs 4.5.5 (Hess et al., 2008) with the GROMOS 53A6 (G53a6) united-atom force field (Oostenbrink et al., 2004). The Automated Topology Builder (ATB) server (Malde et al., 2011) was used to obtain the parameters for the covalently attached DPM cofactor and the subsequent chain-extended moieties, P3M, P4M, P5M and P6M. Each system was solvated in an octahedral box of SPC/E water molecules (Jorgensen et al., 1983) with 9 Å padding, and its net charge was neutralized by adding sodium ions. Energy minimization was performed with the steepest descent algorithm until convergence. The NVT, NPT and production run protocols were similar to those described in section 3.2.2. Each intermediate stage of pyrrole elongation was simulated for 50 ns.

4.2.3 Random acceleration molecular dynamics

Random acceleration molecular dynamics (RAMD) (Lüdemann et al., 2000; Vashisth and Abrams, 2008) was used to study the exit path of HMB from hHMBS. RAMD enhances the dynamics of the ligand by applying an additional force of constant magnitude in a randomly chosen direction, which allows exit pathways to be explored without prior assumptions about their location (Long et al., 2009; Vashisth and Abrams, 2008; Winn et al., 2002). The RAMD simulations were performed with the NAMD program (Phillips et al., 2005) using the CHARMM22 force field (Mackerell et al., 2004; MacKerell et al., 1998). Force field parameters for HMB and the DPM cofactor were prepared using CGenFF (Vanommeslaeghe et al., 2012). The protein was solvated in a cubic box of TIP3P water (Jorgensen et al., 1986), and ions were added to neutralize the system. Each RAMD simulation was preceded by a short energy minimization and 2 ns of equilibration.

Several trial runs were performed with external forces ranging from 90 to 5 kcal/mol-Å in decrements of 5 kcal/mol-Å. Two further parameters were varied: the number of steps (n) between checks of the ligand displacement (40 or 80), and the minimum displacement (dmin) below which a new random force direction is chosen (0.008 or 0.016 Å). For each force value, four simulations were performed, covering all combinations of n and dmin.
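The direction-update rule behind RAMD can be sketched in a few lines of NumPy. This is a minimal illustration of the scheme described in the RAMD literature cited above, not the NAMD implementation used in this work. The function names are hypothetical. The sketch also assumes that dmin is compared with the total displacement of the ligand center of mass since the previous check, which happens every n steps; implementations differ in how this threshold is normalized.

```python
import numpy as np


def random_unit_vector(rng: np.random.Generator) -> np.ndarray:
    """Uniformly distributed direction on the unit sphere (normalized Gaussian)."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)


def update_ramd_direction(direction, com_start, com_end, d_min, rng):
    """RAMD direction check, assumed to be applied every n steps.

    If the ligand center of mass moved less than d_min (Angstrom) since the
    last check, a new random direction is drawn; otherwise the old one is kept.
    Returns (direction, changed).
    """
    moved = np.linalg.norm(np.asarray(com_end, float) - np.asarray(com_start, float))
    if moved < d_min:
        return random_unit_vector(rng), True
    return direction, False


def ramd_force(direction, magnitude):
    """External force of constant magnitude along the current direction."""
    return magnitude * np.asarray(direction, float)


rng = np.random.default_rng(seed=1)
direction = random_unit_vector(rng)
# The ligand moved only 0.005 A (< d_min = 0.008 A), so a new direction is drawn.
direction, changed = update_ramd_direction(direction, [0, 0, 0], [0.005, 0, 0], 0.008, rng)
print(changed)  # True
```

A larger force or a smaller dmin makes it easier for the ligand to keep a given direction, which is why both parameters were scanned.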

4.2.4 Trajectory and structural analyses

VMD 1.9.1 (Humphrey et al., 1996) and Gromacs 4.5.5 (Hess et al., 2008) were used to analyze the trajectories. Root mean square fluctuations (RMSFs) of the Cα atoms of the entire protein were calculated using the g_rmsf module of Gromacs, with the energy-minimized starting structure of the DPM stage as the reference structure.
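The RMSF of each atom is the standard deviation of its position about its time-averaged position. A minimal NumPy sketch is shown below. It assumes that the frames have already been superposed on the reference structure and loaded as an array of shape (n_frames, n_atoms, 3). The function name is illustrative and is not part of Gromacs.

```python
import numpy as np


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMSF for an aligned trajectory of shape (n_frames, n_atoms, 3).

    RMSF_i = sqrt(< |x_i(t) - <x_i>|^2 >_t), averaged over frames.
    """
    mean_pos = traj.mean(axis=0)                    # time-averaged positions
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared deviation per frame/atom
    return np.sqrt(sq_dev.mean(axis=0))
```

Applied separately to the frames of each stage, this gives one RMSF profile per stage, as in Fig. 4.2B.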

The concatenated trajectory of all the stages of pyrrole chain elongation was used to analyze the cumulative effect of domain motions. Principal component analysis (PCA) was performed on the backbone atoms using the NMWiz plugin of VMD (Bakan et al., 2011).
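Backbone PCA diagonalizes the covariance matrix of atomic displacements about the average structure. The eigenvectors with the largest eigenvalues describe the dominant collective motions, such as domain movements. The sketch below shows this idea in NumPy. It assumes an already-aligned trajectory array and is not the NMWiz/ProDy code path used in this work. The function name is illustrative.

```python
import numpy as np


def pca_modes(traj: np.ndarray, n_modes: int = 3):
    """PCA of an aligned trajectory of shape (n_frames, n_atoms, 3).

    Returns the variances of the top modes (descending), the corresponding
    eigenvectors (3*n_atoms, n_modes) and the projection of each frame.
    """
    n_frames = traj.shape[0]
    X = traj.reshape(n_frames, -1)        # one flattened xyz vector per frame
    X = X - X.mean(axis=0)                # displacements from the average structure
    cov = X.T @ X / (n_frames - 1)        # 3N x 3N covariance matrix
    evals, evecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_modes]
    evals, evecs = evals[order], evecs[:, order]
    return evals, evecs, X @ evecs        # projections: (n_frames, n_modes)
```

For a protein of this size, the 3N x 3N covariance matrix is large but still tractable for backbone atoms only, which is one reason PCA is usually restricted to the backbone or Cα atoms.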


The distance between domains 1 and 2 was computed from the centers of mass of the interface residues of domain 1 (residues 25-38) and domain 2 (residues 165-175 and 193-198). The volume of the active site cavity along each trajectory was calculated using Epock (Laurent et al., 2015). Hydrogen bond interactions of the growing pyrrole chain with the protein and the solvent were calculated using the hbond tool in VMD, with a donor (D) to acceptor (A) distance cutoff of 3.5 Å and a D-H…A angle cutoff of 30° (i.e., the D-H…A angle deviates from linearity by at most 30°). To study water-mediated interactions, water molecules lying within 3.5 Å of polar groups of both the growing pyrrole chain and the protein were selected. The co-occurrence probability used to characterize the loop conformation was represented as a 2D histogram plotted using R (Team, 2014).
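The geometric criteria above can be written out explicitly. The sketch below assumes VMD's convention that the angle cutoff is the maximum deviation of the D-H…A angle from 180°. The water-bridge selection follows the 3.5 Å rule stated in the text. Function names are illustrative, and coordinates are assumed to be in Å.

```python
import numpy as np


def angle_deg(a, b, c) -> float:
    """Angle a-b-c in degrees, with the vertex at b."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos_ang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))


def is_hbond(donor, hydrogen, acceptor, d_cut=3.5, angle_cut=30.0) -> bool:
    """D...A distance <= d_cut and D-H...A angle within angle_cut of 180 degrees."""
    d_da = np.linalg.norm(np.asarray(donor, float) - np.asarray(acceptor, float))
    deviation = 180.0 - angle_deg(donor, hydrogen, acceptor)
    return bool(d_da <= d_cut and deviation <= angle_cut)


def bridging_waters(water_o, ligand_polar, protein_polar, cutoff=3.5):
    """Indices of water oxygens within `cutoff` of at least one polar ligand atom
    AND at least one polar protein atom (candidate water-mediated contacts)."""
    water_o = np.asarray(water_o, float)

    def near(group):
        group = np.asarray(group, float)
        d = np.linalg.norm(water_o[:, None, :] - group[None, :, :], axis=2)
        return (d <= cutoff).any(axis=1)

    return np.flatnonzero(near(ligand_polar) & near(protein_polar))
```

The bridging test is purely distance based, as in the text. Requiring hydrogen-bond geometry on both sides would be a stricter alternative.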

4.3 Results

4.3.1 Structural fluctuations in hHMBS during polypyrrole assembly

MD simulations of hHMBS with the DPM cofactor (DPM stage) and the four subsequent stages of chain elongation were each performed for 50 ns. The backbone root mean square deviation (RMSD) for all the stages of chain elongation was calculated with respect to the reference structure (the energy-minimized starting structure of the DPM stage).

After the initial reorganization of the protein during the DPM stage, the RMSD gradually increased, reaching about 1 Å by the P5M stage. In the P6M stage alone, the RMSD increases sharply by ~0.9 Å (Fig. 4.2A). To identify the regions that contribute to this deviation, the RMSD of each of the three domains was computed separately (Fig. 4.2A). Domain 1, which contains the active site loop, contributes significantly more to the protein RMSD than domains 2 and 3.
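The RMSD values above are computed after least-squares superposition on the reference structure. A minimal NumPy sketch of this calculation using the Kabsch algorithm is given below. It assumes matched (n_atoms, 3) coordinate arrays for one frame and the reference. It is an illustration, not the Gromacs routine used here.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched coordinate sets P and Q (n_atoms, 3) after optimal
    superposition of P onto Q (translation + rotation, Kabsch algorithm)."""
    P = P - P.mean(axis=0)                     # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # avoid improper rotations (reflections)
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                         # optimal rotation matrix
    P_rot = P @ R.T
    return float(np.sqrt(((P_rot - Q) ** 2).sum(axis=1).mean()))
```

Domain-wise values can be obtained by passing only the atoms of one domain. This fits each domain separately, which is an assumption here, because the text does not state the fitting group used for the domain RMSDs.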


Figure 4.2: A. RMSD of the protein, along with domain-wise RMSD, with reference to the DPM stage structure during the stages of chain elongation; B. Root mean square fluctuation of the Cα atoms of hHMBS at each stage of chain elongation, from DPM to P6M. The encircled regions show high fluctuations in the active site loop and the 29-residue insert; C. The distance between the centers of mass of the active site loop and the active site residues (R26, Q34, D99, R149 and R150) as a function of time along the stages of chain elongation, emphasizing the role of the loop in polypyrrole accommodation. Standard errors are shown as vertical error bars; D. The cofactor turn shifts in the P4M stage (green) compared with the DPM stage (cyan) to accommodate the growing pyrrole chain.

The RMSF values (Fig. 4.2B) indicate large fluctuations of the active site loop. The RMSF of the active site loop (residues 56-76) changes minimally up to the P3M stage, but increases significantly from the P4M stage onwards (Fig. 4.2B). This is mainly due to the loop moving away from the active site region in response to the elongating polypyrrole. Fluctuations are also seen in the 29-residue insert region (296-324) of domain 3 and in the N- and C-terminal regions (Fig. 4.2B).

The effects of domain dynamics and active site loop movement were analyzed further. The distance between domains 1 and 2 changes marginally in the DPM stage but starts increasing in the P3M stage owing to the addition of pyrrole ring C. During the DPM and P3M stages, the active site loop also stays close to the active site (Fig. 4.2C). It then moves gradually away from the active site during the subsequent stages of chain elongation.
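The inter-domain distance used here is a center-of-mass (COM) distance between two residue selections, computed frame by frame. A minimal NumPy sketch for one frame is shown below, using the interface residue ranges given in section 4.2.4. The helper names are illustrative. Per-atom coordinates, masses and residue numbers are assumed to be available as arrays.

```python
import numpy as np

# Interface residue ranges (inclusive) given in section 4.2.4.
DOMAIN1_INTERFACE = [(25, 38)]
DOMAIN2_INTERFACE = [(165, 175), (193, 198)]


def center_of_mass(coords: np.ndarray, masses: np.ndarray) -> np.ndarray:
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def select(resids: np.ndarray, ranges) -> np.ndarray:
    """Boolean mask of atoms whose residue number lies in any inclusive range."""
    mask = np.zeros(len(resids), dtype=bool)
    for lo, hi in ranges:
        mask |= (resids >= lo) & (resids <= hi)
    return mask


def interdomain_distance(coords, masses, resids,
                         sel_a=DOMAIN1_INTERFACE, sel_b=DOMAIN2_INTERFACE) -> float:
    """COM-COM distance (Angstrom) between two residue selections in one frame."""
    coords = np.asarray(coords, float)
    masses = np.asarray(masses, float)
    resids = np.asarray(resids)
    a, b = select(resids, sel_a), select(resids, sel_b)
    return float(np.linalg.norm(center_of_mass(coords[a], masses[a])
                                - center_of_mass(coords[b], masses[b])))
```

The same function gives the loop-to-active-site distance of Fig. 4.2C if the selections are set to the loop (56-76) and the single-residue ranges for R26, Q34, D99, R149 and R150.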

4.3.2 Cofactor turn movement assists in polypyrrole accommodation within the catalytic site

The structures obtained from the P3M, P4M, P5M and P6M trajectories were aligned to the reference structure, and the displacement of the Cα atom of C261 (to which the cofactor is linked) was monitored. The average displacements of this Cα atom away from the active site loop are 1.8, 2.5, 2.0 and 1.6 Å during the P3M, P4M, P5M and P6M stages, respectively. The effect is greatest in the P4M stage, with an average of 2.5 Å and a maximum shift of 3.5 Å (Fig. 4.2D). These observations, together with the domain RMSD values, indicate that the cofactor turn re-orients in the P3M, P4M and P5M stages, along with marginal domain movements, to accommodate the elongating polypyrrole. The combined dynamics of the domains, active site loop and cofactor turn provide enough space to accommodate four additional PBG molecules.

4.3.3 Active site loop opens to accommodate the polypyrrole


The active site loop conformations were clustered based on the distance of the loop from the active site residues (R26, Q34, D99, R149 and R150) and on the RMSD of the active site loop. Fig. 4.3A shows the co-occurrence probability of the different active site loop conformations, where each bin represents a cluster of conformations. Bin 1 represents a 'closed' state, in which the loop remains close to the active site (Fig. 4.3B). The structures clustered in bin 1 are from the DPM and P3M stages. In these stages, two active site loop residues, T58 and D61, form a hydrogen bond and a salt bridge with R26, respectively. These interactions, along with a cation-π interaction between F77 (at the end of the active site loop) and R26, keep the active site loop in the closed conformation (Fig. 4.3C). Bin 2 represents a 'semi-open' state, in which the loop has moved away by at least 1 Å (Fig. 4.3A, 4.3B). The conformations in bin 2 are mostly from the P4M stage and the initial part of the P5M stage. In these conformations, the interactions of T58 and D61 with R26 are lost because of the additional pyrrole ring in their vicinity (Fig. 4.3D). In these stages, domains 1 and 2 also move apart to make space for the incoming pyrroles, while the active site loop moves marginally away from the active site. Bin 3 represents an 'open' state, in which the loop moves further away from the active site by ~3 Å, providing enough space to accommodate the pentapyrrole and hexapyrrole (Fig. 4.3A). The conformations in bin 3 are from the P5M and P6M stages of chain elongation.

Thus, the combined dynamics of the domains, active site loop and cofactor turn provide enough space to accommodate the four additional PBG molecules.

Figure 4.3: A. Characterization of the active site loop conformation along the concatenated trajectory as a function of (i) the distance between the centers of mass of the active site residues (R26, Q34, D99, R149 and R150) and the active site loop (X-axis) and (ii) the RMSD of the active site loop (Y-axis); B. Conformations of the active site loop in bins 1, 2 and 3, shown in red, blue and green, respectively; DPM is shown in yellow and P6M in green sticks; C. Interactions of T58 and D61 with R26, along with F77, hold the active site loop (red) close to the catalytic cleft in the DPM stage; D. Loss of the interactions of T58 and D61 with R26 moves the active site loop away from the catalytic cleft in the P4M stage. Instead, T58 stabilizes the sidechain of ring D.
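The co-occurrence plot of Fig. 4.3A is a normalized 2D histogram of two per-frame observables. The thesis plotted it with R. A Python/NumPy equivalent is sketched below to show the calculation. The bin count is an assumption, because it is not stated in the text. High-probability bins then correspond to the closed, semi-open and open loop clusters.

```python
import numpy as np


def cooccurrence_histogram(distance, loop_rmsd, bins=20):
    """2D histogram of per-frame (loop-to-active-site COM distance, loop RMSD),
    normalized so that all bins sum to 1 (co-occurrence probability)."""
    counts, x_edges, y_edges = np.histogram2d(distance, loop_rmsd, bins=bins)
    return counts / counts.sum(), x_edges, y_edges
```

Normalizing by the total count, rather than by bin area, makes each bin value directly readable as the fraction of frames in that cluster.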

4.3.4 Charged and hydrophilic active site residues stabilize the growing pyrrole chain

For subsequent chemical reactions to take place, the incoming PBG must adopt the correct conformation. Each pyrrole ring of the chain carries two carboxylate groups (acetate and propionate). As a result, the net charge on the polypyrrole increases from -4 (DPM stage) to -12 (P6M stage) during chain elongation, and the active site environment must stabilize this extensive negative charge. Crystal structure data for the DPM stage (Song et al., 2009) and biochemical studies (Gill et al., 2009; Jordan and Woodcock, 1991; Louie et al., 1996; Song et al., 2009) have shown that conserved polar and charged residues in the active site stabilize the DPM cofactor.
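The net charges quoted above follow from simple counting: two carboxylates per ring, each carrying -1. The snippet below reproduces this arithmetic. It assumes that all acetates and propionates are fully deprotonated and, like the count in the text, ignores other ionizable groups. The names are illustrative.

```python
# Net charge of the enzyme-bound polypyrrole, counted from carboxylates only.
# Assumption: every acetate and propionate is fully deprotonated (-1 each);
# other ionizable groups are ignored, as in the count given in the text.
CARBOXYLATES_PER_RING = 2      # one acetate + one propionate per pyrrole ring
CHARGE_PER_CARBOXYLATE = -1

RINGS_PER_STAGE = {"DPM": 2, "P3M": 3, "P4M": 4, "P5M": 5, "P6M": 6}


def net_charge(n_rings: int) -> int:
    return n_rings * CARBOXYLATES_PER_RING * CHARGE_PER_CARBOXYLATE


# Prints one line per stage: DPM -4, P3M -6, P4M -8, P5M -10, P6M -12
for stage, n_rings in RINGS_PER_STAGE.items():
    print(stage, net_charge(n_rings))
```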

Atomistic details about the stabilization of the growing pyrrole chain during the various stages are discussed below.

DPM stage: D99 interacts with the nitrogen atoms of each pyrrole ring at all stages of pyrrole elongation (Fig. 4.4A). The sidechains of the positively charged residues K98, R149, R150 and R173 neutralize the negatively charged carboxylate groups of the DPM cofactor, and the polar residues S146 and S147 also form hydrogen bonds with the DPM (Fig. 4.4A). All these residues are largely conserved across species (Appendix A). D99 and K98 are part of domain 1, while all the other interacting residues are part of domain 2. These interactions were also noted in the crystal structures of AtHMBS and hHMBS (Roberts et al., 2013; Song et al., 2009).

P3M stage: On addition of a new pyrrole ring (ring C) to the DPM cofactor, changes in the active site dynamics are observed. The distance between domains 1 and 2 increases marginally (~0.5 Å) to accommodate ring C. The carboxylate sidechains of ring C are oriented towards the domain 1 residues R26, S28 and S96. The sidechains of S28 and S96 interact with the acetate and propionate sidechains of ring C, and R26 also interacts with its propionate sidechain (Fig. 4.4B). Owing to the cofactor turn movement, T145 forms a hydrogen bond with the propionate sidechain of ring A that persists throughout the subsequent stages of chain elongation.

P4M stage: When ring D is added to the P3M intermediate, domains 1 and 2 move slightly apart and the cofactor turn moves towards domain 2 (Fig. 4.2D) to create space for the new PBG moiety. The active site loop moves away from the active site at this stage, which allows T58, along with R167, to interact with the propionate sidechain of ring D (Fig. 4.4C). Additionally, R195 interacts with the propionate sidechain of ring C.

P5M stage: Ring E of the polypyrrole is positioned at the base of the active site loop, causing the loop to move away from the active site. In the P5M stage, the active site loop residue S69 interacts with the acetate sidechain of ring E for the first 27 ns. Owing to the loop movement, it later shifts its interaction to the propionate sidechain of ring D (Fig. 4.4D). S262 and Q356 interact with the acetate and propionate sidechains of ring E, respectively (Fig. 4.4D). Neither S69 nor S262 is conserved in HMBS across species, but both appear to play an important role in stabilizing ring E of the polypyrrole in hHMBS.

P6M stage: With the addition of ring F (the fourth PBG), the active site loop moves further away from the active site. Most of the hydrogen bonds that stabilize the polypyrrole up to the P5M stage are retained in the P6M stage (Fig. 4.4E). The pyrrole nitrogen of ring F interacts with D99 through a water-mediated hydrogen bond (Fig. 4.4E). R173, which previously interacted with rings A and B of the cofactor, now interacts only with the sidechains of ring A. S262 interacts with both rings E and F, while S165 interacts with the propionate sidechain of ring F (Fig. 4.4E). Additionally, Q356 interacts with ring F. The sidechain of R167 interacts with the acetate sidechains of both rings E and F instead of only ring D. As a result, no protein residue stabilizes the acetate group of ring D (Fig. 4.4E). However, several water molecules that can stabilize this acetate group in the absence of polar amino acid sidechains were seen in its vicinity.

Figure 4.4: Interactions of the residues lining the active site with the growing pyrrole chain at the A. DPM, B. P3M, C. P4M, D. P5M and E. P6M stages. The interactions are shown as dotted lines. The cofactor is shown in yellow and water molecules are represented as spheres. The carboxylate oxygens (O) of the acetate/propionate sidechains and the pyrrole nitrogen (N) of each pyrrole ring are labeled. The numeric suffixes (1-4) indicate the positions of the oxygens on the pyrrole, and the letter suffixes (A-F) indicate the pyrrole ring to which the carboxylate oxygen is attached.

Figure 4.5: A. Graph showing the total number of water molecules interacting with the polypyrrole during the stages of chain elongation (black); total number of water-mediated interactions between the growing polypyrrole and protein (red). B. N169 formed hydrogen bonds with the acetate sidechain (O1D and O2D) of ring D in the P4M stage. In the P5M and P6M stages of chain elongation, the direct interaction between N169 and ring D was lost, and instead this interaction was mediated by water molecules which persisted for over 90% of the simulation time. Also, a water-mediated interaction between D99 and the nitrogen atom (NF) of ring F was observed.

4.3.5 Water-mediated interactions stabilize the polypyrrole

The elongating polypyrrole carries many negatively charged acetate and propionate sidechains. The simulations of the different stages of chain elongation suggest that the polar and charged amino acid residues in the HMBS active site may not be sufficient to stabilize the negatively charged polypyrrole. In this situation, water molecules become important. To investigate possible water-mediated interactions, water molecules that could form hydrogen bonds with the carboxylate groups of the polypyrrole were identified. The number of hydrogen-bonded water molecules increases with the number of carboxylate groups during the successive stages of chain elongation (Fig. 4.5A). In the P5M and P6M stages, it increases significantly owing to the opening of the active site loop (Fig. 4.5A). Some of these water molecules mediate interactions between the polypyrrole and the protein. As mentioned earlier, the acetate sidechain of ring D in the P6M stage has no direct interaction with any polar or positively charged amino acid of the protein. Instead, water molecules interact with this negatively charged group. Additionally, N169, which interacts with the acetate sidechain of ring D, loses its direct interaction with the polypyrrole in the P5M stage but retains it through a water molecule (Fig. 4.5B). D99 also forms a water-mediated interaction with the pyrrole nitrogen of ring F during the P6M stage, maintaining its role in polypyrrole formation (Fig. 4.5B). The total persistence of hydrogen-bonded water molecules around each acetate and propionate sidechain of the polypyrrole was computed. The persistence of water molecules around the acetate group of ring D was 99% over the P6M trajectory. Table 4.1 summarizes the water-mediated stabilization of all charged groups of the polypyrrole. These results suggest that any loss of charge stabilization from the residues around the active site could be compensated by water-mediated interactions.
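Persistence, as used here and in Table 4.1, is the percentage of frames in which an interaction is present. The sketch below computes it from per-frame boolean flags. It also shows one way to combine direct (Ps or B) and water-mediated (W) stabilization of a carboxylate oxygen. The function names and the "either direct or via water" combination are illustrative assumptions, not the exact bookkeeping behind Table 4.1.

```python
import numpy as np


def persistence(present) -> float:
    """Percentage of frames (0-100) in which an interaction is observed."""
    present = np.asarray(present, dtype=bool)
    return float(100.0 * present.mean())


def combined_persistence(direct, via_water) -> float:
    """Percentage of frames in which a carboxylate oxygen is stabilized either
    directly (polar sidechain or backbone) or through a bridging water."""
    return persistence(np.logical_or(direct, via_water))


# Example with four frames: direct contact in frames 1-2, water bridge in 2-3.
direct = [1, 1, 0, 0]
via_water = [0, 1, 1, 0]
print(persistence(direct), combined_persistence(direct, via_water))  # 50.0 75.0
```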

Table 4.1: Persistence of interactions between the oxygen atoms of the polypyrrole sidechains and polar amino acid residues/backbone atoms and water molecules.

The negative charges on the oxygen atoms (see Fig. 4.4 for labels) of the acetate and propionate sidechains are neutralized by the sidechains of polar amino acid residues (Ps) or by polar backbone atoms (B). In most cases, any loss of charge stabilization from polar sidechains (Ps) or polar backbone atoms (B), as shown in Fig. 4.4, is compensated by water molecules (W). The persistence of each interaction during each stage of the MD simulations is shown. The numbers indicate the percentage of the total simulation time during which the interaction is observed. B, backbone; Ps, polar sidechain; W, water.

4.3.6 R26 and R167 are critical residues for enzymatic catalysis

Figure 4.6: Positions of probable proton donors R26, Q34, R167, R195 and the PBG in A. P4M and B. P5M stages obtained from the docking studies. R167 is the most probable proton donor in these stages.

Based on previous hypotheses, the catalytic mechanism is proposed to proceed in four successive steps (see Fig. 1.6). In the first step, the incoming PBG is protonated by one of the positively charged residues in the active site of HMBS. Song et al. speculated that R26, Q34 or R195 might act as the proton donor prior to deamination of PBG (Song et al., 2009). The MD simulations in the current study allow us to comment on the residues that might be involved in the protonation of PBG, which triggers the subsequent steps of catalysis. To examine this, PBG molecules were docked in the active site at each stage of chain elongation.

Figure 4.7: Stereograms of PBG docked in the active site of hHMBS at the A. DPM, B. P3M, C. P4M and D. P5M stages, showing the role of R26 and R167 in proton donation to the incoming PBG. All distances are in Å. The PBG, polypyrrole, arginine (R26 and R167) and aspartate (D99) residues are shown in cyan, green, orange and violet, respectively.

The distances between the polar sidechains of residues lining the active site near the terminal ring of the cofactor and the site of the docked PBG indicate that R26 is the most probable proton donor for the formation of the P3M and P4M intermediates (Fig. 4.7A, 4.7B). However, R26, Q34 and R195 cannot act as the proton donor for the formation of the P5M and P6M intermediates, since they are far from the docked PBG moieties (Fig. 4.6). Instead, R167, which is favorably positioned near the terminal rings of the docked PBG moieties (see Fig. 4.7C, 4.7D), is the most likely proton donor for the P5M and P6M stages.
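The donor assignment above rests on a distance comparison between candidate donor groups and the docked PBG. This comparison can be expressed as a simple ranking. In the sketch below, the choice of one polar sidechain atom per donor and one reference atom on the PBG is an assumption. The residue labels and coordinates in the example are invented purely to exercise the function and are not docking results.

```python
import numpy as np


def rank_proton_donors(donor_atoms: dict, target) -> list:
    """Sort candidate donors by the distance (Angstrom) between their polar
    sidechain atom and a reference atom of the docked PBG, closest first."""
    target = np.asarray(target, float)
    distances = {
        residue: float(np.linalg.norm(np.asarray(xyz, float) - target))
        for residue, xyz in donor_atoms.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical labels and toy coordinates, invented only for illustration.
toy = {"donor_1": [8.0, 0.0, 0.0], "donor_2": [3.0, 0.0, 0.0],
       "donor_3": [10.0, 0.0, 0.0]}
print(rank_proton_donors(toy, [0.0, 0.0, 0.0])[0])  # ('donor_2', 3.0)
```

Distance alone is only a first filter. Sidechain pKa and hydrogen-bond geometry would also influence whether a residue can actually donate a proton.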

Figure 4.8: Comparison of the DPM cofactor (pink and orange) and the HMB (green and cyan) conformation, before and after a 50 ns MD simulation. Minor changes in the conformation of HMB were observed after 50 ns of MD simulation.

4.3.7 R167 dynamics are important for HMB exit

To investigate the exit mechanism of HMB from the catalytic cleft, the hexapyrrole was cleaved to form the tetrapyrrole product, HMB, leaving the DPM cofactor attached to the protein (hereafter referred to as the HMB stage). The HMB stage was simulated for 50 ns. During this simulation, the HMB moiety relaxes and moves marginally away from D99 (Fig. 4.8). Exit of the product from HMBS by spontaneous diffusion would require much longer simulation times. Therefore, an enhanced sampling method, random acceleration MD (RAMD), was used (Lüdemann et al., 2000).

Figure 4.9: A. Three possible exit paths for HMB are depicted using blue spheres. The position of the HMB is shown in sticks (yellow); the arrows are indicative of the probable exit paths A, B and C. B. Exit path A and C. Exit path B. Domains 1, 2 and 3 are represented in blue, red and green, respectively and the active site loop in cyan. R167 and HMB, in green and yellow respectively, are shown in the stage where HMB is beginning to exit from the protein. R167' and HMB', in orange and pink respectively, represent the stage where HMB is almost outside the protein.

RAMD simulations can predict ligand exit paths without bias, provided that the force applied to the ligand is appropriately chosen. As noted above, a large number of hydrophilic and charged residues stabilize the acetate and propionate sidechains of HMB, so a large external force is required to break these electrostatic interactions. Across all 72 trajectories (see section 4.2.3), the applied force was a critical factor in determining whether HMB exited successfully. None of the trajectories with a force of 10 kcal/mol-Å or less were successful, whereas 50% of the trajectories with a force of 20 or 25 kcal/mol-Å were successful. The exit time also depended on the applied force. With an external force of 90 kcal/mol-Å, the ligand exited in as little as 33 ps, whereas with a force of 25 kcal/mol-Å it took 1068 ps. Analysis of all the trajectories showed minor variations between individual paths owing to the randomly chosen force directions. However, all successful paths can be grouped into three pathways, labeled A, B and C (Fig. 4.9). Among the successful trajectories, HMB exits through path A in 46% of cases, and through paths B and C in 33% and 25% of cases, respectively.
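The total of 72 trajectories follows directly from the parameter grid in section 4.2.3. There are 18 force values (90 to 5 kcal/mol-Å in steps of 5), each combined with four (n, dmin) settings. The enumeration below is only a bookkeeping check of that arithmetic, and the variable names are illustrative.

```python
from itertools import product

forces = list(range(90, 0, -5))   # 90, 85, ..., 5 kcal/mol-A
n_values = [40, 80]               # steps between direction checks (n)
dmin_values = [0.008, 0.016]      # minimum displacement threshold (A)

runs = list(product(forces, n_values, dmin_values))
print(len(forces), len(runs))     # 18 72
```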

R167 acts as a gatekeeper during the exit of HMB in all three exit paths. R167 undergoes a major conformational change in all three paths, and a few other charged residues also change conformation, depending on the exit path.

Until HMB is formed (P6M stage), the R167 sidechain is oriented towards the active site. In all three exit paths, R167 interacts with the acetate and propionate sidechains of HMB while undergoing large conformational fluctuations, which facilitates the exit of HMB from the enzyme (Fig. 4.9A). During the P6M stage, the active site loop moves further away from the active site (Fig. 4.2C), creating a channel for the exit of HMB. In path A, HMB exits through this channel (Fig. 4.9B). Apart from R167, R173 also interacts with HMB along this path until HMB leaves the protein. Path B passes through the region between domains 1 and 2 and the active site loop, and requires further active site loop dynamics (Fig. 4.9C). Along this path, K62 also interacts with HMB during its exit. Path C passes through the region between domain 2 and the active site loop (Fig. 4.9A). Along this path, the salt bridge between R167 and the acidic sidechains of HMB causes a major distortion of domain 2 (an RMSD change of 4 Å). This distortion makes path C less likely than paths A and B.

4.3.8 MD simulations of hHMBS mutations that impaired chain elongation

Among the many mutations responsible for AIP, mutations at three active site residues (D99, R26 and R167) were studied, because biochemical studies suggest that these residues are crucial for the catalytic mechanism. To further understand their structural effects, the D99G, R26C and R167W mutations were introduced at the DPM, P3M and P5M stages of chain elongation, respectively (Bung et al., 2018).

Figure 4.10: The D99G mutation shows a shift in the cofactor by a magnitude of 3.5 Å towards the active site loop. Also, the pyrrole nitrogens come close to each other when compared to the DPM cofactor in the wild-type protein.

D99G: D99 is located at the center of the active site cleft in hHMBS. During the stages of chain elongation, D99 interacts with all of the pyrrole nitrogens, so that the pyrrole rings can orient around it in a compact helicoidal conformation (Fig. 4.4). In the wild type, these interactions help the pyrrole rings of the cofactor maintain their positions, possibly orienting incoming PBG moieties to bind to the terminal ring. The D99G mutation is known to yield an inactive holoprotein (Louie et al., 1996; Shoolingin-Jordan et al., 2003). An MD simulation of D99G hHMBS in the DPM stage was carried out to characterize any structural changes. In the D99G mutant, the cofactor is displaced by 3.5 Å and the pyrrole nitrogens move closer to each other, presumably because of the loss of hydrogen bonds between the pyrrole nitrogens and the D99 sidechain (Fig. 4.10). D99 was previously hypothesized to stabilize the positive charge on the pyrrole nitrogens (Song et al., 2009). The present simulations suggest that it is also important for maintaining the conformation of the growing polypyrrole.

Figure 4.11: Changes in the polypyrrole conformation in the P3M stage due to the R26C mutation arrange ring C in a position difficult for further substrate addition. The conformation of the growing pyrrole chain in the A. wild-type and B. mutant R26C enzyme. The polypyrrole and D99 residue are shown in cyan and yellow sticks, respectively.

R26C: R26 plays an important role in stabilizing the negative charge on the polypyrrole. Apart from maintaining the conformation of the growing pyrrole chain, R26 might also protonate the first two PBG molecules. Earlier studies of the R26C mutation indicated that the enzyme loses activity and forms no enzyme-intermediate complexes (Jordan and Woodcock, 1991; Mustajoki et al., 2000). An MD study of the R26C mutant in the P3M stage of hHMBS showed that the polypyrrole is accommodated in a conformation that places ring C in a position unfavorable for further substrate addition, compared with wild-type hHMBS (Fig. 4.11). This is due to a flip of ring C by 84° around the bond connecting it to the DPM cofactor.
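A ring flip such as the 84° rotation of ring C is usually quantified as the change in a torsion (dihedral) angle about the connecting bond between wild-type and mutant structures. The sketch below gives a standard dihedral calculation and a wrap-around-safe angle difference. Which four atoms define the torsion around the ring C-DPM bond is not stated in the text, so the atom choice is left to the user. The function names are illustrative.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees, range -180 to 180) about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1     # part of b0 perpendicular to the bond
    w = b2 - np.dot(b2, b1) * b1     # part of b2 perpendicular to the bond
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def rotation_between(angle_a: float, angle_b: float) -> float:
    """Smallest absolute change between two dihedral angles, in degrees."""
    return abs((angle_b - angle_a + 180.0) % 360.0 - 180.0)
```

The wrap-around step matters because, for example, dihedrals of 170° and -106° differ by 84°, not 276°.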

R167W: R167 is located in domain 2, close to the active site loop region, and its sidechain points towards the active site throughout the catalytic cycle. During the P4M stage, R167 interacts with ring D. Because it is close to the terminal ring of the elongating pyrrole chain in the P4M and P5M stages, R167 can stabilize the polypyrrole. MD simulations of the R167W mutant show that the W167 sidechain orients towards the loop, with no significant structural changes, consistent with earlier reports (Gill et al., 2009; Mustajoki et al., 2000). The most probable reason for the inactivity of the enzyme is that tryptophan, unlike the arginine at position 167, cannot act as a proton donor at physiological pH.

4.3.9 Structural analysis of non-active site mutations related to AIP

Acute intermittent porphyria (AIP) results from a large number of mutations in the HMBS gene, which have been identified in various clinical analyses. The current simulations provide an opportunity to identify the extended active site and the roles of the amino acids lining it during polypyrrole chain elongation. AIP mutations in the active site could affect the catalytic mechanism, while non-active site mutations can disrupt protein folding and thereby impair enzyme function. Hydrogen bonds, salt bridges and disulfide bonds all contribute to the stability of the protein structure. Substitution of a polar/hydrophilic residue into a hydrophobic environment is deleterious to protein structure, and a change in sidechain size caused by a mutation can disrupt the local environment. Likewise, the protein structure is thermodynamically less stable when a hydrophobic residue is placed in a polar environment (Roy et al., 2014; Wang and Moult, 2001).

Another major cause of protein destabilization is the mutation of a residue located in an α-helix to proline. Based on these criteria (Wang and Moult, 2001), the effect of missense mutations was evaluated along the trajectory for each step of chain elongation. The E250 residue of domain 3 forms a salt bridge with R116 of domain 1 (Fig. 4.12). A number of AIP-causing mutations have been reported at these two positions: E250Q (Lundin et al., 1995), E250K (Gu et al., 1994), E250V (Puy et al., 1997), E250A (Puy et al., 1997), R116W (Gu et al., 1993) and R116Q (Mgone et al., 1994). These mutations result in loss of the salt bridge, which might play an important role in the stability of the protein. Residues A252 (Mgone et al., 1993), L254 (Kauppinen and von und zu Fraunberg, 2002), L257 (Pischik et al., 2005), A330 (Yang et al., 2015) and L343 (Floderus et al., 2002) are all part of α-helices in HMBS, and a proline substitution has been reported at each of these positions in patients with AIP. Because proline is a known helix breaker, these substitutions are expected to affect the hHMBS structure. T269I (Mgone et al., 1994) is another mutation reported in a few AIP patients. T269 forms a hydrogen bond with E250 (Fig. 4.12) and its sidechain is surrounded by polar residues, so substitution by a hydrophobic residue such as isoleucine can destabilize the local environment. Conversely, the sidechain of L245 is surrounded by hydrophobic residues such as L244, P324, A249 and V301. The L245R mutation (Delfau et al., 1991), which introduces a positively charged arginine, can therefore destabilize this hydrophobic environment. Probable explanations for most of the non-active site AIP mutations are provided in Appendix B.


Figure 4.12: Non-active site mutations responsible for AIP destabilize the PBGD structure. A. Hydrogen bond between T269 and E250; B. Salt bridge between E250 and R116; C. Residues located in α-helices; and D. Residue L245 in the hydrophobic core formed by V301, P324 and L244.

4.4 Discussion

The specific residues and mechanisms by which hHMBS performed each of the stepwise additions of four PBG molecules to the DPM cofactor were unknown because structural data for the intermediate elongation stages were lacking. The crystal structures of DPM-stage hHMBS showed that the active site was not large enough to accommodate four PBG molecules (Gill et al., 2009; Song et al., 2009).


MD simulations revealed that the cofactor turn movement, together with minor domain motions, provided space for the addition of the first two PBG moieties to the DPM cofactor in hHMBS. Movement of the active site loop away from the active site region was the necessary next step to accommodate the next two PBG moieties. Chapter 3, on EcHMBS, showed the importance of domain movements and active site loop dynamics in the formation of HMB. In EcHMBS, domains 1 and 2 moved towards domain 3 to create space for pyrrole chain elongation (Fig. 4.13A, 4.13C), along with the loop movement. However, hHMBS differs from EcHMBS by an additional 29-residue insert in domain 3. This insert is wedged between domains 1 and 3 and presses domain 3 against domain 2, precluding the large movements of domains 1 and 2 observed in EcHMBS (Fig. 4.13B, 4.13D). In hHMBS, the restricted domain movement was compensated by the cofactor turn movement and the large active site loop movement.

Figure 4.13: A representative structure of the DPM stage in A. EcHMBS and B. hHMBS. The domains 1, 2 and 3 are colored in blue, red and green, respectively. The 29-residue insert is shown in magenta. Schematic representation of the domain dynamics in C. EcHMBS and D. hHMBS. The 29-residue insert prevents the motion of both domains 1 and 2. The length of the black arrows in C. and D. is proportional to the extent of the domain motions observed in EcHMBS and hHMBS, respectively.

Notably, hHMBS and EcHMBS share 53% sequence similarity, and most of the active site residues are conserved between the two species. The catalytic environments of EcHMBS and hHMBS were therefore similar, with notable exceptions. Q34 in hHMBS (Q19 in EcHMBS) interacted directly with ring C only at the P4M stage, and in subsequent stages its interactions were mediated by water molecules, whereas Q19 in EcHMBS interacted with ring C from the P3M stage onwards. In hHMBS, the role of Q19 was played mostly by S28, which the MD simulations showed interacting with ring C from the P3M stage onwards.

In vitro studies of the S28N mutant further confirmed the importance of this residue (Bung et al., 2018). Steered MD, which was used in the EcHMBS study, requires the direction of the pulling force to be specified in advance; this limitation was overcome by using random acceleration MD (RAMD) simulations to study HMB exit from hHMBS. Of the three most likely exit paths evaluated, the path between the domain interface and the active site loop was the most probable for hHMBS.
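The RAMD idea can be illustrated with a minimal sketch, assuming a toy overdamped particle in a hypothetical box-shaped pocket that is open only towards +z. The pocket geometry, force magnitude, check interval and displacement threshold (`r_min`) are illustrative values, not the parameters used for hHMBS, and the helper names are hypothetical. A constant-magnitude force is applied along a random direction, and the direction is redrawn whenever the ligand moves less than `r_min` during a check interval, so the ligand explores directions until it finds an open channel without a predefined pulling vector.

```python
import numpy as np


def random_direction(rng: np.random.Generator) -> np.ndarray:
    """Unit vector with a uniformly random orientation in 3D."""
    v = rng.normal(size=3)
    return v / np.linalg.norm(v)


def toy_step(x: np.ndarray, force: np.ndarray, dt: float = 0.1) -> np.ndarray:
    """Hypothetical overdamped step in a toy pocket.

    Walls at |x| = 1, |y| = 1 and z = -1 confine the particle; the pocket is
    open only towards +z. Motion into a wall is clipped, so the particle slides.
    """
    new = x + dt * force
    new[0] = np.clip(new[0], -1.0, 1.0)
    new[1] = np.clip(new[1], -1.0, 1.0)
    new[2] = max(new[2], -1.0)
    return new


def ramd_exit(x0, force_mag=1.0, steps_per_check=20, r_min=0.5,
              exit_distance=10.0, max_checks=500, seed=0):
    """Push with a constant force along a random direction; redraw the direction
    whenever the particle moves less than r_min during one check interval.

    Returns (exited, number_of_checks, final_position).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    start = x.copy()
    direction = random_direction(rng)
    for check in range(1, max_checks + 1):
        x_prev = x.copy()
        for _ in range(steps_per_check):
            x = toy_step(x, force_mag * direction)
        if np.linalg.norm(x - start) > exit_distance:
            return True, check, x
        if np.linalg.norm(x - x_prev) < r_min:
            direction = random_direction(rng)  # stuck: try a new direction
    return False, max_checks, x


exited, n_checks, x_final = ramd_exit([0.0, 0.0, 0.0])
print(exited, n_checks, x_final.round(2))
```

In a production RAMD run, the same controller logic acts on the ligand centre of mass inside a full MD engine rather than on this toy step function.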

Eight and six bound water molecules were observed around the charged sidechains of the DPM cofactor in the EcHMBS and AtHMBS crystal structures, respectively (Louie et al., 1996; Roberts et al., 2013), whereas only one water molecule was reported in the hHMBS crystal structure (Song et al., 2009). Although water molecules are present on each side of the DPM and have been hypothesized to stabilize the DPM-hHMBS structure, these studies are the first to demonstrate an important role of water molecules in the subsequent steps of polypyrrole elongation. The MD simulations also indicated that any loss of charge stabilization caused by amino acid substitutions in the P4M to P6M steps could be compensated by water-mediated interactions.

Notably, the MD simulations identified, at atomistic resolution, the critical active site residues involved in DPM binding and in the subsequent steps of polypyrrole elongation. Residues from all three domains interacted with the polypyrrole: 11, 14 and 3 residues from domains 1, 2 and 3, respectively, took part in stabilizing the highly negatively charged polypyrrole. These studies highlighted previously unknown roles of R167 in hHMBS catalysis and HMB exit. R167 was not only a critical proton donor for PBG; in all three major exit paths, it also remained salt-bridged to one of the carboxylates of the polypyrrole chain and facilitated HMB exit. These mechanistic findings thus account for the pathogenic basis of the common R167Q and R167W mutations, which severely impair hHMBS activity (Bung et al., 2018) and cause AIP (Lander et al., 1991) by failing either to elongate the pyrrole chain or to facilitate HMB exit. These mutations are among the most frequently identified in AIP patients (Stenson et al., 2009).

These studies also provided the molecular basis for reported AIP-causing missense mutations that impair chain elongation (Fig. 4.14). Of the 136 reported missense mutations causing AIP (Stenson et al., 2009), 27 (19.9%) were found at active site residues: 11 of these altered eight residues critical for the covalent binding of the DPM cofactor to apo-hHMBS (Bung et al., 2018); 11 others altered five residues involved in the addition of the third pyrrole (ring C, P3M stage, Fig. 4.14); and R167Q and R167W were involved in the addition of rings E and F. No reported missense hHMBS mutation was involved in the addition of ring D, with the exception of three reported D99 mutations (D99N, D99G and D99H) (Stenson et al., 2009), which impaired the addition of all six PBG molecules.

Figure 4.14: Classification of amino acid mutations that are responsible for altering A. cofactor binding, B. incoming PBG binding sites for pyrrole chain elongation, C. polypyrrole charge stabilization, and D. HMB release.

Interestingly, the D99 mutations did not totally prevent HMB assembly, as D99G and D99H retained about 3% expressed activity (Bung et al., 2018). The MD simulations showed that the hexapyrrole was able to adopt a compact helicoidal conformation within the tight active site owing to the interactions between D99 and the nitrogens of all six pyrrole rings. Notably, an analogous aspartate (D86) in the heme biosynthetic enzyme uroporphyrinogen decarboxylase was shown to coordinate the cyclic uroporphyrinogen III substrate and presumably stabilizes the intermediates formed during the enzyme-catalyzed decarboxylation of its substrate (Phillips et al., 2003).


4.5 Conclusion

In summary, these studies describe the atomistic details of the hHMBS catalytic mechanism and provide insights into the molecular basis of the active site mutations that cause AIP (Fig. 4.14). The active site loop dynamics and cofactor turn movement in hHMBS reorganized the active site region such that an arginine was positioned near the terminal ring to protonate the PBG substrate. The 29-residue insert in hHMBS drastically reduced the domain motions compared with those of EcHMBS.

These studies also delineated the roles of the active site residues and water molecules in binding and stabilizing the negative charges of the elongating polypyrrole at each stage. In addition to the active site residues, these water molecules may also act as proton donors. To date, 27 mutations affecting 14 of the 27 highly conserved active site residues have been reported to cause AIP (Stenson et al., 2009). Notably, R167 played a crucial role in enzyme catalysis and facilitated HMB exit. Finally, these studies provide a starting point for quantum chemical calculations: the intermediate snapshots at different pyrrole elongation stages should be extremely useful for defining reaction coordinates and investigating the enzyme's reaction mechanism at the electronic level.


Chapter 5

5 Catalytic Mechanism of human HMBS: a QM study

5.1 Background

HMBS has been extensively studied, both experimentally and theoretically, in a wide range of organisms from E. coli to human (Louie et al., 1996; Song et al., 2009; Roberts et al., 2013; Guo et al., 2017; Bung et al., 2014, 2018). Chapters 3 and 4 presented detailed analyses of the structural changes responsible for accommodating the polypyrrole in the E. coli and human homologs. HMBS catalyzes the stepwise polymerization of four units of porphobilinogen (PBG) to form 1-hydroxymethylbilane (HMB; see Fig. 1.2). The mechanism of PBG addition to the dipyrromethane (DPM) cofactor is largely unknown. Biochemical studies have helped identify residues that could play an important role in the structural stability and catalysis of HMBS (Song et al., 2009; Roberts et al., 2013; Louie et al., 1996), but their specific roles in the mechanism are not clear. D99 has been suggested to play a key role in stabilizing the positive charge on the pyrrole rings during chain elongation (Bung et al., 2014, 2018). The D99G and D99H mutations lead to a significant reduction in the enzyme's activity (Song et al., 2009; Bung et al., 2018). In EcHMBS, the D84E mutation (D99 in hHMBS) reduces the enzyme's activity, whereas the D84A and D84N mutations make the enzyme inactive (Woodcock and Jordan, 1994). Various hypotheses for the catalytic mechanism of HMBS have been proposed based on the crystal structures from E. coli, human and A. thaliana (Louie et al., 1996; Song et al., 2009; Roberts et al., 2013).
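The net stoichiometry of the overall reaction can be checked with a short element count. This sketch assumes the standard molecular formulas of PBG (C10H14N2O4) and HMB (C40H46N4O17): four PBG units and one water molecule yield one HMB and four NH3, one NH3 being released in each deamination and the water oxygen ending up in the terminal hydroxymethyl group. It is only a bookkeeping check with hypothetical helper names, not part of the computational workflow of this thesis.

```python
from collections import Counter

# Molecular formulas as element -> atom count
PBG = Counter({"C": 10, "H": 14, "N": 2, "O": 4})    # porphobilinogen
HMB = Counter({"C": 40, "H": 46, "N": 4, "O": 17})   # hydroxymethylbilane
WATER = Counter({"H": 2, "O": 1})
AMMONIA = Counter({"N": 1, "H": 3})


def times(formula: Counter, n: int) -> Counter:
    """Multiply every atom count in a formula by n."""
    return Counter({element: n * count for element, count in formula.items()})


def is_balanced(reactants: list, products: list) -> bool:
    """True if both sides of the reaction contain exactly the same atoms."""
    return sum(reactants, Counter()) == sum(products, Counter())


# 4 PBG + H2O -> HMB + 4 NH3
reactants = [times(PBG, 4), WATER]
products = [HMB, times(AMMONIA, 4)]
print(is_balanced(reactants, products))  # True
```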

Hypothesis 1:


Based on the residues surrounding the active site of EcHMBS, Louie et al. (Louie et al., 1996) proposed that residue D84 (D99 in human) may initially be protonated, as it is inaccessible to the external solvent and might receive a proton from the nearby F62 phenyl ring. The protonated oxygen of D84 acts as a proton donor to the substrate, enabling the removal of an ammonium ion. The resulting carboxylate anion can stabilize the positive charge on the pyrrole nitrogen after deamination of PBG. Following deamination, the other oxygen of the D84 carboxylate group can stabilize a positive charge on the pyrrole nitrogen of the terminal ring of the cofactor, thus promoting the generation of a nucleophilic carbon atom at the free α-position. After carbon-carbon bond formation, the carboxylate group of D84 can remove the proton from the α-position of the now penultimate ring to complete one cycle of ring addition (Louie et al., 1996).

Hypothesis 2:

Based on the X-ray structure of hHMBS (Song et al., 2009), it has been proposed that R26, Q34 or R195, which are positioned close to the C2 ring of the cofactor, rather than D99 (D84 in E. coli), might act as the proton donor for the incoming PBG (Fig. 5.1). In the hHMBS structure (PDB: 3ECR) (Song et al., 2009), the F77 phenyl ring (equivalent to F62 in E. coli) is not close enough to D99 to protonate its carboxylate sidechain as proposed by Louie et al. (Louie et al., 1996). Song et al. further proposed that the positive charge on the deaminated intermediate, methylene pyrrolenine (MePy), could be stabilized by the propionate sidechain on the C2 ring of the cofactor. The free α-position on the C2 ring of the cofactor is activated by D99 to form a covalent bond with the MePy moiety, resulting in a tripyrromethane. In the final step, the extra proton at the penultimate ring of the cofactor (C2 ring) is transferred to one of the residues R26, Q34 or R195 (Song et al., 2009).


Figure 5.1: Schematic representation of the catalytic mechanism proposed by Song et al. (Song et al., 2009).

Hypothesis 3:

In the mechanism proposed by Roberts et al. (Roberts et al., 2013), D95 in AtHMBS (D99 in hHMBS) may initially be protonated to stabilize the oxidized form of the cofactor, but with the cofactor in its reduced form and at the optimal pH, the carboxylate sidechain would be ionized. The deprotonated D95 stabilizes the positive charge on the MePy intermediate and also catalyzes the nucleophilic attack involving the α-position of the C2 ring (Fig. 5.2). In this mechanism, the catalytic role is played by the carboxylate group of D95, whereas in the mechanism proposed by Song et al. (Song et al., 2009), the carboxylate of the cofactor's propionate sidechain also helps to stabilize the MePy intermediate formed during catalysis.

Figure 5.2: Schematic representation of the catalytic mechanism proposed by Roberts et al. (Roberts et al., 2013).

Hypothesis 4:

In Chapter 4, residues crucial for the formation of HMB at each step of the polymerization were predicted by molecular dynamics (MD) simulations and verified by in vitro mutagenesis studies. These studies on hHMBS suggested that R26 and D99 play an important role in the addition of the first two pyrrole units to the cofactor, while R167 and D99 were proposed to play a role in the addition of the third and fourth pyrrole moieties.

Based on the above hypotheses, the addition of one unit of PBG to the cofactor takes place in four major steps (see Fig. 1.6). Given the complexity of the mechanisms proposed in the literature, it is crucial to understand the role played by each residue in catalysis. Classical MD simulations have helped to characterize the large-scale motions underlying the dynamic nature of proteins, but because standard (non-reactive) force fields do not allow covalent bonds to break or form, they cannot be used to study enzymatic reactions. QM cluster model calculations (Himo, 2017) and QM/MM calculations (Sousa et al., 2017) are widely used for this purpose. In recent studies, the QM cluster model approach has been successfully applied to many biological systems, including cytochrome c oxidase (Blomberg and Siegbahn, 2012), cytochrome c dependent nitric oxide reductase (Blomberg and Siegbahn, 2013) and photosystem II (Siegbahn, 2013). Himo and co-workers have used a similar approach to study the reaction mechanisms of peptidyl transferase (Kazemi et al., 2016), phenolic acid decarboxylase (Sheng et al., 2015) and limonene epoxide hydrolase (Lind and Himo, 2013). In this study, QM calculations on cluster models built from the active site of hHMBS were used to understand the addition of one unit of the substrate, PBG, to the DPM cofactor. The cluster models included the residues known to be important for the catalytic mechanism from previous MD and biochemical studies (Song et al., 2009). A detailed step-by-step reaction mechanism for the addition of one unit of PBG is proposed, and the catalytic roles of the residues previously suggested to be crucial are explained in this chapter.

5.2 Simulation details

5.2.1 Initial structure preparation

The structure of wild-type hHMBS with the best X-ray resolution (PDB ID: 3ECR, 2.18 Å) (Song et al., 2009) was used to generate the model structures. The missing coordinates of residues 60-78 (part of the active site loop and adjoining region), the N-terminal region (residues 1-20) and the C-terminal region (residues 358-361) were modeled using Modeller 9v8 (Fiser and Sali, 2003). To study the catalytic mechanism of HMBS, the substrate (PBG) was docked into the energy-minimized crystal structure of hHMBS containing the DPM cofactor using AutoDock (Trott and Olson, 2010). From the docking calculations, the minimum energy structure in which the pyrrole nitrogen of PBG interacted with D99 and which had no short contacts with the active site residues of the protein was selected for subsequent studies.

Figure 5.3: Schematic representation of A. Cluster model 1 and B. Cluster model 2, used to study the catalytic mechanism. Important distances (d1, d2, d3 and d4) that were used as reaction coordinates for potential energy scans are shown as red dashed lines. The atom identifiers for important atoms are shown in black.
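The pose-selection rule described above (lowest docking energy among poses in which the PBG pyrrole nitrogen interacts with D99 and that have no short contacts with the protein) can be expressed as a simple filter. This is a minimal sketch: the `Pose` record and `select_pose` function are hypothetical, and the 3.5 Å nitrogen-to-oxygen interaction cutoff and 2.5 Å short-contact cutoff are assumed values for illustration, since the thresholds actually used are not stated.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose:
    """Hypothetical summary of one docking pose."""
    name: str
    energy: float        # docking energy in kcal/mol (lower is better)
    pbg_ring_n: Vec3     # pyrrole nitrogen of PBG
    d99_oxygen: Vec3     # closest carboxylate oxygen of D99
    min_contact: float   # shortest PBG-protein heavy-atom distance in Å


def select_pose(poses: Sequence[Pose], max_n_o: float = 3.5,
                min_allowed_contact: float = 2.5) -> Optional[Pose]:
    """Lowest-energy pose whose PBG nitrogen interacts with D99 and that has
    no short contacts; returns None if no pose satisfies both criteria."""
    candidates = [
        p for p in poses
        if math.dist(p.pbg_ring_n, p.d99_oxygen) <= max_n_o
        and p.min_contact >= min_allowed_contact
    ]
    return min(candidates, key=lambda p: p.energy) if candidates else None
```

Keeping the geometric criteria as filters and the energy as the final ranking mirrors the order in which the selection rule is stated.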

5.2.2 MD simulations with PBG and cofactor

The structure obtained from the docking calculations was energy minimized and equilibrated for 1 ns with the heavy atoms (C, N, O and S) of the protein restrained. The resulting structure was then subjected to 10 ns of unrestrained MD simulation. The MD simulations were performed using the NAMD program (Phillips et al., 2005) with the CHARMM22 force field (Mackerell et al., 2004; MacKerell et al., 1998). The protein was solvated in a cubic box of TIP3P water (Jorgensen et al., 1983), and ions were added to neutralize the system. Force field parameters for the DPM cofactor and the PBG moiety were generated using CGenFF (Vanommeslaeghe et al., 2012).

5.2.3 QM calculations on the model system

To study the catalytic mechanism of HMBS with QM calculations, two model systems were created from the active site of the representative structure. Model 1 comprised residues R26 (Cα atom and sidechain) and D99 (Cα atom and sidechain), the DPM cofactor (without its acetate and propionate sidechains) and the docked PBG (without its acetate and propionate sidechains) (Fig. 5.3A). Model 2 included, in addition to the atoms of model 1, residues T25, S28 and N169 and the acetate and propionate sidechains of PBG (Fig. 5.3B), because the MD simulations showed hydrogen bonds between these residues and the PBG sidechains. Models 1 and 2 contained 78 and 126 atoms, respectively.

All preliminary calculations were performed on model 1. The potential energy surface was first explored at the AM1 level by scanning along the relevant internal coordinate for each step of catalysis (Fig. 5.3). To prevent unrealistic movements of the residues in the context of the overall structure of the enzyme, the Cα atoms of residues R26, D99 and C261 were kept fixed during the geometry optimizations (Fig. 5.3). The minima and transition states obtained from the potential energy scans were further optimized at the AM1 level of theory, and each transition state was confirmed by visual analysis of the normal mode corresponding to its imaginary frequency.

The M06 functional has been reported to perform well for systems with dispersion and ionic hydrogen-bonding interactions (Walker et al., 2013), and M06-class functionals have been used extensively to study the catalytic mechanisms of various enzymes (Brás et al., 2016; Bryantsev et al., 2009; Cerqueira et al., 2013). In the current study, energies were calculated with the M06 functional (Zhao and Truhlar, 2008) and the 6-311++G(d,p) basis set as implemented in Gaussian 09 (Frisch et al., 2009). Natural population analysis (NPA) charges were obtained from natural bond orbital (NBO) analysis at the M06/6-311++G(d,p) level. Solvent effects were included using the polarizable continuum model (PCM) (Miertus and Tomasi, 1982) with a dielectric constant of 78.4. The most probable mechanism from cluster model 1 was then recomputed with cluster model 2 using the same approach. GaussView (Dennington et al., 2009) and PyMOL (DeLano, 2002; Schrodinger, 2010) were used for visualization.
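Locating the transition-state region and the barrier on a one-dimensional relaxed scan, such as the scans along d1 to d4, amounts to finding the highest point on the profile. The sketch below assumes that the scan starts at the reactant minimum and crosses a single barrier, and that energies are given in hartree; the conversion factor of 627.5095 kcal/mol per hartree is standard, while the function name and the scan values in the usage lines are hypothetical. The highest scan point only approximates the transition state, which still has to be optimized and checked via its imaginary frequency as described above.

```python
from typing import List, Sequence, Tuple

HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree


def analyze_scan(coords: Sequence[float],
                 energies_hartree: Sequence[float]) -> Tuple[float, float, float]:
    """Return (approximate TS coordinate, barrier, reaction energy), energies in kcal/mol.

    Energies are taken relative to the first scan point, assumed to be the
    reactant minimum; the highest point is taken as the approximate TS.
    """
    if len(coords) != len(energies_hartree) or len(coords) == 0:
        raise ValueError("coords and energies must be non-empty and of equal length")
    e0 = energies_hartree[0]
    rel: List[float] = [(e - e0) * HARTREE_TO_KCAL for e in energies_hartree]
    i_max = max(range(len(rel)), key=rel.__getitem__)
    return coords[i_max], rel[i_max], rel[-1]


# Hypothetical scan of a bond length (Å) with made-up relative energies (kcal/mol)
coords = [1.5, 1.7, 1.9, 2.1, 2.3, 2.5]
kcal = [0.0, 3.0, 8.0, 11.0, 9.0, 5.0]
energies = [-500.0 + k / HARTREE_TO_KCAL for k in kcal]
ts_coord, barrier, delta_e = analyze_scan(coords, energies)
```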

5.3 Results and Discussion

To understand the catalytic mechanism, one molecule of PBG was docked into the active site of hHMBS. Among the docking poses obtained, the minimum energy pose in which PBG is proximal to the terminal ring (ring B) of the DPM cofactor and interacts with D99 was selected for MD simulations. Inspection of the trajectory showed that the interactions of polar and charged residues with the cofactor were maintained throughout the 10 ns simulation. A representative structure was then selected from the simulation based on the orientation of the PBG moiety with respect to D99.

The representative structure showed a root mean square deviation (RMSD) of 1.8 Å with respect to the 3ECR crystal structure (Song et al., 2009). Residue R26 stacks above the pyrrole ring of PBG and lies in the vicinity of its amino group (Fig. 5.4). The other two residues proposed as probable proton donors, Q34 and R195 (Song et al., 2009), are far from the amino group of PBG in the representative structure (Fig. 5.4). R26 was therefore selected as the probable proton donor for the first step of the catalytic mechanism, and a cluster model was created from the representative structure to study the catalytic mechanism.

Figure 5.4: Stereoview of the representative structure obtained from a 10 ns MD simulation. The substrate, PBG, is shown in blue sticks, while the active site residues are shown in magenta sticks.
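An RMSD such as the 1.8 Å quoted above is normally computed after optimal superposition of the two structures. A minimal NumPy sketch of Kabsch superposition followed by RMSD is given below. It assumes that the two coordinate sets are already matched atom by atom (which atom selection was used for the reported value is not stated), and the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared deviations over atoms
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Because the superposition removes rigid-body translation and rotation, the remaining RMSD reflects only internal structural differences between the two structures.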

Cluster model 1 contains residues R26 and D99, the DPM cofactor and the PBG molecule (Fig. 5.3A). Based on the X-ray structure of AtHMBS (Roberts et al., 2013), it has been suggested that a proton on the aspartate residue (D99 in human) could interact with the oxygen atom at the α-position of the second ring of DPM in its inactive oxidized form. In contrast, the 3ECR crystal structure contains the cofactor in its active reduced form, which does not require a proton on D99 (Song et al., 2009). The net charge of cluster model 1 is zero (the positively charged R26 guanidinium group balances the D99 carboxylate). Because little is known about the catalytic mechanism, initial calculations were performed on cluster model 1 (Fig. 5.3A). These preliminary QM calculations showed that the addition of the first PBG to the DPM cofactor proceeds in four steps: 1) protonation of the substrate, PBG; 2) deamination; 3) electrophilic addition; and 4) deprotonation (Fig. 5.5, 5.6). The deprotonation step proceeds via two sub-steps.

Based on the proximity of donor and acceptor atoms, two mechanisms were proposed for the deprotonation step (Fig. 5.6). In mechanism 1, a proton is transferred from the bridging carbon atom (between rings B and C) to R26, followed by transfer of the hydrogen from the α-position to the bridging carbon atom (Fig. 5.6A). In mechanism 2, the proton is transferred from the α-position carbon of ring B to R26 via the carboxylate oxygen of D99 (Fig. 5.6B). The energy profile shows that mechanism 2 is energetically more favorable than mechanism 1, and that deamination is the rate-limiting step of the catalytic mechanism (Fig. 5.7).


Figure 5.5: Optimized structures of reactant, transition states and intermediates for protonation, deamination and nucleophilic attack steps obtained from QM calculations on the cluster model 1. The cofactor, substrate, and residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. The transition states for the protonation and deamination steps are shown as a 2D-representation. All the measurements are in Angstroms (Å).


Figure 5.6: The two possible mechanisms for the deprotonation step of catalysis. A. Deprotonation through the bridging carbon atom between rings B and C. B. Deprotonation through the carboxylate sidechain of D99. The cofactor, substrate, residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. All the measurements are in Angstroms (Å).


Figure 5.7: Energy profile for the catalytic mechanism of HMBS using cluster model 1. The energy profile for the first three steps of catalysis is shown in green. Two mechanisms are proposed (denoted in blue and orange) for the final deprotonation step of catalysis.

5.3.1 QM calculations using cluster model 2

Although mechanism 2 is energetically more favorable than mechanism 1, the energy barrier for the deamination step remains high. Residues that are not part of model 1 may take part in the reaction and lower this barrier. A second cluster model, including the acetate and propionate sidechains of PBG along with residues T25, S28 and N169, was therefore created (Fig. 5.3B). The additional residues in cluster model 2 stabilize the negative charges on the PBG sidechains and might help stabilize the intermediates formed during catalysis. The QM calculations for each step of the energetically favorable mechanism 2 were repeated using cluster model 2.


5.3.1.1 Protonation of PBG via the arginine sidechain

In the first step of the catalytic mechanism, the amino group of PBG is activated by a proton transferred from the positively charged residue R26 (Fig. 5.8, 5.9). During the reaction, the distance (d1 in Fig. 5.3B) between the amino nitrogen of the substrate (13N) and the hydrogen (14H) of the R26 guanidinium group decreases from 2.4 Å to 1.0 Å. The transition state occurs when this hydrogen is 1.2 Å from the substrate nitrogen (13N) (Fig. 5.8, 5.9). The intermediate Int1, protonated PBG, is 9.5 kcal/mol higher in energy than the initial structure (reactant, Fig. 5.10). As a result of the proton transfer, the charges on all three nitrogen atoms of the guanidinium group (1N, 2N and 3N) decrease (Table 5.1): the charge on the donor nitrogen (2N) decreases from -0.79 to -0.87, while the charge on the amino nitrogen of PBG (13N) increases from -0.87 to -0.77. The transferred proton (14H), now on the PBG nitrogen, forms a hydrogen bond with R26. At the end of the first step, the positive charge gained by PBG is stabilized through a salt bridge with the carboxylate sidechain of D99 (Fig. 5.9).

5.3.1.2 Deamination of PBG

In the next step of catalysis, the distance between the carbon (12C) and the nitrogen (13N) of the PBG moiety (d2 in Fig. 5.3B) was scanned in steps of 0.1 Å. The transition state for deamination is observed at a distance of 2.2 Å, with an energy barrier of 14.7 kcal/mol (Fig. 5.10). After deamination, the hybridization of the carbon that was bonded to the ammonium group of PBG (12C) changes from sp3 to sp2 (Fig. 5.8), and the positive charge on this carbon is in conjugation with the lone pair of the pyrrole nitrogen (9N) (Fig. 5.8). The charge on the carbon atom (12C) increases from -0.25 to -0.17 (Table 5.1), and the charges on all ring atoms of the substrate increase after the deamination step. The reduced partial negative charge on the MePy nitrogen (9N) is stabilized by the carboxylate sidechain of D99 (5O), which is 2.05 Å away, while the other oxygen of D99 (4O) interacts with both rings A and B (Fig. 5.9). The deamination follows an E1 elimination mechanism, as proposed earlier (Pichon et al., 1992).

Figure 5.8: Schematic representation of reactant, transition states and intermediates observed during the addition of PBG molecule to the DPM cofactor. Residues T25, S28, N169, and acetate and propionate sidechains of PBG that are part of cluster model 2 are not shown in the schematic. The important distances for each structure are reported in Angstroms (red).

Figure 5.9: Optimized structures of reactant, transition states and intermediates for protonation, deamination and nucleophilic attack steps obtained from QM calculations on the cluster model 2. The cofactor, substrate, and residues T25, R26, S28, N169 and D99 are shown in green sticks while the cofactor, residues R26 and D99 are shown in magenta, pink and blue sticks, respectively. All the measurements are in Angstroms (Å).

Figure 5.10: Energy profile for the catalytic mechanism of HMBS using cluster model 2.
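To put a barrier of this size in perspective, transition state theory (the Eyring equation, k = (k_B T / h) exp(-ΔG‡ / RT)) converts an activation free energy into a first-order rate constant. The sketch below treats the 14.7 kcal/mol deamination barrier and the 11.4 kcal/mol electrophilic addition barrier reported for cluster model 2 as if they were ΔG‡ values at 298.15 K. This is an assumption: the cluster-model barriers are potential energies without thermal or entropic corrections, so the numbers are only an order-of-magnitude illustration and not results of this study. The function name is hypothetical.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34   # Planck constant, J s
R_KCAL = 1.987204e-3        # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """First-order rate constant (s^-1) from an activation free energy (kcal/mol)."""
    prefactor = K_B * temperature / H_PLANCK  # about 6.2e12 s^-1 at 298 K
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


# Assumption: potential-energy barriers used as stand-ins for free-energy barriers
k_deam = eyring_rate(14.7)   # deamination, on the order of 1e2 s^-1
k_add = eyring_rate(11.4)    # electrophilic addition
```

Because the rate depends exponentially on the barrier, a difference of about 3 kcal/mol changes the estimated rate by more than two orders of magnitude, which is why the highest barrier dominates the overall rate.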

Alternative mechanisms for the deamination step were also explored, with protonation and deamination modeled as a single concerted reaction. However, the 2D scan indicated a stepwise rather than concerted process, with protonation of the substrate preceding deamination (Fig. 5.11).


Figure 5.11: 2D-scan for the protonation and deamination steps of the HMBS catalytic mechanism. In the protonation step, the distance between the proton on arginine and nitrogen atom of PBG decreases from 2.4 to 0.9 Å (Y co-ordinate), while for the deamination the distance between the carbon and the nitrogen of PBG increases from 1.5 to 2.5 Å (X co-ordinate).

5.3.1.3 Electrophilic addition: Covalent bond formation between the cofactor and the substrate

The NH3 molecule formed in the deamination step was removed from the cluster model, as it plays no further role in the catalytic mechanism. In Int2′, the distance (d3 in Fig. 5.3B) between the α-position carbon of DPM ring B (7C) and the MePy carbon (12C) is 3.9 Å (Fig. 5.8, 5.9). The carboxylate sidechain of D99 forms a hydrogen bond with the pyrrole nitrogen of ring B (8N). Conjugation of the lone pair on this nitrogen (8N) creates a partial negative charge on the α-position carbon (7C), which attracts the positively charged MePy intermediate (the electrophile) formed during deamination. The energy barrier for the electrophilic addition is 11.4 kcal/mol (Fig. 5.10), and the transition state (TS3) is located at a distance of 2.1 Å between the two carbon atoms forming the covalent bond (Fig. 5.8, 5.9). The newly added pyrrole ring is referred to as ring C. The charge on the bridging carbon (12C) that links the two pyrrole moieties decreases from -0.18 to -0.45 (Table 5.1). The positive charge on the MePy moiety and the partial negative charge on the α-carbon (7C) of the terminal ring of the cofactor (ring B) together drive the reaction. In previous studies, formation of the covalent bond between the terminal ring of DPM and the MePy intermediate has been termed a nucleophilic attack (Louie et al., 1996; Song et al., 2009). The current study indicates, however, that the positive charge on the MePy moiety, more than the partial negative charge on the α-carbon (7C), is likely responsible for covalent bond formation; it would therefore be more appropriate to term the reaction an electrophilic addition.

5.3.1.4 Deprotonation at the α-position carbon of the second ring of DPM

In the final step, the proton at the α-position of ring B (15H) has to be transferred to the R26 residue to complete one catalytic cycle. The cluster model 1 calculations show that the proton is transferred to R26 via D99. In the first step, the proton (15H) of ring B is transferred to the carboxylate oxygen of D99 (5O), which is at a distance of 3.5 Å (Fig. 5.8, 5.12). The energy barrier for this deprotonation step is 2.5 kcal/mol (Fig. 5.10). The transition state was identified at a distance of 1.3 Å from the α-position carbon of ring B (7C) (Fig. 5.8, 5.12). The resulting intermediate (Int4) is 17.8 kcal/mol lower in energy than Int3 (Fig. 5.10). After the first deprotonation step, the charges on the two oxygen atoms of the D99 sidechain increase from -0.81/-0.85 to -0.73/-0.68 (Table 5.1), and the charge on the α-position carbon (7C) increases from -0.06 to 0.21. For the reaction to proceed, the sidechain of D99 must rotate by 92° so that the hydrogen can be transferred from D99 to R26 (Fig. 5.12). This rotation reduces the distance between the proton and the nitrogen of R26 to 2.6 Å (Fig. 5.8). The intermediate formed after the rotation (Int4′) is 6.2 kcal/mol more stable than Int4 (Fig. 5.10). The final proton transfer, from the carboxylate oxygen (5O) to the nitrogen of the guanidinium group (2N), is barrierless. It proceeds spontaneously, and the resulting product is 26.3 kcal/mol lower in energy than the reactant (Fig. 5.10). A potential energy scan using AM1 shows an energy barrier for the transfer of a proton from the carboxylate sidechain of D99 to the nitrogen of R26, both in vacuum and with implicit solvent. The DFT calculations, however, show no barrier for this transfer. At the end of the catalytic cycle, R26 is again protonated and can donate a proton to the next PBG moiety (Fig. 5.8, 5.12).
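The 92° rotation of the D99 carboxylate can be quantified as the change in a sidechain dihedral angle between two optimized structures. The text does not state which dihedral was measured. The sketch below assumes the χ2-type angle CA-CB-CG-OD1 and uses the standard atan2 formulation. The coordinates are toy values chosen to produce a 92° rotation, not atoms taken from the thesis structures.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees (atan2 formulation)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def rotation_between(before, after) -> float:
    """Magnitude (0-180 deg) of the dihedral change between two conformations."""
    delta = dihedral(*after) - dihedral(*before)
    return abs((delta + 180.0) % 360.0 - 180.0)  # wrap across the +/-180 boundary


# Toy CA, CB, CG and OD1 positions (Å); OD1 is rotated by 92 deg about CB-CG
ca, cb, cg = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5)
od1_before = (1.2, 0.0, 2.2)
theta = np.radians(92.0)
od1_after = (1.2 * np.cos(theta), 1.2 * np.sin(theta), 2.2)
print(round(rotation_between((ca, cb, cg, od1_before), (ca, cb, cg, od1_after)), 1))  # 92.0
```

The wrap step matters when the dihedral crosses ±180°. Without it, a small physical rotation from 174° to -174° would be reported as about 348°.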

Table 5.1: Natural charges on the atoms important for the catalytic mechanism, calculated using natural population analysis as implemented in Gaussian09 at the M06/6-311++G(d,p) level. The atom numbers used in this table are defined in Fig. 5.2B.

Atom  No.  Reactant  TS1       Int1      TS2       Int2      Int2′+NH3 TS3       Int3      TS4       Int4      Int4′     Product
N     1    -0.66     -0.68     -0.70     -0.71     -0.71     -0.71     -0.72     -0.72     -0.72     -0.72     -0.71     -0.66
N     2    -0.79     -0.85     -0.87     -0.85     -0.84     -0.83     -0.82     -0.83     -0.83     -0.83     -0.86     -0.81
N     3    -0.80     -0.84     -0.85     -0.85     -0.85     -0.85     -0.86     -0.86     -0.85     -0.85     -0.85     -0.80
O     4    -0.82     -0.85     -0.86     -0.83     -0.82     -0.83     -0.83     -0.81     -0.81     -0.73     -0.71     -0.84
O     5    -0.84     -0.83     -0.82     -0.83     -0.83     -0.82     -0.84     -0.85     -0.81     -0.68     -0.67     -0.81
N     6    -0.57     -0.57     -0.57     -0.57     -0.57     -0.57     -0.56     -0.56     -0.57     -0.56     -0.57     -0.56
C     7    -0.02     -0.03     -0.04     -0.02     -0.02     -0.03     -0.06     -0.06      0.80      0.21      0.19      0.15
N     8    -0.57     -0.57     -0.58     -0.57     -0.57     -0.57     -0.58     -0.54     -0.64     -0.56     -0.57     -0.57
N     9    -0.57     -0.91     -0.57     -0.54     -0.55     -0.54     -0.59     -0.57     -0.61     -0.58     -0.57     -0.56
O     10   -0.82     -0.83     -0.85     -0.81     -0.79     -0.78     -0.79     -0.84     -0.84     -0.85     -0.85     -0.83
O     11   -0.87     -0.84     -0.83     -0.82     -0.80     -0.80     -0.85     -0.81     -0.81     -0.82     -0.81     -0.83
C     12   -0.23     -0.49     -0.25     -0.13     -0.17     -0.18     -0.29     -0.45      0.49     -0.46     -0.47     -0.45
N     13   -0.87     -0.81     -0.77     -0.95     -1.06     NH3 removed from the cluster model
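The charge shifts discussed in Sections 5.3.1.3 and 5.3.1.4 can be read directly off Table 5.1. The sketch below copies a subset of the tabulated values and computes the differences between states. An ASCII apostrophe stands in for the prime in the state label, and `charge_change` is a hypothetical helper, not part of the original analysis.

```python
# Natural charges copied from Table 5.1 (subset): atom number -> {state: charge}
CHARGES = {
    4: {"Int3": -0.81, "Int4": -0.73},                       # O4, D99 carboxylate
    5: {"Int3": -0.85, "Int4": -0.68},                       # O5, D99 carboxylate
    7: {"Int2'+NH3": -0.03, "Int3": -0.06, "Int4": 0.21},    # C7, alpha carbon of ring B
    12: {"Int2'+NH3": -0.18, "Int3": -0.45, "Int4": -0.46},  # C12, bridging carbon
}


def charge_change(atom: int, start: str, end: str) -> float:
    """Change in natural charge of `atom` between two states (end minus start)."""
    return round(CHARGES[atom][end] - CHARGES[atom][start], 2)


for atom in (4, 5, 7):
    print(atom, charge_change(atom, "Int3", "Int4"))  # 0.08, 0.17, 0.27
print(12, charge_change(12, "Int2'+NH3", "Int3"))     # -0.27
```

The signs match the text. The D99 oxygens and C7 become less negative on the first deprotonation, and C12 becomes more negative when the new C–C bond forms.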


Figure 5.12: Optimized structures of the intermediates and transition states for the first and second deprotonation steps. The cofactor and residues T25, S28 and N169 are shown in green sticks, while PBG and residues R26 and D99 are shown in magenta, pink and blue sticks, respectively. All measurements are in Angstroms (Å).

5.3.2 Effect of dielectric constants on the cluster model

The influence of the protein environment on the catalytic mechanism was studied using a series of dielectric constants that mimic the protein environment. Earlier studies have shown that the hydrophobic core of a protein can be modeled using a dielectric constant of 4 (Gesto et al., 2013). Other studies have proposed that the dielectric constant inside a protein is about 6-7 and rises to 20-30 at the solvent-exposed surface (Li et al., 2013). Because the active site of HMBS is surrounded by a large number of charged and polar residues, a series of dielectric constants was considered, and the energies were computed at four values (ε = 8, 10, 20 and 30). The energies of the reactant, intermediates and product remain similar as the dielectric constant changes (Fig. 5.13). These minor changes suggest that solvation effects have already saturated at the cluster model size chosen to study the catalytic mechanism, and that no catalytically important residues are missing from the model (Himo, 2017).
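One simple way to put a number on "similar" energies across dielectric constants is the largest spread of any stationary point's relative energy. The sketch below shows how such a comparison could be summarized. It is an assumption, not the analysis behind Fig. 5.13, and the numbers are placeholders, not the plotted values.

```python
def max_spread(profiles: dict[float, list[float]]) -> float:
    """Largest (max - min) spread of any stationary point's energy across dielectrics.

    profiles maps a dielectric constant to relative energies (kcal/mol) of the
    same ordered list of stationary points (reactant, TS1, Int1, ...).
    """
    if len({len(v) for v in profiles.values()}) != 1:
        raise ValueError("every profile must list the same stationary points")
    columns = zip(*profiles.values())  # one tuple per stationary point
    return max(max(col) - min(col) for col in columns)


# Placeholder numbers for illustration only (NOT the values in Fig. 5.13)
example = {
    8.0: [0.0, 10.0, 5.0, -20.0],
    10.0: [0.0, 10.2, 5.1, -20.3],
    20.0: [0.0, 10.5, 5.3, -20.6],
    30.0: [0.0, 10.6, 5.4, -20.7],
}
print(round(max_spread(example), 2))  # 0.7
```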

Figure 5.13: Comparison of the energy profiles computed with different dielectric constants of the medium.

5.3.3 R26 and D99 are crucial for catalysis

The extensive QM calculations show that residues R26 and D99 play an important role in the catalytic process. R26 donates a proton to the substrate, PBG, in the first step of catalysis. R26 remains deprotonated during the subsequent steps. At the end of the catalytic cycle, it regains its proton and is ready to donate it for the addition of the next PBG unit. The effects of the R26C and R26H mutations, studied using MD simulations and in vitro mutagenesis, have been reported recently (Bung et al., 2018). The R26C mutation results in an essentially inactive enzyme, with 0.3% of the wild-type activity (Bung et al., 2018).

D99 plays multiple roles in the catalytic mechanism. It stabilizes the intermediates formed during catalysis. It may also mediate the transfer of a proton from the α-position carbon of ring B to R26 in the final deprotonation step. MD simulations showed that this aspartate residue plays a crucial role in maintaining the compact helicoidal conformation of the polypyrrole within the tight active site of HMBS by forming hydrogen bonds with the nitrogen atoms of all six pyrrole rings (see Chapter 4). Crystal structures of B. megaterium HMBS showed that the cofactor is assembled in the D82E mutant. In contrast, the D82N and D82A mutations affect cofactor assembly owing to significant domain movements observed in the X-ray structures (Guo et al., 2017). An aspartate residue has also been found to play a crucial role in other enzymes of the heme biosynthesis pathway, which use the product of HMBS to form heme. For example, in uroporphyrinogen decarboxylase (UROD), an aspartate residue was shown to coordinate the cyclic uroporphyrinogen III substrate and to play an important role in catalysis (Phillips et al., 2003). Similarly, in the oxygen-dependent coproporphyrinogen III oxidase (CPO), an invariant aspartate residue has been suggested to coordinate the pyrrole nitrogens, as in UROD, and might serve as the base for NH proton abstraction (Stephenson et al., 2007). These examples further emphasize the role of a small negatively charged residue (aspartate) in stabilizing and maintaining the conformation of polypyrroles in various enzymes of the heme biosynthetic pathway.


Figure 5.14: Schematic representation of the most probable catalytic mechanism for the addition of one unit of the substrate to the DPM cofactor.

5.4 Conclusions

The current study on HMBS helps to explain the catalytic mechanism and to characterize the role of the important residues involved. In the first step of catalysis, PBG is activated by the transfer of a proton to its primary amino group (Fig. 5.14). This activation involves the positively charged residue R26. In the second step, the protonated PBG expels ammonia to form a MePy intermediate (electrophile), which is stabilized by the carboxylate sidechain of D99 (Fig. 5.14). The deamination of the PBG moiety is the rate-limiting step, with an energy barrier of 24.2 kcal/mol. After the deamination step, an electrophilic addition reaction forms a covalent bond between the MePy intermediate and ring B of the DPM cofactor (Fig. 5.14). The D99 residue plays a crucial role in driving the overall reaction. In the fourth step, the proton at the α-position of the second pyrrole, ring B, is transferred to R26 via the carboxylate sidechain of D99, completing one cycle of substrate addition (Fig. 5.14).


The QM calculations also agree with mutational studies and with the earlier hypothesis, based on MD simulations, that R26 and D99 are involved in the addition of PBG molecules. The study also demonstrates the importance of including appropriate residues in the cluster model and their effect on the energetics. Both R26 and D99 are conserved across species, suggesting that a similar catalytic mechanism operates in other HMBS homologs. The study provides a stereochemical basis for the role of functionally critical residues in the enzyme and reveals a unique catalytic mechanism.


Chapter 6

6 Catalytic mechanism of human HMBS: a QM/MM study

6.1 Background

Enzymes act as catalysts and play an important role in accelerating chemical reactions. The factors that enhance reaction rates are not fully understood and differ from enzyme to enzyme. Some of the ways in which an enzyme accelerates a reaction are:
- stabilization of the transition state (Schramm, 2013);
- desolvation of the substrate (Devi-Kesavan and Gao, 2003; Rao et al., 2010);
- structural dynamics of the enzyme (Hammes et al., 2011; Kamerlin and Warshel, 2010);
- quantum mechanical tunneling (Nagel and Klinman, 2010).

Other factors could also help reduce the energy barrier of the chemical reaction (Bushnell et al., 2015).

In chapter 5, a detailed mechanism for the addition of one PBG unit to the cofactor was studied using the QM cluster model approach. In that approach, the effect of the protein environment was approximated using the polarizable continuum model (PCM) (Miertus and Tomasi, 1982) with different values of the dielectric constant. Obtaining accurate energies, however, requires accounting for the anisotropy of the protein environment, which can be done in QM/MM calculations (Quesne et al., 2016; Sousa et al., 2017). In a QM/MM calculation, the active site of the enzyme, where the reaction takes place, is treated using QM theory. The rest of the protein, along with the solvent, is treated with an MM force field. This approach overcomes the limitations of both QM calculations, which are too expensive for a whole solvated protein, and MM calculations, which cannot describe bond breaking and formation. Warshel and Levitt were the first to introduce the concept of QM/MM calculations (Warshel and Levitt, 1976). Over the past three decades, there have been major advances in the way QM/MM calculations are performed (Acevedo and Jorgensen, 2009; Bushnell et al., 2015; Duarte et al., 2015; Martins-Costa and Ruiz-López, 2015; Quesne et al., 2016; Sousa et al., 2017; Zhang et al., 2003, 2018). QM/MM calculations are now widely used to understand enzyme catalysis. Several factors affect the results:
- the size of the QM region (Jindal and Warshel, 2016; Karelina and Kulik, 2017; Kulik et al., 2016);
- the interaction between the QM and MM regions (Olsen et al., 2015; Roßbach and Ochsenfeld, 2017);
- the scheme used for energy calculations (Ratcliff et al., 2017).

Recent work has demonstrated the importance of the MM environment in investigating catalytic mechanisms (Bushnell et al., 2015). However, the study by Prejanò et al. has shown that the QM cluster model and QM/MM calculations give similar energies for LigW decarboxylase (Prejanò et al., 2018). If the cluster model is large enough to capture most of the electrostatic and long-range interactions, the cluster model and QM/MM approaches are expected to converge and give similar results.
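For reference, the generic additive QM/MM energy expression with electrostatic embedding can be written as below, in atomic units. This is a textbook sketch. It is not a transcription of the pDynamo implementation or of the equations in Section 2.3.2, and link-atom corrections are omitted.

```latex
\begin{aligned}
E_{\text{total}} &= \langle \Psi \,|\, \hat{H}_{\text{QM}} + \hat{H}^{\text{elec}}_{\text{QM/MM}} \,|\, \Psi \rangle
  + E_{\text{MM}} + E^{\text{vdW}}_{\text{QM/MM}} + E^{\text{bonded}}_{\text{QM/MM}}, \\
\hat{H}^{\text{elec}}_{\text{QM/MM}} &= -\sum_{i} \sum_{J \in \text{MM}} \frac{q_J}{\lvert \mathbf{r}_i - \mathbf{R}_J \rvert}
  + \sum_{A \in \text{QM}} \sum_{J \in \text{MM}} \frac{Z_A \, q_J}{\lvert \mathbf{R}_A - \mathbf{R}_J \rvert}
\end{aligned}
```

In these expressions:
- i runs over the electrons of the QM region, at positions r_i;
- A runs over the QM nuclei, with charges Z_A at positions R_A;
- J runs over the MM atoms, with fixed point charges q_J at positions R_J.

The MM charges enter the QM Hamiltonian, so the QM wavefunction is polarized by the surrounding protein. This is what distinguishes electrostatic embedding from mechanical embedding.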

In the current chapter, the catalytic mechanism of HMBS has been studied using QM/MM calculations that explicitly account for the steric and electrostatic effects of the protein. The size of the QM region is one of the important factors in obtaining accurate energy barriers. Different QM regions were therefore considered to study how the residues included in the QM region affect the energy profile. These calculations also serve to test the reaction mechanism obtained using the QM cluster model. The role of water molecules in the catalytic mechanism, if any, has also been explored.


6.2 Simulation details

6.2.1 Initial structure preparation

The structure of wild-type hHMBS (PDB ID: 3ECR) (Song et al., 2009) was used. The missing residues were modeled, and PBG was docked in the active site of HMBS as described in section 5.2. The representative structure used to generate the QM cluster model in chapter 5 served as the starting structure for the QM/MM calculations.

6.2.2 QM/MM calculations

In QM/MM calculations, the most crucial step is defining the QM and MM regions, because the residues included in the QM region affect the overall energy barrier of the catalytic mechanism. For the current study, two models differing in the residues included in the QM region were considered (Fig. 6.1):

1. Model 1: DPM cofactor, PBG, and the sidechains of residues T25, R26, S28, D99 and N169 (similar to QM cluster model 2) (Fig. 6.1A).

2. Model 2: DPM cofactor (without the acetate and propionate sidechains of ring A), PBG, and the sidechains of residues T25, R26, S28, F77, S96, K98, D99, N169 and R173 (Fig. 6.1B).
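The two QM regions can be compared as sets of residues. The sketch below lists the sidechains named in the text and estimates how many sidechain heavy atoms and hydrogen link atoms each model adds. It assumes that each sidechain is cut at the CA-CB bond with one hydrogen link atom per cut, but the text does not state where the QM/MM boundary was placed. DPM and PBG are excluded from the count, and `sidechain_summary` is a hypothetical helper.

```python
# Sidechains included in each QM region, as listed in the text (DPM and PBG are in both)
MODEL_1 = {"T25", "R26", "S28", "D99", "N169"}
MODEL_2 = {"T25", "R26", "S28", "F77", "S96", "K98", "D99", "N169", "R173"}

# Heavy atoms from CB outward for each residue type (standard amino-acid topology)
SIDECHAIN_HEAVY_ATOMS = {"T": 3, "R": 7, "S": 2, "D": 4, "N": 4, "F": 7, "K": 5}


def sidechain_summary(residues: set[str]) -> tuple[int, int]:
    """(sidechain heavy atoms, H link atoms), assuming one CA-CB cut per residue."""
    heavy = sum(SIDECHAIN_HEAVY_ATOMS[res[0]] for res in residues)
    return heavy, len(residues)


print(sorted(MODEL_2 - MODEL_1, key=lambda res: int(res[1:])))  # ['F77', 'S96', 'K98', 'R173']
print(sidechain_summary(MODEL_1), sidechain_summary(MODEL_2))   # (20, 5) (41, 9)
```

Under these assumptions, Model 2 roughly doubles the sidechain part of the QM region and almost doubles the number of QM/MM boundary cuts. Both affect the cost and the sensitivity of the calculation to the boundary treatment.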

Model 1 includes the residues interacting with PBG, whereas Model 2 includes the residues interacting with both the cofactor and PBG. The rest of the protein, along with the water molecules, was treated as the MM region. The QM/MM calculations were performed using pDynamo (Field, 2008). The initial potential energy scans (PES) used the AM1 semi-empirical method for the QM region and CHARMM22 parameters for the MM region. Electrostatic embedding (Fox et al., 2011) was used to include the effect of the MM region on the QM region. MM atoms beyond 20 Å of the QM region were fixed. The conjugate gradient method was used for structure optimization. Hydrogen link atoms were used to saturate the valence at the QM/MM interface (Field, 2008). pDynamo uses an additive scheme (see section 2.3.2) for computing energies. The structures obtained from the PES were further optimized using the AM1 semi-empirical method for the QM region. Single point energies were then computed at the M06 level of theory for the QM region and with the CHARMM22 force field for the MM region. ORCA 4.0 (Neese, 2012) was used to calculate the single point energies of the QM region.
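The criterion "MM atoms beyond 20 Å of the QM region were fixed" can be read as a minimum distance to any QM atom. The sketch below assumes that per-atom reading. The text does not say whether the cutoff was applied per atom or per residue, or how pDynamo was instructed to freeze the selected atoms, so no pDynamo calls are shown. The coordinates are toy values.

```python
import numpy as np


def fixed_mm_mask(qm_xyz, mm_xyz, cutoff: float = 20.0) -> np.ndarray:
    """Boolean mask: True for MM atoms farther than `cutoff` (Å) from every QM atom.

    Looping over QM atoms keeps memory at O(n_mm) instead of building the
    full n_mm x n_qm distance matrix.
    """
    qm_xyz = np.asarray(qm_xyz, dtype=float)
    mm_xyz = np.asarray(mm_xyz, dtype=float)
    min_dist = np.full(len(mm_xyz), np.inf)
    for q in qm_xyz:
        min_dist = np.minimum(min_dist, np.linalg.norm(mm_xyz - q, axis=1))
    return min_dist > cutoff


# Toy coordinates (Å), not taken from the HMBS model
qm = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
mm = [[5.0, 0.0, 0.0], [21.0, 0.0, 0.0], [22.0, 0.0, 0.0], [0.0, 25.0, 0.0]]
print(fixed_mm_mask(qm, mm).tolist())  # [False, False, True, True]
```

A per-residue variant would fix a residue only if all of its atoms lie beyond the cutoff. That variant avoids freezing part of a sidechain while the rest of it moves.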

6.3 Results

The HMBS structure used to build the cluster model for the QM calculations was also used for the QM/MM calculations. In this structure (model 2 in chapter 5), the pyrrole nitrogens of the DPM cofactor and the PBG moiety interact with the carboxylate sidechain of D99 (Fig. 6.1). The acetate sidechain of PBG forms hydrogen bonds with R26, S28 and N169, while the propionate sidechain interacts with R26 and T25 (Fig. 6.1B). The QM cluster model calculations showed that the catalytic mechanism proceeds in four steps: (1) protonation, (2) deamination, (3) electrophilic addition and (4) deprotonation. As in the QM cluster model approach, a PES was performed for each of these steps to locate the probable transition states and intermediates along the reaction pathway.

A comparative analysis of the two models showed a similar mechanism. The energy barrier for the protonation step is similar in models 1 and 2. For all the other steps of the catalytic mechanism, however, model 2 shows a lower energy profile than model 1 (Fig. 6.1C). The structural changes and the factors that affect the energy barrier of each step are discussed below, using the results of the QM/MM calculations on model 2.

Figure 6.1: Stereoview of the residues considered in the QM region for A. Model 1 and B. Model 2. The DPM cofactor, PBG, R26 and D99 residues are shown as green, magenta, pink and blue sticks, respectively. All the other residues in the QM region are shown as cyan sticks. C. Energy profiles corresponding to QM/MM Models 1 (orange) and 2 (blue) used for studying the catalytic mechanism of HMBS.

6.3.1 Protonation

In the first step of catalysis, PBG in the active site of HMBS is activated by a proton transferred from R26. The positively charged R26 residue can donate a proton to the amino group of PBG. The distance between the hydrogen atom of the R26 guanidinium group and the nitrogen atom of the PBG amino group is 2.9 Å (Fig. 6.2). The transition state (TS1) is located at a distance of 1.3 Å. The resulting intermediate (Int1) is 2.4 kcal/mol higher in energy than the reactant (Fig. 6.1C). The protonated amino group of PBG forms a salt bridge with the carboxylate group of D99, with a distance of 1.7 Å between the proton and the acceptor. The loss of conjugation in the guanidinium group is compensated by the formation of this salt bridge.

6.3.2 Deamination of PBG molecule

The deamination of PBG is one of the crucial steps in the catalytic mechanism, and in the QM cluster model calculations it was the rate-limiting step of the complete catalytic cycle. In Int1, the distance between the carbon and the nitrogen of the PBG amino group is 1.5 Å (Fig. 6.2). The transition state (TS2) for the deamination occurs at a distance of 1.9 Å, with an activation energy of 25.5 kcal/mol relative to Int1 (Fig. 6.1C). After deamination, the distance between the carbon and the nitrogen of the NH3 group is 3.4 Å (Fig. 6.2). The resulting intermediate is 25 kcal/mol higher in energy than the reactant (Fig. 6.1C). Two effects stabilize the positive charge on the terminal carbon of PBG: resonance with the pyrrole ring, and the negative charge on the pyrrole's acetate sidechain, which is 2.5 Å away.
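The deamination barriers from the cluster model (24.2 kcal/mol) and from QM/MM (25.5 kcal/mol) can be put on a rate scale with the Eyring equation, k = (k_B T/h) exp(-ΔG‡/RT). The sketch below makes two assumptions that the text does not: the reported potential-energy barriers are treated as activation free energies, and the transmission coefficient is taken as 1. The resulting numbers are therefore order-of-magnitude illustrations only.

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """Rate constant (1/s) from k = (kB*T/h) * exp(-barrier / (R*T)).

    Assumes the barrier approximates an activation free energy and a
    transmission coefficient of 1.
    """
    prefactor = KB * temperature / H
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature))


for label, barrier in [("cluster model", 24.2), ("QM/MM", 25.5)]:
    k = eyring_rate(barrier)
    print(f"{label}: barrier {barrier} kcal/mol -> k ~ {k:.1e} 1/s")

# The ratio of the two rates depends only on the barrier difference (about 9-fold here)
print(round(eyring_rate(24.2) / eyring_rate(25.5), 1))
```

At 298 K, RT ln 10 is about 1.36 kcal/mol. A change of roughly 1.4 kcal/mol in a barrier therefore changes the rate about tenfold, so the 1.3 kcal/mol difference between the two models amounts to less than one order of magnitude in rate.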


Figure 6.2: Optimized structures of the reactant, transition states and intermediates for the protonation, deamination and electrophilic attack steps obtained from QM/MM calculations. The protein is shown in blue cartoon representation. The DPM, PBG, and residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. All measurements are in Angstroms (Å).

6.3.3 Electrophilic addition

In the cluster model, the NH3 formed after deamination was removed, but in the QM/MM calculations the NH3 molecule is retained in the QM region for the subsequent calculations. In the third step of the catalytic mechanism, a covalent bond forms between the MePy moiety and the α-position carbon of the terminal ring of the DPM cofactor. The distance between the corresponding carbon atoms is 3.6 Å (Fig. 6.2), and the transition state is observed at a distance of 2.2 Å. The activation energy for the reaction is 2 kcal/mol (Fig. 6.1C). The positive charge on the MePy moiety and the partial negative charge at the α-position carbon of the terminal ring of the cofactor drive the overall reaction. The intermediate, Int3, is 12 kcal/mol higher in energy than the reactant (Fig. 6.1C). All the pyrrole nitrogens interact with the carboxylate sidechain of D99.

6.3.4 Deprotonation

In the final step, the extra proton at the α-position carbon of the penultimate ring (ring B) of the polypyrrole has to be transferred to R26 to complete one catalytic cycle. The distance between this proton and the guanidinium group of R26 is 6.9 Å. As seen in the previous chapter, the most plausible route for transferring the proton to R26 is through the carboxylate group of D99. This carboxylate group is 2.9 Å from the proton at the α-position carbon of ring B (Fig. 6.3). The energy barrier for this transfer is 23 kcal/mol (Fig. 6.1C), and the transition state occurs at a distance of 1.3 Å from the D99 carboxylate group. In the second step, the proton is transferred from the carboxylate group of D99 to the guanidinium group of R26. For this transfer to proceed, the sidechain of D99 must rotate so that the hydrogen can be passed from D99 to R26 (Fig. 6.3). Rotation of the carboxylate sidechain by 88° reduces the distance between the carboxylate sidechain and the guanidinium group of R26 from 6.1 to 2.9 Å (Fig. 6.3). The resulting intermediate (Int4′) is 5.3 kcal/mol more stable than Int4. The final proton transfer to the guanidinium group is barrierless, and the product is 1.1 kcal/mol more stable than the reactant (Fig. 6.1C). In the final structure of the product, the pyrrole nitrogens of rings A, B and C interact with D99. The sidechain of R26 is positioned to donate a proton to the next PBG molecule (Fig. 6.3).

Figure 6.3: Optimized structures of the intermediates, transition states and the product for the deprotonation steps obtained from QM/MM calculations. The protein is shown in blue cartoon representation. The DPM, PBG, and residues R26 and D99 are shown in green, magenta, pink and blue sticks, respectively. All measurements are in Angstroms (Å).

6.3.5 Alternative mechanisms

Although extensive calculations were performed on the active site of HMBS to understand its catalytic mechanism, the role of active-site water had not yet been considered. Water molecules have been reported in the high-resolution crystal structures of AtHMBS (Roberts et al., 2013), hHMBS (Song et al., 2009) and EcHMBS (Louie et al., 1992). The EcHMBS and AtHMBS crystal structures contain eight and six bound water molecules, respectively, around the charged sidechains of the DPM cofactor. Only one water molecule was reported in the hHMBS crystal structure. Chapter 4 also showed that a considerable number of water molecules are present within the active site. Any loss of charge stabilization from the residues around the active site could therefore be compensated by water-mediated interactions. It is worth asking whether water molecules could also aid catalysis, so their possible role as proton donors during the protonation step was studied.

6.3.5.1 Water as the proton donor

Figure 6.4: Energy profile corresponding to the potential energy scan for the transfer of a proton from the water molecule to the amino group of the PBG moiety. The distance between the hydrogen of the water molecule and the amino group of PBG decreases from 1.8 to 1.0 Å. The structures corresponding to the reactant and intermediate are overlaid on the plot. The DPM, PBG, R26 and D99 residues are shown in green, magenta, pink and blue sticks, respectively. The distance shown in red is in Angstroms (Å).

In the 10 ns MD simulation of HMBS with the PBG moiety, a water molecule was observed to form a hydrogen bond with the amino group of PBG. This water molecule was added to the existing QM region. The updated QM region consists of DPM, PBG, the sidechains of residues T25, R26, S28, F77, S96, K98, D99, N169 and R173, and the water molecule (Fig. 6.4). In the mechanism described above, R26 was taken as the proton donor for the activation of the PBG moiety. Here, a PES was computed instead for proton transfer from the water molecule. The initial distance between the hydrogen of the water molecule and the nitrogen of the PBG amino group is 1.9 Å, and this distance was reduced in steps of 0.1 Å. During the scan, the energy increases steeply and no transition state was found. The intermediate formed after the proton transfer is 56 kcal/mol higher in energy than the reactant. Such a high energy for the proton transfer indicates that the reaction is unlikely to be feasible, since the resulting hydroxide ion (OH−) is unstable.
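A relaxed scan only brackets a transition state when the energy passes through an interior maximum. When the highest point is at the end of the scan, as in the water-to-PBG scan, no transition state is located. The sketch below encodes that check. It assumes each scan is stored as (distance, energy) pairs in scan order. `locate_ts` is a hypothetical helper, and both profiles are invented shapes, not the computed energies.

```python
def locate_ts(profile):
    """Return the (distance, energy) of the highest interior point of a relaxed scan,
    or None when the maximum sits at either end of the scan (no TS bracketed).

    profile: list of (distance_in_angstrom, relative_energy_kcal_per_mol) in scan order.
    """
    if len(profile) < 3:
        return None  # too few points to have an interior maximum
    energies = [e for _, e in profile]
    i_max = max(range(len(energies)), key=energies.__getitem__)
    if i_max in (0, len(energies) - 1):
        return None
    return profile[i_max]


# Invented profile shapes for illustration only
rising = [(round(1.9 - 0.1 * i, 1), 6.0 * i) for i in range(10)]  # energy climbs to the end
barrier = [(2.0, 0.0), (1.8, 4.0), (1.6, 9.0), (1.4, 6.0), (1.2, -1.0)]
print(locate_ts(rising))   # None
print(locate_ts(barrier))  # (1.6, 9.0)
```

An interior maximum is only an estimate of the transition state. Confirming a true saddle point would require a transition-state optimization and a frequency check starting from that geometry.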

6.3.5.2 Water-mediated proton transfer from R167

Figure 6.5: Optimized structures of A. Reactant and B. Intermediate corresponding to the transfer of the proton from R167 to the amino group of PBG via a water molecule. The DPM, PBG, R26, D99 and R167 residues are shown in green, magenta, pink, blue and purple sticks, respectively. All measurements are in Angstroms (Å). C. Energy profile corresponding to the potential energy scan for the transfer of the proton from R167 to the amino group of the PBG moiety via a water molecule.

A careful analysis of the active site residues showed that the water molecule hydrogen-bonded to the PBG moiety also interacts with the guanidinium group of R167, at a distance of 2.0 Å (Fig. 6.5A). Another QM/MM calculation was therefore performed with the sidechain of R167 and the bridging water molecule added to the existing QM region. During the PES, the water molecule donates a proton to the amino group of PBG and abstracts a proton from the guanidinium group of R167 (Fig. 6.5B). The energy barrier for the protonation of PBG via the water molecule is reduced to 50 kcal/mol. The intermediate formed after the protonation of PBG is 18.5 kcal/mol higher in energy than the reactants (Fig. 6.5C). This barrier is still high compared with that for proton transfer from R26, which makes the latter the most favorable route in the catalytic mechanism.

Both mechanisms involving a water molecule showed higher energy barriers than the mechanism without water molecules.

6.3.6 Comparison of energies with the QM cluster model

A comparison of the energy profiles from the QM cluster model and QM/MM calculations showed that the barrier for the initial protonation is lower in the QM/MM calculations (Fig. 6.6). In the QM/MM calculations, the carboxylate sidechain of D99 interacts with the amino group of PBG and stabilizes the transition state and Int1. The energy barriers for the deamination and electrophilic addition steps are similar in the QM and QM/MM calculations.

The first deprotonation step behaves differently. In this step, a proton is transferred from the penultimate ring of the polypyrrole (ring B) to D99, and its activation energy is higher in the QM/MM calculations than in the QM cluster model (Fig. 6.6). The higher barrier is due to the acetate and propionate sidechains, which are present in the QM/MM calculations but absent in the QM cluster model. Without these sidechains, ring B can move easily to transfer the proton to D99. In the QM cluster model, the O–H…π interaction between the protonated carboxylate sidechain of D99 and the pyrrole ring stabilizes the TS4 and Int4 structures. In the QM/MM calculations, the acetate and propionate sidechains make such a stacking interaction unlikely.

Figure 6.6: Comparison of energy profile obtained from QM/MM calculations (blue) and QM cluster model calculations (magenta) at M06 level of theory.

6.4 Discussion

In the current chapter, the catalytic mechanism has been studied using QM/MM calculations. As in the QM cluster model approach, the catalytic mechanism proceeds in four steps. In the first step, the substrate, PBG, is activated by a proton transferred from R26. The protonated PBG moiety is then deaminated, resulting in a positively charged MePy moiety. In the third step, a covalent bond is formed, extending the polypyrrole by another unit. In the last step, the extra proton at the α-position carbon of the penultimate ring of the polypyrrole is transferred to R26 via the carboxylate sidechain of D99. The alternative mechanisms involving a water molecule showed higher energy barriers.

A comparison of the QM/MM and QM cluster model energies showed that the barrier for the first deprotonation step is significantly higher in the QM/MM calculations. This is because the QM/MM model lacks the stacking interaction between D99 and the pyrrole moiety. In the current thesis, two possible mechanisms for the deprotonation have been explored, based on the neighboring residues in the active site of HMBS. There could, however, be other possibilities for the final deprotonation step. One possibility is the transfer of the proton to a nearby water molecule, which could act as a proton shuttle to R26. Proton shuttles via water molecules have been proposed in the catalytic mechanisms of various enzymes (Elder et al., 2007; Gilmour, 2010; Tripathi et al., 2017). No persistent water chain connecting the α-position of ring B and R26 could be identified in the MD simulations of the current study. Given the high hydration number in the active site of HMBS, however, such a possibility cannot be ruled out. The stereochemistry of the covalent bond formation between the cofactor and the substrate also plays an important role, because it determines the position of the hydrogen at the free α-position of ring B. In all the current calculations, the electrophile (MePy moiety) is positioned above the plane of the cofactor. When the covalent bond forms, the hydrogen therefore lies below the plane of rings B and C, while R26 lies above the plane of the polypyrrole. In this arrangement, the proton cannot be transferred directly to R26 and requires an intermediate carrier such as D99. If instead the covalent bond formed with the proton above the plane of rings B and C, the deprotonation barrier could be lower than for the route via D99. However, based on the residues surrounding the polypyrrole within the active site of the enzyme, the current deprotonation mechanism seems the most likely.

The size and selection of residues for the QM region play an important role in determining the energetics of the catalytic mechanism (Fig. 6.1). In the cluster model approach, the cluster needs to be large enough to capture the effect of the protein environment. In QM/MM calculations, a smaller QM region can in principle suffice, because the MM region provides the necessary steric and electrostatic constraints. However, the fixed charges of the MM atoms can over-polarize the QM region, and reducing this effect requires a large QM region even in QM/MM calculations.

6.5 Summary

In summary, the energies obtained for the catalytic mechanism of HMBS using QM/MM calculations are comparable to the DFT energies obtained from the QM cluster model calculations, except for the final deprotonation step. This shows that the QM cluster model was sufficient to model the first three steps of the catalytic mechanism. The high energy barrier for the deprotonation step arises from the steric and electrostatic constraints imposed by the acetate and propionate sidechains on the movement of ring B of the polypyrrole during the transfer of the proton to the D99 residue. Based on the extensive calculations performed to study the catalytic mechanism of HMBS, it can be concluded that the size of the QM region, in both the QM cluster model and the QM/MM calculations, plays an important role in determining the energy barriers of the catalytic mechanism. Expanding the QM region to include more residues from the protein and more water molecules may improve the accuracy of the model; however, this was not feasible within the scope of the current study.
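To put a barrier difference between two models in perspective, the Eyring equation of transition-state theory, k = (k_B T/h) exp(−ΔG‡/RT), converts a free-energy barrier into a rate constant. The sketch below treats computed barriers as approximate free-energy barriers (itself an assumption, since the calculations report energies), takes the transmission coefficient as 1 and uses T = 298.15 K; the barrier values in the example are arbitrary and are not results from this chapter.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)
H = 6.62607015e-34   # Planck constant, J s (exact SI value)
R = 1.987204e-3      # gas constant, kcal/(mol K)


def eyring_rate(barrier_kcal: float, temperature: float = 298.15) -> float:
    """TST rate constant (s^-1) for a free-energy barrier in kcal/mol,
    with the transmission coefficient taken as 1."""
    return (K_B * temperature / H) * math.exp(-barrier_kcal / (R * temperature))


def rate_ratio(barrier_a: float, barrier_b: float, temperature: float = 298.15) -> float:
    """k_a / k_b; only the barrier difference matters, the prefactor cancels."""
    return math.exp(-(barrier_a - barrier_b) / (R * temperature))


# Arbitrary example: a barrier 5 kcal/mol higher slows the step by ~4.6e3-fold
print(f"{1 / rate_ratio(20.0, 15.0):.2g}")  # 4.6e+03
```

Because the ratio depends only on the difference in barriers, a few kcal/mol of model dependence in one step can change its predicted rate by orders of magnitude.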

Chapter 7

7 Conclusions

Hydroxymethylbilane synthase (HMBS) catalyzes the stepwise polymerization of four units of porphobilinogen (PBG), using a dipyrromethane cofactor, to form a linear tetrapyrrole, 1-hydroxymethylbilane (HMB). Multi-scale modeling of HMBS provided significant insights into the structural changes the enzyme undergoes as it catalyzes the formation of the product, HMB. Molecular dynamics and enhanced sampling methods were used to study the process of chain elongation and the exit of the product from the active site of the E. coli and human HMBS enzymes. Furthermore, QM cluster model and QM/MM calculations were used to understand its catalytic mechanism.

7.1 Structural dynamics

The study of the chain elongation process revealed the importance of the active site loop, the cofactor turn movement and the domain movements in accommodating the polypyrrole chain. Domains 1 and 2 move apart, and the cofactor turn moves toward domain 2, to accommodate the growing pyrrole chain in the active site cleft of both homologs. The domain movements in hHMBS are smaller than those in its E. coli counterpart, owing to the presence of the 29-residue insert at the interface of domains 1 and 3.
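Domain movements of this kind are often summarized as the distance between the centres of mass of two domains, tracked frame by frame. The sketch below is a minimal illustration that assumes per-frame coordinates and masses for the atoms of each domain are already available; the domain residue ranges, the function names and the toy coordinates are hypothetical and are not taken from the simulations in this thesis.

```python
import math


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(sum(m * xyz[k] for xyz, m in zip(coords, masses)) / total
                 for k in range(3))


def domain_separation(coords1, masses1, coords2, masses2):
    """Distance (Å) between the centres of mass of two domains."""
    return math.dist(center_of_mass(coords1, masses1),
                     center_of_mass(coords2, masses2))


# Hypothetical two-frame toy trajectory: domain 2 shifts by 2 Å in frame 2
masses = [12.0, 12.0]
dom1 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frames = [[(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)], [(12.0, 0.0, 0.0), (14.0, 0.0, 0.0)]]
series = [domain_separation(dom1, masses, dom2, masses) for dom2 in frames]
print(series)  # [10.0, 12.0]
print([d - series[0] for d in series])  # change relative to the first frame
```

Centre-of-mass distance captures opening and closing but not hinge rotation; angle-based or principal-component measures are common complements.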

7.2 Role of active site residues and water molecules

Conserved polar residues play an important role in stabilizing the negative charge on the polypyrrole during the stages of chain elongation. Based on the analysis of the intermediate structures of chain elongation, the residues that are important for the catalytic mechanism at each stage have been identified. The active site loop dynamics and the cofactor turn movement reorganize the active site region such that an arginine is present near the terminal ring to protonate the PBG substrate. R26 is the most probable proton donor for the formation of the P3M and P4M moieties, while R167 could donate a proton for P5M and P6M formation. D99 lies at the center of the polypyrrole, interacts with all the pyrrole nitrogens and helps maintain the helicoidal conformation of the polypyrrole.

These studies also emphasize the role of water molecules in binding and stabilizing the negative charges of the elongating polypyrrole at each stage. In addition to the active site residues, these water molecules may also act as proton donors.

7.3 Exit mechanism

The exit of the product, HMB, from HMBS was studied using steered molecular dynamics (SMD) and random acceleration molecular dynamics (RAMD). Based on the expulsion force profiles, HMB exits through the space formed between domain 1, domain 2 and the active site loop. A similar exit path was observed in multiple RAMD simulations.

R167 played a crucial role in facilitating the exit of HMB.
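The core of RAMD (Lüdemann et al., 2000) is a simple rule: a constant-magnitude force (equivalently, an acceleration) acts on the ligand centre of mass in a random direction, and the direction is redrawn whenever the ligand has moved less than a threshold distance r_min over the last N steps. The sketch below shows only this direction-update rule; the force magnitude, r_min and the centre-of-mass positions are hypothetical, and the positions are hard-coded rather than produced by dynamics, so it does not reproduce the protocol or parameters used in this thesis.

```python
import math
import random


def random_unit_vector(rng: random.Random) -> tuple[float, float, float]:
    """Direction drawn uniformly on the unit sphere (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return (v[0] / norm, v[1] / norm, v[2] / norm)


def ramd_check(direction, com_before, com_after, r_min, rng):
    """Direction update applied every N MD steps.

    Keep the current force direction if the ligand centre of mass moved at
    least r_min (Å) since the last check; otherwise pick a new random one.
    Returns (direction, redrawn)."""
    if math.dist(com_before, com_after) >= r_min:
        return direction, False
    return random_unit_vector(rng), True


rng = random.Random(0)
direction = random_unit_vector(rng)
force_magnitude = 5.0  # hypothetical constant magnitude; units depend on the MD engine
# Hypothetical ligand centre-of-mass positions (Å) recorded at successive checks
coms = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.6, 0.0, 0.0), (2.0, 0.0, 0.0)]
redraws = 0
for before, after in zip(coms, coms[1:]):
    direction, redrawn = ramd_check(direction, before, after, r_min=0.25, rng=rng)
    redraws += redrawn
force = tuple(force_magnitude * d for d in direction)  # force applied to the ligand
print(redraws)  # 1: only the 0.1 Å move falls below r_min
```

Because directions that push the ligand into the protein wall are quickly abandoned, repeated RAMD runs tend to sample the exit channels of least resistance, which is why agreement between multiple runs is informative.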

7.4 AIP-related mutations

To date, 27 mutations of 14 highly conserved active site residues have been reported to cause AIP (Stenson et al., 2009). The current study provides insights into the molecular basis of these active site mutations. For mutations of non-active site residues, a detailed explanation has been provided based on the dynamics of the intermediate stages of chain elongation and the residues surrounding each amino acid.

7.5 Catalytic mechanism

The QM cluster model and QM/MM calculations show that the catalytic mechanism proceeds in four steps. In the first step, the substrate PBG is activated by the transfer of a proton to its primary amino group, a step that involves the positively charged residue R26. In the second step, the positively charged PBG expels ammonia to form a MePy intermediate (electrophile), which is stabilized by the carboxylate sidechain of the D99 residue. In the third step, an electrophilic addition reaction takes place, leading to covalent bond formation between the MePy intermediate and ring B of the DPM cofactor. The D99 residue plays a crucial role in driving the overall reaction. In the fourth step, the proton at the α-position of the second pyrrole, ring B, is transferred to the R26 residue via the carboxylate sidechain of the D99 residue, thus completing one cycle of substrate addition. The energies obtained for the catalytic mechanism of HMBS using QM/MM calculations are comparable to the DFT energies obtained from the QM cluster model calculations, except for the final deprotonation step, whose energy barrier is higher in the QM/MM calculations than in the QM cluster model. This difference can be traced back to the steric and electrostatic constraints imposed by the acetate and propionate sidechains on the movement of the second ring of the polypyrrole during the transfer of the proton to the D99 residue. The acetate and propionate sidechains of the second ring were not included in the QM cluster model, which explains the lower energy barrier obtained with that model.

Nevertheless, the deamination of the PBG moiety has the highest activation energy in both the QM cluster model and the QM/MM calculations. Based on the extensive calculations performed to study the catalytic mechanism of HMBS, it can be concluded that the size of the QM region, in both the QM cluster model and the QM/MM calculations, plays an important role in determining the energy barriers of the catalytic mechanism. Residues R26 and D99, which play crucial roles in the catalytic mechanism, are conserved across species, suggesting that a similar catalytic mechanism may operate in other homologs of HMBS.
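Identifying the step with the highest activation energy from a computed reaction profile amounts to taking, for each transition state, its energy relative to the preceding minimum. The sketch below assumes the profile is a list of (label, energy) pairs alternating minimum, transition state, minimum; the labels and energies are arbitrary placeholders, not the energies computed in this thesis.

```python
def step_barriers(profile):
    """Barrier of each elementary step from an energy profile.

    profile: [(label, energy), ...] alternating minimum, TS, minimum, ..., minimum.
    Returns [(ts_label, E(TS) - E(preceding minimum)), ...].
    """
    if len(profile) < 3 or len(profile) % 2 == 0:
        raise ValueError("profile must alternate minima and TSs, "
                         "starting and ending with a minimum")
    return [(profile[i][0], profile[i][1] - profile[i - 1][1])
            for i in range(1, len(profile), 2)]


def highest_step_barrier(profile):
    """Step with the largest local barrier."""
    return max(step_barriers(profile), key=lambda item: item[1])


# Arbitrary illustrative energies (kcal/mol), NOT values computed in this thesis
profile = [("R", 0.0), ("TS1", 5.0), ("I1", -2.0), ("TS2", 18.0), ("I2", 6.0),
           ("TS3", 10.0), ("I3", -4.0), ("TS4", 9.0), ("P", -6.0)]
print(step_barriers(profile))
print(highest_step_barrier(profile))  # ('TS2', 20.0)
```

This per-step measure ignores the case where a deep intermediate precedes a later transition state; the overall barrier is then better measured from that intermediate, which is the idea behind the energetic span model.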

7.6 Future work

The stage-specific MD simulations employed in the current thesis provide detailed insights into the mechanism of the HMBS protein. A similar approach could be applied to proteins that form multi-substrate complexes, where the active site cavity is not large enough to accommodate all the substrate molecules at once.

In the current thesis, MD simulations have been used to study the structural dynamics, and QM and QM/MM calculations to understand the catalytic mechanism. QM/MM MD simulations could provide insights into both the catalytic mechanism and the structural changes in the protein during catalysis. However, the timescales accessible to QM/MM MD simulations are limited by the available computational power and are much shorter than those on which the domain motions are observed. The exit path of the product, HMB, has been studied using SMD and RAMD simulations. Computing the free energy along each of the proposed exit paths and comparing it with the experimentally determined off-rate constant of the HMBS enzyme would provide a more detailed understanding of how such a large product exits the enzyme.
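One possible route from repeated SMD pulls to a free energy along an exit path is Jarzynski's equality, ΔF = −k_BT ln⟨exp(−W/k_BT)⟩, where the average runs over the non-equilibrium work W of independent pulls. The sketch below is an illustration of that estimator only, not a method used in this thesis; it assumes work values in kcal/mol, and the work values in the example are hypothetical. The exponential average is evaluated with a log-sum-exp shift so that large work values do not underflow.

```python
import math

KB_KCAL = 1.987204e-3  # Boltzmann constant per mole, kcal/(mol K)


def jarzynski_free_energy(works, temperature=300.0):
    """ΔF (kcal/mol) from non-equilibrium work values W (kcal/mol) of repeated pulls:
    ΔF = -kT ln < exp(-W / kT) >, evaluated with a log-sum-exp shift so that
    large work values do not underflow exp()."""
    if not works:
        raise ValueError("need at least one work value")
    kt = KB_KCAL * temperature
    exponents = [-w / kt for w in works]
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -kt * (shift + math.log(mean_exp))


# Hypothetical work values (kcal/mol) from four pulls along the same exit path
works = [42.0, 38.5, 45.2, 40.1]
print(jarzynski_free_energy(works))  # lies between min(works) and the mean work
```

The estimate is dominated by the rare low-work pulls, so it converges slowly when the work distribution is broad; slower pulling or many more repeats are usually needed for a large product such as HMB.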

Biochemical studies have identified about 400 mutations in 99 residues of hPBGD that cause AIP. While the role of active site mutations can be explained from MD simulations, it is difficult to explain why some of the non-active site mutations cause AIP. A network theory approach could be used to explain the effect of non-active site residues on the active site of the HMBS enzyme. In addition, the knowledge gained from the MD, QM and QM/MM calculations on the HMBS enzyme could be used to design PBG analogs for the treatment of AIP patients.
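In a network theory approach, residues become nodes, spatial contacts become edges, and the communication between a distant mutation site and the active site can be examined through paths in the graph. The sketch below is a minimal unweighted version: it assumes one representative coordinate per residue (e.g., Cα) and a hypothetical 6.5 Å contact cutoff, and the residues "A" to "E" are hypothetical, with "A" standing for a mutation site and "D" for an active site residue.

```python
import math
from collections import deque


def contact_network(residues, cutoff=6.5):
    """residues: dict name -> (x, y, z). Edge if the distance is <= cutoff (Å)."""
    names = list(residues)
    graph = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(residues[a], residues[b]) <= cutoff:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the residue path, or None if disconnected."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in sorted(graph[node]):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical residues placed along a line (Å)
residues = {"A": (0.0, 0.0, 0.0), "B": (5.0, 0.0, 0.0), "C": (10.0, 0.0, 0.0),
            "D": (15.0, 0.0, 0.0), "E": (30.0, 0.0, 0.0)}
graph = contact_network(residues, cutoff=6.5)
print(shortest_path(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
print(shortest_path(graph, "A", "E"))  # None (E has no contacts)
```

Published protein structure network analyses often weight edges by contact frequency over an MD trajectory or by correlated motion, which would turn the breadth-first search into a weighted shortest-path search.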

Chapter 8

References

Acevedo,O. and Jorgensen,W.L. (2009) Advances in quantum and molecular mechanical (QM/MM) simulations for organic and enzymatic reactions. Acc. Chem. Res., 43, 142–151.
Adcock,S.A. and McCammon,J.A. (2006) Molecular dynamics: survey of methods for simulating the activity of proteins. Chem. Rev., 106, 1589–1615.
Ajioka,R.S. et al. (2006) Biosynthesis of heme in mammals. Biochim. Biophys. Acta, 1763, 723–36.
Albers,J.W. and Fink,J.K. (2004) Porphyric neuropathy. Muscle Nerve, 30, 410–422.
Anderson,K.E. (2001) Disorders of heme biosynthesis: X-linked sideroblastic anemia and the porphyrias. Metab. Mol. Bases Inherit. Dis., 2991–3062.
Anderson,P.M. and Desnick,R.J. (1980) Purification and properties of uroporphyrinogen I synthase from human erythrocytes. Identification of stable enzyme-substrate intermediates. J. Biol. Chem., 255, 1993–1999.
Andrew,R.L. (2001) Molecular modeling principles and applications. 2nd Ed. Pearson Educ. Ltd.
Atkins,P.W. and Friedman,R.S. (2011) Molecular quantum mechanics. Oxford University Press.
Azim,N. et al. (2014) Structural evidence for the partially oxidized dipyrromethene and dipyrromethanone forms of the cofactor of porphobilinogen deaminase: structures of the Bacillus megaterium enzyme at near-atomic resolution. Acta Crystallogr. D Biol. Crystallogr., 70, 744–51.
Bakan,A. et al. (2011) ProDy: Protein Dynamics Inferred from Theory and Experiments. Bioinformatics, 27, 1575–1577.
Battersby,A.R. et al. (1976) Biosynthesis of porphyrins and related macrocycles. Part VI. Nature of the rearrangement process leading to the natural type III porphyrins. J. Chem. Soc. Perkin 1, 273–282.
Battersby,A.R. (2000) Tetrapyrroles: the pigments of life. Nat. Prod. Rep., 17, 507–526.
Beale,S.I. et al. (1975) Biosynthesis of delta-aminolevulinic acid from the intact carbon skeleton of glutamic acid in greening barley. Proc. Natl. Acad. Sci., 72, 2719–2723.
Berendsen,H.J.C. et al. (1987) The missing term in effective pair potentials. J. Phys. Chem., 91, 6269–6271.
Blomberg,M.R.A. and Siegbahn,P.E.M. (2012) The mechanism for proton pumping in cytochrome c oxidase from an electrostatic and quantum chemical perspective. Biochim. Biophys. Acta, 1817, 495–505.
Blomberg,M.R.A. and Siegbahn,P.E.M. (2013) Why is the reduction of NO in cytochrome c dependent nitric oxide reductase (cNOR) not electrogenic? Biochim. Biophys. Acta, 1827, 826–833.
Bonday,Z.Q. et al. (1997) Heme biosynthesis by the malarial parasite. Import of delta-aminolevulinate dehydrase from the host red cell. J. Biol. Chem., 272, 21839–21846.
Brás,N.F. et al. (2016) The Catalytic Mechanism of the Marine-Derived Macrocyclase PatGmac. Chem. Eur. J., 22, 13089–13097.

Bryantsev,V.S. et al. (2009) Evaluation of B3LYP, X3LYP, and M06-class density functionals for predicting the binding energies of neutral, protonated, and deprotonated water clusters. J. Chem. Theory Comput., 5, 1016–1026.
Buchenau,B. et al. (2006) Heme biosynthesis in Methanosarcina barkeri via a pathway involving two methylation reactions. J. Bacteriol., 188, 8666–8668.
Bung,N. et al. (2018) Human hydroxymethylbilane synthase: Molecular dynamics of the pyrrole chain elongation identifies step-specific residues that cause AIP. Proc. Natl. Acad. Sci., 201719267.
Bung,N. et al. (2014) Structural insights into E. coli porphobilinogen deaminase during synthesis and exit of 1-hydroxymethylbilane. PLoS Comput. Biol., 10, e1003484.
Bushnell,E.A.C. et al. (2015) The Importance of the MM Environment and the Selection of the QM Method in QM/MM Calculations: Applications to Enzymatic Reactions. Adv. Protein Chem. Struct. Biol., 100, 153–185.
Bussi,G. et al. (2007) Canonical sampling through velocity rescaling. J. Chem. Phys., 126, 014101.
Cerqueira,N.M. et al. (2013) The sulfur shift: an activation mechanism for periplasmic nitrate reductase and formate dehydrogenase. Inorg. Chem., 52, 10766–10772.
Chen,B. et al. (2018) Identification and characterization of 40 novel hydroxymethylbilane synthase mutations that cause acute intermittent porphyria. J. Inherit. Metab. Dis., 1–9.
Chen,C.H. et al. (1994) Acute intermittent porphyria: identification and expression of exonic mutations in the hydroxymethylbilane synthase gene. An initiation codon missense mutation in the housekeeping transcript causes "variant acute intermittent porphyria" with normal expression of the erythroid-specific enzyme. J. Clin. Invest., 94, 1927–1937.
Choutko,A. and van Gunsteren,W.F. (2012) Molecular dynamics simulation of the last step of a catalytic cycle: product release from the active site of the enzyme chorismate mutase from Mycobacterium tuberculosis. Protein Sci., 21, 1672–1681.
Chovancova,E. et al. (2012) CAVER 3.0: a tool for the analysis of transport pathways in dynamic protein structures. PLoS Comput. Biol., 8, e1002708.
Dailey,H.A. et al. (2000) Ferrochelatase at the millennium: structures, mechanisms and [2Fe-2S] clusters. Cell. Mol. Life Sci. CMLS, 57, 1909–1926.
Darden,T. et al. (1993) Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems. J. Chem. Phys., 98, 10089–10092.
Dasetty,S. et al. (2019) Simulations of interfacial processes: recent advances in force field development. Curr. Opin. Chem. Eng., 23, 138–145.
De Siervi,A. et al. (1999) Identification and characterization of hydroxymethylbilane synthase mutations causing acute intermittent porphyria: evidence for an ancestral founder of the common G111R mutation. Am. J. Med. Genet., 86, 366–75.
De Vivo,M. et al. (2016) Role of molecular dynamics and related methods in drug discovery. J. Med. Chem., 59, 4035–4061.
DeLano,W.L. (2002) Pymol: An open-source molecular graphics tool. CCP4 Newsl. Protein Crystallogr., 40, 82–92.
Dennington,R. et al. (2009) GaussView, version 5. Semichem Inc Shawnee Mission KS.
Devi-Kesavan,L.S. and Gao,J. (2003) Combined QM/MM study of the mechanism and kinetic isotope effect of the nucleophilic substitution reaction in haloalkane dehalogenase. J. Am. Chem. Soc., 125, 1532–1540.
Dewar,M.J. and Storch,D.M. (1985) Development and use of quantum molecular models. 75. Comparative tests of theoretical procedures for studying chemical reactions. J. Am. Chem. Soc., 107, 3898–3902.

Dewar,M.J. and Thiel,W. (1977) Ground states of molecules. 38. The MNDO method. Approximations and parameters. J. Am. Chem. Soc., 99, 4899–4907.
Ditchfield,R. et al. (1971) Self-consistent molecular-orbital methods. IX. An extended Gaussian-type basis for molecular-orbital studies of organic molecules. J. Chem. Phys., 54, 724–728.
Duarte,F. et al. (2015) Recent advances in QM/MM free energy calculations using reference potentials. Biochim. Biophys. Acta BBA-Gen. Subj., 1850, 954–965.
Durrant,J.D. et al. (2011) POVME: An Algorithm for Measuring Binding-Pocket Volumes. J. Mol. Graph. Model., 29, 773–776.
Elder,G.H. (1990) The cutaneous porphyrias. Semin. Dermatol., 9, 63–69.
Elder,I. et al. (2007) Structural and kinetic analysis of proton shuttle residues in the active site of human carbonic anhydrase III. PROTEINS Struct. Funct. Bioinforma., 68, 337–343.
Erskine,P.T. et al. (2003) X-ray structure of a putative reaction intermediate of 5-aminolaevulinic acid dehydratase. Biochem. J., 373, 733–738.
Essmann,U. et al. (1995) A smooth particle mesh Ewald method. J. Chem. Phys., 103, 8577–8593.
Ferreira,G.C. and Gong,J. (1995) 5-Aminolevulinate synthase and the first step of heme biosynthesis. J. Bioenerg. Biomembr., 27, 151–159.
Field,M.J. (2008) The pDynamo Program for Molecular Simulations using Hybrid Quantum Chemical and Molecular Mechanical Potentials. J. Chem. Theory Comput., 4, 1151–1161.
Fiser,A. and Sali,A. (2003) Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol., 374, 461–91.
Fox,S.J. et al. (2011) Electrostatic embedding in large-scale first principles quantum mechanical calculations on biomolecules. J. Chem. Phys., 135, 224107.
Frenkel,D. and Smit,B. (2001) Understanding molecular simulation: from algorithms to applications. Elsevier.
Frisch,M.J. et al. (2009) Gaussian 09, revision A.2.
Gao,J. and Truhlar,D.G. (2002) Quantum mechanical methods for enzyme kinetics. Annu. Rev. Phys. Chem., 53, 467–505.
Gerstein,M. et al. (1993) Domain closure in lactoferrin: two hinges produce a see-saw motion between alternative close-packed interfaces. J. Mol. Biol., 234, 357–372.
Gesto,D.S. et al. (2013) Unraveling the enigmatic mechanism of L-asparaginase II with QM/QM calculations. J. Am. Chem. Soc., 135, 7146–7158.
Gill,R. et al. (2009) Structure of human porphobilinogen deaminase at 2.8 Å: the molecular basis of acute intermittent porphyria. Biochem. J., 420, 17–25.
Gilmour,K.M. (2010) Perspectives on carbonic anhydrase. Comp. Biochem. Physiol. A. Mol. Integr. Physiol., 157, 193–197.
Groenhof,G. (2013) Introduction to QM/MM Simulations. In, Monticelli,L. and Salonen,E. (eds), Biomolecular Simulations. Humana Press, Totowa, NJ, pp. 43–66.
Guégan,R. et al. (2003) Leptospira spp. possess a complete haem biosynthetic pathway and are able to use exogenous haem sources. Mol. Microbiol., 49, 745–754.
Guo,J. et al. (2017) Structural studies of domain movement in active-site mutants of porphobilinogen deaminase from Bacillus megaterium. Acta Crystallogr. Sect. F Struct. Biol. Commun., 73, 612–620.
Hädener,A. et al. (1999) Determination of the structure of selenomethionine-labelled hydroxymethylbilane synthase in its active form by multi-wavelength anomalous dispersion. Acta Crystallogr. D Biol. Crystallogr., 55, 631–643.

Hammes,G.G. et al. (2011) Flexibility, diversity, and cooperativity: pillars of enzyme catalysis. Biochemistry, 50, 10422–10430.
Hayashi,N. et al. (1980) Effects of hemin on the synthesis and intracellular translocation of delta-aminolevulinate synthase in the liver of rats treated with 3,5-dicarbethoxy-1,4-dihydrocollidine. J. Biochem. (Tokyo), 88, 1537–1543.
Heinemann,I.U. et al. (2008) The biochemistry of heme biosynthesis. Arch. Biochem. Biophys., 474, 238–251.
Helliwell,J.R. et al. (2003) Time-resolved and static-ensemble structural chemistry of hydroxymethylbilane synthase. Faraday Discuss., 122, 131–144; discussion 171–190.
Helliwell,J.R. et al. (1998) Time-resolved structures of hydroxymethylbilane synthase (Lys59Gln mutant) as it is loaded with substrate in the crystal determined by Laue diffraction. J. Chem. Soc. Faraday Trans., 94, 2615–2622.
Hess,B. et al. (2008) GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput., 4, 435–447.
Hess,B. et al. (1997) LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem., 18, 1463–1472.
Himo,F. (2017) Recent Trends in Quantum Chemical Modeling of Enzymatic Reactions. J. Am. Chem. Soc., 139, 6780–6786.
Hohenberg,P. and Kohn,W. (1964) Inhomogeneous Electron Gas. Phys. Rev., 136, B864–B871.
Hospital,A. et al. (2015) Molecular dynamics simulations: advances and applications. Adv. Appl. Bioinforma. Chem. AABC, 8, 37–47.
Huang,D.D. et al. (1984) delta-Aminolevulinic acid-synthesizing enzymes need an RNA moiety for activity. Science, 225, 1482–1484.
Huang,J. et al. (2017) CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods, 14, 71.
Humphrey,W. et al. (1996) VMD: visual molecular dynamics. J. Mol. Graph., 14, 33–8, 27–8.
Ishida,T. et al. (1998) A primitive pathway of porphyrin biosynthesis and enzymology in Desulfovibrio vulgaris. Proc. Natl. Acad. Sci. U. S. A., 95, 4853–4858.
Izrailev,S. et al. (1999) Steered Molecular Dynamics. In, Deuflhard,P. et al. (eds), Computational Molecular Dynamics: Challenges, Methods, Ideas, Lecture Notes in Computational Science and Engineering. Springer Berlin Heidelberg, pp. 39–65.
Jaffe,E.K. (2004) The porphobilinogen synthase catalyzed reaction mechanism. Bioorganic Chem., 32, 316–325.
Jahn,D. and Heinz,D.W. (2009) Biosynthesis of 5-Aminolevulinic Acid. In, Warren,M.J. and Smith,A.G. (eds), Tetrapyrroles: Birth, Life and Death, Molecular Biology Intelligence Unit. Springer New York, New York, NY, pp. 29–42.
Jensen,F. (2013) Atomic orbital basis sets. Wiley Interdiscip. Rev. Comput. Mol. Sci., 3, 273–295.
Jensen,F. (2017) Introduction to computational chemistry. John Wiley & Sons.
Jindal,G. and Warshel,A. (2016) Exploring the Dependence of QM/MM Calculations of Enzyme Catalysis on the Size of the QM Region. J. Phys. Chem. B, 120, 9913–9921.
Jing,Z. et al. (2019) Polarizable force fields for biomolecular simulations: Recent advances and applications. Annu. Rev. Biophys., 48.
Jordan,P.M. (1991) The biosynthesis of 5-aminolaevulinic acid and its transformation into uroporphyrinogen III. In, New comprehensive biochemistry. Elsevier, pp. 1–66.
Jordan,P.M. and Gibbs,P.N. (1985) Mechanism of action of 5-aminolaevulinate dehydratase from human erythrocytes. Biochem. J., 227, 1015–1020.

Jordan,P.M. and Warren,M.J. (1987) Evidence for a dipyrromethane cofactor at the catalytic site of E. coli porphobilinogen deaminase. FEBS Lett., 225, 87–92.
Jordan,P.M. and Woodcock,S.C. (1991) Mutagenesis of arginine residues in the catalytic cleft of Escherichia coli porphobilinogen deaminase that affects dipyrromethane cofactor assembly and tetrapyrrole chain initiation and elongation. Biochem. J., 280 (Pt 2), 445–9.
Jorgensen,W.L. et al. (1983) Comparison of simple potential functions for simulating liquid water. J. Chem. Phys., 79, 926.
Kamerlin,S.C.L. and Warshel,A. (2010) At the dawn of the 21st century: Is dynamics the missing link for understanding enzyme catalysis? Proteins, 78, 1339–1375.
Karelina,M. and Kulik,H.J. (2017) Systematic quantum mechanical region determination in QM/MM simulation. J. Chem. Theory Comput., 13, 563–576.
Karplus,M. and McCammon,J.A. (2002) Molecular dynamics simulations of biomolecules. Nat. Struct. Mol. Biol., 9, 646.
Kauppinen,R. (2005) Porphyrias. Lancet Lond. Engl., 365, 241–252.
Kazemi,M. et al. (2016) Peptide release on the ribosome involves substrate-assisted base catalysis. ACS Catal., 6, 8432–8439.
Keskin,O. et al. (2000) Proteins with Similar Architecture Exhibit Similar Large-Scale Dynamic Behavior. Biophys. J., 78, 2093–2106.
Kienle,A. et al. (1996) Why do veins appear blue? A new look at an old question. Appl. Opt., 35, 1151.
Kohn,W. and Sham,L.J. (1965) Self-Consistent Equations Including Exchange and Correlation Effects. Phys. Rev., 140, A1133–A1138.
Kühlbrandt,W. (2014) Biochemistry. The resolution revolution. Science, 343, 1443–1444.
Kulik,H.J. et al. (2016) How large should the QM region be in QM/MM calculations? The case of catechol O-methyltransferase. J. Phys. Chem. B, 120, 11381–11394.
Lamoureux,G. and Roux,B. (2003) Modeling induced polarization with classical Drude oscillators: Theory and molecular dynamics simulation algorithm. J. Chem. Phys., 119, 3025–3039.
Lander,M. et al. (1991a) Studies on the mechanism of hydroxymethylbilane synthase concerning the role of arginine residues in substrate binding. Biochem. J., 275, 447–452.
Lander,M. et al. (1991b) Studies on the mechanism of hydroxymethylbilane synthase concerning the role of arginine residues in substrate binding. Biochem. J., 275, 447–452.
Layer,G. et al. (2010) Structure and function of enzymes in heme biosynthesis. Protein Sci., 19, 1137–61.
Leeper,F.J. (1985) The biosynthesis of porphyrins, chlorophylls, and vitamin B12. Nat. Prod. Rep., 2, 561–580.
Leeper,F.J. (1989) The biosynthesis of porphyrins, chlorophylls, and vitamin B12. Nat. Prod. Rep., 6, 171–203.
Li,L. et al. (2013) On the Dielectric “Constant” of Proteins: Smooth Dielectric Function for Macromolecular Modeling and Its Implementation in DelPhi. J. Chem. Theory Comput., 9, 2126–2136.
Lind,M.E. and Himo,F. (2013) Quantum chemistry as a tool in asymmetric biocatalysis: limonene epoxide hydrolase test case. Angew. Chem. Int. Ed., 52, 4563–4567.
Liu,X. et al. (2008) A steered molecular dynamics method with direction optimization and its applications on ligand molecule dissociation. J. Biochem. Biophys. Methods, 70, 857–864.

Long,D. et al. (2009) Molecular dynamics simulation of ligand dissociation from liver fatty acid binding protein. PloS One, 4, e6081.
Louie,G.V. et al. (1992) Structure of porphobilinogen deaminase reveals a flexible multidomain polymerase with a single catalytic site. Nature, 359, 33–9.
Louie,G.V. et al. (1996) The three-dimensional structure of Escherichia coli porphobilinogen deaminase at 1.76-Å resolution. Proteins, 25, 48–78.
Lüdemann,S.K. et al. (2000) How do substrates enter and products exit the buried active site of cytochrome P450cam? 1. Random expulsion molecular dynamics investigation of ligand access channels and mechanisms. J. Mol. Biol., 303, 797–811.
MacKerell,A.D. et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B, 102, 3586–3616.
Mackerell,A.D. et al. (2004) Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J. Comput. Chem., 25, 1400–1415.
Maier,J.A. et al. (2015) ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput., 11, 3696–3713.
Malde,A.K. et al. (2011) An Automated Force Field Topology Builder (ATB) and Repository: Version 1.0. J. Chem. Theory Comput., 7, 4026–4037.
Martins-Costa,M.T. and Ruiz-López,M.F. (2015) Advances in QM/MM Molecular Dynamics Simulations of Chemical Processes at Aqueous Interfaces. In, Quantum Modeling of Complex Molecular Systems. Springer, pp. 303–324.
Maseras,F. and Morokuma,K. (1995) IMOMM: A new integrated ab initio + molecular mechanics geometry optimization scheme of equilibrium structures and transition states. J. Comput. Chem., 16, 1170–1179.
Mauzerall,D. and Granick,S. (1958) Porphyrin biosynthesis in erythrocytes. III. Uroporphyrinogen and its decarboxylase. J. Biol. Chem., 232, 1141–1162.
McCammon,J.A. et al. (1977) Dynamics of folded proteins. Nature, 267, 585–590.
Meller,E. and Gassman,M.L. (1982) Biosynthesis of 5-aminolevulinic acid: Two pathways in higher plants. Plant Sci. Lett., 26, 23–29.
Miertus,S. and Tomasi,J. (1982) Approximate evaluations of the electrostatic free energy and internal energy changes in solution processes. Chem. Phys., 65, 239–245.
Nagaraj,V.A. et al. (2009) Localisation of Plasmodium falciparum uroporphyrinogen III decarboxylase of the heme-biosynthetic pathway in the apicoplast and characterisation of its catalytic properties. Int. J. Parasitol., 39, 559–568.
Nagaraj,V.A. et al. (2008) Unique properties of Plasmodium falciparum porphobilinogen deaminase. J. Biol. Chem., 283, 437–444.
Nagel,Z.D. and Klinman,J.P. (2010) Update 1 of: Tunneling and dynamics in enzymatic hydride transfer. Chem. Rev., 110, PR41–67.
Neese,F. (2012) The ORCA program system. Wiley Interdiscip. Rev. Comput. Mol. Sci., 2, 73–78.
Nieh,Y.P. et al. (1999) Accurate and highly complete synchrotron protein crystal Laue diffraction data using the ESRF CCD and the Daresbury Laue software. J. Synchrotron Radiat., 6, 995–1006.
Nosé,S. (1984) A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys., 52, 255–268.
Olsen,J.M.H. et al. (2015) Polarizable density embedding: A new QM/QM/MM-based computational strategy. J. Phys. Chem. A, 119, 5344–5355.

Oostenbrink,C. et al. (2004) A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6. J. Comput. Chem., 25, 1656–76.
Padmanaban,G. et al. (2007) An alternative model for heme biosynthesis in the malarial parasite. Trends Biochem. Sci., 32, 443–449.
Parrinello,M. and Rahman,A. (1981) Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys., 52, 7182–7190.
Perez,A. et al. (2016) Advances in free-energy-based simulations of protein folding and ligand binding. Curr. Opin. Struct. Biol., 36, 25–31.
Perilla,J.R. et al. (2015a) Molecular dynamics simulations of large macromolecular complexes. Curr. Opin. Struct. Biol., 31, 64–74.
Perilla,J.R. et al. (2015b) Molecular dynamics simulations of large macromolecular complexes. Curr. Opin. Struct. Biol., 31, 64–74.
Phillips,J.C. et al. (2005) Scalable molecular dynamics with NAMD. J. Comput. Chem., 26, 1781–802.
Phillips,J.D. et al. (2003) Structural basis for tetrapyrrole coordination by uroporphyrinogen decarboxylase. EMBO J., 22, 6225–33.
Pichon,C. et al. (1992) On the mechanism of porphobilinogen deaminase. Design, synthesis, and enzymatic reactions of novel porphobilinogen analogs. Tetrahedron, 48, 4687–4712.
Pluta,P. et al. (2018a) Structural basis of pyrrole polymerization in human porphobilinogen deaminase. Biochim. Biophys. Acta BBA - Gen. Subj.
Pluta,P. et al. (2018b) Structural basis of pyrrole polymerization in human porphobilinogen deaminase. Biochim. Biophys. Acta Gen. Subj., 1862, 1948–1955.
Pradhan,M. et al. (2013) Structural dynamics of E. coli porphobilinogen deaminase during the tetrapolymerisation of porphobilinogen. In, Biomolecular Forms and Functions: A Celebration of 50 Years of the Ramachandran Map. World Scientific, pp. 455–467.
Prejanò,M. et al. (2018) QM Cluster or QM/MM in Computational Enzymology: The Test Case of LigW-Decarboxylase. Front. Chem., 6.
Quesne,M.G. et al. (2016) Quantum mechanics/molecular mechanics modeling of enzymatic processes: caveats and breakthroughs. Chem. Eur. J., 22, 2562–2581.
Rao,L. et al. (2010) Electronic properties and desolvation penalties of metal ions plus protein electrostatics dictate the metal binding affinity and selectivity in the copper efflux regulator. J. Am. Chem. Soc., 132, 18092–18102.
Ratcliff,L.E. et al. (2017) Challenges in large scale quantum mechanical calculations. Wiley Interdiscip. Rev. Comput. Mol. Sci., 7, e1290.
Roberts,A. et al. (2013) Insights into the mechanism of pyrrole polymerization catalysed by porphobilinogen deaminase: high-resolution X-ray studies of the Arabidopsis thaliana enzyme. Acta Crystallogr. D Biol. Crystallogr., 69, 471–85.
Roßbach,S. and Ochsenfeld,C. (2017) Influence of coupling and embedding schemes on QM size convergence in QM/MM approaches for the example of a proton transfer in DNA. J. Chem. Theory Comput., 13, 1102–1107.
Scheraga,H.A. et al. (2007) Protein-folding dynamics: overview of molecular simulation techniques. Annu. Rev. Phys. Chem., 58, 57–83.
Schramm,V.L. (2013) Transition States, analogues, and drug development. ACS Chem. Biol., 8, 71–81.
Schrodinger,L.L.C. (2010) The PyMOL molecular graphics system. Version, 1, 0.
Senn,H.M. and Thiel,W. (2007) QM/MM studies of enzymes. Curr. Opin. Chem. Biol., 11, 182–187.

Shemin,D. and Rittenberg,D. (1945) The Utilization of Glycine for the Synthesis of a Porphyrin. J. Biol. Chem., 159, 567–568.

Sheng,X. et al. (2015) Theoretical study of the reaction mechanism of phenolic acid decarboxylase. FEBS J., 282, 4703–4713.

Shoolingin-Jordan,P.M. (1995) Porphobilinogen deaminase and uroporphyrinogen III synthase: structure, molecular biology, and mechanism. J. Bioenerg. Biomembr., 27, 181–195.

Siegbahn,P.E.M. (2013) Water oxidation mechanism in photosystem II, including oxidations, proton release pathways, O-O bond formation and O2 release. Biochim. Biophys. Acta, 1827, 1003–1019.

Siepker,L.J. et al. (1987) Purification of bovine protoporphyrinogen oxidase: immunological cross-reactivity and structural relationship to ferrochelatase. Biochim. Biophys. Acta, 913, 349–358.

Sievers,F. and Higgins,D.G. (2014) Clustal Omega, accurate alignment of very large numbers of sequences. Methods Mol. Biol. Clifton NJ, 1079, 105–116.

Song,G. et al. (2009) Structural insight into acute intermittent porphyria. FASEB J., 23, 396–404.

Sousa,S.F. et al. (2017) Application of quantum mechanics/molecular mechanics methods in the study of enzymatic reaction mechanisms. Wiley Interdiscip. Rev. Comput. Mol. Sci., 7, e1281.

Stenson,P.D. et al. (2009) The human gene mutation database: 2008 update. Genome Med., 1, 13.

Stephenson,J.R. et al. (2007) Role of aspartate 400, arginine 262, and arginine 401 in the catalytic mechanism of human coproporphyrinogen oxidase. Protein Sci. Publ. Protein Soc., 16, 401–410.

Stewart,J.J. (1989) Optimization of parameters for semiempirical methods II. Applications. J. Comput. Chem., 10, 221–264.

Svensson,M. et al. (1996) ONIOM: a multilayered integrated MO + MM method for geometry optimizations and single point energy predictions. A test for Diels-Alder reactions and Pt(P(t-Bu)3)2 + H2 oxidative addition. J. Phys. Chem., 100, 19357–19363.

Swope,W.C. et al. (1982) A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: Application to small water clusters. J. Chem. Phys., 76, 637–649.

Thiel,W. (2009) QM/MM methodology: Fundamentals, scope, and limitations. Multiscale Simul. Methods Mol. Sci., 42, 203–214.

Tripathi,R. et al. (2017) The GTPase hGBP1 converts GTP to GMP in two steps via proton shuttle mechanisms. Chem. Sci., 8, 371–380.

Trott,O. and Olson,A.J. (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., 31, 455–461.

Uchida,T. et al. (2018) Heme Binding to Porphobilinogen Deaminase from Vibrio cholerae Decelerates the Formation of 1-Hydroxymethylbilane. ACS Chem. Biol., 13, 750–760.

Vanommeslaeghe,K. et al. (2012) Automation of the CHARMM General Force Field (CGenFF) II: assignment of bonded parameters and partial atomic charges. J. Chem. Inf. Model., 52, 3155–3168.

Verlet,L. (1967) Computer "experiments" on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev., 159, 98.

Walker,M. et al. (2013) Performance of M06, M06-2X, and M06-HF density functionals for conformationally flexible anionic clusters: M06 functionals perform better than B3LYP for a model system with dispersion and ionic hydrogen-bonding interactions. J. Phys. Chem. A, 117, 12590–12600.

Warren,M.J. and Jordan,P.M. (1988) Investigation into the nature of substrate binding to the dipyrromethane cofactor of Escherichia coli porphobilinogen deaminase. Biochemistry, 27, 9020–9030.

Warren,M.J. and Scott,A.I. (1990) Tetrapyrrole assembly and modification into the ligands of biologically functional cofactors. Trends Biochem. Sci., 15, 486–491.

Warshel,A. et al. (2006) Electrostatic basis for enzyme catalysis. Chem. Rev., 106, 3210–3235.

Warshel,A. and Levitt,M. (1976) Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol., 103, 227–249.

Whatley,S.D. and Badminton,M.N. (1993) Acute Intermittent Porphyria. In Adam,M.P. et al. (eds), GeneReviews®. University of Washington, Seattle, Seattle (WA).

White,K.A. and Marletta,M.A. (1992) Nitric oxide synthase is a cytochrome P-450 type hemoprotein. Biochemistry, 31, 6627–6631.

Winn,P.J. et al. (2002) Comparison of the dynamics of substrate access channels in three cytochrome P450s reveals different opening mechanisms and a novel functional role for a buried arginine. Proc. Natl. Acad. Sci. U. S. A., 99, 5361–5366.

Woodcock,S.C. and Jordan,P.M. (1994) Evidence for participation of aspartate-84 as a catalytic group at the active site of porphobilinogen deaminase obtained by site-directed mutagenesis of the hemC gene from Escherichia coli. Biochemistry, 33, 2688–2695.

Yan,H. and Ji,X. (2011) Role of protein conformational dynamics in the catalysis by 6-hydroxymethyl-7,8-dihydropterin pyrophosphokinase. Protein Pept. Lett., 18, 328–335.

Yoshinaga,T. and Sano,S. (1980) Coproporphyrinogen oxidase. II. Reaction mechanism and role of tyrosine residues on the activity. J. Biol. Chem., 255, 4727–4731.

Zhang,D.W. et al. (2003) New advance in computational chemistry: Full quantum mechanical ab initio computation of streptavidin-biotin interaction energy. J. Phys. Chem. B, 107, 12039–12041.

Zhang,Y.-J. et al. (2018) The potential for machine learning in hybrid QM/MM calculations. J. Chem. Phys., 148, 241740.

Zhao,Y. and Truhlar,D.G. (2008) Density functionals with broad applicability in chemistry. Acc. Chem. Res., 41, 157–167.

Appendix A

Multiple sequence alignment of sequences from P. falciparum, A. thaliana, H. sapiens, B. megaterium, E. coli and V. cholerae

P. falciparum ------IGTRDSPLALKQSEKVRKKIMSYFKKMNKNINVTFKYI 38

A. thaliana ------CVAVEQKTRTAIIRIGTRGSPLALAQAYETREKLKKKHPELVEDGAIHIEII 52

H. sapiens MSGNGNAAATAEENSPKMRVIRVGTRKSQLARIQTDSVVATLKASYPGLQ----FEIIAM 56

B. megaterium ------HMRKIIVGSRRSKLALTQTKWVIEQLKKQGLPFE----FEIKEM 40

E. coli ------MLDNVLRIATRQSPLALWQAHYVKDKLMASHPGLV----VELVPM 41

V. cholerae ------MDRNIIMTETPIRIATRQSPLALWQANYVKDALMAAHPGLQ----VELVTM 47

:.:* * ** *: . : : . : :

P. falciparum KTTGDNILDSKSVGLYGGKGIFTKELDEQLINGNVDLCVHSLKDVPILLPNNIELSCFLK 98

A. thaliana KTTGDKILSQP-LADIGGKGLFTKEIDEALINGHIDIAVHSMKDVPTYLPEKTILPCNLP 111

H. sapiens STTGDKILDTA-LSKIGEKSLFTKELEHALEKNEVDLVVHSLKDLPTVLPPGFTIGAICK 115

B. megaterium VTKGDQILNVT-LSKVGGKGLFVKEIEQAMLDKEIDMAVHSMKAMPAVLPEGLTIGCIPL 99

E. coli VTRGDVILDTP-LAKVGGKGLFVKELEVALLENRADIAVHSMKDVPVEFPQGLGLVTICE 100

V. cholerae VTRGDVILDTP-LAKVGGKGLFVKELEIAMLEGRADLAVHSMKDVPVDFPDGLGLVTICE 106

* ** **. :. * *.:*.**:: : . . *: ***:* :* :* :

P. falciparum RDTINDAFLSI---KYKSINDMNTVKSVSKTEDIHHINKKDSDHNNDTLCTIGTSSLRRR 155

A. thaliana REDVRDAFICL---TAATLAELP------AGSVVGTASLRRK 144

H. sapiens RENPHDAVVFHPKFVGKTLETLP------EKSVVGTSSLRRA 151

B. megaterium REDHRDALISK---NGERFEELP------SGAVIGTSSLRRG 132

E. coli REDPRDAFVSN---NYDSLDALP------AGSIVGTSSLRRQ 133

V. cholerae REDPRDAFVSN---TYAKIEDLP------SGAIVGTCSLRRQ 139

*: .**.: : : . :**.****

P. falciparum SQIKNRYKNIYVN-NIRGNINTRIEKLYN-GEVDALIIAMCGIERLIKKA-NLKHLLKNK 212

A. thaliana SQILHKYPALHVEENFRGNVQTRLSKLQG-GKVQATLLALAGLKRLSMTE-NVA------ 196

H. sapiens AQLQRKFPHLEFR-SIRGNLNTRLRKLDEQQEFSAIILATAGLQRMGWHN-RVG------ 203

B. megaterium AQLLSMRSDIEIK-WIRGNIDTRLEKLKN-EDYDAIILAAAGLSRMGWSKDTVT------ 184

E. coli CQLAERRPDLIIR-SLRGNVGTRLSKLDN-GEYDAIILAVAGLKRLGLES-RIR------ 184

V. cholerae CQLKAARPDLVIK-ELRGNVGTRLSKLDA-GEYDAIILAAAGLKRLELES-RIR------ 190

.*: : .. :***: **: ** . .* ::* .*:.*: :

P. falciparum EQKNICQPFLLKCNNKKCIDLCHVNIQKLNKNLIYPALGQGIIAVTSHKKNYFISSLLKN 272

A. thaliana ------SILSLDEMLPAVAQGAIGIACRTDDDKMATYLAS 230

H. sapiens ------QILHPEECMYAVGQGALGVEVRAKDQDILDLVGV 237

B. megaterium ------QYLEPEISVPAVGQGALAIECRENDHELLSLLQA 218

E. coli ------AALPPEISLPAVGQGAVGIECRLDDSRTRELLAA 218

V. cholerae ------SFIEPEQSLPAVGQGAVGIECRVNDQRVRALLAP 224

: : *:.** :.: : .: :

P. falciparum INNKKSEMMAQIERSFLYHIDGNCMMPIGGYTNM-RNEDIYLHVIINDIHGYNKYQVTQK 331

A. thaliana LNHEETRLAISCERAFLETLDGSCRTPIAGYASKDEEGNCIFRGLVASPDGTKVLETSRK 290

H. sapiens LHDPETLLRCIAERAFLRHLEGGCSVPVAVHTAM-KDGQLYLTGGVWSLDGSDSIQETMQ 296

B. megaterium LNHDETARAVRAERVFLKEMEGGCQVPIAGYGRILDGGNIELTSLVASPDGKTIYK---- 274

E. coli LNHHETALRVTAERAMNTRLEGGCQVPIGSYAEL-IDGEIWLRALVGAPDGSQIIR---- 273

V. cholerae LNHADTADRVRCERAMNLTLQGGCQVPIGSYALL-EGDTIWLRALVGEPDGSQIVR---- 279

::. .: ** : ::*.* *:. : : : .* .

P. falciparum DTLYNYKEI---GPNAAIKMKEI------IGTEQFNKIKAEAELHLLNNK- 372

A. thaliana GPY-VYEDM------VKMGKDAGQELLSRAGPGFFGN-- 320

H. sapiens ATI-HVPAQHEDGPEDDPQLVGITARNIPRGPQLAAQNLGISLANLLLSKGAKNILDVAR 355

B. megaterium ------EHITGKDP--IAIGSEAAERLTSQGAKLLIDRVK 306

E. coli ------GERRGAPQDAEQMGISLAEELLNNGAREILAEVY 307

V. cholerae ------GEIRGPRTQAEQLGITLAEQLLSQGAKEILERLY 313

:* : : .. ::

P. falciparum ------ 372

A. thaliana ------ 320

H. sapiens QLNDAH 361

B. megaterium EELDK- 311

E. coli NGDAPA 313

V. cholerae CDHE-- 317

Appendix B

Mutations reported to be responsible for causing AIP (Stenson et al., 2009). Text in grey denotes the active site residues

S. No Mutation Conserved Probable reason

1 M18I No Start codon

2 R22C Partially Interacts with E89 and is crucial for maintaining the secondary structure.

3 G24S/D Partially Introducing a charged or hydrophilic residue destabilizes the hydrophobic core.

4 R26C Yes Probable proton donor for the formation of the ES (P3M) and ES2 (P4M) complexes. Plays an important role in polypyrrole chain elongation.

5 S28N Yes Stabilizes the polypyrrole chain. Interacts with the polypyrrole from the P3M to P6M stages of chain elongation.

6 A31T/P/V Yes A larger sidechain hinders the extension of the cofactor from the P3M stage. Interacts with the polypyrrole from the P3M to P6M stages of chain elongation.

7 Q34K/X/P/R Yes Active site residue; interacts with the cofactor in the P4M, P5M and P6M stages of chain elongation.

8 T35M Partially The T35-T25 hydrogen bond might be affected by the mutation.

9 L42S Yes Serine destabilizes the hydrophobic core formed by F51 and L234.

10 S45X No Sidechain interacts with L234 to provide structural stability.

11 Y46X No Hydrogen bonds with D233.

12 Q50 No Interacts with R19 and is also exposed to the solvent.

13 A55S No Serine can form a hydrogen bond with R32 and might restrict the loop movement, which is important for cofactor elongation and product exit.

14 T59I No Located at the base of the loop and modulates its flexibility. The mutant retains 80% of wild-type deaminase activity.

15 D61N/Y/H Yes Modulates loop flexibility. Interacts with active site residues S28 and R26.

16 S69X No S69 is a loop residue. It interacts with the cofactor from the P5M stage onwards.

17 T78P No Forms a hydrogen bond with S75 in the DPM stage. Proline, a helix breaker, destabilizes the active site loop conformation.

18 E80G Yes Forms a hydrogen bond with T59. In the P6M stage, it hydrogen bonds with H83. Partially exposed to the surface.

19 L81P Partially Part of a helix. Proline is known to be a helix breaker.

20 L85R Yes Forms a hydrophobic core, where a charged residue like arginine is difficult to accommodate.

21 E86V No The E86V mutation will destabilize the E86-R225 hydrogen bond.

22 V90G No Located in a hydrophobic core.

23 L92P Partially Located on a beta strand in the hydrophobic core of the protein formed by V23, V222 and V224. Proline is known to disrupt the secondary structure.

24 V93F/D No Forms a hydrophobic core; any substitution with a large sidechain residue like phenylalanine or a charged amino acid like aspartate will affect protein stability.

25 S96F Yes Interacts with D99 during the DPM stage of chain elongation. Forms a hydrogen bond with the acetate sidechain of P3M.

26 K98R Yes Stabilizes the negative charge on the cofactor from the DPM stage onwards.

27 D99H/N/G Yes Aspartate stabilizes the pyrrole nitrogens of the growing pyrrole chain and is speculated to play an important role in the catalytic mechanism.

28 G111R No Located in a hydrophobic environment. Any substitution with a bigger sidechain, especially a charged residue, can disrupt protein stability.

29 I113T No Part of a hydrophobic region. A hydrophilic residue like threonine will disrupt protein stability.

30 C114X No Substitution with a charged residue will disrupt the hydrophobic core.

31 R116W/Q Yes Forms an important salt bridge with E250. Can also form a hydrogen bond with R246.

32 P119L No Located in domain 2 near the hinge region.

33 D121Y Yes Mutation disrupts the salt bridge with R150.

34 A122G Yes Located on a beta strand in a hydrophobic pocket. Glycine might disrupt the secondary structure.

35 V124D Partially Substitution with a charged residue will disrupt the hydrophobic core.

36 R149X/Q/L Yes Active site residue; stabilizes the A ring of the cofactor throughout the stages of chain elongation.

37 A152del No Maintains the hydrophobic core.

38 R167W/Q Yes Probable proton donor in the P5M and P6M stages of chain elongation. Also speculated to play a crucial role in product (HMB) release.

39 R173W/Q Yes Interacts with the A and B rings of the polypyrrole in all stages of chain elongation. Might play a role in product release.

40 L177R Yes Substitution with a charged residue will disrupt the hydrophobic core.

41 D178N No Forms a salt bridge with R201. Located away from the active site and surface exposed.

42 R195C Yes Stabilizes the negative charge on the polypyrrole from the P4M stage onwards.

43 R201W No Substitution with a hydrophobic residue can decrease the stability of the protein.

44 V202F/L Partially Interacts with the hydrophobic residues in the vicinity. Partially exposed to the surface. Mutation to an amino acid residue with a larger sidechain can decrease the stability.

45 Q204K/X No Exposed to the surface. A lysine at this position can form a strong salt bridge with E135 or a hydrogen bond with K157.

46 M212V No Neighbors of M212 are Q153, R149, R156 and W283. The interactions are more conserved in the P6M stage than in the DPM stage.

47 V215E/M Partially Located in the hydrophobic pocket formed by L97 and L254. Mutation might affect protein conformation.

48 G216D No Mutation would result in a steric clash with L97.

49 Q217R/L/H Yes K98, D121, R150 and the polypyrrole chain reside in the neighborhood of Q217. Any replacement with a bigger sidechain (R/H) or with a hydrophobic residue might disrupt the balance of these interactions.

50 A219D No An aspartate at position 219 can interact with R116 (domain 2), which forms a salt bridge with E250 (domain 3).

51 V222M Partially A residue with a large sidechain is difficult to accommodate in the hydrophobic pocket and might disrupt the protein structure.

52 E223K No Interacts with R251, T78 and H95 in the DPM stage.

53 R225G/X/Q Partially Exposed to the surface and away from the active site. Along the trajectory, hydrogen bonds with E86, S28 and E228 were observed.

54 G236S No Substitution with a bigger sidechain amino acid will result in local destabilization of the protein.

55 L238R/P Partially Forms a hydrophobic core with C114, V38 and L42. Substitution by a charged residue with a longer sidechain, such as Arg, will disturb protein stability.

56 L244P No Part of a helix, and proline is known to be a helix breaker.

57 L245R Partially Surrounded by hydrophobic residues like L244, V301, P324 and A249. Substitution to arginine can disrupt the hydrophobic environment.

58 C247R/F No C247 is hydrogen bonded to the backbone of A215 and K216. There is no space for a bigger amino acid like R or F.

59 E250Q/K/V/A Yes E250 forms a salt bridge with R116 (between domains 2 and 3). This salt bridge is important for the stability of the protein.

60 A252T/P/V Partially A252 is part of a helix within a closely packed hydrophobic core. Proline is a helix breaker. Threonine, a hydrophilic residue, cannot be tolerated in this position, and even a hydrophobic amino acid like valine cannot be accommodated here.

61 L254P No Part of a helix, and proline is known to be a helix breaker.

62 H256Y/N No Hydrogen bonded with Q340. Exposed to the surface.

63 L257P Partially Part of a helix and closely surrounded by hydrophobic residues like F253, L254 and L351. Proline is known to be a helix breaker.

64 G260D No An aspartate at position 260 will interact with the acetate sidechain of the fifth pyrrole ring, thereby hindering pyrrole chain elongation.

65 C261Y Yes The DPM cofactor is anchored to the sulphur atom of cysteine 261. Mutation would halt the chain elongation process.

66 V267M No The larger sidechain would result in steric clashes with neighboring residues.

67 T269I Partially T269 makes hydrogen bonds with R246 and E250. It resides in a charged environment where a hydrophobic residue is difficult to accommodate.

68 A270G/D No Located in a beta sheet. A glycine in this position can disrupt the beta sheet.

69 G274R No Substitution to arginine will cause a steric clash and disrupt the D173-H300 hydrogen bond.

70 L278P Partially Located in a beta sheet. A proline in this position can disrupt the beta sheet.

71 G281del No G281 is surrounded by S290, W283, A266 and H268. Substitution with any amino acid with a longer sidechain will result in a steric clash.

72 A330P No A330 is part of a helix. A proline in this position will break the helix.

73 G335S/D No Amino acids with longer sidechains will result in steric clashes with F253 and L338. Also, a polar (Ser) or charged (Asp) amino acid will affect the stability of the protein, since G335, F253 and L338 create a hydrophobic environment.

74 L338P/R No Surrounded by hydrophobic residues F253, L278 and L341. Substitution to arginine can disrupt the hydrophobic environment.

75 L343P Yes L343 is part of a helix. A proline in this position breaks the helix.

List of Publications

Journal Publications

1. Bung N, Pradhan M, Srinivasan H, Bulusu G. Structural insights into E. coli porphobilinogen deaminase during synthesis and exit of 1-hydroxymethylbilane. PLoS Comput Biol. 2014 Mar;10(3):e1003484.

2. Ghosh A, Sengupta A, Seerapu GPK, Nakhi A, Shivaji Ramarao EVV, Bung N, et al. A novel SIRT1 inhibitor, 4bb induces apoptosis in HCT116 human colon carcinoma cells partially by activating p53. Biochem Biophys Res Commun. 2017 01;488(3):562–9.

3. Bung N, Surepalli S, Seshadri S, Patel S, Peddasomayajula S, Kummari LK, et al. 2-[2-(4-(trifluoromethyl)phenylamino)thiazol-4-yl] (Activator-3) is a potent activator of AMPK. Sci Rep. 2018 Jun 25;8(1):9599.

4. Bung N, Roy A, Chen B, Das D, Pradhan M, Yasuda M, et al. Human hydroxymethylbilane synthase: Molecular dynamics of the pyrrole chain elongation identifies step-specific residues that cause AIP. Proc Natl Acad Sci. 2018 Apr 4;201719267.

5. Bung N, Roy A, Priyakumar UD, Bulusu G. Computational modeling of the catalytic mechanism of hydroxymethylbilane synthase. Phys Chem Chem Phys. 2019 Apr 10;21(15):7932–40.

Poster Presentations

1. Bung N, Pradhan M, Srinivasan H, Bulusu G. Structural Dynamics of Porphobilinogen Deaminase: A Step Towards AIP and Malarial Therapy presented at 3D-SIG meeting at Boston, July 11-15, 2014.

2. Bung N, Pradhan M, Srinivasan H, Bulusu G. Structural Dynamics of Porphobilinogen Deaminase from Human and Plasmodium falciparum during Catalytic Extension of Dipyrromethane Cofactor presented at 22nd Annual International Conference on Intelligent Systems for Molecular Biology at Boston, July 11-15, 2014.

3. Roy A, Pradhan M, Bung N, Das D & Bulusu G. Structural Dynamics of Porphobilinogen Deaminase During the Complex Four Step Tetrapyrrole Synthesis presented at ISMB-ECCB 2015 meeting at Dublin, July 12-14, 2015.

4. Bung N, Das D, Pradhan M, Roy A & Bulusu G. Large scale protein dynamics regulate the mechanism of action of Porphobilinogen Deaminase, Annual Meeting of Indian Biophysical Society, IISc Bengaluru, February 8-10, 2016.

5. Bung N, Surepalli S, Peddasomayajula SK, Poondra RR, Misra P & Bulusu G. Insights into binding of Activator-3, a non-AMP mimetic, to AMP activated protein kinase. Current trends in Computational Natural Science symposium at IIIT Hyderabad, March 20, 2016.

6. Bung N, Das D, Pradhan M, Roy A & Bulusu G. Large scale protein dynamics regulate the mechanism of action of Porphobilinogen Deaminase, Current trends in Computational Natural Science symposium at IIIT Hyderabad, March 20, 2016.

7. Bung N, Roy A, Priyakumar U.D and Bulusu G. Insights into Catalytic Mechanism of Porphobilinogen Deaminase, Theoretical Chemistry Symposium, University of Hyderabad, December 14-17, 2016.

8. Roy A, Bung N, Pradhan M, Das D and Bulusu G. Role of Protein Motion during Synthesis and Exit of 1-Hydroxymethylbilane in the Enzyme Porphobilinogen Deaminase, Theoretical Chemistry Symposium, University of Hyderabad, December 14-17, 2016.

9. Bung N, Roy A, Priyakumar U.D and Bulusu G. Understanding the mechanism of porphobilinogen deaminase using molecular dynamics and quantum chemical calculations, Theory and Simulation Across Scales in Molecular Science organised by Gordon Research Conference, Computational Chemistry, Spain, July 25-26, 2016.

10. Bung N, Roy A, Pradhan M and Bulusu G. Structural dynamics of porphobilinogen deaminase during the complex four step tetrapyrrole synthesis organised by Gordon Research Conference, Newport RI, July 17-22, 2016.

11. Bung N, Roy A, Priyakumar U.D and Bulusu G. Insights into Catalytic Mechanism of Porphobilinogen Deaminase, Annual Meeting of Indian Biophysical Society, IISER Mohali, March 22-25, 2017.

12. Bung N, Roy A, Priyakumar U.D and Bulusu G. Insights into the Catalytic Mechanism of Porphobilinogen Deaminase, Indo-German Workshop on Computational Chemistry in Biology and Medicine, IIIT Hyderabad, November 29-30, 2017.

13. Bung N, Surepalli S, Seshadri S, Patel S, Kumar T S, Babu P, Parsa K V L, Reddy R, Bulusu G and Misra P. Activator-3 is a potent AMPK activator, Annual Meeting of Indian Biophysical Society, IISER Pune, March 9-11, 2018.

14. Bung N, Roy A, Chen B, Das D, Pradhan M, Desnick R J and Bulusu G. Human Hydroxymethylbilane Synthase: MD Simulations of the Pyrrole Chain Elongation Identifies Stage-Specific Residues that Cause AIP, Annual Meeting of Indian Biophysical Society, IISER Pune, March 9-11, 2018.
