<<

Drug Research Program

Division of Pharmaceutical Chemistry and Technology

Faculty of Pharmacy

University of Helsinki

Finland

Julius Sipilä

Sulfonation and in silico - modelling studies on SULT1A3 and COMT

Academic Dissertation

To be presented for public discussion, with the permission of the Faculty of Pharmacy of the University of Helsinki, in Auditorium Porthania PII, on May 25th, 2020, at 5 PM

Helsinki 2020 Supervisors Professor Jyrki Taskinen D.Sc. (Tech.) 2003-2006 Division of Pharmaceutical Chemistry Faculty of Pharmacy University of Helsinki

Professor Jari Yli-Kauhaluoma PhD Division of Pharmaceutical Chemistry and Technology Faculty of Pharmacy University of Helsinki

Reviewers Docent Maija Lahtela-Kakkonen School of Pharmacy University of Eastern Finland UEF Finland

Professor Olli Pentikäinen Institute of Biomedicine University of Turku UTU Finland

Opponent Professor Gerhard Wolber PhD Department of Biology, Chemistry and Pharmacy Institute of Pharmacy Freie Universität Berlin Germany

ISBN 978-951-51-6064-5 (paperback) ISBN 978-951-51-6065-2 (PDF)

Helsinki 2020 Abstract

Phenolic compounds are ubiquitously encountered in all living organisms. Due to the versatile chemical properties of phenols, they can form various types of specific interactions with biological macromolecules and participate in different types of reactions. In humans, endogenous phenols act as important , hormones, and building blocks of . A multitude of potentially bioactive phenols is ingested from different sources every day. To modulate the activities of endogenous and xenobiotic phenols, several families of Phase II metabolic enzymes have evolved, which can eliminate phenolic compounds through conjugation. In humans, the most important Phase II enzymes for phenol are UDP-glucuronosyltransferases (UGTs), cytosolic sulfotransferases (SULTs) and catechol O-methyltransferase (COMT). These enzymes increase the solubility of phenolic substrates, making them less active and easier to excrete. Because many clinically applied drugs also possess phenolic functionalities, UGTs, SULTs, and COMT are potentially important for the , exposure, and efficacy of therapeutics.

In this thesis, the substrate specificity of human SULT1A3 and COMT were studied computationally, using comparative molecular field analysis (CoMFA), which is a widely used 3-dimensional (3D) quantitative structure-activity relationship (QSAR) method, based on the molecular interaction fields of known substrate molecules and their activities. The CoMFA fields describe the shape, size, and electrostatic properties of the substrates, which are the most important determinants of molecular recognition by enzymes. In our CoMFA models, variations in the substrate fields were statistically correlated with changes in enzyme kinetic parameters such as the Michaelis-Menten constant (Km) or maximum velocity (Vmax), allowing the structural elements that are most important for activity to be extracted. The derived CoMFA models can be used to assess the probability that a new and unknown phenolic drug candidate may be a substrate of SULT1A3 or COMT.

In the SULT1A3 models, we found a clear preference for structural elements typically found in , in agreement with the results of site-directed mutagenesis studies, X-ray crystal structures and molecular dynamics (MD) simulations that have been published after our studies. When describing the electrostatic effects in the CoMFA models for COMT, semi-empirical atomic partial charges were preferred over empirically parametrized charges. In addition, the acid dissociation constant (pKa)value of the reacting catecholic hydroxyl was a critical factor that affected the affinities of the ligands.

Replacing the electrostatic field in the CoMFA model with the predicted pKa value facilitated the development of less acidic COMT ligands that are capable of central nervous system (CNS) permeation. To improve the prediction of pKa values for certain privileged COMT ligand scaffolds, we developed a modified Hammett equation, by synthesizing a series of p-vinylphenols and measuring their pKa values.

As increasing amounts of X-ray structural information, covering whole families, have become publicly available, we have attempted to extract relevant knowledge from the binding sites of several related proteins, to inspire ligand design. We created an automated data processing workflow, designed to process, combine, and analyze the electrostatic and knowledge-based contact preference fields of aligned binding sites. This analysis was performed and validated using the nuclear receptor (NR) family of proteins and was later tested for the prediction of SULT isoenzyme substrate affinities. The field- based protein binding pocket analysis has proven to be a useful tool, however, the intrinsic flexibility that has been observed among the cytosolic SULT family can be challenging to model without considering the dynamics of the system.

In summary, the computational models that were developed in this study could be used in combination with other in silico approaches, especially MD simulations, to provide a better picture of the probable enzymes that may be relevant for the metabolism of a new phenolic drug or active metabolite. These models could be also used to design compounds with an improved affinity towards the studied enzymes, which may be clinically interesting due to the important roles played by SULT1A3 and COMT in the -mediated neurotransmission pathways. Acknowledgements

The research presented in this thesis was performed at the current Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki and at the Research and Development unit of Orion Pharma, Espoo.

Computers and chemistry, especially medicinal chemistry, have always fascinated me. My deepest gratitude belongs to the late Professor Jyrki Taskinen, who agreed to supervise my Master’s thesis and, later, my doctoral studies, which combined these two topics. These studies have represented a turning point in my professional life, after which I knew what I really wanted to do. I am still following that path, with passion. Professor Taskinen told me that the best thoughts usually do not appear when sitting in front of the computer in your office; on the contrary, they appear outdoors while doing something enjoyable. Even today, when I am puzzled by a difficult problem, I remember and follow this great advice.

I cannot express enough nice words to describe my respect for the other supervisor of my thesis, Professor Jari Yli-Kauhaluoma. Without your patience, responsiveness, and support, this study would have been very difficult for me to finalize. Sincerely, I want to thank Docent Maija Lahtela-Kakkonen and Professor Olli Pentikäinen for reviewing this thesis and improving it, through their constructive comments and suggestions. I want to thank all of my former and current colleagues and co-authors at the Faculty of Pharmacy and Orion R&D, especially Docent Martti Ovaska, for all their support and guidance. During the years, I have really enjoyed the discussions with my colleagues Mika Kurkela, Olli Aitio, Gerd Wohlfahrt, Lars-Olof Pietilä, Peteris Prusis, Heikki Käsnänen, and Tuomo Kalliokoski. I would like to thank my supervisors at Orion R&D, Leena Otsomaa, for giving me a great example to follow, and Christer Nordstedt, for all the inspiring scientific discussions and, of course, for making it crystal clear that getting a PhD is a very wise thing to do.

I want to thank the Finnish Cultural Foundation, the Sigrid Juselius Foundation, and the Graduate School in Pharmaceutical Research, for funding. I want to express my greatest gratitude to my parents, for all their support during the years of my studies.

My kids Astrid, Alvar and Viktor, I truly thank you all for the patience, and for constantly reminding me about the most important stuff in life.

Finally, I want to thank you, Anna, for helping me with this project and for everything else.

Julius Sipilä February 2020 Espoo, Finland Contents

Abstract

Acknowledgements

Contents

List of publications ...... 11

Abbreviations...... 13

1 Introduction...... 15

2 Review of the literature ...... 19 2.1 Relevance of phenolic compounds in biology ...... 19 2.2 Phase II enzymes in the metabolism of phenolic compounds...... 19 2.3 Glucuronidation by UGTs...... 22 2.4 Methylation by COMT...... 23 2.4.1 Clinical relevance of COMT...... 23 2.4.2 The human COMT , protein expression and localization....25 2.4.3 Structure and function of COMT...... 26 2.5 Sulfonation by SULTs...... 28 2.5.1 SULT isoforms...... 29 2.5.2 Clinical relevance of SULTs...... 29 2.5.3 Structure and function of SULT1A3...... 31 2.6 Computational methods for Phase II metabolism ...... 33 2.6.1 General considerations...... 33 2.6.2 Structure-based methods ...... 34 2.6.3 Ligand-based methods ...... 35 2.6.4 Comparative molecular field analysis (CoMFA)...... 36 2.6.5 Protein binding site analysis with knowledge-based fields ...... 36

3 Aims of the Study...... 39

4 Methods...... 41 4.1 Experimental enzyme kinetic parameters of SULT1A3 (I) ...... 41 4.2 Experimental enzyme kinetic parameters of S-COMT (II)...... 41 4.3 Computational Methods (I-IV) ...... 41 4.3.1 Hardware and Software...... 41 4.3.2 CoMFA models (I-II)...... 42 4.3.2.1 Description of substrates and their conformations....42 4.3.2.2 Alignment of substrates...... 43 4.3.2.3 CoMFA parameters...... 44 4.3.2.4 Partial least squares (PLS) model validation ...... 45 4.3.2.5 Combination of CoMFA and pKa values for S- COMT substrates ...... 46

4.3.3 pKa of substrates (II-III) ...... 46 4.3.3.1 Training and test set ...... 46 4.3.3.2 Modification of the Hammett equation...... 47 4.3.3.3 Validation of the modified Hammett equation...... 47 4.3.4 Ligand Binding Pocket Fields (IV) ...... 47 4.3.4.1 Nuclear Receptor (NR) Structures (IV)...... 47 4.3.4.2 Knowledge-Based Fields ...... 48 4.3.4.3 Electrostatic Fields...... 49 4.3.4.4 Clustering and Postprocessing of Fields ...... 49 4.3.4.5 Cytosolic SULT Structures and Fields...... 50

5 Results and discussion ...... 51 5.1 SULT1A3 QSAR study (I)...... 51

5.2 S-COMT QSAR and pKa study (II-III)...... 52 5.2.1 Validation and performance of S-COMT CoMFA models ...... 52 5.2.2 Visual analysis and interpretation of S-COMT CoMFA models ...... 53

5.2.3 Combining pKa prediction with the S-COMT CoMFA models..54

5.2.4 Justification for revisiting the pKa predictions of certain COMT ligands ...... 55

5.2.5 Validation and performance of the refined p-vinylphenol pKa model (III) ...... 56 5.3 Methodological considerations for SULT1A3 and COMT CoMFA models ...... 56 5.4 Field-based comparisons of ligand binding sites (IV)...... 58 5.4.1 Quantitative structural knowledge from the protein molecular fields...... 58 5.4.2 Comparison of NR ligand binding sites, using molecular interaction fields...... 59 5.4.3 Comparison of SULT binding site fields ...... 60 5.5 Future directions...... 62

6 Conclusions...... 65

7 References...... 66

Publications

11

List of publications

This thesis is based on the following publications, which are referred to in the text by their Roman numerals.

The rights have been granted by publishers to include these papers in this dissertation.

I. Sipilä, J.; Hood, A. M.; Coughtrie, M. W. H.; Taskinen, J. CoMFA Modeling of Enzyme Kinetics: Km Values for of Diverse Phenolic Substrates by Human Catecholamine Sulfotransferase SULT1A3. Journal of Chemical Information and Computer Sciences 2003, 43 (5), 1563–1569. II. Sipilä, J.; Taskinen, J. CoMFA Modeling of Human Catechol O- Methyltransferase Enzyme Kinetics. Journal of Chemical Information and Computer Sciences 2004, 44 (1), 97–104. III. Sipilä, J.; Nurmi, H.; Kaukonen, A. M.; Hirvonen, J.; Taskinen, J.; Yli- Kauhaluoma, J. A Modification of the Hammett Equation for Predicting Ionisation Constants of p-Vinyl Phenols. European Journal of Pharmaceutical Sciences 2005, 25 (4–5), 417–425. IV. Wohlfahrt, G.; Sipilä, J.; Pietilä, L.-O. Field-Based Comparison of Ligand and Coactivator Binding Sites of Nuclear Receptors. Biopolymers 2009, 91 (10), 884– 894.

Author's contribution

Julius Sipilä was the principal author and investigator for papers I and II. In paper III, Julius Sipilä performed the data processing and analysis, Harri Nurmi synthesized the compounds and measured pKa values, and the manuscript was written with equal contributions. In paper IV, Dr. Gerd Wohlfahrt was the corresponding author and analyzed the protein field visualizations, Julius Sipilä performed the molecular modelling of the proteins and all field calculations.

13

Abbreviations

3D Three-dimensional AADC Aromatic amino acid decarboxylase ADH Aldehyde dehydrogenase AdoMet S-adenosylmethionine ALDH Alcohol dehydrogenase AM1 Austin Model 1 AR Aldehyde reductase Ar Aromatic CNS Central nervous system CoMFA Comparative molecular field analysis COMT Catechol O-methyltransferase CYP Cytochrome P450 DOPAC 3,4-dihydroxyphenolacetic acid ECCS Extended clearance classification system ER Estrogen receptor FDA U.S. Food and Drug Administration GRID Software originally created by Professor Peter Goodford HINT Hydrophobic interaction field HNF4 Hepatocyte nuclear factor 4 HVA

IC50 Half maximal inhibitory concentration

Km Michaelis-Menten constant LRH-1 Liver receptor homolog-1 LXR Liver X receptor MAO MB-COMT Membrane-bound COMT MD Molecular dynamics mRNA Messenger RNA NAPQI N-acetyl-p-benzoquinone imine NR Nuclear receptor 14 Abbreviations

PAP 3'-Phosphoadenosine-5'-phosphate PAPS 3'-Phosphoadenosine-5'-phosphosulfate PDB pKa Acid dissociation constant PLS Partial least squares, projection to latent structures PST Phenol sulfotransferase (SULT1A1/A3) q2 Goodness of prediction QM Quantum mechanical QSAR Quantitative structure-activity relationship R&D Research and development r2 Goodness of fit RAR Retinoic acid receptor RMSD Root mean square deviation RNA Ribonucleic acid RXR Retinoid X receptor SAH S-adenosylhomocysteine SAM S-adenosylmethionine SAR Structure-activity relationship S-COMT Soluble COMT SE Standard error SULT Sulfotransferase SVL Scientific vector language TH hydroxylase TR Thyroid hormone receptor UDP Uridine diphosphate UDPGA Uridine diphosphate glucuronic acid UGT UDP-glucuronosyltransferase

Vmax Maximum velocity (of the enzymatic reaction) 15

1 Introduction

Phenolic compounds, or phenols, form a diverse class of compounds that are chemically defined by possessing a phenolic functional group, which is a hydroxyl moiety (‒OH) connected to an aromatic ring system as a substructure. Phenols exist ubiquitously in all living organisms, from bacteria to plants and animals, and humans consume large quantities of dietary phenolic compounds, with a variety of known and unknown properties and biological activities. The petrochemical, plastic, and pharmaceutical industries produce significant amounts of phenols during their manufacturing processes or as active ingredients and additives that are used in the final products. These synthetics create additional sources of phenolic compounds that humans are exposed to every day.

The prevalence of phenolic compounds in nature and synthetic bioactive chemicals can be understood as a consequence of the unique physicochemical properties of the phenol substructure. The chemical structure of phenol and examples of diverse bioactive phenols are shown in Figure 1. Both the phenolic hydroxyl group and the attached aromatic rings can form diverse non-covalent interactions within proteins and other biomolecules. The hydroxyl groups of neutral phenols can act as either H-bond donors or acceptors. These phenolic hydrogen bonds are very commonly observed in the X-ray crystal structures of proteins and protein-ligand complexes (Panigrahi and Desiraju, 2007). Aromatic systems are commonly involved in different π stacking and π-cation interactions (Ferreira de

Freitas and Schapira, 2017). The relatively low acid dissociation constant (pKa) of unsubstituted phenol (approximately 10) permits ionic interactions between the phenolate ion and positively charged groups, such as basic amino acid side chains. The electrostatic properties of the delocalized π electron system and the directly connected hydroxyl group can be modified by subtle differences in the substituents of the ring or the fused ring systems, as demonstrated by the huge variety of phenolic secondary metabolites found in plants. These natural compounds vary substantially in their physicochemical properties, colors, tastes and binding to biomolecular target proteins. When these compounds react with reactive oxygen and nitrogen species, the special interplay between the hydroxyl group and the benzene ring results in the possibility of radical delocalization and stabilization leading to antioxidant characteristics. The aromatic rings in phenolic compounds are often components of diverse, conjugated double bond systems, which lead 16 1 Introduction

to the further delocalization of electrons and cause shifts in the absorbance spectra of these compounds within the range of visible light, making them usable as colorful dyes. Because of the weakly acidic nature of the phenol hydroxyl, this shift in absorbance and the resulting shift in color are pH-dependent, making certain phenols useful as pH indicators.

a) b)

c)

Figure 1. Chemical structures and diversity of phenolic compounds. a) Electrostatics and the shape of neutral phenol. b) The structure of neutral phenol and phenolate ion resonance structures. c) Example structures of diverse, bioactive, small phenolic molecules.

In humans, the modification of phenolic compounds through metabolic reactions and to the bile or is the most important mechanism for controlling the effects of phenol xenobiotics, which refer to natural substances or drugs that enter the body. 17

Traditionally, has been divided into Phase I and Phase II reactions. During Phase I, metabolism is catalyzed by cytochrome P450 (CYP) enzymes, and functional groups on the substrates are added, unmasked, or destabilized, making them more prone to further metabolism or excretion. Phase II metabolic enzymes, also referred to as conjugation enzymes or transferases, conjugate their substrates by adding hydrophilic functionality, changing the physicochemical properties of the parent compound to make them more hydrophilic and, thus, easier to excrete. Conjugations also change the biological properties of metabolized compounds, usually rendering them less bioactive. The most common and important Phase II reaction types include glucuronidation, sulfonation, glutathione conjugation, acetylation, methylation, and amino acid conjugation (Jančová and Šiller, 2012). Glucuronidation, sulfonation, and methylation are directed at altering phenolic functionality. In humans, several families of Phase II metabolism enzymes catalyze these reactions: the cytosolic sulfotransferases (SULTs), the UDP-glucuronosyltransferases (UGTs), and catechol O-methyltransferase (COMT). Enzymes in these families have differential and often overlapping substrate specificity and tissue expression profiles. Most clinical, in vivo, in vitro, and in silico research performed in the field of drug metabolism has focused on Phase I CYP enzymes, due to their major contributions to the elimination of known drug molecules (Williams et al., 2004). Phase II metabolism has attracted less interest; however, the enzymes involved in Phase II metabolism are beginning to be recognized as important players in the metabolism of many pharmacologically relevant compounds. UGTs, SULTs, and COMT have all been associated with clinically relevant polymorphisms and present high inter- individual and inter-species variation (Bairam et al., 2018b; Egan et al., 2001; Miners et al., 2002). The detailed understanding of this branch of metabolism is of high interest, especially in the context of personalized medicine.

In this thesis, the Phase II metabolizing enzymes soluble COMT (S-COMT) and SULT1A3 and their substrates have been studied computationally, using three- dimensional quantitative structure-activity relationship (3D QSAR) models. These models correlate the experimentally measured binding affinities and turnover of the substrates with the molecular steric and electrostatic fields that define the enzyme- substrate interactions. Building 3D QSAR models requires a consistent compound dataset, containing measured data obtained through standardized assay conditions, and 18 1Introduction

methods for the description of substrate molecular structures that are sufficiently accurate to capture the determinants of the interactions between substrates and enzymes. For

COMT, pKa values are key drivers of high-affinity ligands, and the separation of acidity from other parameters that explain activity was studied in detail in (II). Adding pKa to the independent variables in the models required an additional study, to improve the pKa predictions of p-vinylphenols, which are important COMT-binding chemical scaffolds. COMT is also a well-known drug target in clinical use, and the models developed in this thesis pave the way for the development of a new generation of COMT inhibitors that are capable of penetrating the brain, which have been under development at Orion Pharma Research and Development (R&D) and elsewhere.

Extracting binding hot spots from the binding site structural information for several related proteins is the topic of the last section of this thesis. The field-based methods that were developed for nuclear receptors (NRs) in (IV) were later tested for the identification of different SULT isoenzyme substrates.

The next chapter presents a short review of the available literature regarding the Phase II metabolism of phenolic compounds in humans, with a focus on the enzymes that were studied and modelled in this dissertation: the cytosolic sulfotransferase SULT1A3 and COMT. These two enzymes share one important characteristic: they are specific to catecholamine substrates, rendering them highly relevant as modulators of endogenous catecholaminergic signaling pathways. The basis for the computational approaches used to model substrate specificity and the structure-activity relationships (SARs) of these enzymes is also presented. The other known cytosolic SULTs and UGTs are also briefly discussed, due to their structural and functional similarities with SULT1A3 and COMT, their overlapping substrate specificity, and/or their clinical relevance for the conjugation of phenolic drugs. 19

2 Review of the literature

2.1 Relevance of phenolic compounds in biology

Phenols form a diverse class of compounds that are chemically defined by the presence of an aromatic carbon atom connected to a hydroxyl (-OH) moiety (Figure 1). In biology, phenols exist ubiquitously; produced by all living organisms, from bacteria to plants and animals, they execute diverse essential functions. Phenols are important building blocks in proteins and macromolecular structural scaffolds, act as signal transmitters in biological communication pathways, exert antioxidant effects that protect organisms from oxidative stress, and many plants have evolved phenols as chemical defense mechanisms against harmful microorganisms and pests (Boudet, 2007; Pereira et al., 2009). Phenols are produced ubiquitously in the plant kingdom, and humans are exposed to a huge variety of dietary phenolics every day (Delgado et al., 2019). During evolution, protective mechanisms have been developed to control the bioactivities of this class of xenobiotics, which can bind to various target proteins in the human body. UGTs, SULTs, and COMT enzymes are essential components of this defense system, which also represents a major barrier to the of therapeutic compounds that possess phenolic functionality (Hu, 2007).

In humans, in addition to tyrosine, which is a naturally occurring amino acid that comprises nearly 10 percent of all human proteins, on average (Smith, 1966), the neurotransmitters and are important phenolic compounds, which are synthesized via the tyrosine and metabolic pathways, respectively. Other classes of endogenous phenols include hydroxylated steroid hormones, such as estradiol, which is synthesized from cholesterol, alpha-tocopherol (vitamin E), and pyridoxal phosphate (vitamin B6).

2.2 Phase II enzymes in the metabolism of phenolic compounds

To cope with the chemo-biological pressure associated with dietary and environmental xenobiotic phenols and to regulate the activities of endogenous phenolic signaling molecules, several Phase II metabolic enzymes have evolved. Three of these enzyme 20 2 Review of the literature

families target the phenolic hydroxyl group: the cytosolic SULTs, the UGTs, and the O- methyltransferases, including the most clinically relevant member COMT. UGTs and SULTs can also conjugate substrates with other nucleophilic functionalities, and N- glucuronidation and N-sulfonation reactions are commonly observed (Gamage et al., 2006; Kaivosaari et al., 2011). All of these enzymes transform the physicochemical and biological target affinities of their substrates by attaching different conjugating groups to the acceptor moieties as listed in Table 1.

Table 1. Typical substrates, co-substrates, and transferring groups associated with COMT, SULTs and UGTs.

Enzyme Substrate Co-substrate Transferred group

COMT Ar-OH Methyl (-CH3) (catechol)

3'-Phosphoadenosine-5'- phosphosulfate (PAPS)

- SULT Ar-OH Sulfonate (-SO3 )

Other hydroxyl or amine

S-adenosylmethionine (SAM)

UGT Ar-OH Glucuronic acid

Other hydroxyl, amine or thiol

Uridine diphosphate glucuronic acid (UDPGA) 21

In the drug discovery setting, Phase II metabolic enzymes play an important role due to the prevalence of phenolics among approved and investigational drug molecules. In the DrugBank database, this substructure is found in more than 270 approved or withdrawn drug entries (Wishart et al., 2018). In addition, the oxidative metabolism of aromatic rings by Phase I CYP enzymes produces phenolic metabolites, which are active, in many cases, and play important roles in the pharmacological efficacy and toxic effects associated with drug treatments. These metabolites can be further metabolized and inactivated by SULTs, UGTs and COMT (Kalgutkar et al., 2005).

The biotransformation of phenolic neurotransmitters and hormones by SULTs, UGTs and COMT has been shown to modulate the activities of these endogenous signaling molecules (Bock, 2015; Cook et al., 2017; Mueller et al., 2015; Scheggia et al., 2012). Phase II metabolites can also act as inactive storage reservoirs of signaling compounds, which can be re-activated by releasing the parent substrate, as exemplified by the catecholamines noradrenaline, , and, especially, dopamine sulfate (Eisenhofer et al., 1999; Wang et al., 1983).

During xenobiotic metabolism, conjugation by Phase II metabolic enzymes usually renders the xenobiotic substrates more hydrophilic, less permeable through biological membranes, and less active. Usually, the product conjugates are readily excreted, and the final result is the inactivation of potentially harmful bioactive dietary compounds. However, in some cases the conjugated metabolites can be active, resulting in bioactivation or toxicity, as has been demonstrated for the well-known cases of morphine glucuronide (Paul et al., 1989), acyl glucuronides of carboxylic acids (Langguth and Benet, 1992), glucuronides and sulfates of hydroxamic acids (Mulder and Meerman, 1983), and the numerous reactive metabolites formed after O-sulfonation (Glatt, 2000). However, these well-characterized bioactivation mechanisms utilize chemical groups other than phenol as acceptors of the conjugation reaction.

Inter-individual variations cause differences in the efficacy of drug treatments, which is a key problem during drug discovery, and for healthcare in general (Turner et al., 2015). Genetic differences in Phase II metabolic enzymes have been identified as a major contributor to this variance, especially for UGTs (Fowler et al., 2015; Hirvensalo et al., 2018; Takano and Sugiyama, 2017; Wang et al., 2012), leading to significant variations 22 2 Review of the literature

in the pharmacokinetics of the affected drugs during clinical trials and therapy. In the future, better-defined patient populations and optimized drug treatments, due to the increased understanding and better predictions of the metabolic pathways associated with a given drug, and the identification of individual enzyme isoforms will likely represent important pieces in the grand puzzle that is personalized medicine. In addition to inter- individual variations, inter-species differences in Phase II metabolism can be a serious issue during drug development, causing difficulties during the translation of the pharmacodynamic and pharmacokinetic properties of an investigational drug from the result of in vitro and in vivo studies to application in clinical studies. In silico models could be very helpful when combined with experimental models, to properly handle these risks. In the next chapters the UGTs, SULTs, and COMT enzymes will be discussed, along with their clinical relevance. UGTs and SULTs are the most important drug- conjugating enzymes, responsible for roughly 40% and 20% of all drug metabolism in Phase II, respectively (Dixit et al., 2017). In addition to xenobiotic metabolism, COMT and SULTs play important roles in endogenous signaling pathways. The focus of this literature review is on SULT1A3 and COMT enzymes, which were studied in detail in this dissertation. In addition, the computational methods that have been used to predict the metabolism catalyzed by these major Phase II enzymes are described.

2.3 Glucuronidation by UGTs

Glucuronidation by UGT enzymes represents the most important Phase II metabolism route for xenobiotic phenols and drugs (Rowland et al., 2013; Williams et al., 2004). The human UGT superfamily consists of 19 functional proteins that can potentially glucuronidate phenolic drugs. Nine of these belong to the UGT1A subfamily, three to the UGT2A subfamily, and seven to the UGT2B subfamily (Mackenzie et al., 2005). The liver is the most important organ responsible for glucuronidation, but UGTs are expressed ubiquitously in other human tissues, including the intestine, lung, kidney, brain, and skin (Court et al., 2012; Ohno and Nakajin, 2009). Even though the UGT superfamily is present in most tissues, remarkable differences in the tissue distribution profiles of individual UGT isoforms can be observed (Court et al., 2012). 23

The UGT enzymes were not examined in this thesis, and the interested reader is referred to a good summary and discussion of the available structural information, substrate specificity, molecular mechanisms, and protein-protein interactions of UGTs (Fujiwara et al., 2016).

2.4 Methylation by COMT

The COMT enzyme was discovered and its importance for catecholamine catabolism was described by the pioneer Julius Axelrod, in 1958 (Axelrod and Tomchick, 1958). The catecholamines dopamine, adrenaline, and noradrenaline are important transmitters and hormones in the and adrenergic signaling pathways of the brain and peripheral tissues. COMT methylates the catecholic hydroxyl of these messenger molecules, disrupting the binding with target receptors and turning the signal off. The COMT enzyme plays an important role in the metabolic barrier, protecting the body from xenobiotic, plant-derived, and potentially harmful dietary polyphenols (Chen et al., 2014).

2.4.1 Clinical relevance of COMT

COMT is an important enzyme in clinical pharmacotherapy. The DrugBank (Wishart et al., 2018) contains almost 30 different approved catechol drugs, which primarily target dopamine receptors or (nor)adrenergic pathways. All of these drugs are potential substrates of COMT, as exemplified by the extensive metabolism of the dopamine precursor L-dopa by COMT in peripheral tissues. This metabolic route is so important that the COMT inhibitors , , and opicapone are used as adjunctive therapy for Parkinson’s disease, together with dopa decarboxylase inhibitors, to improve the bioavailability and plasma exposure of L-dopa (Fabbri et al., 2018; Männistö and Kaakkola, 1999). COMT is also considered to be a potential CNS drug target, due to its role in dopamine metabolism in the prefrontal cortex (Käenmäki et al., 2010), where COMT and monoamine oxidase (MAO) enzymes metabolize dopamine to 3,4- dihydroxyphenolacetic acid (DOPAC), 3-methoxytyramine, and homovanillic acid (HVA), as shown in Figure 2. Genetic evidence and proof-of-concept animal and clinical studies have suggested that COMT, especially the fast metabolizing Val158Met 24 2 Review of the literature

genotype, is associated with cognition and memory function, which may be improved by COMT inhibitors (Apud et al., 2007; Giakoumaki et al., 2008; Scheggia et al., 2012; Takizawa et al., 2009; Tunbridge et al., 2004). Although the effects vary across studies, this COMT isomorphism has been associated with cognitive function, especially in schizophrenia and Parkinson’s disease (Egan et al., 2001; Schacht, 2016; Tang et al., 2019), and has been associated with cognition-improving efficacy as observed by the administration of tolcapone to healthy subjects (Apud et al., 2007; Giakoumaki et al., 2008).

Figure 2. Dopamine metabolism pathways, demonstrating the contributions of Phase II metabolizing enzymes COMT, UGT, and SULT1A1/A3 (PST). Reprinted with permission from (Meiser et al., 2013). 25

2.4.2 The human COMT gene, protein expression and localization

COMT, which is associated with enzyme classification number EC 2.1.1.6 (Bairoch, 2000), exists in humans as two different isoforms, soluble (S-COMT) and membrane- bound (MB-COMT) (Lundström et al., 1995). Both COMT isoforms are coded by a single gene, which can be transcribed into two mRNA transcripts of different lengths, with the shorter transcript coding only S-COMT and the longer transcript coding both S-COMT and MB-COMT (Männistö and Kaakkola, 1999; Tenhunen et al., 1994). Because the same gene codes them, the COMT isoforms have identical amino acid sequences, except that MB-COMT has an additional 50-residue long sequence attached to the N-terminus. This N-terminal chain forms a 26-amino-acid-long transmembrane helix and a 24-amino- acid-long flexible linker that connects the transmembrane domain to the catalytic domain (Orłowski et al., 2011). Both COMT isoenzymes are widely expressed in almost all human tissues, especially the liver, gut, and kidney (Männistö and Kaakkola, 1999). The balance between MB- and S-COMT expression levels varies in different tissues. S- COMT is considered to be more important for xenobiotic metabolism, which is reflected by its high level of expression in the gut and liver, whereas MB-COMT is more important for the metabolism of endogenous catecholamine neurotransmitters in the human CNS, where an estimated 70% of total COMT protein presents as the MB-COMT isoform (Männistö and Kaakkola, 1999; Tenhunen et al., 1994). Catecholamines, such as dopamine, have also been associated with higher affinities for MB-COMT than for S- COMT, further supporting the importance of MB-COMT in the CNS (Lotta et al., 1995). In cells, S-COMT is located in the cytosolic fraction; however, controversial results have been reported regarding whether the catalytic domain of MB-COMT faces the extracellular or cytoplasmic side of the cell membrane (Chen et al., 2011; Myöhänen et al., 2010). In the previous section, a clinically relevant COMT polymorphism was discussed, in which the amino acid Val108 is substituted with a Met residue (or Val158 to Met, for MB-COMT). This change, which is located outside of the active site, does not significantly affect the kinetic enzymatic parameters (Lotta et al., 1995). The observed reduction in mutant COMT activity, in vivo, has been associated with the thermal instability of the Met form, especially when the concentrations of the co-substrate S- adenosylmethionine (SAM) are low (Lotta et al., 1995). 26 2 Review of the literature

2.4.3 Structure and function of COMT

As previously discussed, the COMT enzyme catalyzes the methyl transfer reaction from the ubiquitous donor SAM (also referred to as AdoMet) to the catecholic acceptor substrate. The reaction is Mg2+-dependent and produces a methylated substrate and a de- methylated co-substrate, S-adenosylhomocysteine (SAH). Detailed structural studies of the various forms and states of the enzyme, complexed with different types of ligands, have been published since the first X-ray crystal structure of COMT became available, in 1994. In the first published crystal structure, rat S-COMT was crystallized in complex with SAM, Mg2+, and the inhibitor 3,5-dinitrocatechol (Vidgren et al., 1994). This complex was thought to mimic the Michaelis complex, in which the reacting chemical groups come together in a perfect preorganization, as shown in Figure 3. A very similar active complex has been observed when other nitrocatechol inhibitors are co-crystallized with human or rat COMT. However, considerable structural flexibility and different conformational states have been observed in COMT crystal structures, with different types of inhibitors (Buchler et al., 2018; Harrison et al., 2015; Lerner et al., 2016, 2001; Tsuji et al., 2009) and in apo, semi-holo, holo, or transition state analog states (Czarnota et al., 2019; Ehler et al., 2014). 27

Figure 3. The active site of COMT in complex with the inhibitor 3,5-dinitrocatechol chelated to Mg2+, with its 2-hydroxy group pointing to the activated methyl group of the cofactor SAM, mimicking the Michaelis complex of the substrate. Coordination bonds with Mg2+ and the distance between the methyl carbon atom of SAM and the 2-hydroxy oxygen atom of the inhibitor are visualized with dotted lines. The molecular surface of the active site is colored according to amino acid properties (RCSB PDB code 5LSA).

The mechanism and structural basis of the methylation reaction and COMT inhibition have been studied extensively in several studies, using enzyme kinetics and computational methods (Bunker et al., 2008; Czarnota et al., 2019; Kuhn and Kollman, 2000; Lautala et al., 2001; Lotta et al., 1995, 1992; Magarkar et al., 2018; Ovaska and Yliniemela, 1998; Palma et al., 2012, 2003; Taskinen et al., 1989; Tervo et al., 2003; 28 2 Review of the literature

Zheng and Bruice, 1997). Despite the wealth of information available, the considerable structural flexibility and the quaternary catalytic complex, involving metal interactions, remains challenging to model and to understand exactly how different ligands bind and interact with COMT.

2.5 Sulfonation by SULTs

The sulfonation of phenols by SULTs was first discovered by Baumann, in 1876, when he isolated sulfate conjugates of phenols from urine (Baumann, 1876). Baumann reported that the sulfate conjugates were much more abundant in horse urine than in dog and human urine, which demonstrates that the evolutionary pressure for the sulfate conjugation of dietary phenolics has varied among species. Two different classes of human SULTs have been characterized: cytosolic and membrane-bound SULTs. In cells, membrane-bound SULTs are localized in the Golgi apparatus, where they sulfonate peptides, oligosaccharides, and lipids, producing biochemicals that are essential for life (Negishi et al., 2001). The cytosolic SULT enzymes discussed in this thesis can be classified into families, sub-families, and individual isoforms, based on sequence similarities and overall structures. The families, sub-families, and individual members share at least 45%, 60%, and 97% amino acid sequence identity (Blanchard et al., 2004). The human cytosolic SULTs can be classified into four families: SULT1, SULT2, SULT4, and SULT6 (Allali-Hassani et al., 2007). The cytosolic SULTs metabolize endogenous and exogenous small molecules, such as neurotransmitters, hormones, drugs, and toxins (Coughtrie, 2002; Gamage et al., 2006; Negishi et al., 2001). Due to the chemical nature of the sulfate group, sulfonation by SULTs renders the parent molecules more water-soluble and more easily excreted. When the substrate is a signaling molecule or a potentially toxic xenobiotic molecule, the sulfonated product is inactive, or at least less active, which is why SULTs are considered to be components of the chemical defense and protection system (Allali-Hassani et al., 2007). In some rare cases, SULT-catalyzed reactions can also produce harmful and mutagenic products (Glatt, 2000). 29

2.5.1 SULT isoforms

The biological functions, tissue distributions, and substrate selectivities of cytosolic SULTs differ among the families, subfamilies, and individual enzymes; however, considerable overlap between enzymes exists. SULT1 is the largest human SULT family, with two subfamilies and nine individual isoenzymes: 1A1, 1A2, 1A3/4 (SULT1A3 and 1A4 produce the same protein), 1C1, 1C2, 1C3, 1B1, and 1E1. The SULT2 family contains three enzymes, 2A1, 2B1a, and 2B1b. SULT4A1 is the sole member of its family, which is expressed in the brain and other tissues, but no substrates or functions have been identified (Sidharthan et al., 2014). The SULT6 family contains only one gene (6B1), and the expression of SULT6B1 has been detected at the transcript level in the human testis, but no corresponding protein has been characterized (Freimuth et al., 2004). The SULT1 family is primarily responsible for sulfonating simple phenols (especially SULT1A1), estradiol (SULT1E1), catecholamines (SULT1A3), thyroid hormones, and xenobiotics (Allali-Hassani et al., 2007). The SULT2 family is primarily involved in the metabolism of cholesterol and other steroids, such as dehydroepiandrosterone (Coughtrie, 2002). Considerable variation in the tissue expression profiles of the major SULT enzymes has been reported; for example, SULT1A1 is the predominant form expressed in the liver and kidney, whereas SULT1A3 and SULT1B1 are more important in the gut (Riches et al., 2009). Interestingly, SULT1A1 and SULT1A3 are expressed in the brain, where they have different localization patterns and likely have different functions in the modulation of neurotransmitters (Salman et al., 2009). Considerable polymorphism has been identified in the human SULT family, with demonstrated effects on sulfonation activity among the individual isoforms for model substrates and drugs metabolized by sulfonation (Bairam et al., 2018a, 2018b; Hildebrandt et al., 2007). The clinical importance of this genetic variability is not yet completely understood; however, these differences should be considered in any personalized pharmacotherapy setting when using drugs that may be metabolized by SULTs.

2.5.2 Clinical relevance of SULTs

Although the cytochrome P450 enzymes and UGTs are responsible for the major proportion of drug metabolism (Williams et al., 2004), a considerable number of clinically 30 2 Review of the literature

used drugs and their active metabolites contain suitable acceptor groups for sulfonation and are known or potential SULT substrates. The best-known examples of sulfonated compounds include the catecholamine signaling compounds adrenaline, noradrenaline, dopamine, and their derivatives, which are also used as drugs. Another example is the widely-used analgesic agent , which is primarily metabolized into glucuronide and sulfate conjugates (which represent approximately 50% and 40% of urinary metabolites, respectively) and oxidized, to a lesser extent, by CYP enzymes into the highly reactive and toxic metabolite N-acetyl-p-benzoquinone imine (NAPQI), which has been associated with paracetamol overdose toxicity and death (Mazaleuskaya et al., 2015; Yamamoto et al., 2015). The involvement of SULT1A3 and SULT1A1 isoenzymes in paracetamol metabolism has been known since early 1980’s (Reiter and Weinshilboum, 1982), and later, in a detailed analysis, SULT1A1, SULT1A3, and SULT1C4 were determined to be capable of metabolizing paracetamol (Yamamoto et al., 2015). Genetic variations among these SULTs have been associated with changes in metabolism, such as the effects of genetic variations in SULT1A3 on the metabolism of paracetamol and opioid drugs (Bairam et al., 2018b). In silico predictions for all 1455 small molecule drugs approved by the U.S. Food and Drug Administration were performed to determine the likelihood that these molecules would be substrates for SULT1A1 or SULT2A1 (Cook et al., 2013c), and 92 of these drugs (6%) were experimentally confirmed to be substrates of at least one of these two enzymes, which can be used as a lower boundary estimate for the number of drug molecules that may be SULT substrates.

SULT1A3, which was studied in this thesis, is a highly interesting SULT for many reasons beyond its function in xenobiotic metabolism. SULT1A3 is highly specific for catecholamines, and similar SULTs does not exist in non-primate species. Recently, an allosteric modulation mechanism for SULT1A3 was discovered, which was linked to catecholaminergic neurotransmission (Cook et al., 2017). This finding makes SULT1A3 a putative target for novel pharmacotherapies. 31

2.5.3 Structure and function of SULT1A3

The structural knowledge of human cytosolic SULTs is based on the good coverage of X-ray crystal structures, obtained for all major enzymes in this superfamily. The structural basis of SULT function was established when the mouse estrogen SULT crystal structure, bound to estradiol and the inactive cofactor 3'-phosphoadenosine-5'-phosphate (PAP), was published in 1997 (Kakuta et al., 1997), followed by the same enzyme co-crystallized with PAP and a vanadate ion, modelling the transition state of the sulfonation reaction (Kakuta et al., 1998). The first human SULT crystal structure was obtained from SULT1A3, co-crystallized with PAP, though in that structure the substrate-binding pocket was partially disordered (Dajani et al., 1999). When the structures of five missing human SULTs were published in 2007, the structural and functional view of the superfamily was quite complete, as previously reviewed (Allali-Hassani et al., 2007). At the same time that the crystal structures were published, the substrate selectivities of the isoenzymes were being studied systematically (Allali-Hassani et al., 2007).

All SULT crystal structures show similar overall structures, with a four-stranded parallel beta-sheet in the middle, alpha-helices, and three intrinsically flexible and many times disordered loops (Allali-Hassani et al., 2007). These loops stabilize gradually, first when the cofactor binds and then when the substrate or inhibitor binds. Depending on the structure of the ligand, different conformations of the flexible loops have been observed. This flexibility has also been thoroughly studied using MD simulations (Cook et al., 2013a; Martiny et al., 2013). The 3'-phosphoadenosine-5'-phosphosulfate (PAPS) cofactor binding pocket and the catalytically critical histidine and lysine residues are conserved in all structures. As previously discussed, SULT1A3 is unique among the SULTs due to its specificity towards catecholamines. SULT1A3 is inactive against acidic phenols, in contrast with SULT1A1, although the enzymes have very similar (> 93%) amino acid sequences (Allali-Hassani et al., 2007). Using site-directed mutagenesis studies, this specificity difference has been shown to be caused by a single amino acid, Glu146, in the SULT1A3 structure (Dajani et al., 1998). This observation has been confirmed by the crystal structure of SULT1A3 in complex with PAPS and dopamine, in which the positively charged dopamine amino group shares strong electrostatic interactions with the acidic side chains of Glu146 and Asp86, which stabilizes the flexible 32 2 Review of the literature

gate-keeper loop (Figure 4). Without this stabilization, the substrate-binding pocket changes to a drastically more open conformation, as demonstrated by MD simulations (Martiny et al., 2013). Overall, the structural basis of sulfonation by SULT1A3 is well understood; however, the intrinsic flexibility and induced fit effects observed in the active site are challenges for the structure-based modelling of novel substrates.

Figure 4. Human SULT1A3, in complex with the inactive cofactor PAP and dopamine (PDB access code 2A3R). A salt bridge, stabilizing the flexible gate-keeper loop, is formed between dopamine ethylamine, Glu146, and Asp86. The catalytic His108, which interacts with the acceptor phenolic oxygen atom of the substrate, is also highlighted.

33

2.6 Computational methods for Phase II metabolism

2.6.1 General considerations

The challenge of modelling Phase II metabolic enzymes at the molecular level and predicting the binding and interaction of any given small molecule with the involved enzyme targets is generally very similar to the computational modelling used for any ligand-protein binding, virtual screening, or SAR study. Depending on the availability of good quality data, methodological choices must be made, including what type of methods will return results with sufficient accuracy for the problem in question, and how accessible the different methods are, considering both time and computational costs. Regardless of the chosen methodology, certain components and prerequisites must exist or be considered to develop any useful models and to assess the model quality and applicability. The most important ingredients for a successful model are as follows.

a) Data. As with any predictive model building, a certain amount of high-quality experimental data is necessary. Preferably, the experimental data set used for training should be consistent, using the same methods and experimental conditions for all measurements. Because the available data sets for Phase II enzyme substrates are quite small, this represents the most critical bottleneck, which limits the usability of modern high-performing, but data-hungry, deep- learning methods. b) Structural knowledge of the involved enzymes and the binding modes of the substrates. The small data sets that are currently available for Phase II metabolism do not provide sufficient information to sample enough chemical space variations to determine differences in binding affinity and catalytic reaction rates, without detailed structural knowledge that can be used to constrain the model space. For example, binding conformations and the relative binding poses of putative substrates can be derived from docking and binding site analysis and then used to guide ligand-based model building. The quality of the published protein structural information must be analyzed thoroughly. Several problematic structures can be found in the Protein Data Bank (PDB), due to crystallization artifacts, crystal quality and data capture issues, and differences and errors in data processing and 34 2 Review of the literature

storage procedures (Domagalski et al., 2014). One very important aspect of structural knowledge is protein flexibility, which produces different conformational states that affect whether the metabolic enzymes are accessible for substrate binding. As discussed in previous sections, flexibility and the ability to accommodate diverse ligands is typical for enzymes involved in xenobiotic metabolism, including SULT1A3 and COMT, our enzymes of interest. In general, enzyme flexibility can be addressed using MD simulations, by generating conformational ensembles of enzyme structures for docking, or by studying the dynamics of bound enzyme-substrate complexes, as exemplified for SULTs in several studies (Cook et al., 2013c, 2015; Rakers et al., 2015). c) Appropriate descriptions of the system. An important determinant of susceptibility for a given Phase II metabolic reaction is the ability of the substrate to act as an acceptor for the conjugated moiety at the binding site of the enzyme. With sufficient structural knowledge of the system, the reaction that occurs within the system can be modelled using quantum mechanical (QM) calculations, combined with molecular mechanics, in some cases. For example, the catalytic reaction mechanism and electrostatic effects of COMT ligands have been studied thoroughly, using detailed simulations (Czarnota et al., 2019; Kuhn and Kollman, 2000; Lautala et al., 2001; Ovaska and Yliniemela, 1998; Trievel and Scheiner, 2018; Zheng and Bruice, 1997). The results from these types of theoretical studies can help choose adequate surrogate descriptors for ligands, which are easier to calculate but still capture the essential factors of interaction.

2.6.2 Structure-based methods

If detailed information regarding a studied protein structure is available, preferably in complex with ligands similar to the small molecules of interest, structure-based molecular modelling methods are generally considered first. A comprehensive overview of the myriad structure-based methods developed for drug design is outside of the scope of this thesis, but most common tools, including molecular docking for virtual screening and binding prediction, fragment docking and growing, MD simulations for protein flexibility, and relative ligand binding energies, have previously been well-reviewed (Śledź and Caflisch, 2018). A variety of methods have proven to be useful during the 35

analysis of ligand binding pocket properties and have been used to guide ligand design (Henrich et al., 2010). Similar methods are directly applicable to the analysis of metabolic enzyme active sites. MD simulations, docking, and other structure-based methods have also been used to study the structure, function, ligand recognition, and reaction mechanisms of COMT (Czarnota et al., 2019; Kuhn and Kollman, 2000; Magarkar et al., 2018; Palma et al., 2012, 2003; Tervo et al., 2003; Trievel and Scheiner, 2018) and SULT1A3 (Campagna-Slater and Schapira, 2009; Cook et al., 2017; Martiny et al., 2013).

2.6.3 Ligand-based methods

The structures of known ligands that bind to or modulate the target in question and the corresponding bioactivity data can be used as inputs for ligand-based molecular modelling workflows. In principle, these methods aim to identify correlations between ligand structures and activity. The inclusion of bioactivity and structure data for ligands that are known to be inactive is equally important. QSAR refers to the performance of this type of modelling in a quantitative manner. QSAR modelling is based on the correlations observed by Hansch, in 1962, between the biological plant growth regulatory activities of a series of phenoxyacetic acids and their logP values (Hansch et al., 1962). When the ligand description in the model is dependent on the 3D conformations of the ligands, we refer to the method as 3D QSAR. QSAR is a routine tool used during drug discovery to explain the behaviors of different biological systems. A variety of QSAR methods have been used to study the inhibition or substrate specificity of COMT (Ai et al., 2008; Jatana et al., 2013, p. 3; Lautala et al., 2001; Lotta et al., 1992; Patel et al., 2018; Taskinen et al., 1989; Tervo et al., 2003). QSAR modelling has also been applied to human SULT1A3 substrates (Dajani et al., 1999; Tan and Zhang, 2014). Several best practices and various methods have been developed for QSAR. Many established methods exist to generate compound structure descriptors, to correlate biological activities with the structural descriptors, using statistics or machine learning, and to validate the model quality and predictivity (Cherkasov et al., 2014). In the next section, aspects of good 3D QSAR models that are relevant for the models developed in this thesis are discussed, briefly. 36 2 Review of the literature

2.6.4 Comparative molecular field analysis (CoMFA)

Comparative molecular field analysis (CoMFA) is a popular method for 3D QSAR modelling, developed by Cramer, in 1988, to study how the molecular shape and electrostatic characteristics of steroids affect the binding to carrier proteins (Cramer et al., 1988). Based on the original method, a CoMFA study consists of the following steps:

1. Appropriate conformations of the ligand structures are calculated, and the structures are aligned. Usually, the aim is to mimic bioactive conformations and binding poses, if the activity to be predicted is activity on a certain protein target. 2. The steric and electrostatic fields are calculated. 3. Statistical analysis of the fields is performed, using partial least squares (PLS). 4. The model is validated for internal stability and external predictivity.

For all of these steps, best practices have been developed, which is especially important during the model validation step, as in any QSAR model (Golbraikh and Tropsha, 2002). One of the strengths of the CoMFA method is the ability to visualize the field coefficients as contour maps around superimposed structures, providing an interpretable map of the ligand regions, where variations in the fields correlate with changes in activity. Since its invention, CoMFA has become a standard tool for QSAR modelling within a congeneric series of molecules, and it is perhaps the most used 3D QSAR method. A search in Google Scholar for the term “CoMFA” returns over 20 000 scientific articles (“Google Scholar,” 2020). The most critical pitfalls associated with CoMFA have been reviewed many times (Kim et al., 1998; Roy et al., 2015), and improvements that have been made to the original methodology have also been discussed thoroughly (Cherkasov et al., 2014; Cramer, 2012, 2003).

2.6.5 Protein binding site analysis with knowledge-based fields

During the period after the year 2000, when the original studies discussed in this thesis were performed, the public availability of protein structural information increased substantially. The number of protein structures available in the PDB (www.rcsb.org) increased from approximately 12 000 to 65 000 between 2000 and 2010, and the growth has continued (Berman et al., 2000), as visualized in Figure 5. 37

Figure 5. Number of protein structures available in the PDB (“PDB statistics,” 2020)

However, even with this surge of novel protein structures, many important components of the human proteome lack adequate structural information. For example, no published structures of human UGTs are currently available, despite these being the most important Phase II metabolic enzymes. As discussed in section 2.6.1, the quality of the published structures must also be considered. As the wealth of publicly available proteins and protein-ligand complex structures has grown with accelerating speed, a plethora of computational methods have simultaneously been developed for protein binding site detection, characterization, similarity comparisons, and druggability analyses (Henrich et al., 2010). Most of the tools for binding site comparisons have been developed to exploit the differences in binding sites between closely related proteins, for specificity design, or to fish out putative off-targets for a given drug candidate from a large protein structure database. Similar to how molecular interaction fields have been successfully used for ligand-based applications, such as CoMFA, which was described in the previous section, field-based methods can be used for protein binding site analyses. Different types of interaction fields have been used, such as the fields calculated by GRID software, using different probes and interaction energies calculated by empirical force fields (Goodford, 1985), or the hydropathic interaction (HINT) fields, which are based on hydrophobic atom constants and the solvent-accessible surface areas of atoms (Kellogg et al., 1991). 38 2 Review of the literature

Other types of fields can be derived by solving the electrostatic potential, using the Poisson-Boltzmann equation, or through the statistical analysis of atomic binding preferences, based on crystal structure databases (Laskowski et al., 1996; Verdonk et al., 1999). These knowledge-based potentials can provide a fast method for calculating the contact preferences of atoms or molecule fragments at a given protein site and have been used successfully for docking and binding site analysis (Ermondi et al., 2004; Hoppe et al., 2006). 39

3 Aims of the Study

Understanding the roles and structure-activity relationships of Phase II conjugating enzymes is critical during drug discovery when analyzing phenolic candidate compounds or phenolic active metabolites, which are often produced by Phase I oxidative metabolism. The underestimation of specific isoenzyme contributions to metabolism can alter the pharmacokinetic properties of phenolic bioactive compounds, resulting in unwanted and costly surprises during the translation from tests in animal species to clinical trials. In addition, variations in the effects and safety of new investigational treatments among patient populations with different Phase II enzyme genotypes may become an issue. The general goal of this study was to provide the tools and understanding necessary to estimate the metabolism of potential COMT and SULT substrates, based on their chemical structures. Therefore, the following specific studies were planned:

● SAR modelling of SULT1A3-mediated sulfonation (I) ● SAR modelling of COMT-mediated methylation (II, III) ● The application of 3D structural information for the ligand-binding pockets of a protein family, for the modelling of ligand and substrate specificity (IV)

Another goal was clarified after the initial analysis of known COMT substrates and inhibitors, which revealed that much of the variation compound affinities could be explained by variations in the pKa values. Although our studies on COMT focused on substrate specificity, we were constantly discussing the unmet need of brain-permeable COMT inhibitors. Inspired by the performance of CoMFA models, we attempted to isolate the electrostatic (or pKa-driven) and steric effects that regulate COMT binding, with the goal of generating tools for the development of less acidic COMT inhibitors.

In addition to creating useful models for prospective work and predictions, we aimed to better understand the factors that affect methylation and sulfonation by interpreting the models and to determine how the 3D structural information encoded in the molecular interaction fields of small molecules and their protein counterparts can be used in these types of studies.

41

4 Methods

4.1 Experimental enzyme kinetic parameters of SULT1A3 (I)

A set of 95 phenolic substrates was studied for their kinetic parameters (Km and Vmax) against SULT1A3. Part of the data set was determined in our lab using the same methods described here and published previously (Dajani et al., 1999). Human recombinant SULT1A3 was expressed in Escherichia coli and purified, using a published procedure (Dajani et al., 1998). Sulfonation of substrates was quantified by following the amount of unreacted co-substrate PAP35S during the reaction (Foldes and Meek, 1973). All enzyme assays were performed in a final volume of 160 μl, containing 6 mM potassium phosphate (pH 6.8), 20 μl substrate (0.1-3000 μM), 20 μl PAPS (10 μM final concentration), containing 0.04 μCi PAP35S, and 20 μl enzyme protein (0.2 μg SULT1A3). Blank reactions contained 20 μl water, in place of the substrate. All enzyme reactions were incubated at 35°C and were terminated by the addition of 200 μl barium acetate (0.1 M), 200 μl barium hydroxide (0.1 M), and 200 μl zinc sulfate (0.1 M). The reaction mixtures were centrifuged at 14 000 u g, for 4 min, and 500 μl supernatant was mixed with 4 mL scintillation fluid. Radioactivity was quantified by liquid scintillation spectrometry.

Kinetic parameters (Km and Vmax) were determined by hyperbolic regression analysis, using the Hyper.exe software package (Easterby, 1992).

4.2 Experimental enzyme kinetic parameters of S-COMT (II)

A set of 45 catechol substrates for the soluble human COMT isoform (S-COMT) was studied for their kinetic parameters (Km, Vmax, and Vmax/Km). These results and the methods used were published previously (Lautala et al., 2001; Lotta et al., 1995).

4.3 Computational Methods (I-IV)

4.3.1 Hardware and Software

All CoMFA modelling (I-II) was performed on SGI Octane workstations using molecular modelling software packages SPARTAN (SPARTAN, v5.0) and SYBYL (SYBYL, 42 4 Methods

v6.8), to build the minimum energy conformations of substrate molecules and perform CoMFA calculations, respectively. Substrate molecular conformations were optimized and atomic partial charges were calculated by MOPAC (MOPAC, 2000). To automate the calculations, several Sybyl Programming Language (SPL) scripts were developed.

For COMT models (II) the pKa values of the substrates were estimated, using the ACD/LogD program (ACD/LogD, v4.0).

The prediction of pKa values (III) was performed using the following programs: ACD/Labs pKa DB (ACD/Labs pKa DB, v7.07), Pallas pKalc (Pallas, demo 3.1) and SPARC Performs Automated Reasoning in Chemistry (SPARC) On-Line Calculator (Hilal et al., 1995; Karickhoff et al., 1991). All Hammett σ constants for substituents were calculated using the ACD/Labs pKa DB program (ACD/Labs pKa DB, v7.07). The regression analysis was performed with the statistical software package SPSS (SPSS, v11.5.1).

The modelling of NRs and their ligand-binding pocket fields (IV) (Ruuska, 2011) was performed using Maestro (Maestro, v8.0) and MOE (MOE, v2008.10) software packages, running on Linux workstations. The Maestro implementation of OPLS_2005, a force field based on OPLS-AA (Jorgensen et al., 1996), was used during the preparation of protein structures. The clustering of ligand and coactivator binding site fields was performed with Cluster3.04 (de Hoon et al., 2004), and clustering was visualized with TreeView4 (de Hoon et al., 2004). Docking to the SULT isoenzymes was performed with Glide (v.4.5), in standard precision mode, using default settings (Friesner et al., 2004).

4.3.2 CoMFA models (I-II)

4.3.2.1 Description of substrates and their conformations

For CoMFA modelling, minimum energy molecular conformations of the substrates were created, and optimized to the semi-empirical AM1 level. For SULT1A3 models (I), Gasteiger-Hückel partial atomic charges were used. In the S-COMT models, charges from the AM1 calculations were used (electrostatic potential fitted and Coulson charges). 43

The SULT1A3 substrates were divided into three different data sets:

A) All substrates containing only one phenolic hydroxyl functionality amenable to sulfonation (n = 51). B) All substrates with rigid structures (low number of rotatable bonds) (n = 59). C) All substrates in the data set (n = 95).

For some SULT1A3 substrates, the active conformations were adjusted for the final CoMFA model, based on the information from the SULT1A3 crystal structure, as described later in the substrate alignment procedure.

4.3.2.2 Alignment of substrates

To perform substrate alignments, specific rules were created for each enzyme, based on existing knowledge regarding catalytic reactions (Creveling et al., 1972; Lotta et al., 1995), substrate specificity (Lotta et al., 1992; Taskinen et al., 1989), protein X-ray crystal structures (Dajani et al., 1999; Vidgren et al., 1994), and site-directed mutagenesis studies (Dajani et al., 1998).

For S-COMT, the following alignment rules were used:

(1) Catechol oxygen atoms and carbon atoms of all substrates were superimposed. (2) All substituents next to the catechol oxygen were aligned to the same side. (3) Polar substituents were aligned in meta positions. (4) Hydrophobic substituents were aligned in para positions.

For the initial SULT1A3 analysis [models A-C in (I)] a simple alignment procedure was used, based on the root-mean-square deviation (RMSD) fit of the substrate phenolic oxygen atom and the three carbon atoms in the attached benzene ring. 44 4 Methods

Based on the analysis of initial models and the crystal structure of SULT1A3, the following refined alignment and active conformation rules were derived:

(1) For ethylamines, the side chain nitrogen atom positions should enable the important electronic interaction between the amine and Glu146. (2) No obvious overlaps between the substrate and enzyme side chains should exist when the aligned agglomerate is transferred into the active site of the SULT1A3 crystal structure. (3) Relatively small hydrophobic substituents, in ortho and meta positions, with respect to the reacting hydroxyl group, should be aligned toward the hydrophobic pocket that is formed by the residues Tyr169, Tyr139, Pro47, and Phe142, in the SULT1A3 structure. (4) The hydrogen atoms of the reacting hydroxyl groups should point toward the catalytic base His108. (5) For catechol structures, the hydroxyl group that is not acting as a sulfonate acceptor should be aligned away from His108.

To fulfill these alignment rules, all substrates from the initial alignment were transferred to the SULT1A3 crystal structure and refined, by adjusting the alignments or conformations of the substrates as necessary. For ethylamines, the structures were re- optimized by fixing the ethylamine nitrogen atom at a position that was favorable for charge-charge interaction between the protonated amine group and Glu146 of SULT1A3. These adjustments yielded a final knowledge-based alignment, with putative bioactive conformations, which was used for the final CoMFA model [model D in (I)].

4.3.2.3 CoMFA parameters

Tripos standard steric and electrostatic fields as originally described by Cramer (Cramer et al., 1988), were used in all CoMFA calculations. Steric and electrostatic fields (see Equations 1 and 2, respectively) were calculated with a +1 charged sp3 carbon probe, a smooth transition, and a cutoff value of 30 kcal/mol. 45

(Eq. 1)

(Eq. 2)

Equations for calculating the electrostatic and steric (Lennard-Jones potential) fields. EC

= coulomb interaction energy, qi = partial charge of atom i of the molecule, qj = charge of the probe atom, D = dielectric constant, rij = distance between atom i of the molecule and the grid point j, location of the probe, EvdW = van der Waals interaction energy, and

Aij and Cij are constants that depend on the van der Waals radii of the interacting atoms.

The CoMFA grid spacing was held constant, at 2 Å, in all published models, as lowering the grid spacing did not improve the correlations in the initial studies. To optimize the orientation of the substrates with the CoMFA grid, an all space search technique was used (Hou and Xu, 2001), where the aligned substrate molecules were systematically rotated and translated in 3D space, to obtain the most consistent model.

4.3.2.4 Partial least squares (PLS) model validation

After the all space search optimization, the selected models were validated using leave- one-out and leave-n-out cross-validation procedures. During leave-one-out cross- validation, the PLS model is rebuilt, omitting each compound of the data set, one at a time, and cross-validated r2 (q2) values and standard errors of the predictions are calculated. During leave-n-out cross-validation, n compounds are omitted from the analysis each time when rebuilding the PLS models. Leave-n-out cross-validation was performed for all PLS models, using 5 and 10 randomly selected cross-validation groups (20 and 10 percent of the compounds in the data set were left out of each run, respectively). The procedure was repeated 100 times, and the mean q2 values were reported. All validation runs were computed using no column filtering and with CoMFA standard pre-analysis scaling. 46 4 Methods

After validation runs, a final CoMFA analysis was performed, with no validation, to obtain graphical contour coefficient maps showing the regions in space where steric and electrostatic variation correlates most with activity, and conventional r2 and standard error values were recorded.

4.3.2.5 Combination of CoMFA and pKa values for S-COMT substrates

As discussed previously, when describing the aims of this study, the chelation of Mg2+ ions by catechol oxygen atoms is a key feature of substrate recognition by S-COMT. Changing the electronic properties of the catechol oxygens, by introducing long-range inductive and mesomeric effects to the ring through suitable substituents, can affect this interaction. This effect was confirmed by the initial analysis of Km values for 55 S-COMT substrates [an extension of the substrate set used in (II)], where the pKa value alone was able to explain almost half of the variations in Km values (unpublished results). Based on this observation, we developed combined 1D/3D QSAR models, where the empirical Michaelis constant values were used as the Y variable, and the standard CoMFA fields and calculated pKa values of the substrates were used as the X variables during the PLS analysis (Sipilä, 2006). For this study, pKa values were calculated using the SPARC system (Hilal et al., 1995; Karickhoff et al., 1991). The previously described procedure and alignment rules were used for CoMFA.

4.3.3 pKa of substrates (II-III)

4.3.3.1 Training and test set

The pKa values of six in-house synthesized and nine purchased p-vinyl phenols were determined using potentiometric titration. For water-insoluble compounds, a co-solvent was used to dissolve the sample, and the apparent values were extrapolated to zero co- solvent content, using the Yasuda–Shedlovsky procedure (Avdeef, 1993; Shedlovsky, 1962; Yasuda, 1959). In addition to the compounds that were physically used in the study, four p-vinyl compounds with reported pKa values from the literature were added to the external test set (Brauer et al., 1964; Wikberg et al., 1993). 47

4.3.3.2 Modification of the Hammett equation

The experimental pKa values were fitted to the following equation (Eq. 3)

pKa = pKa0 + ΔpKa_vin = pKa0 + B ΣČSDUDѸ YLQ  (Eq. 3)

Where constants the pKa0 and B represent the pKa value of a plain p-vinyl phenol and the attenuation factor for the electronic substituent effects caused by the vinylic double bond, respectively. ΣČSDUDѸ YLQ is the sum of Hammett σpara--constants of the substituents in the p-position, connected to the aromatic ring through the vinylic group.

The final modified Hammett equation (Eq. 4) was derived by adding the term ΔpKa, obtained from the work by Biggs and Robinson (Biggs and Robinson, 1961) for non- vinylic substituents to the regression equation.

pKa = pKa0 + ΔpKa + ΔpKa_vin = pKa0 + A Σσ + B ΣČSDUDѸ YLQ  (Eq. 4)

4.3.3.3 Validation of the modified Hammett equation

An external test set of six p-vinyl substituted phenols was used to validate the derived Hammett equation, and the test set predictions were compared with two other available pKa prediction methods. Mean errors and their standard deviations were reported as the key performance indicators of the compared models.

4.3.4 Ligand Binding Pocket Fields (IV)

4.3.4.1 Nuclear Receptor (NR) Structures (IV)

One hundred seventy-seven human NR X-ray structures, obtained from the PDB (Berman et al., 2000), were included in the analysis. Two rat NR structures were added to provide better coverage of the NR family. The collected structures represented 35 different NRs, belonging to the subfamilies NR1 to NR5. The structures were cleaned, by the removal of non-relevant protein chains, solvents, cofactors, and ligands. Using the Maestro protein preparation tool (Maestro, v8.0), hydrogens were added, the most probable protonation 48 4 Methods

states were assigned, and hydrogen atoms were minimized to a maximum of 0.3 Å RMSD, keeping heavy atoms fixed. Missing protein amino acid side chains were modelled using Prime sidechain prediction (Prime, v1.6). After preparation, all structures were superimposed, using a consistency-based multiple alignment procedure (Ebert and Brutlag, 2006) and the Maestro protein structural alignment. After initial quality control and the inspection of the aligned structures, the following structures were removed due to quality issues and/or incompatible states with other structures in the set: 1LBD (an inactive form of RXR-a) (Bourguet et al., 1995); 1GWQ (antagonistic conformation of ER-a) (Wärnmark et al., 2002); 1L2J (antagonistic conformation of ER-a) (Shiau et al., 2002); 1NDE (ER-b with missing residues) (Henke et al., 2002); 1VJB (ERR-gamma with missing residues) (Greschik et al., 2004); 1NHZ (antagonistic conformation of GR) (Kauppi et al., 2003).

Based on the initial comparison, a representative set of 35 structures, one from each NR, was selected for further analysis. For this set, the structural superposition was improved by re-aligning all members of an NR subfamily using Maestro (Maestro, v8.0). The coactivator binding site field calculations included 33 NR structures in the agonistic conformation, and 1DKF (RAR-a) (Bourguet et al., 2000) and 3CJW (COUP-TFII) (Kruse et al., 2008), for which no agonistic structures are available.

4.3.4.2 Knowledge-Based Fields

All knowledge-based field calculations were performed using customized scientific vector language (SVL) scripts within the molecular operating environment (MOE) (Hoppe et al., 2006; MOE, v2008.10). The parametrization and field calculations implemented in MOE were used as such. During the MOE implementation, the probability density is given as a product of three 1D probability densities: distance- dependent, lone-pair interaction, and out-of-plane angle-dependent densities. These densities are approximated by analytic functions. The functions have been parametrized, using protein structures from the PDB (Berman et al., 2000) with a resolution of 2.0 Å or better. Both receptor and ligand atoms were divided into hydrophobic and hydrophilic atoms, and each atom type bears information about lone pairs so that both distance and angular information are considered by the probability density. The grid point is classified 49

as hydrophilic or hydrophobic if the probability density is above a given threshold (0.9). Two differently sized grids were used for the ligand-binding pocket, and in both cases, 0.5-Å grid spacing was used. The coactivator binding site was defined similarly, using three leucine residues of a representative coactivator peptide. For each grid point, the contact preference for a hydrophobic and a polar probe was calculated, as implemented in MOE, and only those points with high contact probability values (> 0.9) were considered.

4.3.4.3 Electrostatic Fields

Amber99 charges were calculated in MOE, for all receptor structures. A customized SVL script was used for the iterative solution of the Poisson-Boltzmann equation. The calculations were performed within a grid, with 0.75-Å spacing, extending 7 Å outside of the structures in the protein set. The grid spacing was a compromise between speed and accuracy, as other sources of inaccuracies existed, such as partial charges and the protonation states of amino acid side chains. The electrostatic potentials were used only qualitatively, to divide the hydrophilic regions in knowledge-based fields into positive (> 0.03), negative (< -0.03) or neutral points (-0.03 to 0.03). A 0.1 M NaCl concentration and a 300 K temperature were used. After the Poisson-Boltzmann calculation, the grid was reduced in size and transformed, using nonlinear interpolation into the same grid that was used for the knowledge-based fields.

4.3.4.4 Clustering and Postprocessing of Fields

The clustering of ligand and coactivator binding site fields was performed using uncentered Pearson correlations as distance measures and hierarchical complete linkage clustering. Only those grid points for which the knowledge-based field values (contact probabilities) and absolute electrostatic potential values were greater than or equal to 0.9 and 0.03, respectively, were considered. For each structure, the contact probability (polar and hydrophobic) and electrostatic field values greater than or equal to these thresholds were set to 1, and smaller values were set to 0. Then, the binary fields were concatenated into one file per structure, and these combined files were used as the inputs for clustering. One representative structure for each receptor was selected from the initial clusters for the final analysis. Several SVL scripts and user interface tools in MOE were created, for 50 4 Methods

similarity, average and difference field calculations, and the visualization of the fields. To combine information from many structures of different proteins, a new tool was used to analyze the fields, in which a certain portion of receptors in a group share grid points with the field values over a given threshold. Further post-processing tools included resizing and transforming the grid box, smoothening the fields using Gaussian functions, calculating linear combinations of fields, and removing isolated high-scoring grid points. An automated procedure was created, calculates a score for each molecule, based on the number of hydrophobic and polar atoms inside a certain type of field.

4.3.4.5 Cytosolic SULT Structures and Fields

A total of 36 human and mouse SULT structures were collected from the PDB (Berman et al., 2000), prepared, and used for knowledge-based and electrostatic field calculations, according to the procedures described for the NR structures (IV; Ruuska, 2011). The SULT protein structures were superimposed, using the SULT1A1 structure with PDB code 1LS6 (Gamage et al., 2003) as a reference, and the fields were clustered and analyzed as described for NRs.

In addition to the visual analysis and clustering of SULT fields, different combinations of fields were studied if they were useful during the scoring of docking poses for SULT substrates, to predict substrate specificity. A representative set of 13 structures was selected for this analysis, one for each isoenzyme. Ligands with reported Km values for SULT isoenzymes were collected from the literature and classified as good, moderate, weak, and non-binders, based on Km values. The field-based docking score (numbers of polar and hydrophobic atoms in polar and hydrophobic fields) was divided by the number of heavy atoms in the ligand, to normalize the effects of molecular sizes on the results, and the effect of re-scoring the ligands was studied for the classification of substrates for subtype selectivity and compared with the literature. 51

5 Results and discussion

5.1 SULT1A3 QSAR study (I)

Initially, three CoMFA models were derived: model A, which included only monophenolic compounds; model B, which included only rigid compounds; and model C, which contained the whole substrate set. After the analysis of the initial models, a fourth model D was built, using refined conformations and alignments, based on X-ray crystal information. All four CoMFA models showed robust statistical characteristics, based on leave-n-out cross-validation runs using 5 and 10 randomly selected cross- validation groups. The final model D, which included all 95 substrates, demonstrated an overall good fit between the actual and predicted Km values.

When the different SULT1A3 CoMFA models were analyzed using established statistical performance indicators for the models, some remarkable trends were recognized. Model B, which included only relatively rigid compounds, showed the best performance and was the most robust model during the leave-n-out cross-validation. This model suffers less than the other models from the uncertainty of the bioactive conformers, and the alignment of the whole set is the most consistent. Model D, for which both the conformations and the alignments of the compounds were refined by rational rules, based on the partial X- ray crystal structure of SULT1A3, performed better for the predictions and was more robust during cross-validation than model C, for which no adjustments of the conformations and alignments were performed. These results clearly demonstrated the added value of including the X-ray structural information for the 3D QSAR modelling workflow, as has been demonstrated by others in numerous studies examining diverse targets, including SULTs (Sharma and Duffel, 2005, 2002).

The visual analysis of 3D coefficient contour maps for SULT1A3 CoMFA models can be very useful when analyzing important regions to determine the Km of the molecule in question. The steric contour maps revealed some preferences for ortho- and meta- substituted phenols, whereas bulky substituents in the para position were not favored. Important steric effects were also observed where the flexible lid was located on top of the substrate-binding pocket, as discussed later in this section. The electrostatic map 52 5 Results and discussion

clearly demonstrated a preference for electronic interactions with Glu146, which has been shown to be crucial for the SULT1A3 specificity for catecholamines, such as dopamine (Dajani et al., 1998; Negishi et al., 2001).

5.2 S-COMT QSAR and pKa study (II-III)

5.2.1 Validation and performance of S-COMT CoMFA models

Different CoMFA models were built to examine the enzymatic kinetic parameters Km

(models I-III), Vmax (models IV and V), and Vmax / Km (models VI and VII) of S-COMT- catalyzed methylation reactions. The Km models, I and II, included all 45 compounds in the data set, and after analysis, one obvious outlier in the data set (2,3-dihydroxybenzoic acid) was removed. The peculiar behavior of this compound has been reported in a previous study, related to the inhibitory interactions between the charged carboxylate moiety, the catalytic base, Lys144, and the co-substrate, SAM, in the COMT active site (Lautala et al., 2001). 2,3-Dihydroxybenzoic acid and were removed from models IV-VII because no Vmax values could be determined for these two compounds.

Although the clinically used COMT inhibitors entacapone and tolcapone also lacked Vmax values, these molecules are known to be methylated in vivo (Jorga et al., 1999; Wikberg et al., 1993), and based on this knowledge, they were included in models IV and V, with low estimated Vmax values (log Vmax = -1).

Based on the statistical performances of the models during cross-validation, models II, V, and VII, which used semi-empirical quantum mechanical charges, clearly performed better than the corresponding models I, IV, and VI, respectively, which used empirically parametrized Gasteiger-Hückel charges. For COMT, we observed differences of 0.14– 0.24 for the cross-validated q2 values, in favor of the AM1 charges, which represents a significant improvement, likely due to the importance of electron delocalization effects for electrostatic interactions between substrate catechol ring oxygens and the Mg2+ ion at the COMT active site. Because of the clear benefits introduced by semi-empirical charges, only models that use AM1 charges will be discussed in the next chapters.

Models II, III, V, and VII performed robustly during leave-n-out cross-validation using 5 and 10 cross-validation groups, with mean cross-validated q2 values ranging from 0.52 53

(Vmax/Km model VI, with 5 cross-validation groups) to 0.84 (Km model III, with 10 cross- validation groups). At the same time that our COMT models were published, another study was published using the IC50 values of 92 COMT inhibitors to build a CoMFA model with docking-based alignment (Tervo et al., 2003). In that analysis, a cross- validated mean q2 value of 0.594 was obtained from 5 random groups, providing further confidence that meaningful CoMFA models can be built for COMT ligand binding. When moving from modelling Km values (as a binding affinity surrogate) to Vmax and Vmax/Km, which better describe how good a given compound is as a substrate, the statistical performances of our models decreased; however, the models remained significant, based 2 on the cross-validated q values. Vmax can be considered the rate constant at which products are released from the enzyme, whereas Vmax/Km represents the rate constant at which the substrate is captured by the enzyme to form a productive complex, and both are required to generate and describe the complete catalytic turnover process (Northrop, 1998). Using these descriptions of the enzymatic kinetic parameters, our methodology produced the most robust models for substrate binding affinity, followed closely by the rate of release. The capture of the substrate by S-COMT and the formation of a productive catalysis state appear to be the most challenging phenomena to predict.

5.2.2 Visual analysis and interpretation of S-COMT CoMFA models

The interpretation of coefficient contour maps generated for the S-COMT CoMFA fields by the final models (III, V, and VII) can be aligned well with existing knowledge of the S-COMT structure and the molecular mechanisms that control ligand binding and the methylation of substrates. In the Km-predicting model III, the favorable regions for steric interactions that are ortho and para to the reacting hydroxyl group match the active site, overlapping well with the nitro group positions of the inhibitor 3,5-dinitrocatechol, which was co-crystallized with S-COMT (Czarnota et al., 2019; Rutherford et al., 2008; Vidgren et al., 1994). Above the catechol ring, and extending in the direction out of the binding pocket, a clear region of non-favorable steric field can be observed. These favorable and non-favorable regions likely arise from interactions with the gatekeeper residues Trp38 and Pro174, which flank the catechol ring binding site in COMT, and from the favorable effect of filling the small pocket ortho to the reacting hydroxyl, where the 3-nitro group is located in the 3,5-dinitrocatechol complexes. The electrostatic contour coefficient maps 54 5 Results and discussion

for the Km model show a clear preference for more positive charges around the catechol ring and the catechol oxygens and a distinct region of preferred negative charge, which aligns well with the 3-nitro group location in the 3,5-dinitrocatechol-COMT complexes. These observations match existing knowledge that substituents that draw electrons from the catechol ring, such as the nitro groups in positions 3 and 5, lower the reacting hydroxyl group’s pKa value and have an enhancing effect on binding affinity (Lautala et al., 2001; Lotta et al., 1992; Taskinen et al., 1989). This visual interpretation of the findings from the Km model agrees with the observations from another CoMFA study, which used

COMT inhibitor IC50 values (Tervo et al., 2003). The analysis of the CoMFA fields in the

Vmax model demonstrated that electrostatic fields show some inverse preferences when compared with the Km model. Therefore, the electrostatic effects that enhance binding also reduce the turnover of substrates, and a similar trend was discovered when correlating the molecular electrostatic potential fields of COMT substrates with their affinities and turnover (Lautala et al., 2001).

5.2.3 Combining pKa prediction with the S-COMT CoMFA models

As discussed in the study aims, we were intrigued by the opportunity to dissect the SAR displayed in the S-COMT CoMFA models into pKa-driven electrostatic and steric elements. Strong evidence exists for the importance of COMT for the metabolism of dopamine in the brain, especially in the prefrontal cortex, and genetic evidence suggests that the fast-metabolizing COMT isomorphism is associated with cognitive deficits in diseases such as schizophrenia (Egan et al., 2001; Lapish et al., 2009; Scheggia et al., 2012; Takizawa et al., 2009; Tunbridge et al., 2004). The clinically used nitrocatechol inhibitors, tolcapone, entacapone, and opicapone, are quite acidic, with pKa values approaching 4, which prevents their usefulness as CNS drugs. A comparison of the results from the three CoMFA models, which were built with a set of 55 catechols, using three different descriptions of the structures, is shown in Table 2. The results showed that adding pKa values to the standard electrostatic and steric CoMFA fields boosted the performance of the model, and a very robust model, which only included three PLS components, was built using only CoMFA steric field and predicted pKa values. This result was encouraging because this model could be used to optimize ligands towards high COMT affinity, independent of acidity. Using this kind of approach, less acidic, 55

efficacious, and brain-permeating COMT inhibitors have been developed by us, at Orion Pharma R&D (Espoo, Finland, unpublished results), and by others (Buchler et al., 2018; Harrison et al., 2015).

Table 2. Statistics of the CoMFA models, when standard CoMFA fields were combined with pKa values of COMT substrates. N = number of training set compounds, q2 = crossvalidated (LOO) r2 of the model, SPRESS = standard error of prediction, C = number of PLS components, SE = standard error, r2 = conventional r2 of the model. LNO 5g/10g = leave-n-out crossvalidation with 5/10 groups (Sipilä, 2006).

5.2.4 Justification for revisiting the pKa predictions of certain COMT ligands pKa of the catechol moiety, or indirectly the existence of electron-withdrawing substituents in the catechol ring was shown to be very important for affinity of substrates to COMT, as shown previously (Lautala et al., 2001; Lotta et al., 1992) and by our studies, in which the calculated pKa values were combined with the field-based 3D QSAR modelling, as described above. While working with the S-COMT CoMFA model, we identified a clear correlation between the calculated pKa values of the substrates and the QM semi-empirical charges (but not with the charges from the empirical charge model) of the reacting oxygen atom. We also found that the semi-empirical charge models explained activity variations quite well. Additionally, when examining the COMT inhibitor dataset, we determined that the commonly used computational tools for predicting pKa failed miserably for certain p-vinyl catechols that had electron- withdrawing groups attached to the distal ends of the double bond system. This vinylic scaffold is privileged for COMT, as many known COMT inhibitors belong to this class of compounds, including the widely used clinical drug entacapone. To use the calculated pKa values as surrogate descriptors in our CoMFA models, we required more reliable predictions for this class of compounds. The chosen approach was to determine the pKa values experimentally for a series of p-vinyl phenols and to re-parametrize the standard pKa predictions, based on Hammett-type linear free energy relationships. 56 5 Results and discussion

5.2.5 Validation and performance of the refined p-vinylphenol pKa model (III)

The modified Hammett equation for predicting the pKa values of p-vinylphenols was tested using an external test set of six compounds, and the results were highly accurate, with a mean error of 0.19 log units, which was good enough performance to be used as descriptors in QSAR models instead of experimental pKa values. A similar performance was achieved with the best reference system we tested, SPARC On-Line Calculator (Hilal et al., 1995). We calculated an attenuation factor (0.54) of the vinylic bond, for the electron-withdrawing effects of substituents on the p-hydroxyl pKa, and found that it was in a reasonable agreement with values reported in the literature for similar effects in the opposite direction (Jaffé, 1953; Williams, 2003). The benefit of using our equation for the attenuation factor is that it could be easily added to the commonly used pKa prediction software or the pKa values could be manually calculated if the Hammett σpara--constants for the substituents are available.

5.3 Methodological considerations for SULT1A3 and COMT CoMFA models

CoMFA (Cramer et al., 1988) is a well-established 3D QSAR method that has been successfully applied to various applications for the characterization of important ligand features that affect biological activity within a structurally related compound series. During CoMFA applications, the biological activity is generally measured in a binding or functional assay for a known target protein. QSAR models can only be as good as the training data, and having access to a consistent experimental data set, that was measured using the same methods and conditions, such as the data sets we had for COMT and SULT1A3, is a privilege. When building a CoMFA model, other critical factors must be considered, as is discussed thoroughly in the literature (Cherkasov et al., 2014; Kim et al., 1998). Critical and often model quality-limiting factors during CoMFA modelling are the generation of high quality, preferably bioactive conformations, and the mutual alignment of the ligands.

Several essential pieces of information and hypotheses regarding specific substrate- enzyme interactions have influenced our methodological choices for the 3D QSAR 57

studies performed on S-COMT and SULT1A3. First of all, we needed to make optimal use of the existing X-ray structural information for the enzymes and enzyme-substrate complexes. When we began planning the study, three high-quality structures complexed with inhibitors were available for S-COMT (Bonifácio et al., 2002; Lerner et al., 2001; Vidgren et al., 1994), whereas, for SULT1A3, only one partial structure was available, with no substrate or inhibitor bound (Dajani et al., 1999). In detailed analyses of the structures, both SULT1A3 and COMT show substantial flexibility in their active sites, and ability to enter different productive and non-productive conformations, induced by structurally different substrates and inhibitors. Evolution has likely favored this intrinsic flexibility and broad substrate specificity due to the roles played by these Phase II enzymes in the chemical defense against structurally diverse phenolic xenobiotics. For 3D QSAR with CoMFA, the first critical step is to define the conformations and alignments of the studied ligands, such that the statistical PLS analysis can match the parts of the ligands that make similar contributions to activity. If relevant X-ray crystal structures are available, the use of well-established docking methods to generate putative bioactive conformations and common alignments among the ligand set for use in a 3D QSAR method, such as CoMFA, is a common procedure. However, accounting for protein flexibility, induced-fit effects, and water behaviors during docking studies is not a trivial task. Widely used methods used to incorporate protein flexibility in docking studies usually involve representative conformational ensembles of the protein, which are derived from MD simulations or experimental structural biology (Śledź and Caflisch, 2018) and/or letting protein side-chain conformations adapt to the ligand structures, using induced-fit docking (Sherman et al., 2006). Many other approaches for docking simulations exist, allowing varying degrees of protein flexibility and dynamics (Lill, 2011). Statistical evidence from publicly available crystal structures suggest that these methods are valid, in principle (Clark et al., 2019); however, incorporating them into the 3D QSAR workflow remains challenging, due to the demands for high-quality structural information and the additional computational costs, which can be prohibitively high. Whether 3D QSAR is the best method in cases where such structural information is unavailable remains unclear. In this study, we opted for simple alignment and conformation generation rules, based on the catalytic mechanism and existing structural information for SULT1A3 and COMT. The minimum energy conformations of the 58 5 Results and discussion

structures were used as the starting points, which were transferred to the enzyme active site, where acceptor phenol ring systems were aligned to allow the sulfonation or methylation reactions, and any obvious overlaps with the protein structure were fixed manually.

For COMT, specifically, we anticipated that the substrate electrostatics would require special consideration, due to the striking importance of the catechol hydroxyls binding to the structural magnesium ion in the COMT active site. As discussed earlier in the aims of the study, the charge density at the catechol oxygen atoms is known to drastically affect binding. The most rigorous method for assessing these effects would be the use of QM methods for each enzyme-substrate complex in the active site; however, this approach would not be feasible because of the computational costs and because the exact binding complex is not known for each substrate. To better account for the electrostatic effects, we opted to study the effects of different partial charge calculation methods (empirical and semi-empirical) on model accuracy. For SULT1A3, such an analysis was considered unnecessary, and only empirical Gasteiger-Hückel charges were used.

5.4 Field-based comparisons of ligand binding sites (IV)

5.4.1 Quantitative structural knowledge from the protein molecular fields

As described in the literature review, the public availability of protein structures has improved tremendously during the last two decades. This growth has inspired us to develop novel methods that use this structural knowledge whenever we examine protein families for which the structures of several related proteins have been solved. We were intrigued by the previously tested hypothesis that ligand binding pockets contain well- conserved regions that are important for the affinities of all related proteins, whereas other regions with greater variability could be exploited by ligands, to gain selectivity among protein family members, and how knowledge-based potentials could be used to analyze the binding pockets (Hoppe et al., 2006). To gain more specificity for the fields, we added physics-based positive and negative electrostatic potentials to the fields and automated the whole analysis workflow, starting from the aligned and prepared crystal structures (IV). By calculating different combinations of molecular interaction fields over the whole 59

protein family, as described for NRs (IV) and SULTs (Ruuska, 2011), more information can be extracted than by analyzing one protein at a time. The fields can be used to classify and visualize the most important regions in the binding sites or to improve and focus virtual screening efforts and structure-based ligand specificity design. Because the focus of this thesis is on the prediction of Phase II metabolism, the potential use of field-based binding site analysis for the virtual screening of SULT substrates was examined. The analysis of NR ligand binding sites is covered only briefly, from a methodological point of view.

5.4.2 Comparison of NR ligand binding sites, using molecular interaction fields

The clustering of NRs, based on ligand and coactivator binding sites, knowledge-based potentials, and electrostatic fields, was successful. The generated classification agrees well with the conventional NR classification into families (Germain et al., 2006), which is not true for the sequence-based classifications of their ligand-binding domains. The advantage of our interaction- and field-based method is the ability to consider linear combinations, which might arise from different regions of the same X-ray protein structure or the average similarity and dissimilarity fields, calculated from many experimental structures of the same protein or other related proteins. This averaging effect can improve the signal-noise ratio of information derived from structures, for which the most important possible sources of noise are the different protein conformations induced by different ligands and potential inaccuracies in the alignment of the structures. However, averaging the fields can be problematic, in some cases, resulting in the disappearance of important regions of a flexible area of the binding site because they do not exist in enough structures.

Pairwise comparisons of the ligand-binding site fields for NRs revealed similarities between receptors from different phylogenetic families, such as RXR (NR2B) and ER (NR3A), RAR (NR1B) and RXR (NR2B), and HNF4 (NR2A) and LRH-1 (NR5A). For the first two pairs of receptors, evidence of cross-selectivity was observed, indicating that similar ligands have been found to bind to both receptors (Allenby et al., 1993; Mestres et al., 2006). These findings support the use of our field-based methods to detect promiscuous binders that are capable of binding several related proteins and to detect 60 5 Results and discussion

conserved regions of binding sites that may be crucial determinants of high binding affinity.

Another important goal of the protein binding site comparison is the identification of dissimilar regions that can be used by ligands to gain selectivity. During the NR study, we found that our field-based method could pin-point variable lipophilic and polar hot spots that are exploited by selective ligands for LXR (NR1H), RAR (NR1B) and TR (NR1A) subtypes, which is consistent with the known binding modes of these ligands (Dow et al., 2003; Klaholz et al., 2000; Klaholz and Moras, 2002; Wrobel et al., 2008).

The workflow for analyzing knowledge-based and electrostatic fields for a protein family involves many steps: protein preparation and the alignment of structures, ligand and binding site definition, the calculation of fields, combining and post-processing, and, eventually, clustering, visualization, and analysis. This computational workflow was automated to enable the analysis of different protein families and to enable the optimization of different parameters during the analysis. Some parameters, such as the smoothing of fields with Gaussian functions and the removal of isolated high-scoring field points, were optimized to remove noise and improve visualizations, whereas some parameters that affect how the linear combinations of fields are calculated can be used to focus on binding site similarities and differences. We also created a tool to calculate the numbers of hydrophobic and polar atoms in different types of fields, which can be used as a scoring function during docking studies. The field-based scoring method was later tested in a feasibility study, to identify the isoenzyme specificities of SULT substrates.

5.4.3 Comparison of SULT binding site fields

The classification of 35 SULT enzyme structures, based on the similarities among their binding pocket fields (Figure 6), was well in line with the phylogenetic classification of SULTs (Gamage et al., 2006), with a few exceptions related to structure quality, such as the SULT1A3 structure 1CJM, which was missing parts of the binding site. 61

Figure 6. Clustering of SULT crystal structures based on the knowledge-based and electrostatic fields of the active sites, colored by isoenzyme.

The use of docking, combined with the field-based scoring function (numbers of hydrophobic and polar atoms in corresponding knowledge-based fields), to classify substrates based on their binding preferences with different SULT isoenzymes resulted in weak classification performance. Although some weak trends were observed, the correlation with known substrate specificities was not significant. One identified issue was the inconsistency of the validation data set containing substrate Km values, which were collected from various literature sources; therefore, an objective performance assessment was difficult. Another issue with the methodology was protein flexibility, especially within the ligand-binding region. The flexibility of the lid covering the bound substrate in SULT structures is well-known (Cook et al., 2013a; Martiny et al., 2013) and is discussed in more detail in the section describing SULT1A3 CoMFA models. Induced- fit effects are difficult to incorporate into our method, in which molecular fields are pre- 62 5 Results and discussion

calculated using static protein structures. One method for bypassing this issue would be to generate ensembles of representative conformations for each isoenzyme, using MD simulations, followed by ensemble field generation and docking. However, this approach would introduce computational cost to the analysis, with no assurance that the relevant protein conformation for each substrate will be available. When considering the impressive performance that has been reported for MD simulations, combined with docking, for SULT1A1 and SULT2A1 substrate predictions (Cook et al., 2013c), the added value of including molecular interaction fields becomes questionable.

5.5 Future directions

Since we built our CoMFA models, much additional structural information for SULTs and SULT1A3 has emerged (Dong et al., 2012; Lu et al., 2005). The published crystal structures have been used to better understand the molecular mechanisms of ligand binding, using detailed MD simulations, QSAR, and virtual screening studies (Campagna-Slater and Schapira, 2009; Martiny et al., 2013; Tibbs et al., 2015).

The preference for an electronic interaction between the substrate and Glu146 in the active site of SULT1A3 has been recognized to be crucial for the substrate-induced stabilization of the closed form of SULT1A3 in MD simulations (Martiny et al., 2013). Our hypothesis regarding the SULT1A3 binding site flexibility and induced fit effects have been confirmed, and extensive MD simulations combined with docking methods have been used to obtain more accurate predictions of substrate selectivity, compared with those returned by simple QSAR models based on topological fingerprints (Martiny et al., 2013). The flexibility of the substrate-binding site has been shown to be a general characteristic that is observed in several members of the SULT superfamily. Substrate binding to SULT2A1 and SULT1A1 has been shown to be controlled by a flexible lid or cap, which is stabilized when the co-substrate PAPS is bound, which acts as a controller for substrate entry while leaving some flexibility for substrate-specific induced fit effects (Cook et al., 2013a, 2013b). Using MD simulations, combined with docking, to tackle this flexibility, some remarkable breakthroughs in the accuracy of models designed to predict SULT2A1 and SULT1A1 substrates and inhibitors have been achieved (Cook et al., 2013c). However, the extent of flexibility in MD simulations, the degree of openness, and the 63

exact amino acids found in the substrate-binding pocket can vary greatly among different SULT isoenzymes, and the induced fit effects of SULT1A3 might be more important and challenging to model accurately than those for SULT1A1 (Martiny et al., 2013; Tibbs et al., 2015); therefore, ligand-based 3D QSAR approaches, such as the CoMFA models built in this study, remain valid in silico approaches for the estimation of substrate binding to SULT1A3.

Solving the crystal structures of the whole SULT superfamily and feeding this wealth of information to innovative computational scientists represents a great example of how detailed understandings of molecular mechanisms, derived from structural biology and MD simulations, can result in better predictive models and the discovery of totally new and unexpected biology. This study has enabled the discovery of isoform-specific allosteric SULT sites (Cook et al., 2016; Wang et al., 2017). The allosteric site found in SULT1A3 takes part in positive feedback loops in important neurotransmitter signaling pathways. This allosteric site has appeared quite late in the evolution and is intriguingly found only in primates (Cook et al., 2017). In SULT1A1, a target site of commonly used non-steroidal anti-inflammatory drugs has been identified, which has been demonstrated to affect the liver toxicity of paracetamol, through drug-drug interactions (Wang et al., 2017).

When evaluating new candidate drug molecules, accurate predictions regarding the importance of different tissues, pathways, and individual metabolic enzymes remain challenging. However, as has been discussed in previous sections, promising models have already been built for individual enzymes, based on biophysics and MD simulations, that are capable of predicting whether a molecule is an enzyme substrate, with convincing accuracy. Empirically parametrized models, such as those presented here, can be used locally, within narrower applicability domains to obtain quantitative predictions within a specific chemical class. For targets and pathways with sufficient data availability, novel deep learning networks can be used to obtain even more predictive methods, based on experimental in vitro and in vivo data, perhaps augmented with simulations that start from first principles, when feasible. Artificial intelligence, or augmented intelligence, platforms are already emerging, which use generative algorithms to design new molecules with desired properties (Jin et al., 2019; Zhavoronkov et al., 2019). Most likely, models 64 5 Results and discussion

that predict metabolism will be incorporated into these systems, allowing the risks of drug development failures related to metabolism to be minimized, in silico, as early as the molecule design stage.

Due to advances in technology, increased computational power has become more practically accessible to researchers, enabling the use of more detailed and accurate physics-based molecular models. In our COMT CoMFA models, we found that the semi- empirical AM1 charges performed significantly better than empirical charges. Later, a systematic comparison of different charge models for CoMFA was performed, and semi- empirical charges produced improved models compared with empirical models in that study (Tsai et al., 2010). Similarly, many other ligand-based QSAR methods could benefit from improved force fields and QM descriptors.

Validated frameworks for the prediction of clearance mechanisms, such as the Extended Clearance Classification System (ECCS) (Varma et al., 2015) can be incorporated into the analysis. As all these bits and pieces of the puzzle become available, the most important question for the future will be how to integrate all of this information to produce fast, accurate, and quantitative answers to specific questions. I believe strongly that in the future, we will be able to feed these data and models into deep knowledge graphs that can be used to answer typical questions about metabolism, similar to the deep recommendation systems used by the tech giants Google, Microsoft, and Facebook, based on their vast knowledge graphs and vaults of human interaction data, that can classify users into various actionable classes. These types of platforms will also be able to estimate the uncertainty of answers and utilize active learning to suggest or initiate specific experimental or simulated studies that must be performed to improve predictions.

As discussed in the introduction, the two enzymes (SULT1A3 and COMT) studied in this thesis are privileged members of the family of Phase II metabolic enzymes, due to their associations with catecholamine-mediated neurotransmission in the brain. In my opinion, this link is strong enough to raise their status to “neurozymes”, as has been suggested for MAOs (Tripathi et al., 2018). However, the opportunity to target these enzymes using CNS-permeable drugs to modulate the levels of dopamine and noradrenaline in specific regions of the brain has not yet been realized. I hope that, in the future, such attempts will be made. 65

6 Conclusions

In this thesis, computational models were built, which were useful for the prediction of substrate binding to COMT and SULT1A3, based on chemical structure. Additionally, a novel workflow for the extraction of knowledge- and physics-based structural information from the binding site structures of an entire family of related proteins has been developed and integrated into ligand design workflows.

The CoMFA models presented for SULT1A3 Km showed robust statistics and comparable performance against other published models developed for this specific isoenzyme. SULT1A3 is regarded as a challenging target for the performance of accurate MD-based methods because of the wide-open substrate-binding site and considerable induced fit effects.

Interpretations of the COMT CoMFA Km models provided methodological insights. The semi-empirical QM charge models and/or accurate pKa predictions improved the accuracy significantly. A statistically robust CoMFA model for ligand binding was developed, using predicted pKa combined with the steric molecular interaction fields. To support this model, a practical method for accurately predicting the pKa values of a specific COMT ligand chemotype (p-vinylphenols) was developed. This type of approach enables the rational development of brain-permeating COMT ligands.

An automated field-based method was developed for the rapid classification and analysis of protein binding pockets in a protein family. This modelling workflow can be used in medicine design, to optimize the selectivity or affinity of a molecule, based on conserved and non-conserved features of the ligand-binding pockets, while considering the limitations of the method with regard to protein flexibility and induced fit effects. 66

7 References

ACD/Labs pKa DB, v7.07. ACD/Labs pKa DB. Advanced Chemistry Development Inc., Toronto, CA.

ACD/LogD, v4.0. ACD/LogD. Advanced Chemistry Development Inc., Toronto, CA.

Ai, C., Wang, Y., Li, Yan, Li, Yanhong, Yang, L., 2008. A 3-D QSAR Study of Catechol-O- Methyltransferase Inhibitors Using CoMFA and CoMSIA. QSAR Comb. Sci. 27, 1183–1192. https://doi.org/10.1002/qsar.200730053

Allali-Hassani, A., Pan, P.W., Dombrovski, L., Najmanovich, R., Tempel, W., Dong, A., Loppnau, P., Martin, F., Thonton, J., Edwards, A.M., Bochkarev, A., Plotnikov, A.N., Vedadi, M., Arrowsmith, C.H., 2007. Structural and Chemical Profiling of the Human Cytosolic Sulfotransferases. PLOS Biol. 5, e97. https://doi.org/10.1371/journal.pbio.0050097

Allenby, G., Bocquel, M.T., Saunders, M., Kazmer, S., Speck, J., Rosenberger, M., Lovey, A., Kastner, P., Grippo, J.F., Chambon, P., 1993. Retinoic acid receptors and retinoid X receptors: interactions with endogenous retinoic acids. Proc. Natl. Acad. Sci. U. S. A. 90, 30–34.

Apud, J.A., Mattay, V., Chen, J., Kolachana, B.S., Callicott, J.H., Rasetti, R., Alce, G., Iudicello, J.E., Akbar, N., Egan, M.F., Goldberg, T.E., Weinberger, D.R., 2007. Tolcapone Improves Cognition and Cortical Information Processing in Normal Human Subjects. Neuropsychopharmacology 32, 1011–1020. https://doi.org/10.1038/sj.npp.1301227

Avdeef, A., 1993. pH-Metric log P. II: Refinement of Partition Coefficients and Ionization Constants of Multiprotic Substances. J. Pharm. Sci. 82, 183–190. https://doi.org/10.1002/jps.2600820214

Axelrod, J., Tomchick, R., 1958. Enzymatic O-Methylation of Epinephrine and Other Catechols. J. Biol. Chem. 233, 702–705.

Bairam, A.F., Rasool, M.I., Alherz, F.A., Abunnaja, M.S., El Daibani, A.A., Gohal, S.A., Kurogi, K., Sakakibara, Y., Suiko, M., Liu, M.-C., 2018a. Sulfation of catecholamines and serotonin by SULT1A3 allozymes. Biochem. Pharmacol. 151, 104–113. https://doi.org/10.1016/j.bcp.2018.03.005

Bairam, A.F., Rasool, M.I., Alherz, F.A., Abunnaja, M.S., El Daibani, A.A., Kurogi, K., Liu, M.-C., 2018b. Effects of human SULT1A3/SULT1A4 genetic polymorphisms on the sulfation of acetaminophen and opioid drugs by the cytosolic sulfotransferase SULT1A3. Arch. Biochem. Biophys. 648, 44–52. https://doi.org/10.1016/j.abb.2018.04.019

Bairoch, A., 2000. The ENZYME database in 2000. Nucleic Acids Res. 28, 304–305.

Baumann, E., 1876. Ueber Sulfosäuren im Harn. Berichte Dtsch. Chem. Ges. 9, 54–58. https://doi.org/10.1002/cber.18760090121

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E., 2000. The Protein Data Bank. Nucleic Acids Res. 28, 235–242.

Blanchard, R., Freimuth, R., Buck, J., Weinshilboum, R., Coughtrie, M., 2004. A proposed nomenclature system for the cytosolic sulfotransferase (SULT) superfamily. Pharmacogenetics 14, 199–211.

Bock, K.W., 2015. Roles of human UDP-glucuronosyltransferases in clearance and homeostasis of endogenous substrates, and functional implications. Biochem. Pharmacol. 96, 77–82. https://doi.org/10.1016/j.bcp.2015.04.020 67

Bonifácio, M.J., Archer, M., Rodrigues, M.L., Matias, P.M., Learmonth, D.A., Carrondo, M.A., Soares- da-Silva, P., 2002. Kinetics and Crystal Structure of Catechol-O-Methyltransferase Complex with Co-Substrate and a Novel Inhibitor with Potential Therapeutic Application. Mol. Pharmacol. 62, 795–805. https://doi.org/10.1124/mol.62.4.795

Boudet, A.-M., 2007. Evolution and current status of research in phenolic compounds. Phytochemistry 68, 2722–2735. https://doi.org/10.1016/j.phytochem.2007.06.012

Bourguet, W., Ruff, M., Chambon, P., Gronemeyer, H., Moras, D., 1995. Crystal structure of the ligand- binding domain of the human nuclear receptor RXR-alpha. Nature 375, 377–382. https://doi.org/10.1038/375377a0

Bourguet, W., Vivat, V., Wurtz, J.M., Chambon, P., Gronemeyer, H., Moras, D., 2000. Crystal structure of a heterodimeric complex of RAR and RXR ligand-binding domains. Mol. Cell 5, 289–298. https://doi.org/10.1016/s1097-2765(00)80424-4

Brauer, G.M., Argentar, H., Durany, G., 1964. Ionization constants and reactivity of isomers of eugenol. J. Res. Natl. Bur. Stand. Sect. Phys. Chem. 68A, 619. https://doi.org/10.6028/jres.068A.060

Buchler, I., Akuma, D., Au, V., Carr, G., de León, P., DePasquale, M., Ernst, G., Huang, Y., Kimos, M., Kolobova, A., Poslusney, M., Wei, H., Swinnen, D., Montel, F., Moureau, F., Jigorel, E., Schulze, M.-S.E.D., Wood, M., Barrow, J.C., 2018. Optimization of 8-Hydroxyquinolines as Inhibitors of Catechol O-Methyltransferase. J. Med. Chem. 61, 9647–9665. https://doi.org/10.1021/acs.jmedchem.8b01126

Bunker, A., Männistö, P.T., Pierre, J.-F.S., Róg, T., Pomorski, P., Stimson, L., Karttunen, M., 2008. Molecular dynamics simulations of the enzyme Catechol-O-Methyltransferase: methodological issues. SAR QSAR Environ. Res. 19, 179–189. https://doi.org/10.1080/10629360701843318

Campagna-Slater, V., Schapira, M., 2009. Evaluation of Virtual Screening as a Tool for Chemical Genetic Applications. J. Chem. Inf. Model. 49, 2082–2091. https://doi.org/10.1021/ci900219u

Chen, J., Song, J., Yuan, P., Tian, Q., Ji, Y., Ren-Patterson, R., Liu, G., Sei, Y., Weinberger, D.R., 2011. Orientation and Cellular Distribution of Membrane-bound Catechol-O-methyltransferase in Cortical Neurons. J. Biol. Chem. 286, 34752–34760. https://doi.org/10.1074/jbc.M111.262790

Chen, Z., Zheng, S., Li, L., Jiang, H., 2014. Metabolism of flavonoids in human: a comprehensive review. Curr. Drug Metab. 15, 48–61. https://doi.org/10.2174/138920021501140218125020

Cherkasov, A., Muratov, E.N., Fourches, D., Varnek, A., Baskin, I.I., Cronin, M., Dearden, J., Gramatica, P., Martin, Y.C., Todeschini, R., Consonni, V., Kuz’min, V.E., Cramer, R., Benigni, R., Yang, C., Rathman, J., Terfloth, L., Gasteiger, J., Richard, A., Tropsha, A., 2014. QSAR Modeling: Where have you been? Where are you going to? J. Med. Chem. 57, 4977–5010. https://doi.org/10.1021/jm4004285

Clark, J.J., Benson, M.L., Smith, R.D., Carlson, H.A., 2019. Inherent versus induced protein flexibility: Comparisons within and between apo and holo structures. PLOS Comput. Biol. 15, e1006705. https://doi.org/10.1371/journal.pcbi.1006705

Cook, I., Wang, T., Almo, S.C., Kim, J., Falany, C.N., Leyh, T.S., 2013a. The gate that governs sulfotransferase selectivity. Biochemistry 52, 415–424. https://doi.org/10.1021/bi301492j

Cook, I., Wang, T., Almo, S.C., Kim, J., Falany, C.N., Leyh, T.S., 2013b. Testing the Sulfotransferase Molecular Pore Hypothesis. J. Biol. Chem. 288, 8619–8626. https://doi.org/10.1074/jbc.M112.445015 68 7 References

Cook, I., Wang, T., Falany, C.N., Leyh, T.S., 2013c. High Accuracy in Silico Sulfotransferase Models. J. Biol. Chem. 288, 34494–34501. https://doi.org/10.1074/jbc.M113.510974

Cook, I., Wang, T., Girvin, M., Leyh, T.S., 2016. The structure of the -binding site of human sulfotransferase 1A1. Proc. Natl. Acad. Sci. U. S. A. 113, 14312–14317. https://doi.org/10.1073/pnas.1613913113

Cook, I., Wang, T., Leyh, T.S., 2017. Tetrahydrobiopterin regulates monoamine neurotransmitter sulfonation. Proc. Natl. Acad. Sci. U. S. A. 114, E5317–E5324. https://doi.org/10.1073/pnas.1704500114

Cook, I., Wang, T., Leyh, T.S., 2015. Sulfotransferase 1A1 Substrate Selectivity: A Molecular Clamp Mechanism. Biochemistry 54, 6114–6122. https://doi.org/10.1021/acs.biochem.5b00406

Coughtrie, M.W.H., 2002. Sulfation through the looking glass--recent advances in sulfotransferase research for the curious. Pharmacogenomics J. 2, 297–308. https://doi.org/10.1038/sj.tpj.6500117

Court, M.H., Zhang, X., Ding, X., Yee, K.K., Hesse, L.M., Finel, M., 2012. Quantitative distribution of mRNAs encoding the 19 human UDP-glucuronosyltransferase enzymes in 26 adult and 3 fetal tissues. Xenobiotica 42, 266–277. https://doi.org/10.3109/00498254.2011.618954

Cramer, R.D., 2012. The inevitable QSAR renaissance. J. Comput. Aided Mol. Des. 26, 35–38. https://doi.org/10.1007/s10822-011-9495-0

Cramer, R.D., 2003. Topomer CoMFA: A Design Methodology for Rapid Lead Optimization. J. Med. Chem. 46, 374–388. https://doi.org/10.1021/jm020194o

Cramer, R.D., Patterson, D.E., Bunce, J.D., 1988. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc. 110, 5959–5967. https://doi.org/10.1021/ja00226a005

Creveling, C.R., Morris, N., Shimizu, H., Ong, H.H., Daly, J., 1972. Catechol O-Methyltransferase : IV. Factors Affecting m- and p-Methylation of Substituted Catechols. Mol. Pharmacol. 8, 398–409.

Czarnota, S., Johannissen, L.O., Baxter, N.J., Rummel, F., Wilson, A.L., Cliff, M.J., Levy, C.W., Scrutton, N.S., Waltho, J.P., Hay, S., 2019. Equatorial Active Site Compaction and Electrostatic Reorganization in Catechol-O-methyltransferase. ACS Catal. 9, 4394–4401. https://doi.org/10.1021/acscatal.9b00174

Dajani, R., Cleasby, A., Neu, M., Wonacott, A.J., Jhoti, H., Hood, A.M., Modi, S., Hersey, A., Taskinen, J., Cooke, R.M., Manchee, G.R., Coughtrie, M.W., 1999. X-ray crystal structure of human dopamine sulfotransferase, SULT1A3. Molecular modeling and quantitative structure-activity relationship analysis demonstrate a molecular basis for sulfotransferase substrate specificity. J. Biol. Chem. 274, 37862–37868.

Dajani, R., Hood, A.M., Coughtrie, M.W., 1998. A single amino acid, glu146, governs the substrate specificity of a human dopamine sulfotransferase, SULT1A3. Mol. Pharmacol. 54, 942–948. de Hoon, M.J.L., Imoto, S., Nolan, J., Miyano, S., 2004. Open source clustering software. Bioinforma. Oxf. Engl. 20, 1453–1454. https://doi.org/10.1093/bioinformatics/bth078

Delgado, A., Issaoui, M., Chammem, N., 2019. Analysis of Main and Healthy Phenolic Compounds in Foods. J. AOAC Int. 102. https://doi.org/10.5740/jaoacint.19-0128 69

Dixit, V.A., Lal, L.A., Agrawal, S.R., 2017. Recent advances in the prediction of non-CYP450-mediated drug metabolism. Wiley Interdiscip. Rev. Comput. Mol. Sci. 7, e1323. https://doi.org/10.1002/wcms.1323

Domagalski, M.J., Zheng, H., Zimmerman, M.D., Dauter, Z., Wlodawer, A., Minor, W., 2014. The Quality and Validation of Structures from Structural Genomics. Methods Mol. Biol. Clifton NJ 1091, 297–314. https://doi.org/10.1007/978-1-62703-691-7_21

Dong, D., Ako, R., Wu, B., 2012. Crystal structures of human sulfotransferases: insights into the mechanisms of action and substrate selectivity. Expert Opin. Drug Metab. Toxicol. 8, 635–646. https://doi.org/10.1517/17425255.2012.677027

Dow, R.L., Schneider, S.R., Paight, E.S., Hank, R.F., Chiang, P., Cornelius, P., Lee, E., Newsome, W.P., Swick, A.G., Spitzer, J., Hargrove, D.M., Patterson, T.A., Pandit, J., Chrunyk, B.A., LeMotte, P.K., Danley, D.E., Rosner, M.H., Ammirati, M.J., Simons, S.P., Schulte, G.K., Tate, B.F., DaSilva-Jardine, P., 2003. Discovery of a novel series of 6-azauracil-based thyroid hormone receptor ligands: potent, TRβ subtype-selective thyromimetics. Bioorg. Med. Chem. Lett. 13, 379–382. https://doi.org/10.1016/S0960-894X(02)00947-2

Easterby, J.S., 1992. Hyper.exe. University of Liverpool, UK.

Ebert, J., Brutlag, D., 2006. Development and validation of a consistency based multiple structure alignment algorithm. Bioinformatics 22, 1080–1087. https://doi.org/10.1093/bioinformatics/btl046

Egan, M.F., Goldberg, T.E., Kolachana, B.S., Callicott, J.H., Mazzanti, C.M., Straub, R.E., Goldman, D., Weinberger, D.R., 2001. Effect of COMT Val108/158 Met genotype on frontal lobe function and risk for schizophrenia. Proc. Natl. Acad. Sci. U. S. A. 98, 6917–6922. https://doi.org/10.1073/pnas.111134598

Ehler, A., Benz, J., Schlatter, D., Rudolph, M.G., 2014. Mapping the conformational space accessible to catechol-O-methyltransferase. Acta Crystallogr. D Biol. Crystallogr. 70, 2163–2174. https://doi.org/10.1107/S1399004714012917

Eisenhofer, G., Coughtrie, M.W., Goldstein, D.S., 1999. Dopamine sulphate: an enigma resolved. Clin. Exp. Pharmacol. Physiol. Suppl. 26, S41-53.

Ermondi, G., Caron, G., Lawrence, R., Longo, D., 2004. Docking studies on NSAID/COX-2 isozyme complexes using contact statistics analysis. J. Comput. Aided Mol. Des. 18, 683–696. https://doi.org/10.1007/s10822-004-6258-1

Fabbri, M., Ferreira, J.J., Lees, A., Stocchi, F., Poewe, W., Tolosa, E., Rascol, O., 2018. Opicapone for the treatment of Parkinson’s disease: A review of a new licensed medicine. Mov. Disord. Off. J. Mov. Disord. Soc. 33, 1528–1539. https://doi.org/10.1002/mds.27475

Ferreira de Freitas, R., Schapira, M., 2017. A systematic analysis of atomic protein–ligand interactions in the PDB †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7md00381a. MedChemComm 8, 1970–1981. https://doi.org/10.1039/c7md00381a

Foldes, A., Meek, J.L., 1973. Rat brain phenolsulfotransferase—partial purification and some properties. Biochim. Biophys. Acta BBA - Enzymol. 327, 365–373. https://doi.org/10.1016/0005- 2744(73)90419-1

Fowler, S., Kletzl, H., Finel, M., Manevski, N., Schmid, P., Tuerck, D., Norcross, R.D., Hoener, M.C., Spleiss, O., Iglesias, V.A., 2015. A UGT2B10 splicing polymorphism common in african populations may greatly increase drug exposure. J. Pharmacol. Exp. Ther. 352, 358–367. https://doi.org/10.1124/jpet.114.220194 70 7 References

Freimuth, R.R., Wiepert, M., Chute, C.G., Wieben, E.D., Weinshilboum, R.M., 2004. Human cytosolic sulfotransferase database mining: identification of seven novel and pseudogenes. Pharmacogenomics J. 4, 54–65. https://doi.org/10.1038/sj.tpj.6500223

Friesner, R.A., Banks, J.L., Murphy, R.B., Halgren, T.A., Klicic, J.J., Mainz, D.T., Repasky, M.P., Knoll, E.H., Shelley, M., Perry, J.K., Shaw, D.E., Francis, P., Shenkin, P.S., 2004. Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 47, 1739–1749. https://doi.org/10.1021/jm0306430

Fujiwara, R., Yokoi, T., Nakajima, M., 2016. Structure and Protein–Protein Interactions of Human UDP- Glucuronosyltransferases. Front. Pharmacol. 7. https://doi.org/10.3389/fphar.2016.00388

Gamage, N., Barnett, A., Hempel, N., Duggleby, R.G., Windmill, K.F., Martin, J.L., McManus, M.E., 2006. Human Sulfotransferases and Their Role in Chemical Metabolism. Toxicol. Sci. 90, 5–22. https://doi.org/10.1093/toxsci/kfj061

Gamage, N.U., Duggleby, R.G., Barnett, A.C., Tresillian, M., Latham, C.F., Liyou, N.E., McManus, M.E., Martin, J.L., 2003. Structure of a Human Carcinogen-converting Enzyme, SULT1A1 STRUCTURAL AND KINETIC IMPLICATIONS OF SUBSTRATE INHIBITION. J. Biol. Chem. 278, 7655–7662. https://doi.org/10.1074/jbc.M207246200

Germain, P., Staels, B., Dacquet, C., Spedding, M., Laudet, V., 2006. Overview of Nomenclature of Nuclear Receptors. Pharmacol. Rev. 58, 685–704. https://doi.org/10.1124/pr.58.4.2

Giakoumaki, S.G., Roussos, P., Bitsios, P., 2008. Improvement of prepulse inhibition and executive function by the COMT inhibitor tolcapone depends on COMT Val158Met polymorphism. Neuropsychopharmacol. Off. Publ. Am. Coll. Neuropsychopharmacol. 33, 3058–3068. https://doi.org/10.1038/npp.2008.82

Glatt, H., 2000. Sulfotransferases in the bioactivation of xenobiotics. Chem. Biol. Interact. 129, 141–170. https://doi.org/10.1016/S0009-2797(00)00202-7

Golbraikh, A., Tropsha, A., 2002. Beware of q2! J. Mol. Graph. Model., QSAR in vivo 20, 269–276. https://doi.org/10.1016/S1093-3263(01)00123-1

Goodford, P.J., 1985. A computational procedure for determining energetically favorable binding sites on biologically important macromolecules. J. Med. Chem. 28, 849–857. https://doi.org/10.1021/jm00145a002

Google Scholar [WWW Document], 2020. URL https://scholar.google.com/ (accessed 1.27.20).

Greschik, H., Flaig, R., Renaud, J.-P., Moras, D., 2004. Structural basis for the deactivation of the estrogen-related receptor gamma by diethylstilbestrol or 4-hydroxytamoxifen and determinants of selectivity. J. Biol. Chem. 279, 33639–33646. https://doi.org/10.1074/jbc.M402195200

Hansch, C., Maloney, P.P., Fujita, T., Muir, R.M., 1962. Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients. Nature 194, 178–180. https://doi.org/10.1038/194178b0

Harrison, S.T., Poslusney, M.S., Mulhearn, J.J., Zhao, Z., Kett, N.R., Schubert, J.W., Melamed, J.Y., Allison, T.J., Patel, S.B., Sanders, J.M., Sharma, S., Smith, R.F., Hall, D.L., Robinson, R.G., Sachs, N.A., Hutson, P.H., Wolkenberg, S.E., Barrow, J.C., 2015. Synthesis and Evaluation of Heterocyclic Catechol Mimics as Inhibitors of Catechol-O-methyltransferase (COMT). ACS Med. Chem. Lett. 6, 318–323. https://doi.org/10.1021/ml500502d

Henke, B.R., Consler, T.G., Go, N., Hale, R.L., Hohman, D.R., Jones, S.A., Lu, A.T., Moore, L.B., Moore, J.T., Orband-Miller, L.A., Robinett, R.G., Shearin, J., Spearing, P.K., Stewart, E.L., 71

Turnbull, P.S., Weaver, S.L., Williams, S.P., Wisely, G.B., Lambert, M.H., 2002. A new series of estrogen receptor modulators that display selectivity for estrogen receptor beta. J. Med. Chem. 45, 5492–5505. https://doi.org/10.1021/jm020291h

Henrich, S., Salo‐Ahen, O.M.H., Huang, B., Rippmann, F.F., Cruciani, G., Wade, R.C., 2010. Computational approaches to identifying and characterizing protein binding sites for ligand design. J. Mol. Recognit. 23, 209–219. https://doi.org/10.1002/jmr.984

Hilal, S.H., Karickhoff, S.W., Carreira, L.A., 1995. A Rigorous Test for SPARC’s Chemical Reactivity Models: Estimation of More Than 4300 Ionization pKas. Quant. Struct.-Act. Relatsh. 14, 348– 355. https://doi.org/10.1002/qsar.19950140405

Hildebrandt, M. a. T., Carrington, D.P., Thomae, B.A., Eckloff, B.W., Schaid, D.J., Yee, V.C., Weinshilboum, R.M., Wieben, E.D., 2007. Genetic diversity and function in the human cytosolic sulfotransferases. Pharmacogenomics J. 7, 133–143. https://doi.org/10.1038/sj.tpj.6500404

Hirvensalo, P., Tornio, A., Neuvonen, M., Tapaninen, T., Paile‐Hyvärinen, M., Kärjä, V., Männistö, V.T., Pihlajamäki, J., Backman, J.T., Niemi, M., 2018. Comprehensive Pharmacogenomic Study Reveals an Important Role of UGT1A3 in Montelukast Pharmacokinetics. Clin. Pharmacol. Ther. 104, 158–168. https://doi.org/10.1002/cpt.891

Hoppe, C., Steinbeck, C., Wohlfahrt, G., 2006. Classification and comparison of ligand-binding sites derived from grid-mapped knowledge-based potentials. J. Mol. Graph. Model. 24, 328–340. https://doi.org/10.1016/j.jmgm.2005.09.013

Hou, T., Xu, X., 2001. Three-dimensional quantitative structure–activity relationship analyses of a series of cinnamamides. Chemom. Intell. Lab. Syst. 56, 123–132. https://doi.org/10.1016/S0169- 7439(01)00116-2

Hu, M., 2007. Commentary: Bioavailability of Flavonoids and Polyphenols: Call to Arms. Mol. Pharm. 4, 803–806. https://doi.org/10.1021/mp7001363

Jaffé, H.H., 1953. A Reëxamination of the Hammett Equation. Chem. Rev. 53, 191–261. https://doi.org/10.1021/cr60165a003

Jančová, P., Šiller, M., 2012. Phase II Drug Metabolism. Top. Drug Metab. https://doi.org/10.5772/29996

Jatana, N., Sharma, A., Latha, N., 2013. Pharmacophore modeling and virtual screening studies to design potential COMT inhibitors as new leads. J. Mol. Graph. Model. 39, 145–164. https://doi.org/10.1016/j.jmgm.2012.10.010

Jin, W., Barzilay, R., Jaakkola, T., 2019. Junction Tree Variational Autoencoder for Molecular Graph Generation. ArXiv180204364 Cs Stat.

Jorga, K., Fotteler, B., Heizmann, P., Gasser, R., 1999. Metabolism and excretion of tolcapone, a novel inhibitor of catechol-O-methyltransferase. Br. J. Clin. Pharmacol. 48, 513–520. https://doi.org/10.1046/j.1365-2125.1999.00036.x

Jorgensen, W.L., Maxwell, D.S., Tirado-Rives, J., 1996. Development and Testing of the OPLS All- Atom Force Field on Conformational Energetics and Properties of Organic Liquids. J. Am. Chem. Soc. 118, 11225–11236. https://doi.org/10.1021/ja9621760

Käenmäki, M., Tammimäki, A., Myöhänen, T., Pakarinen, K., Amberg, C., Karayiorgou, M., Gogos, J.A., Männistö, P.T., 2010. Quantitative role of COMT in dopamine clearance in the prefrontal cortex of freely moving mice. J. Neurochem. 114, 1745–1755. https://doi.org/10.1111/j.1471- 4159.2010.06889.x 72 7 References

Kaivosaari, S., Finel, M., Koskinen, M., 2011. N-glucuronidation of drugs and other xenobiotics by human and animal UDP-glucuronosyltransferases. Xenobiotica 41, 652–669. https://doi.org/10.3109/00498254.2011.563327

Kakuta, Y., Pedersen, L.G., Carter, C.W., Negishi, M., Pedersen, L.C., 1997. Crystal structure of estrogen sulphotransferase. Nat. Struct. Biol. 4, 904–908. https://doi.org/10.1038/nsb1197-904

Kakuta, Y., Petrotchenko, E.V., Pedersen, L.C., Negishi, M., 1998. The Sulfuryl Transfer Mechanism CRYSTAL STRUCTURE OF A VANADATE COMPLEX OF ESTROGEN SULFOTRANSFERASE AND MUTATIONAL ANALYSIS. J. Biol. Chem. 273, 27325– 27330. https://doi.org/10.1074/jbc.273.42.27325

Kalgutkar, A.S., Gardner, I., Obach, R.S., Shaffer, C.L., Callegari, E., Henne, K.R., Mutlib, A.E., Dalvie, D.K., Lee, J.S., Nakai, Y., O’Donnell, J.P., Boer, J., Harriman, S.P., 2005. A comprehensive listing of bioactivation pathways of organic functional groups. Curr. Drug Metab. 6, 161–225. https://doi.org/10.2174/1389200054021799

Karickhoff, S.W., Mcdaniel, V.K., Melton, C., Vellino, A.N., Nute, D.E., Carreira, L.A., 1991. Predicting chemical reactivity by computer. Environ. Toxicol. Chem. 10, 1405–1416. https://doi.org/10.1002/etc.5620101105

Kauppi, B., Jakob, C., Färnegårdh, M., Yang, J., Ahola, H., Alarcon, M., Calles, K., Engström, O., Harlan, J., Muchmore, S., Ramqvist, A.-K., Thorell, S., Ohman, L., Greer, J., Gustafsson, J.-A., Carlstedt-Duke, J., Carlquist, M., 2003. The three-dimensional structures of antagonistic and agonistic forms of the glucocorticoid receptor ligand-binding domain: RU-486 induces a transconformation that leads to active antagonism. J. Biol. Chem. 278, 22748–22754. https://doi.org/10.1074/jbc.M212711200

Kellogg, G.E., Semus, S.F., Abraham, D.J., 1991. HINT: A new method of empirical hydrophobic field calculation for CoMFA. J. Comput. Aided Mol. Des. 5, 545–552. https://doi.org/10.1007/BF00135313

Kim, K.H., Greco, G., Novellino, E., 1998. A critical review of recent CoMFA applications. Perspect. Drug Discov. Des. 12, 257–315. https://doi.org/10.1023/A:1017010811581

Klaholz, B.P., Mitschler, A., Moras, D., 2000. Structural basis for isotype selectivity of the human retinoic acid nuclear receptor11Edited by T. Richmond. J. Mol. Biol. 302, 155–170. https://doi.org/10.1006/jmbi.2000.4032

Klaholz, B.P., Moras, D., 2002. C–H···O Hydrogen Bonds in the Nuclear Receptor RARγ—a Potential Tool for Drug Selectivity. Structure 10, 1197–1204. https://doi.org/10.1016/S0969- 2126(02)00828-6

Kruse, S.W., Suino-Powell, K., Zhou, X.E., Kretschman, J.E., Reynolds, R., Vonrhein, C., Xu, Y., Wang, L., Tsai, S.Y., Tsai, M.-J., Xu, H.E., 2008. Identification of COUP-TFII orphan nuclear receptor as a retinoic acid-activated receptor. PLoS Biol. 6, e227. https://doi.org/10.1371/journal.pbio.0060227

Kuhn, B., Kollman, P.A., 2000. QM−FE and Molecular Dynamics Calculations on Catechol O- Methyltransferase: Free Energy of Activation in the Enzyme and in Aqueous Solution and Regioselectivity of the Enzyme-Catalyzed Reaction. J. Am. Chem. Soc. 122, 2586–2596. https://doi.org/10.1021/ja992218v

Langguth, H.S., Benet, L.Z., 1992. Acyl Glucuronides Revisited: Is the Glucuronidation Proces a Toxification as Well as a Detoxification Mechanism? Drug Metab. Rev. 24, 5–47. https://doi.org/10.3109/03602539208996289 73

Laskowski, R.A., Thornton, J.M., Humblet, C., Singh, J., 1996. X-SITE: Use of Empirically Derived Atomic Packing Preferences to Identify Favourable Interaction Regions in the Binding Sites of Proteins. J. Mol. Biol. 259, 175–201. https://doi.org/10.1006/jmbi.1996.0311

Lautala, P., Ulmanen, I., Taskinen, J., 2001. Molecular mechanisms controlling the rate and specificity of catechol O-methylation by human soluble catechol O-methyltransferase. Mol. Pharmacol. 59, 393–402.

Lerner, C., Jakob-Roetne, R., Buettelmann, B., Ehler, A., Rudolph, M., Rodríguez Sarmiento, R.M., 2016. Design of Potent and Druglike Nonphenolic Inhibitors for Catechol O-Methyltransferase Derived from a Fragment Screening Approach Targeting the S-Adenosyl-l-methionine Pocket. J. Med. Chem. 59, 10163–10175. https://doi.org/10.1021/acs.jmedchem.6b00927

Lerner, C., Ruf, A., Gramlich, V., Masjost, B., Zürcher, G., Jakob‐Roetne, R., Borroni, E., Diederich, F., 2001. X-ray Crystal Structure of a Bisubstrate Inhibitor Bound to the Enzyme Catechol-O- methyltransferase: A Dramatic Effect of Inhibitor Preorganization on Binding Affinity. Angew. Chem. Int. Ed. 40, 4040–4042. https://doi.org/10.1002/1521-3773(20011105)40:21<4040::AID- ANIE4040>3.0.CO;2-C

Lill, M.A., 2011. Efficient incorporation of protein flexibility and dynamics into molecular docking simulations. Biochemistry 50, 6157–6169. https://doi.org/10.1021/bi2004558

Lotta, T., Taskinen, J., Bäckström, R., Nissinen, E., 1992. PLS modelling of structure-activity relationships of catechol O-methyltransferase inhibitors. J. Comput. Aided Mol. Des. 6, 253– 272.

Lotta, T., Vidgren, J., Tilgmann, C., Ulmanen, I., Melén, K., Julkunen, I., Taskinen, J., 1995. Kinetics of human soluble and membrane-bound catechol O-methyltransferase: a revised mechanism and description of the thermolabile variant of the enzyme. Biochemistry 34, 4202–4210.

Lu, J.-H., Li, H.-T., Liu, M.-C., Zhang, J.-P., Li, M., An, X.-M., Chang, W.-R., 2005. Crystal structure of human sulfotransferase SULT1A3 in complex with dopamine and 3′-phosphoadenosine 5′- phosphate. Biochem. Biophys. Res. Commun. 335, 417–423. https://doi.org/10.1016/j.bbrc.2005.07.091

Lundström, K., Tenhunen, J., Tilgmann, C., Karhunen, T., Panula, P., Ulmanen, I., 1995. Cloning, expression and structure of catechol-O-methyltransferase. Biochim. Biophys. Acta BBA - Protein Struct. Mol. Enzymol. 1251, 1–10. https://doi.org/10.1016/0167-4838(95)00071-2

Mackenzie, P.I., Bock, K.W., Burchell, B., Guillemette, C., Ikushiro, S., Iyanagi, T., Miners, J.O., Owens, I.S., Nebert, D.W., 2005. Nomenclature update for the mammalian UDP glycosyltransferase (UGT) gene superfamily. Pharmacogenet. Genomics 15, 677–685. https://doi.org/10.1097/01.fpc.0000173483.13689.56

Maestro, v8.0. Maestro. Schrödinger, LLC., Portland, OR.

Magarkar, A., Parkkila, P., Viitala, T., Lajunen, T., Mobarak, E., Licari, G., Cramariuc, O., Vauthey, E., Róg, T., Bunker, A., 2018. Membrane bound COMT isoform is an interfacial enzyme: general mechanism and new drug design paradigm. Chem. Commun. 54, 3440–3443. https://doi.org/10.1039/C8CC00221E

Männistö, P.T., Kaakkola, S., 1999. Catechol-O-methyltransferase (COMT): Biochemistry, Molecular Biology, Pharmacology, and Clinical Efficacy of the New Selective COMT Inhibitors. Pharmacol. Rev. 51, 593–628. 74 7 References

Martiny, V.Y., Carbonell, P., Lagorce, D., Villoutreix, B.O., Moroy, G., Miteva, M.A., 2013. In silico mechanistic profiling to probe small molecule binding to sulfotransferases. PloS One 8, e73587. https://doi.org/10.1371/journal.pone.0073587

Mazaleuskaya, L.L., Sangkuhl, K., Thorn, C.F., FitzGerald, G.A., Altman, R.B., Klein, T.E., 2015. PharmGKB summary: Pathways of acetaminophen metabolism at the therapeutic versus toxic doses. Pharmacogenet. Genomics 25, 416–426. https://doi.org/10.1097/FPC.0000000000000150

Meiser, J., Weindl, D., Hiller, K., 2013. Complexity of dopamine metabolism. Cell Commun. Signal. 11, 34. https://doi.org/10.1186/1478-811X-11-34

Mestres, J., Martín-Couce, L., Gregori-Puigjané, E., Cases, M., Boyer, S., 2006. Ligand-Based Approach to In Silico Pharmacology: Nuclear Receptor Profiling. J. Chem. Inf. Model. 46, 2725–2736. https://doi.org/10.1021/ci600300k

Miners, J.O., McKinnon, R.A., Mackenzie, P.I., 2002. Genetic polymorphisms of UDP- glucuronosyltransferases and their functional significance. Toxicology 181–182, 453–456. https://doi.org/10.1016/S0300-483X(02)00449-3

MOE, v2008.10. MOE Molecular Operating Environment. Chemical Computing Group Inc., Montreal, Canada.

MOPAC, 2000. MOPAC Quantum Chemistry Program Exchange. Indiana University, IN.

Mueller, J.W., Gilligan, L.C., Idkowiak, J., Arlt, W., Foster, P.A., 2015. The Regulation of Steroid Action by Sulfation and Desulfation. Endocr. Rev. 36, 526–563. https://doi.org/10.1210/er.2015-1036

Mulder, G.J., Meerman, J.H., 1983. Sulfation and glucuronidation as competing pathways in the metabolism of hydroxamic acids: the role of N,O-sulfonation in chemical carcinogenesis of aromatic amines. Environ. Health Perspect. 49, 27–32. https://doi.org/10.1289/ehp.834927

Myöhänen, T.T., Schendzielorz, N., Männistö, P.T., 2010. Distribution of catechol-O-methyltransferase (COMT) proteins and enzymatic activities in wild-type and soluble COMT deficient mice. J. Neurochem. 113, 1632–1643. https://doi.org/10.1111/j.1471-4159.2010.06723.x

Negishi, M., Pedersen, L.G., Petrotchenko, E., Shevtsov, S., Gorokhov, A., Kakuta, Y., Pedersen, L.C., 2001. Structure and Function of Sulfotransferases. Arch. Biochem. Biophys. 390, 149–157. https://doi.org/10.1006/abbi.2001.2368

Northrop, D.B., 1998. On the Meaning of Km and V/K in Enzyme Kinetics. J. Chem. Educ. 75, 1153. https://doi.org/10.1021/ed075p1153

Ohno, S., Nakajin, S., 2009. Determination of mRNA Expression of Human UDP- Glucuronosyltransferases and Application for Localization in Various Human Tissues by Real- Time Reverse Transcriptase-Polymerase Chain Reaction. Drug Metab. Dispos. 37, 32–40. https://doi.org/10.1124/dmd.108.023598

Orłowski, A., St-Pierre, J.-F., Magarkar, A., Bunker, A., Pasenkiewicz-Gierula, M., Vattulainen, I., Róg, T., 2011. Properties of the Membrane Binding Component of Catechol-O-methyltransferase Revealed by Atomistic Molecular Dynamics Simulations. J. Phys. Chem. B 115, 13541–13550. https://doi.org/10.1021/jp207177p

Ovaska, M., Yliniemela, A., 1998. A semiempirical study on inhibition of catechol O-methyltransferase by substituted catechols. J. Comput. Aided Mol. Des. 12, 301–307. https://doi.org/10.1023/a:1007965026738

Pallas, demo 3.1. Pallas pKalc. Compudrug International Inc., Sedona, AZ, USA. 75

Palma, P.N., Bonifácio, M.J., Loureiro, A.I., Soares‐da‐Silva, P., 2012. Computation of the binding affinities of catechol-O-methyltransferase inhibitors: Multisubstate relative free energy calculations. J. Comput. Chem. 33, 970–986. https://doi.org/10.1002/jcc.22926

Palma, P.N., Bonifácio, M.J., Loureiro, A.I., Wright, L.C., Learmonth, D.A., Soares-da-Silva, P., 2003. Molecular Modeling and Metabolic Studies of The Interaction of Catechol-O-Methyltransferase and a New Nitrocatechol Inhibitor. Drug Metab. Dispos. 31, 250–258. https://doi.org/10.1124/dmd.31.3.250

Panigrahi, S.K., Desiraju, G.R., 2007. Strong and weak hydrogen bonds in the protein–ligand interface. Proteins Struct. Funct. Bioinforma. 67, 128–141. https://doi.org/10.1002/prot.21253

Patel, C.N., Georrge, J.J., Modi, K.M., Narechania, M.B., Patel, D.P., Gonzalez, F.J., Pandya, H.A., 2018. Pharmacophore-based virtual screening of catechol-o-methyltransferase (COMT) inhibitors to combat Alzheimer’s disease. J. Biomol. Struct. Dyn. 36, 3938–3957. https://doi.org/10.1080/07391102.2017.1404931

Paul, D., Standifer, K.M., Inturrisi, C.E., Pasternak, G.W., 1989. Pharmacological characterization of morphine-6 beta-glucuronide, a very potent morphine metabolite. J. Pharmacol. Exp. Ther. 251, 477–483.

PDB statistics [WWW Document], 2020. . PDB Stat. Protein- Struct. Released Year. URL http://www.rcsb.org/stats/growth/protein (accessed 1.25.20).

Pereira, D.M., Valentão, P., Pereira, J.A., Andrade, P.B., 2009. Phenolics: From Chemistry to Biology. Molecules 14, 2202–2211. https://doi.org/10.3390/molecules14062202

Prime, v1.6. Prime. Schrödinger LLC., Portland, OR.

Rakers, C., Schumacher, F., Meinl, W., Glatt, H., Kleuser, B., Wolber, G., 2015. In Silico Prediction of Human Sulfotransferase 1E1 Activity Guided by Pharmacophores from Molecular Dynamics Simulations. J. Biol. Chem. 291, 58–71. https://doi.org/10.1074/jbc.M115.685610

Reiter, C., Weinshilboum, R.M., 1982. Acetaminophen and phenol: substrates for both a thermostable and a thermolabile form of human platelet phenol sulfotransferase. J. Pharmacol. Exp. Ther. 221, 43– 51.

Riches, Z., Stanley, E.L., Bloomer, J.C., Coughtrie, M.W.H., 2009. Quantitative Evaluation of the Expression and Activity of Five Major Sulfotransferases (SULTs) in Human Tissues: The SULT “Pie.” Drug Metab. Dispos. 37, 2255–2261. https://doi.org/10.1124/dmd.109.028399

Rowland, A., Miners, J.O., Mackenzie, P.I., 2013. The UDP-glucuronosyltransferases: Their role in drug metabolism and detoxification. Int. J. Biochem. Cell Biol. 45, 1121–1132. https://doi.org/10.1016/j.biocel.2013.02.019

Roy, K., Kar, S., Das, R.N., 2015. Chapter 8 - Introduction to 3D-QSAR, in: Roy, K., Kar, S., Das, R.N. (Eds.), Understanding the Basics of QSAR for Applications in Pharmaceutical Sciences and Risk Assessment. Academic Press, Boston, pp. 291–317. https://doi.org/10.1016/B978-0-12-801505- 6.00008-9

Rutherford, K., Le Trong, I., Stenkamp, R.E., Parson, W.W., 2008. Crystal Structures of Human 108V and 108M Catechol O-Methyltransferase. J. Mol. Biol. 380, 120–130. https://doi.org/10.1016/j.jmb.2008.04.040

Ruuska, T., 2011. Sytosolisten sulfotransferaasien aktiivisten paikkojen tietämysperustaisten potentiaalien hyödyntäminen lääkeaineiden virtuaaliseulonnassa (Pro gradu). University of Eastern Finland, Kuopio. 76 7 References

Salman, E.D., Kadlubar, S.A., Falany, C.N., 2009. Expression and Localization of Cytosolic Sulfotransferase (SULT) 1A1 and SULT1A3 in Normal Human Brain. Drug Metab. Dispos. 37, 706–709. https://doi.org/10.1124/dmd.108.025767

Schacht, J.P., 2016. COMT val158met moderation of dopaminergic drug effects on cognitive function: A critical review. Pharmacogenomics J. 16, 430–438. https://doi.org/10.1038/tpj.2016.43

Scheggia, D., Sannino, S., Papaleo, M.L.S. and F., 2012. COMT as a Drug Target for Cognitive Functions and Dysfunctions [WWW Document]. CNS Neurol. Disord. - Drug Targets. URL http://www.eurekaselect.com/98408/article (accessed 1.23.20).

Sharma, V., Duffel, M.W., 2005. A Comparative Molecular Field Analysis‐Based Approach to Prediction of Sulfotransferase Catalytic Specificity, in: Methods in Enzymology, Phase II Conjugation Enzymes and Transport Systems. Academic Press, pp. 249–263. https://doi.org/10.1016/S0076- 6879(05)00014-5

Sharma, V., Duffel, M.W., 2002. Comparative Molecular Field Analysis of Substrates for an Aryl Sulfotransferase Based on Catalytic Mechanism and Protein Homology Modeling. J. Med. Chem. 45, 5514–5522. https://doi.org/10.1021/jm010481c

Shedlovsky, T., 1962. The behaviour of carboxylic acids in mixed solvents, in: Pesce, B. (Ed.), Electrolytes. Pergamon Press, New York, pp. 146–151.

Sherman, W., Day, T., Jacobson, M.P., Friesner, R.A., Farid, R., 2006. Novel Procedure for Modeling Ligand/Receptor Induced Fit Effects. J. Med. Chem. 49, 534–553. https://doi.org/10.1021/jm050540c

Shiau, A.K., Barstad, D., Radek, J.T., Meyers, M.J., Nettles, K.W., Katzenellenbogen, B.S., Katzenellenbogen, J.A., Agard, D.A., Greene, G.L., 2002. Structural characterization of a subtype-selective ligand reveals a novel mode of estrogen receptor antagonism. Nat. Struct. Biol. 9, 359–364. https://doi.org/10.1038/nsb787

Sidharthan, N.P., Butcher, N.J., Mitchell, D.J., Minchin, R.F., 2014. Expression of the Orphan Cytosolic Sulfotransferase SULT4A1 and Its Major Splice Variant in Human Tissues and Cells: Dimerization, Degradation and Polyubiquitination. PLoS ONE 9. https://doi.org/10.1371/journal.pone.0101520

Sipilä, J., 2006. Predicting Affinity of Human Catechol O-Methyl Transferase Ligands with an Approach Combining One- and Three-dimensional QSAR. Presented at the EuroQSAR 2006, Civitavecchia, Italy.

Śledź, P., Caflisch, A., 2018. Protein structure-based drug design: from docking to molecular dynamics. Curr. Opin. Struct. Biol., Folding and binding in silico, in vitro and in cellula • Proteins: An Evolutionary Perspective 48, 93–102. https://doi.org/10.1016/j.sbi.2017.10.010

Smith, M.H., 1966. The amino acid composition of proteins. J. Theor. Biol. 13, 261–282. https://doi.org/10.1016/0022-5193(66)90021-X

SPARTAN, v5.0. SPARTAN. Wavefunction Inc., Irvine, CA.

SPSS, v11.5.1. SPSS. SPSS Inc., Chicago, IL.

SYBYL, v6.8. SYBYL. Tripos Inc., St. Louis, MO.

Takano, M., Sugiyama, T., 2017. UGT1A1 polymorphisms in cancer: impact on irinotecan treatment. Pharmacogenomics Pers. Med. 10, 61–68. https://doi.org/10.2147/PGPM.S108656 77

Takizawa, R., Tochigi, M., Kawakubo, Y., Marumo, K., Sasaki, T., Fukuda, M., Kasai, K., 2009. Association between Catechol-O-Methyltrasferase Val108/158Met Genotype and Prefrontal Hemodynamic Response in Schizophrenia. PLoS ONE 4. https://doi.org/10.1371/journal.pone.0005495

Tan, Z., Zhang, S., 2014. Combine molecular docking and QSAR to predict drug metabolism by sultfotransferase1a3 (SULT1A3). Med. Chem. https://doi.org/10.4172/2161-0444.S1.013

Tang, C., Wang, W., Shi, M., Zhang, N., Zhou, X., Li, X., Ma, C., Chen, G., Xiang, J., Gao, D., 2019. Meta-Analysis of the Effects of the Catechol-O-Methyltransferase Val158/108Met Polymorphism on Parkinson’s Disease Susceptibility and Cognitive Dysfunction. Front. Genet. 10. https://doi.org/10.3389/fgene.2019.00644

Taskinen, J., Vidgren, J., Ovaska, M., Backström, R., Pippuri, A., Nissinen, E., 1989. QSAR and Binding Model for Inhibition of Rat Liver Catechol-O-Methyl-Transferase by 1,5-Substituted-3,4- Dihydroxybenzenes. Quant. Struct.-Act. Relatsh. 8, 210–213. https://doi.org/10.1002/qsar.19890080304

Tenhunen, J., Salminen, M., Lundström, K., Kiviluoto, T., Savolainen, R., Ulmanen, I., 1994. Genomic organization of the human catechol O-methyltransferase gene and its expression from two distinct promoters. Eur. J. Biochem. 223, 1049–1059. https://doi.org/10.1111/j.1432- 1033.1994.tb19083.x

Tervo, A.J., Nyrönen, T.H., Rönkkö, T., Poso, A., 2003. A structure-activity relationship study of catechol-O-methyltransferase inhibitors combining molecular docking and 3D QSAR methods. J. Comput. Aided Mol. Des. 17, 797–810. https://doi.org/10.1023/b:jcam.0000021831.47952.a7

Tibbs, Z.E., Rohn-Glowacki, K.J., Crittenden, F., Guidry, A.L., Falany, C.N., 2015. Structural plasticity in the human cytosolic sulfotransferase dimer and its role in substrate selectivity and catalysis. Drug Metab. Pharmacokinet. 30, 3–20. https://doi.org/10.1016/j.dmpk.2014.10.004

Trievel, R.C., Scheiner, S., 2018. Crystallographic and Computational Characterization of Methyl Tetrel Bonding in S-Adenosylmethionine-Dependent Methyltransferases. Molecules 23, 2965. https://doi.org/10.3390/molecules23112965

Tripathi, A.C., Upadhyay, S., Paliwal, S., Saraf, S.K., 2018. Privileged scaffolds as MAO inhibitors: Retrospect and prospects. Eur. J. Med. Chem. 145, 445–497. https://doi.org/10.1016/j.ejmech.2018.01.003

Tsai, K.-C., Chen, Y.-C., Hsiao, N.-W., Wang, C.-L., Lin, C.-L., Lee, Y.-C., Li, M., Wang, B., 2010. A comparison of different electrostatic potentials on prediction accuracy in CoMFA and CoMSIA studies. Eur. J. Med. Chem. 45, 1544–1551. https://doi.org/10.1016/j.ejmech.2009.12.063

Tsuji, E., Okazaki, K., Takeda, K., 2009. Crystal structures of rat catechol-O-methyltransferase complexed with coumarine-based inhibitor. Biochem. Biophys. Res. Commun. 378, 494–497. https://doi.org/10.1016/j.bbrc.2008.11.085

Tunbridge, E.M., Bannerman, D.M., Sharp, T., Harrison, P.J., 2004. Catechol-O-Methyltransferase Inhibition Improves Set-Shifting Performance and Elevates Stimulated Dopamine Release in the Rat Prefrontal Cortex. J. Neurosci. 24, 5331–5335. https://doi.org/10.1523/JNEUROSCI.1124- 04.2004

Turner, R.M., Park, B.K., Pirmohamed, M., 2015. Parsing interindividual drug variability: an emerging role for systems pharmacology. Wiley Interdiscip. Rev. Syst. Biol. Med. 7, 221–241. https://doi.org/10.1002/wsbm.1302 78 7 References

Varma, M.V., Steyn, S.J., Allerton, C., El-Kattan, A.F., 2015. Predicting Clearance Mechanism in Drug Discovery: Extended Clearance Classification System (ECCS). Pharm. Res. 32, 3785–3802. https://doi.org/10.1007/s11095-015-1749-4

Verdonk, M.L., Cole, J.C., Taylor, R., 1999. SuperStar: A Knowledge-based Approach for Identifying Interaction Sites in Proteins. J. Mol. Biol. 289, 1093–1108. https://doi.org/10.1006/jmbi.1999.2809

Vidgren, J., Svensson, L.A., Liljas, A., 1994. Crystal structure of catechol O-methyltransferase. Nature 368, 354–358. https://doi.org/10.1038/368354a0

Wang, P.C., Buu, N.T., Kuchel, O., Genest, J., 1983. Conjugation patterns of endogenous plasma catecholamines in human and rat. A new specific method for analysis of glucuronide-conjugated catecholamines. J. Lab. Clin. Med. 101, 141–151.

Wang, T., Cook, I., Leyh, T.S., 2017. The NSAID allosteric site of human cytosolic sulfotransferases. J. Biol. Chem. 292, 20305–20312. https://doi.org/10.1074/jbc.M117.817387

Wang, Y.-H., Trucksis, M., McElwee, J.J., Wong, P.H., Maciolek, C., Thompson, C.D., Prueksaritanont, T., Garrett, G.C., Declercq, R., Vets, E., Willson, K.J., Smith, R.C., Klappenbach, J.A., Opiteck, G.J., Tsou, J.A., Gibson, C., Laethem, T., Panorchan, P., Iwamoto, M., Shaw, P.M., Wagner, J.A., Harrelson, J.C., 2012. UGT2B17 Genetic Polymorphisms Dramatically Affect the Pharmacokinetics of MK-7246 in Healthy Subjects in a First-in-Human Study. Clin. Pharmacol. Ther. 92, 96–102. https://doi.org/10.1038/clpt.2012.20

Wärnmark, A., Treuter, E., Gustafsson, J.-A., Hubbard, R.E., Brzozowski, A.M., Pike, A.C.W., 2002. Interaction of transcriptional intermediary factor 2 nuclear receptor box peptides with the coactivator binding site of estrogen receptor alpha. J. Biol. Chem. 277, 21862–21868. https://doi.org/10.1074/jbc.M200764200

Wikberg, T., Ottoila, P., Taskinen, J., 1993. Identification of major urinary metabolites of the catechol-O- methyltransferase inhibitor entacapone in the dog. Eur. J. Drug Metab. Pharmacokinet. 18, 359– 367.

Williams, A., 2003. Free Energy Relationships in Organic and Bio-Organic Chemistry. https://doi.org/10.1039/9781847550927

Williams, J.A., Hyland, R., Jones, B.C., Smith, D.A., Hurst, S., Goosen, T.C., Peterkin, V., Koup, J.R., Ball, S.E., 2004. Drug-Drug Interactions for Udp-Glucuronosyltransferase Substrates: A Pharmacokinetic Explanation for Typically Observed Low Exposure (auci/Auc) Ratios. Drug Metab. Dispos. 32, 1201–1208. https://doi.org/10.1124/dmd.104.000794

Wishart, D.S., Feunang, Y.D., Guo, A.C., Lo, E.J., Marcu, A., Grant, J.R., Sajed, T., Johnson, D., Li, C., Sayeeda, Z., Assempour, N., Iynkkaran, I., Liu, Y., Maciejewski, A., Gale, N., Wilson, A., Chin, L., Cummings, R., Le, D., Pon, A., Knox, C., Wilson, M., 2018. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082. https://doi.org/10.1093/nar/gkx1037

Wrobel, J., Steffan, R., Bowen, S.M., Magolda, R., Matelan, E., Unwalla, R., Basso, M., Clerin, V., Gardell, S.J., Nambi, P., Quinet, E., Reminick, J.I., Vlasuk, G.P., Wang, S., Feingold, I., Huselton, C., Bonn, T., Farnegardh, M., Hansson, T., Nilsson, A.G., Wilhelmsson, A., Zamaratski, E., Evans, M.J., 2008. Indazole-Based Liver X Receptor (LXR) Modulators with Maintained Atherosclerotic Lesion Reduction Activity but Diminished Stimulation of Hepatic Triglyceride Synthesis. J. Med. Chem. 51, 7161–7168. https://doi.org/10.1021/jm800799q 79

Yamamoto, A., Liu, M.-Y., Kurogi, K., Sakakibara, Y., Saeki, Y., Suiko, M., Liu, M.-C., 2015. Sulphation of acetaminophen by the human cytosolic sulfotransferases: a systematic analysis. J. Biochem. (Tokyo) 158, 497–504. https://doi.org/10.1093/jb/mvv062

Yasuda, M., 1959. Dissociation constants of some carboxylic acids in mixed aqueous solvents. Bull Chem Soc Jpn 32, 429–432.

Zhavoronkov, A., Ivanenkov, Y.A., Aliper, A., Veselov, M.S., Aladinskiy, V.A., Aladinskaya, A.V., Terentiev, V.A., Polykovskiy, D.A., Kuznetsov, M.D., Asadulaev, A., Volkov, Y., Zholus, A., Shayakhmetov, R.R., Zhebrak, A., Minaeva, L.I., Zagribelnyy, B.A., Lee, L.H., Soll, R., Madge, D., Xing, L., Guo, T., Aspuru-Guzik, A., 2019. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040. https://doi.org/10.1038/s41587- 019-0224-x

Zheng, Y.-J., Bruice, T.C., 1997. A Theoretical Examination of the Factors Controlling the Catalytic Efficiency of a Transmethylation Enzyme: Catechol O-Methyltransferase. J. Am. Chem. Soc. 119, 8137–8145. https://doi.org/10.1021/ja971019d