ENHANCEMENT OF THE SYNTHETIC SPROUT DE NOVO LIGAND DESIGN PROGRAM KNOWLEDGE BASE. SPROUT APPLICATION FOR 17β-HYDROXYSTEROID DEHYDROGENASE TYPE 1

Sari Alho

2005

Laboratory of Organic Chemistry, Department of Chemistry, University of Helsinki, Finland


University of Helsinki Faculty of Science Department of Chemistry Laboratory of Organic Chemistry P.O. Box 55, FIN-00014 University of Helsinki

ACADEMIC DISSERTATION

To be presented with the permission of the Faculty of Science of the University of Helsinki for public criticism in Auditorium A 110 of the Department of Chemistry, A. I. Virtasen aukio 1, on April 1st, 2005 at 12 o’clock noon

Helsinki 2005

ISBN 952-91-6304-5 (paperback)
ISBN 952-10-1354-0 (PDF)
http://ethesis.helsinki.fi

Helsinki 2005
Gummerus Oy

CONTENTS

ABSTRACT 4
ACKNOWLEDGEMENTS 6
ABBREVIATIONS 8
1. INTRODUCTION 11
1.1 Structure-based drug design 11
1.2 SPROUT and SynSPROUT 15
1.3 Biological background 17
2. AIMS OF THE STUDY 19
3. OVERVIEW OF SPROUT COMPONENT PROGRAM 21
3.1 Survey of de novo ligand design programs 21
3.2 SPROUT 24
3.2.1 Current developments of SPROUT 24
3.2.2 General features 25
3.2.3 CANGAROO 28
3.2.4 HIPPO 30
3.2.4.1 Boundary surface 31
3.2.4.2 HIPPO target sites 32
3.2.4.3 Pharmacophore module 38
3.2.5 ELEFANT 39
3.2.5.1 SPROUT template library 40
3.2.6 SPIDER 43
3.2.6.1 User defined parameters 44
3.2.6.2 Template joining 45
3.2.6.3 The search process 47
3.2.7 ALLIGATOR 48
3.3 SynSPROUT 50
3.3.1 Knowledge base and PATRAN language 51
3.3.1.1 Chemical patterns 51
3.3.1.2 Joining rules 51
3.3.1.3 Other specifications 52
3.3.2 New fragment library 53
3.3.3 Differences between Classic and SynSPROUT 53
3.4 Further modelling applications 54
3.4.1 Moloc 54
3.4.2 MacroModel 54
3.4.3 AutoDock 55
3.4.4 eHiTS® 55
3.4.5 SPA-Docking 55
3.4.6 CAESA 56
4. REVIEW OF HORMONES AND HYDROXYSTEROID DEHYDROGENASES 57
4.1 Structure of the steroid hormones 57
4.2 Physiological effects of estrogens 58
4.3 Estrogen biosynthesis 59
4.4 Hydroxysteroid dehydrogenase family 63
4.4.1 SDR and AKR protein superfamilies 63
4.4.2 Members of the hydroxysteroid dehydrogenase family important for human physiology 65
4.4.2.1 3β-Hydroxysteroid dehydrogenase/ketosteroid 66
4.4.2.2 11β-Hydroxysteroid dehydrogenase 66
4.4.2.3 3α-Hydroxysteroid dehydrogenase 67
4.4.2.4 20α-Hydroxysteroid dehydrogenase 68
4.4.2.5 Multiple specificities of hydroxysteroid dehydrogenases 69
4.4.3 17β-Hydroxysteroid dehydrogenase/ketosteroid reductase 69
4.4.3.1 Members of the 17βHSD/KSR family 73
4.4.3.2 Crystal structure information of the 17βHSD/KSR in PDB 78
4.4.3.3 Overall description of the 17βHSD/KSR (type 1) enzyme structure 79
4.4.3.4 Ligand-binding domain and the interactions 81
4.4.3.5 - 87
4.4.3.6 Reduction mechanism 88
4.4.3.7 Inhibition studies of 17βHSD/KSR 89
4.4.4 Crystallisation studies of estrogen receptor α and β 91
5. RESULTS AND DISCUSSION 94
5.1 Development of SynSPROUT knowledge base 94
5.1.1 1,3-Dipolar cycloaddition reactions 94
5.1.1.1 Azomethine ylides 96
5.1.1.2 Stereochemistry of the 1,3-dipolar cycloaddition reactions 99
5.1.2 Azomethine ylide chemical patterns and joining rules 100
5.2 Inhibitor design for 17βHSD/KSR1 104
5.2.1 Crystal structures selection 104
5.2.1.1 Active site study of estradiol complex 105
5.2.1.2 Active site study of equilin complex 108
5.2.1.3 Active site study of dihydrotestosterone complex 112
5.2.1.4 Active site study of dehydroepiandrosterone complex 115
5.2.1.5 Active site studies of the estrogen receptor α and β 118
5.2.2 Structure generation 119
5.2.2.1 Structure generation for estradiol complex 121
5.2.2.2 Structure generation for equilin complex 125
5.2.2.3 Structure generation for dihydrotestosterone complex 129
5.2.2.4 Structure generation for dehydroepiandrosterone complex 132
5.2.2.5 New structure generation for dihydrotestosterone complex with the latest version of SPROUT 135
5.2.3 Examination of potential inhibitor structures 136
5.2.3.1 Modifications and optimisation studies of the selected structures 137
5.2.3.2 Energy optimisation and further analysis of selected molecules 144
5.2.4 Docking studies 152
5.2.4.1 Docking simulations into the 17βHSD/KSR1 active site 152
5.2.4.2 Docking simulations into the estrogen receptor α and β active sites 155
5.2.5 Retrosynthesis and synthesis plan 156
5.2.6 Retrosynthesis by CAESA 158
6. CONCLUSIONS AND FUTURE PERSPECTIVES 165
7. REFERENCES 167
8. APPENDICES 183

ABSTRACT

As part of this thesis, various de novo ligand design programs are briefly surveyed. The use and characteristics of the SPROUT ligand design program are presented in more detail. The thesis also discusses the process that led to an extension of the knowledge base of the SynSPROUT ligand design program. It was envisaged that pyrrolidine moieties might constitute a key structural element of the sought-after 17β-hydroxysteroid dehydrogenase/ketosteroid reductase type 1 inhibitor candidates. Thus, a literature survey of azomethine ylide reactions, which can produce pyrrolidine and related ring structures, was carried out so that information on these reactions could be added to the SynSPROUT program’s knowledge base. Text files containing chemical patterns that describe the functional groups of 1,3-dipoles and dipolarophiles were written for the knowledge base. Before they can be added to the knowledge base, the next step is to develop the ring formation programming language that SynSPROUT currently lacks.

A survey of the 17β-hydroxysteroid dehydrogenase/ketosteroid reductase enzyme family is also presented. These enzymes are responsible for the final step of the biosynthesis of the sex hormones. In many cases these hormones also stimulate the proliferation of breast and prostate cancers. Because the 17β-hydroxysteroid dehydrogenase/ketosteroid reductase type 1 (17β-HSD/KSR1) enzyme catalyses estrogen synthesis, it is an attractive target for structure-based ligand design for the prevention and control of breast tumour growth.

The experimental work focused on inhibitor ligand design for the 17β-HSD/KSR1 enzyme using the SPROUT de novo ligand design program. The three-dimensional crystal structure coordinates of the type 1 enzyme complexed with four different substrates (estradiol, equilin, dihydrotestosterone and dehydroepiandrosterone) were used for the study. The structure generation of the novel ligand molecule libraries is described step by step. Thousands of new molecules were created for the enzyme active site. A set of 64 molecules was selected using the SPROUT program’s scoring function and the ALLIGATOR module. Modifications of the functional groups and energy optimisations were carried out for the selected molecules. The interaction results of the optimised molecules were compared with the SPROUT information. In silico docking simulations were performed for a promising subset of molecules, and the best docking result was also compared with the original molecule generated using SPROUT. Both the optimisation and the docking simulation results supported the SPROUT generation results. The retrosynthesis and synthetic plan for one new molecule are presented as an example, together with the results of the retrosynthetic program for that molecule.


ACKNOWLEDGEMENTS

The experimental work for this thesis was carried out in the Laboratory of Organic Chemistry of the University of Helsinki and the Institute for Computer Applications in the Molecular Science (ICAMS) of the University of Leeds, United Kingdom.

I am most grateful to my supervisor, Professor Kristiina Wähälä, for introducing the fascinating world of molecular modelling to me. I particularly appreciate her numerous suggestions and helpful criticism during this work.

I am exceedingly grateful to Professor A. Peter Johnson and Dr. Kimmo Vihko for reviewing the manuscript of the thesis and for their helpful comments, and Dr. Louise Fletcher for revising the language of the present manuscript.

Sincere thanks to all members of the Organic Chemistry Laboratory at Helsinki University. I wish to thank Emeritus Professor Tapio Hase for his constructive comments on organic chemistry problems and Dr. Jorma Koskimies for his help with molecular modelling problems at the beginning of my studies. Many warm thanks to all former and present members of the Phyto-Syn group at the Organic Chemistry Laboratory. I am deeply indebted to Barbara for her endless support and help.

Warm thanks are owing to the members of the ICAMS group in Leeds University. Special thanks to Vilmos, Aniko and Krisztina for their help during my work in Leeds University.

I would like to thank my friends and former study mates for their support. Thanks are extended to Katariina, Päivi and Maarit for providing the berth during my stays in Helsinki. I am deeply grateful to my good friends in Leeds, especially Sari and Houry, with whom I have had many fruitful conversations.

I would like to express my deepest gratitude to my parents and siblings for their encouragement during my studies, and special thanks to my father for financial support during the last year. Finally, many thanks to my fiancé Paul for his support, understanding and love.


Financial support from the National Technology Agency of Finland (TEKES), the Academy of Finland, the Marie Curie Fellowship Association, the Magnus Ehrnrooth Foundation, the Etelä-Pohjanmaa Regional Fund of the Finnish Cultural Foundation and the University of Helsinki is gratefully acknowledged.

Leeds, United Kingdom, February 2005
Sari Alho

ABBREVIATIONS

1,3-DC        1,3-Dipolar cycloaddition
3α/3β-Adiol   Androstanediol, androst-5α-an-3α/3β,17β-diol
3-D           Three-dimensional
17βHSD/KSR    17β-Hydroxysteroid dehydrogenase/ketosteroid reductase
17βHSD/KSR1   17β-Hydroxysteroid dehydrogenase/ketosteroid reductase type 1
20αDHP        20α-Dihydroprogesterone, pregn-4-nen-20α-ol-3-one
∆4-dione      Androstenedione, androst-4-en-3,17-dione
∆5-diol       Androstenediol, androst-5-en-3β,17β-diol
ADH           Short Chain
Adione        Androstanedione, androst-5α-an-3,17β-dione
ADT           Androsterone, 3α-hydroxy-5α-androstan-17-one
AKR           Aldoketoreductase
ALLIGATOR     Algorithms for Ligand Testing and Ordering of Results
AR            Androgen Receptor
CAESA         Computer Assisted Estimation of Synthetic Accessibility
CANGAROO      Cleft ANalysis by Geometry based Algorithm Regardless Of the Orientation
CoMFA         Comparative Molecular Field Analysis
CR            Carbonyl Reductase
CSD           Cambridge Structural Database
DHEA          Dehydroepiandrosterone, 3β-hydroxy-5-androsten-17-one
DHT           Dihydrotestosterone, 17β-hydroxy-5α-androstan-3-one
E1            Estrone, 3-hydroxyestra-1,3,5(10)-triene-17-one
E2            Estradiol, estra-1,3,5(10)-triene-3,17β-diol
E3            Estriol, estra-1,3,5(10)-triene-3,16α,17β-triol
eHiTS         Electronic High Throughput Screening
ELEFANT       Election of Functional Groups and Anchoring them to Target Sites
ER            Estrogen Receptor
ERα           Estrogen Receptor α
ERβ           Estrogen Receptor β
ERE           Estrogen Response Element
EQU           Equilin, 3-hydroxyestra-1,3,5(10),7-tetraen-17-one
FGI           Functional Group Interconversion
FMO           Frontier Molecular Orbital
GA            Genetic Algorithm
GA-LS         Genetic Algorithm and Local Search
HDE           Hydratase Dehydrogenase
HIPPO         Hydrogen Bonding Interaction Site Prediction as Positions with Orientations
ICAMS         Institute for Computer Applications in the Molecular Science
KSR           Ketosteroid Reductase
LBD           Ligand-binding domain
LEA           Ligand Energy Alone
LGA           Lamarckian Genetic Algorithm
LHASA         Logic and Heuristic Applied to Synthetic Analysis
NAD           Nicotinamide Adenine Dinucleotide
NADP          Nicotinamide Adenine Dinucleotide Phosphate
NMR           Nuclear Magnetic Resonance
NR            Nuclear Receptor
MCSS          Multiple Copy Simultaneous Search
MDDR          MDL Drug Data Report
MDL           Molecular Design Ltd.
MFE           Multifunctional Enzyme
MLE           Minimised Ligand Energy
MR            Mineralocorticoid Receptor
P450arom      P450 aromatase
ORAC          Organic Reactions Accessed by Computer
P             Progesterone, pregn-4-nen-3,20-dione
PDB           Brookhaven Protein Data Bank
PPARα         Peroxisome Proliferators Activated Receptor α
PTCR          Porcine testicular carbonyl reductase
REACCS        Reaction ACCess System
RMS           Root Mean Square
RoDH1         Retinal Dehydrogenase type 1
SCP2          Sterol Carrier Protein type 2
SD            Structural Data file format of MDL
SDR           Steroid Dehydrogenase/Reductase
SGI           Silicon Graphics
SPA           Systematic Population Annealing
SPIDER        Structure Production with Interactive Design of Results
T             Testosterone, 17β-hydroxyandrost-4-en-3-one
TIM           Triose phosphate Isomerase
TOPAS         TOPology-Assigning System
vdW           van der Waals

Essential Amino Acids

Alanine, Ala, A
Arginine, Arg, R
Aspartate, Asp, D
Asparagine, Asn, N
Cysteine, Cys, C
Glutamate, Glu, E
Glutamine, Gln, Q
Glycine, Gly, G
Histidine, His, H
Isoleucine, Ile, I
Leucine, Leu, L
Lysine, Lys, K
Methionine, Met, M
Phenylalanine, Phe, F
Proline, Pro, P
Serine, Ser, S
Threonine, Thr, T
Tryptophan, Trp, W
Tyrosine, Tyr, Y
Valine, Val, V


1. INTRODUCTION

1.1 Structure-based drug design

Structure generation is one of the many approaches used in the computer-aided lead discovery cycle. It has been demonstrated for a large number of different molecular targets that the three-dimensional (3-D) structure of a protein can be used to design novel ligands.1,2 After a therapeutically interesting target has been selected, the crystal structure information of the enzyme is used for ligand design. A considerable number of 3-D protein structures determined by X-ray crystallography, NMR spectroscopy or theoretical homology modelling are available in the Brookhaven Protein Data Bank (PDB).3,4 Initially, structure-based design concentrated on different procedures for screening databases of known 3-D chemical structures.5,6 The advantage of this approach is that the compounds found are either commercially available or have a known synthesis. On the other hand, it cannot discover novel structures. Nowadays, de novo design programs7 make it possible to design new compounds without relying on existing molecule databases, such as the MDL® Screening Compounds Directory.8 A de novo approach produces large sets of potential structures, but its limitation is the problem of synthetic accessibility.

Applications of 3-D searching can be divided into four groups, as shown in Scheme 1 (page 12), depending on whether the structures of the receptor and/or the ligand are known. When both the receptor and the ligand 3-D structures are unknown, it is still possible to apply computer-aided screening methods, which include chemical similarity searching based on structure (High Throughput Screening) and/or combinatorial chemistry applications. If the ligand is known while the receptor is unknown, database and similarity searches can be used to identify the pharmacophore of the ligand (Analogy-Based Drug Design). When only the receptor is known, it is possible to perform de novo ligand design or receptor-based 3-D searching. In such virtual screening, compounds that match a given pharmacophore hypothesis are identified in silico. When both structures are known, structure-based drug design can be applied.

                              Receptor structure
                              Unknown                      Known
Ligand structure   Unknown    Computer-aided screening     De novo design
                   Known      Analogy-based drug design    Structure-based drug design

Scheme 1. Applications of 3-D searching are divided into four groups.
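The classification in Scheme 1 can be written as a small lookup table. The following is only an illustrative sketch: the function name `search_approach` and the lower-case labels are assumptions made for this example, and the mapping simply restates the four cells of the scheme.

```python
def search_approach(receptor_known: bool, ligand_known: bool) -> str:
    """Return the class of 3-D searching application given in Scheme 1.

    The function name and the string labels are hypothetical; only the
    four-way classification itself comes from the scheme.
    """
    scheme_1 = {
        (False, False): "computer-aided screening",
        (False, True): "analogy-based drug design",
        (True, False): "de novo design",
        (True, True): "structure-based drug design",
    }
    return scheme_1[(receptor_known, ligand_known)]


# Only the receptor structure is known, as in the de novo case.
print(search_approach(receptor_known=True, ligand_known=False))
```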

De novo ligand design involves the generation of drugs based purely on the structure of the binding site, so that the bound molecule either inhibits or alters protein activity. The constraints of the binding site are defined using crystallographic information on the receptor. By knowing which amino acids are present in the binding site and where they are located, it is possible to identify the binding interactions. Good inhibitors must possess structural and chemical properties that are complementary to their target receptor; molecular skeletons are therefore generated to fit a set of steric, electrostatic and hydrophobic constraints. In addition to satisfying these constraints, the molecules should exist in a low-energy conformation. After binding site characterisation it is possible to design a molecule that has the correct size, geometry and functional groups to interact with the amino acid residues. Structure-based and de novo ligand design processes have helped to design new potential inhibitors that have also been tested in clinical trials.9,10,11 Although no new structures are based purely on any de novo ligand design program, the research and development of these programs is important because they support fast and efficient drug discovery.

Frequently the determined crystal structure includes a ligand bound to the receptor, and the location of this ligand is used as the binding site. If the receptor is unknown, constraints can be derived from a pharmacophore hypothesis. This makes it possible to build a picture of the receptor's binding requirements by analysing the molecules that are known to bind to the receptor. Many examples of successful 3-D searching using pharmacophore queries or hypotheses have been published,12,13,14,15 and some expert systems for the automatic prediction of pharmacophoric groups have been presented.16


The de novo ligand design process includes four main steps regardless of the design program:17

1) Definition of the constraints: Analysis of the X-ray crystallographic information on the receptor provides one starting point for design. However, in many cases crystal structures are not available. For this reason, another starting point for ligand design is the pharmacophore hypothesis, where the structure is generated to fit less well-defined constraints. From the constraints it is possible to identify the interaction sites. Usually it is also possible to define the volume of the binding site.

2) Skeleton generation: Once a number of interaction sites have been defined, de novo design methods generate structures that have atoms or fragments placed at the interaction sites and that complement the binding site in shape, volume and chemical functional groups.

3) Organisation of the results: Programs give large sets of answers, and users need tools for navigating through these sets. These tools may include sorting, clustering and ranking techniques, which help in the selection of the designed molecule sets by estimating their chemical and biological properties.

4) Structure evaluation: New ligands must satisfy a number of criteria. For example, a potential enzyme inhibitor must be able to bind to the active site, must be synthetically accessible and, among other biological requirements, must have the required transport properties.

After binding site characterisation it would be tempting to design a ligand that fits perfectly. However, the result is likely to be disappointing, because the crystallographic structure may contain experimental error. Moreover, a flexible binding site can change shape depending on the molecule it binds. Therefore, it is better to design a loose-fitting molecular structure.

Programs for de novo ligand design7,18 that generate new structures by joining together atoms or fragments are called atom- or fragment-based ligand design programs. The advantage of the single-atom-based programs is that they can produce huge numbers of chemical structures. The major advantage of the fragment-based approach is that it improves the synthetic accessibility of the generated structures.19,20,21 There are two main strategies for this approach. The first strategy connects selected interaction points and the fragments with suitable linking groups. The connections are made so as to produce an optimal result between the generated skeleton and the protein binding site. In the second strategy, the generation of the skeleton starts from one selected point of the receptor's interaction site and grows piece by piece, so that the partial skeleton reaches all interaction points one by one (see section 3.1, page 21). Molecules generated in these ways should also fulfil the required steric, electrostatic and hydrophobic properties. The first strategy tends to generate rigid structures, whereas the second is inclined to produce flexible ones. There are many different conformations that atoms or fragments can adopt when added to a growing skeleton, and these lead to a large number of possible structures. Because structures are generally built in a stepwise manner from smaller fragments, it is necessary to ensure that the final structures are chemically stable and synthetically feasible.
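The growth in the number of possible structures during stepwise building can be illustrated with simple arithmetic. The sketch below is not taken from any of the programs discussed. It assumes, purely for illustration, that every growth step offers the same number of fragment choices and the same number of placements per fragment, and that nothing is pruned by the binding-site constraints. All numbers are hypothetical.

```python
def upper_bound_on_skeletons(n_fragments: int, n_placements: int, n_steps: int) -> int:
    """Upper bound on the skeletons enumerated by naive stepwise growth.

    Assumes (hypothetically) that each step independently offers
    n_fragments fragment types, each in n_placements conformations or
    orientations, and that no partial skeleton is rejected.
    """
    choices_per_step = n_fragments * n_placements
    return choices_per_step ** n_steps


# Hypothetical example: 10 fragment types, 6 placements each, 4 growth steps.
print(upper_bound_on_skeletons(10, 6, 4))
```

In practice the steric, electrostatic and hydrophobic constraints remove most partial skeletons early, so the real number is far below this bound. The example only shows why such pruning and later ranking are needed.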

Most de novo design applications are based on the docking method. This means that the programs try to find alignments of two molecules, the ligand and its receptor, such that they interact favourably. Several de novo programs have been developed that can be used to characterise an enzyme receptor site and to generate novel molecule sets. These programs can be categorized in several different ways. One of these focuses on the way the programs connect fragments and generate molecular skeletons: 1) programs that link predefined fragments placed into the interaction sites; 2) programs that place one fragment as a base and then generate skeletons in a stepwise manner; and 3) programs that are based on random or stochastic methods. NEWLEAD,22 HOOK,23 LUDI24,25,26,27 and SPROUT17,28,29,30,31,32 are examples of programs belonging to the first category. The second category includes programs such as LEGEND,33,34 GenStar35 and GROW.36 PRO_LIGAND37 is an example of a program using stochastic methods (section 3.1, page 21, briefly describes the features of these design programs).

In this research38 SPROUT (versions 3.2, 4.01, 4.11, 5.0, 5.1 and 6.0) was used as the de novo ligand design program for the 17β-hydroxysteroid dehydrogenase/ketosteroid reductase type 1 (17β-HSD/KSR1) enzyme (EC 1.1.1.62).39 Crystal structures of the enzyme are available from the Brookhaven Protein Data Bank,4 complexed with, for example, estradiol 1 (entry code 1A27),40 equilin 2 (entry code 1EQU),41 dihydrotestosterone 3 (entry code 1DHT)42 and dehydroepiandrosterone 4 (entry code 3DHE)42 ligands.


[Structures 1-4: estradiol 1, equilin 2, dihydrotestosterone 3 and dehydroepiandrosterone 4.]

The SPROUT program was chosen because, after a brief comparison with the other reliable structure generation system, LUDI, it was found to be more appropriate for the chosen method of design work. SPROUT differs in an important way from most other de novo design programs in both binding site identification and structure generation. For example, LUDI uses real molecules as fragments, whereas SPROUT can also use templates. Templates are 3-D molecular graphs whose edges are labelled by bond type and whose vertices are labelled only by hybridization state and not by atom type. When the generated skeletons satisfy all the required constraints, the program converts each skeleton into a molecular structure by atom substitution. The use of generalized templates is a way to reduce data (see section 3.2.5.1, page 40). Recently, combinatorial chemistry programs have used template-based de novo applications for the design of bioactive molecules.43,44,45 These approaches produce molecular skeletons that can be made using combinatorial chemistry methods.
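The template idea described above can be sketched as a small data structure. This is a hypothetical illustration and not SPROUT's internal representation: the class `Template`, the helper functions and the planar hexagon coordinates (radius 1.39 Å) are all assumptions made for the example. It shows how one template whose vertices carry only a hybridization label can stand for several real rings, which is the sense in which templates reduce data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Generalised 3-D fragment: vertices carry hybridisation, not element."""
    hybridisation: tuple  # one label per vertex, e.g. "sp2"
    bonds: tuple          # (vertex_i, vertex_j, bond_type) triples
    coords: tuple         # (x, y, z) per vertex, in angstroms


def six_ring_template(radius: float = 1.39) -> Template:
    """Planar six-membered ring with six sp2 vertices (hypothetical geometry)."""
    coords = tuple(
        (radius * math.cos(math.pi * k / 3), radius * math.sin(math.pi * k / 3), 0.0)
        for k in range(6)
    )
    bonds = tuple((k, (k + 1) % 6, "aromatic") for k in range(6))
    return Template(("sp2",) * 6, bonds, coords)


def substitute_atoms(template: Template, elements: list) -> list:
    """Turn a skeleton into a molecule by assigning an element to each vertex."""
    if len(elements) != len(template.hybridisation):
        raise ValueError("one element symbol is needed per template vertex")
    return [
        (element, hyb, xyz)
        for element, hyb, xyz in zip(elements, template.hybridisation, template.coords)
    ]


ring = six_ring_template()
benzene_like = substitute_atoms(ring, ["C"] * 6)           # all-carbon ring
pyridine_like = substitute_atoms(ring, ["N"] + ["C"] * 5)  # one ring nitrogen
```

Both substituted rings share the same stored template, so a library of templates needs far fewer entries than a library that lists every heteroatom variant separately.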

The SPROUT program itself contains a module (ALLIGATOR) that can help the user with the major problems of de novo design: combinatorial explosion and synthetic accessibility. In addition, the possibility of using the CAESA46,47,48 (Computer Assisted Estimation of Synthetic Accessibility) and SynSPROUT49 (Synthetic SPROUT) programs was beneficial. The SPROUT program was also selected because of the opportunity to contribute to the enhancement of the program itself. After a literature survey, information on the azomethine ylide class of 1,3-dipolar cycloaddition reactions was added to the SynSPROUT program’s knowledge base.

1.2 SPROUT and SynSPROUT

The SPROUT molecular modelling software was developed at ICAMS50 (Institute for Computer Applications in the Molecular Science), University of Leeds, United Kingdom, for molecular structure generation. It can be used for de novo structure generation in cases where the 3-D structure of the target protein is known. SPROUT can also be used for “receptor mapping” (pharmacophore identification) when the structure of the target macromolecule is unknown.7,51 SPROUT uses information derived from the enzyme receptor site to provide steric and electrostatic constraints for the design of new ligands, for example potential inhibitors. The steric constraints used by the program include a volume and some interaction sites within the volume.17 Skeletons are built step by step by joining together small 3-D fragments so that the constraints are satisfied (see section 3.2.6, page 43).

The SPROUT program is composed of five modules (CANGAROO, HIPPO, ELEFANT, SPIDER, ALLIGATOR) and a template library manager.48,52 The modules perform different tasks one after another, leading to structure generation and evaluation. CANGAROO (Cleft ANalysis by Geometry based Algorithm Regardless Of the Orientation) detects potential binding pockets in protein structures. This module requires an input file containing the atomic coordinates of the receptor. Two output files are then generated: a cavity file and a receptor file. These files contain all the information necessary for the next module. HIPPO (Hydrogen Bonding Interaction Site Prediction as Positions with Orientations) identifies favourable hydrogen-bonding sites and hydrophobic regions within a binding pocket. The hydrogen-bonding sites are directional and are used to define target sites for the positions of potential ligand atoms. ELEFANT (Election of Functional Groups and Anchoring them to Target Sites) selects functional groups and positions them at the target sites to form starting fragments for structure generation. SPIDER (Structure Production with Interactive Design of Results) generates skeletons that satisfy the steric constraints of a binding pocket by growing spacer fragments onto the starting fragments and then connecting the resulting partial skeletons together. ALLIGATOR (Algorithms for Ligand Testing and Ordering of Results) clusters and scores the solutions to provide the user with an efficient tool for evaluating and navigating through the results. Atoms can also be substituted on the basis of the information contained in the Vertex Score Table of each ligand. Three scoring results are allocated to each structure. The Site Substituted Score is an estimate of the putative ligand's binding affinity (pKi) for the active site (see section 3.2.7, page 48).
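The order in which the modules hand data to one another can be summarised as a simple chain. The sketch below is only a reading aid: the "consumes" and "produces" labels are paraphrases of the description above, not SPROUT file formats or program options, and the helper names are hypothetical.

```python
# (module, what it consumes, what it produces); labels are paraphrases only.
STAGES = [
    ("CANGAROO", "receptor coordinates", "cavity and receptor files"),
    ("HIPPO", "cavity and receptor files", "target sites"),
    ("ELEFANT", "target sites", "starting fragments"),
    ("SPIDER", "starting fragments", "skeletons"),
    ("ALLIGATOR", "skeletons", "clustered and scored solutions"),
]


def is_consistent_chain(stages) -> bool:
    """True when every stage consumes what the previous stage produced."""
    return all(prev[2] == nxt[1] for prev, nxt in zip(stages, stages[1:]))


def describe(stages) -> list:
    """One human-readable line per module."""
    return [f"{name}: {consumes} -> {produces}" for name, consumes, produces in stages]


for line in describe(STAGES):
    print(line)
```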

Since it is likely that only a limited number of answers can be synthesised in the laboratory, it is necessary to be able to sort and rank the answer set in some rational way. SPROUT can predict binding affinity using an empirically derived function that takes into account hydrogen bonds, hydrophobic interactions, etc., and this provides one mechanism for ranking. In addition to the ALLIGATOR module, the CAESA program is helpful for clustering and sorting the output (see section 3.4.6, page 56).
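Ranking an answer set by predicted affinity can be sketched in a few lines. The example does not reproduce SPROUT's empirical scoring function, whose terms and weights are not given here. It assumes that a predicted pKi value is already available for each candidate, and the molecule names and scores are invented for illustration. The only chemistry used is the standard definition pKi = -log10(Ki), with Ki in mol/L.

```python
import math


def ki_from_pki(pki: float) -> float:
    """Convert pKi to Ki in mol/L using pKi = -log10(Ki)."""
    return 10.0 ** (-pki)


def rank_by_predicted_pki(candidates, top_n):
    """Return the top_n (name, pKi) pairs, highest predicted affinity first."""
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical candidates with hypothetical predicted pKi values.
candidates = [("mol_a", 5.2), ("mol_b", 7.9), ("mol_c", 6.4)]
best = rank_by_predicted_pki(candidates, top_n=2)
for name, pki in best:
    print(name, pki, f"{ki_from_pki(pki):.2e} mol/L")
```

A higher pKi corresponds to a lower Ki and so to tighter predicted binding, which is why the list is sorted in descending order.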

The SynSPROUT program generates new potential ligands using commercially available molecules as fragments that guarantee synthetic feasibility of the molecules. This program uses Classic SPROUT program modules and the same structure generation principles (see section 3.3, page 50).

1.3 Biological background

The estrogen receptor (ER) plays a crucial role in a number of processes, such as the control of reproduction and the development of secondary sexual characteristics. The ER binds its ligand and undergoes a conformational change, which allows the receptor-ligand complex to bind with high affinity to a specific estrogen response element (ERE) and to modulate the transcription of target genes (see section 4.4.4, page 91).53 The effect of estrogen on the mammary gland is of distinct interest because of its linkage to breast cancer.

Estrogen antagonists are used in breast cancer treatment to prevent estrogen action on cell proliferation. Another way to reduce the estrogen effect on breast cancer is to decrease the amount of endogenous estrogens. This could be accomplished by inhibiting the enzyme activities involved in estrogen biosynthesis. The most promising pathways for inhibition are the cytochrome P450 aromatase, estrone sulfatase (EC 3.1.6.1) and 17β-hydroxysteroid dehydrogenase/ketosteroid reductase (17βHSD/KSR) pathways54 (see section 4.3, page 59).

This research concentrates on studies of the 17βHSD/KSR enzymes,55 which are responsible for the final step of the biosynthesis of the sex hormones that in many cases stimulate the proliferation of breast and prostate cancers. Both estrogens and androgens are more active in the 17-hydroxy form than in the corresponding 17-keto form. Human estrogenic 17βHSD/KSR1 is one of eleven isoenzymes that have been defined in mammals so far (see section 4.4.3). The human type 1 enzyme catalyses the reduction of inactive estrone (E1) 5 to the active 17β-estradiol (E2) 1 in the presence of NADPH (nicotinamide adenine dinucleotide phosphate) 6 as a cofactor (see section 4.4.3.5, page 87).56

[Structures 5 and 6: estrone (E1) 5 and NADPH 6.]

Because the 17βHSD/KSR1 enzyme catalyses estrogen synthesis, it is an attractive target for structure-based ligand design for the prevention and control of breast tumour growth. The inhibitor development is based mainly on the binding site information obtained from SPROUT’s HIPPO module. These important structural features and the detailed binding site interactions are presented in this thesis. After the new molecules had been developed, the best options were modified and examined in detail. The structure generations are presented step by step (see section 5.2, page 104).

2. AIMS OF THE STUDY

Many studies have shown that sex hormones, including 17β-estradiol (E2) 1, testosterone (T) 7 and androstenediol (∆5-diol) 8, are potent stimulators of cancer cell growth. They also have an effect on the formation of steroid-based breast and prostate tumours and cancers. Because the 17βHSD/KSR enzymes catalyse the final step in the biosynthesis of estrogens and androgens, they are an important target for the design of inhibitors of steroid production in tumour growth.

The primary aim of this research was to examine the 17βHSD/KSR type 1 enzyme and its ligand-binding site and to design a novel potential inhibitor molecule for the enzyme. Although eleven different subtypes of the 17βHSD/KSR family are currently known, type 1 is used here because it is the best known and a great deal of information has been published on it. Most importantly, crystal structure data and mutation studies are available for type 1, whereas they are less extensive for the other subtypes. From the therapeutic point of view concerning breast cancer, a non-steroidal inhibitor is the most desirable. Both 17βHSD/KSR1 and the ER bind estrogen 5 as a ligand. A new inhibitor, however, should show binding affinity for the type 1 enzyme but not for the ER. Therefore, the crystal structure information of the estrogen receptors is also studied for inhibitor design.

The plan was to create non-steroidal inhibitor molecules with skeletons different from those of earlier studies, in which diverse steroid analogues have mainly been used as inhibitors. The goal was to generate a selection of molecular structures with different substituents and functional groups using the SPROUT de novo ligand design program. After structure generation, the energies and other properties of the created sets of molecules needed to be investigated. It is important to carry out ligand minimization in the receptor's active site and then to examine the ligand-receptor features. Finally, promising structures needed to be studied using in silico docking simulations. The molecules were docked into the 17βHSD/KSR1 enzyme as well as the estrogen receptor active sites, both for interaction studies and to confirm the lack of binding affinity for the estrogen receptors. With the help of these generated structure libraries it was also possible to test the performance of the different modules of the SPROUT program and to correct possible malfunctions.

The secondary aim was to assist in the development of the SynSPROUT program. Although this is a powerful system, it does not currently have the capability of constructing rings from acyclic precursors. 1,3-Dipolar cycloaddition (1,3-DC) reactions were chosen as a specific area in which to test this desired enhancement to the system. These are classic reactions in organic chemistry, and most of the resulting five-membered ring structures are easy to synthesise. This study concentrated on azomethine ylide reactions because of their great resemblance to peptide structures. The plan was to carry out a thorough literature survey and then add the information about these reactions to the SynSPROUT program’s knowledge base.

3. OVERVIEW OF SPROUT COMPONENT PROGRAM

3.1 Survey of de novo ligand design programs

De novo design programs are classified here into three different groups, as mentioned previously. The first and second categories are based on so-called deterministic methods and the third on stochastic methods. Programs belonging to the first category of deterministic methods, for example NEWLEAD, HOOK, LUDI and SPROUT, generate molecules by linking predefined fragments placed into the interaction sites. This linking method is also called the “outside-in” method (Figure 1).

Figure 1. Several fragments are placed simultaneously into the binding site and connected by suitable linking groups. Adapted from Verlinde57

The NEWLEAD22 program generates molecules to fit the constraints of a known pharmacophore. Fragments are docked to binding sites, and the program links them by finding an appropriate spacer from a database of simple molecules to generate whole molecules. It connects two isolated moieties using single atoms, library spacers and fused-ring fragments, repeating the connection until the parts are satisfactorily connected. Generated structures are ranked on the basis of van der Waals violations.

The HOOK23 program is based on 3-D searching systems. It uses databases of existing molecules, such as the CSD58,59 (Cambridge Structural Database), to find linking molecules between docked fragments in interaction sites. The program uses the MCSS60,61 (Multiple Copy Simultaneous Search) method to identify the energetically favourable binding site, and the HOOK part systematically searches a database for skeletons that logically connect binding sites. The molecules are scored according to their steric interactions with the receptor.

The LUDI24,25,26,27 was the first commercially available de novo design program (1992). It is a fragment-based structure generation program that can also be used for 3-D database searching. At the start the program identifies interaction sites using a purely geometrical approach. The program recognises four different kinds of interaction sites: hydrogen-donor, hydrogen-acceptor, lipophilic-aliphatic and lipophilic-aromatic. The two hydrogen-bonding interaction sites are strongly directional and are represented by sets of vectors rather than by a single position. Fragments are taken from the program’s own fragment library and fitted into the interaction sites using the root mean square (RMS) method. Connecting fragments are found from other template libraries. The program docks fragments into the interaction sites and gives a scored list of possible starting fragments. Users can choose one fragment at a time from the list, select the connection point and dock the next fragments into the connection point. The molecule is ready when all interaction sites are connected. The result is one new ligand molecule. Nowadays programs also take into account synthetic accessibility. The LUDI is a module of the InsightII program, which makes it a powerful tool for structure generation.62

As for the SPROUT,17,28,29,30,31,32 it too uses fragment joining methods, linking added templates in a stepwise manner to form a novel molecule skeleton. The program automatically identifies protein target sites such as hydrogen bond donor and acceptor sites, complex hydrogen bonding sites (multicentered and bifurcated situations), covalent bonding sites and lipophilic regions (see section 3.2.4, page 30). The SPROUT’s HIPPO module identifies interaction sites as regions, just as LUDI does, but an important difference is that HIPPO preserves the continuous nature of each region and also stores the directionality of hydrogen bond or covalent bond sites. Users can choose simple fragments (templates) for target site docking from the program’s own fragment library. An enormous variety of skeletons can be generated from a small number of fragments. After structure generation is completed, the program arranges the candidate structures in ascending order using a scoring function based on predicted ligand-binding affinities (see also section 3.2.7, page 48). This program, unlike the LUDI, is capable of creating a large library of possible ligand skeletons during one simulation run.

The second category of deterministic methods generates molecular structures in a stepwise manner using a sequential growing method, also called the “inside-out” method (Figure 2, page 23). This category includes programs such as GROW, LEGEND, and GenStar.

Figure 2. New ligands can be designed by positioning a seed that is further extended by additional building blocks. Adapted from Verlinde57

The LEGEND33,34 program uses an atom-based structure generation system. Sixteen atom types and five bond types are allowed. The input file of the program includes pre-calculated electrostatic and van der Waals interactions in a PDB format.4 Structure building starts with the generation of an anchor atom. For each addition of a new atom, the program selects a random atom of the existing partial structure as the root. The new atom is rejected if it occupies a forbidden position (van der Waals violations) with respect to previously generated atoms or the receptor. Additions of atoms are repeated until the skeleton fulfils the volume and electrostatic requirements.
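The random-root growth with rejection described above can be illustrated with a minimal Python sketch. It is not the LEGEND implementation: the bond length, the minimum contact distance and the function names are assumptions made for illustration only, atom and bond types are ignored, and growth stops at a fixed atom count rather than when volume and electrostatic requirements are met.

```python
import math
import random

BOND_LENGTH = 1.5   # assumed bond length in Å (illustrative value)
MIN_CONTACT = 1.2   # assumed minimum allowed interatomic distance in Å (illustrative value)


def random_unit_vector(rng):
    """Return a random direction on the unit sphere (rejection sampling)."""
    while True:
        v = (rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
        norm = math.sqrt(sum(c * c for c in v))
        if 1e-6 < norm <= 1.0:
            return tuple(c / norm for c in v)


def grow_skeleton(anchor, receptor_atoms, n_atoms, rng, max_tries=10000):
    """Grow atom positions from an anchor atom, one atom at a time."""
    atoms = [anchor]
    tries = 0
    while len(atoms) < n_atoms and tries < max_tries:
        tries += 1
        root = rng.choice(atoms)  # random atom of the partial structure
        direction = random_unit_vector(rng)
        new = tuple(r + BOND_LENGTH * d for r, d in zip(root, direction))
        # Reject the new atom if it is too close to the receptor
        # or to any previously generated atom.
        clash = any(math.dist(new, p) < MIN_CONTACT for p in receptor_atoms)
        clash = clash or any(math.dist(new, a) < MIN_CONTACT for a in atoms)
        if not clash:
            atoms.append(new)
    return atoms


receptor = [(3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]  # toy receptor atom positions
skeleton = grow_skeleton((0.0, 0.0, 0.0), receptor, 6, random.Random(0))
print(len(skeleton))
```

Replacing the random choice of the root with a choice among the best-scoring atoms would give a GenStar-like variant; the scoring itself is outside the scope of this sketch.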

The GenStar35 is another atom-based program. It builds skeletons similarly to the LEGEND; the only difference is that the atoms of the partial skeleton are scored and every new atom is added to one of the best 20% of the tested partial skeleton atoms instead of to a randomly selected one. The program is allowed to form rings and branches instead of only growing linearly. The user specifies the number and size of the structures that are generated.

The GROW36 was the first published method for sequential build-up of molecules, initially developed for the design of peptides. The program applies a stepwise joining method for structure generation using amino acids as building blocks. Skeleton building starts by adding an acyl group as the seed in the active site. After this amino acids are added one by one and partial skeletons are scored after every addition to find the best peptide chain. Structure generation terminates when the peptide length reaches a defined size.

The third category includes programs based on random or stochastic methods such as genetic algorithms (GA).37,63 This method mimics the evolutionary process of natural selection. The structures can undergo genetic-type operations such as crossover, whereby fragments from two different structures are mixed, and mutation, whereby a torsional angle in a structure is altered or an element type is changed. The PRO_LIGAND is one such program, which uses the genetic algorithm method.
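The two genetic operations named above can be sketched as follows. This is a generic illustration and not the PRO_LIGAND code: a structure is simplified to a list of fragment labels (both parents need at least two fragments), and the function names and the label alphabet are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover: mix fragments from two parent structures."""
    cut = rng.randint(1, min(len(parent_a), len(parent_b)) - 1)
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


def mutate(structure, alphabet, rng):
    """Point mutation: change one randomly chosen fragment to a different label."""
    child = list(structure)  # copy, so the parent is left unchanged
    pos = rng.randrange(len(child))
    child[pos] = rng.choice([label for label in alphabet if label != child[pos]])
    return child


rng = random.Random(1)
a = ["C", "C", "N", "O"]
b = ["S", "N", "C", "C"]
print(crossover(a, b, rng))
print(mutate(a, ["C", "N", "O", "S"], rng))
```

In a complete genetic algorithm these operators would be applied repeatedly to a population, with a fitness function deciding which individuals survive to the next generation.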

The PRO_LIGAND37 is similar in procedure to the LUDI. Binding site identification is based on an analogous rule-based approach. The program uses fragments that are labelled with atom properties and docks these into the binding site. The first fragment that is found to satisfy the constraints is accepted and the program continues to the next operation. The newly generated structures are evaluated via a fitness function, and only the best “individuals” survive for further reproduction. The GA allows the mixing of information from high and low scoring structures and aims to increase the average score of the whole set. The program also includes the possibility of designing novel molecules using pharmacophore mapping or CoMFA (Comparative Molecular Field Analysis) techniques.64

3.2 SPROUT

3.2.1 Current developments of SPROUT

The SPROUT17,29,30 is an automatic, interactive and comprehensive set of tools for the rational design of enzyme inhibitors. The first version of the SPROUT built molecule skeletons by joining new templates to each other until all the constraints were satisfied.28,29 In addition, the updated versions of the program build skeletons from many target sites simultaneously.51 Development of the SPROUT program has continued for a decade, and validations65 of the program as well as some successful applications for lead discovery have been reported.66,67

In addition to updated and more sophisticated Classic SPROUT releases, recent progress includes, alongside the sequential system, a parallel package for structure generation on clustered SGI (Silicon Graphics) and PC/Linux platforms.49,52 Moreover, several other projects aim to extend the Classic SPROUT, for example VLSPROUT and SynSPROUT.50 The former screens virtual combinatorial libraries and the latter generates synthetically accessible ligands by de novo design. To generate ligands in this way, the program needs information about readily available starting materials and high-yielding chemical reactions. Information on the high-yielding reactions is programmed into the knowledge base using retrosynthetic rules, which are encoded in the PATRAN language.68 The SynSPROUT uses the Classic SPROUT program for the de novo design. The result of the virtual synthesis in the receptor cavity is a readily synthesisable structure. Such a result has a great advantage over the Classic SPROUT result, which still requires considerable work after structure generation.

3.2.2 General features

The SPROUT is designed to build molecules for a range of applications based on molecular identification and structure generation. Generally, structure generation is divided into two main stages:28,29

• Primary structure generation to generate skeletons or molecular graphs that satisfy steric constraints and

• Secondary structure generation to convert the skeletons into molecules by making atom substitutions.

Primary structure generation gives skeletons, which have a required shape to satisfy the primary constraints. A skeleton that does not satisfy all the requirements is called a partial skeleton. Skeleton structures are an approximate solution to the problem. The final and chemically realistic molecule structures are produced after secondary structure generation.

The primary constraints defined by the SPROUT require the X-ray diffraction or NMR information of the enzyme-ligand complex, which defines steric and electrostatic constraints of the binding site. Steric constraints are the most important limitation in determining the shape of possible ligands (Figure 3).

[Figure 3 flowchart: 3D coordinates → steric constraints determining shape and volume (primary structure generation) → boundary and target sites; templates with joining rules and conformational analysis → skeleton generation → electrostatic and hydrophobic constraints (secondary structure generation) → atom substitution → molecules → organise the results → selected molecules for minimisation]

Figure 3. Outline of the components required for structure generation using the SPROUT.

The 3-D shape of the receptor and substrate defines the volume of the binding site. The volume is enclosed by a boundary, which restricts the shapes of the new ligands. The electrostatic constraints are divided into more directional effects such as hydrogen bonds and less localised effects such as charge distribution and hydrophobicity.28 The weaker electrostatic and hydrophobic constraints are used later on when converting primary structure skeletons into real molecules. The way a ligand binds within a receptor active site is attributable to shape, volume and electrostatic properties: an inhibitor is able to bind to a binding site on the enzyme because its shape and electrostatic properties are complementary to those of the receptor site. Within the volume there are interaction sites. These are regions which, if occupied by an atom of the ligand, can lead to favourable interactions between the ligand and the receptor.29 If the interaction sites are satisfactorily localised, they are used as primary constraints and consequently promote and direct primary structure generation. These localised interaction sites are called target sites, and satisfying these regions is a requirement for primary structure generation. Small 3-D fragments, called templates, are docked into the target sites and connected to form molecular skeletons using a linking or sequential growing method. A solution is found when all the steric and geometric requirements are satisfied and no boundary violations have occurred (Figure 4). A huge range of skeletons can be generated from a small number of templates.29

Figure 4. Primary structure generation is initiated by constraints definition (boundary, receptor site and target sites). Templates are added to the partial skeleton to fulfil the target site requirements to give final skeleton.29

In the secondary structure generation stage atom substitutions are made to convert the approximate skeleton structures into molecules that are consistent with the secondary constraints; the resulting molecules can then be scored and analysed (see Figure 3, page 26). Once the results have been analysed and clustered by the SPROUT, some conformational analysis of the selected molecular structures is required. The minimised ligands and their interactions with the receptor can be re-examined using the SPROUT (see section 5.2.3, page 137). Retrosynthetic analysis can be carried out using the CAESA program (see section 5.2.6, page 159).

The program is interactive in that users can control each step of the structure generation. The program consists of five modules (Figure 5, page 28). Sequential use of all five modules leads to the generation of the skeletons and their subsequent analysis. On opening the window of the SPROUT program, users can either create a New Job File or open an old file. At the beginning users can specify the nature of the generated skeleton, such as a general molecule skeleton or a peptide skeleton. It is also possible to design peptide ligands consisting of natural and/or synthetic amino acids.

Following the general de novo design process, the five modules of the SPROUT can be divided into three groups:

a) Identification by CANGAROO and HIPPO

b) Structure generation by ELEFANT and SPIDER

c) Scoring, ranking and clustering by ALLIGATOR

Figure 5. The main window of SPROUT.

3.2.3 CANGAROO

CANGAROO (Cleft ANalysis by Geometry based Algorithm Regardless Of the Orientation) automatically detects potential binding sites, also called clefts (Figure 6, page 29), within the protein structure.51 A cleft is defined as a large inward facing area on the surface of the protein. The module requires a PDB4 input file, which contains the crystal structure of the target enzyme. The bonds are determined according to a table that contains the connectivity of naturally occurring amino acids (Appendix I A and B).

Figure 6. The cleft of the binding site of the 17βHSD/KSR type 1 enzyme, together with estradiol 1. This is the region of the protein immediately surrounding the ligand.

CANGAROO identifies the binding cleft using any one of four alternative procedures:

1) Automatic identification of a ligand and receptor in a protein–ligand complex (Figure 7, page 30). The ligand is automatically separated from the protein and selected. Two PDB output files are generated, one for the cavity (ligand) and another for the receptor. Users can define the diameter of the desired part of the receptor using the ligand as the centre.

2) Identification of all clefts in the protein based on the surface curvature of individual regions of the protein. Users select one of the many clefts identified in this way. It is also helpful to use literature information on the active site residues. Only a receptor file is generated.

3) Selection of receptor residues known to be involved in binding, based on knowledge of the probable binding site (a crystal structure without ligand). A large number of publications are available on enzyme crystal structures and inhibitors, revealing information about the ligand–enzyme interactions.41,56,69,70 Analysis of that data can provide useful information about the residues of the active site of an enzyme that are important for binding to known inhibitors. Only a receptor file is generated.

4) Pharmacophore mode. If the protein crystal structure information is not available, CANGAROO can also identify the structural information of the known active ligand.51 The program produces only the cavity (ligand) file.

Figure 7. 17βHSD/KSR1 enzyme-estradiol-NADPH complex. E2 1 is selected automatically (red) as a ligand and NADPH 6 (green) is recognised as another possible ligand.

These files, generated by CANGAROO, serve as input files for the next module of the SPROUT, called HIPPO.

3.2.4 HIPPO

The HIPPO (Hydrogen Bonding Interaction Site Prediction as Positions with Orientations) module identifies interaction sites within the cavity that can be used as starting points for structure generation.31,52,71 Graphic presentation of the boundary surface (see section 3.2.4.1, page 31) and of the proposed good interactions between the ligand and the receptor helps users to analyse and select a subset of these interaction sites, called target sites (see section 3.2.4.2, page 32). These sites are small regions in space within the receptor cavity, which supply good starting points for structure generation because of the highly directional nature of hydrogen bonding interactions. This continuous region presentation is unique to SPROUT, while other de novo programs still sample orientations within the binding site.7 The HIPPO program identifies simple hydrogen bond donor and acceptor sites, complex hydrogen bonding sites, covalent bonding sites including bonds to metal ions, and hydrophobic regions.

3.2.4.1 Boundary surface

One feature of HIPPO is the calculation and display of the boundary surface of the receptor as a 3-D grid.52 This surface represents the suggested minimum distance between a ligand atom and a protein atom: Distance = x + y, where x is the van der Waals (vdW) radius of the protein atom and y varies according to the nature of the protein atom: hydrophobic y = 1.5 Å, hydrogen bonding y = 1.0 Å and covalent y = 0.0 Å.
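The distance rule above is simple enough to state as a short Python sketch. The offsets y are those quoted in the text; the function names, the category labels and the example vdW radius of 1.70 Å (a commonly tabulated value for carbon) are assumptions for illustration and are not taken from SPROUT.

```python
import math

# Offsets y (in Å) by the nature of the protein atom, as quoted in the text.
OFFSET = {"hydrophobic": 1.5, "hydrogen_bonding": 1.0, "covalent": 0.0}


def min_ligand_distance(vdw_radius, nature):
    """Suggested minimum ligand-protein atom distance: x + y."""
    return vdw_radius + OFFSET[nature]


def is_allowed(point, protein_atoms):
    """True if a ligand atom at `point` respects the minimum distance
    to every protein atom, given as (position, vdw_radius, nature)."""
    return all(
        math.dist(point, position) >= min_ligand_distance(radius, nature)
        for position, radius, nature in protein_atoms
    )


# One hydrophobic protein atom at the origin (assumed vdW radius 1.70 Å).
protein = [((0.0, 0.0, 0.0), 1.70, "hydrophobic")]
print(min_ligand_distance(1.70, "hydrophobic"))
print(is_allowed((3.0, 0.0, 0.0), protein), is_allowed((3.5, 0.0, 0.0), protein))
```

Evaluating such a test on every point of a 3-D grid would mark the points that lie inside the boundary, which is one plausible way to obtain a grid representation of the surface.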

The SPROUT boundary surface gives a good and clear outline for the active site. It helps users to comprehend the size and shape of the active site as well as select good target sites. For example the LUDI24 program does not give any kind of illustration of the active site, which makes it difficult to select fragments for the connections.

Different regions of the boundary surface are shown in different colours according to the hydrogen bonding and hydrophobic/hydrophilic properties of the regions. Colours that are used for hydrogen bonding sites are related to the areas of the boundary; red corresponds to a hydrogen acceptor area, blue to a hydrogen donor area and purple to a complex hydrogen bonding (acceptor-donor) area. Green correlates to a hydrophobic region (Figure 8, page 32). HIPPO also defines and displays the hydrophobic region of the receptor in which case it is shown in yellow.

Figure 8. The boundary surface for the 17β-HSD/KSR1-equilin 2 ligand complex, as determined by the HIPPO module.

3.2.4.2 HIPPO target sites

Using the database of natural amino acid residues (see Appendix I) it is possible to identify the hydrogen bonding regions. To discover the number and nature of the target sites, the program carries out the following operations:

• The hydrogen positions for hydrogen donors and the electron pairs for acceptors are calculated on the basis of hybridisation.

• An optimal intra-molecular hydrogen-bonding network is generated next. Limits for intra-molecular hydrogen bonds within the receptor are calculated as follows: the hydrogen-acceptor distance is calculated (for all hydrogens) and compared to the limit (default 2.75 Å). If the distance is less than this limit, the donor-hydrogen-acceptor angle is also checked (Figure 9, page 33). The angle is accepted if it is at least Θ (by default the tolerance is 45°, Θ = 180° - 45° = 135°). If the bond is acceptable then it is scored and stored in a list that is sorted by the scores of the potential hydrogen bonds.52

Figure 9. Definition of the intra-molecular hydrogen bonds is shown with the default values. Graphic presentation by Z. Zsoldos,52 representations adapted from the SPROUT homepage (SimBioSys Inc.).48
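The acceptance test for an intra-molecular hydrogen bond described above can be sketched in Python as follows. The two default values are those given in the text; the function names and the use of plain coordinate tuples are assumptions, and the scoring and sorting of accepted bonds are not shown.

```python
import math

MAX_H_ACCEPTOR_DIST = 2.75  # Å, default distance limit quoted in the text
ANGLE_TOLERANCE = 45.0      # degrees, default tolerance quoted in the text


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def is_hydrogen_bond(donor, hydrogen, acceptor):
    """Apply the distance limit first, then the donor-hydrogen-acceptor angle limit."""
    if math.dist(hydrogen, acceptor) >= MAX_H_ACCEPTOR_DIST:
        return False
    theta = 180.0 - ANGLE_TOLERANCE  # 135 degrees with the default tolerance
    return angle_deg(donor, hydrogen, acceptor) >= theta


donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
print(is_hydrogen_bond(donor, hydrogen, (3.0, 0.0, 0.0)))  # linear arrangement
print(is_hydrogen_bond(donor, hydrogen, (1.0, 2.0, 0.0)))  # 90 degree arrangement
```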

Detecting the intra-molecular hydrogen bonds usually has the effect of fixing the orientation of terminal functional groups and solvent molecules that have been allowed to remain in the receptor cavity.7 When residues can exist in different protonation states (e.g. Glu, His, Asn), SPROUT selects the protonation state that allows the highest number of intra-molecular hydrogen bonds. The donor hydrogens and the acceptor lone pairs that are used in intra-molecular hydrogen bonding are eliminated from further investigation. Complementary donor and acceptor regions are generated within the cavity to correspond to the hydrogen donor and acceptor atoms of the receptor. Tolerances are then applied to the most favourable hydrogen bond length and angle to identify the target site regions where suitable heteroatoms could form particularly strong hydrogen bonds.49,52

The hydrogen bonding target sites are represented by specifically shaped volumes around potential hydrogen bonding residues. The target sites generated by HIPPO are:

• Hydrogen acceptor sites are defined using the position of the hydrogen donor atoms of the receptor. Limits are applied to both the distance and to the direction of the ideal bond. To form a hydrogen bond, ligand (L) needs to reside in the red area (Figure 10, page 34).52

[Figure 10 diagram: hydrogen acceptor site. D = receptor atom, L = ligand atom; a distance of 1.0 Å is marked at the D-H pair. Default values: min dist = 1.6 Å, max dist = 2.2 Å, δ = 45°]

Figure 10. A ligand atom can be placed anywhere within the red region (graphic). The SPROUT illustration of an acceptor site. After Zsoldos.48,52

• Hydrogen donor sites are derived similarly to acceptor sites, according to the position of the hydrogen acceptor groups in the receptor. In the graphic illustration (Figure 11), O stands for the receptor hydrogen acceptor atom, C for the receptor atom next to it, L for the ligand hydrogen donor atom and H for the hydrogen atom covalently bonded to it. A hydrogen bond is formed between O and L through H. The ligand atom (L) must lie in the blue area while the hydrogen atom (H) lies in the white area.52

[Figure 11 diagram: hydrogen donor site with atoms C, O, H and L. Labels: C-H vdW cut-off, min dist, max dist, two distances of 1.0 Å, δ. Default values: max dist = 2.2 Å, min dist = 1.6 Å, δ = 45°]

Figure 11. A donor atom of the ligand (L) can be placed anywhere within the blue region with the hydrogen located within the white region (graphic). After Zsoldos.48,52

• Complex hydrogen bonding sites are identified by the intersection of simple sites. These sites can be, for example multicentered and bifurcated situations (Figure 12). If a hydroxyl-group exists within this region it can form two particularly strong hydrogen bonds at the same time.52

Figure 12. A complex hydrogen donor-acceptor intersection site is presented as a purple area.

• Covalent site regions are also identified from the receptor cavity. Covalent sites, as well as metal sites, have a geometry similar to that of hydrogen acceptor sites. The program recognises structures that can form a temporary covalent bond with the ligand in the transition state and creates a covalent site around this group. In Figure 13, O represents the terminal hydroxyl group of the serine residue.

[Figure 13 diagram: covalent bond site at the C-O group, with labels bmin, bmax, θmin and θmax]

Figure 13. The geometric representation of the covalent site includes tolerances for the bond length and bond angle. A ligand atom can be placed anywhere within the green region to form a covalent bond with the Ser residue of the receptor. After Zsoldos.48,52

• Metal ion sites in metalloproteins are usually part of the active site of the protein and are involved in the catalytic cycle. For this reason a good inhibitor for these proteins should include an atom that can interact with the metal ion. HIPPO identifies metal ions (Zn, Mg, Cu, Ca, Co, Fe, Ni, Mn) in a receptor PDB file, calculates the most likely direction of the free valence according to the existing connections and generates an appropriate target site. Metal ion target sites are generated by a geometric approach similar to that used for covalent sites (Figure 14).

[Figure 14 diagram: metal ion site with atoms M and X. Labels: bond length, δ, tolerance]

Figure 14. Metal ion target site is shown as a grey area. After Zsoldos.48,52
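As an illustration of how such a target site region can be tested, the hydrogen acceptor site of Figure 10 is sketched below in Python using its default values. The geometric reading is an assumption: the distance limits are taken to be measured from the donor hydrogen H to the ligand atom L, and δ is taken to be the maximum deviation of the H-L direction from the D-H direction. The function name is hypothetical.

```python
import math

MIN_DIST = 1.6  # Å, default from Figure 10
MAX_DIST = 2.2  # Å, default from Figure 10
DELTA = 45.0    # degrees, default from Figure 10


def in_acceptor_site(donor, hydrogen, ligand_atom):
    """True if the ligand atom lies in the assumed acceptor-site region:
    a shell [MIN_DIST, MAX_DIST] around H, restricted to a cone of
    half-angle DELTA about the D->H direction."""
    dh = [h - d for h, d in zip(hydrogen, donor)]
    hl = [l - h for l, h in zip(ligand_atom, hydrogen)]
    dist = math.hypot(*hl)
    if not (MIN_DIST <= dist <= MAX_DIST):
        return False
    cos = sum(a * b for a, b in zip(dh, hl)) / (math.hypot(*dh) * dist)
    deviation = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return deviation <= DELTA


donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)  # D-H bond of 1.0 Å
print(in_acceptor_site(donor, hydrogen, (2.9, 0.0, 0.0)))  # on the D-H axis
print(in_acceptor_site(donor, hydrogen, (1.0, 1.9, 0.0)))  # perpendicular to it
```

Covalent and metal ion sites could be tested in the same way with their own bond length and angle tolerances, since the text describes their geometry as similar.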

The hydrogen bonding sites initially generated by HIPPO are quite large, and there are usually 20-100 of them, depending on the size of the cavity. However, removing the portions that violate the steric constraints of the receptor reduces the sites significantly. In addition, the user is able to adjust the angle and distance tolerance values, as well as to generate new spheric sites for hydrophobic regions.

• In the earlier SPROUT versions (v3.2 to 4.11) users were able to generate spheric sites in the cavity and place them anywhere in the binding site, typically in the most strongly hydrophobic regions (Figure 15 a, page 37). Spheric sites were not hydrogen bonding sites, but they provided starting points for the structure generation. The latest published version of the program (v5.0 and later versions) offers different ways to deal with hydrophobic regions. The program analyses the hydrophobicity of the cavity and provides the most hydrophobic areas as starting points for the structure generation (Figure 15 b, page 37). Appropriate fragments (see section 3.2.5, page 39) can be docked into these areas and used as starting points.72,73

Figure 15. a) Spheric sites were used as hydrophobic regions for skeleton generation in the binding site (older versions). b) The hydrophobic areas of the new versions are now used similarly for structure generation.

At this point it is also possible to analyse the native ligand and identify the hydrogen bond interactions it forms. This assists the choice of target sites. The information is also important for structure generation because novel molecules should have better scoring values and interactions than the native ligand.

It would be practically impossible for any single inhibitor to satisfy all the sites that the HIPPO module generates. For that reason, users have to select a subset of the available binding sites of the boundary surface (Figure 16a and 16b, page 38). Different subsets of target sites will give skeletons with different structural characteristics and binding properties. There could be many reasons for choosing any individual subset, but the usual practice is to base this choice on literature data. The selected sites, which are going to be the interaction points between the receptor and the generated ligand, are saved and serve as input for the next module of the SPROUT.

Figure 16. a) The boundary surface for the 17βHSD/KSR1-estradiol 1 complex. b) The boundary and selected interaction sites (two acceptor, one donor and three spheric sites) for the enzyme. The graphical display (lower part) shows the target sites within the boundary. Red labels indicate selected acceptor sites (in this case histidine His221 and asparagine Asn152) and blue indicates the donor site (tyrosine Tyr218).

3.2.4.3 Pharmacophore module

If the protein crystal structure information is not available and the CANGAROO module has stored only the structural information of the known active ligand, SPROUT proceeds directly to the HIPPO pharmacophore module.51 The information on the active ligand (Figure 17, page 39) is used for pharmacophore identification by creating spheric sites in appropriate regions in space. The easiest way to create spheric sites is to use the atoms of the active ligand for site generation. In this way a 3-D sphere with atom-type colouring appears at the correct location in space. These sites act as acceptor, donor and hydrophobic sites and are stored for fragment docking. After target site identification and generation, pharmacophore modelling proceeds to the next module in the same way as normal de novo design.

Figure 17. The active ligand (here the ligand from the 17βHSD/KSR1-estradiol complex) is used as a starting point for pharmacophore identification by HIPPO.

3.2.5 ELEFANT

The ELEFANT (ELEction of Functional Groups and ANchoring them to Target Sites) module allows users to choose and dock the starting fragments for the target sites that were selected by HIPPO. To carry out this task, users first have to select a group of target sites, which can consist of an individual target site or several sites. Starting fragments or templates are selected from a template library30 (see section 3.2.5.1, page 40). Users select any number of templates from the template library, and the ELEFANT docks the templates in such a way that they satisfy the group of target sites. The docking process uses rigid templates and generates all mapping combinations between the template atoms and the selected target sites.73 In addition to templates, it is possible to import known ligands or dock unknown ligands (as a PDB or MDL file) into the target site and use these as a starting point for the automated ligand generation algorithm.52,72 This gives very wide freedom of choice, since it allows users to choose from a huge number of structures. A template that does not satisfy the target site or that violates the boundary surface of the receptor is rejected. In the case of a group of target sites that contains more than one site, each template must contain vertices that satisfy all the target sites in the group.
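The enumeration of mapping combinations between template atoms and a group of target sites can be illustrated with a minimal Python sketch. Only the combinatorial step is shown; the rigid geometric fitting and the boundary check that decide whether a mapping is actually accepted are omitted. The vertex and site labels, the compatibility table and the function name are hypothetical and are not taken from ELEFANT.

```python
from itertools import permutations

# Hypothetical table: which kind of template vertex may occupy which kind of site.
COMPATIBLE = {
    ("acceptor", "acceptor_site"),
    ("donor", "donor_site"),
    ("carbon", "spheric_site"),
}


def site_mappings(template_vertices, target_sites):
    """Return every assignment of distinct template vertices to the
    target sites of one group, as tuples of vertex indices."""
    mappings = []
    for combo in permutations(range(len(template_vertices)), len(target_sites)):
        if all((template_vertices[v], site) in COMPATIBLE
               for v, site in zip(combo, target_sites)):
            mappings.append(combo)
    return mappings


template = ["acceptor", "carbon", "carbon"]  # vertex kinds of a toy template
group = ["acceptor_site", "spheric_site"]    # a group of two target sites
print(site_mappings(template, group))
```

A template with fewer vertices than the group has sites yields no mapping at all, which corresponds to the requirement that each template must contain vertices satisfying all the target sites in the group.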

A set of target sites is shown in Figure 18a (page 40). The six target sites have been divided into five groups. The first group consists of two target sites, an acceptor site (5) and a spheric site (1). The rest of the target sites are single sites: two of them are hydrophobic spheric sites (2 and 3), one is an acceptor site (4) and one a donor site (6). Some of the templates docked into the target sites are presented in Figure 18b. The graphical display (lower part) shows the number of selected templates for each target site.

Figure 18. a) Five groups of target sites and b) some selected templates for 17βHSD/KSR1 enzyme.

3.2.5.1 SPROUT template library

The template library concept is generally subdivided into two different groups: template library and template library manager.

The template library includes 3-D molecular graphs of simple organic structures. The edges of the templates are labelled by bond type but atom types are unspecified (generic templates). Instead of atom types, the vertices of each template are defined by their hybridisation state, since different hybridisations lead to different geometries.28,29,30,52 Because skeletons are treated as rigid structures, templates have to be present in the library in various low energy conformations (Figure 19a, page 41) to introduce some ligand flexibility into the structure generation model. Generalisation of the templates keeps the template library small and saves considerable computation time, since many fragments with similar geometry do not need to be processed separately (Figure 19b, page 41).52 The template library for peptide generation is simpler than the standard library, including just parts of peptides as templates. Only generic templates that satisfy steric constraints are converted into actual molecules by heteroatom substitution (see section 3.2.7, page 48).

Figure 19. a) Cyclohexane exists as three favourable conformations (chair, boat and twist boat) in the template library. b) Each template represents several molecular fragments.

The basic template library includes a choice of acyclic templates, which contain sp2 or sp3 atoms, and a range of 3-6 membered rings in various conformations. The updated versions of the template library (versions 4.11 and 5) include not only generic templates but also specific templates such as aminopyridine, indole and the carboxy group, as well as nitrogen and sulphur atoms (Figures 20 and 21, page 42). There is no limit to the number of templates that can be selected. However, if the choice of starting templates is restricted, structure generation is quicker and the generated structures are less diverse.


Figure 20. The standard template library: generic templates (left), specific templates (right). The figure inside the ring indicates the number of conformations.


Figure 21. The standard template selection menu of SPROUT (v4.11). Two different conformations of the cyclohexane ring are shown in the windows.

The template library manager enables users to create an additional template library. New templates can be built in another modelling program (e.g. MacroModel or Moloc) and saved as an MDL mol file. These imported fragment files are the starting points for the new template library. The template library manager automatically detects the necessary features of the molecules and calculates the information essential for structure generation. Users can check and modify this information using the interactive tools provided (Figure 22, page 43). It is also possible to include the general template library as a part of the new library.

An ELEFANT output file is generated when the starting templates for all the target sites are selected and saved. This file is the input for the penultimate and probably the most important module of SPROUT.


Figure 22. The template library manager offers a selection of useful tools for modifying molecules for a new template library. The blue label indicates a nitrogen atom in a hetero-aromatic cyclohexane ring and the pink label indicates the selected atom.

3.2.6 SPIDER

SPIDER (Structure Production with Interactive DEsign of Results) is the structure generation module of SPROUT. The docked starting fragments from ELEFANT are connected together so that the resulting skeletons satisfy the primary constraints defined by the program; otherwise the skeleton is rejected. These constraints consist of a boundary, a set of target sites, a set of starting templates and a group of user defined parameters.52 The connecting parts of the skeleton, called spacer templates in SPIDER, can be selected from a basic library similar to that of the ELEFANT module.30

A set of target sites with the partial skeletons is referred to as a forest in SPIDER. Each target site is represented as a tree and the starting templates docked to these sites correspond to the nodes of each tree. The program proceeds interactively by a series of tree pair connections until a complete skeleton is generated. Figure 23 (page 44) shows an example of a SPROUT forest that consists of six trees. ELEFANT labels target sites numerically (here 1 to 6) and SPIDER uses these numbers to identify target sites during a structure generation run (see also Figure 18a, page 40). Four of these trees contain one target site (2, 3, 4 and 6) and one contains two target sites (1, 5). The tree (1, 2, 3, 5, 6), which joins the sites (1, 5), (2), (3) and (6), is the result of previous SPIDER connections (Figure 23b, page 44). The connection between target sites (1, 5) and (2) results in 292 skeletons. These are connected to (3), resulting in 480 skeletons. Connecting (6) to (1, 2, 3, 5) results in 134 skeletons (shown in the yellow square). In this particular example the two trees (1, 2, 3, 5, 6) and (4) are about to be connected.


Figure 23. a) Target site (1,5) with its 8 start templates is the starting point for this structure generation (the number of templates is written in the white square). The graphical display (lower part) represents target sites (trees) as circled numbers and the number of templates (nodes) of every target site as a squared number. b) The target sites with the skeleton generated between five of the six sites. The last connection results in 134 skeletons (shown in the yellow square).

3.2.6.1 User defined parameters

The number of skeletons generated by SPIDER can be very large. The quantity of results depends upon the parameters defined, as well as the order in which the trees are connected. In most cases it is beneficial to perform as many runs as possible with different connection orders, since this yields a wider variety of skeletons. The user defined parameters can reduce the combinatorial explosion by excluding certain structures.52 All parameters have default values that can be altered using the interactive user interface during the SPIDER run, for example:

• The maximum number of vertices (non-hydrogen atoms) allowed in a skeleton.
• The connecting tolerance. This can have a value between 0 and 100% relative to d/2 (Figure 24, page 45, see also section 3.2.6.3, page 47).
• The number of 3-, 4-, 5- and 6-membered rings that are allowed in a skeleton.
• The maximum length of any chain within a skeleton.
• The number of spiro joins or fused joins that are allowed in a skeleton.
• The number of rotatable acyclic bonds within a skeleton.

Figure 24. The connecting tolerance determines the termination condition for a node expansion (d = distance between two target sites, t = tolerance). Adapted from Z. Zsoldos.48,52
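A small Python sketch of how the tolerance could enter the termination condition is given below. It assumes that the growth limit is the mid point d/2 extended by the tolerance, with the tolerance taken as a percentage of d/2. The function name `growth_limit` and this exact formula are assumptions made for illustration, not the documented SPROUT implementation.

```python
def growth_limit(d, tolerance_percent):
    """Distance a partial skeleton may grow from its own tree towards the other.

    d: distance between the two target sites.
    tolerance_percent: connecting tolerance, 0 to 100, relative to d/2.
    Assumed formula: limit = d/2 + (tolerance_percent / 100) * d/2.
    """
    if not 0 <= tolerance_percent <= 100:
        raise ValueError("tolerance must lie between 0 and 100 %")
    half = d / 2.0
    return half + (tolerance_percent / 100.0) * half
```

Under this assumption a tolerance of 0% stops the expansion at the mid point, while 100% lets the growing skeleton reach the other target site.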

Before every tree connection, it is also possible, for example, to:

• Change tree pair - The program suggests the order in which the trees are connected; however, users can change it.
• Select spacer templates - The set of spacer templates in use can be changed at any time during the search.
• Change seed vertex - Generally the program automatically specifies the atom from which a skeleton expands (the closest vertex between the two trees). However, the atom can also be specified by the user.

3.2.6.2 Template joining

Skeletons are built by linking the ELEFANT start templates using the SPIDER spacer templates. Where a new bond is formed, a number of conformations are generated around the new bond. Skeleton generation can use one of three possible joining rules: fusion, spiro or new bond (Figure 25, page 46).49,52

Figure 25. The three types of template joining used for structure generation: fusion, spiro and new bond.

A fused join is formed by superimposing two ring atoms and one ring bond of the template on those of the skeleton. A spiro join likewise takes place by superimposing one ring atom of the skeleton on one ring atom of the template. The new bond join forms a single bond between two atoms: one atom is part of the existing skeleton and the other is an atom of the new template. Several conformations are generated around the new bond, since the new bond is rotatable.30
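The effect of the three join types on skeleton size can be illustrated with a short Python sketch. It only counts vertices: a fused join shares two atoms, a spiro join shares one and a new bond join shares none. The names `SHARED_ATOMS` and `joined_size` are hypothetical and no geometry is modelled.

```python
# Number of atoms common to skeleton and template for each join type.
SHARED_ATOMS = {"fusion": 2, "spiro": 1, "new_bond": 0}


def joined_size(skeleton_atoms, template_atoms, join):
    """Return the number of vertices after joining a template to a skeleton."""
    return skeleton_atoms + template_atoms - SHARED_ATOMS[join]


# Joining two six-membered rings in the three possible ways.
sizes = {join: joined_size(6, 6, join) for join in SHARED_ATOMS}
```

For two six-membered rings this gives 10, 11 and 12 vertices, which matches the carbon counts of decalin, spiro[5.5]undecane and bicyclohexyl respectively.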

Two skeletons from different trees are linked by connecting the partial skeletons so that the new partial skeleton covers every target site covered by either of the original partial skeletons. The connection is made by superimposing a template that is common to both skeletons (Figure 26).52

Figure 26. Trees one and two have already been connected and one of the nodes of the new tree 3 is shown here. The cyclopentene ring is common to both partial skeletons and so a connection is possible. Adapted from Z. Zsoldos.48,52

3.2.6.3 The search process

A molecule skeleton is formed by a series of tree pair connections, which results in a single combined tree, the solution tree. Each tree connection step takes two tree inputs and results in a single combined tree. The trees are connected in pairs by bi-directional growth.52 In practice this means:

1) Selection of the two trees to be connected.
2) Performing a Breadth First Search (BFS) on the first tree (the BF tree).
3) Performing a Depth First Search (DFS) on the second tree (the DF tree).
4) Replacing the two initial trees by the combined tree.

For optimising memory usage, one tree is navigated by a BFS method (Figure 27) and the other by DFS method (Figure 28).71 In a BF search, each level of the tree is fully explored horizontally before the next is generated. The extended skeletons are generally grown until they reach the approximate mid point of the two trees, plus a user defined tolerance (see Figure 24, page 45). After the expansion of all the nodes of the BF tree is completed, the BF search is terminated. The DF search is then carried out by growing vertically from the second tree towards those skeletons generated from the BF search. The successful connections result in new nodes in the combined tree. As with the BF search, the node expansions are continued until the termination condition is reached. After the tree pair connection has been completed, the next pair of trees is defined and the above-mentioned steps (2-4) are repeated. The final step is the unification of skeletons in the two trees by overlapping their last templates.

Figure 27. Breadth First Search (BFS) of the first tree. Each level of the tree is explored thoroughly horizontally before moving to the next level.

Figure 28. Depth First Search (DFS) of the second tree. The tree is explored vertically so that the most recently generated nodes are explored first.
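The two traversal orders described above can be sketched in a few lines of Python. These are the generic textbook traversals applied to a toy tree. Node expansion by template joining and the tolerance-based termination condition of SPIDER are not modelled, and the tree contents and function names are hypothetical.

```python
from collections import deque


def breadth_first(tree, root):
    """Visit the tree level by level (horizontal exploration)."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


def depth_first(tree, root):
    """Visit the most recently generated nodes first (vertical exploration)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Reverse so that the leftmost child is expanded first.
        stack.extend(reversed(tree.get(node, [])))
    return order


# Toy tree: each key lists its child nodes.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
bfs_order = breadth_first(tree, "A")
dfs_order = depth_first(tree, "A")
```

The breadth-first queue holds a whole level of nodes at once, whereas the depth-first stack holds only the current branch and its pending siblings. This difference is the usual reason for the lower memory demand of a depth-first search.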

It is a very common situation that the number of skeletons generated by SPIDER exceeds several hundred or even thousand. Therefore, the evaluation of the resulting skeletons can be a difficult process. For this reason, SPROUT contains a separate module that helps users to estimate the quality of skeletons and to make the decision about which skeletons are going to be the candidates for synthesis in the laboratory. This function is performed by ALLIGATOR, the last module of the program.

3.2.7 ALLIGATOR

This module (ALgorithms for LIGAnd Testing and Ordering of Results) enables the user to sort the skeletons generated in SPIDER into sets using a wide variety of parameters. Skeletons can be scored, clustered and ranked on the basis of binding affinity, complexity, stability and similarity (Figure 29 a). The idea is to identify desirable skeletons and discard skeletons that are too complex or impossible to synthesise.

This module can also generate actual molecules by heteroatom substitution, which occurs at atoms docked to donor or acceptor sites (Figure 29 b). Secondary constraints set the requirements for the heteroatom substitution with regard to the required electrostatic and hydrophobic properties (see section 3.2.2, page 25). ALLIGATOR can replace the vertices of the skeleton that satisfy the target sites with heteroatoms such as oxygen and nitrogen.46,71

Figure 29. a) The set of skeletons in ALLIGATOR. b) The skeletons have been heteroatom substituted, scored and automatically ranked by the module.
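Heteroatom substitution at the vertices docked to target sites can be pictured as a simple enumeration. The sketch below is a simplification under stated assumptions: the skeleton is a plain list of element symbols, only nitrogen and oxygen are offered, and valence, hybridisation and the secondary constraints are ignored. The function name `substitute` is hypothetical.

```python
from itertools import product


def substitute(skeleton, site_vertices, heteroatoms=("N", "O")):
    """Yield copies of the skeleton with every site vertex replaced by a heteroatom."""
    for choice in product(heteroatoms, repeat=len(site_vertices)):
        atoms = list(skeleton)
        for index, element in zip(site_vertices, choice):
            atoms[index] = element
        yield atoms


# Four-vertex generic skeleton whose first and last vertices sit on target sites.
skeleton = ["C", "C", "C", "C"]
variants = list(substitute(skeleton, site_vertices=[0, 3]))
```

In the program each variant would then be scored, so that the estimated binding affinity can guide which substitution is kept.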

There are several options for ranking and sorting in ALLIGATOR. The simplest is global clustering, where the molecules are ranked simply in descending order of the score calculated for each one (see equation 1). A hierarchical clustering procedure includes several different sorting criteria such as 2-D similarity, rotatable bonds, fusions, stereo centres, number of atoms and estimated binding affinity. The sorted structure sets can be manipulated in the standard ways: a set can be split into subsets, combined with other sets, or intersected with other sets.71 A substructure search tool is also available to find all skeletons in a tree that contain a certain fragment. The graphical interface of the program also makes it possible to inspect individual results with atom scores and any van der Waals clashes with the boundary (Vertex score table). The partial skeletons generated in SPIDER can be scored and ranked after every tree pair connection by entering ALLIGATOR. This reduces the size of the combinatorial explosion and enables quicker processing of the growing skeletons.
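The ranking and set operations described above can be sketched with ordinary Python sorting and sets. The skeleton records, their identifiers and all numerical values are invented for illustration, and the two sort keys are only one possible choice among the criteria listed.

```python
# Hypothetical skeleton records; scores and bond counts are illustrative only.
skeletons = [
    {"id": "s1", "score": 6.2, "rotatable_bonds": 3},
    {"id": "s2", "score": 7.5, "rotatable_bonds": 5},
    {"id": "s3", "score": 5.1, "rotatable_bonds": 3},
    {"id": "s4", "score": 6.9, "rotatable_bonds": 1},
]


def rank_by_score(items):
    """Global ranking: descending order of the estimated score."""
    return sorted(items, key=lambda s: s["score"], reverse=True)


def sort_hierarchically(items):
    """Sort by rotatable bonds first, then by descending score within each group."""
    return sorted(items, key=lambda s: (s["rotatable_bonds"], -s["score"]))


def ids(items):
    return [s["id"] for s in items]


# Subsets can be combined (union) or intersected like ordinary sets.
few_rotors = {s["id"] for s in skeletons if s["rotatable_bonds"] <= 3}
high_score = {s["id"] for s in skeletons if s["score"] >= 6.0}
both = few_rotors & high_score
either = few_rotors | high_score
```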

The SPROUT scoring function can provide an estimate of the strength of the binding between the receptor and the generated skeletons. The program gives three different scores, as pKi values, in a Vertex score table: the ‘Skel’ score is the predicted binding affinity of the bare skeleton, the ‘SiteSub’ score is the binding affinity of the skeleton in which the atoms at the target sites have been substituted with heteroatoms, and the ‘Max’ score is the best binding affinity that can be reached with additional heteroatom substitutions (see Figure 29b, page 48). Scoring takes into account hydrogen bonding, van der Waals and hydrophobic interactions and the number of rotatable bonds contained in a skeleton.71 The scoring function is a very important tool of SPROUT because this estimated binding affinity is used both to rank the generated skeletons and to guide the heteroatom substitution. However, it is important to point out that the SPROUT score does not take into account other biological parameters such as transport, distribution and bioavailability.

The binding energy is calculated according to the following empirical scoring function:71

Equation 1. ∆G_score = ∆G_hbonds + ∆G_hydrophobic + ∆G_vdW + ∆G_rotatable

where ∆G_score is the predicted binding affinity, displayed as pKi (Ki = inhibitor constant),

∆G_hbonds is the hydrogen bonding interaction score calculated from the distance between the acceptor and donor atoms and the angular component,52

∆G_hydrophobic is the hydrophobic term based on calculations of simple pairwise distances between all ligand and protein atoms,

∆G_vdW is the van der Waals interaction calculated using a simplified Lennard-Jones potential, and

∆G_rotatable is a simple rotatable bond count expressing the flexibility of a bound ligand.

The relative importance of each of the above-mentioned interactions in the design of a ligand depends upon the nature of the targeted enzyme. In enzymes that contain a large hydrophobic area, the proportion of the hydrophobic atoms of the ligands should be greater than in enzymes that contain small hydrophobic regions.
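The additive form of Equation 1 can be written as a short Python sketch. Only the sum itself is taken from the equation. The class name `ScoreTerms`, the field names and the numerical values are hypothetical, and the way SPROUT computes each individual term is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class ScoreTerms:
    """The four contributions of Equation 1 for one skeleton."""
    hbonds: float       # hydrogen bonding term
    hydrophobic: float  # hydrophobic contact term
    vdw: float          # van der Waals term
    rotatable: float    # rotatable bond term

    def total(self):
        """Sum of the four terms, as in Equation 1."""
        return self.hbonds + self.hydrophobic + self.vdw + self.rotatable


# Illustrative values only; they are not SPROUT output.
example = ScoreTerms(hbonds=2.0, hydrophobic=3.5, vdw=1.0, rotatable=-0.5)
score = example.total()
```

Keeping the terms separate makes it easy to see which interaction dominates for a given enzyme, for example the hydrophobic term in an active site with a large hydrophobic area.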

3.3 SynSPROUT

De novo programs are efficient at designing hypothetical ligands that are predicted to bind very strongly to the active site, but the structures have no practical value unless they are synthetically feasible. The SynSPROUT49 program is a new extension of Classic SPROUT, which has been developed to overcome this bottleneck of de novo design. The program is capable of generating synthetically accessible ligands from commercially available starting materials. The application uses the same methods for analysing the enzyme active site and for structure generation as Classic SPROUT. SynSPROUT is nevertheless a more advanced de novo application, since it designs synthetically accessible ligands directly in the enzyme active site. With this method it is possible to arrive quickly at a molecule that can be synthesised, without the time consuming work of sorting and analysing molecule libraries of hundreds or even thousands of molecules. Other research groups have also developed similar de novo design programs, for example TOPAS (TOPology-Assigning System),44 which uses drug-derived fragments as building blocks.

The program uses a fragment database of readily available starting materials and synthetic knowledge base information for virtual synthesis in the receptor cavity. Synthetically available template structures have been taken from the MDDR database (MDL Drug Data Report).8 These structures are used in the docking and build-up process. The program then allows only joins that correspond exactly to a chemical reaction defined in the knowledge base.

3.3.1 Knowledge base and PATRAN language

The program requires a knowledge base describing synthetic reactions. This knowledge base is a user-editable text file, which contains chemical patterns describing functional groups and the synthetic joining rules, corresponding to chemical reactions. These functional groups are determined using a line notation language called PATRAN46,68 and they are automatically detected from each imported fragment. The description of functional groups starts with the keyword CHEMICAL-LABEL and synthetic rules with the keyword RULE.

3.3.1.1 Chemical patterns

The keyword CHEMICAL-LABEL is followed by the name of the functional group. The chemical pattern description includes the atom and bond types and the connections between atoms, and defines the number of terminal hydrogens and heteroatoms next to the atom. The atom hybridisation and connection number of the atoms can also be specified. The overall format of the chemical pattern is as follows:

CHEMICAL-LABEL
…STARTP
…
…ENDP

3.3.1.2 Joining rules

Once functional groups have been described, they can be connected by joining rules. The rules of the joining knowledge base describe the steps that make up each synthetic rule. The available rule actions are: deleting an atom, forming a new bond, changing a bond type and changing an atom hybridisation. The overall format of a retrosynthetic transform is as follows:

RULE
EXPLANATION
IF
THEN
END-THEN


3.3.1.3 Other specifications

After two chemical patterns have been connected by joining rules, the knowledge base requires information about the new bond length and dihedral angles (see the example of reductive amination and Scheme 2, page 53).49 These data are available from the Cambridge Structural Database (CSD).59 There is no automatic way to search for this information, so the search and the addition to the knowledge base have to be made manually.

Reductive amination:

CHEMICAL-LABEL
…STARTP
…C[SPCENTRE=2];[HETS=1]=O
…ENDP

CHEMICAL-LABEL
…STARTP
…C-N[HS=2];[CONNECTIONS=1]
…ENDP

RULE
EXPLANATION Reductive Amination 1
IF Carbonyl INTER Primary Amine
THEN
destroy-atom 2
change-hybridisation 1 SP2 to SP3
form-bond – between 1 and 4
DIHEDRAL 0 0
DIHEDRAL 0 60
DIHEDRAL 0 120
DIHEDRAL 0 180
DIHEDRAL 0 240
DIHEDRAL 0 300
LENGTH 1.45
END-THEN


Scheme 2. Elementary steps of the reductive amination reaction. a) The reaction is performed if the required primary amine (4) and carbonyl group (2) are recognised. b) The oxygen (2) is removed from the carbonyl group. c) The hybridisation of the carbonyl carbon (1) is changed to sp3. d) A new bond is formed between the two fragments (4 and 1) with the specified bond length and dihedral angles (0°, 60°, 120°, 180°, 240° and 300°).
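The elementary steps of the rule can be traced on a toy data structure with the Python sketch below. It is a hypothetical illustration and not the SynSPROUT implementation: fragments are reduced to dictionaries, atom 3 is assumed to be the carbon attached to the amine nitrogen, the second number of each DIHEDRAL entry is read as the angle in degrees, and LENGTH is assumed to be given in ångström.

```python
import copy

# Values copied from the rule text above (units are an assumption).
DIHEDRALS = (0, 60, 120, 180, 240, 300)
BOND_LENGTH = 1.45


def apply_reductive_amination(atoms, bonds):
    """Apply the three rule actions to a toy carbonyl/amine fragment pair.

    atoms: {number: {"element": str, "hyb": str}}
    bonds: {frozenset({a, b}): bond order}
    Atom numbers follow Scheme 2: 1 carbonyl C, 2 carbonyl O, 4 amine N.
    """
    atoms = copy.deepcopy(atoms)
    # destroy-atom 2: remove the oxygen and every bond that touches it
    del atoms[2]
    bonds = {pair: order for pair, order in bonds.items() if 2 not in pair}
    # change-hybridisation 1 SP2 to SP3
    atoms[1]["hyb"] = "SP3"
    # form-bond between 1 and 4 (single bond)
    bonds[frozenset((1, 4))] = 1
    # one candidate geometry per DIHEDRAL entry, all with the same LENGTH
    geometries = [(BOND_LENGTH, angle) for angle in DIHEDRALS]
    return atoms, bonds, geometries


atoms = {
    1: {"element": "C", "hyb": "SP2"},
    2: {"element": "O", "hyb": "SP2"},
    3: {"element": "C", "hyb": "SP3"},
    4: {"element": "N", "hyb": "SP3"},
}
bonds = {frozenset((1, 2)): 2, frozenset((3, 4)): 1}
new_atoms, new_bonds, geometries = apply_reductive_amination(atoms, bonds)
```

Each of the six geometries would correspond to one conformation of the joined skeleton that can then be tested against the receptor boundary.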

3.3.2 New fragment library

It is possible to build up a new template library as described earlier in section 3.2.5.1 under ‘Template library manager’ (page 40). A knowledge base for fragments can be generated in a similar way to the normal template library. The 3-D information and conformations of the fragments are necessary for the library. If this information is not available for fragments generated by other programs, it can be generated while importing MOL/SD files into the new library, for example with the CORINA and ROTATE programs. Imported fragments are analysed against the synthetic knowledge base to detect all functional groups. Fragments are then inserted into the library groups according to their detected functionality.

3.3.3 Differences between Classic and SynSPROUT

Structure generation with SynSPROUT is similar to the Classic SPROUT program in the CANGAROO and HIPPO modules. The only difference in the ELEFANT module is that the docked templates come from the new fragment library. The bi-directional sequential structure generation in the SPIDER module also works as in Classic SPROUT; the main difference arises from the fact that only synthetic joining is allowed. In every step of the structure generation phase the partial skeleton can grow from its functional group by extending it with all selected fragments that have a corresponding functional group according to the synthetic rules. There are also user defined parameters that differ from those of Classic SPROUT, such as the maximum number of acceptor/donor atoms in a skeleton, maximum number of stereo centres, maximum number of 5- and 6-membered aromatic rings, maximum number of synthetic joins, molecular weight and nearest functional group tolerance. The program provides users with a Synthetic Rules and Functional Group Selection list, from which the desired synthetic reactions can be chosen. This list is built from the synthetic knowledge base attached to the job. Spacer templates are selected from a fragment library after every stepwise connection. Since the functional groups are the connection points for further growth, it is advisable to select only fragments that have at least two functional groups. There are also new options in the SynSPROUT menu system: some new options have been added to the ‘Skeleton’ menu and a completely new ‘Synthetic’ display option has been added.

3.4 Further modelling applications

3.4.1 Moloc

Moloc74 is an interactive modelling program for molecular structure calculations. It allows the user to create and display chemical structures, start and monitor a variety of calculations and analyse structures and the results of calculations. A wide variety of evaluations is possible, from various geometric properties to energetic quantities. The program uses the MAB force field,75,76 which is based on a simple and fast method to calculate charge distributions in organic molecules.

Specially designed tools are implemented for various purposes such as model building in protein X-ray crystallography, pharmacophore modelling, pharmacophore diversity analysis etc. In this study Moloc has been used for ligand minimisations inside the receptor to find the best possible contacts between generated ligand and protein complexes.

3.4.2 MacroModel

The MacroModel (v6.0)77 program is basic molecular modelling software with a large selection of force fields. In addition to common force fields such as AMBER, MM2, MM3 and Amber94, MacroModel includes MMFF (Merck Molecular Force Field), MMFFs, OPLS and OPLS-AA. The program offers five different conformational analysis algorithms, including the most common one, Monte Carlo Multiple Minimum (MCMM). It also has advanced methods for molecular dynamics and free energy calculations as well as for solvation calculations. The program is well suited for general-purpose molecular mechanics for small and medium size organic molecules. It also has effective utilities for exploring proteins and protein-ligand complexes.

3.4.3 AutoDock

The AutoDock (v3.0)78 program is designed to predict how small molecules bind to a receptor of known 3D structure. The software is used for modelling the binding of flexible small molecules, such as drug molecules, to receptor proteins. AutoDock consists of three different modules: AutoDock performs the docking of the ligand to a set of grids describing the target protein; AutoGrid pre-calculates these grids; and AutoTors sets up which bonds will be treated as rotatable in the ligand. The program's search methods include Monte Carlo simulated annealing (SA), local search (LS), the Genetic Algorithm (GA) and a GA-LS hybrid method, also called the Lamarckian Genetic Algorithm (LGA).

3.4.4 eHiTS®

The Electronic High Throughput Screening (eHiTS)48,79 program can dock flexible structures to target receptors. The program uses virtual high throughput screening methods to search for active molecules in compound libraries. The eHiTS program performs accurate flexible ligand docking at high speed. For each candidate structure the system generates all major docking modes that are compatible with the steric and chemical constraints of the target cavity. The solutions can be used as a starting point for more involved energy minimisation studies to predict more exact binding modes and affinities. The program uses novel systematic algorithms for docking simulations.

3.4.5 SPA-Docking

This docking method has also been developed at ICAMS, University of Leeds, and it is based on a novel simulated annealing minimisation algorithm called Systematic Population Annealing (SPA).73 The algorithm is a combination of simulated annealing, evolutionary and local search methods. This docking method has been developed for the SPROUT program, and it uses the HIPPO module. In this docking method the ligand is flexible but the receptor is kept rigid during the process. Rotatable bonds of the ligand are allowed to vary while docking. The program carries out a global energy minimisation for the ligand. All conformations of the ligand are scored using a novel empirical scoring function which contains elements describing van der Waals, hydrogen bonding, metal ion bonding, hydrophobic contact, rotatable bond entropy and dihedral strain energy terms. The information for the empirical coefficients is based on receptor-ligand complexes from the Brookhaven Protein Data Bank.

3.4.6 CAESA

The CAESA 46,47 (Computer Aided Estimation of Synthetic Accessibility) program attempts to overcome the synthetic feasibility problem by scoring and ranking according to an estimate of synthetic accessibility. The CAESA program is a rule-based expert system, which assesses the synthetic availability of a molecule by analysing its structural complexity arising from the stereochemistry, topology and functional groups.7 The CAESA version 2.4 was available for use during this study.

The program analyses each target structure on the basis of information included in various knowledge bases, which describe chemical and synthetic knowledge, and databases of available starting materials. Molecular fragments are described within the system’s knowledge bases using PATRAN46 linear notation, and potential starting materials for synthesis are selected from large databases of available compounds such as Aldrich, Acros and Lancaster as well as in-house structure databases.31 First, a set of retrosynthetic rules is used to perform a retrosynthetic analysis of the target structure. After the analysis is complete, the selected starting materials are scored and ranked according to their physical coverage of the target, wastage and synthetic proximity. Potential synthetic routes are established between the starting material compounds and the target structure. Additionally, any part of the target structure that is not covered by any starting material undergoes a complexity analysis.

4. REVIEW OF STEROID HORMONES AND HYDROXYSTEROID DEHYDROGENASES

4.1 Structure of steroid hormones

The structure of steroid hormones is related to the cyclopentanoperhydrophenanthrene nucleus (Figure 30a) and they are divided into classes based on the number of carbons in their structure (Figure 30b). The nomenclature of steroids is heterogeneous and trivial names are commonly used. The abbreviations, trivial names and IUPAC names of the steroid hormones discussed in this work are collected in Table 1. Sex hormones can be distinguished by the carbon number: C-21 being progestational or adrenal steroids, C-19 being androgens and C-18 being estrogens.

Figure 30. a) Steroid hormone skeletons are related to the cyclopentanoperhydrophenanthrene structure. b) Numbering of the steroid backbone (rings A to D, carbons 1 to 21).

Table 1. Nomenclature of some of the steroid hormones discussed in this chapter.

Trivial name              No.   Abbreviation   IUPAC name
Estradiol                 1     E2             estra-1,3,5(10)-triene-3,17β-diol
Equilin                   2     EQU            3-hydroxyestra-1,3,5(10),7-tetraen-17-one
5α-Dihydrotestosterone    3     DHT            17β-hydroxy-5α-androstan-3-one
Dehydroepiandrosterone    4     DHEA           3β-hydroxy-5-androsten-17-one
Estrone                   5     E1             3-hydroxyestra-1,3,5(10)-triene-17-one
Testosterone              7     T              17β-hydroxyandrost-4-en-3-one
Androstenediol            8     ∆5-diol        androst-5-en-3β,17β-diol
Estriol                   9     E3             estra-1,3,5(10)-triene-3,16α,17β-triol
Androstenedione           11    ∆4-dione       androst-4-en-3,17-dione
Cholesterol               13    -              cholest-5-en-3β-ol
Pregnenolone              15    -              3β-hydroxypregn-5-en-20-one
Progesterone              16    P              pregn-4-en-3,20-dione
Cortisol                  17    -              11β,17α,21-trihydroxypregn-4-ene-3,20-dione
Cortisone                 18    -              17α,21-dihydroxypregn-4-ene-3,11,20-trione
Androstanediol            19    3α/3β-Adiol    androst-5α-an-3α/3β,17β-diol
Androstanedione           20    Adione         androst-5α-an-3,17-dione
20α-Dihydroprogesterone   21    20αDHP         pregn-4-en-20α-ol-3-one
Androsterone              22    ADT            3α-hydroxy-5α-androstan-17-one


4.2 Physiological effects of estrogens

In the final step of estrogen biosynthesis low-activity estrone (E1) 5 is converted to high-activity estradiol (E2) 1. E2 1 is the most effective natural estrogen and is primarily responsible for estrogen action in women. The two other major estrogens, E1 5 and estriol (E3) 9, are far less active than E2 1.


Estrogens influence the growth, differentiation and functioning of many target tissues. These include tissues of the female and male reproductive systems such as the mammary gland, uterus, ovary, testis and prostate. Estrogens bring about the secondary female characteristics, which are mainly a consequence of the deficiency of androgen hormones (which produce the secondary male characteristics). Together with other hormones estrogens coordinate the menstrual cycle, regulate mammary gland development and regulate the maintenance of pregnancy.53 In addition, estrogens appear to be the most important sex steroids in preventing osteoporosis in women80 and they have an important role in the cardiovascular system (cardioprotective effects).81

The main source of estrogen in premenopausal women is the ovaries. Breast tissue contains the enzymes needed for the production of estradiol in situ from the circulating precursors E1 5, estrone sulfate (E1-S) 10 and androstenedione (∆4-dione) 11. After menopause the ovaries no longer produce estrogen and peripheral estrogen biosynthesis plays a key role. Therefore, in postmenopausal women total suppression of estrogen is more reasonably achieved with systemic treatment than with surgical removal of endocrine glands.82 A lot of attention has been paid to the development of therapeutic agents that either inhibit estrogen receptor action (antiestrogens) or block estrogen production (inhibitors of estrogen synthetase (aromatase) or of steroid sulfatases).83,84,85,86,87,88


4.3 Estrogen biosynthesis

Estrogen biosynthesis takes place both in human steroidogenic tissues and in peripheral tissues. The ovaries are the major source of circulating estrogens, the adrenal cortex produces the most abundant androgen steroid precursor, dehydroepiandrosterone (DHEA) 4, and its sulfate (DHEA-S) 12, and the testis synthesises androgens. Locally produced hormones exert their action inside the same cells where the final steps of the synthesis take place.87,88,89 All steroid hormones are synthesised from the common precursor cholesterol 13. It is metabolised into androgens, mostly ∆4-dione 11, by a multi-step reaction chain (Scheme 3, page 60).53,90 The 3β-hydroxysteroid dehydrogenase/ketosteroid isomerase (3βHSD/KSI) enzyme is essential for the biosynthesis of all classes of hormonal steroids, namely progesterone, glucocorticoids, mineralocorticoids, androgens and estrogens, whereas the 17β-hydroxysteroid dehydrogenase/ketosteroid reductases (17βHSD/KSR) are important for androgen and estrogen biosynthesis (Scheme 3, page 60).

Scheme 3. Catalytic pathway from cholesterol 13 to estrogens and androgens. Several different enzymes catalyse the steroid hormone reactions (indicated here with the coloured arrows).

Androgens are converted into estrogens by three main enzymes: the cytochrome P450 aromatase enzyme, steroid sulfatase and the 17βHSD/KSR type 1 enzyme. The aromatase enzyme91,92 complex is a cytochrome P450 heme protein which catalyses the conversion of ∆4-dione 11 and testosterone (T) 7 through six steps to E1 5 and E2 1, respectively.53 The protein is responsible for binding the C19 steroid substrate and catalysing the series of reactions leading to the formation of the phenolic A-ring characteristic of aromatic C18 estrogenic steroids (Scheme 4).

[Scheme 4: reaction scheme. Stepwise conversion of ∆4-dione 11 to E1 5, with steps labelled enolisation, O2, O2, Fe3+-OOH, -Fe3+OH and -H2O.]

Scheme 4. E1 5 is a product of the aromatase reaction of the ∆4-dione 11. E2 1 formation takes place to the same extent by aromatase from T 7.

The steroid sulfatase enzyme regulates the formation of E1 5 from E1-S 10 (Scheme 5). Part of the E1 5 formed from ∆4-dione 11 is converted back to its sulfate form by estrone sulfotransferase.87,88,93

[Scheme 5: structures of E1 5 and E1-S 10, interconverted by sulfotransferase and steroid sulfatase.]

Scheme 5. E1 5 is converted to E1-S 10 by sulfotransferase and back to estrone by sulfatase.

This conjugated E1 is unable to bind with the estrogen receptor (ER) and stimulate tumour growth. The concentration of conjugated estrogens in the blood is high and it seems that E1-S 10 can act as a reservoir for the formation of E1 5 after hydrolysis by estrone sulfatase. In breast tumours E1 5 can be synthesised through sulfatase 10-fold more than through aromatase under conditions of limited substrate availability.88 DHEA-S 12 and androstenediol-S 14 are other important substrates for steroid sulfatase. These are significant reactions because 4, 12 and 14 can be converted into ∆5-diol 8, which can bind with the ER and support tumour growth (Scheme 6).87,93

[Scheme 6: compounds 1, 4, 5, 8, 10, 11, 12 and 14 shown in relation to the estrogen receptor (ER) of a tumour cell.]

Scheme 6. The selection of hormones, which affect the growth and development of hormone dependent breast tumours through steroid sulfatase.

The third possible inhibition route is the final step of estrogen biosynthesis: the reduction of E1 5 to the biologically active E2 1, catalysed by the 17βHSD/KSR1 enzyme.89,90 The reverse reaction, catalysed by the 17βHSD/KSR type 2 enzyme, oxidises E2 1 back to inactive E1 5 (Scheme 7).41 Further discussion regarding the 17βHSD/KSR enzyme family will be presented in section 4.4.3 (page 69).

[Scheme 7: structures of estrone 5 and estradiol 1, interconverted by 17βHSD1 and 17βHSD2.]

Scheme 7. 17βHSD/KSR1 enzyme reduces estrone 5 to the more active estradiol 1. The reverse oxidation reaction is catalysed by 17βHSD/KSR2.

Conversion of ∆4-dione 11 to E1 5 by aromatase, E1-S 10 to E1 5 by steroid sulfatase and E1 5 to E2 1 by 17βHSD/KSR1 are important enzymatic pathways that are thought to occur in cancer cells and may explain the high concentration of estrogens in breast tumours. These enzymes are attractive targets for blocking the formation of estrogens and potentially reducing their levels. Despite the extensive information available on the pharmacophores and inhibitors of the cytochrome P450 aromatase and estrone sulfatase enzymes,82,84,87,88,94,95 only one crystal structure study96 has been published for the aromatase and none for the sulfatase. On the other hand, the 17βHSD/KSR1 crystal structure is well known.40,41,42 The ideal situation would be to design a novel inhibitor that is specific for the 17βHSD/KSR1 enzyme. Earlier studies have shown that it might be possible to use doses of the enzyme inhibitors that are adequate to block estradiol biosynthesis in the breast but not in the ovaries, and so maintain the desirable effects of estradiol on its other target tissues.

4.4 Hydroxysteroid dehydrogenase family

17βHSD/KSR1 is one of at least eleven enzymes belonging to the 17βHSD/KSR family. Hydroxysteroid dehydrogenases catalyse the redox reactions that convert steroid hormones between their active and inactive forms. Most mammalian hydroxysteroid dehydrogenases (HSDs)97 belong either to the short-chain dehydrogenase/reductase (SDR)98,99 or to the aldo-keto reductase (AKR)100,101 superfamily.

4.4.1 SDR and AKR protein superfamilies

The majority of HSDs identified belong to the SDR family, formerly known as the SCAD or (insect-type) alcohol dehydrogenase family. Research on the enzyme family started in the 1980s and interest in the superfamily has been growing in recent years.99,102,103,104 To date, approximately 3000 different members of this large, functionally heterogeneous group have been characterised and approximately 30 3-D structures have been deposited in databases. Over 60 of these genes are found in the human genome. The dehydrogenase superfamily has been divided into three families: the long-chain, the medium-chain and the short-chain families. The short-chain family is subdivided into five lines: classical, extended, intermediate, divergent and complex SDR.104 The subdivisions are characterised by differences in chain lengths and cofactor-binding motifs.

Similarly, the AKR family is a growing superfamily with more than 40 characterised members. Because of the wide range of substrate specificities, this family is subdivided into seven classes (AKR1-AKR7), and these classes are further divided into smaller groups. The main difference between the SDR and AKR superfamilies is associated with protein folding and nucleotide cofactor stereospecificity.100

Typically, these superfamilies include enzymes of 250-350 amino acid residues. Substrates can vary from steroids, alcohols, sugars and aromatic compounds to xenobiotics. Even though the folding pattern of the SDR enzymes is conserved, they share only 10-30% residue identity in pairwise comparison. The corresponding value for AKR enzymes is 40%. Proteins of the SDR family are active as dimers or tetramers, whereas AKRs are active as monomers. Both enzyme families require NAD(H) or its phosphate, NADP(H), as a cofactor. All available crystal structures of the SDR family comprise a Rossmann (α/β) folding structure (Figure 31a), and AKR enzymes display a triose phosphate isomerase (TIM) barrel (α/β)8 folding pattern (Figure 31b) close to the nucleotide binding motif.100,101,103,104
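The residue identity figures quoted above come from pairwise sequence comparison. As a minimal illustration of how such a percentage can be computed, the following sketch counts identical residues over the aligned, gap-free columns of two pre-aligned sequences. The function name, the choice of denominator (columns without gaps) and the two toy sequences are assumptions made for illustration only; they are not taken from the cited studies, which may use a different definition.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over aligned columns without gaps.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    Other definitions of identity use a different denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns in which either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Invented toy fragments, not real enzyme sequences.
TOY_A = "TGAAS-GIG"
TOY_B = "TGGSSRGIG"
TOY_IDENTITY = percent_identity(TOY_A, TOY_B)
```

With real SDR sequences the alignment itself would have to be produced first by an alignment program; the sketch covers only the final counting step.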

Figure 31. a) The Rossmann fold consists of a seven-stranded parallel β-sheet (blue) surrounded by six α-helices (red), with three helices on either side of the β-sheet (PDB entry 1A27). b) In TIM barrels an alternating α-helix and β-strand arrangement occurs eight times, (α/β)8, and the β-strands form the staves of a barrel in the core of the structure (PDB entry 1AFS).


Both families share a similar reaction mechanism but differ in cofactor stereo-specificity. SDR enzymes transfer the proS hydrogen, whereas AKR enzymes transfer the proR hydrogen of NADP(H) during reduction (Figure 32).97

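[Figure 32: nicotinamide ring bearing the CONH2 group and attached to ribose through the ring nitrogen, with the two hydrogens labelled HR and HS.]

Figure 32. Nicotinamide ring of NADP(H) with proS and proR hydrogens.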

The coenzyme-binding domain is located on the N-terminal side of the protein, and the catalytic domain is located toward the C-terminus. Both families have highly conserved residues located near the active site: SDR proteins have Ser-Tyr-Lys98 and AKR proteins have Tyr-His-Asp-Lys.100 The importance of the catalytic triad (Ser-Tyr-Lys) for the oxidation/reduction reactions of the SDR enzyme group has been known for a long time. However, recent site-directed mutagenesis studies have shown that, instead of the catalytic triad, the majority of characterised SDR enzymes form a catalytic tetrad of Asn-Ser-Tyr-Lys residues.105,106 In addition to this conserved area near the active site, both enzyme families have a similar conserved sequence including three glycine (Gly) residues at the N-terminal end of the peptide chain. This sequence forms a turn between a β-strand and an α-helix that borders on the cofactor-binding site. Subcellular localisation differs between the families: AKR-type hydroxysteroid dehydrogenases are found in the cytosol, whereas SDR-type enzymes are found in the cytoplasm and in other subcellular compartments, e.g. endoplasmic reticulum, mitochondria and peroxisomes.103

4.4.2 Members of the hydroxysteroid dehydrogenase family important for human physiology

Hydroxysteroid dehydrogenases are critically involved in the synthesis and catabolism of steroid hormones and bile acids. There are multiple isoforms of the enzyme for biosynthesis and inactivation: one form is a reductase and the other is a dehydrogenase. In steroidogenic tissues they catalyse the final step of estrogen, androgen and progesterone biosynthesis (see Scheme 3, page 60). In peripheral tissues they convert potent steroid hormones into inactive metabolites. Owing to their capability to specifically activate or inactivate steroid hormones, they play fundamental roles, for example in fertility, reproduction, intermediary metabolism and cancer.53 The enzymes have been grouped according to the steroid positions at which they act. In addition to the 17βHSD/KSR (for more details see section 4.4.3, page 69), there are other hydroxysteroid dehydrogenase family proteins that are significant for human physiology, acting at positions 3, 11 and 20 of steroid hormones. 3β-Hydroxysteroid dehydrogenase/ketosteroid isomerase (3βHSD/KSI) and 11β-HSD belong to the SDR superfamily, whereas 3α-HSD and 20α-HSD are members of the AKR superfamily.97

4.4.2.1 3β-Hydroxysteroid dehydrogenase/ketosteroid isomerase

Two enzymes and reactions are required for the conversion of pregnenolone 15 to progesterone (P) 16, these being the final steps in P biosynthesis, which is necessary for the maintenance of pregnancy. The first reaction is a dehydrogenation of 3β-equatorial hydroxysteroids by 3β-HSD, producing a ∆5-3-ketosteroid. The second reaction is the isomerisation of the ketosteroid produced, catalysed by the ketosteroid isomerase (KSI) enzyme, to form a ∆4-3-ketosteroid (Scheme 8).97 Two human 3β-HSD isoforms have been isolated (placental and adrenal).

[Scheme 8: structures of pregnenolone 15, the intermediate pregn-5-ene-3,20-dione and progesterone 16, with arrows labelled 3β-HSD and KSI.]

Scheme 8. Pregnenolone 15 is converted to P 16 by 3β-HSD and KSI enzymes.

4.4.2.2 11β-Hydroxysteroid dehydrogenase

This enzyme converts active glucocorticoids to the inactive 11-keto derivative. Glucocorticoids are known to play an important role in the regulation of vascular tone and blood pressure. Both gluco- and mineralocorticoids bind to the mineralocorticoid receptor (MR) with equal affinity. If glucocorticoids are present at much higher concentrations than mineralocorticoids, hypertension is observed. In kidney cells 11β-HSD enzymes prevent this by converting active cortisol 17 into the inactive cortisone 18 (Scheme 9). In this way, binding of glucocorticoids to the type 1 MR is prevented. There are two isoenzymes with different actions: type 1 is both a reductase and a dehydrogenase, whereas type 2 is only a dehydrogenase.107,108

[Scheme 9: structures of cortisol 17 and cortisone 18, with arrows labelled 11β-HSD type 1 and type 2.]

Scheme 9. 11β-HSD enzymes regulate the reaction between 17 and 18.

4.4.2.3 3α-Hydroxysteroid dehydrogenase

This enzyme (EC 1.1.1.213, formerly 1.1.1.50) catalyses the reversible conversion of hydroxy and oxo groups at position 3 of the steroid skeleton. It works together with the 5α- and 5β-reductases to reduce 3-ketosteroids to 5α,3α- and 5β,3α-tetrahydrosteroids. This enzyme is also known as a carbonyl reductase (CR).109 In the prostate, 3α-HSD works as a molecular switch and regulates occupancy of the androgen receptor (AR). It converts DHT 3 to 3α-androstanediol (Adiol) 19 (Scheme 10, page 68). In the brain, 3α-HSD reduces 5α-steroids to tetrahydrosteroids.97 Human 3α-HSD type 3 has been crystallised as a complex with testosterone (PDB entry code 1J96) and with ursodeoxycholate (PDB entry code 1IHI). In both cases NADPH is the cofactor. Two crystal structures of the rat liver 3α-HSD enzyme have also been published (PDB entry codes 1AFS and 1RAL). In addition, crystal structures of 3α-HSD from Comamonas testosteroni have been published with the NAD+ cofactor (PDB entry code 1FK8) and without a cofactor (entry code 1FJH).

[Scheme 10: structures of DHT 3 and Adiol 19, and of dihydroprogesterone and tetrahydroprogesterone, each pair connected by an arrow labelled 3α-HSD.]

Scheme 10. In prostate DHT 3 is reduced to Adiol 19, whereas in brain 5α-dihydroprogesterone is converted to tetrahydroprogesterone by 3α-HSD enzyme.

4.4.2.4 20α-Hydroxysteroid dehydrogenase

This enzyme (EC 1.1.1.149) catalyses the NADPH-dependent reduction of the 20-carbonyl groups of C21-steroids.97 The 20α-HSD enzyme is also called Porcine Testicular Carbonyl Reductase (PTCR) and it is a multifunctional enzyme, which catalyses the reduction of ketones on androgens, progestins and prostaglandins, as well as of aldehydes and ketones on a large number of xenobiotics.110 For example, in the ovary the enzyme converts the active progestin P 16 into an inactive metabolite (Scheme 11). The activity of 20α-HSD on P 16 is of great importance, since the production of P is essential for endometrial development and the maintenance of pregnancy. The enzyme is similar to the 3α-HSD type 3 enzyme but differs in activity.111 There are also other proteins that catalyse 20α-HSD activities. In the human placenta 20α-HSD activities are catalysed by 17βHSD/KSR type 1 or type 2. The 20α-HSD has been crystallised as a ternary complex (PDB entry code 1MRQ).

[Scheme 11: structures of progesterone 16 and 20α-hydroxyprogesterone, connected by an arrow labelled 20α-HSD.]

Scheme 11. P 16 is reduced to inactive form by 20α-HSD enzyme.

4.4.2.5 Multiple specificities of hydroxysteroid dehydrogenases

In addition to having exact substrate specificities, the enzymes can be multifunctional, catalysing more than one reaction of a specific steroid. Several HSDs show a broad spectrum of enzymatic activities towards steroids and other compounds such as prostaglandins, retinoids and fatty acid derivatives. For example, the 3α,20β-HSD enzyme (EC 1.1.1.53)112 has dual activity and reversibly oxidises the 3α-hydroxyl and 20β-hydroxyl groups of androstane and pregnane derivatives (PDB entry codes 2HSD and 1HDC), whereas the 3β,17β-HSD enzyme (EC 1.1.1.51)113,114 catalyses the reversible reduction/dehydrogenation of the oxo/β-hydroxy groups at positions 3 and 17 of steroid compounds, including hormones and isobile acids (PDB entry code 1HXH). According to recent research, 17βHSD/KSR type 10 also has multiple activities, catalysing reactions at positions 3α, 7α, 7β, 17β, 20β and 21 of the steroid nucleus.115 This broad range of substrate specificities is explained by the wide hydrophobic cleft, which is capable of accommodating steroids in different orientations.

4.4.3 17β-Hydroxysteroid dehydrogenase/ketosteroid reductase

The activities of this enzyme family have been reported since the 1950s.116 Peltoketo et al. cloned the first 17βHSD/KSR enzyme in 1988 from human placental tissue. Since then the type 1 enzyme has received much attention, and a great deal of information on substrate specificity, mutations, tissue expression, structure and crystallisation is now available for this enzyme as well as for the other members of the 17βHSD/KSR family.117,118 Indirect evidence of the existence of more than one 17βHSD/KSR had been presented by Blomquist.119 Eleven distinct isoenzymes, designated 17βHSD/KSR1 through 17βHSD/KSR11 on the basis of the chronological order of their cloning, have been characterised to date, mainly from human (nine types) and rodent tissues (eleven types).120 Cloning of these enzymes has revealed that several of them also have other enzymatic activities, which has complicated their definition. Consequently, some of the isoenzymes have been assigned different gene names because a different function was originally studied (see Table 2, page 70).


Table 2. Synonyms used for 17βHSD/KSR isoenzymes.

17HSD/KSR | Synonyms
Type 1 | Placental 17HSD, 17β20α-HSD, estradiol 17HSD, estradiol-17β-dehydrogenase/oxidoreductase, 17β-hydroxysteroid oxidoreductase, estrogenic 17-ketosteroid reductase
Type 2 | -
Type 3 | Testicular 17-ketosteroid reductase/17β-hydroxysteroid oxidoreductase, androgenic 17β-HSD
Type 4 | Peroxisomal multifunctional enzyme II (MFE2), multifunctional protein 2 (MFP2), D-specific multifunctional enzyme 2, D-3-hydroxyacyl-CoA dehydrogenase, D-specific bifunctional protein, D-3-hydroxyacyl-CoA dehydratase/D-3-hydroxyacyl dehydrogenase bifunctional protein, 2-enoyl-CoA hydratase 2
Type 5 | Type 2 3αHSD, AKR1C3, HAKRb, mouse estradiol 17β-dehydrogenase
Type 6 | -
Type 7 | Prolactin receptor associated protein (PRAP)
Type 8 | HKE6 (human), Ke6 (mouse)
Type 9 | -
Type 10 | Short chain L-3 hydroxyacyl-CoA dehydrogenase (SCHAD), Type 2 L-3 hydroxyacyl-CoA dehydrogenase (HADH II), Amyloid-peptide-β-binding alcohol dehydrogenase (ABAD), Endoplasmic reticulum-associated amyloid-binding protein (ERAB)
Type 11 | Pan 1b, retSDR2,

These enzymes either convert inactive 17-ketosteroids to their more potent 17β-hydroxy forms or vice versa, using NAD(P)(H) as a cofactor56,121,122 (Table 3, page 71). Unlike other dehydrogenases, 17βHSD/KSRs are not reversible enzymes. Their catalysis is mainly unidirectional:123,124 types 1, 3, 5 and 7 catalyse the reduction of 17-ketosteroids, while types 2, 4, 6, 8, 9, 10 and 11 catalyse the oxidation of 17β-hydroxysteroids. However, a recent study with transfected HEK-293 cells has shown that human 17βHSD/KSR types 1, 2 and 3 catalyse bidirectional equilibrium reactions.125 With regard to steroid structures, some of the enzymes act predominantly on estrogens (types 1, 4 and 7), others act on androgens (types 3 and 5), whereas others metabolise both estrogens and androgens (types 2, 6, 8, 9, 10 and 11). Several 17βHSD/KSR types may also metabolise further substances such as alcohols, bile acids, fatty acids and retinols.


Table 3. Some general properties of the 17βHSD/KSR family.

17HSD/KSR | Cofactor | Main activity | Number of aa1 | Substrate specificity | Biological function | Special features
Type 1 | NADPH, NADH | Reductase | 327 | Estrogens | E2 production | -
Type 2 | NAD+ | Dehydrogenase | 387 | Androgens, estrogens, (progestin) | T and E2 inactivation (20α-P activation) | -
Type 3 | NADPH | Reductase | 310 | Androgens, (estrogens) | T production | Deficiency: androgen-sensitive diseases
Type 4 | NAD+ | Dehydrogenase | 736 (323)2 | Fatty acyl-CoA, estrogens, (androgens) | β-oxidation of fatty acids, E2 and A-diol inactivation | Peroxisomal, multifunctional properties, deficiency: Zellweger syndrome
Type 5 | NADPH | Reductase | 323 | Steroids, cholic acids, xenobiotics | T, (E2) production, 20α-DHP activation, bile acid production, detoxification | Belongs to AKR family
Type 6 | NAD+ | Dehydrogenase | 327 | Androgens, estrogens | DHT inactivation | Similarity (65%) to RoDH1 enzyme
Type 7 | NADPH | Reductase | 334 | Estrogens | E2 production | Similarity to type 1
Type 8 | NAD+/(NADH) | Dehydrogenase/(reductase) | 259 | Estrogens, androgens | E2 and androgen inactivation, (E2 production) | Deficiency: polycystic kidney disease
Type 9 | NAD+ | Dehydrogenase | 317 | Estrogens, androgens, retinoids | E2 and androgen inactivation | Retinol and 3αHSD activation, similarity to rat type 6
Type 10 | NAD+ | Dehydrogenase | 261 | Estrogens, androgens, fatty acyl-CoA | E2 and androstanediol inactivation, β-oxidation of fatty acids, ADH activity | Locus in mitochondria, multifunctional properties (L-hydroxyacyl-CoA activity), similarity to type 4, connected to Alzheimer’s disease
Type 11 | NAD+ | Dehydrogenase | 300 | Androgens, estrogens | Androstanediol and E2 inactivation | -

1 Amino acid residues
2 First 300 residues form the hydroxysteroid part of the multifunctional enzyme.
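The division of the family into reductive and oxidative isoenzymes given in Table 3 can be restated as a small lookup. The sketch below is only a transcription of the "Cofactor" and "Main activity" columns of Table 3 into Python; the variable and function names are hypothetical, and for type 8, which the table lists as "Dehydrogenase/(reductase)", only the primary activity is kept.

```python
from collections import defaultdict

# Main activity per 17βHSD/KSR type, transcribed from Table 3.
# Type 8 is "Dehydrogenase/(reductase)" in the table; only the
# primary (unbracketed) activity is recorded here.
MAIN_ACTIVITY = {
    1: "reductase", 2: "dehydrogenase", 3: "reductase",
    4: "dehydrogenase", 5: "reductase", 6: "dehydrogenase",
    7: "reductase", 8: "dehydrogenase", 9: "dehydrogenase",
    10: "dehydrogenase", 11: "dehydrogenase",
}

# Cofactor column of Table 3, kept as written there.
COFACTOR = {
    1: "NADPH, NADH", 2: "NAD+", 3: "NADPH", 4: "NAD+",
    5: "NADPH", 6: "NAD+", 7: "NADPH", 8: "NAD+/(NADH)",
    9: "NAD+", 10: "NAD+", 11: "NAD+",
}


def group_by_activity(activity_by_type):
    """Return {activity: [type numbers in ascending order]}."""
    groups = defaultdict(list)
    for type_number in sorted(activity_by_type):
        groups[activity_by_type[type_number]].append(type_number)
    return dict(groups)


GROUPS = group_by_activity(MAIN_ACTIVITY)
```

In this transcription every reductive type is paired with NADPH (type 1 also with NADH) and every oxidative type with NAD+, which matches the cofactor pattern shown in the table.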

17βHSD/KSRs are widely distributed enzymes, which are expressed differently in all classical steroidogenic tissues as well as in almost all peripheral tissues (Table 4, page 72). Isoenzymes differ in tissue distribution, subcellular localisation, catalytic preferences, substrate specificity and mechanisms of regulation. As mentioned earlier (see section 4.4.1, page 63), most of the 17βHSD/KSR enzymes belong to the SDR superfamily, excluding type 5, which belongs to the AKR superfamily; type 4, which is a multifunctional enzyme; and type 11, which belongs to the short chain alcohol dehydrogenase (ADH) family. Although the substrates of 17βHSD/KSRs have a similar structure, sequence analysis of the enzymes shows only approximately 20% homology, and they are commonly more similar to other SDR members than to other 17βHSD/KSRs. Consequently this family is connected by its enzymatic rather than primary structure similarities.


Table 4. Some detailed properties of 17βHSD/KSR isoenzymes. Data collected from Peltoketo et al., except where otherwise indicated by references.

17HSD/KSR | Superfamily | Species cloned | Subcellular location | Tissue distribution
Type 1 | SDR | Human, rat, mouse, rabbit, marmoset monkey126 | Cytosolic | Ovary, placenta, breast
Type 2 | SDR | Human, rat, mouse, marmoset monkey127 | Transmembrane protein in the endoplasmic reticulum128 | Endometrium, liver, kidney, small intestine, placenta
Type 3 | SDR | Human, mouse, rat129 | Microsomal | Testis
Type 4 | SDR (MFE) | Human, rat, mouse, porcine, chicken, guinea pig, marmoset monkey | Peroxisomal | Widely expressed in both classical steroidogenic and peripheral tissues
Type 5 | AKR | Human, mouse, monkey130 | Cytosolic | Liver, kidney, testis, peripheral tissues
Type 6 | SDR | Rat | Membrane-bound | Liver, prostate
Type 7 | SDR | Human,131 rat, mouse, rabbit,132 marmoset monkey | Membrane-associated | Widely expressed: ovaries, uterus, placenta, breast, testis, prostate, liver
Type 8 | SDR | Human, mouse, marmoset monkey | Microsomal133 | Kidney, liver, ovary, testis, spleen
Type 9 | SDR | Mouse134 | Microsomal | Liver, prostate
Type 10 | SDR | Human,135 rat,136 mouse,137 bovine,138 Drosophila139 | Mitochondrial | Liver, kidney, gonads, lung, peripheral tissue
Type 11 | SDR (ADH) | Human,140 mouse141 | Cytosolic,142 also associated with the endoplasmic reticulum143 | Lung, pancreas, kidney, liver, heart, adrenal, ovary


4.4.3.1 Members of the 17βHSD/KSR family

Type 1,126,144,145 is essential for E2 1 biosynthesis and prefers estrogenic substrates to androgenic ones (see Scheme 12, page 77). It is a reductase and converts E1 5 to E2 1 mainly in the ovary, placenta and breast tissues.146,147 In addition, the enzyme catalyses the conversion of DHEA 4 to ∆5-diol 8, T 7 to ∆4-dione 11, DHT 3 to androstanedione (Adione) 20, 16α-hydroxyestrone to estriol 9, and P 16 to 20α-dihydroprogesterone (20αDHP) 21. While human 17βHSD/KSR1 primarily catalyses reactions between phenolic steroids (estrogens), the rodent enzyme is also able to catalyse reactions of androgens. This enzyme is a cytosolic protein, which exists in a homodimeric form. Several crystal structure and inhibitor studies have been made for 17βHSD/KSR1 (sections 4.4.3.2, page 78 and 4.4.3.7, page 89 briefly describe these studies).

In contrast to type 1,148 the oxidative 17βHSD/KSR2149,150,151 decreases the biological activity of estrogens and androgens and may as a result protect tissues from excessive hormone action. The enzyme catalyses the conversion of E2 1, T 7 and DHT 3 to their less active forms E1 5, ∆4-dione 11 and Adione 20 (Scheme 12, see page 77). Type 2 also possesses 20αHSD-activity,149 converting 20αDHP 21 to P 16. In addition to these activities the enzyme shows 3αHSD-activity152 as well. Type 2 is highly expressed in the endometrium, placenta, liver, small intestine and gastrointestinal as well as urinary tracts; it is also present in the pancreas, colon, kidney, prostate and breast.153,154,155,156 The properties and expression of 17βHSD/KSR1 and 2 have been reviewed by Peltoketo et al. Some inhibitor studies have been carried out for 17βHSD/KSR2.157,158,159 It has now been established that the inhibition of this enzyme is not desirable, because it oxidises active estrogens and androgens to their inactive forms and thus protects tissues.

It has been known for a long time that type 3 17βHSD/KSR129,160,161 is essential for T 7 biosynthesis.162 This enzyme is almost exclusively expressed in the human testis and it is thus crucial in male sexual differentiation and reproduction. With NADPH as a cofactor, it catalyses the reduction of ∆4-dione 11 to T 7, which is further converted to DHT 3. It is also capable of reducing DHEA 4 to ∆5-diol 8 and E1 5 to E2 1 (see Scheme 12, page 77). 3β-Substituted androsterone (ADT) 22 derivatives have been examined as inhibitors of type 3.163

17βHSD/KSR Type 4164,165,166,167,168 is the first steroid metabolising enzyme located in peroxisomes. It is also known as peroxisomal multifunctional enzyme II (MFE2) and consists of three different domains: 17βHSD/KSR, hydratase dehydrogenase (HDE) and a sterol carrier protein type 2 (SCP2) like domain. The N-terminal part of the enzyme has 17β-estradiol dehydrogenase and D-3-hydroxyacyl-CoA dehydrogenase activities, the central part catalyses D-specific 2-enoyl-acyl-CoA hydratase reactions and the C-terminus can facilitate the transfer of 7-dehydrocholesterol and phosphatidylcholine between membranes.167 The 17βHSD/KSR part of the enzyme catalyses the oxidation of E2 1 to E1 5, ∆5-diol 8 to DHEA 4 (see Scheme 12, page 77) and 3-hydroxyacyl-CoA to 3-ketoacyl-CoA. The role of the N-terminal part (17βHSD/KSR) of the enzyme in steroid metabolism appears to be minor compared with the other activities of the enzyme. The enzyme is widely distributed in human tissues, but the highest levels have been observed in the liver, heart, prostate and testis. Several studies have revealed connections to certain inherited disorders in man.169,170,171 In addition, recent crystal structure studies have been carried out in order to gain a better understanding of the activity of the enzyme.172,173

Unlike most of the 17βHSD/KSR family, type 5174,175 belongs to the AKR superfamily instead of SDR. The enzyme is also known as type 2 3α-HSD176,177 and it shares over 80% sequence identity with type 1 3α-HSD, type 3 3α-HSD and 20α-HSD. The enzyme mainly catalyses the reduction of ∆4-dione 11 to T 7 but also of DHT 3 to Adiol 19 and E1 5 to E2 1 (see Scheme 12, page 77). The human enzyme converts P 16 to 20αDHP 21 as well. In addition to these activities, 17βHSD/KSR5 possesses 3α-HSD and dihydrodiol dehydrogenase activity. However, the catalytic efficiency at the 17-position is distinctly higher than at the 3-position of the steroid. This multispecificity of the enzyme is confirmed by a crystal structure study of type 5.178 The enzyme has been localised in the liver, kidney, bone, adrenal, testis and prostate as well as in some cancer cell lines.177,179,180 Presumably the type 5 enzyme catalyses the reduction of ∆4-dione 11 to T 7 in the ovary and peripheral tissues, while type 3 is found only in the testis.130 Recently it has also been found in the human brain.181 A number of crystal structure178,182 and inhibitor studies183,184,185 have been carried out for type 5.


17βHSD/KSR type 6186 is a dehydrogenase enzyme and converts Adiol 19 to ADT 22. The enzyme also shows low activity with DHT 3, T 7 and E2 1 (see Scheme 12, page 77) and possesses a weak oxidative 3α-HSD activity. The enzyme is not present in humans but it is found in rat liver, prostate and kidney. Type 6 shares 65% sequence identity with retinol dehydrogenase type 1 (RoDH1), which catalyses the oxidation of retinol to retinal.

Type 7 17βHSD/KSR was first cloned and characterised from rat and named prolactin receptor-associated protein (PRAP).187 Later, human PRAP was confirmed to be identical to 17βHSD/KSR7.126,131,132,188 It shares over 70% amino acid identity with rat187 and mouse188 type 7 17βHSD/KSR. The enzyme activates estrogens and inactivates androgens equally effectively. It catalyses the reduction of E1 5 to E2 1 and, to a moderate extent, of DHT 3 to both 3α- and 3β-Adiol 19 as well as of Adione 20 to DHT 3 (see Scheme 12, page 77). In addition, according to S. Törn et al., 17βHSD/KSR7 is able to convert P 16 to 4-pregnen-3β-ol-20-one and 20-OH-P 21 to 4-pregnen-3β,20α-diol with reasonable efficiency.189 It has also been concluded that, in addition to its activity at position 17, type 7 has 3-ketosteroid reductase activity,190 just like type 5. Apart from the corpus luteum (in rodents), the type 7 enzyme is found in the ovaries, placenta, uterus, mammary gland, liver, testis and prostate. Traces of it are also found in the lung, kidney and lymph node.131,191 Some homology modelling studies have been carried out for type 7.190

17βHSD/KSR8192 catalyses the oxidation of E2 1, T 7 and DHT 3 and it also catalyses the reduction of E1 5 to E2 1 (Scheme 12, page 77). The enzyme was originally named HKE6 or Ke6 and, after being associated with 17βHSD activity, was later named 17βHSD/KSR type 8.133 The mouse Ke6 and human HKE6 amino acid sequences have 85.6% similarity. The two variants of the enzyme are distributed differently: 17βHSD/KSR8a is abundant in kidney, liver and gonads, while 17βHSD/KSR8b is spleen specific. In the ovaries the enzymes are present in different cells than types 1 and 7. Type 8 is supposed to be involved in the manifestation of polycystic kidney disease (PKD).193,194

17βHSD/KSR Type 9 has been cloned only from mouse.134 It is active on both estrogens and androgens, converting Adiol 19 to ADT 22 and E2 1 to E1 5. It also converts, to some extent, DHT 3 to Adione 20 and T 7 to ∆4-dione 11 (Scheme 12, page 77). The enzyme also possesses 3α-HSD and retinoid activities.195 Mouse type 9 and rat type 6 enzymes share 88% amino acid identity. This enzyme is found in liver and prostate.

Type 10 17βHSD/KSR was initially named short chain L-3-hydroxyacyl-CoA dehydrogenase (SCHAD),135 endoplasmic reticulum amyloid β-peptide-binding protein (ERAB)196,197 and also type 2 L-3-hydroxyacyl-CoA dehydrogenase (HADH II).136 The enzyme catalyses the oxidation of both estrogens (E2 1 to E1 5) and androgens (Adiol 19 to DHT 3, Scheme 12, page 77). It is similar to type 4 and is capable of converting fatty acyl-CoA (L-3-hydroxy-CoA to 3-ketoacyl-CoA).135 Type 10 is the only mitochondrial 17βHSD/KSR enzyme discovered so far.198 The enzyme is associated with Alzheimer’s disease196,199 and is found in the liver, kidney, gonads, lung and peripheral tissues including the brain.198,200 Some modelling studies201,202,203 of this enzyme have been carried out and the homology model coordinates204 are available from the PDB (entry code 1F67). Interestingly, human 17βHSD/KSR10 is structurally more similar to the tetrameric bacterial 3α,20βHSD112 than to other types of mammalian 17βHSD/KSRs. Recent research on its activity has demonstrated that, in addition to the oxidation of fatty acids and sex steroids, the enzyme is involved in the degradation pathways of glucocorticoids, the epimerization of bile acids and the oxidation of branched chain amino acids. In this way the enzyme possesses 3α/7α/7β/17β/20β and 21-hydroxysteroid dehydrogenase activities.115

17βHSD/KSR Type 11140,141,142,143 is the most recently discovered enzyme of this family and it may be important in androgen metabolism. The enzyme catalyses the oxidation of 3α- and 3β-Adiol 19 to ADT 22 and of E2 1 to E1 5 (Scheme 12, page 77). Recent studies have shown that expression of the type 11 gene is directly regulated by PPARα (Peroxisome Proliferators Activated Receptor α) and its ligand.143 The enzyme is found in the lung, pancreas, kidney, liver, adrenal, ovary and heart.205,206


[Scheme 12: reaction scheme. Compounds shown: DHEA-S 12, dehydroepiandrosterone 4, androstenedione 11, estrone 5, E1-S 10, A-diol-S 14, androstenediol 8, testosterone 7, estradiol 1, dihydrotestosterone 3, androstanedione 20, androstanediol 19 and androsterone 22. Arrows are labelled with the numbers of the 17βHSD/KSR types that catalyse each reaction. Enzymes in the legend: 3βHSD/KSI, P450 aromatase, 17βHSD/KSR, 5α-reductase, steroid sulfatase/sulfotransferase.]

Scheme 12. Reactions catalysed by different types of 17βHSD/KSR. Blue numbers indicate main reactions and black numbers indicate lesser extent of reaction.

HSD activity has also been detected in many microorganisms: bacteria, filamentous fungi and yeast. Some similarities between these and human HSDs have been discovered; for example, the enzyme cloned from the yeast Candida tropicalis is an MFE type 2 enzyme and shows similar activity to 17βHSD/KSR type 4.169 The highest similarity was found between 7α-HSD from Escherichia coli and human 17βHSD/KSR types 4 and 8.207 17βHSD/KSR activity has been investigated in the filamentous fungus Cochliobolus lunatus (17βHSDcl),208 because it performs the biosynthesis of mammalian-like steroid hormones. Some substrate specificity,209,210 homology modelling,209,211 inhibitor212 and mutagenesis213 studies have been carried out for 17βHSDcl since it seems to be involved in steroid metabolism.


4.4.3.2 Crystal structure information of the 17βHSD/KSR in PDB

Type 1 enzyme has been co-crystallised with a number of different ligands. The first published 17βHSD/KSR crystal structure was that of the native type 1 enzyme (PDB entry 1BHS), which dates back to 1995 and was determined by D. Ghosh et al. After that the enzyme was crystallised in a complex with the natural ligand 17β-estradiol 1 and the cofactor NADP+ 6 (1FDS, 1FDT, 1FDU, 1FDW, 1A27),40,70 with E2 1 alone (1IOL)214 and with cofactor but without ligand (1FDV). The crystallisation study of the type 1 enzyme - equilin 2 - NADP+ 6 cofactor complex (1EQU) clarifies the 17β-HSD/KSR1 binding cavity and especially the substrate entry path loop. Type 1 enzyme has also been co-crystallised with androgens such as DHT 3 (1DHT), DHEA 4 (3DHE)42 and T 7 (1JTV).215 The crystal structure of the complex with the inhibitor Em1745 (1I5R)216 has been published recently (for more details see Table 5, page 79). Moreover, the three latest crystal structures of type 1 have now been released in PDB. These are the type 1 - NADP complex (entry code 1QYV), the type 1 - Adione 20 - NADP complex (1QYW) and the type 1 - ∆4-dione 11 - NADP complex (1QYX).217

Also the type 4 C-terminal domain (residues 618-736) has been crystallised without ligand or cofactor (1IKT)172 and more recently residues 1-319 with NAD+ (1GZ6).173 Recently type 5 has been crystallised with ∆4-dione 11 (1XF0).178 The study also proposes two alternative binding positions for T 7, which confirms the multispecificity of this enzyme type. A crystal structure study of type 10 with a bound inhibitor103 and the theoretical model of type 10 with the NAD+ cofactor (1F67)204 have also been published. The experimental method used for these crystal structure studies is X-ray diffraction, except in the case of type 10 (1F67),204 which was obtained by homology modelling. This enzyme was modelled onto the known 3-D structure of the homologue 7α-HSD (1FMC)218 from E. coli.


Table 5. The crystal structure information available from the Brookhaven Protein Data Bank4 for the 17βHSD/KSR family.

Entry code  17βHSD/KSR type  Ligand       Cofactor    Classification               Number of residues  Release date  Ref.
1BHS        1                -            -           Oxidoreductase               327                 7.12.1996     56
1FDS        1                E2 1         -           Dehydrogenase                327                 12.2.1997     70
1FDT        1                E2 1         NADP+, SO4  Dehydrogenase                327                 12.2.1997     70
1IOL        1                E2 1         -           Oxidoreductase               327                 7.7.1997      214
1FDU        1                E2 1         NADP+       Dehydrogenase                1308                27.5.1998     40
1FDV        1                -            NAD+, SO4   Dehydrogenase, H221L mutant  1308                27.5.1998     40
1FDW        1                E2 1         -           Dehydrogenase, H221Q mutant  327                 27.5.1998     40
1A27        1                E2 1         NADP+       Dehydrogenase                289                 27.5.1998     40
3DHE        1                DHEA 4       -           Oxidoreductase               327                 25.9.1999     42
1DHT        1                DHT 3        -           Oxidoreductase               327                 25.9.1999     42
1EQU        1                EQU 2        NADP+       Oxidoreductase               654                 2.12.1999     41
1F67        10               -            NAD+        Oxidoreductase               261                 26.9.2001     204
1IKT        4                -            -           Oxidoreductase               120 (618-736)       14.11.2001    172
1I5R        1                Em1745       -           Oxidoreductase               327                 11.3.2002     216
1JTV        1                T 7          -           Oxidoreductase               327                 24.6.2002     215
1GZ6        4                -            NAD+        Dehydrogenase                1276 (1-319)        24.1.2003     173
1QYV        1                -            NADP+       Oxidoreductase               327                 3.8.2004      217
1QYW        1                Adione 20    NADP+       Oxidoreductase               327                 3.8.2004      217
1QYX        1                ∆4-dione 11  NADP+       Oxidoreductase               327                 3.8.2004      217
1XF0        5                ∆4-dione 11  NADP+       Oxidoreductase               323                 26.10.2004    178
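The entries of Table 5 lend themselves to simple programmatic queries. The following Python sketch is only an illustration: the list is a hand transcription of the entry code, enzyme type, ligand and cofactor columns of Table 5, and the function names `entries_by_type` and `ternary_complexes` are hypothetical helpers introduced here, not part of any modelling program used in this work.

```python
from collections import defaultdict

# (PDB entry, enzyme type, ligand, cofactor) transcribed by hand from Table 5.
# None means that no ligand or cofactor is listed for the entry.
TABLE_5 = [
    ("1BHS", 1, None, None),
    ("1FDS", 1, "E2", None),
    ("1FDT", 1, "E2", "NADP+"),
    ("1IOL", 1, "E2", None),
    ("1FDU", 1, "E2", "NADP+"),
    ("1FDV", 1, None, "NAD+"),
    ("1FDW", 1, "E2", None),
    ("1A27", 1, "E2", "NADP+"),
    ("3DHE", 1, "DHEA", None),
    ("1DHT", 1, "DHT", None),
    ("1EQU", 1, "EQU", "NADP+"),
    ("1F67", 10, None, "NAD+"),
    ("1IKT", 4, None, None),
    ("1I5R", 1, "Em1745", None),
    ("1JTV", 1, "T", None),
    ("1GZ6", 4, None, "NAD+"),
    ("1QYV", 1, None, "NADP+"),
    ("1QYW", 1, "Adione", "NADP+"),
    ("1QYX", 1, "delta4-dione", "NADP+"),
    ("1XF0", 5, "delta4-dione", "NADP+"),
]


def entries_by_type(rows):
    """Group PDB entry codes by 17betaHSD/KSR type."""
    grouped = defaultdict(list)
    for code, enzyme_type, _ligand, _cofactor in rows:
        grouped[enzyme_type].append(code)
    return dict(grouped)


def ternary_complexes(rows):
    """Return entries for which both a ligand and a cofactor are listed."""
    return [code for code, _type, ligand, cofactor in rows if ligand and cofactor]


by_type = entries_by_type(TABLE_5)
print({enzyme_type: len(codes) for enzyme_type, codes in by_type.items()})
print(ternary_complexes(TABLE_5))
```

Such a filter makes it easy to pick out, for instance, only those structures in which both binding sites are occupied according to the table.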

4.4.3.3 Overall description of the 17βHSD/KSR (type 1) enzyme structure

The human type 1 enzyme is active as a homodimer, each identical subunit having 327 residues and a relative molecular mass (Mr) of 34900. It belongs to the SDR superfamily, with the highly conserved and catalytically crucial Tyr155-X-X-X-Lys159 sequence in the active site as well as the generally conserved Ser142 residue. The catalytic site, also known as the active site, covers a ligand-binding domain (LBD) and a cofactor-binding domain (CBD). The enzyme requires an NADP+/NADPH or NAD+/NADH cofactor to transform the substrate.70

The structure contains seven β-strands forming a parallel β-sheet and eleven α-helices. The core of the structure is the seven-stranded parallel β-sheet (βA to βG), surrounded by six parallel α-helices (αB-αG), three on each side of the β-sheet. The basic fold of the segment βA to βF is a doubly wound α/β motif, with alternating β-strands and α-helices. The β-sheet (βA to βF) segment is a so-called classic “Rossmann-fold”, associated with nicotinamide adenine dinucleotide binding. Two additional helices αG’’ (200-206) and αG’ (209-229) create a helix-turn-helix (HTH) motif preceding the helix αG. Helices αH’ (260-266) and αH (273-284) form another HTH motif. Together these HTH segments of the polypeptide chain, containing helices αG’’, αG’, αH’ and αH, form one end of the ligand-binding cleft (Figure 33).56,70 These helices also restrict access to the active site and influence substrate specificity. The β-strands (βD to βG), in addition to being partly in the Rossmann-fold, govern quaternary association and substrate binding.

Figure 33. The overall structure of 17βHSD/KSR type 1. The β-sheets and α-helices are named and C- and N-terminal labelled. Graphical displays generated with the MacroModel version 5.5.77

Although the enzyme is active as a homodimer, most of the structures are shown as monomers. Only enzyme type 1 complexed with EQU 2 forms a homodimer structure (see Figure 34, page 81). Dimer formation is dictated mainly by hydrophobic interactions. This conclusion is supported by the fact that 4 leucines and 4 valines from αE (Leu102, 11, 122 and 126 and Val107, 110, 115 and 119) and 5 leucines, 2 valines and 2 phenylalanines from αF (Leu162, 165, 169, 172 and 173, Val154 and 178 and Phe151 and 176) are present at the dimer interface (see amino acid residues in Appendix 1). In addition, there are charge and/or polar-group interactions between the two subunits: αE has four interactions between subunits (Glu100 Oε2-Lys130 NZ, Glu104 Oε1-Gln123 Nε2, Oε1 and –Gln123 Oε1, and Glu104 Oε2-Arg120 Nη1) and αF has two (Leu149 N-Ser168 Oγ and Asp153 Oδ1-Leu169 N). EQU 2 and NADP+ 6 have well-defined electron density in the A subunit of the dimeric structure (Figure 34).

Figure 34. The 17βHSD/KSR type 1 complexed with EQU 2 (pink) and NADP+ 6 (green) is a homodimer structure. The electron density of subunit A (left side) is good and both ligand and cofactor are shown unlike subunit B (right side) where only the cofactor is shown. The shape of the yellow coil indicates the difference between the crystal structure without ligand and complexed with ligand (substrate entry path). Graphical displays generated with the WebLabViewerPro 3.7 program, at present known as DS ViewerPro 5.0.219

4.4.3.4 Ligand-binding domain and the interactions

The type 1 LBD is a narrow hydrophobic tunnel (Figure 35, page 82). It is highly complementary to the substrates resembling hormones both in structure and in volume. A. Azzi et al. concluded that E2 1 binds in only one orientation inside the tunnel,214 but a later crystal structure study has shown that the steroid is capable of moving slightly in its pocket and that the steroid-binding site may not be fully occupied.70 The amino acid residues that interact with the substrate depend on the structural features of the ligand. Also, different molecular modelling programs give diverse lists of interactions depending on the limit values used to determine interactions; for example, the interactions identified by the SPROUT program’s boundary surface differ slightly from those published earlier and given below (see section 5.2, page 104).

Figure 35. The steroid molecule lies within the ligand-binding site, surrounded by an electrostatic potential surface generated with the WebLabViewerPro 3.7 program.

Hydrophobic atoms (carbon) contribute 76% of the total buried surface (251 Å² of 332 Å²) of the binding pocket. The hydrophobic interactions are expected to provide most of the favourable interactions for binding the ligand. The residues covering the binding pocket and making hydrophobic interactions in the case of E2 1 include Val143, Met147, Leu149, Pro187, Tyr218, Val225, Phe226, Phe259 and Met279 (Figure 36), most of them involving rings A, B and C and the methyl group C18 of E2 1. In the case of EQU 2 the hydrophobic surface is formed by the same hydrophobic residues in addition to Pro150, Asn152, Leu262 and Leu263. For androgen substrates like DHT 3 and DHEA 4 the hydrophobic interactions seem to be the same as in the case of estrogens, with the addition of Gly186, Met193 and Ser222 (Figure 36). The hydrophobic portion of the binding site interacts mostly with the A and B rings of the steroid.
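The percentage quoted above follows directly from the two surface areas. As a minimal sketch (the function name `hydrophobic_fraction` is a hypothetical helper, and the two areas are simply the values quoted in the text), the arithmetic can be checked as follows:

```python
def hydrophobic_fraction(hydrophobic_area, total_area):
    """Return the fraction of the buried surface contributed by hydrophobic atoms."""
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return hydrophobic_area / total_area


# Areas in square angstroms, as quoted in the text for the type 1 binding pocket.
fraction = hydrophobic_fraction(251.0, 332.0)
print(f"Hydrophobic share of buried surface: {fraction:.1%}")
```

Rounded to the nearest whole per cent, this reproduces the 76% given in the text.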

Figure 36. Hydrophobic amino acid residues lining the ligand-binding pocket are shown: 17βHSD/KSR type 1 - E2 1 complex includes amino acid residues labelled green, EQU 2 includes residues labelled both green and pink and DHT 3/DHEA 4 includes all amino acid residues shown (pink, green and blue).

Apart from the hydrophobic surface, there are two polar regions located at opposite sides of the LBD. One end is formed by the essential Tyr155, Ser142 and Lys159 residues and the other end by the important His221 and Glu282 residues. Steroid structures are bound to the ligand-binding site so that the 3-hydroxyl or keto group on the A-ring makes hydrogen bond interactions with the Nε2 atom of His221 (3.1 Å) and the Oε2 atom of Glu282 (2.7 Å) at the recognition end of the cleft. These hydrogen bonds play an important role in correctly positioning the ligand in the binding cavity. Simultaneously the 17-hydroxy or keto group on the D-ring forms hydrogen bonds with the hydroxyls of the conserved Tyr155 (3.5 Å) and Ser142 (3.1 Å) at the catalytic end of the ligand-binding cleft (Figure 37 a). At this end of the LBD the biochemical reduction reaction takes place and Tyr155 is particularly important, since its interaction is crucial to the hydrogen transfer between the substrate and the cofactor. In the type 1 - E2 1 - NADP+ 6 complex structure, Tyr155, Ser142 and the oxygen atom at position 17 of the steroid form a triangular hydrogen-bonding arrangement, which might result in easier deprotonation of the tyrosine. Recently it has been shown that Lys159 and Asn114 play an important role in stabilising the cofactor and promoting proton transfer. All amino acid residues surrounding the ligand (E2 1) are shown in Figure 37 b.

Figure 37. a) Atoms of the amino acid residues that make hydrogen bond interactions with the steroid structure (E2) are labelled red. b) Ligand and hydrogen bond interactions (same as in Figure 37 a) with the hydrophobic amino acid residues (same as in Figure 36, page 82) are labelled with the same colours. Here E2 is twisted approximately 40 degrees anticlockwise compared to Figure 36 (page 82).


Site-directed mutagenesis of the conserved Tyr155 (to Ala) almost completely inactivated the enzyme. In addition, mutagenesis of His221 (to Ala) reduced the catalytic efficiency eleven-fold.220 Crystallisation studies with the mutants H221L and H221Q have shown the significance of His221 as well.40 Site-directed mutagenesis of Ser142 (to Ala) also leads to complete loss of all enzyme activities.113 However, substitution of Ser with Thr yields a mutant enzyme that is fully active.

The same four hydrogen bond interactions are present between amino acid residues and ligand in the 17βHSD/KSR type 1 enzyme - EQU 2 - NADP+ 6 crystal structure complex. The lengths of the hydrogen bonds are slightly different: the 3-hydroxyl group to His221 is 2.9 Å and to Glu282 is 2.9 Å, the 17-keto group to Tyr155 is 2.7 Å and to Ser142 is 2.8 Å. The polypeptide chain (residues 186-201) restricts access to the active site by moving towards the catalytic cleft and thereby closing the substrate entry path. In this way Phe192 and Met193 from the entry loop (186-201) form novel van der Waals contacts (3.9 Å and 4.2 Å, respectively) with the D ring of the ligand molecule (Figure 38 a and b, see also Figure 34, page 81, yellow coil).
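The hydrogen bond lengths quoted for the E2 1 and EQU 2 complexes can be compared side by side. The short Python sketch below is illustrative only: the distances are the values given in the text, while the dictionary layout and the names `E2_BONDS`, `EQU_BONDS` and `bond_length_changes` are assumptions made for this example.

```python
# Hydrogen bond lengths in angstroms, as quoted in the text.
E2_BONDS = {"His221": 3.1, "Glu282": 2.7, "Tyr155": 3.5, "Ser142": 3.1}
EQU_BONDS = {"His221": 2.9, "Glu282": 2.9, "Tyr155": 2.7, "Ser142": 2.8}


def bond_length_changes(reference, other):
    """Return the change in bond length (other - reference) for each shared residue."""
    return {
        residue: round(other[residue] - reference[residue], 1)
        for residue in reference
        if residue in other
    }


changes = bond_length_changes(E2_BONDS, EQU_BONDS)
largest = max(changes, key=lambda residue: abs(changes[residue]))

for residue, delta in changes.items():
    print(f"{residue}: {delta:+.1f} A")
print(f"Largest change: {largest}")
```

With these values the largest difference between the two complexes is found for the hydrogen bond to Tyr155 at the catalytic end of the cleft.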

Figure 38. a) Unit B of the EQU 2 crystal structure complexed with cofactor (green) and b) unit A complexed with ligand (pink) and cofactor (green) illustrate the different substrate entry loop conformations caused by the interaction between the ligand and the Phe192 (orange) and Met193 (blue) residues.


DHT 3 and DHEA 4 bind to the ligand-binding site like E2 1. The hydrogen bond interactions are still the same four connections seen for E2 1 and EQU 2. The only difference is that the O-3 atom in DHT/DHEA forms a strong hydrogen bond with Nε of His221, which is important for orientating the bound ligand, whereas the bond with Oε of Glu282 is relatively weak. The slightly different positions of DHT 3 and the nearly 20° rotated DHEA 4 core (Figure 39 a) result in reduced interactions of these steroids with several important non-polar residues in the binding pocket, although they have new interactions with other polar residues (see Figure 36, page 82). Due to the different conformation DHT 3 forms stronger interactions with Val283 and Tyr218 than DHEA 4, while DHEA 4 makes stronger interactions with Glu282 and a stronger hydrogen bond with Nε of His221. In all structures of DHT 3, DHEA 4 and E2 1 the C-10 is in precise alignment with the fork-like side chain of Leu149 (Figure 39 b). Because there is a β-methyl C-19 attached to C-10 of DHT 3 and DHEA 4, these steroids apparently cannot remain at the same position due to steric hindrance. The interactions between C-19 and the hydrophobic side chains around Leu149 appear to be the most important for discriminating C-19 steroids and for preventing them from binding in an ideal position, which seems to be the crucial role of Leu149 in the substrate recognition of different 17βHSD/KSR types.42

Figure 39. a) DHT 3 (pink), DHEA 4 (green) and T 7 (yellow) alignment show slightly rotated DHEA and even more rotated T conformations. b) DHT, DHEA and T crystal structure alignment with important amino acid residues (Leu149, Tyr218, Val283, Glu282 and His221).


Unlike the above androgens, the T 7, Adione 20 and ∆4-dione 11215 ligands bind into the type 1 active site in either the direct or the reverse orientation (Figure 39 a, page 85). This alternative binding orientation enables His221 to interact with O-17 as a replacement for the catalytic amino acid residues (Tyr155 and Ser142). In the case of T 7, Glu282 forms a hydrogen bond with O-17, and in the case of Adione 20 the bonding occurs through a water molecule, whereas ∆4-dione 11 lacks this interaction. In addition, the A-ring and O-3 are too distant to interact with Tyr155 or Ser142 because the structure is twisted and moved away from the cofactor-binding site compared to E2 1 (Figure 39 b, page 85). As a replacement for the Tyr and Ser interactions, O-3 of ∆4-dione 11 interacts with Val188 via a water molecule. The importance of Leu149 for androgen recognition and substrate specificity has been confirmed by these studies.42,215

The ligand-binding pocket appears to have three regions. The first region recognises the phenolic A-ring of the steroid and contains the conserved His221 and Glu282, which make hydrogen bonds to the oxygen atom at position 3. The second region binds to the central hydrophobic core of the steroid. The third region (catalytic) surrounds the D-ring and contains the absolutely conserved catalytic residue Tyr155, which is located on the β-face of the steroid and forms a hydrogen bond with the oxygen at position 17, while the α-face is accessible to the nicotinamide to facilitate hydride transfer. Recently, it has been shown in 3β/17β-HSD studies105,106 that most of the SDR enzymes form a catalytic tetrad of Asn-Ser-Tyr-Lys residues instead of a triad. In this tetrad Tyr acts as the catalytic base, whereas Ser stabilises the substrate and Lys forms hydrogen bonds with the nicotinamide ribose moiety and lowers the pKa of the Tyr-OH to promote proton transfer. Filling et al. conclude that Asn is important to stabilise the position of Lys. This includes water molecule contacts between the Asn and Lys residues and essential ribose contacts with Tyr and Lys (Figure 40, page 87).


Figure 40. The connections made by the important tetrad (Asn114-Ser142-Tyr155-Lys159) applied to 17βHSD/KSR type 1 enzyme.

4.4.3.5 Cofactor-binding site

NAD+ and NADP+ 6 bind in an extended conformation with the nicotinamide moiety pointing towards the active site of the enzyme. The nicotinamide ring faces the D ring of the steroid (Figure 41), the distance between the nicotinamide C-4 atom (hydride donor) and E2 1 C-17 being 3.62 Å.

Figure 41. NADP+ 6 lies in an extended conformation within the cofactor-binding site (represented as an electrostatic potential surface). The nicotinamide ring points towards the D ring of the steroid structure.

The nicotinamide ring is in a syn conformation while the adenine ring is in an anti conformation. Stabilisation of the nicotinamide nucleoside moiety is achieved through eleven hydrogen bonds and two extra hydrophobic interactions (Figure 42, page 88). In addition, three buried water molecules are hydrogen-bonded to the dinucleotide. Lys159 takes part in NADP+ 6 stabilisation by establishing hydrogen bonds with O2’ and O3’ of the nicotinamide ribose. The cofactor has the same extended conformation (12.5 Å between rings) in the EQU 2 crystal structure. In addition to interacting with EQU 2, Phe192 (substrate entry loop) interacts with the 2’ phosphate in NADP+. Hydrogen bond and hydrophobic interactions are slightly different depending on which crystal structure complex is in question.

Figure 42. a) The cofactor (from the E2 1 complex) forms hydrogen bond interactions with eleven different amino acid residues (green), b) two hydrophobic interactions with two residues (lilac) and three contacts with water molecules (blue).

4.4.3.6 Reduction mechanism

The catalytic mechanism of the enzyme is supposed to proceed through a direct transfer of a hydride ion from the nicotinamide nucleotide to the substrate’s acceptor carbonyl. The reaction involves the electrophilic attack of the Tyr155 Oη proton on estradiol O-17, and the nucleophilic attack of the nicotinamide 4-Pro-S hydride on estradiol C-17 (Scheme 13, page 89). Both Tyr155 and Ser142 could donate a proton to O-17. However, due to its lower pKa, the tyrosine residue seems to be the best candidate. A water molecule and Lys159 participate in a proton transfer chain, lowering the Tyr-OH pKa and facilitating deprotonation of Tyr155. The Lys159 side-chain further forms a bifurcated hydrogen bond to the nicotinamide ribose and thus stabilises coenzyme binding.

As mentioned earlier (in section 4.4.1, page 63), all SDR members have similar overall structures including the Rossmann-fold structure, catalytically important residues (Asn-Ser-Tyr-Lys), and a reduction/oxidation mechanism with NAD(P)H cofactor. This has also been proposed for other 17βHSD/KSR family members, on the basis of crystal structures that are available from the Protein Data Bank (for type 4 and type 10) and crystallisation studies that are not yet published in PDB (for type 7).190 A crystal structure study for type 5178 has also shown similarity to the AKR family.

Scheme 13. Reduction mechanism of estrone (E1 5 to E2 1) by 17βHSD/KSR type 1.

4.4.3.7 Inhibition studies of 17βHSD/KSR enzymes

Several inhibition studies have been carried out for 17βHSD/KSR enzymes.69,90 Most of the studies have been done for type 1, partly because the active site is quite well known and partly because of the therapeutic interest in inhibiting this enzyme. In addition, inhibitors of types 2, 3 and 5 have been examined. The rest of the family have been identified only recently, and therefore there is no published information on inhibitor studies. Moreover, types 6 and 9 are not present in humans and types 4, 8 and 10 are responsible for the oxidation of active steroids to inactive steroids. This research group has published a patent on inhibitors for the type 1, type 2 and type 3 enzymes.221


Naturally occurring steroids, synthetic steroids and non-steroidal compounds have been used to test the activities of 17βHSD/KSR enzymes. These different analogues of the steroid ligands and substrate specificity studies have helped to identify and define the enzymes’ active sites. Inhibitors have been developed for both ligand-binding and cofactor-binding sites.

Type 1 inhibitor studies cover mainly diverse steroid analogues substituted at different positions on the steroid skeleton, a range of phytoestrogens, some estradiol-adenosine hybrid studies and structure-based approaches. Dual action and alkylating inhibitors as well as affinity labelling agents have been made by substituting the steroid skeleton with halogen or other derivatives such as alkyl amide or thia-alkanamide side chains at positions 2, 3, 4, 6, 7, 11, 12, 15, 16 or 17.54,90,222,223,224,225 α,β-Unsaturated alcohols and ketones at positions 17 and 20 (suicide inhibitors) have also been examined.69 Studies have also been carried out on steroid skeletons substituted with fatty acids, diketones, epoxides, fused pyrazoles and isoxazoles.90 Phytoestrogens such as flavonoids226,227,228,229,230 and chalcones231 as well as progestins232 seem to be potent inhibitors of type 1. Inhibitors targeted to the cofactor-binding site seem to be effective in preventing enzyme activity.233 In addition, some studies have shown that E2-adenosine hybrid compounds strongly inhibit this enzyme.90,234,235

Steroidal lactones (C-18) and their analogues have been shown to be good inhibitors of type 2.158,236 C-19 steroidal lactones are also potential inhibitors, especially those with C-7 substituted thioalkyl and thioaryl side chains.157,159 The inhibition activity of the phytoestrogens,226 various steroids,128 retinoids and olive oil components90 has also been examined. Since it is now known that type 2 prefers the oxidation reaction and is involved in the degradation of active estrogens and androgens, the inhibition of type 2 is not desirable in the therapy of hormone-sensitive cancers.

Inhibition of type 3 enzyme has been examined with a series of ADT 22 3β-substituted derivatives163,237 as well as other steroid-based compounds such as fused 2,3- and 3,4-pyrazoles and isoxazoles. Phytoestrogens and other non-steroidal compounds, for example benzoquinones, have also been tested successfully as type 3 inhibitors.238 Some metal salts, commercially available drugs and glycyrrhetinic acid have been reported to inhibit type 3.90


In contrast, few inhibitor studies have been carried out for type 5 enzyme.229 Some steroid based compounds substituted with the carboxylate (at 3α or 17β) or the spiro-oxiranyl (at 3α, 17β or 20α) or fused pyrazole (3,4 or 16,17) group have been investigated.184 Some of the phytoestrogens have given promising results to inhibit both the reduction (∆4-dione to T) and the oxidation (Adiol to ADT) reactions of type 5.185 Increasing hydroxylation of the phytoestrogens (flavones) seems to increase inhibitor activity in both types of conversions.

4.4.4 Crystallisation studies of estrogen receptor α and β

The structures of the estrogen receptor (ER) have been studied in this research because it is not desirable that new inhibitors bind to the ER. The binding site of the receptor is a hydrophobic pocket; therefore small lipophilic substrate molecules such as estrogens activate it. ER belongs to the nuclear receptor (NR) family and mediates the physiological effects of estrogens. The ER without ligand is associated with a large multi-protein complex in a conformation that enables ligand binding. After diffusing across the cell membrane E2 1 binds to its receptor, and the ligand-bound receptor dissociates from the multi-protein complex and binds as a dimer to the estrogen response element (ERE). This ER dimer interacts with a specific sequence of DNA nucleotides, known as a transcriptional enhancer region, and enhancement or repression of transcription occurs.53 However, the NR activation mechanism is not well known.

Recent studies have revealed the existence of more than one distinct estrogen receptor in our bodies. ERα239 and ERβ240 are currently rather well known, since several crystal structure studies have been published for both receptors (see Table 6, page 93). Studies have been carried out to find further estrogen receptors and also other estrogen-related activation elements such as estrogen-related receptor γ (ERRγ).241 Estrogen-related receptors (ERRα, ERRβ and ERRγ) share significant amino acid homology with ERα and ERβ.242

Both ERα and ERβ bind E2 1 with the same high affinity and both activated complexes bind to ERE within the DNA. ERα and ERβ share modest overall sequence identity (47%). In contrast, the DNA-binding domain (DBD) and ligand-binding domain (LBD) have much higher conservation (94% and 59%, respectively). However, there are specific differences in regions of α and β that would be predicted to influence transcriptional activity and perform specific biological functions. The structural organisation of ERs consists of six functional regions (A-F) showing various degrees of sequence conservation. The N-terminal A/B domain contains a poorly conserved independent trans-activation function AF-1, whereas the D and F regions are not conserved at all. In contrast, regions C and E are highly conserved, C including the DNA-binding domain (DBD) and E the ligand-binding domain (LBD). D is a linker peptide between DBD and LBD, while F is a C-terminal extension region of the LBD (Figure 43).243
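Sequence identity figures such as those quoted above are obtained from pairwise alignments. The following Python sketch illustrates one common convention, assumed here for illustration: identical residues are counted over the aligned columns in which neither sequence has a gap. The function name `percent_identity` and the two short toy sequences are hypothetical and are not taken from the ERα or ERβ sequences.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (invented for this example only).
toy_a = "ACDEFGHIK"
toy_b = "ACDQFGH-K"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions, for example dividing by the length of the shorter sequence, give somewhat different numbers, so identity values should be compared only when the same definition has been used.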

Figure 43. Schematic representation of the functional domain organisation of ER (NH2-A/B-C-D-E-F-COOH). A/B: trans-activation (AF-1); C: DNA binding, dimerisation; E: ligand binding, dimerisation, trans-activation (AF-2).

The ER substrate (ligand) binding cleft resembles the 17βHSD/KSR type 1 ligand-binding cleft. The estrogen receptor’s ligand-binding cleft is also highly hydrophobic and includes two polar regions located at opposite sides of the binding cleft. These polar amino acid residues are involved in the binding of the E2 1 hydroxyl groups similarly to the type 1 binding site. Both 17βHSD/KSR type 1 and the estrogen receptors are capable of binding molecules that have functional groups about 11-12 Å apart. In the case of ERα the phenolic hydroxyl group of the A-ring is hydrogen bonded to Glu353, Arg394 and a water molecule. The hydroxyl group of the D-ring forms a single hydrogen bond with His524. ERβ has the same important amino acid residues (Glu305, Arg346 and His475) in the ligand-binding cleft, forming hydrogen bonds to hydroxyl groups in close proximity to these regions. Because of these similar binding conditions of the ERs and type 1, a novel inhibitor should include interactions that are selective for the type 1 active site.
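The 11-12 Å separation of the two polar functional groups can be used as a simple geometric filter for candidate molecules. The sketch below shows only the distance check itself; the coordinates are invented placeholder values rather than atoms from any crystal structure, and the names `separation` and `within_window` are hypothetical helpers.

```python
import math


def separation(atom_a, atom_b):
    """Euclidean distance (in angstroms) between two 3-D coordinates."""
    return math.dist(atom_a, atom_b)


def within_window(distance, low=11.0, high=12.0):
    """True if the distance falls within the 11-12 angstrom window from the text."""
    return low <= distance <= high


# Placeholder coordinates for two polar oxygen atoms (hypothetical values).
oxygen_3 = (0.0, 0.0, 0.0)
oxygen_17 = (10.5, 3.0, 1.5)

distance = separation(oxygen_3, oxygen_17)
print(f"O-3 to O-17 separation: {distance:.2f} A, fits window: {within_window(distance)}")
```

Because both the receptors and the type 1 enzyme accept this separation, such a check alone cannot distinguish them; it only screens out molecules that fit neither site.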

In the Brookhaven Protein Data Bank (PDB)4 there is crystal structure information available for twenty-four different ER-ligand complexes (Table 6, page 93) and also for two isomerase and three DNA-binding domain complex studies. Nine of these coordinate files contain information on ERα complexed with different ligands,244,245,246,247,248,249,250 five include crystal structure information on ERβ complexed with diverse ligands248,251,252,253 and five include information regarding the ERRγ structure.241,242 Three of the crystal structure information files are for DBD complexes,254,255 five are not specific to a particular receptor,256,257,258 and two are isomerase studies. Recently, six new crystal structures have been released in the Brookhaven Protein Data Bank (1XQC, 1XB7, 1XPC, 1XP9, 1XP6 and 1XP1).4

Table 6. Crystal structure information is available for estrogen receptors from the Brookhaven Protein Data Bank.

Entry code  Target            Ligand                             Classification                        Number of residues  Release date  Ref.
1ERE        ER LBD            E2 1                               Nuclear Receptor (NR)                 1518                16.9.1998     257
1ERR        ER LBD            Raloxifene                         NR                                    506                 16.9.1998     257
1A52        ERα LBD           E2 1                               Receptor                              516                 16.9.1998     245
3ERD        ERα LBD           Diethylstilbestrol                 NR                                    548                 8.4.1999      246
3ERT        ERα LBD           4-Hydroxytamoxifen                 NR                                    261                 8.4.1999      246
1QKM        ERβ LBD           Genistein                          NR                                    255                 28.7.2000     251
1QKN        ERβ LBD           Raloxifene                         NR                                    255                 28.7.2000     251
1QKT        Mutant NR LBD     E2 1                               NR                                    248                 18.8.2000     258
1QKU        Wild Type NR LBD  E2 1                               NR                                    750                 18.8.2000     258
1HJ1        ERβ LBD           Antagonist ICI164,384              NR                                    255                 4.1.2002      252
1G50        ERα LBD           E2 1                               DNA Binding Protein                   741                 6.2.2002      247
1L2I        ERα LBD           Diethyltetrahydrochrysenediol      Transcription Receptor / Coactivator  548                 1.5.2002      248
1L2J        ERβ LBD           Diethyltetrahydrochrysenediol      Transcription Receptor (TR)           542                 1.5.2002      248
1GWQ        ERα LBD           Raloxifene                         NR                                    514                 29.8.2002     249
1GWR        ERα LBD           E2 1                               NR                                    508                 29.8.2002     249
1NDE        ERβ LBD           Triazine                           TR                                    255                 18.12.2002    253
1KV6        ERRγ LBD          -                                  Gene regulation                       490                 25.1.2003     241
1UOM        ERα LBD           Tetrahydroisochiolin               NR                                    254                 3.7.2003      250
1PCG        ER                E2 1                               TR                                    506                 28.10.2003    256
1SJ0        ERα LBD           Dihydrobenzoxathin derivative E4D  Growth Factor Receptor                248                 27.4.2004     244
1S9P        ERRγ LBD          Diethylstilbestrol                 TR                                    908                 8.6.2004      242
1S9Q        ERRγ LBD          4-Hydroxytamoxifen (CR1)*          TR                                    502                 8.6.2004      242
1VJB        ERRγ LBD          4-Hydroxytamoxifen (CR2)*          TR                                    502                 8.6.2004      242
1TFC        ERRγ LBD          -                                  TR                                    532                 27.7.2004     242

* CR1 = Crystal form 1, CR2 = Crystal form 2

5. RESULTS AND DISCUSSION

5.1 Development of SynSPROUT knowledge base

Reaction search databases, such as REaction ACCess System (REACCS)259 from MDL Ltd. and Organic Reactions Accessed by Computer (ORAC),260 provide chemists with easy access to the proven reaction technology and novel transformations reported in the literature. Another available computer program, LHASA (Logic and Heuristics Applied to Synthetic Analysis),261 can also be used to generate a synthetic plan by working backward from target compounds by retrosynthetic analysis. The novel SynSPROUT program builds synthetic constraints into the structure generation process by starting with a library of readily available starting materials. It uses retrosynthetic knowledge bases for automatic fragmentation of the structures. The program requires a knowledge base describing these synthetic reactions. This knowledge base is a user-editable text file, which contains chemical patterns describing functional groups and the synthetic joining rules corresponding to chemical reactions. In this case the knowledge base includes functional groups of azomethine ylide reactions, described using a line notation language called PATRAN.

5.1.1 1,3-Dipolar cycloaddition reactions

The 1,3-dipolar cycloaddition (1,3-DC) is a classic reaction in organic chemistry for the synthesis of five-membered rings. Heterocyclic ring compounds are prepared by addition of 1,3-dipolar compounds to alkenes.262 These reactions are among the best and most useful methods for the highly diastereoselective and enantioselective construction of five-membered rings. In particular the [3+2] cycloaddition reaction between azomethine ylides and alkenes is a direct route to substituted prolines, which are valuable substrates in synthetic organic chemistry, pharmacology and biology, and also to other structures with a pyrrolidine nucleus.263

The chemistry of the reaction has evolved for more than 100 years, and a variety of different 1,3-dipoles have been discovered. After the discovery of the first dipole, diazoacetic ester, in 1888,264 only a few dipoles found general application in synthesis; ozone and diazo compounds were the exceptions. The synthetic value of the Diels-Alder reaction became obvious soon after its discovery in 1928.265 Huisgen’s systematic studies of 1,3-dipoles in organic chemistry266 in the 1960s and the new concept of conservation of orbital symmetry by Woodward and Hoffmann267 facilitated the understanding of the mechanism of concerted cycloaddition reactions. In recent years the main issue has been the control of the stereochemistry of the 1,3-DC reactions. The selectivity challenge is to control the regio-, diastereo- and enantioselectivity of the 1,3-DC reaction.

The 1,3-dipoles have a sequence of three atoms, a-b-c, that reacts with a double bond (the dipolarophile) to form a five-membered ring. The 1,3-DC reactions with alkenes and alkynes involve 4π electrons from the dipole and 2π electrons from the dipolarophile.

[Scheme: a 1,3-dipole a-b-c adds across the C=C bond of a dipolarophile to give a five-membered ring.]

The 1,3-dipoles can be divided into two groups: the allyl anion type and the propargyl/allenyl anion type (Table 7, page 96). Four resonance structures can be drawn for the three-atom a-b-c representation of the allyl anion type. In two of them atoms a and c both have an electron octet, while in the other two either atom a or atom c has an electron sextet. The allyl type dipoles are bent, and the central atom b can be nitrogen, oxygen or sulfur.

[Scheme: octet structures (a and c), a=b(+)-c(-) and (-)a-b(+)=c; sextet structures (a and c), (+)a-b-c(-) and (-)a-b-c(+).]

The propargyl/allenyl anion type has an extra π orbital and the dipole is linear. The central atom b can only be nitrogen.

[Scheme: resonance structures of the propargyl/allenyl anion type, a≡b(+)-c(-) and (-)a=b(+)=c.]

The 1,3-dipoles are occasionally presented as a hypervalent structure.

[Scheme: hypervalent representations a=b=c and a≡b=c.]

The 1,3-dipoles usually involve elements from main groups IV, V and VI, mainly carbon, nitrogen and oxygen atoms. Occasionally sulfur and phosphorus can be incorporated in 1,3-dipoles as well.

Table 7. Classification of the 1,3-dipoles (atom sequence a-b-c).

Allyl anion type

Nitrogen in middle                Oxygen in middle
C-N-O   Nitrones                  C-O-O   Carbonyl oxides
C-N-N   Azomethine imines         C-O-N   Carbonyl imines
C-N-C   Azomethine ylides         C-O-C   Carbonyl ylides
N-N-N   Azimines                  N-O-N   Nitrosimines
N-N-O   Azoxy compounds           N-O-O   Nitrosoxides
O-N-O   Nitro compounds           O-O-O   Ozone

Propargyl-Allenyl type

Nitrilium betaines                Diazonium betaines
C-N-O   Nitrile oxides            N-N-O   Nitrous oxide
C-N-N   Nitrile imines            N-N-N   Azides
C-N-C   Nitrile ylides            N-N-C   Diazoalkanes

Two mechanisms have been proposed for the 1,3-DC reactions: a concerted and a diradical mechanism. The concerted mechanism, which proceeds through a single transition state, is favoured on the basis of the stereospecificity of the 1,3-DC reaction. However, it has also been shown that the 1,3-DC reaction can take place stepwise via an intermediate. In these cases the stereospecificity of the reaction may be lost.

5.1.1.1 Azomethine ylides

This group of 1,3-DC reactions belongs to the allyl anion type reactions in which nitrogen is in the middle. Azomethine ylides are unstable species which have to be prepared in situ. They react with the double bond of a dipolarophile to form pyrrolidine derivatives, which are the central skeletons of several alkaloids.268


[Scheme: resonance structures of an azomethine ylide bearing the substituents R, R' and R''.]

Azomethine ylides have proven to be extremely rich in their chemistry. The azomethine ylide addition reactions can take place with different dipolarophiles such as symmetrical and unsymmetrical alkenes and alkynes, carbonyls, thiocarbonyls, and imine, nitrile, nitroso and azo bonds. This wide range of dipolarophiles leads to a variety of mono- and polycyclic heterocycles.263,268,269,270 The main advantages of these reactions are the ready accessibility of the reactants, (usually) high yields, minimal competing side-reactions (with careful selection of solvent) and control of nearly all aspects of the stereo- and regiochemistry.

Azomethine ylides can be generated by a number of methods, of which the most general are the thermolysis or photolysis of aziridines, proton abstraction from imine derivatives of α-amino acids and dehydrohalogenation of immonium salts. In addition, azomethine ylides may be generated from pyridinium derivatives (such as pyridinium betaines and ylides) and from masked azomethine ylides.

• Both thermal and photochemical creation of azomethine ylides involve a conrotatory ring opening of cis- or trans-aziridine.

[Scheme: conrotatory ring opening of cis- and trans-aziridines to the corresponding azomethine ylides.]

It is also possible to generate azomethine ylides using bicyclic and tricyclic aziridines.

[Scheme: azomethine ylides generated from bicyclic and tricyclic aziridines.]

• Increased interest in peptide structures has stimulated research into α-amino acid derivatives as starting points for 1,3-DC reactions. This method generates azomethine ylides from α-amino acid esters.271,272 The imine derivative of an α-amino acid ester is in equilibrium with the azomethine ylide, which may be trapped by a variety of dipolarophiles (α-amino acid → imine → azomethine ylide → cycloaddition product).273 Cyclic secondary α-amino acids have also been examined.274,275

[Scheme: equilibrium between the imine of an α-amino acid methyl ester and the azomethine ylide.]

• In the logical route to azomethine ylides, proton abstraction from immonium salts with base, it is necessary to avoid both addition of the base to the α-carbon and abstraction of a β-proton with consequent formation of an enamine. A general route that achieves this is the dehydrohalogenation of immonium salts. Azomethine ylides derived from quaternary salts of aromatic isoquinolines retain the ability to undergo addition, even though the N=C bond is an integral part of the aromatic ring.

[Scheme: dehydrohalogenation of an immonium salt (bromide) to an azomethine ylide.]

• Pyridinium betaines are generated by base treatment of the halide salts. However, in the absence of dipolarophiles they have an obvious tendency to dimerise. Such dimers have shown synthetic value, since they undergo Diels-Alder addition on one side followed by thermal 1,3-dipolar cycloreversion. The overall reaction corresponds to the conversion of a 3-oxidopyridinium betaine into a 4-oxidoisoquinolinium.

[Scheme: dimerisation of a 3-oxidopyridinium betaine and its conversion into a 4-oxidoisoquinolinium.]

• Indolizine may be considered a masked azomethine ylide by virtue of its dipolar resonance form, and it does undergo 1,3-dipolar cycloadditions.

[Scheme: dipolar resonance form of indolizine.]

Mesoionic oxazoles, which resemble α-amino acid esters, are another group of masked azomethine ylides.

[Scheme: resonance structures of a mesoionic oxazole bearing the substituents R and R'.]

The 1,3-DC reactions with alkenes or alkynes involve 4π electrons from the dipole and 2π electrons from the dipolarophile. This means that the three pz orbitals of the 1,3-dipole and the two pz orbitals of the dipolarophile both combine suprafacially. The concerted 1,3-DC reactions are divided into three categories according to the relative frontier molecular orbital (FMO) energies of the dipole and the dipolarophile. Azomethine ylide reactions fall into the first category, in which the dominant FMO interaction takes place between the HOMO of the dipole and the LUMO of the dipolarophile. The second category includes both HOMO(dipole)-LUMO(dipolarophile) and LUMO(dipole)-HOMO(dipolarophile) interactions, because the FMO energies of the dipole and the dipolarophile are similar. Reactions in the third category are dominated by the interaction between the LUMO of the dipole and the HOMO of the dipolarophile.
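The three FMO categories can be illustrated with a small calculation. The following is a minimal sketch only, assuming that the interaction with the smaller HOMO-LUMO energy gap dominates and that two gaps count as "similar" when they differ by less than an arbitrary tolerance. The class name, the function name and the 0.5 eV tolerance are hypothetical and do not come from SPROUT or from the cited literature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontierOrbitals:
    """HOMO and LUMO energies of one reactant, in eV."""
    homo: float
    lumo: float


def classify_fmo_interaction(dipole, dipolarophile, tolerance_ev=0.5):
    """Return 1, 2 or 3 for the three FMO categories described in the text.

    tolerance_ev is an assumed threshold for calling the two gaps "similar".
    """
    # Gap for the HOMO(dipole)-LUMO(dipolarophile) interaction.
    gap_homo_dipole = dipolarophile.lumo - dipole.homo
    # Gap for the LUMO(dipole)-HOMO(dipolarophile) interaction.
    gap_lumo_dipole = dipole.lumo - dipolarophile.homo

    if abs(gap_homo_dipole - gap_lumo_dipole) <= tolerance_ev:
        return 2  # both interactions contribute
    # Otherwise the smaller gap is taken as the dominant interaction.
    return 1 if gap_homo_dipole < gap_lumo_dipole else 3


if __name__ == "__main__":
    # Purely illustrative energies, not measured or computed values.
    electron_rich_dipole = FrontierOrbitals(homo=-6.9, lumo=1.4)
    electron_poor_alkene = FrontierOrbitals(homo=-10.9, lumo=0.0)
    print(classify_fmo_interaction(electron_rich_dipole, electron_poor_alkene))
```

With an electron-rich dipole and an electron-poor dipolarophile, as in the illustrative input, the sketch reports the first category, which is the situation the text describes for azomethine ylides.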

The presence of metals can catalyse the 1,3-DC reactions. The metal can change (lower) the energy of the FMOs of both the dipole and the dipolarophile. Thus the coordination of a Lewis acid to the dipole or the dipolarophile is of great importance for asymmetric 1,3-DC reactions. A Lewis acid may also affect the selectivity of the 1,3-DC reactions, since regio-, diastereo- and enantioselectivity can be controlled by the presence of a metal-ligand complex.263,271

5.1.1.2 Stereochemistry of the 1,3-dipolar cycloaddition reactions

The most important characteristics of azomethine ylide reactions are those pertaining to chemoselectivity, stereochemistry, regiochemistry and reactivity. The stereochemistry of the 1,3-DC reaction can be controlled either by choosing appropriate substrates or by using a metal complex as a catalyst. With respect to both the dipole and the dipolarophile, the 1,3-DC reactions of azomethine ylides are characterised by stereospecificity. In general, 1,3-DC reactions display endo stereoselectivity similar to the Diels-Alder reaction and either regiospecificity or regioselectivity.


Concerted cycloaddition reactions are among the most powerful tools for the stereospecific creation of new chiral centres in organic molecules. When 1,2-disubstituted dipolarophiles are involved in concerted cycloaddition reactions with 1,3-dipoles, two new chiral centres can be formed in a stereospecific manner due to the syn attack of the dipole on the double bond. If the 1,3-dipole or the dipolarophile contains a chiral centre, the approach towards one of the faces of the substrate can be discriminated, leading to a diastereoselective reaction. The reaction is enantioselective if optically active products are obtained from achiral or racemic starting materials. The enantioselectivity can be controlled by choosing a chiral 1,3-dipole, a chiral alkene or a chiral catalyst.

The stereochemistry of the azomethine ylide reaction can be controlled by choosing either chiral azomethine ylides or chiral alkenes. Chiral metal catalysts for azomethine ylide reactions give chiral products.276 Asymmetric intramolecular 1,3-DC reactions of azomethine ylides with alkenes also produce chiral pyrrolidine derivatives.277 A number of 1,3-dipolar azomethine ylide reactions creating optically active products from chiral starting materials have been reviewed by K. Gothelf.

5.1.2 Azomethine ylide chemical patterns and joining rules

It was observed that pyrrolidine moieties might constitute a fundamental structural element in the sought-after 17β-HSD/KSR1 enzyme inhibitor candidates. Thus, a literature survey of azomethine ylide reactions was carried out and a new knowledge base for azomethine ylide reactions was created. The chemical label scripts were written for the most general azomethine ylides and for different dipolarophiles. The chemical pattern description includes the atom and bond types and the connections between atoms. The number of hydrogen atoms attached to an atom, its hybridization and the heteroatoms next to it can also be specified. The chemical label pattern should be as simple as possible, so as few features as possible should be used in the description. The general atom and bond features used for chemical labels are collected in Table 8 (page 101).


Table 8. Atom and bond features used for chemical label pattern description.

Atom feature   Explanation                                Possible values                             Example
HS             Number of hydrogens                        0, 1, 2, 3, 4                               [HS=1,2]
HETS           Number of heteroatoms (next to an atom)    0, 1, 2, 3, 4                               [HETS=1,2]
SPCENTRE       Hybridization level                        0, 1, 2, 3, 4                               [SPCENTRE=2]
ARYL           Whether an atom is aromatic                YES, NO, EITHER                             [ARYL=YES]
RINGS          Size of the ring the atom is part of       YES, NO, EITHER, 1, 2, 3, 4, 5, 6…          [RINGS=5,6]
CONNECTIONS    Number of bonds                            0, 1, 2, 3, 4                               [CONNECTIONS=2]
CHARGE         Charge of an atom                          NEUTRAL, CATION, ANION, RADICAL, YES        [CHARGE=ANION]
STEREO         Whether an atom is a stereocentre          YES, NO, EITHER                             [STEREO=YES]
HALOGENS       Number of halogens (adjacent to an atom)   0, 1, 2, 3, 4                               [HALOGENS=0]

Bond feature   Explanation                                Possible values                             Example
BOND           Quality of the bond                        - (single), = (double), # (triple),         C-C=C
                                                          & (any bond)
RINGS          Size of the ring the bond is part of       YES, NO, EITHER, 1, 2, 3, 4, 5, 6…          [RINGS=3,4]
FUSION         Whether a bond is fused                    YES, NO, EITHER                             [FUSION=YES]

The chemical label for the azomethine ylide is presented as follows:

[Scheme: azomethine ylide bearing the substituents R, R' and R''.]

CHEMICAL-LABEL
…STARTP
…C[SPCENTRE=3];[CHARGE=ANION];[HS=0,1];[CONNECTIONS=2,3]-
…N[SPCENTRE=2];[CHARGE=CATION];[HS=0,1];[CONNECTIONS=2,3]=
…C[SPCENTRE=2];[HS=0,1];[CONNECTIONS=2,3]
…ENDP
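To make the role of the atom features concrete, the atom lines of such a label can be read as an element symbol followed by bracketed FEATURE=value lists. The following is a minimal sketch of that reading, not the SPROUT parser. The function names and the dictionary representation of an atom are hypothetical, and special values such as EITHER or YES for RINGS are deliberately not interpreted.

```python
import re

# One bracketed feature, e.g. "[HS=0,1]" -> ("HS", "0,1").
_FEATURE = re.compile(r"\[([A-Z]+)=([^\]]+)\]")


def parse_atom_pattern(text):
    """Split e.g. 'C[HS=0,1];[CONNECTIONS=2,3]-' into ('C', {feature: allowed values}).

    A trailing bond symbol (-, =, #, &) is ignored by this sketch.
    """
    match = re.match(r"[A-Z][a-z]?", text)
    if match is None:
        raise ValueError(f"no element symbol in {text!r}")
    element = match.group(0)
    features = {name: set(values.split(","))
                for name, values in _FEATURE.findall(text)}
    return element, features


def atom_matches(pattern, atom):
    """Check a hypothetical atom description (a dict) against a parsed pattern."""
    element, features = pattern
    if atom.get("ELEMENT") != element:
        return False
    # Every feature named in the pattern must take one of the allowed values.
    return all(str(atom.get(name)) in allowed
               for name, allowed in features.items())


if __name__ == "__main__":
    anion_carbon = parse_atom_pattern(
        "C[SPCENTRE=3];[CHARGE=ANION];[HS=0,1];[CONNECTIONS=2,3]-")
    candidate = {"ELEMENT": "C", "SPCENTRE": 3, "CHARGE": "ANION",
                 "HS": 1, "CONNECTIONS": 3}
    print(atom_matches(anion_carbon, candidate))
```

Keeping the allowed values as sets mirrors the comma-separated alternatives in Table 8, so a pattern such as [HS=0,1] accepts an atom with either zero or one hydrogen.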


A set of chemical label descriptions was also written for various dipolarophiles.

• The chemical labels for alkenes include a general description for acyclic symmetrical (cis and trans) and unsymmetrical molecules, as well as for cyclic alkenes such as cyclobutenes, maleic anhydrides, cyclopropenone and cyclopropenethione.

• Alkynes include acyclic symmetrical and unsymmetrical descriptions and also a benzyne description.

• The chemical labels for carbonyl bonds include descriptions for structures such as aldehydes, ketones and ketenes.

• Structures including thiocarbonyl bonds, such as isothiocyanates and carbon disulfide, are described.

• Chemical labels have also been written for imine, nitrile, nitroso and azo bonds.

The chemical labels for alkyne, carbonyl and carbon disulfide are described as follows:

CHEMICAL-LABEL
…STARTP
…C[SPCENTRE=1];[CONNECTIONS=2]#C[SPCENTRE=1];[CONNECTIONS=2]
…ENDP

CHEMICAL-LABEL
…STARTP
…C[SPCENTRE=2];[HETS=1]=O
…ENDP

CHEMICAL-LABEL
…STARTP
…C[CONNECTIONS=2];[HS=0](=S)(=S)
…ENDP

The joining rule scripts have also been written using rules similar to those defined in the PATRAN language for acyclic reactions (see section 3.3.1.1, page 51). However, this knowledge base does not work with the current PATRAN language because commands for ring formation are lacking.

Azomethine ylide reactions produce five-membered rings, which are typically based on pyrrolidine. In the azomethine ylide reactions two bonds are formed simultaneously. With the current PATRAN language it is not possible to define the five-membered ring that is formed in these reactions. New command lines need to be defined for cyclic reactions in the PATRAN language before the joining rules can function. Creating the new commands involves searching for all possible five-membered rings, together with their bond length and angle information, in the Cambridge Database. Once the five-membered rings (such as pyrrolidine, pyrroline, 3-pyrroline, imidazolidine, oxazoline and thiazolidine) have been defined and the ring formation commands have been programmed into the PATRAN language, it will be possible to write joining rules for reactions producing a pyrrolidine ring.

Azomethine ylide formation is the first step on the reaction path leading to a pyrrolidine ring (Scheme 14, Step 1). The azomethine ylide then reacts with a dipolarophile so that two azomethine ylide atoms (1 and 3) react with the two double bond atoms (5 and 4), forming a five-membered ring (Scheme 14, Step 2). The new knowledge base should recognise this situation and replace bonds 1, 2, 3, 4 and 5 with the pyrrolidine ring bond length and angle information.

[Scheme 14 residue: Step 1, imine formation; Step 2, cycloaddition with atoms numbered 1 to 5. Annotations: "Dihedral angle 0°, 180°" and "Replace bonds 1, 2, 3, 4 and 5 by pyrrolidine ring".]

Scheme 14. Step 1 shows the reaction between formaldehyde and glycinaldehyde forming methyleneamino acetaldehyde. In Step 2 the azomethine ylide formed in Step 1 reacts with an alkene (in this case 2-propenal) to form a pyrrolidine derivative.
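The bookkeeping that a ring-forming joining rule would need can be sketched on a plain bond list. This is an illustration only, assuming the atom numbering of Scheme 14 (ylide atoms 1-2-3, dipolarophile atoms 4-5, new bonds 1-5 and 3-4). The function names are hypothetical, and nothing here is part of PATRAN or SynSPROUT.

```python
from collections import deque


def build_adjacency(bonds, new_bonds=()):
    """Return {atom: set of neighbours} for the existing and newly formed bonds."""
    adjacency = {}
    for a, b in list(bonds) + list(new_bonds):
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def smallest_ring_with_bond(adjacency, a, b):
    """Atoms of the smallest ring containing bond a-b, or None if there is none.

    Breadth-first search from a to b that is not allowed to use the bond a-b itself.
    """
    previous = {a: None}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if {current, neighbour} == {a, b} or neighbour in previous:
                continue
            previous[neighbour] = current
            if neighbour == b:
                # Walk back from b to a to recover the ring atoms in order.
                path = [b]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


if __name__ == "__main__":
    ylide_and_alkene = [(1, 2), (2, 3), (4, 5)]
    cycloadduct = build_adjacency(ylide_and_alkene, new_bonds=[(1, 5), (3, 4)])
    ring = smallest_ring_with_bond(cycloadduct, 1, 5)
    print(ring, len(ring))
```

A joining rule of this kind would only accept the product when both new bonds close the same five-membered ring, after which the ring geometry could be taken from a stored pyrrolidine template as the text proposes.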


5.2 Inhibitor design for 17βHSD/KSR1

A good inhibitor of the 17βHSD/KSR type 1 enzyme must form multiple contacts with the residues of the steroid binding pocket in order to strengthen ligand binding. These contacts include hydrogen bonds to catalytically essential amino acid residues and hydrophobic interactions with the hydrophobic core of the binding site. In theory, a new ligand should also mimic the planar shape of steroid molecules in order to be complementary to the shape of the active site.

Selected crystal structure complexes have been analysed using the SPROUT program (section 5.2.1). In the next stage novel inhibitor molecules have been generated for the crystal structure active site (see section 5.2.2, page 119). Since the conformations generated are not necessarily the most stable or at an energy minimum, energy optimisation simulations and structure modifications have been performed for the selected molecules (see section 5.2.3, page 136). The selected modified molecules have been evaluated by in silico docking simulations into the original active site (reasonable activity) and into the ERα and ERβ active sites (no activity, see section 5.2.4, page 152). An example of a retrosynthesis and synthetic plan is given (see section 5.2.5, page 156), and an example of a CAESA retrosynthesis is also presented (see section 5.2.6, page 158).

5.2.1 Crystal structure selection

The design work started with the selection of suitable crystal structures as pdb-files. Four different ligand complexes were chosen from the Protein Data Bank (PDB) for the inhibitor design of the 17βHSD/KSR type 1 enzyme. At the time of selection these four crystal structures were the latest of the eleven available 17βHSD/KSR type 1 enzyme - ligand complexes. The first choice was the most recent type 1 complex with E2 1 and cofactor 6 (entry code 1A27). The second was an enzyme complex with EQU 2 and NADP+ 6 (1EQU), chosen because the coordinates included some new information regarding the substrate entry path. In addition, two crystal structure complexes with the androgen ligands 3 (1DHT) and 4 (3DHE) were chosen because these ligands differ from the estrogen ligands and the complexes lack the cofactor. The ERα (1L2I) and ERβ (1L2J) crystal structures were also selected, not for structure generation but for later examination, to make sure that the newly generated molecules would be specific for 17βHSD/KSR type 1.


The programs used in this study run on Silicon Graphics platforms (Indigo2000, O2 and Octane) under either the Unix or the Linux operating system. All crystal structures were loaded into the SPROUT program and analysed for inhibitor generation. In the CANGAROO module the ligand and the receptor were separated and saved into two different pdb-files, cavity and receptor. It was not necessary to use the whole receptor for modelling, since a region 15 Å in diameter around the ligand included sufficient amino acid residues. The ‘Grid Box Margin Tolerance’ used in HIPPO was 4.0 Å for all boundaries analysed. Other options in the HIPPO module were left at their default values.
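The separation of a complex into a ligand and a truncated receptor can be illustrated outside SPROUT with a few lines of code. The sketch below is not the CANGAROO implementation. It assumes standard fixed-column PDB ATOM/HETATM records, selects whole residues that have at least one atom within a cutoff distance of any ligand atom, and treats the cutoff as a free parameter, because the text gives only a 15 Å diameter for the region. The function names are hypothetical; the residue name EST for estradiol follows the label used with Table 9.

```python
import math


def parse_atoms(pdb_text):
    """Yield ((chain, residue number, residue name), (x, y, z)) per ATOM/HETATM line."""
    for line in pdb_text.splitlines():
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue
        residue_id = (line[21], int(line[22:26]), line[17:20].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield residue_id, xyz


def residues_near_ligand(pdb_text, ligand_name, cutoff):
    """Return sorted residue ids having any atom within `cutoff` Å of the ligand."""
    atoms = list(parse_atoms(pdb_text))
    ligand_xyz = [xyz for res, xyz in atoms if res[2] == ligand_name]
    selected = set()
    for res, xyz in atoms:
        if res[2] == ligand_name:
            continue  # the ligand itself goes into the separate cavity file
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            selected.add(res)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical file name; 7.5 Å is an assumed radius for a 15 Å diameter region.
    with open("1a27.pdb") as handle:
        print(residues_near_ligand(handle.read(), "EST", cutoff=7.5))
```

Selecting whole residues rather than single atoms keeps side chains intact, which matters when hydrogen bonding sites are later assigned to the receptor atoms.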

5.2.1.1 Active site study of estradiol complex

This crystal structure (1A27) contained the coordinates for the monomeric form of the 17βHSD/KSR type 1 enzyme, the E2 1 ligand and the NADP+ 6 cofactor, although the enzyme is active as a homodimer. HIPPO explored this active site and identified the target sites (Figure 44 a) and the boundary (Figure 44 b).

(a) (b) Figure 44. a) The ligand within the active site encircled with target sites. Blue hemispheres are hydrogen bond donor sites and red areas acceptor sites. b) Active site boundary area (green) surrounded by target sites.

The active site boundary included nine acceptor and twenty donor sites. According to SPROUT, the Ser142 acceptor site had an important interaction with E2 1. Analysis of the ligand showed that the length of the hydrogen bond between Ser142 and the 17-hydroxy group was 2.31 Å and that logKi, also called the ‘predicted binding value’ or Total Score in HIPPO, was -7.61 (24.5 nM). Five amino acid residue atoms were predicted to be too close to the ligand atoms (negative distance values), which might cause van der Waals (vdW) overlaps. The statistics of these target sites and the predicted vdW contacts between the ligand E2 1 and the receptor are listed in Table 9.
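The concentration given in parentheses follows from the score if the score is read as the base-10 logarithm of Ki in mol/l. This reading is an assumption, although it is consistent with the values quoted for the 17βHSD/KSR1 complexes in this section. A minimal sketch of the conversion, with a hypothetical function name:

```python
def ki_nanomolar(log_ki):
    """Convert a score read as log10(Ki / (mol/l)) into Ki in nmol/l."""
    return 10.0 ** log_ki * 1e9


if __name__ == "__main__":
    # Scores quoted in the text for the four 17βHSD/KSR1 complexes.
    for pdb_code, score in [("1A27", -7.61), ("1EQU", -8.27),
                            ("1DHT", -8.62), ("3DHE", -8.83)]:
        print(f"{pdb_code}: logKi = {score:6.2f}  Ki = {ki_nanomolar(score):5.1f} nM")
```

Because the scale is logarithmic, a difference of one score unit corresponds to a tenfold difference in the predicted Ki.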

Table 9. Statistics of the SPROUT analysis for the 17βHSD/KSR1 - E2 - NADP+ complex. The list of amino acid residues of the boundary surface defined by SPROUT shows the possible interaction sites. The Sat (satisfaction) column lists the amino acid residues that have a hydrogen bond interaction with the ligand. The vdW interaction table lists the amino acid residues that have predicted vdW interactions with the ligand. The SPROUT numbering for the ligand (used in the table) is shown on the molecule. Negative distance values indicate vdW overlaps, which mean that the atom of the amino acid is too close to the ligand to form a good interaction.

Hydrogen bonding site coverage statistics:

Predicted van der waals interactions:

[Structure: SPROUT atom numbering (0-19) of the ligand. EST = Estradiol 1]

Number of pairwise VdW overlaps: 5
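The sign convention of the vdW table can be illustrated by computing a surface-to-surface separation for an atom pair, that is, the interatomic distance minus the sum of the two van der Waals radii, so that a negative value indicates an overlap. How SPROUT actually derives its distance values is not stated here, so this is a sketch under that assumption. The radii are the commonly used Bondi values, and the function names are hypothetical.

```python
import math

# Bondi van der Waals radii in Å (an assumed parameter set, not taken from SPROUT).
BONDI_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def surface_separation(atom_a, atom_b):
    """Distance between two atoms minus the sum of their vdW radii, in Å.

    Each atom is an (element symbol, (x, y, z)) pair.
    """
    (element_a, xyz_a), (element_b, xyz_b) = atom_a, atom_b
    return math.dist(xyz_a, xyz_b) - (BONDI_RADII[element_a] + BONDI_RADII[element_b])


def count_overlaps(ligand_atoms, receptor_atoms):
    """Count ligand-receptor atom pairs whose surface separation is negative."""
    return sum(1
               for ligand_atom in ligand_atoms
               for receptor_atom in receptor_atoms
               if surface_separation(ligand_atom, receptor_atom) < 0.0)


if __name__ == "__main__":
    # Illustrative coordinates only, not taken from any crystal structure.
    ligand = [("C", (0.0, 0.0, 0.0))]
    receptor = [("O", (3.0, 0.0, 0.0)), ("C", (4.0, 0.0, 0.0))]
    print(count_overlaps(ligand, receptor))
```

In practice hydrogen-bonded donor and acceptor pairs sit closer than the sum of their radii, so a real analysis would have to treat those pairs separately from genuine steric clashes.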

The interactions between the ligand and the enzyme analysed by SPROUT were generally similar to those published earlier in connection with the type 1 crystal structure studies. All the catalytically important amino acid residues (Tyr155, Ser142, Lys159 and Asn114) and the anchoring amino acid residues of the recognition end (His221 and Glu282) were on the boundary surface and in close proximity to the hydroxy groups (Figure 45). In addition, some other polar amino acids, such as Tyr218, Ser222, Asn152, Arg258 and Cys185, generated specific features for the binding cavity and produced possible starting points for the structure generation.

Figure 45. The E2 1 ligand situated in the binding pocket, surrounded by the important amino acid residues (labelled).

The boundary surface calculated by HIPPO shows acceptor (red) and donor (blue) regions not only at the recognition end and the catalytic end of the cleft but also in the middle of the pocket, near position 7 of the B-ring of the steroid skeleton. Otherwise the ligand-binding pocket was mainly hydrophobic (green). Figure 46 a and b (page 108) clearly illustrate the hydrophobicity and the narrowness of the binding pocket, together with the polar regions of the pocket.

[Figure labels: Catalytic end, Recognition end]

(a) (b) Figure 46. The boundary surfaces presented as a wire grid. a) The green region is hydrophobic, while the red and blue regions are hydrogen bond acceptor and donor regions, respectively. The β-face of the steroid skeleton is towards the viewer. b) Particular parts of the binding pocket are quite limited, especially on the β-face of the steroid skeleton (β-face upward), as well as some parts of the α-face (close to the A-ring of the steroid).

5.2.1.2 Active site study of equilin complex

The crystal structure of 17βHSD/KSR1 with the EQU 2 ligand and the NADP+ 6 cofactor (1EQU) is homodimeric. Due to the poorly defined electron density of the B-subunit, the EQU 2 ligand was visible only in the A-subunit. In this study, only the active site of the A-subunit was used for modelling. In the CANGAROO module the ligand was selected as the cavity file. The receptor file was formed from the amino acid residues within a region 15 Å in diameter around the EQU 2 ligand. The active site with the donor and acceptor sites calculated by HIPPO is shown in Figure 47.

(a) (b) Figure 47. a) The EQU 2 ligand within the active site surrounded by target sites. b) The active sites and the hydrophobic boundary.

The boundary consisted of eight acceptor sites and sixteen donor sites. The target sites for this crystal structure complex were slightly different than for the E2 1 complex, in particular the main hydrogen bond interaction, which was formed with His221 instead of Ser142.

According to ligand analysis the length of the hydrogen bond was 2.30 Å and logKi of the ligand was -8.27 (5.4 nM). Compared to the E2 1 complex, the boundary lacked a few asparagine and glutamine atom interactions, having arginine nitrogen atom interactions instead (Table 9, page 106). In addition to hydrogen bonding the vdW interactions were different, including several leucine and valine interactions instead of phenylalanine and methionine ones. Thus, simple hydrophobic amino acid residues were replaced with larger hydrophobic amino acid residues. Between the ligand and the receptor there were six poor vdW interactions altogether (Table 10, page 110).


Table 10. Statistics for the 17βHSD/KSR1 - EQU 2 complex.

The list of amino acid residues of the boundary surface defined by SPROUT shows the possible interaction sites. The Sat (satisfaction) column lists the amino acid residues that have a hydrogen bond interaction with the ligand. The vdW interaction table lists the amino acid residues that have predicted vdW interactions with the ligand. The SPROUT numbering for the ligand (used in the table) is shown on the molecule. Negative distance values indicate vdW overlaps, which mean that the atom of the amino acid is too close to the ligand to form a good interaction.

Hydrogen bonding site coverage statistics:

Predicted van der waals interactions:

[Structure: SPROUT atom numbering (0-19) of the ligand. EQI = Equilin 2]

Number of pairwise VdW overlaps: 6


The natural ligand EQU 2 and important amino acid residues within the boundary are shown in Figure 48.

Figure 48. Ligand surrounded by possible target sites.

The boundary surface was approximately the same shape as the E2 1 surface. The only differences were in the acceptor and donor regions of the surface: EQU 2 had stronger donor regions at positions 6-7 and 17 of the steroid skeleton (Figure 49), whereas the donor region of E2 1 was stronger at position 3.

[Figure labels: Catalytic end, Recognition end]

(a) (b) Figure 49. a) The EQU 2 ligand was situated within the hydrophobic pocket possessing three acceptor/donor regions at positions 3, 6-7 and 17 of the skeleton. b) Two acceptor regions are situated at both ends of the skeleton. One donor region (at position 17) is sited on the β-face and two others (3 and 6-7) are located on the α-face of the skeleton.


5.2.1.3 Active site study of dihydrotestosterone complex

The type 1 enzyme with the DHT 3 ligand (1DHT) was crystallised without NADP+ and was in a monomeric form. The active site was saved in two files and analysed by HIPPO. The target sites produced are presented in Figure 50 a and the boundary in Figure 50 b.

(a) (b) Figure 50. a) Target sites with the DHT 3 ligand. b) Hydrophobic boundary surface surrounded by target sites.

This boundary included nine possible acceptor and twenty-two possible donor sites for hydrogen bonding. As in the case of the EQU 2 complex, His221 formed a satisfactory hydrogen bond interaction with the ligand atom (bond length 2.0 Å). The predicted binding value for the ligand was -8.62 (2.4 nM). The list of receptor atoms capable of forming hydrogen bonds with the DHT 3 ligand (Table 11, page 113) seemed to be practically the same as in the case of the E2 1 complex (Table 9, page 106). Fourteen of the predicted vdW interaction values were negative, which meant that the receptor atoms were too close to the ligand atoms and could cause vdW clashes. It appears that His221 and Tyr218 formed most of these vdW overlaps, histidine with the C2, O3 and C4 atoms and tyrosine with the C6 and C7 atoms (Table 11, page 113). According to this analysis Leu149 was expected to form several interactions with the methyl group at position 19 of the molecule, as explained in previous publications regarding crystal structure information (see page 85).


Table 11. Hydrogen bond and vdW information for the 17βHSD/KSR1-DHT 3 complex.

The list of amino acid residues of the boundary surface defined by SPROUT shows the possible interaction sites. The Sat (satisfaction) column lists the amino acid residues that have a hydrogen bond interaction with the ligand. The vdW interaction table lists the amino acid residues that have predicted vdW interactions with the ligand. The SPROUT numbering for the ligand (used in the table) is shown on the molecule. Negative distance values indicate vdW overlaps, which mean that the atom of the amino acid is too close to the ligand to form a good interaction.

Hydrogen bonding site coverage statistics:

Predicted van der waals interactions:

[Structure: SPROUT atom numbering (0-20) of the ligand. DHT = Dihydrotestosterone 3]

Number of pairwise VdW overlaps: 14

The ligand and amino acid residues, which are relevant to the DHT 3 binding are labelled in Figure 51.

Figure 51. DHT 3 is surrounded by amino acid residues, which are capable of making hydrogen bond connections with the ligand.

The differences between the estrogen and androgen steroid skeletons are the aromatic A-ring (estrogens) and the C19 methyl group at position 10 (androgens) (Figure 30b, page 57). These dissimilarities caused changes in the active site boundary compared to the boundaries described above. The boundary was mainly hydrophobic, and DHT 3 had three donor regions on the surface, like E2 1 and EQU 2. However, dissimilarities appeared at both ends of the binding pocket. The recognition end of the binding pocket was narrow and tight, especially at positions 3 to 7, and the hydroxy group at position 3 even protruded from the boundary (Figure 52 a, page 115). In contrast, the other end of the binding pocket seemed to be wide when viewed towards the β-face, although when the steroid skeleton was viewed perpendicular to that direction (Figure 52 b, page 115) the pocket appeared to be narrow at the catalytic end as well. From this angle the recognition end looked wider than with the estrogen ligands, which could be attributed to the C19 methyl group and the staggered conformation of the cyclohexane ring. Under these circumstances the DHT 3 binding pocket lacked the fork-like shape of the boundary at the catalytic end of the binding pocket.

[Figure labels, both panels: Catalytic end, Recognition end]

(a) (b) Figure 52. a) The androgen ligand boundary is different from the estrogen ligand boundary at both ends of the binding pocket. b) The non-aromatic A-ring and the C19 methyl group of androgens change the conformation of the steroid skeleton compared to the more planar estrogens.

5.2.1.4 Active site study of dehydroepiandrosterone complex

The type 1 enzyme with the DHEA 4 ligand (3DHE) was also in a monomeric form without cofactor. The target sites produced by HIPPO (Figure 53) were much closer to the ligand than in the case of DHT 3 or E2 1. The β-face of the ligand in particular seemed to be overcrowded.

(a) (b) Figure 53. a) The DHEA 4 ligand is almost entirely covered with target sites on the β-face. b) Hydrophobic boundary surrounded by target sites.

The boundary contained eight possible acceptor and eighteen possible donor sites. The same hydrogen bond interaction between the ligand and His221 (Nε2) was found as for the DHT 3 ligand, but the bond was shorter (1.62 Å). The rest of the potential receptor atoms capable of becoming hydrogen bond acceptor sites were different from those previously discussed, and the predicted binding activity of -8.83 (1.5 nM) for DHEA 4 was the best of these 17βHSD/KSR1 complexes. The Met193 and Leu149 amino acid residues were predicted to form several undesirable vdW interactions with the DHEA 4 ligand (Table 12). Met193 is important for the substrate entry loop and Leu149 for androgen recognition, detecting the methyl group at position 19. However, both of these amino acid residues seemed to be too close to the ligand, since seven out of the nine vdW clashes were predicted to be due to these residues.

Table 12. Hydrogen bond and vdW statistics specify possible interactions between receptor and DHEA 4 ligand.

Hydrogen bonding site coverage statistics:

Predicted van der waals interactions:

[Structure: SPROUT atom numbering (0-20) of the ligand. DHEA = Dehydroepiandrosterone 4]

Number of pairwise VdW overlaps: 9

Amino acid residues, which might be important for DHEA 4 interactions are shown in Figure 54.

Figure 54. DHEA 4 is shown with the pertinent amino acid residues.

The binding pocket of the active site was the most spacious of all four boundaries. The hydrophobicity and the donor site regions of the boundary were similar to those of the others. However, the polar regions were more extensive and the whole binding pocket was wider at both the recognition and the catalytic ends of the pocket (Figure 55 a). Even the most restrictive narrowness on the α- and β-faces of the ligand had disappeared (Figure 55 b).

Figure 55. a) Both ends of the binding pocket (labelled catalytic end and recognition end) are hydrogen bond donor sites. b) The boundary is much wider than the three other boundaries mentioned earlier.


5.2.1.5 Active site studies of the estrogen receptor α and β

The binding sites of ERα (1L2I)248 and ERβ (1L2J)248 were also examined using HIPPO boundary surface analysis. Both selected estrogen receptor structures were crystallised with 5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol. ERα had a homodimeric structure, whereas ERβ was monomeric.

[Structure: 5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol]

SPROUT analysis of ERα predicted two main hydrogen bond interactions, with His524 and with Arg394. The hydrogen bond site coverage information showed three possible acceptor sites and seven donor sites. The SPROUT HIPPO score for the ligand was -10.89 (0.13 pM). According to published information, a hydrogen bond interaction was also formed with Glu353. In the case of a steroid structure, the A-ring formed the interactions with Arg394 and Glu353 and the D-ring with His524. Apart from these regions the active site was hydrophobic (Figure 56).

Figure 56. The ERα boundary was mainly hydrophobic, with two polar regions at the ends of the active site.

According to the HIPPO analysis, ERβ formed only one hydrogen bond interaction, with Arg346. The hydrogen bond site coverage information included six possible acceptor sites and nine donor sites, among them the important His475 and Glu305. The scoring value was -9.70 (20 nM). The boundary surface was hydrophobic with three polar regions. These regions formed a triangular pattern similar to that in 17βHSD/KSR type 1. Arg346 and Glu305 formed one polar region, His475 a second, and Thr299 completed the triangular pattern (Figure 57). The hydrophobic boundary was more voluminous than in the case of ERα.

Figure 57. The boundary of ERβ is mainly hydrophobic, including three polar regions (residues labelled as named in the text above).

5.2.2 Structure generation

The first Job file was created using SPROUT version 3.4. The HIPPO file included target sites essential to ligand binding, based on published information. Updated versions of SPROUT (4.0 and 4.11) were released during this study, and the old files were found to be incompatible with the new versions. New Job files were therefore created for SPROUT 4.0 and the work continued accordingly. SPROUT was updated several times during this period, the latest versions being 5.0 (commercially available) and 6.0 (in development use). At the beginning, some of the selected HIPPO files with starting templates from ELEFANT did not give any proper results in the SPIDER module. The start templates as well as the spacer templates seemed to be unsuitable for the selected target sites. As a result a new template library was created (Figure 58, page 120). It included templates such as phenol, naphthalene, biphenyl, pyrrole, furan, thiophene, pyridine, 2-aminopyridine, pyrimidine, indole and purine. The structures were generated and minimised in the MacroModel program. These files (.pdb) were converted into MDL mol files using the Babel program (version 1.6).278
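The conversion step can be illustrated with a present-day tool. The command line of Babel 1.6 is not reproduced here; the sketch below instead assumes that Open Babel's `obabel` program is installed, and the template file name is hypothetical.

```python
from pathlib import Path


def build_conversion_command(pdb_file: str) -> list[str]:
    """Build an Open Babel command that writes an MDL mol file next to the PDB file.

    Open Babel is used as a stand-in for Babel 1.6 (assumption); input and
    output formats are inferred by obabel from the file extensions.
    """
    source = Path(pdb_file)
    target = source.with_suffix(".mol")
    return ["obabel", str(source), "-O", str(target)]


# Hypothetical template file minimised beforehand in a modelling program.
command = build_conversion_command("phenol_template.pdb")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where `obabel` is available.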


Figure 58. New partial structures were added into the template library with the basic selection of the templates.

Two of the crystal structure complexes included NADP+ as a cofactor. This molecular structure turned out to be problematic. The program could not recognise the cofactor structure, which caused problems in analysing the boundary interactions in HIPPO. The program also displayed the molecule with incorrect bonds. Because of this, only one file (1EQU) included the NADP+ molecule with corrected bond information.

Some selected HIPPO files with the start templates gave either rather poor results or no results at all in the connection phase (SPIDER). This was mainly because the steric, electrostatic and/or hydrophobic constraints were violated. Overall, more than sixteen SPROUT structure generation simulations were made for this research. From all the Job files created for the 17βHSD/KSR1 complexes (two for E2, three for EQU, two for DHT and two for DHEA), the HIPPO files with the best skeletons were chosen to be presented in detail here.

In the ELEFANT module all selected target sites from HIPPO were labelled with numbers. Here all target sites are labelled as in SPROUT, emboldened and italicised accordingly.


5.2.2.1 Structure generation for estradiol complex

In addition to the existing hydrogen bond interaction presented by SPROUT, some other essential amino acid residues were selected for the active site based on published information. Structure generation for the 17βHSD/KSR1 - E2 1 complex using known amino acid residues as target sites (Ser142, Tyr155, His221 and Glu282) together with some hydrophobic spheric sites gave results with poor predicted binding (score) values. Consequently, the HIPPO file including acceptor Asn152 (Nδ2) 4, acceptor His221 (Nε2) 5, donor Tyr218 6 and three hydrophobic spheric sites 1, 2 and 3 (Figure 59) is described in detail here, since it generated plausible results with good scores.

Figure 59. Selected sites Asn152 4 (acceptor, red), His221 5 (acceptor) and Tyr218 6 (donor, blue) for 17βHSD/KSR1 - E2 1 are presented with the three spheric sites (1, 2 and 3, green).

Two of the selected target sites (Tyr218 and Asn152) were at positions 7 and 14 of the steroid skeleton (Figure 60 a, page 122). All target sites were identified in the ‘lower half’ of the binding pocket (the bottom half of Figure 60 a): His221 at the recognition end of the pocket, and Tyr218 and Asn152 to the right of the pocket (Figure 60 b, page 122). Hydrophobic spheric sites were selected within the most hydrophobic regions (yellow area in Figure 60 c, page 122).


Figure 60. a) Selected target sites and hydrophobic spheric sites (green) in the active site with the E2 1 ligand. b) Target sites and the boundary of the active site (red areas have acceptor and blue areas donor character, while green and yellow areas are hydrophobic). c) Hydrophobic sites are placed in the most hydrophobic areas (yellow surface) within the active site (the hydrophobic surface area cut-off is here 60%).

The selected target sites were divided into five groups where His221 5 and one of the spheric sites 1 formed a combined target site (Figure 59, page 121). The starting templates used for the five groups were:
• Combined target site (His221 5 and spheric site 1) including phenol, aromatic 5- and 6-membered ring templates. Eight templates, differently positioned within the combined target site, have been accepted; three phenols, two aromatic 5-membered and three aromatic 6-membered ring templates.
• Spheric site 2 has five accepted templates; aromatic 5-membered ring, aromatic 6-membered ring, naphthalene and two biphenyls.
• Spheric site 3 has the same five start templates; aromatic 5-membered ring, aromatic 6-membered ring, naphthalene and two biphenyls.
• Asn152 4 has five accepted templates; C sp3, amide, double bond, 5-membered ring and aromatic 6-membered ring.
• Tyr218 6 also has five templates; C sp3, two amides and two amidines.

In SPROUT, aromatic 5- and 6-membered ring templates usually represent various heterocyclic structures with hydrogen donor properties (pyrrole, imidazole, pyridine etc.). However, some of these specific templates were also added to the new template library.

In the SPIDER module the selected target sites with the templates above were connected to each other. Originally this final stage of skeleton generation was the most time-consuming part of the design, although nowadays, with more powerful computers and parallel versions of SPROUT (v 4.0 and later versions), the linking is considerably faster. The results of three libraries of skeletons made in different connection orders are presented here, since the results differed slightly depending on the order in which the connections were made. Some options of the connections can be changed. The most common options and their default values are collected in Table 13, and the spacer templates used for these connections are shown in Figure 61.

Table 13. Option values for the connections in SPIDER module.

Option                     Value
Maximum vertices           40
Minimum ring percentage    25
Max. 3-membered ring       1
Max. 4-membered ring       1
Max. spiro joins           1
Max. fused joins           2
Max. 5-membered ring       2
Max. 6-membered ring       3
Max. rotatable bonds       30
Seed vertex tolerance      1.7
Max. chain length          5
Max. new bond joins        20

Figure 61. Spacer templates used for these connections. Dashed lines (so called dummies) indicate direction and number of the possible interactions.

The first tree pair connections were made between combined target sites 1 and 5 and spheric site 2 (Figure 59, page 121). These connections gave 292 diverse structures. The partial skeletons were scored in the ALLIGATOR module and 223 of them were selected for further connections. All 69 pruned structures were partial skeletons with an unstable pentalene structure 23. The second tree pair connection was between the original set of skeletons 1, 2, 5 and Tyr218 6, which gave 64 skeletons. This number of skeletons was moderate, so there was no need to prune the set before the next tree pair connection. The third connection, between 1, 2, 5, 6 and 3 (Figure 59, page 121), gave five structures, all including an undesirable ester-like partial skeleton 24.
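The pruning step described above can be sketched as a simple filter. This is a minimal illustration, not SPROUT's implementation: it assumes that a separate substructure search has already tagged each skeleton with the partial structures it contains, and the skeleton names, scores and tag labels below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Skeleton:
    name: str
    score: float
    # Labels of partial structures found in the skeleton; assumed to come from
    # a separate substructure search and not computed here.
    partial_structures: frozenset = field(default_factory=frozenset)


# Hypothetical labels for the undesirable partial structures 23 and 24.
UNDESIRABLE = frozenset({"pentalene_23", "ester_like_24"})


def prune(skeletons, undesirable=UNDESIRABLE):
    """Split skeletons into (kept, pruned) by presence of undesirable partial structures."""
    kept, pruned = [], []
    for skeleton in skeletons:
        if skeleton.partial_structures & undesirable:
            pruned.append(skeleton)
        else:
            kept.append(skeleton)
    return kept, pruned


# Hypothetical demonstration data.
demo = [
    Skeleton("s1", -8.4),
    Skeleton("s2", -8.9, frozenset({"pentalene_23"})),
    Skeleton("s3", -7.9, frozenset({"ester_like_24"})),
    Skeleton("s4", -9.1),
]
kept, pruned = prune(demo)
print([s.name for s in kept], [s.name for s in pruned])

# Bookkeeping for the first connection reported in the text.
assert 292 - 69 == 223
```

A skeleton is removed regardless of its score, which mirrors the way the pentalene-containing structures were rejected here.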

[Structures 23 and 24]

These tree pair connections led to a forest that included skeletons unfit for use. However, the first connection was good and gave suitable structures; therefore a new forest was made by connecting the tree pairs in a different order, as follows. The tree pair 1, 2, 5 was connected with spheric site 3 instead of Tyr218 6 (Figure 59, page 121). This connection gave 480 skeletons. Tyr218 6 was then linked with 1, 2, 3, 5 without pruning, and 134 skeletons from this connection were scored. The last connection, with Asn152 4, gave 48 skeletons. The skeletons with partial structure 24 were pruned, and the remaining group included 41 skeletons with good predicted binding values (-8.29 to -9.23, which is 5.13 nM to 0.59 nM). Linking this set with the target site 6 gave no result; thus the final set 1, 2, 3, 4 and 5 included 41 skeletons. These were examined closely and inadequate structures were pruned. Most of the skeletons were identical and differed only in conformation; therefore only two basic skeletons, 25 and 26, remained for further studies (Appendix II). The skeletons were substituted and minimisation studies were carried out using the Moloc modelling program.
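Reducing a set of skeletons that differ only in conformation to a few basic skeletons amounts to grouping entries by topology and keeping one representative per group. The sketch below assumes that each entry already carries a topology key (for example a canonical structure string produced elsewhere); the keys, conformer names and scores are hypothetical.

```python
def best_conformer_per_skeleton(entries):
    """Keep the best-scoring conformer for every distinct topology.

    entries: iterable of (topology_key, conformer_id, score) tuples, where a
    more negative score is better. Returns {topology_key: (conformer_id, score)}.
    """
    best = {}
    for topology_key, conformer_id, score in entries:
        current = best.get(topology_key)
        if current is None or score < current[1]:
            best[topology_key] = (conformer_id, score)
    return best


# Hypothetical entries: two topologies, several conformations each.
entries = [
    ("skeleton_A", "conf_1", -8.29),
    ("skeleton_A", "conf_2", -9.23),
    ("skeleton_B", "conf_1", -8.70),
    ("skeleton_A", "conf_3", -8.95),
    ("skeleton_B", "conf_2", -8.41),
]
representatives = best_conformer_per_skeleton(entries)
print(representatives)
```

Choosing the best-scoring conformer as the representative is one possible convention; the text does not state which conformation was retained.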

[Structures 25 and 26]

Another library of structures was formed by linking 4 (Figure 59, page 121) with the earlier connected set 1, 2, 5, 6, which included 64 skeletons. Linking the resulting set of 10 skeletons with 3 gave no result. The final 10 skeletons were scored and two of them were selected for further examination. Skeleton 27 included 5- or 6-membered rings and skeleton 28 two 5-membered rings (page 125). Analogues of these skeletons were examined and optimised in detail (see Appendix II and chapter 5.2.3, page 136).

[Structures 27 and 28]

5.2.2.2 Structure generation for equilin complex

Nine target sites were selected from the EQU 2 active site. The target sites were numbered as follows (Figure 62): the three spheric sites were labelled 1, 2 and 3, the acceptor sites were His221 (Nε2) 4, Arg258 (Nη1) 5 and Arg258 (Nη2) 6, and the donor sites were Asn152 (Oδ1) 7, Tyr155 8 and Ser222 9.

Figure 62. Acceptor and donor sites selected for structure generation in the EQU 2 active site are presented as red and blue areas, and spheric sites as green areas. Spheric site 2 is behind the Asn152 7 donor site.

Two of the spheric target sites were set into the EQU 2 ligand region (Figure 63 a, page 126), which also contained the most hydrophobic areas within the boundary (Figure 63 c, page 126). Most of the target sites were further away from the ligand, which made it easier to create a bulkier ligand than the original. The three selected donor sites were situated within regions where the boundary had strong hydrogen bond donor properties (Figure 63 b, page 126). In addition to the His221 acceptor site 4, two other good acceptor sites (5 and 6) were selected using information from the ‘hydrogen bonding site coverage statistic’ table (Table 10, page 110).

Figure 63. a) Two of the spheric sites are situated in the ligand region, while the third is behind the Asn152 donor site. b) Target sites with the boundary. c) The hydrophobic area is smaller than in the crystal structure complexed with E2 1, although the hydrophobic surface area cut-off is the same.

The selected target sites were divided into six groups. His221 4 and spheric site 1 were combined, as were Asn152 7 and spheric site 2 (Figure 62, page 125). Arg258 Nη1 5 and Nη2 6 formed the third combined group, with the ability to bind the two oxygens of a carboxyl group. The templates used for this structure generation were:
• The first combined group (4 and 1) with three phenols, two aromatic 5- and 6-membered ring templates.
• The second combined group (7 and 2) with two amide, three amidine, five purine and indole, pyridine and pyrrole templates.
• The last combined group (5 and 6) with one amidine-like template, which in this case was a carboxyl template docked in such a way as to form two hydrogen bonds with the nitrogens of the guanidine group of arginine.
• Spheric site 3 with aromatic 5- and 6-membered ring templates.
• Tyr155 donor site with single atom and two amide and amidine templates.
• Ser222 donor site with a wider selection of templates including single atom, amidine, two amide, pyrrole, pyridine, pyrimidine, indole and three 2-aminopyridine templates.

In SPIDER these target sites were connected in several different ways. The best ligands were collected from six different groups. The options selected for these connections differed slightly from those used for the E2 1 complex: the default values for 3- and 4-membered rings and spiro joins were set to zero and the maximum number of rotatable bonds was decreased significantly (Table 14). The spacer templates used are specified for each connection.

Table 14. Selected preference values for template connections.

Option                     Value
Maximum vertices           42
Minimum ring percentage    25
Max. 3-membered ring       0
Max. 4-membered ring       0
Max. spiro joins           0
Max. fused joins           5
Max. 5-membered ring       2
Max. 6-membered ring       5
Max. rotatable bonds       6
Seed vertex tolerance      1.7
Max. chain length          5
Max. new bond joins        20
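The option sets of Tables 13 and 14 can be compared programmatically. The sketch below stores the values from the two tables in a small data class and lists the options that differ; the field names are illustrative and do not correspond to any SPROUT file format.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SpiderOptions:
    """Connection options; the defaults are the values listed in Table 13."""
    max_vertices: int = 40
    min_ring_percentage: int = 25
    max_3_membered_rings: int = 1
    max_4_membered_rings: int = 1
    max_spiro_joins: int = 1
    max_fused_joins: int = 2
    max_5_membered_rings: int = 2
    max_6_membered_rings: int = 3
    max_rotatable_bonds: int = 30
    seed_vertex_tolerance: float = 1.7
    max_chain_length: int = 5
    max_new_bond_joins: int = 20


E2_OPTIONS = SpiderOptions()  # Table 13
EQU_OPTIONS = SpiderOptions(  # Table 14
    max_vertices=42,
    max_3_membered_rings=0,
    max_4_membered_rings=0,
    max_spiro_joins=0,
    max_fused_joins=5,
    max_6_membered_rings=5,
    max_rotatable_bonds=6,
)


def changed_options(base: SpiderOptions, other: SpiderOptions) -> dict:
    """Return {option: (base value, other value)} for every option that differs."""
    base_values, other_values = asdict(base), asdict(other)
    return {
        name: (base_values[name], other_values[name])
        for name in base_values
        if base_values[name] != other_values[name]
    }


for name, (old, new) in changed_options(E2_OPTIONS, EQU_OPTIONS).items():
    print(f"{name}: {old} -> {new}")
```

Keeping each run's options in one immutable object makes it straightforward to record exactly which settings produced a given library of skeletons.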

Two different sets of connections were made for these target sites. The first set included skeletons made by connecting target site 3 with combined group 5 and 6. The second set contained skeletons in which the first connection was between target site 3 and combined group 1, 4 (Figure 62, page 125).

The first connection was between the combined Arg258 group 5 and 6 and the spheric site 3 templates, with a single atom spacer template. The result was 427 skeletons, which were explored using a ‘substructure search’ for structures 23 and 24. All undesirable skeletons were deleted and the rest were scored. The 87 skeletons with good predicted binding values were selected for the next connection. Linking was made between 3, 5, 6 and the combined group 1 and 4 (Figure 62, page 125) with single atom and double bond spacer templates. The rest of the connections did not give any result, so group 1, 3, 4, 5, 6 was examined and two of its 17 skeletons (29 and 30, page 128) with scoring values better than -8.0 (10 nM) were selected for further modelling (Appendix III).


[Structures 29 and 30]

The result from the previous set (17 skeletons) was connected with target site 9, Ser222 (Figure 62, page 125), without a spacer template. This gave only two skeletons, which did not represent any significant improvement. The scores of these skeletons were rather poor, such as -7.45 (35 nM) for 31, and the skeletons were discarded.

[Structure 31]

The possibility of connecting the group 5, 6 and 3 differently was also examined. Instead of connecting the target sites with a single atom template only, the first connection was made with single atom, double bond, aromatic 5- and 6-membered ring and naphthalene spacer templates, resulting in 484 skeletons. The result was linked with combined group 1, 4 using a single atom as a spacer template. The group 1, 3, 4, 5, 6 included 8 skeletons, which resembled skeletons 29 and 30 from the first set, but the scoring value was poorer than -7.50 (32 nM). However, the connection with Tyr155 8 (Figure 62, page 125) gave 5 skeletons with good scores (up to -9.44, which is 0.36 nM). The skeletons differed only in conformation; therefore the main skeleton 32 was chosen for further examination (Appendix III), whilst skeleton 30, although similar, was discarded due to a modest scoring value.

[Structure 32]

The second set of connections started by linking combined target site 1, 4 with spheric site 3 (Figure 62, page 125) using a single atom as a spacer template. The resulting 111 skeletons were connected with donor site 8 (Tyr155) without a spacer template. This connection gave 65 skeletons, which were connected with donor site 9 (Ser222). The final result included one triarylic skeleton 33 with five conformations. The skeletons had poor scoring values (-6.70 to -7.62, which is 200 nM to 24 nM) and as a result they were discarded. The combined acceptor site 5, 6 (Arg258) seemed to be too close to the existing skeleton to be connected in this order.

[Structure 33]

The combined group 1, 4 was also connected with spheric site 3 using more than one spacer template. This linking gave 231 different skeletons with single atom, aromatic 5- and 6-membered ring, pyridine and pyrimidine spacer templates. The next connection, with target site 8 using single atom and aromatic 6-membered ring templates, gave 71 skeletons. The scoring values for these skeletons were poor (-5.59 to -7.09, which is 2.5 µM to 81 nM). The skeletons were indene- and biaryl-related structures such as 34 and 35. These skeletons were also discarded.

[Structures 34 and 35]

5.2.2.3 Structure generation for dihydrotestosterone complex

Target sites for this generation are labelled in Figure 64 (page 130). Four target sites were selected in addition to the His221 (Nε2) 4 acceptor site. Two of these were spheric sites 1 and 2, another acceptor site was Asn152 (Nδ2) 3 and one donor site was Tyr155 5.

Figure 64. Target sites used in the modelling - two spheric sites 1 and 2, two acceptor sites 3 and 4 and one donor site 5 form the starting point for this structure generation.

The donor site Tyr155 5 was located at the catalytic end of the active site, whereas the acceptor site His221 4 was located at the recognition end. The boundary was triangular, with an acceptor or donor site in each of the three corners. The His221 and Tyr155 target sites were close to the keto and hydroxy groups at positions 3 and 17, respectively (Figure 65 a). The second acceptor site, Asn152 3, was at the third corner of the active site boundary, which completed the triangular active site (Figure 65 b). Spheric site 1 was situated close to acceptor site 4 and spheric site 2 close to acceptor site 3, in the most hydrophobic areas (Figure 65 c).

Figure 65. a) The original ligand located between the His221 and donor Tyr155 target sites. b) The active site boundary was triangular, with target sites in every corner. c) The spheric sites were set into the two most hydrophobic areas of the boundary.


This set of target sites did not include combined groups; all five target sites were connected individually (Figure 64, page 130). Only basic starting templates were selected for these target sites:
• Spheric site 1 had one aromatic 5- and 6-membered ring templates.
• Spheric site 2 had one 5- and 6-membered ring and single atom templates.
• Acceptor site 3 (Asn152) had single atom and double bond templates.
• Acceptor site 4 (His221) had, in addition to single atom and double bond templates, one amide template.
• Donor site 5 (Tyr155) had single atom, double bond, two amide and two amidine templates.

The selected starting templates were as simple as possible and the options were default values for these tree pair connections (Table 15). The effect of the ‘seed vertex tolerance’ was highly interesting in this case. The value 0 allowed the structure to grow only in the best possible direction. In the structure generations described above the value was 1.7, which allows the bond joining to grow in more than one direction. The spacer templates used for this structure generation are mentioned with each connection.

Table 15. Selected values for options in case of structure generation for DHT 3 complex.

Option                     Value
Maximum vertices           25
Minimum ring percentage    25
Max. 3-membered ring       1
Max. 4-membered ring       1
Max. spiro joins           1
Max. fused joins           2
Max. 5-membered ring       2
Max. 6-membered ring       3
Max. rotatable bonds       30
Seed vertex tolerance      0
Max. chain length          5
Max. new bond joins        20

The first tree pair connections were made between acceptor site 3 and spheric site 2 (Figure 64, page 130) with single atom, double bond, amide and amidine spacer templates. The connections gave 47 skeletons. This set was connected with donor site 5 using the same templates as above. The resulting 2514 skeletons were scored and undesirable structures were pruned. The hundred skeletons with the best scoring values were connected with spheric site 1 (Figure 64, page 130) using single atom and aromatic 5- and 6-membered ring spacer templates. This gave 77 skeletons, which were linked to acceptor site 4 with single atom, double bond and amide spacer templates. The final result was 81 skeletons. This set of results consisted of four different kinds of main skeletons, 36, 37, 38 and 39, with variable ring sizes, functional groups and rather good predicted binding values (-8.18 to -9.70, which is 6.6 nM to 0.2 nM). Three skeletons, 36, 37 and 39, were selected for further analysis (Appendix IV).
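Carrying forward only the hundred best-scoring skeletons is a top-N selection on the score. The sketch below shows one way of doing this; the list of 2514 scored entries is synthetic and serves only to exercise the function.

```python
import heapq


def select_best(scored, n=100):
    """Return the n entries with the lowest (most negative, i.e. best) score.

    scored: list of (name, score) tuples.
    """
    return heapq.nsmallest(n, scored, key=lambda entry: entry[1])


# Synthetic scores for 2514 hypothetical skeletons (not data from this work).
scored = [(f"skeleton_{i}", -5.0 - (i % 50) * 0.1) for i in range(2514)]
best = select_best(scored, n=100)
print(len(best), best[0])
```

`heapq.nsmallest` avoids sorting the complete list, which is convenient when a connection step produces several thousand skeletons but only a small fraction is kept.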

[Structures 36, 37, 38 and 39]

5.2.2.4 Structure generation for dehydroepiandrosterone complex

Target sites were labelled according to the HIPPO analysis as follows: four spheric sites 1, 2, 3 and 4, two acceptor sites Leu95 (N) 5 and His221 (Nε2) 6 and one donor site Tyr218 7 (Figure 66, page 133). In total seven target sites were selected for this structure generation.

Figure 66. Target sites presented within the active site with labels. Four spheric sites 1 – 4 (number 3 behind the donor site 7), two acceptor sites 5 and 6 and a donor site 7 were selected for this skeleton generation.

Acceptor site His221 6 was located next to the hydroxy group at position 3 and Tyr218 7 at position 6 of the DHEA 4 molecule (Figure 67 a). This boundary was larger than any mentioned previously (Figure 67 b), especially close to Leu95 (Figure 66). This amino acid residue (Leu95 5) was selected as another acceptor site from this new region of the boundary, avoiding the previously used catalytic end of the binding pocket. Four spheric sites were selected from the most hydrophobic areas (Figure 67 c). This boundary seemed to be even more hydrophobic than those of the previously analysed enzyme-ligand complexes.

Figure 67. a) Selected target sites in the active site with the DHEA 4 ligand. b) Target site locations within the boundary. c) Four spheric sites were set into important regions of the rather large hydrophobic area.

Target sites were divided into five groups. Spheric site 1 and the His221 6 acceptor site formed a combined group, as did spheric site 3 and the Leu95 5 acceptor site (Figure 66, page 133). The rest of the sites were individual groups. Selected starting templates for these groups were:
• Combined target site 1,6 included aromatic 5- and 6-membered ring and three phenol templates.
• Spheric site 2 included aromatic 5- and 6-membered ring, naphthalene and three biphenyl templates.
• Another combined group 3,5 included aromatic 5- and 6-membered ring and two phenol templates.
• Spheric site 4 included aromatic 5- and 6-membered ring templates.
• Donor site Tyr218 7 included a single atom, two amide and two amidine templates.

The options selected for these connections were the defaults, except for those governing spiro joins and 3- and 4-membered rings (Table 16).

Table 16. Options for this skeleton generation are mainly default.

Option                     Value
Maximum vertices           40
Minimum ring percentage    25
Max. 3-membered ring       0
Max. 4-membered ring       0
Max. spiro joins           0
Max. fused joins           5
Max. 5-membered ring       4
Max. 6-membered ring       4
Max. rotatable bonds       20
Seed vertex tolerance      1.7
Max. chain length          5
Max. new bond joins        20

The first connection was between combined group 1, 6 and spheric site 2 (Figure 66, page 133) with a single atom spacer template. The resulting 103 skeletons were connected to target site 7, but this did not yield any substantial results. An attempt to link these skeletons to spheric site 4 was also unsuccessful. Therefore, group 1, 2, 6 was connected to combined group 3, 5. The distance between these groups was rather large; as a result, the spacer template selection included naphthyl, biphenyl and aromatic 5- and 6-membered ring templates. The result was 54 skeletons, which were finally connected to donor site 7. This yielded 8 skeletons. One skeleton was pruned since it included partial structure 24. The scoring values for the rest of the set were good (-8.11 to -9.12, which is 7.8 nM to 0.76 nM) and a range of skeletons (40, 41, 42, 43, 44 and 45) were selected for further examination (Appendix V).

[Structures 40, 41, 42, 43, 44 and 45]

5.2.2.5 New structure generation for dihydrotestosterone complex with the latest version of SPROUT

Minor modifications were made to the algorithm of the SPIDER module and to the geometry of hydrogen bond interactions between ligand and receptor.279 These modifications arose from cases where SPIDER could not produce any molecules or missed some. The major problem was that the growth of structures stopped prematurely because of an inappropriate termination condition. Another problem arose in the case of small cavities: SPIDER did not start the structure building from some starting fragments, excluding the chance of generating some preferable molecules. To prevent bad hydrogen bond interactions, additional constraints were introduced to control the angles between the hydrogen bond and the lone pairs.

These modifications were made for SPROUT version 6. This version, like the previous one, was tested with the 17βHSD/KSR1 – DHT complex crystal structure (see section 5.2.2.3, page 129). All the same target sites, templates and references were used for this new structure generation. The target sites were also connected in the same order. The result included many more solutions after pruning (1285) than the earlier structure generation run (100) for the same active site. The resulting hydrophobic skeletons appeared to be promising. The binding affinities were much better than in the earlier generation, being at best 0.4 fM (score -13.40). Even the lowest binding affinity was 0.24 pM (score -10.62). Unfortunately, the best skeletons included some undesirable (challenging to synthesise) partial skeletons such as an equatorial pentane ring. Two of these structures were examined more exhaustively (see section 5.2.3.2, page 144).

5.2.3 Examination of potential inhibitor structures

The skeletons from SPROUT were saved as pdb files, and the Moloc and MacroModel programs were applied to modify the chosen ‘basic skeletons’ 25 to 45 by using different functional groups and bioisosteres. These substituted skeletons now had molecular properties and were numbered. Molecules selected for optimisation analyses are shown in Appendices II to V as follows:
• Appendix II includes analogues of skeletons 25 to 28,
• Appendix III analogues of skeletons 29, 30 and 32,
• Appendix IV analogues of skeletons 36, 37 and 39,
• Appendix V analogues of skeletons 40 to 45.

Functional groups were added according to the features of the surrounding boundary. Amino acid residues affecting the selected functional groups (acceptor/donor groups) of the skeletons are shown in Figure 68 (page 137).

Figure 68. Examples of the amino acids surrounding the skeletons are shown for various skeletons (panels A to D). The first skeleton (A.) was created in the E2 1 active site, the second (B.) for the EQU 2 active site, the third (C.) for the DHT 3 active site and the fourth (D.) for the DHEA 4 active site. Residues labelled across the panels include Ser142, Val143, Leu95, Asn152, Tyr155, Cys156, Cys185, Gly186, Val188, Met193, Tyr218, His221, Ser222, Arg258, Met279 and Glu282.

5.2.3.1 Modification and optimisation studies of the selected structures

The MAB force field75,76 of the Moloc program was applied for the energy minimisation process. The molecules were minimised within the active site with the receptor defined as a fixed shell, thus allowing only the molecule to move during the optimisation run (Minimised Ligand Energy, MLE). Minimisation was also carried out in the absence of the receptor (Ligand Energy Alone, LEA) in order to obtain the lowest energy conformation of the molecule (Table 17, page 138). The main objective was to remove all internal strain and reduce possible van der Waals clashes defined by the SPROUT Vertex Score table. Another aim was to find a local energy minimum for the molecule within the active site. Because the active site is very hydrophobic the molecule optimisation is not as important as in the case of more lipophilic active sites. It is also widely accepted that the bioactive conformation is not necessarily the lowest energy conformation.

The first set of molecules (46 – 56, Appendix II), which are analogues of skeletons 25 and 26, established significant contacts with the boundaries according to the Moloc analysis. However, molecules 57 to 67 (derived from skeletons 27 and 28) did not span the entire active site. In this set the substituents of only three molecules could reach more than two different amino acid interactions (Table 17). The ‘total bond energy’ values given by Moloc for these molecules were therefore also low. In contrast, molecules 46 to 56 filled the active site more appropriately and made better contacts with the active site according to the Moloc evaluation following optimisation (Table 17).

Table 17. Interactions between ligand molecule and amino acid residues within active site (molecules 46 to 67). In the case of more than one hydrogen bond interaction with the ligand the number of interactions is indicated in brackets.

Residue columns of the table: Ser142, Asn152, Tyr218, His221, Ser222, Glu282. The marks (x) are listed per molecule in row order; their assignment to the individual residue columns is not preserved. Energies are in kcal/mol.

Molecule  Residue marks      Other contacts   Total bond energy*)  Minimised ligand energy**)  Ligand energy alone***)
46        x                                   1.48                 86.04                       64.78
47        x x                                 5.12                 55.51                       38.83
48        x x x x x x                         10.69                66.76                       41.61
49        x x x x (two)      Cys185           13.88                68.03                       48.96
50        x (two)                             2.57                 89.35                       77.47
51        x                                   1.63                 68.64                       63.11
52        x                  Tyr155           4.89                 81.04                       62.97
53        x                  Tyr155           1.60                 77.57                       52.96
54        x x x              Tyr155, Cys185   14.09                89.07                       68.03
55        x x x              Tyr155           7.65                 87.99                       54.24
56        x x                Tyr155, Cys185   12.99                111.03                      66.06
57        x x x                               9.35                 52.66                       44.73
58        x x                Met193           7.29                 47.99                       39.92
59        x x x              Met193           12.88                54.11                       45.07
60        x x                                 9.21                 40.91                       29.65
61        x x                                 3.71                 46.54                       39.53
62        x                                   5.66                 49.17                       44.11
63        x                                   0.23                 38.88                       33.04
64        x                                   3.59                 41.76                       40.13
65        x                                   1.08                 54.62                       51.50
66        x                                   0.97                 53.94                       52.25
67        x x                                 7.21                 31.98                       31.05

*) Total bond energy values are relative hydrogen bond strengths
**) Ligand energy after minimisation within the active site
***) Ligand energy alone without active site restriction

After the energy minimisation all molecules were reloaded into the HIPPO module and analysed as a ligand (HIPPO score). The ‘Vertex score table’ of the ALLIGATOR module was also examined for possible van der Waals clashes and molecules with poor interactions were rejected. The evaluation of the results was based on the predicted binding affinity as well as the complexity and estimated synthetic accessibility of the structures. It should be noted that the SPROUT hydrogen bond interaction information of the molecules slightly differs from the Moloc information (Table 18).

Table 18. The SPROUT hydrogen bond information of the molecules after energy minimisation and predicted binding affinity (score) value information from the HIPPO module.

Molecule | Hydrogen bonds (x) to Asn152, Tyr218, His221, Glu282 | Other contacts | Score (HIPPO)
46 | | Cys185 | -9.29
47 | x x x | | -11.21
48 | x x x x | Cys185 | -13.51
49 | x | Cys185 | -12.03
50 | x x | | -10.12
51 | x x | | -9.15
52 | x x | | -9.21
53 | | | -8.94
54 | x x | Tyr155 | -10.24
55 | x x | Tyr155 | -11.45
56 | x (two) x x | Cys185 | -12.16
57 | x x | | -7.94
58 | x x | | -8.14
59 | x | | -7.63
60 | x | | -8.45
61 | x x x | | -8.12
62 | x | | -7.14
63 | | | -7.41
64 | x | | -6.86
65 | x | | -7.41
66 | x x | | -7.35
67 | x x | | -7.21


Despite their easy synthetic accessibility and the low energy difference between MLE and LEA, molecules 57 to 67 were discarded because of their lack of hydrogen bond interactions and poor scoring values. The predicted binding values of these molecules were almost identical to the original E2 1 binding value (-7.61). Molecules 48 and 56 were selected for more detailed examination on the basis of their good scoring values and interaction information. Because of the considerable energy difference between MLE and LEA, some insignificant substituents were changed or simplified and another optimisation run was carried out within the active site. In silico docking simulations were also carried out (see section 5.2.4, page 152).

Skeletons 29 and 32 were relatively similar in structure. Molecules 68 to 77 (Appendix III) were analogues of skeleton 29 and molecules 78 to 81 were analogues of skeleton 32. Moloc hydrogen bond and minimisation information for these molecules is collected in Table 19. The energy difference between MLE and LEA was insignificant; the conformations of these molecules within the active site were therefore good, being rather close to their local minimum. However, the hydrogen bond interaction information was deficient.

Table 19. The Moloc hydrogen bond interaction and energy information of molecules 68 to 81.

Molecule | Other amino acid | Hydrogen bonds (x) to Met193, Ser222, Arg258 (N1, N2), Glu282 | Total bond energy*) (kcal/mol) | Minimised ligand energy**) (kcal/mol) | Ligand energy alone***) (kcal/mol)
68 | | x x | 18.64 | 59.73 | 57.76
69 | | x x | 4.16 | 63.31 | 57.78
70 | | x x | 5.30 | 57.32 | 48.38
71 | | x x x | 8.79 | 52.90 | 48.92
72 | Ser142 | x x | 12.90 | 57.29 | 57.30
73 | | x x | 15.13 | 57.53 | 57.48
74 | | x | 11.08 | 52.96 | 51.99
75 | | x | 10.76 | 66.13 | 65.66
76 | | x x | 15.28 | 58.79 | 56.76
77 | | x x | 15.00 | 54.17 | 52.26
78 | | x x | 11.67 | 44.44 | 40.49
79 | Val188 (N) | x x | 13.34 | 43.58 | 40.06
80 | Tyr155 | x x | 21.44 | 41.94 | 35.59
81 | Glu144 (N) | x | 21.18 | 56.25 | 41.46

*) Total bond energy values are relative hydrogen bond strengths
**) Ligand energy after minimisation within active site
***) Ligand energy alone without active site restriction

The hydrogen bond interaction information from the HIPPO analysis (Table 20) was also relatively poor and the scoring values were modest compared to the value of the original ligand EQU 2 (-8.27). Nine of these fourteen structures have a poorer (less negative) scoring value than EQU 2. The information also reflects the fact that arginine 258 was selected as a target site when the skeleton was generated. This hydrogen bond interaction seems to be rather strong. Moreover, the scoring value improved when the molecule was capable of forming other interactions, especially with glutamate 282.

Table 20. Hydrogen bond interaction and scoring value information identified by HIPPO.

Molecule | Other residues | Hydrogen bonds (x) to His221, Glu282 (O1, O2), Arg258 (N1, N2) | Score (HIPPO)
68 | | x x | -8.86
69 | | | -6.66
70 | | x | -7.08
71 | | x | -7.39
72 | | x | -7.51
73 | | x | -7.57
74 | | x | -7.46
75 | | x | -7.56
76 | | x x | -8.20
77 | | x x | -8.15
78 | Gly186 (O) | x x | -9.62
79 | | x x | -9.24
80 | | x x x x | -9.40
81 | Gly144 (N) | x x | -10.38
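The comparison with the reference ligand can be reproduced directly from the scores in Table 20. The following is a minimal sketch in Python; the function name `poorer_than` is chosen here for illustration only and is not part of SPROUT or HIPPO. It relies on the convention used throughout this section that a more negative HIPPO score indicates better predicted binding.

```python
# HIPPO scores for molecules 68 to 81, copied from Table 20.
hippo_scores = {
    68: -8.86, 69: -6.66, 70: -7.08, 71: -7.39, 72: -7.51,
    73: -7.57, 74: -7.46, 75: -7.56, 76: -8.20, 77: -8.15,
    78: -9.62, 79: -9.24, 80: -9.40, 81: -10.38,
}

EQU_REFERENCE = -8.27  # HIPPO score of the original ligand EQU 2


def poorer_than(scores, reference):
    """Return the molecules whose score is less negative than the reference."""
    return sorted(mol for mol, score in scores.items() if score > reference)


poorer = poorer_than(hippo_scores, EQU_REFERENCE)
print(len(poorer), "of", len(hippo_scores), "molecules score poorer than EQU 2:", poorer)
```

The same helper can be applied to the other HIPPO tables by exchanging the score dictionary and the reference value.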

Due to reliable hydrogen bonding information by both Moloc and SPROUT and its satisfactory scoring value, molecule 80 was selected for more detailed examination (see section 5.2.3.2, page 144).

Molecules 82 to 85 were derived from skeleton 36, whereas molecules 86 to 90 were derived from skeleton 37 (Appendix IV). Only two analogues based on skeleton 39 were examined. The amino acid residues surrounding skeleton 36 were similar to the residues interacting with skeleton 39 (see Figure 68 C., page 137), especially Ser142, Asn152, His221 and Tyr218. The energy values for these molecules were relatively good, but the number of hydrogen bond interactions and the total bond energies were modest (Table 21, page 142).

Table 21. Moloc hydrogen bond interaction and energy information of the molecules.

Molecule | Hydrogen bonds (x) to Ser142, Asn152, Tyr155, Tyr218, His221 | Total bond energy*) (kcal/mol) | Minimised ligand energy**) (kcal/mol) | Ligand energy alone***) (kcal/mol)
82 | x | 5.89 | 53.30 | 49.64
83 | x x | 8.83 | 40.01 | 35.91
84 | x x | 7.98 | 44.21 | 41.69
85 | x x x | 9.78 | 44.10 | 42.80
86 | x x x | 10.84 | 68.53 | 61.97
87 | x x | 8.28 | 70.94 | 68.22
88 | x x | 7.88 | 70.52 | 66.85
89 | x x x | 10.69 | 69.87 | 69.98
90 | x x | 8.31 | 71.12 | 67.71
91 | x x x | 10.54 | 60.83 | 59.36
92 | x x | 8.44 | 67.88 | 63.74

*) Total bond energy values are relative hydrogen bond strengths
**) Ligand energy after minimisation within active site
***) Ligand energy alone without active site restriction

The HIPPO scoring values were modest compared to the scoring values of the original ligands. In addition, the number of hydrogen bond interactions (one to three) was low (Table 22). Asn152 was used as a target site in the structure generation, and both analyses indicate that asparagine 152 was the most important amino acid residue for binding these molecules, which also increased the total bond energy value.

Table 22. The SPROUT hydrogen bond information and scoring values of the molecules.

Molecule | Hydrogen bonds (x) to Gly186, Asn152, Tyr155, Tyr218, His221 | Score (HIPPO)
82 | x x | -8.37
83 | x x | -8.60
84 | x x | -8.22
85 | x x | -7.28
86 | x | -6.80
87 | x x | -7.69
88 | x | -6.94
89 | x x x | -8.29
90 | x x x | -7.12
91 | x | -7.14
92 | x x | -8.92


Molecule 83 was selected for further analysis because of its relatively good HIPPO score and relatively low MLE and LEA values.

The remaining analogues (Appendix V) were derived from a diverse group of skeletons. Molecules 93 to 96 were analogues of skeleton 40, molecules 97 and 98 were derived from skeleton 41 and analogues 99 to 102 were created by substituting skeleton 42. Molecules 103 to 105 were analogues of skeleton 43, molecules 106 and 107 were analogues of skeleton 45 and molecules 108 and 109 were analogues of skeleton 46. The energy difference between MLE and LEA was reasonable in many cases because of the rigidity of the benzene and naphthalene rings. This rigidity is also the reason why the hydrogen bond interactions were so poor: the substituents could not reach all donor and acceptor sites at a suitable angle (Table 23).

Table 23. The Moloc hydrogen bond interaction and energy information of the molecules.

Molecule | Other amino acids | Hydrogen bonds (x) to Tyr218, His221, Met279, Glu282 (O1, O2) | Total bond energy*) (kcal/mol) | Minimised ligand energy**) (kcal/mol) | Ligand energy alone***) (kcal/mol)
93 | | x | 0.49 | 22.92 | 16.53
94 | | x x | 1.20 | 22.78 | 16.27
95 | | x | 4.96 | 25.72 | 15.64
96 | | x | 1.07 | 24.82 | 15.36
97 | | x x | 3.38 | 42.00 | 38.72
98 | | x | 3.66 | 39.75 | 36.53
99 | | | - | 10.33 | 10.74
100 | | x | 4.71 | 11.04 | 11.21
101 | Asn152 | x | 8.66 | 12.10 | 11.71
102 | | x | 4.76 | 12.42 | 11.75
103 | | x | 6.34 | 30.39 | 18.06
104 | | | - | 27.27 | 18.14
105 | | x | 6.19 | 28.10 | 17.88
106 | Leu95 | x | 3.78 | 50.94 | 42.60
107 | | x | 2.13 | 49.87 | 42.75
108 | | x | 5.04 | 58.27 | 44.79
109 | | x | 5.12 | 58.55 | 45.13

*) Total bond energy values are relative hydrogen bond strengths
**) Ligand energy after minimisation within active site
***) Ligand energy alone without active site restriction

Consequently, only three molecules had better scoring values than the original ligand (-8.83), and at most two hydrogen bond interactions were identified for any molecule because of the rigid nature of the molecules (Table 24). In contrast, the hydrophobic interactions were rather strong owing to the aromatic nature of the molecules.

Table 24. Hydrogen bond interactions and the HIPPO scoring value of the molecules.

Molecule | Hydrogen bonds (x) to Leu95, Tyr218, His221, Glu282 (O1, O2) | Score (HIPPO)
93 | x | -8.71
94 | x x | -9.29
95 | x x | -8.77
96 | x x | -9.07
97 | x | -8.76
98 | | -7.80
99 | | -8.13
100 | x | -8.41
101 | x | -8.61
102 | | -7.97
103 | x x | -9.24
104 | x x | -8.49
105 | x x | -8.78
106 | x | -8.51
107 | x | -8.69
108 | x x | -8.37
109 | x x | -8.24

Despite its low total bond energy, molecule 94 was selected for further analysis because of its good scoring value and two sound hydrogen bond interactions.

5.2.3.2 Energy optimisation and further analysis of selected molecules

Molecules 48, 56, 80, 83 and 94 were modified and analysed further in order to find optimal hydrogen bond interactions between the ligand and the receptor. Several new analogues were derived from these molecules and the energy optimisation runs were repeated. Examples of the intermediate analogues (48a, 56a, 80a, 83a and 94a) examined on the way to the final molecules are shown in Figure 69 (page 145), together with the final analogues 110, 111, 112, 113 and 114.


[Structure drawings, one row per series: 48, 48a, 110; 56, 56a, 111; 80, 80a, 112; 83, 83a, 113; 94, 94a, 114]

Figure 69. Molecules selected for the further optimisation and in silico docking simulations (48a, 56a, 80a, 83a, 94a and their analogues 110, 111, 112, 113 and 114).

Many different analogues were derived from the selected molecules 48, 56, 80, 83 and 94. Molecules 48a, 56a, 80a, 83a and 94a were selected from these groups of analogues for closer examination. Energy optimisation runs were carried out for every molecule in the receptor active site to find the best interactions. The amino acid residues of the receptor active site were fixed during the run. In addition, a number of optimisation runs were carried out in which the active site was a fixed shell except for a few important amino acid residues, such as Ser142, Asn152, Tyr155, Tyr218, His221, Ser222, Arg258 or Glu282, which were allowed to move.

Most of these new analogues (48a, 56a, 80a, 83a and 94a) gave better energy optimisation and HIPPO analysis results after the modifications. As expected, several ligand energies were higher than after the previous runs because of the additional substituents or functional groups. The Moloc optimisation run information for molecules 48, 56, 80, 83, 94, 48a, 56a, 80a, 83a and 94a is collected in Table 25 a. and b. The information on molecules 48, 56, 80, 83 and 94 is taken from Tables 17, 19, 21 and 23 and is presented here in one table.

Table 25. a) Moloc energy optimisation (E kcal/mol) results for molecules 48, 56, 80, 83 and 94 (collected from Tables presented previously). b) Energy optimisation information for analogues 48a, 56a, 80a, 83a and 94a.

a. Molecule | Ligand energy within receptor | Ligand energy alone | ∆E | Hydrogen bond interactions | Relative H-bond energy
48 | 66.76 | 41.61 | 25.15 | Ser142, Asn152, Tyr218, His221, Ser222, Glu282 | 10.69
56 | 111.03 | 66.06 | 44.97 | Tyr155, Cys185, Tyr218, Ser222 | 12.99
80 | 41.94 | 35.59 | 6.35 | Tyr155, Arg258, Glu282 | 21.44
83 | 40.01 | 35.91 | 4.1 | Ser142, Asn152 | 8.83
94 | 22.78 | 16.27 | 6.51 | Tyr155, Glu282 | 1.20

b. Molecule | Ligand energy within receptor | Ligand energy alone | ∆E | Hydrogen bond interactions | Relative H-bond energy
48a | 67.20 | 47.20 | 20.0 | Leu95, Tyr218 (2)*, His221, Ser222 | 10.59
56a | 120.63 | 63.26 | 57.37 | Cys185, Tyr155, Tyr218, His221 | 14.40
80a | 44.08 | 24.87 | 19.21 | Tyr155, Tyr218, Arg258, Met279, Glu282 (2)* | 54.14
83a | 24.13 | 22.39 | 1.74 | Ser142, Asn152, Gly186, His221 | 10.97
94a | 50.37 | 32.89 | 17.48 | Asn152, Tyr155, Gly186, Tyr218 (2)*, Glu282 | 16.10

* In the cases where there is more than one interaction between ligand and amino acid residues the number of the connections is indicated in brackets.
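The ∆E column of Table 25 is the difference between the ligand energy within the receptor (MLE) and the ligand energy alone (LEA). A minimal sketch in Python of this bookkeeping is given below; the energies are copied from Table 25, whereas the function name `strain_energy` is introduced here for illustration and is not a Moloc command.

```python
# (MLE, LEA) pairs in kcal/mol, copied from Table 25 a. and b.
energies = {
    "48": (66.76, 41.61), "56": (111.03, 66.06), "80": (41.94, 35.59),
    "83": (40.01, 35.91), "94": (22.78, 16.27),
    "48a": (67.20, 47.20), "56a": (120.63, 63.26), "80a": (44.08, 24.87),
    "83a": (24.13, 22.39), "94a": (50.37, 32.89),
}


def strain_energy(mle, lea):
    """Energy difference (kcal/mol) between the bound and the free ligand minimum."""
    return round(mle - lea, 2)


delta_e = {mol: strain_energy(mle, lea) for mol, (mle, lea) in energies.items()}

# List the molecules from the least to the most strained bound conformation.
for mol, value in sorted(delta_e.items(), key=lambda item: item[1]):
    print(f"{mol:>4}  dE = {value:6.2f} kcal/mol")
```

A small ∆E, as for 83a, means that the conformation adopted in the active site is close to the minimum of the free ligand, whereas a large ∆E, as for 56a, points to a strained bound conformation.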

HIPPO analysis was also carried out for analogues 48a, 56a, 80a, 83a and 94a. The scoring values were generally better for the new analogues and new hydrogen bond interactions were also observed. The hydrogen bond and HIPPO scoring information for molecules 48, 56, 80, 83, 94, 48a, 56a, 80a, 83a and 94a is shown in Table 26 a. and b. The information on molecules 48, 56, 80, 83 and 94 is taken from Tables 18, 20, 22 and 24.

Table 26. a) HIPPO ligand analysis results for the molecules 48, 56, 80, 83 and 94 (collected from tables presented previously). b) Hydrogen bond interactions and HIPPO score for molecules 48a, 56a, 80a, 83a and 94a.

a. Molecule | HIPPO Score | Hydrogen bond interactions
48 | -13.51 | Asn152, Cys185, Tyr218, His221, Glu282
56 | -12.16 | Asn152 (2)*, Cys185, Tyr218, His221
80 | -9.40 | His221, Arg258, Glu282 (2)*
83 | -8.60 | Asn152, His221
94 | -9.29 | Tyr155, Glu282

b. Molecule | HIPPO Score | Hydrogen bond interactions
48a | -13.87 | Tyr155, Tyr218, His221, Glu282
56a | -11.83 | Asn152, Cys185, Tyr218, His221
80a | -9.48 | Gly186, His221, Arg258
83a | -10.44 | Ser142, Asn152, Gly186, His221
94a | -9.93 | Asn152, Gly186, Tyr218, Glu282

* In the cases where there is more than one interaction between ligand and amino acid residues the number of the connections is indicated in brackets.

Although both carboxyl groups of 48 formed hydrogen bond interactions with Asn152 or Tyr218, one of them was removed in molecules 48a and 110, mainly because the group was too bulky and caused van der Waals overlaps with these amino acids. According to the Moloc and HIPPO analyses all remaining substituents form hydrogen bond interactions with the receptor except the amide group, which was added to facilitate the retrosynthesis and synthesis of the molecule. Regardless of the changes made to improve binding (pyrrole rings in 48a), the molecule is not hydrophobic enough for this active site. It also includes too many stereocentres, which complicate the synthesis plan. As a result this molecule was discarded.

[Structure drawings: 48, 48a, 110]

The substituents on the naphthalene ring of molecule 56 were bound so tightly that the aromatic ring system became twisted during the energy optimisation. To release some of this strain, one end of the naphthalene ring was opened (56a and 111). Nevertheless, all substituents were retained. Regardless of these modifications, and of a pyrrole ring modification in 56a, the aromatic ring remained twisted during the optimisation and the molecule lost some of its hydrogen bond interactions. As a result the molecule was discarded.

[Structure drawings: 56, 56a, 111]

The substituents of molecule 80 were modified by adding a hydroxy group and changing the amine to an amide group to form 80a. The furan ring was also replaced by a double bond because the oxygen would not interact with the receptor. The substituents of molecule 80a were modified by changing the amidine groups to amide groups to form 112. A -CF3 group was also added to improve binding to the active site. Molecule 112 was studied in the 1EQU crystal structure with and without cofactor 6. The binding was better with the cofactor present in the catalytic end of the active site. The interactions of the molecule are good; however, 112 may not be hydrophobic enough.

[Structure drawings: 80, 80a, 112]

In 83a the furan ring of molecule 83 was exchanged for a phenol ring and two substituents. Hydroxyl and amine groups were added to the naphthalene ring in place of the acetyl group to ensure two more hydrogen bond interactions between the receptor and molecule 83a. For molecule 113, the position and type of the bond connecting the naphthalene and phenol rings were changed. In addition, the amidine group was changed to an amide to facilitate the retrosynthesis.

[Structure drawings: 83, 83a, 113]

In 94a the hydroxy and amine groups of molecule 94 were replaced with two amine and two amidine groups. Because of possible difficulties with the synthesis, the two amidine groups were changed to one amidine group and a –CF3 group to form 114.

[Structure drawings: 94, 94a, 114]

The energy difference (∆E) between the local minimum of the ligand within the active site and its minimum without the active site is rather high for the final molecules 112 and 114 (Table 27, page 150). According to the Moloc energy optimisation result molecule 113 is rather rigid, because its LEA value is nearly the same as its MLE (∆E = 5.1). The relative bond energy values are rather good for 112 and 113. Even though the relative hydrogen bond interaction energy was relatively low for 114, the ligand forms four interactions with the receptor. HIPPO ligand analysis (Table 28, page 150) yielded good scoring values for all three molecules, especially for 112 (predicted binding affinity -15.01, 98 fM). All predicted binding values were between -10.22 and -15.01 (0.6 pM – 98 fM) and considerably better than the values of the original ligands analysed (EQU 2 -8.27, DHT 3 -8.62 and DHEA 4 -8.83, respectively). According to the HIPPO analysis and the Moloc evaluation, the same amino acid residues form the hydrogen bond interactions between the protein and the ligand in the case of 113 and 114. For 112 the hydrogen bond information differs slightly between the two analyses.


Table 27. The Moloc energy optimisation (E kcal/mol) results for the final molecules 112, 113 and 114.

Molecule | Ligand energy within receptor | Ligand energy alone | ∆E | Hydrogen bond interactions | Relative H-bond energy
112 | 54.28 | 29.17 | 24.50 | Tyr155, Arg258, His221 | 21.71
113 | 21.58 | 16.48 | 5.1 | Ser142, Asn152, Cys185, Gly186, His221 | 15.68
114 | 36.33 | 24.12 | 12.21 | Gly186, Tyr218 (2)*, Glu282 | 9.19

* In the cases where there is more than one interaction between ligand and amino acid residues the number of the connections is indicated in brackets.

Table 28. The HIPPO ligand analysis results for the final molecules 112, 113 and 114.

Molecule | HIPPO Score | Hydrogen bond interactions
112 | -15.01 | Ser142, Gly186, Tyr218, His221
113 | -10.72 | Ser142, Asn152, Cys185, Gly186, His221
114 | -10.22 | Asn152, Gly186, Tyr218 (2)*, Glu282

* In the cases where there is more than one interaction between ligand and amino acid residues the number of the connections is indicated in brackets.

Two skeletons were selected from the results of the new SPROUT structure generation simulation for the DHT complex (section 5.2.2.5, page 135). Molecules 115 and 116 were derived from these skeletons by replacing with nitrogen atoms those carbon atoms that should act as hydrogen bond acceptors.

[Structure drawings: 115, 116]

These molecules fit very well into the boundary surface. The ALLIGATOR Vertex Score Table detected no van der Waals clashes between the protein and the molecules. An energy optimisation simulation within the active site is therefore not strictly necessary for these two molecules. Nevertheless, energy optimisation runs were carried out and the results are shown in Table 29. Both molecules were highly hydrophobic, and their energy values after the Moloc evaluation were thus rather high: 182.13 kcal/mol for 115 and 93.96 kcal/mol for 116. The HIPPO scoring value was -11.65 for 115 (Figure 70) and -11.92 for 116, and both molecules formed four hydrogen bond interactions (Table 30).

Table 29. The Moloc energy optimisation and hydrogen bond interactions for 115 and 116.

Molecule | Ligand energy within receptor | Hydrogen bond interactions | Relative H-bond energy
115 | 182.13 | Tyr155, Tyr218, His221 | 14.38
116 | 93.96 | Asn152, Tyr155, Tyr218, His221, Met279 | 16.96

Table 30. The HIPPO ligand analysis for 115 and 116.

Molecule | HIPPO Score | Hydrogen bond interactions
115 | -10.54 | Asn152, Tyr155, Tyr218, His221
116 | -10.60 | Asn152, Tyr155, Tyr218, His221

In silico docking simulations for molecules 112, 113, 114, 115 and 116 were carried out and the results are presented in the next section.

Figure 70. a) The final molecule 115 in the enzyme active site surrounded by the electrostatic potential surface (WebLab ViewerPro 3.7 image). b) Molecule 115 alignment with DHT 3 (green) molecule structure. c) Alignment of the molecules 113 (lilac), 115 (grey) and DHT 3 (green).

5.2.4 Docking studies

Methods which dock a ligand into the protein binding site are useful for predicting the conformations that a ligand can take. Docking simulation algorithms include many approximations, so the best-scoring conformation is not always the only or the correct solution. Instead, the method is a good way to check whether the molecule is structurally feasible and of the right size to fit into the active site. A good 17βHSD/KSR1 inhibitor molecule should lack activity at the estrogen receptors (ERs). It was therefore important to examine the docking of molecules 112 to 116 into the ERα and ERβ substrate binding sites. In silico docking simulations were also applied to study how well the possible inhibitor molecules fit into the 17βHSD/KSR1 active site. The AutoDock and eHiTS docking simulation programs were used for this study. Information from the SPA-Docking program was also used as support. The AutoDock results were considered more reliable than the eHiTS results, since AutoDock has been widely used and there are many examples of its successful application in the literature. Hundreds of publications have cited the AutoDock methods papers.280 The eHiTS program seems to have problems especially when the ligand is flexible.

The AutoDock docking algorithm used was GA-LS, a hybrid of a genetic algorithm with local search. This algorithm is also known as the Lamarckian genetic algorithm (LGA). eHiTS uses a systematic algorithm with no random, stochastic or evolutionary element. The eHiTS system generates, for each candidate structure, all major docking modes that are compatible with the steric and chemical constraints of the target cavity. In the SPA-docking program the algorithm is a combination of simulated annealing, evolutionary and local search methods.
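The idea of combining a genetic algorithm with local search can be illustrated with a toy example. The following Python sketch is not AutoDock's implementation: the one-dimensional `energy` function, the population size, the mutation width and all function names are assumptions chosen purely for illustration. The Lamarckian element is that the result of the local search is written back into the individual, so that the improvement is inherited by its offspring.

```python
import random


def energy(x):
    """Toy 'docking energy': a one-dimensional function with its minimum at x = 2."""
    return (x - 2.0) ** 2


def local_search(x, step=0.1, iterations=20):
    """Simple descent: move to a neighbouring point only if it lowers the energy."""
    for _ in range(iterations):
        for candidate in (x - step, x + step):
            if energy(candidate) < energy(x):
                x = candidate
    return x


def lamarckian_ga(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Lamarckian step: the locally optimised value replaces the individual.
        population = [local_search(x) for x in population]
        # Selection: keep the better half as parents.
        population.sort(key=energy)
        parents = population[: pop_size // 2]
        # Crossover (average of two parents) followed by Gaussian mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(0.5 * (a + b) + rng.gauss(0.0, 0.5))
        population = parents + children
    return min(population, key=energy)


best = lamarckian_ga()
print("best x:", round(best, 3), "energy:", round(energy(best), 5))
```

In a real docking run the single variable would be replaced by the translation, orientation and torsion angles of the ligand, and the toy function by the force-field-based docking energy.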

5.2.4.1 Docking simulations into the 17βHSD/KSR1 active site

AutoDock simulations were carried out in a 40 Å × 40 Å × 40 Å grid box (Figure 71, page 153). During the docking simulation the receptor was rigid, and it was ensured that the important amino acid residues were within the grid box. The best fitted conformations of molecules 112, 113, 114, 115 and 116 in 17βHSD/KSR1 are summarised in Table 31 (page 153). The AutoDock value in the first row of the table is the docking energy. Docking energies produced by different programs cannot be compared directly, and the HIPPO score was therefore also determined for these docking results.

Figure 71. The grid box used for the AutoDock docking simulations (40 Å × 40 Å × 40 Å; green-blue-red squares), surrounding the active site of 17βHSD/KSR1.
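The check that the important residues lie inside the grid box amounts to a simple coordinate test. A minimal sketch in Python follows; the box centre and the atom coordinates are hypothetical values used only for illustration (they are not taken from the crystal structure), and `inside_grid_box` is not an AutoDock function.

```python
def inside_grid_box(point, center, edge=40.0):
    """True if a point (x, y, z in Å) lies within a cubic box of the given edge length."""
    half = edge / 2.0
    return all(abs(p - c) <= half for p, c in zip(point, center))


# Hypothetical box centre and residue atom positions (Å), for illustration only.
box_center = (0.0, 0.0, 0.0)
residue_atoms = {
    "Ser142": (4.2, -7.5, 3.1),
    "Tyr155": (6.8, -3.9, 1.4),
    "His221": (-9.6, 5.2, -2.7),
    "Glu282": (-12.3, 8.8, -4.0),
}

outside = [name for name, xyz in residue_atoms.items()
           if not inside_grid_box(xyz, box_center)]
print("residues outside the grid box:", outside)
```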

Table 31. The results of the AutoDock program for molecules 112, 113, 114, 115 and 116 docked into the 17βHSD/KSR1 with HIPPO score values for the molecule in question.

Molecule | 112 | 113 | 114 | 115 | 116
AutoDock (Docking Energy) | -11.5 | -13.1 | -12.4 | -12.8 | -12.8
HIPPO (Scoring value) | -16.4 | -11.9 | -8.9 | -11.4 | -10.1

The HIPPO scoring value for molecule 112 is very good after the docking simulation. The alignment of the SPROUT generated molecule (brown) and the AutoDock result conformation (blue) is shown in Figure 72 (page 154). Molecules 113 and 115 also have rather good scoring values. On the other hand, the scoring value for molecule 116 is slightly poorer than after minimisation, and the scoring value for 114 is rather moderate compared to its original scoring value. According to the ALLIGATOR ‘Vertex score table’, no van der Waals clashes were observed for any of these docked molecules.


Figure 72. Alignment of the SPROUT generated molecule 112 (brown, scoring value -15.0) and the best AutoDock result for molecule 112 (blue, scoring value -16.4).

For most of these molecules the eHiTS program returned no solutions. It is difficult to conclude why the program was unable to fit the molecules into the active site. However, it is probable that the molecules were rejected because of steric clashes between molecule and receptor. Earlier docking simulations with the SPA-docking program gave reasonable docking results for molecules 80a, 83a and 94a. The HIPPO score values for the best results were also rather good. Unlike the eHiTS results, these results supported the AutoDock results in the sense that, according to the SPA-dock program, it was possible to dock these molecules into the active site. The SPA-dock docking energy values were consistent with the HIPPO score values. However, there was no consistency between the AutoDock docking energy values and the HIPPO score values (Table 32), nor between the AutoDock and SPA-dock results.

Table 32. Docking simulations made for 80a, 83a and 94a using the AutoDock and SPA- dock programs.

Molecule | 80a | 83a | 94a
AutoDock (DE) | -12.0 | -13.5 | -14.1
HIPPO score | -13.6 | -10.2 | -13.1
SPA-dock (DE) | -62.9 | -47.7 | -42.4
HIPPO score | -11.9 | -10.1 | -9.3
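The consistency between two sets of values can be expressed as a rank correlation. The sketch below, written in Python, computes Spearman's rank correlation coefficient for the values of Table 32; this statistic is introduced here only as an illustration and was not part of the original analysis. With only three molecules the coefficient is merely indicative.

```python
def ranks(values):
    """Rank the values from 1 (most negative) upwards; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation coefficient for two equally long lists without ties."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Values for 80a, 83a and 94a, copied from Table 32.
autodock_de = [-12.0, -13.5, -14.1]
autodock_hippo = [-13.6, -10.2, -13.1]
spa_de = [-62.9, -47.7, -42.4]
spa_hippo = [-11.9, -10.1, -9.3]

print("SPA-dock DE vs HIPPO:   ", spearman(spa_de, spa_hippo))
print("AutoDock DE vs HIPPO:   ", spearman(autodock_de, autodock_hippo))
print("AutoDock DE vs SPA-dock:", spearman(autodock_de, spa_de))
```

A coefficient of 1 means that both quantities rank the molecules identically, whereas a negative coefficient means that the rankings are largely reversed.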


5.2.4.2 Docking simulations into the estrogen receptor α and β active sites

The AutoDock program was also applied to study in silico how the molecules fit into the estrogen receptor substrate binding site. A good inhibitor molecule is selective for 17βHSD/KSR1, and binding to the ER substrate binding site is undesirable. All five molecules were docked into ERα and ERβ using the same docking algorithm. The docking energies and HIPPO scoring values of molecules 112, 113, 114, 115 and 116 are shown in Table 33.

Table 33. Docking results of the five molecules docked into the ERα and ERβ.

Receptor | Molecule | AutoDock (Docking Energy) | HIPPO (Scoring value)
ERα | 112 | -8.3 | -9.2
ERα | 113 | -10.7 | -9.5
ERα | 114 | -6.8 | -10.1
ERα | 115 | -6.1 | -9.4
ERα | 116 | 1.4 | -9.4
ERβ | 112 | -8.8 | -8.6
ERβ | 113 | -11.0 | -9.9
ERβ | 114 | -9.2 | -10.0
ERβ | 115 | -12.4 | -10.5
ERβ | 116 | -12.1 | -10.4

The originally de novo generated conformations of 112, 113 and 115 had better scoring values than the docked conformations in the ERα and ERβ substrate binding sites. The original scoring value for molecule 116 was slightly better than that of the docked conformation in the ERα substrate binding site, whereas the conformation in ERβ had the same scoring value as the original molecule 116. Molecule 114 had a rather good original scoring value (Table 28, page 150), although after the docking simulation into 17βHSD/KSR1 the scoring value was rather poor (Table 31). Both ERα and ERβ bound molecule 114 well and the HIPPO scoring values after docking simulation were good (Table 33). Because of this possible activity at the estrogen receptors, 114 is not a good inhibitor for 17βHSD/KSR1. Molecule 116 likewise showed similar predicted activity for the estrogen receptors and for 17βHSD/KSR1, and is therefore not a good inhibitor molecule for the 17βHSD/KSR1 enzyme.

Molecules 112, 113 and 115 had good predicted binding activity values and were worth synthesising for a more detailed biological evaluation. It is noteworthy that the scoring values after docking of molecules 112, 113 and 115 into ERα and ERβ were poorer than those of the original conformations (Table 28, page 150); however, the values are better than the original natural ligand scores in every case (EQU 2 -8.27, DHT 3 -8.62 and DHEA 4 -8.83, respectively).
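The selectivity argument can be summarised numerically by comparing, for each molecule, the HIPPO score of the conformation docked into 17βHSD/KSR1 (Table 31) with the better of its two estrogen receptor scores (Table 33). The Python sketch below uses a simple margin criterion, which is an assumption made here for illustration and not a rule stated in the analysis above; the variable and function names are likewise illustrative.

```python
# HIPPO scores after docking, copied from Table 31 (17βHSD/KSR1) and Table 33 (ERα, ERβ).
target_scores = {112: -16.4, 113: -11.9, 114: -8.9, 115: -11.4, 116: -10.1}
er_alpha = {112: -9.2, 113: -9.5, 114: -10.1, 115: -9.4, 116: -9.4}
er_beta = {112: -8.6, 113: -9.9, 114: -10.0, 115: -10.5, 116: -10.4}


def selectivity_margin(mol):
    """Best (most negative) ER score minus the target score.

    A positive margin means that the molecule scores better in 17βHSD/KSR1
    than in either estrogen receptor.
    """
    best_er = min(er_alpha[mol], er_beta[mol])
    return best_er - target_scores[mol]


margins = {mol: round(selectivity_margin(mol), 1) for mol in target_scores}
selective = sorted(mol for mol, margin in margins.items() if margin > 0)

print("margins:", margins)
print("predicted to prefer 17βHSD/KSR1:", selective)
```

With these values the criterion retains molecules 112, 113 and 115 and rejects 114 and 116, in line with the selection made above.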

5.2.5 Retrosynthesis and synthesis plan

The retrosynthesis route and synthesis plan are presented here for molecule 115. Scheme 15 shows the retrosynthesis of 115. The retrosynthesis was started with a Functional Group Interchange (FGI), in which the amide group was changed into a nitrile group. After this, two C-C disconnections were carried out to detach the aromatic ring moieties from each other.

[Scheme drawing: molecule 115; FGI to the nitrile; two C-C disconnections to the boronic acid and aromatic fragments]

Scheme 15. Retrosynthesis for 115.

The synthesis plan (Scheme 16) was made using the most general synthesis reactions available. The starting materials were the commercially available 2,7-dihydroxy-9H-carbazole, 3-bromo-5-hydroxybenzonitrile and 1H-pyrrol-2-ylboronic acid, as shown in Scheme 15. The main reaction was the Suzuki coupling (applied twice), which can connect two aromatic ring moieties.


[Scheme drawing; reagents as drawn, in order: MeI, K2CO3, acetone; (CF3SO2)2O, K3PO4, PhMe; BnCl, CaCO3, acetone; BuLi, B(OH)3; Pd(PPh3); Pd, H2; (CF3SO2)2O, K3PO4, PhMe; Pd(PPh3); H2SO4, H2O; BBr3; product 115]

Scheme 16. Synthetic plan for molecule 115.

5.2.6 Retrosynthesis by CAESA

Although CAESA (v2.4) is still under development, the program was used to predict retrosynthesis routes for molecule 115. The LHASA program mentioned earlier can also produce a similar kind of synthetic plan. The search procedure for the synthesis route is presented here first, and the CAESA results are then compared with the synthetic plan made by a chemist. The program is available via the Internet (http://milin6.leeds.ac.uk/caesa/). On the website the user is able to search the available data files and select the desired molecule file (Figure 73).

Figure 73. The homepage of CAESA retrosynthesis program.

After the search it is then possible to browse the retrosynthetic results and synthesis plans for the targeted molecule. The first window displays the structure of the targeted molecule and provides links to both the ‘synthetic schemes’ page and the ‘reasoning estimation’ page (Figure 74).

Figure 74. For molecule 115 the reasoning estimate is 49.1.

The reasoning page includes estimated numerical data regarding synthetic accessibility, complexity and starting materials (Figure 75).

Figure 75. Only 49% of molecule 115 can be synthesised using commercially available starting materials.

On the ‘synthetic schemes’ page potential starting materials are listed according to their physical characteristics, wastage and synthetic proximity. Possible synthetic routes are established between the starting material compounds and the target structure. The program lists the different ‘starting material groups’ in order. The scoring values (as a percentage) indicate how much of the whole molecule can be synthesised from the starting material in question. The source of the starting material and the reference are also presented (Figure 76). The ‘Intermediates’ link shows the detailed synthesis plan for a certain part (red) of the target molecule (Figure 77, page 155) and the ‘Alternatives’ link lists other similar starting materials (Figure 78, page 156).

Figure 76. Starting materials for molecule 115 presented in order.

Figure 77. Synthetic steps are shown for 115 with the name of the reaction in question.

Figure 78. Alternative commercially available molecules are shown with the reference numbers available through the databases.

The CAESA program identified two different parts of molecule 115 that can be synthesised from commercially available starting materials (Starting Material Group 1, Figure 76, page 154). The retrosynthesis of the phenolic part of molecule 115 is shown in Figure 77, page 155. The aromatic six-membered ring of the carbazole is the other part of the molecule that can be synthesised using known starting materials and methods (Figure 79).

Figure 79. Synthetic steps for the carbazole ring system. Explanations of the reactions and literature references for the synthetic steps, which lead to molecule 115 are also presented.

Alternative commercially available starting materials for the starting material used in the synthetic plan above (Figure 79) are shown in Figure 80.

Figure 80. Alternative commercially available starting materials for the other aromatic ring of the carbazole.

A synthesis plan made by a chemist is much more practical than the CAESA result. The program still has a rather limited knowledge base of chemical synthesis: it can provide a synthesis plan only for some parts of the molecule, while most of the molecule remains a large unresolved fragment. A chemist, by contrast, can make a rather reliable synthetic plan using chemical knowledge and databases of commercially available materials. Some of these databases, such as SciFinder Scholar281 and Beilstein CrossFire282, can also provide information on the publications connected to certain starting materials.

6. CONCLUSIONS AND FUTURE PERSPECTIVES

Computational and knowledge-based molecular modelling is nowadays an important part of drug discovery and development. Molecular modelling programs have reduced the time needed for drug discovery and development. Structure-based de novo design programs have made rational drug design easier by making it possible to generate novel molecules directly in the active site of the target protein. The SPROUT de novo ligand design program was applied in this study. It is easy to use, its visualisation is of a high standard, and it gives rather reliable results according to the modelling studies and laboratory experiments carried out by students of the School of Chemistry (University of Leeds, UK). The Synthetic SPROUT ligand design program is based on the Classic SPROUT program and is under development at ICAMS.

Part of this study concentrated on the creation of a new knowledge base for the SynSPROUT program. SynSPROUT does not have the capability of constructing rings from acyclic precursors. The 1,3-dipolar cycloaddition (1,3-DC) reaction was chosen as a specific area in which to test this desired improvement to the system. A literature study was carried out and the information on these reactions was added to the knowledge base. The knowledge base could not be completed because the programming language lacked the commands required for ring formation. Once such a command string is added, together with the information on 1,3-DC reactions, the functioning of the program is expected to improve.

The SPROUT program was under constant and remarkable development during this study. Other tools of structural biology and computational chemistry were also applied, including the Moloc and AutoDock programs. These two programs were used to evaluate the newly created molecules and to make them fit even better into the active site. The hydrogen bond interaction information given by the Moloc program, before and after optimisation, was similar to that given by the HIPPO module. The AutoDock program produced conformations similar to those from SPROUT structure generation. The results of both programs thus supported the SPROUT structure generation results well.

A large number of novel inhibitor molecules were generated for the 17βHSD/KSR1 enzyme during this study. The study concentrated on the creation of new inhibitor molecules by means of in silico modelling studies. A few promising molecules were found among the modelling results, three of them in particular. Using the SPROUT program for 17βHSD/KSR1 inhibitor structure generation improved the knowledge of the characteristics of the binding site. The novel molecules created by the SPROUT program are particularly promising inhibitor molecules for the 17βHSD/KSR1 enzyme. The next step is to synthesise these molecules and carry out in vitro research. If the in vitro results indicate biological activity in agreement with the modelling studies, it is desirable to continue the research on the promising molecules. If the in vitro results are not satisfactory, minor modifications of the molecules guided by additional computational simulations may help. Other kinds of in silico modelling studies, such as QSAR, may also help to find better physical features (hydrophobicity, electronic and steric factors) for the molecules and to modify them in order to achieve further improvement.

Although the 17βHSD/KSR1 enzyme has been the main interest of the crystallisation studies, there have also been attempts to crystallise other 17βHSD/KSR enzyme types, particularly types 2 and 3. However, these have proved rather difficult to crystallise because the enzymes are membrane-bound. It is reasonable to presume that the topology of type 2 is similar to that of type 1 because it binds similar types of ligands. For this reason the molecules generated during this study may also show biological activity towards 17βHSD/KSR enzyme types other than type 1. Possible biological activity towards the other 17βHSD/KSR enzyme types has to be investigated with in vitro studies, since structure-based modelling studies cannot be carried out without 3-D structure information on the enzyme.


7. REFERENCES

1. I. D. Kuntz, Science, 1992, 257, 1078-1082.
2. J. Greer, J. W. Erickson, J. J. Baldwin and M. D. Varney, J. Med. Chem., 1994, 37, 1035-1054.
3. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne, Nucleic Acids Res., 2000, 28, 235-242.
4. Homepage of the Protein Data Bank www.rcsb.org/pdb/, 2.12.2004.
5. Y. C. Martin, J. Med. Chem., 1992, 35, 2145-2154.
6. O. F. Güner and D. R. Henry, in Encyclopedia of Computational Chemistry, ed. P. von Rague Schleyer, Wiley, UK, 1998, vol 1, p. 2988-3003.
7. V. J. Gillet and A. P. Johnson, Des. Bioact. Mol., 1998, 149-174.
8. Homepage of the MDL database, www.mdli.com, 2.12.2004. MDL Information Systems Ltd.
9. R. S. Bohacek and C. McMartin, J. Am. Chem. Soc., 1994, 116, 5560-5571.
10. A. Caflisch, H. J. Schramm and M. Karplus, J. Comput. Aid. Mol. Des., 2000, 14, 161-179.
11. G. Klebe, J. Mol. Med., 2000, 78, 269-281.
12. R. Kiyama, T. Honma, K. Hayashi, M. Ogawa, M. Hara, M. Fujimoto and T. Fujishita, J. Med. Chem., 1995, 38, 2728-2741.
13. S. Wang, G. W. A. Milne, X. Yan, I. J. Posey, M. C. Nicklaus, L. Graham and W. G. Rice, J. Med. Chem., 1996, 39, 2047-2054.
14. G. S. Chen, C.-S. Chang, W. M. Kan, C.-L. Chang, K. C. Wang and J.-W. Chern, J. Med. Chem., 2001, 44, 3759-3763.
15. D. Joseph-McCarthy, B. E. Thomas IV, M. Belmarsh, D. Moustakas and J. C. Alvarez, Proteins, 2003, 51, 172-188.
16. A. Ting, R. McGuire, A. P. Johnson and S. Green, J. Chem. Inf. Comput. Sci., 2000, 40, 347-353.
17. V. J. Gillet, W. Newell, P. Mata, G. Myatt, S. Sike, Z. Zsoldos and A. P. Johnson, J. Chem. Inf. Comput. Sci., 1994, 34, 207-217.
18. R. A. Lewis and A. R. Leach, J. Comput. Aid. Mol. Des., 1994, 8, 467-475.

19. H.-J. Böhm and S. Fischer, in Encyclopedia of Computational Chemistry, ed. P. von Rague Schleyer, Wiley, UK, 1998, vol 1, p. 657-663.
20. G. Schneider and H.-J. Böhm, Drug Discov. Today, 2002, 7, 64-70.
21. M. Stahl, N. P. Todorov, T. James, H. Mauser, H.-J. Böhm and P. M. Dean, J. Comput. Aid. Mol. Des., 2002, 16, 459-478.
22. V. Tschinke and N. C. Cohen, J. Med. Chem., 1993, 36, 3863-3870.
23. M. B. Eisen, D. C. Wiley, M. Karplus and R. E. Hubbard, Proteins, 1994, 19, 199-221.
24. H.-J. Böhm, J. Comput. Aid. Mol. Des., 1992, 6, 61-78.
25. H.-J. Böhm, J. Comput. Aid. Mol. Des., 1992, 6, 593-606.
26. H.-J. Böhm, Perspect. Drug Discov., 1995, 3, 21-33.
27. H.-J. Böhm, J. Comput. Aid. Mol. Des., 1996, 10, 265-272.
28. V. J. Gillet, A. P. Johnson, P. Mata and S. Sike, Tetrahedron Comput. Methodol., 1990, 3, 681-696.
29. V. J. Gillet, A. P. Johnson, P. Mata, S. Sike and P. Williams, J. Comput. Aid. Mol. Des., 1993, 7, 127-153.
30. P. Mata, V. J. Gillet, A. P. Johnson, J. Lampreia, G. J. Myatt, S. Sike and A. Stebbings, J. Chem. Inf. Comput. Sci., 1995, 35, 479-493.
31. V. J. Gillet, G. Myatt, Z. Zsoldos and P. A. Johnson, Perspect. Drug Discov., 1995, 3, 34-50.
32. Z. Szabo, M. Vargyas and A. P. Johnson, J. Chem. Inf. Comput. Sci., 2000, 40, 229-346.
33. Y. Nishibata and A. Itai, Tetrahedron, 1991, 47, 8985-8990.
34. Y. Nishibata and A. Itai, J. Med. Chem., 1993, 36, 2921-2928.
35. S. H. Rotstein and M. A. Murcko, J. Comput. Aid. Mol. Des., 1993, 7, 23-43.
36. J. B. Moon and W. J. Howe, Proteins, 1991, 11, 314-328.
37. B. Waszkowycz, D. E. Clark, D. Frenkel, J. Li, C. W. Murray, B. Robson and D. R. Westhead, J. Med. Chem., 1994, 37, 3994-4002.
38. A. Lilienkampf, S. Alho, D. Roukounas and K. Wähälä, Kemia-Kemi, 2001, 7, 542-544.
39. H. Peltoketo, V. Isomaa, O. Mäentausta and R. Vihko, FEBS Lett., 1988, 239, 73-77.
40. C. Mazza, R. Breton, D. Housset, J. Fontecilla-Camps, J. Biol. Chem., 1998, 273, 8145-8152.
41. M. Sawicki, M. Erman, T. Puranen, P. Vihko, D. Ghosh, P. Natl. Acad. Sci. USA, 1999, 96, 840-845.

42. Q. Han, R. Campbell, A. Gangloff, Y. Huang, S.-X. Lin, J. Biol. Chem., 2000, 275, 1105-1111.
43. A. R. Leach, R. A. Bryce and A. J. Robinson, J. Mol. Graph. Model., 2000, 18, 358-367.
44. G. Schneider, M.-L. Lee, M. Stahl and P. Schneider, J. Comput. Aid. Mol. Des., 2000, 14, 487-494.
45. G. Schneider, O. Clement-Chomienne, L. Hilfiger, P. Schneider, S. Kirsch, H.-J. Böhm and W. Neidhart, Angew. Chem. Int. Edit., 2000, 39, 4130-4133.
46. G. J. Myatt, Computer Aided Estimation of Synthetic Accessibility, PhD thesis, School of Chemistry, University of Leeds, 1994.
47. J. C. Barber, CAESA: Computer-Aided Estimation of Synthetic Accessibility. Improved Algorithms for the Identification of Starting Materials, PhD thesis, School of Chemistry, University of Leeds, 1998.
48. Homepage of the SPROUT program www.simbiosys.ca/sprout/, 2.12.2004. SimBioSys Inc, Canada.
49. K. Boda, SynSPROUT: Generating Synthetically Accessible Ligands by De Novo Design, PhD thesis, School of Chemistry, University of Leeds, 2002.
50. Homepage of ICAMS, United Kingdom, www.chem.leeds.ac.uk/ICAMS/new_web, 2.12.2004.
51. A. K. T. Ting, Computer Aided Pharmacophore Identification, PhD thesis, School of Chemistry, University of Leeds, 2000.
52. Z. Zsoldos, New method for de novo 3D structure generation, PhD thesis, School of Chemistry, University of Leeds, 1998.
53. G. Litwack and T. J. Schmidt in Textbook of biochemistry with clinical correlations, ed. T. M. Devlin, Wiley-Liss, New York, 5th ed., 2002, 960-988.
54. M. R. Tremblay and D. Poirier, J. Steroid Biochem. Molec. Biol., 1998, 66, 179-191.
55. H. Peltoketo, V. Luu-The, J. Simard and J. Adamski, J. Mol. Endocrinol., 1999, 23, 1-11.
56. D. Ghosh, V. Pletnev, D.-W. Zhu, Z. Wawrzak, W. Duax, W. Pangborn, F. Labrie and S.-X. Lin, Structure, 1995, 3, 503-513.
57. C. L. M. J. Verlinde and W. G. J. Hol, Structure, 1994, 2, 577-587.
58. F. H. Allen, Acta Crystallogr. B, 2002, B58, 380-388.
59. Homepage of the Cambridge Structural Database www.ccdc.cam.ac.uk, 2.12.2004.

60. A. Miranker and M. Karplus, Proteins, 1991, 11, 29-34.
61. Homepage of Network Science and MCSS/HOOK, www.netsci.org/Science/Compchem/feature04.html, 2.12.2004. Molecular simulation Inc, USA.
62. Accelrys Inc., InsightII, San Diego CA, USA: Accelrys Inc., 1999. Homepage of the InsightII program, www.accelrys.com, 2.12.2004.
63. C. M. Oshiro, I. D. Kuntz and J. Scott Dixon, J. Comput. Aid. Mol. Des., 1995, 9, 113-130.
64. R. D. Cramer III, D. E. Patterson and J. D. Bunce, J. Am. Chem. Soc., 1988, 110, 5959-5967.
65. J. M. S. Law, D. Y. K. Fung, Z. Zsoldos, A. Simon, Z. Szabo, I. G. Csizmadia and A. P. Johnson, Theochem.-J. Mol. Struct., 2003, 666-667, 651-657.
66. G. W. A. Milne, S. M. Wang and M. C. Nicklaus, J. Chem. Inf. Comp. Sci., 1996, 36, 726-730.
67. Q. Han, C. Dominguez, P. F. W. Stouten, J. M. Park, D. E. Duffy, R. A. Galemmo, Jr., K. A. Rossi, R. S. Alexander, A. M. Smallwood, P. C. Wong, M. M. Wright, J. M. Luettgen, R. M. Knabb and R. R. Wexler, J. Med. Chem., 2000, 43, 4398-4415.
68. G. A. Hopkinson, Computer-Assisted Organic Synthesis Design, PhD thesis, School of Chemistry, University of Leeds, 1985.
69. T. M. Penning, Endocr.-Relat. Cancer, 1996, 3, 41-56.
70. R. Breton, D. Housset, C. Mazza, J. Fontecilla-Camps, Structure, 1996, 4, 905.
71. A. J. Clark, Further development of the SPROUT de novo design program. Derivation and applications of scoring functions, PhD thesis, School of Chemistry, University of Leeds, 1998.
72. S. Sike, Computer-Based Constrained Chemical Structure Generation in Three Dimensions, PhD thesis, School of Chemistry, University of Leeds, 1992.
73. S. Csepregi, Development of Novel Ligand Docking Algorithms for De Novo Ligand Design, PhD thesis, School of Chemistry, University of Leeds, 2002.
74. Homepage of the Moloc program, www.moloc.ch, 2.12.2004, Gerber Molecular Design Switzerland.
75. P. R. Gerber and K. Müller, J. Comput. Aid. Mol. Des., 1995, 9, 251-268.
76. P. R. Gerber, J. Comput. Aid. Mol. Des., 1998, 12, 37-51.

77. F. Mohamadi, N. G. Richards, W. C. Guida, R. Liskamp, M. Lipton, C. Caufield, G. Chang, T. Hendrickson and K. B. Still, J. Comput. Chem., 1990, 11, 440.
78. G. M. Morris, D. S. Goodsell, R. S. Halliday, Ruth Huey, W. W. Hart, R. K. Belew and A. J. Olson, J. Comput. Chem., 1998, 19, 1639-1662.
79. Z. Zsoldos, I. Szabo, Zsolt Szabo and A. P. Johnson, Theochem.-J. Mol. Struct., 2003, 666-667, 659-665.
80. R. T. Turner, B. L. Riggs and T. C. Spelsberg, Endocr. Rev., 1994, 15, 275-300.
81. M. Y. Farhat, M. C. Lavigne and P. W. Ramwell, FASEB J., 1996, 10, 615-624.
82. A. Brodie, Q. Lu and B. Long, J. Steroid Biochem. Molec. Biol., 1999, 69, 205-210.
83. J. R. Pasqualini, J. Menopause, 2001, 2, 10-13.
84. P. Sonnet, P. Dallemagne, J. Guillon, C. Enguehard, S. Stiebing, J. Tanguy, R. Bureau, S. Rault, P. Auvray, S. Moslemi, P. Sourdaine and G.-E. Seralini, Bioorg. Med. Chem., 2000, 8, 945-955.
85. T. Mäkelä, K. T. Wähälä and T. A. Hase, Steroids, 2000, 65, 437-441.
86. C. Pouget, C. Fagnere, J.-P. Basly, G. Habrioux and A. J. Chulia, Bioorg. Med. Chem. Lett., 2002, 12, 2859-2861.
87. H. A. M. Hejaz, A. Purohit, M. F. Mahon, M. J. Reed and B. V. L. Potter, J. Med. Chem., 1999, 42, 3188-3192.
88. R. H. Peters, W.-R. Chao, B. Sato, K. Shigeno, N. T. Zaveri and M. Tanabe, Steroids, 2003, 68, 97-110.
89. F. Labrie, V. Luu-The, S.-X. Lin, C. Labrie, J. Simard, R. Breton and A. Belanger, Steroids, 1997, 62, 148-158.
90. D. Poirier, Curr. Med. Chem., 2003, 10, 453-477.
91. E. R. Simpson, M. S. Mahendroo, G. D. Means, M. W. Kilgore, M. M. Hinshelwood, S. Graham-Lorence, B. Amarneh, Y. Ito, C. R. Fisher, M. D. Michael, C. R. Mendelson and S. E. Bulun, Endocr. Rev., 1994, 15, 342-355.
92. P. A. Cole and C. H. Robinson, J. Med. Chem., 1990, 33, 2933-2942.
93. A. Purohit, A. Singh and M. J. Reed, Biochem. Soc. T., 1999, 27, 323-327.
94. A. Purohit, H. A. Hejaz, L. W. Woo, A. E. van Strien, B. V. Potter and M. J. Reed, J. Steroid Biochem. Molec. Biol., 1999, 69, 227-238.
95. R. P. Boivin, F. Labrie and D. Poirier, Steroids, 1999, 64, 825-833.

96. M. Sawicki, P. Ng, B. Burkhart, V. Pletnev, T. Higashiyama, Y. Osawa and D. Ghosh, Mol. Immunol., 1999, 36, 423-432.
97. T. M. Penning, Endocr. Rev., 1997, 18, 281-305.
98. H. Jörnvall, B. Persson, M. Krook, S. Atrian, R. Gonzalez-Duarte, J. Jeffrey and D. Ghosh, Biochemistry, 1995, 34, 6003-6013.
99. U. Oppermann, C. Filling, M. Hult, N. Shafqat, X. Wu, M. Lindh, J. Shafqat, E. Nordling, Y. Kallberg, B. Persson and H. Jörnvall, Chem.-Biol. Interact., 2003, 143-144, 247-253.
100. J. M. Jez, T. G. Flynn, T. M. Penning, Biochem. Pharmacol., 1997, 54, 639-647.
101. J. M. Jez, M. J. Bennett, B. P. Schlegel, M. Lewis and T. M. Penning, Biochem. J., 1997, 326, 625-636.
102. U. Oppermann, C. Filling and H. Jörnvall, Chem.-Biol. Interact., 2001, 130-132, 699-705.
103. C. Filling, X. Wu, N. Shafqat, M. Hult, E. Mårtensson, J. Shafqat and U. Oppermann, Mol. Cell. Endocrinol., 2001, 171, 99-101.
104. Y. Kallberg, U. Oppermann, H. Jörnvall and B. Persson, Protein Sci., 2002, 11, 636-641.
105. C. Filling, K. Berndt, J. Benach, S. Knapp, T. Prozorovski, E. Nordling, R. Ladenstein, H. Jörnvall and U. Oppermann, J. Biol. Chem., 2002, 277, 25677-25684.
106. C. Filling, E. Nordling, J. Benach, K. Berndt, R. Ladenstein, H. Jörnvall and U. Oppermann, Biochem. Bioph. Res. Co., 2001, 289, 712-717.
107. M. C. Holmes, Y. Kotelevtsev, J. J. Mullins and J. R. Seckl, Mol. Cell. Endocrinol., 2001, 171, 15-20.
108. G. W. Souness, A. S. Brem and D. J. Morris, Steroids, 2002, 67, 195-201.
109. E. Möbus and E. Maser, J. Biol. Chem., 1998, 273, 30888-30896.
110. D. Ghosh, M. Sawicki, V. Pletnev, M. Erman, S. Ohno, S. Nakajin and W. Duax, J. Biol. Chem., 2001, 276, 18457-18463.
111. J.-F. Couture, P. Legrand, L. Cantin, V. Luu-The, F. Labrie and R. Breton, J. Mol. Biol., 2003, 331, 593-604.
112. D. Ghosh, Z. Wawrzak, C. Weeks, W. Duax and M. Erman, Structure, 1994, 2, 629-640.

113. U. Oppermann, C. Filling, K. Berndt, B. Persson, J. Benach, R. Ladenstein and H. Jörnvall, Biochemistry, 1997, 36, 34-40.
114. J. Benach, C. Filling, U. C. Oppermann, P. Roversi, G. Bricogne, K. D. Berndt, H. Jörnvall and R. Ladenstein, Biochemistry, 2002, 41, 14659-14668.
115. N. Shafqat, H.-U. Marschall, C. Filling, E. Nordling, X.-Q. Wu, L. Björk, J. Thyberg, E. Mårtensson, S. Salim, H. Jörnvall and U. Oppermann, Biochem. J., 2003, 376, 49-60.
116. K. J. Ryan and L. L. Engel, Endocrinology, 1953, 52, 287-291.
117. H. Peltoketo, P. Vihko and R. Vihko, Vitam. Horm., 1999, 55, 353-398. And references therein.
118. D. Ghosh and P. Vihko, Chem.-Biol. Interact., 2001, 130-132, 637-650. And references therein.
119. C. H. Blomquist, J. Steroid Biochem. Molec. Biol., 1995, 55, 515-524.
120. J. Adamski and F. J. Jakob, Mol. Cell. Endocrinol., 2001, 171, 1-4.
121. D.-W. Zhu, J.-Z. Jin and S.-X. Lin, J. Steroid Biochem. Molec. Biol., 1995, 52, 77-81.
122. B. Li and S.-X. Lin, Eur. J. Biochem., 1996, 235, 180-186.
123. V. Luu-The, Y. Zhang, D. Poirier and F. Labrie, J. Steroid Biochem. Molec. Biol., 1995, 55, 581-587.
124. V. Luu-The, J. Steroid Biochem. Molec. Biol., 2001, 76, 143-151.
125. N. Khan, K. K. Sharma, S. Andersson and R. J. Auchus, Arch. Biochem. Biophys., 2004, 429, 50-59.
126. I. Schwabe, B. Husen and A. Einspanier, Mol. Cell. Endocrinol., 2001, 171, 187-192.
127. B. Husen, J. Adamski, G. Rune and A. Einspanier, Mol. Cell. Endocrinol., 2001, 171, 179-185.
128. T. J. Puranen, R. M. Kurkela, J. T. Lakkakorpi, M. H. Poutanen, P. V. Itäranta, J. P. J. Melis, D. Ghosh, R. Vihko and P. Vihko, Endocrinology, 1999, 140, 3334-3341.
129. C. H. Tsai-Morris, A. Khanum, P.-Z. Tang and M. L. Dufau, Endocrinology, 1999, 140, 3534-3542.
130. V. Luu-The, I. Dufort, G. Pelletier and F. Labrie, Mol. Cell. Endocrinol., 2001, 171, 77-82.
131. A. Krazeisen, R. Breitling, K. Imai, S. Fritz, G. Möller and J. Adamski, FEBS Lett., 1999, 460, 373-379.

132. C. A. Krusche, G. Möller, H. M. Beier and J. Adamski, Mol. Cell. Endocrinol., 2001, 171, 169-177.
133. J. Formitcheva, M. E. Baker, E. Anderson, G. Lee and N. Aziz, J. Biol. Chem., 1998, 273, 22664-22671.
134. J. Su, M. Lin and J. Napoli, Endocrinology, 1999, 140, 5275-5283.
135. X.-Y. He, G. Merz, P. Mehta, H. Schulz and S.-Y. Yang, J. Biol. Chem., 1999, 274, 15014-15019.
136. A. J. Powell, J. A. Read, M. J. Banfield, F. Gunn-Moore, S. D. Yan, J. Lustbader, A. R. Stern, D. M. Stern and R. L. Brady, J. Mol. Biol., 2000, 303, 311-327.
137. C. Hansis, D. Jähner, A. N. Spiess, K. Boettcher and R. Ivell, Eur. J. Biochem., 1998, 258, 53-60.
138. A. Kobayashi, L. L. Jiang and T. Hashimoto, J. Biochem., 1996, 119, 775-782.
139. L. Torroja, D. Ortuno-Sahagun, A. Ferrus, B. Hammerle and J. A. Barbas, J. Cell. Biol., 1998, 141, 1009-1017.
140. K. X. Li, R. E. Smith and Z. S. Krozowski, Endocr. Res., 1998, 24, 663-667.
141. Y. Okazaki, M. Furuno, T. Kasukawa, J. Adachi, H. Bono, S. Kondo, I. Nikaido, N. Osato, R. Saito and H. Suzuki, Nature, 2002, 420, 563-573.
142. Z. Chai, P. Brereton, T. Suzuki, H. Sasano, V. Obeyesekere, G. Escher, R. Saffery, P. Fuller, C. Enriques and Z. Krozowski, Endocrinology, 2003, 144, 2084-2091.
143. K. Motojima, Eur. J. Biochem., 2004, 271, 4141-4146.
144. S. Ghersevich, P. Nokelainen, M. Poutanen, M. Orava, H. Autio-Harmainen, H. Rajaniemi and R. Vihko, Endocrinology, 1994, 135, 1477-1487.
145. P. Nokelainen, T. Puranen, H. Peltoketo, M. Orava, P. Vihko and R. Vihko, Eur. J. Biochem., 1996, 236, 482-490.
146. S. Ghersevich, M. Poutanen, H. Rajaniemi and R. Vihko, J. Endocrinol., 1994, 140, 409-417.
147. S. Ghersevich, M. Poutanen, H. Martikainen and R. Vihko, J. Endocrinol., 1994, 143, 139-150.
148. M. Miettinen, M. Mustonen, M. Poutanen, V. Isomaa and R. Vihko, Biochem. J., 1996, 314, 839-845.
149. L. Wu, M. Einstein, W. Geissler, H. Chan, K. Ellison and S. Andersson, J. Biol. Chem., 1993, 268, 12964-12969.

150. L. Akinola, M. Poutanen and R. Vihko, Endocrinology, 1996, 137, 1572-1579.
151. M. Mustonen, M. Poutanen, V. Isomaa, P. Vihko and R. Vihko, Biochem. J., 1997, 325, 199-205.
152. T. Suzuki, H. Sasano, S. Andersson and J. Mason, J. Clin. Endocr. Metab., 2000, 85, 3669-3672.
153. J. Elo, L. Akinola, M. Poutanen, P. Vihko, A. Kyllönen, O. Lukkarinen and R. Vihko, Int. J. Cancer, 1996, 66, 37-41.
154. N. Moghrabi, J. Head and S. Andersson, J. Clin. Endocr. Metab., 1997, 82, 3872-3878.
155. M. Mustonen, M. Poutanen, A. Chotteau-Lelievre, Y. de Launoit, V. Isomaa, S. Vainio, R. Vihko and P. Vihko, Mol. Cell. Endocrinol., 1997, 134, 33-40.
156. M. Mustonen, M. Poutanen, S. Kellokumpu, Y. de Launoit, V. Isomaa, R. Vihko and P. Vihko, J. Mol. Endocrinol., 1998, 20, 67-74.
157. M. Tremblay, V. Luu-The, G. Leblanc, P. Noël, E. Breton, F. Labrie and D. Poirier, Bioorg. Med. Chem., 1999, 7, 1013-1023.
158. K.-M. Sam, F. Labrie and D. Poirier, Eur. J. Med. Chem., 2000, 35, 217-225.
159. D. Poirier, P. Bydal, M. Tremblay, K.-M. Sam and V. Luu-The, Mol. Cell. Endocrinol., 2001, 171, 119-128.
160. W. M. Geissler, D. L. Davis, L. Wu, K. D. Bradshaw, S. Patel, B. B. Mendonca, K. O. Elliston, J. D. Wilson, D. W. Russell and S. Andersson, Nat. Genet., 1994, 7, 34-39.
161. J. A. Sha, K. Dudley, W. R. Rajapaksha and P. J. O’Shaughnessy, J. Steroid Biochem. Molec. Biol., 1997, 60, 19-24.
162. H. Inano and B. Tamaoki, Steroids, 1986, 48, 1-26.
163. B. T. Ngatcha, V. Luu-The and D. Poirier, Bioorg. Med. Chem. Lett., 2000, 10, 2533-2536.
164. J. Adamski, T. Normand, F. Leenders, D. Monté, A. Begue, D. Stéhelin, P. W. Jungblut and Y. de Launoit, Biochem. J., 1995, 311, 437-443.
165. T. Normand, B. Husen, F. Leenders, H. Pelczar, J.-L. Baert, E. Begue, A.-C. Flourens, J. Adamski and Y. de Launoit, J. Steroid Biochem. Molec. Biol., 1995, 55, 541-548.
166. J. C. Corton, C. Bocos, E. S. Moreno, A. Merrit, D. S. Marsman, P. J. Sausen, R. C. Cattley and J.-Å. Gustafsson, Mol. Pharmacol., 1996, 50, 1157-1166.
167. F. Leenders, J. Tesdorpf, M. Markus, T. Engel, U. Seedorf and J. Adamski, J. Biol. Chem., 1996, 271, 5438-5442.

168. K. A. Brown, D. Boerboom, N. Bouchard, M. Doré, J. G. Lussier and J. Sirois, Endocrinology, 2004, 145, 1906-1915.
169. Y. de Launoit and J. Adamski, J. Mol. Endocrinol., 1999, 22, 227-240.
170. G. Möller, E. G. van Grunsven, R. J. A. Wanders and J. Adamski, Mol. Cell. Endocrinol., 2001, 171, 61-70.
171. K. Dinkel, M. Rickert, G. Möller, J. Adamski, H.-M. Meinck and W. Richter, J. Neuroimmunol., 2002, 130, 184-193.
172. A. M. Haapalainen, D. van Aalten, G. Meriläinen, J. E. Jalonen, P. Pirilä, R. Wierenga, J. K. Hiltunen and T. Glumoff, J. Mol. Biol., 2001, 313, 1127-1138.
173. A. M. Haapalainen, M. K. Koski, Y.-M. Qin, J. K. Hiltunen and T. Glumoff, Structure, 2003, 11, 87-97.
174. Y. Deyashiki, K. Ohshima, M. Nkanishi, K. Sato, K. Matsuura and A. Hara, J. Biol. Chem., 1995, 270, 10461-10467.
175. I. Dufort, P. Rheault, X.-F. Huang, P. Soucy and V. Luu-The, Endocrinology, 1999, 140, 568-574.
176. M. Khanna, K.-N. Qin, R. W. Wang and K.-C. Cheng, J. Biol. Chem., 1995, 270, 20162-20168.
177. H.-K. Lin, J. Jez, B. Schlegel, D. Pheel, J. Pachter and T. Penning, Mol. Endocrinol., 1997, 11, 1971-1984.
178. W. Qiu, M. Zhou, F. Labrie and S.-X. Lin, Mol. Endocrinol., 2004, 18, 1798-1807.
179. M. El-Alfy, V. Luu-The, X.-F. Huang, L. Berger, F. Labrie and G. Pelletier, Endocrinology, 1999, 140, 1481-1491.
180. G. Pelletier, V. Luu-The, B. Têtu and F. Labrie, J. Histochem. Cytochem., 1999, 47, 731-738.
181. S. Steckelbroeck, M. Watzka, B. Stoffel-Wagner, V. Hans, L. Redel, H. Clusmann, C. Elger, F. Bidlingmaier and D. Klingmüller, Mol. Cell. Endocrinol., 2001, 171, 165-168.
182. M. Zhou, W. Qiu, H.-J. Chang, A. Gangloff and S.-X. Lin, Acta Crystallogr. D, 2002, D58, 1048-1050.
183. T. M. Penning, J. Steroid Biochem. Molec. Biol., 1999, 69, 211-225.
184. T. M. Penning, M. E. Burczynski, J. M. Jez, H.-K. Lin, H. Ma, M. Moore, K. Ratnam and N. Palackal, Mol. Cell. Endocrinol., 2001, 171, 137-149.

185. A. Krazeisen, R. Breitling, G. Möller and J. Adamski, Mol. Cell. Endocrinol., 2001, 171, 151-162.
186. M. G. Biswas and D. W. Russell, J. Biol. Chem., 1997, 272, 15959-15966.
187. W. R. Duan, D. I. H. Linzer and G. Gibori, J. Biol. Chem., 1996, 271, 15602-15607.
188. P. Nokelainen, H. Peltoketo, R. Vihko and P. Vihko, Mol. Endocrinol., 1998, 12, 1048-1059.
189. S. Törn, P. Nokelainen, R. Kurkela, A. Pulkka, M. Menjivar, S. Ghosh, M. Coca-Prados, H. Peltoketo, V. Isomaa and P. Vihko, Biochem. Bioph. Res. Co., 2003, 305, 37-45.
190. R. Breitling, A. Krazeisen, G. Möller and J. Adamski, Mol. Cell. Endocrinol., 2001, 171, 199-204.
191. H. Peltoketo, P. Nokelainen, Y.-S. Piao, R. Vihko and P. Vihko, J. Steroid Biochem. Molec. Biol., 1999, 69, 431-439.
192. A. Ando, Y. Kikuti, A. Shingenari, H. Kawata, N. Okamoto, T. Shiina, L. Chen, T. Ikemura, K. Abe, M. Kimura and H. Inoko, Genomics, 1996, 35, 600-602.
193. S. Ramirez, I. Formitcheva and N. Aziz, Mol. Cell. Endocrinol., 1998, 143, 9-22.
194. N. Aziz, E. Anderson, G. Y. Lee and D. D. L. Woo, Mol. Cell. Endocrinol., 2001, 171, 83-88.
195. J. Napoli, Mol. Cell. Endocrinol., 2001, 171, 103-109.
196. X.-Y. He, H. Schultz and S.-Y. Yang, J. Biol. Chem., 1998, 273, 10741-10746.
197. X.-Y. He, Y.-Z. Yang, H. Schultz and S.-Y. Yang, Biochem. J., 2000, 345, 139-143.
198. X.-Y. He, G. Merz, Y.-Z. Yang, P. Mehta, H. Schultz and S.-Y. Yang, Eur. J. Biochem., 2001, 268, 4899-4907.
199. S. D. Yan, J. Fu, C. Soto, X. Chen, H. Zhu, F. Al-Mohanna, K. Collison, A. Zhu, E. Stern, T. Saido, M. Tohyama, S. Ogawa, A. Roher and D. Stern, Nature, 1997, 389, 689-695.
200. R. Ivell, M. Balvers, R. J. K. Anand, H.-J. Paust, C. McKinnell and R. Sharpe, Endocrinology, 2003, 144, 3130-3137.
201. X.-Y. He, G.-Y. Wen, G. Merz, D. Lin, Y.-Z. Yang, P. Mehta, H. Schultz and S.-Y. Yang, Mol. Brain Res., 2002, 99, 46-53.
202. X.-Y. He, G. Mertz, C.-H. Chu, D. Lin, Y.-Z. Yang, P. Mehta, H. Schultz and S.-Y. Yang, Mol. Cell. Endocrinol., 2001, 171, 89-98.

203. C. R. Kissinger, P. A. Rejto, L. A. Pelletier, J. A. Thomson, R. E. Showalter, M. A. Abreo, C. S. Agree, S. Margosiak, J. J. Meng, R. M. Aust, D. Vanderpool, B. Li, A. Tempczyk-Russell and J. E. Villafranca, J. Mol. Biol., 2004, 342, 943-952.
204. E. Nordling, U. C. Oppermann, H. Jörnvall and B. Persson, J. Mol. Graph. Model., 2001, 19, 514-520.
205. P. Brereton, T. Suzuki, H. Sasano, K. Li, C. Duarte, V. Obeyesekere, F. Haeseleer, K. Palczewski, I. Smith, P. Komesaroff and Z. Krozowski, Mol. Cell. Endocrinol., 2001, 171, 111-117.
206. Z. Chai, P. Brereton, T. Suzuki, H. Sasano, V. Obeyesekere, G. Escher, R. Saffery, P. Fuller, C. Enriques and Z. Krozowski, Endocrinology, 2003, 144, 2084-2091.
207. T. Lanišnik Rižner, G. Möller, H. H. Thole, M. Žakelj-Mavrič and J. Adamski, Biochem. J., 1999, 337, 425-431.
208. T. Lanišnik Rižner, M. Žakelj-Mavrič, A. Plemenitaš and M. Zorko, J. Steroid Biochem. Molec. Biol., 1996, 59, 205-214.
209. T. Lanišnik Rižner, J. Adamski and J. Stojan, Arch. Biochem. Biophys., 2000, 384, 255-262.
210. T. Lanišnik Rižner, J. Stojan, and J. Adamski, Mol. Cell. Endocrinol., 2001, 171, 193-198.
211. T. Lanišnik Rižner, J. Stojan, and J. Adamski, Chem.-Biol. Interact., 2001, 130-132, 793-803.
212. S. Gobec, M. Sova, K. Kristam and T. Lanišnik Rižner, Bioorg. Med. Chem. Lett., 2004, 14, 3933-3936.
213. K. Kristan, T. Lanišnik Rižner, J. Stojan, J. K. Gerber, E. Kremmer and J. Adamski, Chem.-Biol. Interact., 2003, 143-144, 493-501.
214. A. Azzi, P. H. Rehse, D.-W. Zhu, R. L. Campbell, F. Labrie and S.-X. Lin, Nat. Struct. Biol., 1996, 3, 665-668.
215. A. Gangloff, R. Shi, V. Nahoum and S.-X. Lin, FASEB J., 2003, 17, 274-.
216. W. Qiu, R. Campell, A. Galoff, Dupuis, R. Boivin, M. Tremplay, D. Poirier and S.-X. Lin, FASEB J., 2002, 16, 1829-1831.
217. R. Shi and S.-X. Lin, J. Biol. Chem., 2004, 279, 16778-16785.
218. Tanaka, N. Nonaka, T. Tanabe, T. Yoshimoto, D. Tsuru and Y. Mitsui, Biochemistry, 1996, 35, 7715-7730.

219. Accelrys Inc., WebLabViewerPro, release 3.7, 1999, San Diego CA, USA. Accelrys Inc. www.accelrys.com/ 2.12.2004.
220. T. Puranen, M. Poutanen, H. Peltoketo, P. Vihko and R. Vihko, Biochem. J., 1994, 304, 289-293.
221. K. Wähälä, A. Lilienkampf, S. Alho, K. Huhtinen, N. Johansson, P. Koskimies and K. Vihko. Thiophenepyrimidinones as 17-beta-hydroxysteroid dehydrogenase inhibitors. WO 2004/110459 A1, 23.12.2004. 70 pp.
222. J. D. Pelletier, F. Labrie and D. Poirier, Steroids, 1994, 59, 536-547.
223. D. Poirier, P. Dionne and S. Auger, J. Steroid Biochem. Molec. Biol., 1998, 64, 83-90.
224. K.-M. Sam, R. P. Boivin, M. R. Tremplay, S. Auger and D. Poirier, Drug Des. Discov., 1998, 15, 157-180.
225. M. R. Tremblay, S.-X. Lin and D. Poirier, Steroids, 2001, 66, 821-831.
226. S. Mäkelä, M. Poutanen, M. L. Kostian, N. Lehtimäki, L. Salo, R. Santti and R. Vihko, P. Soc. Exp. Biol. Med., 1998, 217, 310-316.
227. J. C. Le Bail, T. Laroche, F. Marre-Fournier and G. Habrioux, Cancer Lett., 1998, 133, 101-106.
228. A.-M. Hoffrén, C. M. Murray and R. D. Hoffmann, Curr. Pharm. Design, 2001, 7, 547-566.
229. S. A. Whitehead and M. Lacey, Hum. Reprod., 2003, 18, 487-494.
230. C. P. Owen and S. Ahmed, Biochem. Biophys. Res. Com., 2004, 318, 131-134.
231. J.-C. Le Bail, C. Pouget, C. Fagnere, J.-P. Basly, A.-J. Chulia and G. Habrioux, Life Sci., 2001, 68, 751-761.
232. G. S. Chetrite, C. Ebert, F. Wright, J.-C. Philippe and J. R. Pasqualini, J. Steroid Biochem. Molec. Biol., 1999, 68, 51-56.
233. W. M. Brown, L. E. Metzger IV, J. P. Barlow, L. A. Hunsaker, L. M. Deck, R. E. Royer and D. L. Vander Jagt, Chem.-Biol. Interact., 2003, 143-144, 481-491.
234. W. Qiu, R. L. Campbell, A. Gangloff, P. Dupuis, R. P. Boivin, M. R. Tremblay, D. Poirier and S.-X. Lin, FASEB J., 2002, 16, 1829-1831.
235. M. Bérubé and D. Poirier, Org. Lett., 2004, 6, 3127-3130.
236. P. Bydal, S. Auger and D. Poirier, Steroids, 2004, 69, 325-342.
237. R. Maltais, V. Luu-The and D. Poirier, J. Med. Chem., 2002, 45, 640-653.

238. R. Le Lain, P. J. Nicholls, H. J. Smith and F. H. Maharlouie, J. Enzym. Inhib., 2001, 16, 35-45.
239. S. Green, P. Walter, V. Kumar, A. Krust, J. M. Bornert, P. Argos and P. Chambon, Nature, 1986, 320, 134-139.
240. G. G. J. Kuiper, E. Enmark, M. Pelto-Huikko, S. Nilsson and J.-Å. Gustafsson, P. Natl. Acad. Sci. USA, 1996, 93, 5925-5930.
241. H. Greschik, J.-M. Wurtz, S. Sanglier, W. Bourguet, A. Van Dorsselaer, D. Moras and J.-P. Renaud, Mol. Cell., 2002, 9, 303-313.
242. H. Greschik, R. Flaig, J.-P. Renaud and D. Moras, J. Biol. Chem., 2004, 279, 33639-33646.
243. M. Ruff, M. Gangloff, J. M. Wurtz and D. Moras, Breast Cancer Res., 2000, 2, 353-359.
244. S. Kim, J. Y. Wu, E. T. Birzin, K. Frisch, W. Chan, L.-Y. Pai, Y. Tien Yang, R. T. Mosley, P. M. D. Fitzgerald, N. Sharma, J. Dahllund, A.-G. Thorsell, F. DiNinno, S. P. Rohrer, J. M. Schaeffer and M. L. Hammond, J. Med. Chem., 2004, 47, 2171-2175.
245. D. Tanenbaum, Y. Wang, S. Williams, P. Sigler, P. Natl. Acad. Sci. USA, 1998, 95, 5998-6003.
246. A. Shiau, D. Barstad, P. Loria, L. Cheng, P. Kushner, D. Agard, G. Greene, Cell, 1998, 95, 927-937.
247. S. Eiler, M. Gangloff, S. Duclaud, D. Moras, M. Ruff, Protein Expres. Purif., 2001, 22, 165-173.
248. A. Shiau, D. Barstad, J. Radek, M. Meyers, K. Nettles, B. Katzenellenbogen, J. Katzenellenbogen, D. Agard, G. Greene, Nat. Struct. Biol., 2002, 9, 359-364.
249. A. Warnmark, E. Treuter, J.-Å. Gustafsson, R. E. Hubbard, A. M. Brozozowski and A. C. W. Pike, J. Biol. Chem., 2002, 277, 21862-21868.
250. J. Renaud, S. F. Bischoff, T. Buhl, P. Floersheim, B. Fournier, C. Halleux, J. Kallen, H. Keller, J.-M. Schlaeppi and W. Stark, J. Med. Chem., 2003, 46, 2945-2957.
251. A. Pike, A. Brozozowski, R. Hubbard, T. Bonn, A.-G. Thorsell, O. Engstrom, J. Ljunggren, J.-Å. Gustafsson, M. Carlquist, EMBO J., 1999, 18, 4608-4618.
252. A. Pike, A. Brozozowski, J. Walton, R. Hubbard, A.-G. Thorsell, Y. Li, J.-Å. Gustafsson, M. Carlquist, Struct. Fold. Des., 2001, 9, 145-156.

253. B. R. Henke, T. G. Consler, N. Go, R. L. Hale, D. R. Hohman, S. A. Jones, A. T. Lu, L. B. Moore, J. T. Moore, L. A. Orband-Miller, R. G. Robinett, J. Shearin, P. K. Spearing, E. L. Stewart, P. S. Turnbull, S. L. Weaver, S. P. Williams, G. B. Wisely and M. H. Lambert, J. Med. Chem., 2002, 45, 5492-5505. 254. J. Schwabe, L. Chapman, J. Finch, D. Rhodes, D. Neuhaus, Structure, 1993, 1, 187-197. 255. J. Schwabe, L. Chapman, J. Finch, D. Rhodes, Cell, 1993, 75, 567-578. 256. A.-M. Leduc, J. O. Trent, J. L. Wittliff, K. S. Bramlett, S. L. Briggs, N. Y. Chirgadze, Y. Wang, T. P. Burrisa and A. F. Spatola, P. Natl. Acad. Sci. USA, 2003, 100, 11273- 11278. 257. A.Brozozowski, A. Pike, Z. Dauter, R Hubbard, T. Bonn, O. Engström, L. Öhman, G. Greene, J.-Å. Gustafsson, M. Carlquist, Nature, 1997, 389, 753-757. 258. Ruff, M. Gangloff, S. Eiler, S. Duclaud, J. Wurtz, M. Dino, to be published, (PDB entry code 1QKT and 1QKU) 259. G. Grethe and T. E. Moock, J. Chem. Inf. Comput. Sci., 1990, 30, 511-520. 260. J. H. Borkent, F. Oukes and J. H. Noordik, J. Chem. Inf. Comput. Sci., 1988, 28, 148- 150. 261. A. K. Long, S. D. Rubenstein and L.J. Joncas, Chem. Eng. News, 1983, 61, 22-30. 262. R. Huisgen in 1,3-Dipolar Cycloaddition Chemistry, ed. A. Padwa, Wiley, New York, 1984, vol 1, pp 1-290. 263. K. V. Gothelf and K. A. Jorgensen, Chem. Rev., 1998, 98, 863-909. 264. E. Buchner, Ber. Dtsch. Chem. Ges., 1888, 21, 2637-2642. 265. O. Diels and K. Alder, Liebigs Ann. Chem., 1928, 460, 98. 266. R Huisgen, Angew. Chem., 1963, 75, 604. 267. R. B. Woodward and R. J. Hoffmann, J. Am. Chem. Soc., 1965, 87, 395. 268. J. W. Lown in 1,3-Dipolar Cycloaddition Chemistry, ed. A. Padwa, Wiley, New York, 1984, vol 1, pp 653-732. 269. H. Adrill, M. J. Dorrity, R. Grigg, M.-S. Leon-Ling, J. F. Malone, V. Sridharan and S. Thianpatanagul, Tetrahedron, 1990, 46, 6433-6448. 270. H. Adrill, X. L. R. Fontaine, R. Grigg, D. Henderson, J. Montgomery, V. Sridharan and S. 
Surendrakumar, Tetrahedron, 1990, 46, 6449-6466. 271. D. A. Barr, M. J. Dorrity, R. Grigg, J. F. Malone, J. Montgomery, S. Rajviroongit and P Stevenson, Tetrahedron Lett., 1990, 31, 6569-6572. 182

272. J. Casas, R. Grigg, C. Nájera and J. Sansano, Eur. J. Org. Chem., 2001, 1971-1982. 273. R. Grigg and V. Sridharan, Synthesis, 1999, 441-446. 274. R. Grigg, Z. Rankovic, M. Thornton-Pett and A. Somasunderam, Tetrahedron, 1993, 49, 8679-8690. 275. H. Adrill, R. Grigg, J. F. Malone, V. Sridharan and W. A. Thomas, Tetrahedron, 50, 5067-5082. 276. P. Allway and R. Grigg, Tetrahedron Lett., 1991, 32, 5817-58. 277. E. Frank, J. Wölfling, B. Aukszi, V. König, T. Schneider and G. Schneider, Tetrahedron, 2002, 58, 6843-6849. 278. Homepage of Centre for Molecular and Biomolecular Informatiocs, the Babel program, http://cheminf.cmbi.ru.nl/cheminf/babel/readme1st.shtml, 7.12.2004. 279. A. T. Vigh, First Year Report, School of Chemistry, University of Leeds, 2003. 280. Homepage of the Autodock, http://www.scripps.edu/mb/olson/doc/autodock/, 22.12.2004. 281. Homepage of the Scifinder Scholar, http://www.cas.org/SCIFINDER/SCHOLAR/, 22.12.2004. 282. Homepage of the Beilstein CrossFire, http://www.mimas.ac.uk/crossfire/, 22.12.2004. 183

APPENDIX I

A. The List of the Standard Amino Acids

Name, three-letter code, one-letter code:

Alanine, Ala, A
Arginine, Arg, R
Aspartate, Asp, D
Asparagine, Asn, N
Cysteine, Cys, C
Glutamate, Glu, E
Glutamine, Gln, Q
Glycine, Gly, G
Histidine, His, H
Isoleucine, Ile, I
Leucine, Leu, L
Lysine, Lys, K
Methionine, Met, M
Phenylalanine, Phe, F
Proline, Pro, P
Serine, Ser, S
Threonine, Thr, T
Tryptophan, Trp, W
Tyrosine, Tyr, Y
Valine, Val, V


B. Structure of the amino acids

[Structural formulae not reproduced; names and molecular formulae are listed by class.]

HYDROPHOBIC

Alanine, C3H7NO2
Valine, C5H11NO2
Leucine, C6H13NO2
Isoleucine, C6H13NO2
Methionine, C5H11NO2S
Phenylalanine, C9H11NO2
Tryptophan, C11H12N2O2
Proline, C5H9NO2

POLAR

Glycine, C2H5NO2
Serine, C3H7NO3
Threonine, C4H9NO3
Cysteine, C3H7NO2S
Tyrosine, C9H11NO3
Asparagine, C4H8N2O3
Glutamine, C5H10N2O3

IONIZED

Histidine, C6H9N3O2
Aspartate, C4H7NO4
Glutamate, C5H9NO4
Lysine, C6H14N2O2
Arginine, C6H14N4O2

APPENDIX II Set of molecule analogues generated for the 17βHSD/KSR1-E2 complex.

[Structures of compounds 46-67; structural formulae not reproduced.]

APPENDIX III Set of molecule analogues generated for the 17βHSD/KSR1-equilin complex.

[Structures of compounds 68-81; structural formulae not reproduced.]

APPENDIX IV Set of molecule analogues generated for the 17βHSD/KSR1-DHT complex.

[Structures of compounds 82-92; structural formulae not reproduced.]

APPENDIX V Set of molecule analogues generated for the 17βHSD/KSR1-DHEA complex.

[Structures of compounds 93-109; structural formulae not reproduced.]