
DOCTORAL THESIS OF

UNIVERSITE DE NANTES, COMUE UNIVERSITE BRETAGNE LOIRE

Doctoral School No. 596: Matière, Molécules, Matériaux
Specialty: Physical Chemistry, Theoretical Chemistry

By Rui SOUSA

Structural insights of Interleukin-15 through Molecular Dynamics simulations: Towards the rational design of specific inhibitors

Thesis presented and defended in Nantes on 17 July 2019
Research unit: CEISAM – UMR CNRS 6230

Reviewers before the defense:

Sophie SACQUIN-MORA, Research Scientist (Chargé de Recherche) holding an HDR, Institut de Biologie Physico-Chimique, Paris
Jérôme GOLEBIOWSKI, University Professor, Université de Nice Sophia Antipolis

Jury composition:

President: Sophie SACQUIN-MORA, Research Scientist (CR) holding an HDR, Institut de Biologie Physico-Chimique, Paris
Examiners: Jérôme GOLEBIOWSKI, University Professor, Université de Nice Sophia Antipolis; Pedro A. FERNANDES, University Professor, Université de Porto
Thesis supervisor: Jean-Yves LE QUESTEL, University Professor, Université de Nantes
Thesis co-supervisor: Adèle LAURENT, Research Scientist (CR), Université de Nantes
Thesis co-supervisor: Agnès QUEMENER, Research Engineer, Université de Nantes, IRS 2

Introduction

Protein-Protein Interactions

Proteins are the main actors in many different cellular functions, comprising the full machinery of the cell. Their functions range from signaling to molecular motors, and they further play a role in the catalysis of reactions and in the transport, synthesis and degradation of molecules. They are also the building blocks of viral capsids, where they are responsible for viral entry into the cell, and they participate in the immune response. It thus comes as no surprise that many biological functions are mediated by, or involve, interactions between two or more proteins, forming protein-protein complexes. Extensive work is being done on the prediction of protein-protein interactions (PPIs), as the elucidation of these networks can provide important insights into cellular function, as well as pathways and cross-connectivity. 1–5 Furthermore, clarifying the modes of association of proteins allows for greater knowledge of their dynamic regulation. There are, however, factors that make these complexes difficult to study, such as the transience and weakness of some PPIs, including the ones involved in signaling and regulation. 6,7 Even so, these biological complexes have great potential to be exploited as targets of novel therapies for different human diseases. In this context, knowledge of the different pathways involved, in particular with regard to their topology, dynamics and length, is crucial not only to understand the effect of a particular drug on a specific PPI, but also to aid in the prediction of potential side effects. 8

In essence, the ensemble of all complexes generated through PPIs at the cellular level constitutes the interactome. Several factors contribute to the overall complexity and size of the interactome. On the one hand, the fact that a protein in its monomeric form can have a significantly different function when compared to its aggregated, multimeric form greatly widens the biological space to be considered. 9 Additionally, the complexes may be nonobligatory, with different environments or external factors leading to their formation or dissociation. 10 Another factor contributing to the massive size of the interactome is the fact that proteins may organize into complexes in different fashions: homo-oligomers between two or more identical protein sequences; hetero-oligomers between two or more different protein sequences. In the preceding sentence, an additional, not immediately evident, key expression is “or more”: the fact that proteins may assemble into dimeric, trimeric, tetrameric or higher-order oligomeric complexes greatly increases the interactome size.

It is, therefore, not surprising that the estimated size of the interactome is between 130 000 and 650 000 PPIs, of which only a fraction is known. 11–15 It is, thus, as useful as it is necessary to organize the PPIs into different categories, which can be taken as:

- PPIs in which membrane receptors are involved, including proteins that use the complex networks of interactions to produce signaling networks and are able to fine-tune these interactions in response to external stimuli; 16–19
- Oligomeric proteins, either homo- or heteromeric as previously mentioned, wherein the differential aggregation depending on the local cellular environment plays a key role; 20
- Protein-peptide interactions, comprising the regulatory networks which form the peptidome (the ensemble of all small peptides, often with unknown regulatory functions); 21
- Protein-antibody interactions: immunoconjugates or immunocomplexes, able to highly specifically recognize different molecules and, at present, a somewhat untapped source of potential for drug discovery.

(Figure with an example for each)

The identification and detection of protein complexes is commonly done through experimental methods. Historically, different techniques have contributed to this identification, including qualitative ones, such as the yeast two-hybrid screen 22, immunoprecipitation 23 and gel-filtration chromatography 24, among others, and quantitative ones, such as analytical ultracentrifugation 25, calorimetry 26 and optical spectroscopy 27, among others. However, these methods cannot be used to assess PPIs in living subjects, since they usually require cell lysis, and they are not able to detect transient interactions, which have already been mentioned to be of great importance when dealing with signaling proteins, for instance. As such, novel tools based on molecular imaging have surfaced, with which one can visualize, characterize and measure biological processes at the molecular and cellular levels in humans and other living systems. 28 This category of methods, which includes bioluminescence 29,30, fluorescence 31 and positron emission tomography (PET) imaging 32, has been able to complement the established methods and also to provide insights that were previously out of reach, such as the in vivo evaluation of drugs which promote or inhibit homo- or hetero-dimeric protein assembly. 33,34 More recent advances in imaging and software technologies have allowed the use of NMR and X-ray crystallography to understand PPIs more deeply. As these techniques are able to resolve proteins at the atomic level, they provide structural insights which were hitherto impossible to obtain. 35 Even more recently, alternative methods have appeared, such as single-particle electron microscopy, which allows the analysis of proteins that are difficult to crystallize in specific functional states. This method, however, presents difficulties in terms of data analysis; additionally, it is limited to proteins of larger molecular weight, with small ligands being difficult to observe. 36

Structurally, PPIs are distinctive in the sense that their binding energies are strongly influenced by their complementary hydrophobic surfaces, and are not built only through H-bonds and salt bridges. 37 Usually, they are composed of a hydrophobic core, which often corresponds to the most hydrophobic region of each partner, surrounded by a polar border which remains accessible to the solvent even when both proteins are bound. 38,39 The interfacial area is usually larger than 600 Å² per protein, most commonly sitting between 750 and 1500 Å² per protein. 40
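As an illustration of how such interfacial areas can be estimated, the following minimal Python sketch computes the buried area per partner from solvent-accessible surface areas (SASA). It assumes the common convention that the interface area per protein is half of the SASA lost upon complex formation; this convention, the function names and the numerical SASA values are assumptions introduced here for illustration and are not taken from the cited references.

```python
def interface_area_per_protein(sasa_a, sasa_b, sasa_complex):
    """Buried surface area per partner (in Å^2): half of the SASA lost on binding."""
    buried_total = sasa_a + sasa_b - sasa_complex
    if buried_total < 0:
        raise ValueError("complex SASA exceeds the sum of the free partners")
    return buried_total / 2.0


def describe_interface(area):
    """Place an interface area (Å^2 per protein) relative to the ranges quoted in the text."""
    if area < 600.0:
        return "below the usual PPI range"
    if 750.0 <= area <= 1500.0:
        return "within the most common range"
    return "PPI-sized, outside the most common range"


# Hypothetical SASA values (Å^2), chosen only for illustration.
area = interface_area_per_protein(sasa_a=7200.0, sasa_b=6900.0, sasa_complex=12100.0)
print(area, describe_interface(area))
```

In practice, the three SASA values would come from a surface calculation on the isolated partners and on the complex; only the arithmetic is shown here.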

Previously, when discussing the complexity of the interactome, the issues of transience and obligatory/nonobligatory interfaces were raised. In fact, even in terms of interface structures, there is an important distinction to be made between the two different types of interfaces. The obligate/permanent oligomers show a much stronger binding between proteins, with larger, more tightly packed interfaces while the transient interfaces show a flatter surface, which allows for more water molecules to be trapped between the two partners. 8,40,41 Furthermore, the former family is characterized by a larger hydrophobic effect than the latter, with the amino acids present in the interfaces being much more hydrophobic even at the outer edges. For the transient interfaces, they still possess more hydrophobic amino acids than typically seen for the interior of a protein; however, their effect is still much lower than the one observed for obligate oligomers. 42,43 This should not come as a surprise, since the obligate interfaces will spend much less time exposed to the solvent, when compared to the transient ones.

Looking more closely at the types of residues predominantly seen in each type of interface, temporary interfaces show a higher number of hydrogen bonds than permanent ones, whereas the latter contain a larger number of nonpolar residues (aromatic and aliphatic), especially towards the center of the surface. In transient interfaces, the nonpolar residues present are mostly the larger ones. Temporary interfaces tend to contain neutral residues, whereas interactions established at permanent interfaces are more often between two charged residues.

The facts discussed herein, such as a lack of charged residues in some cases and a prevalence of hydrophobic patches in general, as well as the overall shape of the PPIs, present a particular set of challenges when one is dealing with these types of interactions as targets for drug design. These will be discussed and expanded upon in a later section. However, another interesting point raised by the composition of these types of interactions is the seemingly higher importance of flexibility. In fact, the presence of larger hydrophobic residues, especially in the case of transient interfaces, means that these interfaces are highly susceptible to conformational changes. Thus, one should take greater care when analyzing these types of oligomers, namely by introducing flexibility into the system.

The starting point of a PPI study is usually a crystallographic structure. Routinely, the complexes available as crystal structures are examined and conclusions are drawn, with the protein being considered rigid in this conformation. However, many factors indicate that this might not be the most accurate approach. In some cases, the crystallographic conformation might not correspond to the one the protein adopts most commonly 44,45, and an additional issue may arise from changes imposed on the protein by the crystallization conditions themselves. 46 Additionally, the proteins may have different binding modes which give rise to different free energy landscapes, a fact which is not observable in the crystallographic structure. 47 Lastly, a protein may have more than one conformation in solution, namely when in different complexes; since an X-ray structure is an average over an ensemble of structures, this structural diversity will not be present in the structure. It is, hence, crucial to treat a protein and, especially, a complex of two or more interacting proteins as flexible rather than static structures.

In order to sample the conformational space that a PPI occupies, one can use methods such as Molecular Dynamics (MD), which will be able to computationally introduce flexibility to an otherwise static structure. This does not entirely solve all the issues, since it is limited by energetic barriers between different minima and time scales (which will both be addressed in a later chapter); however, it is a powerful method which can, when in the possession of additional biochemical data, allied with the aforementioned techniques, bring a much deeper insight into the studied interface.
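The limitation imposed by energetic barriers can be illustrated with a toy model. The sketch below is an illustrative assumption and not the simulation protocol used in this work: it propagates a single particle in a one-dimensional double-well potential U(x) = (x² − 1)² with the velocity Verlet integrator, in arbitrary reduced units and without a thermostat. The potential, the time step and the initial velocities are all hypothetical choices made for the example.

```python
def force(x):
    """Force from the double-well potential U(x) = (x^2 - 1)^2, i.e. -dU/dx."""
    return -4.0 * x * (x * x - 1.0)


def run_velocity_verlet(x0, v0, dt=0.01, n_steps=20000, mass=1.0):
    """Integrate Newton's equation with velocity Verlet; return all visited positions."""
    x, v = x0, v0
    f = force(x)
    positions = [x]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        positions.append(x)
    return positions


# Minima sit at x = -1 and x = +1, separated by a barrier of height 1 at x = 0.
low_energy = run_velocity_verlet(x0=-1.0, v0=0.5)   # kinetic energy 0.125, below the barrier
high_energy = run_velocity_verlet(x0=-1.0, v0=2.0)  # kinetic energy 2.0, above the barrier

print("low energy reaches the right well: ", max(low_energy) > 0.0)
print("high energy reaches the right well:", max(high_energy) > 0.0)
```

A trajectory started in the left well with too little energy never visits the second minimum, however long it runs, which is the one-dimensional analogue of an MD simulation remaining trapped in a single conformational basin.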

Another aspect to take into account is the unequal influence of the different residues at the interface. This can be demonstrated by the systematic replacement of each residue with an alanine, with subsequent measurement of the difference in binding free energy (ΔΔG) between the wild type and each mutant. 48 A difference above a certain threshold (i.e. ΔΔG ≥ 2 kcal/mol) indicates that the residue is a hot spot. 49 These measurements can be performed either experimentally or computationally, with the former being laborious and time consuming, since each mutein needs to be purified and analyzed individually. 50 As such, computational methods for identifying and analyzing hot spots have arisen, ranging from less complex (knowledge-based approaches 51) to more sophisticated and time consuming (fully atomistic approaches, including the aforementioned MD). An accurate prediction of such spots has crucial implications for the design of novel drugs, as these should ideally target specific regions of the PPIs containing hot spots. 52 This, as well as other aspects of drug design, namely in the PPI realm, will be discussed in a later section of this chapter.
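The hot spot criterion described above can be sketched in a few lines of Python. The sign convention (mutant minus wild type, so that a positive value means weaker binding), the residue labels and all energy values are hypothetical and serve only to illustrate the 2 kcal/mol threshold.

```python
HOT_SPOT_THRESHOLD = 2.0  # kcal/mol, threshold quoted in the text


def ddg(dg_mutant, dg_wild_type):
    """Change in binding free energy (kcal/mol): mutant minus wild type (assumed convention)."""
    return dg_mutant - dg_wild_type


def find_hot_spots(dg_wild_type, dg_mutants, threshold=HOT_SPOT_THRESHOLD):
    """Return the mutations whose alanine substitution costs at least `threshold` kcal/mol."""
    return sorted(
        residue
        for residue, dg_mut in dg_mutants.items()
        if ddg(dg_mut, dg_wild_type) >= threshold
    )


# Hypothetical binding free energies (kcal/mol), for illustration only.
dg_wt = -12.0
dg_ala_mutants = {"Y26A": -9.5, "D30A": -11.6, "L45A": -10.0, "K52A": -12.1}

print(find_hot_spots(dg_wt, dg_ala_mutants))
```

Whether the binding free energies come from experiment or from a computational estimate, the classification step itself is the same comparison against the chosen threshold.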

Cytokines: Presentation and classification

During the work presented in this PhD thesis, the PPI in question was the one involving interleukin-15 (IL-15), which belongs to a specific class of proteins: cytokines. Cytokines act as cell-to-cell messengers and usually have an immunomodulating effect. They are small (8 to 50 kDa), nonstructural glycoproteins whose physiological roles involve tissue homeostasis and cellular activation, relocation and differentiation. Their name implies their cellular function (from the Greek cyto, “cell”, and kinos, “movement”); however, due to their strong presence in the immune system, and due to their cellular sources, they were initially called lymphokines and monokines. 53–55 Indeed, their oldest known functions involved the regulation of the immune system response to trauma and infection. Nevertheless, as it became clear that most nucleated cells were able to synthesize and respond to these proteins, their scope was broadened and, as such, so was their designation. 56 Throughout the years, numerous diseases have been linked to cytokines, their actions being either a cause or an effect of a disease.

Cytokines are usually produced in response to cell activation, serving as communicators for specific functions in different tissues. They are produced rapidly and secreted immediately, acting over a short distance.

Although the functions of these proteins share many similarities with those of hormones, they differ in some key characteristics which allow us to distinguish the two. Whereas hormones are mostly produced by a dedicated gland or tissue, cytokines can be produced at various locations; cytokines, barring exceptions (such as IL-1 or TNF), act on a microenvironment (the aforementioned short-distance effect), whereas hormones can act on most cells. Cytokines are cell-to-cell messengers, while hormones control the organism by regulating homeostasis. Cytokines usually have a single biological function (notable exceptions being IL-1 and IL-6), whereas hormones possess more than one. Cytokines are redundant, with the absence of one usually being compensated by the actions of others in order to maintain homeostasis; on the other hand, hormones are irreplaceable and their absence always leads to a clinical syndrome. Cytokines are pleiotropic, with their effects extending locally, developmentally and systemically, which is not the case for hormones. With the exception of IL-1 and TNF, cytokines are secreted near the target site and, as such, are not present in the blood circulation, contrary to hormones, for which it is a means of transmission. The action of cytokines can be autocrine (they affect the cell that produced them), paracrine/juxtacrine (neighboring cells) or, in rare cases, endocrine (at remote sites, such as the aforementioned IL-1 and TNF); hormones usually operate in all tissues, as previously mentioned. Cytokines are secreted in response to an external event; hormones are constitutively secreted, with their action changing based on their physiological concentration. Most of these differences have exceptions, and cytokines and hormones show more and more similarities as new information is uncovered.

In general, cytokines do not share amino acid sequence motifs or three-dimensional structures. There is no standard way to classify them into different categories, and different methods of classification arise, such as based on their biological activities 57 or based on their receptors 58. For the sake of simplicity, since this chapter is not meant to be an exhaustive description of cytokines and their activity, we chose the latter, as such:

- Chemokines, originally described as being able to stimulate the migration of different cell types, such as neutrophils, lymphocytes and eosinophils, among others. They share secondary structures and are distinguished by a tetracysteine motif (divided into C, CC, CXC and CX3C, depending on number and arrangement); 59
- Hematopoietic Growth Factors (HGF), required for the equilibrium, proliferation, differentiation and maintenance of hematopoietic stem cells, as well as the response to stimuli inducing expansion into specific hematopoietic cell lineages. These can be further divided, based on their respective receptors, into gp130 shared, β shared, γ shared, among others, and will be further expanded upon later in this section; 60
- IL-1 family members, including IL-1α, IL-1β and IL-18. They are produced by activated macrophages and have a prominent role in local and systemic inflammation; 61
- IL-10 family members, nine in total: IL-10, IL-19, IL-20, IL-22, IL-24, IL-26, IL-28A, IL-28B and IL-29. All but IL-24 share the IL-10 receptor β subunit, with biological activities ranging from immunosuppression (IL-10) and antiviral activity (IL-29, for example) to mitogenic and pro-survival effects (e.g. IL-19); 62
- IL-17 family members, including six variants, IL-17A to IL-17F. Usually secreted in response to IL-23 by helper T cells, they tend to stimulate the release of other cytokines, in a pro-inflammatory response; 63,64
- Interferons (IFN), “interfering” with viral replication in response to infection, and also involved in the activation of immune cells and the recognition of tumor cells, through antigen presentation to T lymphocytes. They can be further divided into Type I (IFNα, IFNβ, IFNω), Type II (IFNγ) and Type III (IL-28A, IL-28B, IL-29); 65,66
- Platelet-derived growth factor (PDGF) family members, including PDGF isoforms, vascular endothelial growth factor (VEGF) isoforms and six other proteins. They induce mitogenic functions through two tyrosine kinase receptors (α and β) and can be released by platelets, smooth muscle cells, activated macrophages and endothelial cells; 67
- Transforming Growth Factor β (TGFβ) family members, with three distinct subtypes of TGFβ and an additional eight proteins as part of the family. They exert antiproliferative effects, through the upregulation of the expression of inhibitors, and induce cell death; 68
- Tumor Necrosis Factors (TNF) family, including 15 distinct proteins capable of triggering cell death under specific conditions. The oldest and best-known member, TNFα, has been shown to play a role in multiple inflammatory diseases, including but not limited to rheumatoid arthritis, Crohn’s disease and psoriasis. 69,70

Following this brief, non-exhaustive introduction into cytokines and their classification, it is now important to discuss the receptors, with particular attention to the hematopoietic growth factors family, of which IL-15 and IL-2 are members.

Cytokine Receptors: Structure and Function

In general, cytokine receptors are transmembrane glycoproteins, usually composed of several subunits. The different subunits are very distinct in nature, with differences in molecular weight and mode of binding, as well as in the number of times they cross the membrane. Receptors may be constituted by only one subunit (monomeric) or by several (multimeric), the latter being composed of copies of the same subunit (homo-multimeric) or of distinct ones (hetero-multimeric). The binding of a cytokine to the external portion of a cell-surface receptor may lead to the intracellular recruitment of additional molecules, thus triggering a signaling cascade within the cell.

Hematopoietic Growth Factors (HGF)

The HGF receptors, or type I receptors, are composed of a transmembrane domain, an intracellular COOH-terminal end and an extracellular NH2-terminal end. The two ends vary from receptor to receptor; however, they all share a common cytokine-binding homology region (CHR), composed of two fibronectin type III (FNIII) domains joined by a linker, which, as the name indicates, is involved in the binding of the cytokine in question. 71 The FNIII domain towards the NH2-terminus contains a region composed of four conserved cysteine residues, forming disulfide bridges. In the domain closer to the membrane, there is a conserved WSXWS motif (tryptophan-serine-X-tryptophan-serine), which contributes to the stability of the tertiary structure of the protein. 72 These conserved features form structures which bind four-α-helix cytokines, with the specificity of a receptor being determined by the different amino acid composition of each. These differences can mean that a specific receptor possesses additional FNIII domains, or Immunoglobulin-like (Ig-like) domains; in this case, the Ig-like domains will be implicated in the binding, whereas the FNIII domains will contribute to the stability of the receptor. 73

(from Abbas)

The binding of a cytokine to its receptor leads to the oligomerization of the receptor, which is then able to bind the soluble protein kinase JAK (Janus kinase). The binding of the receptor to JAK activates the latter, which leads to the phosphorylation of Tyr residues in the cytoplasmic domain of the receptor. A family of transcription factors, collectively designated as STATs (signal transducers and activators of transcription), is targeted by the JAK kinase activity. STAT dimerizes when phosphorylated, which then signals its transport to the nucleus. The presence of STAT in the nucleus leads to the transcription of specific genes, key for the activity in question. Alternatively, JAK can trigger a mitogen-activated protein kinase (MAPK) cascade, through phosphorylation of Grb2, also leading to altered gene expression.

(from Lehninger)

Although the examples shown here are for the simplest case among these receptors, a homodimer, most type I cytokine receptors are in fact formed by two different subunits (IL-4, IL-7), three different subunits (IL-2, IL-15), four (IL-6, G-CSF/G-CSFR), and so on. 74 As previously mentioned, this family is further divided into different groups, depending on the signaling component: the βc chain, the gp130 chain or the γc chain. The first group includes the receptors for IL-3, IL-5 and GM-CSF; the second, the ones for IL-6, IL-11 and IL-27; the third, the ones for IL-2, IL-4, IL-7, IL-9, IL-15 and IL-21. Since this PhD work was performed on IL-15 and, to a lesser extent, IL-2, both belonging to the γc receptor family, we will focus on describing this type of receptor.

The members of the family of cytokines sharing the γc receptor chain are all composed of a four-α-helix bundle, and they have both redundant effects in the homeostasis and regulation of the immune response and effects exclusive to each cytokine. One interesting fact to note is that IL-13 and the thymic stromal lymphopoietin (TSLP) both use a receptor chain analogous to γc (IL-13Rα1 and TSLPR, respectively), being similar to IL-4 and IL-7, respectively. 75 Concerning their specific functions:

- IL-2 is a growth factor for T lymphocytes, playing a role in immune tolerance through the induction of activation-induced cell death, as well as through the development of CD4+ FoxP3+ regulatory T cells (Tregs); 76,77
- IL-4 is mainly secreted by mast cells, Th2 cells, eosinophils and basophils, having a role in IgE class switching; 78
- IL-7 plays a role in the development of T cells in the bone marrow and thymus, as well as in the homeostasis of naïve and memory T cells in the periphery. Furthermore, it plays a role in the development of B cells and innate lymphoid cells; 79
- IL-9 induces the activation of epithelial cells, B cells, eosinophils and mast cells 80, and also plays an anti-tumoral role by increasing the activation of CD8+ T cells; 81
- IL-15 participates in the development of natural killer (NK) cells and in the homeostasis of CD8+ memory T cells; 82
- IL-21 induces the differentiation of B cells into plasma cells, also regulating the production of immunoglobulins. It also plays a role in the proliferation of NK cells and of CD4+ and CD8+ T cells. 83

Concerning the receptors themselves, IL-4, IL-7, IL-9 and IL-21 bind to heterodimeric receptors, composed of a specific α chain (IL-4Rα, IL-7Rα, IL-9Rα and IL-21Rα, respectively) and the γc chain. 75

IL-2 and IL-15 are the only two members of the family which bind to a heterotrimeric high-affinity receptor, composed of a specific α chain (IL-2Rα and IL-15Rα, respectively), a chain shared by both cytokines (IL-2Rβ) and the γc chain. The β receptor is structurally similar to the specific receptors of the other members of the family, belonging to the type I receptor family. The association of the IL-2Rβ and γc receptors forms a dimer with medium affinity for the respective cytokine (Kd ~ 10⁻⁹ M). The α receptor of each of these cytokines is somewhat unique, in the sense that it does not contain the aforementioned CHR region, consisting instead of “sushi” domains with little to no signaling activity. This receptor nevertheless forms a specific high-affinity heterotrimer with IL-2Rβ and γc (Kd ~ 10⁻¹¹ M).
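To give a sense of scale for these dissociation constants, the following Python sketch converts them into standard binding free energies through ΔG° = RT ln(Kd), with Kd expressed relative to a 1 M reference state. The temperature of 298.15 K and the function name are assumptions made for this example; the Kd values are the orders of magnitude quoted above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temperature_k * math.log(kd_molar)


dg_dimer = binding_free_energy(1e-9)    # β/γc dimer, intermediate affinity
dg_trimer = binding_free_energy(1e-11)  # α/β/γc trimer, high affinity

print(f"dimer:  {dg_dimer:.1f} kcal/mol")
print(f"trimer: {dg_trimer:.1f} kcal/mol")
print(f"gain attributed to the α chain: {dg_trimer - dg_dimer:.1f} kcal/mol")
```

Under these assumptions, the two orders of magnitude separating the dimeric and trimeric receptors correspond to roughly 2.7 kcal/mol of additional binding free energy.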

(Figure)

Biological Features of IL-15

Interleukin-15 (IL-15), as mentioned previously, is a hematopoietic cytokine, a member of the family of cytokines which use the γc chain. It was initially identified due to its similarity in function to IL-2. 84,85 This similarity in function is not surprising, since both cytokines share two out of three receptors (IL-2Rβ and γc).

IL-15 is a soluble protein of 14-15 kDa, composed of 114 amino acids. Its sequence is highly conserved among different species, with human IL-15 having 97% homology with rabbit IL-15 and 73% homology with mouse IL-15. 85
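Percentages such as these are typically derived from a pairwise sequence alignment. As a minimal sketch, and assuming that conservation is measured as percent identity over the aligned, non-gap positions (one of several conventions in use), the calculation can be written as follows. The two fragments are toy sequences and not real IL-15 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the aligned (non-gap) columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / compared


# Toy pre-aligned fragments, for illustration only.
seq_a = "MKTLLVAG-SW"
seq_b = "MKSLLVAGHSW"
print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

The alignment itself is taken as given here; the value obtained depends on how gaps and alignment length are treated, which is why the convention should always be stated alongside the percentage.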

Human IL-15 has two disulfide bridges: Cys35-Cys85 and Cys42-Cys88 (homologous to the single disulfide bridge in IL-2: Cys58-Cys105). It contains three potential N-glycosylation sites, towards the C-terminus (Asn71, Asn79 and Asn112). 86 Similarly to the other members of the γc family, it is composed of four α-helices in an up-up-down-down configuration (Figure). 87 IL-15 and IL-2 do not present primary sequence homology; nevertheless, the amino acids necessary for the binding to their shared β and γc receptors are conserved. 88,89

(Figure of sequence alignment)

IL-15 has two isoforms, which differ in the length of the signal peptide (short signal peptide, SSP, or long signal peptide, LSP). 85,90,91 The SSP isoform is exclusively intracellular, not being secreted 91,92, whereas the LSP isoform is located at the and the . 93 Nevertheless, both isoforms can be located in the nucleus. 94

IL-15 is expressed constitutively by a plethora of cell types, whether they are part of the immune system or not. It can be expressed in the placenta, skeletal muscle, liver, kidney, lungs, heart and skin, apart from the immune system cells themselves. 85,95 In contrast, IL-2 is mainly expressed by activated T cells and dendritic cells. 96

The main cellular sources of IL-15 are dendritic cells, monocytes/macrophages and epithelial cells 97,98; it is also produced in bone marrow stromal cells, thymic epithelial cells and fetal intestines. 99–101 Furthermore, renal epithelial cells, epidermal cells, keratinocytes, astrocytes and microglia can also express IL-15. 102,103 However, even with this vast set of cells capable of expressing IL-15, the cytokine itself is rarely detectable under physiological conditions 92,93, which indicates that its expression is tightly regulated, due to its pro-inflammatory effect. A deregulation in the expression of this protein could lead to auto-immune diseases. 104

The regulation of IL-15 is done, essentially, at three levels: transcription, translation and post-translation. At the transcriptional level, the regulation can be done either through the nuclear factor NF-κB, in a positive response to lipopolysaccharide (LPS) 105, or negatively through the transcription factors IRF-1 and IRF-2. 106,107 Translationally, the presence of a number of initiation codons leads to a negative regulation, by reducing the efficiency of translation. 108 Lastly, in post-translational regulation, IL-15 associates with its specific receptor (IL-15Rα) in the endoplasmic reticulum, moving towards the signaling pathway. 109

(Figure with different regulation types)

In order to perform its activity, IL-15 binds to three different receptors, as has been previously mentioned. As such, it is interesting to note their structural features and details, which will be described below.

Structurally, the γc receptor is a 64 kDa membrane glycoprotein, once again part of the type I cytokine receptors. It is formed by 369 amino acids, wherein 22 are part of the signal peptide. The intracellular domain is composed of 86 amino acids (aa), the transmembrane domain of 28 aa and the extracellular one of 233 aa. 110 The intracellular portion does not contain intrinsic kinase activity; however, it has an SH2 domain, allowing JAK3 to bind to it. 111 As such, in order for the signaling to occur, this chain needs to be associated to the IL-2Rβ receptor. It is constitutively expressed in many different types of cells, with its expression being increased following activation by IL-2 or IFNγ and decreased by TGF-β1. 112

The IL-2Rβ receptor, similarly to the γc, is a 70-75 kDa membrane glycoprotein, also part of the type I cytokine receptor family. 113,114 From a total of 525 amino acids, the signal peptide is 26 aa, the intracellular domain is 286 aa, the transmembrane domain is 25 aa and the extracellular one, containing one CHR domain, is 241 aa. 115 It is expressed in immune cells (NK cells, T cells, monocytes and neutrophils), with its expression being increased through stimulation by TCR, PMA, IL-2 or IL-4. 116–118

The IL-15Rα closely resembles IL-2Rα, being a type I transmembrane protein containing a sushi domain. 119 Of its 267 amino acids, the signal peptide comprises 30, the intracellular domain 41, the transmembrane domain 18 and the extracellular domain 175. 120 It can be expressed in T, B and NK cells, macrophages, dendritic cells, thymic stromal cells, bone marrow, activated vascular endothelial cells, liver, heart and lung, among others 117,119,121,122, with an increased expression being due to the effect of IFN-γ, in macrophages, or of IL-2, in T cells. 82 Its wide cellular distribution contrasts with that of IL-2Rα, which evidences the wide spectrum of action of IL-15.

The three receptor subunits described so far can associate in different manners, producing different functional IL-15 receptors with different affinities. The IL-2Rβ/γc dimeric receptor has intermediate affinity for IL-15 (with either receptor chain incapable of binding IL-15 on its own) and allows signaling to occur in the absence of IL-15Rα. 123 IL-15Rα has high affinity for IL-15 and can associate with the other two receptors to form a high-affinity trimeric receptor for IL-15.

There are two modes of action for IL-15, the cis-presentation and trans-presentation modes. The first one is common to other cytokines and involves the binding of IL-15 to the IL-15Rα in a cell, with subsequent binding to IL-2Rβ/γc in the same cell. The latter involves, once more, the IL-15/IL-15Rα complex bound at a presenting cell surface (dendritic cells, macrophages, stromal cells), which will bind to the IL-2Rβ/γc complex in an effector cell (NK or T CD8+). 124–126

(Figure of cis and trans presentation)

Biological activity

As has already been alluded to, IL-15 has a broad spectrum of activity. It is crucial for the development, proliferation, survival and differentiation of different cells which take part in innate and adaptive immunity. However, increases in its production may lead to auto-immune and inflammatory diseases. As such, in the context of these diseases, it can prove essential to target the interaction of IL-15 with its receptors in order to treat them.

Physiologically, and concretely in the immune system, it plays a role in the homeostasis 126 and survival of NK cells, lymphocytes which are crucial to the , wherein they recognize cells which have been infected and need to be eliminated. 127 IL-15 is able to increase the expression of anti-apoptotic molecules belonging to the Bcl-2 family and reduce the expression of pro-apoptotic molecules, thus intervening in the survival of these cells. 128–132 It is also able to induce the cytotoxic activity in NK cells. 133

In the case of Natural Killer T (NKT) cells, which share properties of T and NK cells (both producing various cytokines and possessing lytic activity), IL-15 plays a role in their expansion, maturation and survival. 82,134,135

When considering T cells, both CD4+ and CD8+ (which together account for most T cells), IL-15 is involved in their proliferation 126,136, further playing an anti-apoptotic role. 137 It is also involved, combined with IL-7, in the survival of CD4+ T cells. Furthermore, it can regulate the expression of other cytokines, either through the induction of secretion of IL-2, IL-4 and IFNγ 138 or the induction of the secretion of chemokines. 139

In the case of γδ T cells in particular and of the intraepithelial lymphocytes (IELs) in general, present mostly in the epithelial layer of the gut mucosa, IL-15 is involved in the proliferation and survival of both 140,141, also increasing the cytotoxicity and IFNγ production. 142

With regards to other types of immune cells, it is known to stimulate production of IL-8 and MCP-1 in monocytes/macrophages (the main sources of IL-15 in the organism) 85,143 and to participate in the proliferation of dendritic cells 144, mast cells 145 and B cells 146; furthermore, it modifies neutrophils, increasing their phagocytic activity 147, and prevents apoptosis in eosinophils. 148

Outside of the immune system, it is also able to control the development of endothelial cells 149, skeletal muscle cells 150, adipocytes 151 and glial cells. 121 Furthermore, among others, it acts on the survival of epithelial cells 152 and fibroblasts 153. Also worthy of note is its role in the self-renewal, proliferation and differentiation of neural stem cells. 154,155

Pathology

With such a wide range of activity and such tight regulation, it should come as no surprise that deregulation of IL-15 can lead to drastic effects, namely to pathologies which can range from inflammatory and autoimmune diseases to cancer. In some cases, IL-15 can be detected in a soluble form, suggesting its role in these types of diseases.

An overexpression of IL-15, due to deficiencies in its regulation mechanisms, may be responsible for the emergence of autoimmune diseases, through the induction of the survival of T cells and the secretion of pro-inflammatory cytokines, such as TNF-α. 104 In fact, IL-15 has been detected in the serum in the context of different autoimmune diseases. One example is rheumatoid arthritis, where IL-15 recruits and activates T cells, which will, in turn, interact with macrophages, leading to the production of TNF-α by the latter. 156 Furthermore, expression of CD40L, also induced by IL-15, leads to the secretion of pro-inflammatory cytokines, through interaction with CD40 at the membrane of monocytes (Avice 1998, Möttönen 2000). 157,158 In the intestine, the overexpression of IL-15 leads, once again, to the activation of T cells which, in this case, induce the production of IL-12 and TNF-α in the neighboring monocytes; 159 furthermore, this overexpression is responsible for local expansion of NKT cells, leading to inflammation 160, which is then sustained by IL-15-induced IEL survival. 161 This leads to inflammatory diseases, such as Crohn’s disease.

In terms of cancer, since there is a link between increased inflammation and the development of cancers 162, it is plausible that IL-15 contributes, at least indirectly, to cancer incidence. Indeed, an overexpression of IL-15 in keratinocytes has led to the detection of IL-15, in relation to human . 163,164 Further, in adult T-cell leukemia/lymphoma (ATL), which is reportedly caused by infection by human T-cell leukemia/lymphotropic virus type 1 (HTLV-1) 165, the viral Tax protein can transactivate IL-15 gene transcription through an NF-κB site, leading to an increase in IL-15 expression. 166 As a last example, IL-15 can induce the proliferation of normal B cells, but also of malignant B cells from B-cell chronic lymphocytic leukemia or hairy cell leukemia. 167

Drug Discovery and Design

Drug discovery is an interdisciplinary endeavor, whose genesis came when the field of chemistry was developed enough to tackle issues outside chemistry itself, allowing it to be allied with pharmacology once the latter had become a scientific discipline of its own. In the mid-nineteenth century, with the essential foundations of chemical theory being laid 168, followed by the advent of a pioneering theory on the structure of aromatic organic molecules 169, there was a significant push towards research on dyes, which would provide the beginnings of medicinal chemistry. Indeed, the existence of “chemoreceptors” was postulated by Paul Ehrlich, due to the selective affinity of dyes for biological tissues. In fact, he would affirm that parasites, microorganisms and cancer cells possess chemoreceptors different from the analogous structures in host cells, an idea that would give birth to chemotherapy. 170

Developments in analytical chemistry opened the door to the isolation and purification of active ingredients from medicinal plants, which could then be used in preparations of drugs of increased purity by pharmacies. Examples of this are the isolation of morphine or papaverine from opium poppies 171,172.

Pharmacology took its role on the basis of physiological experiments, providing support to the growing industry of drug research, which had been mainly driven by chemical or dye companies. This was the final building block necessary to establish this multidisciplinary field.

Moving forward half a century, one has to skip a few major milestones, such as the discovery of penicillin in 1929 173, as well as other antibiotics. These spawned departments of microbiology and fermentation units (showcasing the growing influence of technology in the discovery of novel drugs) in drug companies worldwide. At this stage, biochemistry started to have a central role in the field. The introduction of concepts such as enzymes and receptors, as well as the discovery of their potential as drug targets, allowed the description of carbonic anhydrase, for example, which is inhibited by the active metabolite of the sulfonamide Prontosil, producing a diuretic effect. 174,175 The discovery of an inhibitor was, at this stage, followed by trial-and-error attempts to find derivatives with improved or novel effects, in a rudimentary chemical diversification process. Pharmacology lent a hand to this process (and had in fact begun to do so in 1905 176), with the functional idea that a receptor acts as a switch which receives signals, and can be blocked by antagonists or activated by agonists. This paved the way to the discovery of many different families of drugs, targeting different receptors, such as β-agonists, β-blockers, and even monoclonal antibodies (with glimpses of the immunology field appearing as well) 177, thus greatly increasing the number of targets available to the field.

With the influence of molecular biology, drug discovery (which moves further away from discovery and further towards design) ceases to be a process driven solely by chemists and becomes one driven by the collaboration between chemists and biologists. Founded on the understanding of biological structure and function, one can design chemical structures which will act on biochemical mechanisms of action. In fact, molecular biology brought a new layer to the field, wherein the genetic component can also be taken into consideration. Not only is this useful for the cloning and expression of genes that encode therapeutically useful proteins, thus greatly opening the field to new potential drugs, it is also key in understanding diseases at a genetic level.

This summarized timeline of the major events and contributions in drug discovery and design would not be complete without mentioning the major influence of the exponential development of computers over the past few decades, accompanied by the growth of computational power. In fact, imaging techniques, such as X-ray crystallography or cryo-EM, provided the tools necessary for computational biochemists to dive into the world of drug design. This is often called rational drug design, with a notable example being the design of nelfinavir, an HIV-1 protease inhibitor, in 1997. 178 Indeed, computers are such versatile tools that one can perform many important tasks related to drug design, simplifying the entire process while, importantly, lowering its costs. These tasks will be mentioned throughout the Computational Details chapter.

The progress made in computer technology induced a shift from empirically-based screening, with natural products at the forefront, where the discovery of individual leads (potential drug candidates) was mostly based on in vitro and primary in vivo screening, to a more knowledge-based, rational approach. In the first case, apart from natural products themselves, leads could come from published patents, reports in journals or even from sources spread through word of mouth, in conferences. It was assumed that these lead compounds had undergone significant investigation prior to being identified as leads, with other, less performing compounds being eliminated. With the advent of the new technologies discussed above came the possibility to rapidly screen hundreds of thousands of compounds in vitro (high-throughput screening, HTS). 179 Usually the results of HTS require a follow-up, since the methods used influence the measured solubility of the identified compounds, which in turn affects the IC50 values determined from dose-response curves. 180 There are issues with this identification, mostly related to potential human biases in the choice of leads to keep, as well as to the physical properties of the compounds being rejected and kept; however, these are outside of the scope of this work and, as such, will not be expanded upon. Nevertheless, an attempt to solve these issues, in combination with an analysis of drugs available in the market, was made by Lipinski in 1997 180, who defined a set of parameters that “drug-like” compounds should satisfy in order to be orally active. These became known as Lipinski’s rule of five:

- The molecular weight should not exceed 500 Da;
- The octanol-water partition coefficient (log P), which indicates the lipophilicity of the compound, should not exceed 5; 181,182
- The number of H-bond donors should not exceed 5, in order to improve permeability across the membrane; 183,184
- The number of H-bond acceptors, for reasons similar to those of the previous rule, should not exceed 10.
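The four criteria above can be checked mechanically once the descriptors of a compound are known. The following is a minimal sketch, assuming the descriptors have already been computed elsewhere; the `Descriptors` container, the function name and the example values are hypothetical, and the limits are taken as inclusive, as in Lipinski's formulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Hypothetical container for descriptors computed elsewhere."""
    mol_weight: float       # molecular weight, in Da
    log_p: float            # octanol-water partition coefficient (log P)
    h_bond_donors: int      # number of H-bond donors
    h_bond_acceptors: int   # number of H-bond acceptors


def rule_of_five_violations(d: Descriptors) -> List[str]:
    """Return the names of the rule-of-five criteria that are violated."""
    violations = []
    if d.mol_weight > 500.0:
        violations.append("molecular weight")
    if d.log_p > 5.0:
        violations.append("log P")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors")
    return violations


# Made-up descriptor values, for illustration only.
small_ligand = Descriptors(mol_weight=320.4, log_p=2.1,
                           h_bond_donors=2, h_bond_acceptors=5)
large_ligand = Descriptors(mol_weight=870.0, log_p=6.3,
                           h_bond_donors=3, h_bond_acceptors=12)

print(rule_of_five_violations(small_ligand))
print(rule_of_five_violations(large_ligand))
```

Returning the list of violated criteria, rather than a single pass/fail flag, makes it easy to see which property pushes a compound outside the rule.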

This rule of thumb provided a rationalization of the features present in most orally available molecules, defining easy-to-calculate parameters which made it possible to improve the physical properties of the compounds being identified as leads. As such, many potential false positives were discarded. In the time since, there have been many variations of these rules, since they apply only to orally administered compounds, with notable exceptions to the rule appearing in the market nonetheless.

On a final note, with regards to the potential drug targets, one major category has been left out of this section. In fact, more recently a new category of targets has surfaced and shown great potential. These concern the binding of a protein to its receptor or the association of a protein with another (PPIs), as has been discussed in previous sections. PPIs have been shown to be linked to several different types of diseases and, as such, have become the focus of the interdisciplinary approach of drug design. However, these targets present particular challenges that need to be addressed in order to perform a successful drug design campaign. These challenges and particularities will be discussed in the next section.


(Figure somewhere of the drug design pipeline or of the history/milestones)

Drug Design in Protein-Protein Interactions

The most straightforward way to inhibit a protein is by using molecules which act as enzyme substrates 185 or through allosteric inhibition, by binding a small molecule to a cavity which will recognize it, similarly to the way an active site does. As was alluded to in the previous sections, however, the modulation of PPIs with “small” molecules is an extremely promising novel strategy in drug discovery. The sheer size of the interactome means that there are hundreds of thousands of interactions that could be explored with regards to this, a number that greatly dwarfs the potential single protein targets. Furthermore, the regulatory impact of these interactions, as well as their involvement in a plethora of diseases, makes this class of macromolecular interactions an extremely enticing target for drug design campaigns.

However, as previously mentioned, there are some issues with trying to design inhibitors for PPIs. Indeed, although there have been some cases of success in the modulation of PPIs, these are few and far between, especially when compared to more traditional targets. Although this is due in part to the relatively recent interest in this type of structure, other factors play into this, with PPIs even being considered “undruggable” for a long time. 186 When we consider the rules for a compound to be considered “drug-like”, discussed in the previous section, it is not hard to understand that the large surface areas present in PPIs pose a significant challenge. Indeed, if a potential compound had to target an entire PPI surface, the task would be nigh on impossible, as, even if such a compound could be identified, it would probably be highly unspecific or, in extreme cases, not possess in vivo activity. In addition, the contact surfaces of proteins interacting with other proteins are usually very flat, without the binding pockets and grooves present in the proteins typically targeted by small molecules. 187 Indeed, whereas the latter targets usually possess small natural substrates or ligands (as is usually the case for enzymes or G-protein-coupled receptors), a PPI does not have a natural small-molecule ligand on which to base the design process. 188 Moreover, the contact surfaces in PPIs usually do not involve contiguous residues in the polymer chain, thus rendering attempts to build inhibitors for a PPI starting from peptides in one of the partners much more difficult.

Remarkably, despite all of these setbacks, there is considerable evidence that these issues can be, and have been, dealt with. Indeed, as was mentioned in previous sections, the issue of the size of the contact surface of PPIs is somewhat mitigated when one considers that not all residues have the same influence, with some being crucial for the binding between the two proteins (hot spots). 52,189 If one can identify these residues, through, for instance, computational or experimental mutational studies, and since they constitute less than half of the interface, the task of finding a molecule to target them becomes less daunting. Furthermore, in some cases a protein may bind to more than one target using the same hot spot region. 190 These factors have led to success stories in recent years, where small molecules with PPI-modulating activity were developed.

The fact that PPIs are such unique systems has, consequently, made the aforementioned Lipinski rule of 5 inadequate to predict drug candidates for this type of system. Indeed, when looking in the literature at the compounds identified with potential for inhibition of PPIs, most of them actually break one or more of these rules. 191 In fact, Lipinski himself, in 2016 192, identified the issue with the rules when targeting these systems, since inhibitors of PPIs tend to be more hydrophobic, more lipophilic and more likely to have a higher aromatic ring count than those compounds which fall under the rule of 5 criteria. 193 One could say that the compounds targeting PPIs should, in theory, be less “orally bioavailable”. Since most of the available general-purpose screening libraries tend to show a bias towards compounds that satisfy the rule of 5, new libraries needed to be built, specifically targeting PPIs.

One such success story concerns the Bcl-2 family of proteins, which intervenes in the regulation of cell death, through control of the integrity of the mitochondrial membrane. Bak and Bax are two pro-apoptotic members of the family, whose effect is blocked when they bind to Bcl-2 and Bcl-xL. As such, inhibiting the binding of these partners can induce apoptosis in cancer cells. 194 Bak and Bax possess BH3 domains, containing an α-helical motif which establishes hydrophobic interactions with the anti-apoptotic proteins. The α-helical motif is mimicked by a small molecule, which is then able to bind to the anti-apoptotic proteins. This small molecule falls outside of the Lipinski rule of 5. There were different stages of development of these molecules, with different methods (experimental and computational) contributing to their discovery, with trials ongoing. 195,196

(Figure)

Another success case concerns IL-2, a protein which has already been mentioned throughout this chapter. Indeed, this is a remarkable case and of great interest in relation to the work described here, which mainly deals with IL-15, a closely related family member; as such, the results discussed next reveal the promising potential of the inhibition of IL-15 through the design of appropriate small molecules.

Many of the structural features described previously for IL-15 apply to IL-2 as well. In fact, as already mentioned in this chapter, they share two of three receptors (IL-2Rβ and γc). Essentially, the interaction of IL-2 with IL-2Rα is a globular protein-globular protein interaction. The binding surface of IL-2 is composed of hydrophobic patches surrounded by polar groups. 197 The reported small molecule inhibitors of this protein-receptor interface mimic various non-continuous structural elements from IL-2Rα. Interestingly, a group at Roche designed a molecule in an attempt to bind to IL-2Rα; further NMR studies showed that the molecule was able to inhibit the IL-2/IL-2Rα binding by interacting with IL-2 itself. 198 Another approach to tackle this target was a structure-based design technique, which, however, led to low-potency ligands (as judged by their IC50 values). A fragment-based strategy, in conjunction with mutagenesis studies which were able to identify a region which should bind small, aromatic carboxylic acids, proved more fruitful. 199 More recently, a study screening a small library of compounds, which combined in vitro and in silico techniques, identified inhibitors for this protein. 200

(Figure of the ligand)

The case of IL-15 is interesting, in the sense that it was discovered considerably more recently than IL-2. As such, although there have been some attempts to inhibit the activity of this cytokine, it is a field that has yet to reach maturity. In fact, at the time of writing most of the developed inhibitors for IL-15 were not small molecules. Notable examples include the inhibition of the cytokine through administration of a soluble form of its alpha receptor in mice 201, the development of a mutant form of IL-15 and its linking to mouse Fcγ2a, or the introduction of a monoclonal antibody against IL-15 202. A recent work by Quéméner et al. combined pharmacophore and virtual screening/hit optimization approaches in order to produce small molecule inhibitors of the interaction of IL-15 with its IL-2Rβ receptor, starting from the crystallographic structure, with highly promising results. Indeed, the developed compound was able to block the interaction of IL-15 with its IL-2Rβ receptor with sub-micromolar efficiency, thus pioneering the development of small molecule inhibitors for IL-15 and the specific targeting of the IL-2Rβ receptor (in contrast to the aforementioned small molecules developed for IL-2, which targeted the α receptor). 203 The developed molecule also showed similar activity for IL-2; this raises the interesting question of whether it would be possible to develop a highly specific small molecule inhibitor for IL-15, targeting its interaction with either IL-2Rβ or γc. This is the scope of the studies performed in the work presented herein, which will be detailed in the following sections.

(Figure with compound from jmedchem 2017)

Computational Methods

Molecular Mechanics

Molecular Mechanics (MM) has increasingly become an enticing tool in the computational study of molecular systems. This interest becomes more pronounced when the subject is the study of biomolecules, for which so-called “higher level” methods (such as the ones involving Quantum Mechanics (QM), which will not be discussed here), albeit more accurate, involve a sometimes prohibitively high computational cost. As more and more complex and accurate experimental methods have been developed, so has the interest in using MM. Starting from experimental data, the computational methods based on MM allow for a close connection between theory and experiment. Indeed, these methods appear as a tool of validation, confirmation or clarification of experimental results. They are also a means to observe phenomena on an atomistic level, which, in turn, allows for new insights that lead to, and are a driving force of, novel experiments. 204

MM is a simplified model describing molecular systems through the use of classical mechanics. In fact, MM methods are also referred to as force field (FF) methods, since they express the potential energy of a molecular system as a parametrized function of the nuclear coordinates. One fact to note is that this energy is an estimate which is to be taken in a relative sense; in absolute terms, any conclusions drawn directly from these energies should be taken with a grain of salt. The parameters are derived from a combination of experimental and higher level computational data, and the ensemble of parameters constitutes a classical force field. In these methods, the atoms are typically the smallest units, while coarse-graining methods, by using larger units such as groups of atoms or residues as the smallest unit, sacrifice atomic precision in order to save on computational time. 205 However, throughout this section, we will focus on all-atom simulations, with a particular focus on the CHARMM (Chemistry at HARvard Molecular Mechanics) force field and its application in the NAMD program. 206,207 Nevertheless, the particularities discussed herein are applicable to many other force fields, with notable examples including AMBER 208 and GROMOS 209.

The CHARMM Force Field

In the CHARMM force field, the smallest unit is the atom, which is a charged point mass with no directional properties and no internal degrees of freedom. 210 In this force field, as in the other force fields which allow for the computational study of biomolecules, the underlying assumption is that the physical principles guiding the interactions and motion of the atoms of the system are enough to obtain a sufficiently accurate description of their behavior. 206 The behavior of protons and electrons is not explicitly described. Instead, they are considered implicitly in the atomic parameters present in the force field, with the bonding information being provided explicitly, and not being the result of the computationally much more complex solving of the Schrödinger equation. Furthermore, the quantum aspects of the nuclear motion are also neglected, allowing for a significantly lower computational cost, with only a slight cost in accuracy.


In terms of the energy function present in CHARMM, the total energy is expressed as follows:

$$E = E_b + E_\theta + E_\phi + E_\omega + E_{vdW} + E_{el}$$

with the first four terms corresponding to internal energy terms and the remaining two corresponding to nonbonded interactions, each term being further expanded upon hereafter.

The internal energy terms are the bond potential ($E_b$), the bond angle potential ($E_\theta$), the dihedral angle potential ($E_\phi$) and the improper torsions ($E_\omega$). The first term is defined as:

$$E_b = \sum k_b (r - r_0)^2,$$

with $k_b$ being the bond force constant, $r$ the current bond length of an atom pair and $r_0$ the equilibrium bond length of the same atom pair. The harmonic approximation is applied here, since, due to the fact that there is no forming or breaking of bonds, no large variations from the equilibrium length are to be expected. Furthermore, since the temperatures used in these simulations are ordinary, this approximation can be safely applied.

The same harmonic approximation is employed in the potential energy describing the angle between three consecutively bound atoms:

$$E_\theta = \sum k_\theta (\theta - \theta_0)^2.$$

Similarly to the previously described equation, $k_\theta$ is the angle force constant, $\theta$ is the present angle and $\theta_0$ is the equilibrium angle. As previously discussed, the harmonic approximation is sufficient in this case.

For the dihedral angle potential, one defines the equation as:

$$E_\phi = \sum \left[ |k_\phi| - k_\phi \cos(n\phi) \right], \quad n = 1, 2, 3, 4, 6,$$

which corresponds to the torsion energy, a four-atom term based on the dihedral angle about an axis defined by the middle pair of atoms. In this case, the force constant ($k_\phi$) can be negative, there may be different contributions with different force constant values, and the same set of four atoms may have different periodicity.


The potential energy associated with the improper torsion is as such:

$$E_\omega = \sum k_\omega (\omega - \omega_0)^2,$$

which is, once again, similar to the first two discussed terms, since it provides a better approximation near the minimum-energy geometry, as indicated for dynamics and vibrational analysis.
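As an illustration of the internal energy terms, the sketch below evaluates a single harmonic contribution (bond, angle or improper) and a single dihedral contribution. The function names and the numerical values are hypothetical and are not taken from the CHARMM parameter set.

```python
import math


def harmonic(k: float, x: float, x0: float) -> float:
    """Harmonic term k * (x - x0)**2, used for bonds, angles and impropers."""
    return k * (x - x0) ** 2


def dihedral(k_phi: float, n: int, phi: float) -> float:
    """Dihedral term |k_phi| - k_phi * cos(n * phi), with phi in radians."""
    return abs(k_phi) - k_phi * math.cos(n * phi)


# Illustrative values only (not actual force field parameters).
e_bond = harmonic(k=300.0, x=1.10, x0=1.00)
e_angle = harmonic(k=50.0, x=math.radians(115.0), x0=math.radians(109.5))
e_dihedral = dihedral(k_phi=1.5, n=3, phi=math.radians(60.0))

print(e_bond, e_angle, e_dihedral)
```

The total internal energy is then simply the sum of such contributions over all bonds, angles, dihedrals and impropers listed in the topology of the system.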

Having described the internal energy terms, the nonbonded interactions are formulated as follows. The energy obtained from van der Waals (vdW) interactions is defined by:

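$$E_{vdW} = \sum_{excl(i,j)=1} \varepsilon_{ij}^{min} \left[ \left( \frac{R_{ij}^{min}}{r_{ij}} \right)^{12} - 2 \left( \frac{R_{ij}^{min}}{r_{ij}} \right)^{6} \right] sw(r_{ij}^2, r_{on}^2, r_{off}^2),$$

where $excl(i,j) = 0$ if atoms $i$ and $j$ are connected by angles or bonds, or if $i \geq j$, and $excl(i,j) = 1$ otherwise. Furthermore, $sw$ is a switching function, defined as:

$$sw(x, x_{on}, x_{off}) = \begin{cases} 1 & \text{when } x \leq x_{on} \\ \dfrac{(x_{off} - x)^2 (x_{off} + 2x - 3x_{on})}{(x_{off} - x_{on})^3} & \text{when } x_{on} < x \leq x_{off} \\ 0 & \text{when } x > x_{off} \end{cases}$$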
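The three branches of the switching function translate directly into code. In this minimal sketch the function name is hypothetical, and the cutoff distances of 10 and 12 Å in the usage lines are illustrative assumptions rather than the values used in this work.

```python
def switch(x: float, x_on: float, x_off: float) -> float:
    """Switching function sw(x, x_on, x_off): 1 below x_on, 0 above x_off."""
    if x <= x_on:
        return 1.0
    if x > x_off:
        return 0.0
    return (x_off - x) ** 2 * (x_off + 2.0 * x - 3.0 * x_on) / (x_off - x_on) ** 3


# The arguments are squared distances, as in the vdW expression.
r_on, r_off = 10.0, 12.0   # illustrative cutoffs, in angstrom
for r in (9.0, 10.0, 11.0, 12.0, 13.0):
    print(r, switch(r * r, r_on * r_on, r_off * r_off))
```

The function is continuous at both ends of the switching region: it evaluates to 1 at `x_on` and to 0 at `x_off`, so the potential is turned off smoothly rather than truncated abruptly.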

Although the specific parameters related to the Molecular Dynamics (MD) simulations throughout this work will be specified in another chapter, the switching function is in place in order to switch off the potential at a specific, defined distance, beyond which interactions are not computed. Particularly, the vdW energy is calculated through a 12-6 Lennard-Jones potential 211, with $\varepsilon_{ij}^{min}$ being the minimum attainable potential energy, $r_{ij}$ the distance between the two considered atoms and $R_{ij}^{min}$ the distance at which this potential is at its minimum.

The $\varepsilon_{ij}^{min}$ and $R_{ij}^{min}$ terms are typically obtained from the individual $\varepsilon_{ii}^{min}$ and $R_i^{min}$ for individual atom types, and then are combined as such:

$$\varepsilon_{ij}^{min} = \sqrt{\varepsilon_{ii}^{min} \varepsilon_{jj}^{min}}$$

$$R_{ij}^{min} = \frac{R_i^{min} + R_j^{min}}{2}$$

Furthermore, the first term of the vdW equation corresponds to the Pauli repulsion, at a power 12, whereas the negative factor, at power 6, mimics the attractive dispersion. As such, at very low distance values between two non-bonded atoms, the positive factor will have a very significant influence, increasing the energy penalty of this interaction, whereas, as this distance increases, the energy will decrease towards the minimum, converging, finally, to zero, at sufficiently large distances. 206
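The behavior described above can be verified numerically. The sketch below combines per-type parameters with the geometric and arithmetic means given earlier and evaluates the 12-6 potential without the switching function. It assumes the form in which the attractive term carries a factor of 2, so that the minimum of depth ε lies exactly at the distance R^min, consistent with the definitions of these two parameters; function names and parameter values are hypothetical.

```python
import math


def combine(eps_i: float, eps_j: float, rmin_i: float, rmin_j: float):
    """Geometric mean for the well depth, arithmetic mean for R^min."""
    return math.sqrt(eps_i * eps_j), 0.5 * (rmin_i + rmin_j)


def lennard_jones(r: float, eps: float, rmin: float) -> float:
    """12-6 potential with its minimum (-eps) located at r = rmin."""
    ratio6 = (rmin / r) ** 6
    return eps * (ratio6 ** 2 - 2.0 * ratio6)


# Illustrative per-type parameters (not actual force field values).
eps_ij, rmin_ij = combine(0.1, 0.4, 3.0, 4.0)

for r in (3.0, 3.5, 5.0, 12.0):
    print(r, lennard_jones(r, eps_ij, rmin_ij))
```

With these inputs the energy is strongly positive at 3.0 Å, reaches its minimum at 3.5 Å and approaches zero from below at large separations.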

The final term in the force field is the electrostatic potential, which is defined by:

$$E_{el} = \sum_{excl(i,j)=1} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}},$$

where the electrostatic interactions are calculated through the interaction between two partial charges, following Coulomb's law. These partial charges, $q_i$ and $q_j$, are attributed to atoms in order to mimic the multipoles generated by the unevenness of the electron density in a molecule. $r_{ij}$ is, once again, the distance between the two atoms being considered, and $\varepsilon_0$ is the vacuum permittivity.
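A direct evaluation of this sum is sketched below. It assumes charges in units of the elementary charge and distances in ångström, and converts the result to kcal/mol from SI constants; the function names, the exclusion set and the three-site example are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
AVOGADRO = 6.02214076e23       # 1/mol
KCAL_PER_J = 1.0 / 4184.0      # thermochemical kilocalorie


def coulomb_energy(q_i: float, q_j: float, r_ij: float) -> float:
    """Pair energy in kcal/mol for charges in e and r_ij in angstrom."""
    r_m = r_ij * 1.0e-10
    e_joule = q_i * q_j * E_CHARGE ** 2 / (4.0 * math.pi * EPSILON_0 * r_m)
    return e_joule * AVOGADRO * KCAL_PER_J


def electrostatic_energy(charges, coords, excluded=frozenset()):
    """Sum over pairs i < j, skipping the pairs listed in `excluded`."""
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            if (i, j) in excluded:
                continue
            r_ij = math.dist(coords[i], coords[j])
            total += coulomb_energy(charges[i], charges[j], r_ij)
    return total


# Hypothetical linear three-site molecule; bonded pairs are excluded.
charges = [0.4, -0.8, 0.4]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(electrostatic_energy(charges, coords, excluded={(0, 1), (1, 2)}))
```

Looping only over pairs with i < j reproduces the condition that excl(i, j) vanishes for i ≥ j, so that each interaction is counted once.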

(Figure to illustrate the terms, example. Do it on pymol?)

Energy Minimization


The procedure to find a minimum in the potential energy surface (PES) through the modification of the structure of the system is called energy minimization. In a simplified case, one could think of a minimization approach where, in order to minimize a function, one would increment one variable at a time until the function was minimized, and then move on to the next variable and repeat the procedure, and so on. However, since the variables are not independent and a macromolecular system has a considerably large number of degrees of freedom, one would need to run several steps of this cycle, with no guarantee that, in the end, the minimum would be found. This solution would, thus, be impractical and costly. In order to solve the issue of optimization in a many-variable system, the methods used in computational chemistry assume that the first derivative of the function with respect to all variables, the gradient, can be calculated analytically. This is done with a finite precision, with the gradient being reduced below a suitable cutoff value or until the function change between two iterations is insignificantly small. As such, the search of a PES minimum will, in the case of biomolecular systems, lead to a local minimum, with the true stationary point being approximated instead of exactly determined.

This section is not meant to be an exhaustive analysis of the methods for finding minima; nevertheless, it is important to understand the differences between the different methods, particularly in terms of advantages and disadvantages. There are three classes of search methods: steepest descent (SD), conjugate gradient (CG) and Newton-Raphson methods (NR).

(Figure showing the differences)

In the case of the steepest descent algorithm 212,213, the search direction is defined as the opposite of the direction of the gradient, since the gradient always points to the direction where the function increases the most. An approximation is calculated at the interpolated point between two points, with a new gradient being calculated and used for the next line search. The new position is defined by:

$$x_i = x_{i-1} - \varepsilon \nabla V(x_{i-1}),$$

with $x_i$ and $x_{i-1}$ being the new and previous positions, respectively, $\varepsilon$ being the magnitude of the step taken and $\nabla V(x_{i-1})$ the gradient at the previous position. The SD algorithm has the advantage of being able to always lower the function value (since it evolves in the opposite direction of the gradient) and of performing very rapidly when the function is far from the minimum; however, each new search direction will be orthogonal to the previous one, which can lead to very slow convergence in unfavorable cases, with a tendency to zigzag around the minimum when surfaces display long narrow valleys.
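The update rule can be illustrated on a simple two-dimensional surface with a narrow valley. The sketch below is a simplification: it uses a fixed step size instead of a line search, and the test function, step size and tolerance are arbitrary choices made for illustration.

```python
def steepest_descent(grad, x0, step, tol=1e-8, max_iter=10000):
    """Fixed-step steepest descent; returns the final point and iteration count."""
    x = list(x0)
    for iteration in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:   # gradient below the cutoff
            return x, iteration
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iter


def grad_valley(x):
    """Gradient of V(x, y) = x**2 + 10 * y**2."""
    return [2.0 * x[0], 20.0 * x[1]]


x_min, n_iter = steepest_descent(grad_valley, x0=(1.0, 1.0), step=0.09)
print(x_min, n_iter)
```

With this step size the y coordinate changes sign at every iteration while the x coordinate decays slowly, which is a small-scale version of the zigzag behavior in narrow valleys.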

The conjugate gradient method tries to improve on the SD method by performing the line search along a line which is “conjugate” to the previous search direction, instead of performing it along the current gradient. The direction can be written as:

$$d_i = -g_i + \beta_i d_{i-1}$$

with $g_i$ being the current gradient, $d_i$ and $d_{i-1}$ being the current and previous search directions, and $\beta_i$ being a scheme-dependent term which includes the current and previous gradients and can be calculated through different methods214–216. The CG methods have much better convergence properties than SD, since the new search direction is not orthogonal to the previous one. The drawbacks of this method are a slightly higher storage requirement (since both the previous and the current gradients need to be saved) and the fact that sometimes the algorithm needs to be restarted during the optimization ($\beta = 0$).
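The direction update can be illustrated on a quadratic function, where the line search can be done exactly. The sketch below assumes the Fletcher-Reeves expression for $\beta_i$, which is only one of the available schemes; the function name and the test matrix `A_valley` are hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, x0, gtol=1e-10):
    """Minimize V(x) = 0.5 x^T A x - b^T x with d_i = -g_i + beta_i d_{i-1}."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                 # gradient of the quadratic
    d = -g                        # first direction: steepest descent
    n_iter = 0
    while np.linalg.norm(g) > gtol and n_iter < 10 * len(b):
        alpha = -(g @ d) / (d @ A @ d)     # exact line search along d
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves (assumed scheme)
        d = -g_new + beta * d
        g = g_new
        n_iter += 1
    return x, n_iter

# Same narrow valley as a 2x2 quadratic: V = 0.5 * (x**2 + 25 * y**2)
A_valley = np.diag([1.0, 25.0])
x_cg, n_cg = conjugate_gradient(A_valley, np.zeros(2), [5.0, 1.0])
```

For a quadratic function in two variables, the conjugate directions reach the minimum in two line searches, in contrast with the many short steps taken by a steepest descent search on the same surface.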

The Newton-Raphson methods217,218 expand the function to the second order, with the requirement that the gradient of the second-order approximation has to be zero:

$$(x - x_0) = -H^{-1} g$$

with $x_0$ being the current point, H being the Hessian and g being the gradient. Near a minimum, all the eigenvalues of H are positive, and the step is taken in the direction opposite to the gradient. If one of the H eigenvalues is negative, the step along the corresponding direction will increase the function value, and the stationary point found may be a saddle point instead of a minimum. The NR methods are, however, much faster to converge once close to the minimum.
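A single Newton-Raphson step can be sketched as follows. This is a minimal illustration under stated assumptions: the gradient and Hessian are supplied as (hypothetical) callables, and the eigenvalue check only reports whether the Hessian is positive definite at the current point.

```python
import numpy as np

def newton_raphson_step(x0, gradient, hessian):
    """Return x = x0 - H^{-1} g and whether all Hessian eigenvalues are positive."""
    g = gradient(x0)
    H = hessian(x0)
    all_positive = bool(np.all(np.linalg.eigvalsh(H) > 0.0))
    # Solve H * step = -g instead of forming the inverse explicitly
    step = np.linalg.solve(H, -g)
    return x0 + step, all_positive

# Quadratic test surface V = 0.5 * (x**2 + 25 * y**2), constant Hessian
H_valley = np.diag([1.0, 25.0])
x_nr, positive_definite = newton_raphson_step(
    np.array([5.0, 1.0]),
    lambda x: H_valley @ x,
    lambda x: H_valley,
)
```

Because the second-order expansion is exact for a quadratic function, one step lands on the minimum here; on a real PES the step is repeated until the gradient falls below the cutoff.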

Although CHARMM allows the possibility to use different minimization methods, the NAMD code has implemented the CG method, which was used throughout this work.

Molecular Dynamics


In Molecular Dynamics (MD), Newton’s equations of motion are integrated, thus determining the coordinates of the system as a function of time. The principles of classical mechanics are applied, wherein quantum corrections to the atomic dynamics are negligible. With general advances in computer sciences, accompanied by the development of more efficient and more accurate methods, MD has become an increasingly essential tool in the computational study of biomolecules. It is clear that, since the first MD study involving proteins was published in 1977 219, progress has been exponential, with biomolecular simulations playing an increasingly important role in the plethora of different fields where biology is involved. 220,221

MD methods consist of the generation of a series of time-correlated points in phase space, by propagating a starting set of coordinates and velocities according to Newton’s second law in a number of finite time steps. Since the starting system is static, random initial velocities are assigned to each atom based on a Maxwell-Boltzmann distribution, mimicking the temperature of the system. 222

$$F = ma$$

If we write the previous equation in differential form, its solving leads to the simulation of dynamics:

$$-\frac{dV}{dr} = m\frac{d^2 r}{dt^2}$$

where V is the potential energy (obtained from the previously mentioned force field equation) and r is a vector containing the coordinates for all particles. In order to obtain the positions at a time step later, one uses a Taylor expansion as such:

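$$r_{i+1} = r_i + \frac{\partial r}{\partial t}(\Delta t) + \frac{1}{2}\frac{\partial^2 r}{\partial t^2}(\Delta t)^2 + \frac{1}{6}\frac{\partial^3 r}{\partial t^3}(\Delta t)^3 + \cdots$$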

with the second term corresponding to $v_i(\Delta t)$, the third to $\frac{1}{2}a_i(\Delta t)^2$ and the fourth to $\frac{1}{6}b_i(\Delta t)^3$, with $v_i$ being the velocities (first derivative dr/dt) at time $t_i$, the accelerations $a_i$ the second derivatives, the hyperaccelerations $b_i$ the third derivatives, etc. Positions at a time step earlier can be derived by replacing $\Delta t$ with $-\Delta t$ in the previous equation. Adding the two expansions, for the time step after and for the time step before the current one, cancels the odd-order terms. It is thus possible to predict the position a time step $\Delta t$ later based solely on the current position, the previous position and the current acceleration:

$$r_{i+1} = (2r_i - r_{i-1}) + a_i(\Delta t)^2 + \cdots$$

This is the Verlet algorithm 223, where $a_i$ can be calculated from the force or from the potential through Newton’s second law, as mentioned above:

$$a_i = \frac{F_i}{m_i} = -\frac{1}{m_i}\frac{dV}{dr_i}$$

The method is only a problem for the initial point, since there is no position before the initial point. However, this position can be estimated from first order approximations:

$$r_{-1} = r_0 - v_0\Delta t$$
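The Verlet recursion and its first-order starting estimate can be sketched for a single coordinate. The harmonic oscillator used here (unit mass and force constant, so that a = -r) is a hypothetical test system chosen only because its exact solution is known.

```python
import math
import numpy as np

def verlet(accel, r0, v0, dt, n_steps):
    """Propagate r_{i+1} = 2 r_i - r_{i-1} + a_i dt^2 for one coordinate."""
    r = np.empty(n_steps + 1)
    r[0] = r0
    r_prev = r0 - v0 * dt          # first-order estimate of r_{-1}
    for i in range(n_steps):
        r_next = 2.0 * r[i] - r_prev + accel(r[i]) * dt**2
        r_prev = r[i]
        r[i + 1] = r_next
    return r

# Hypothetical test system: harmonic oscillator, exact solution r(t) = cos(t)
traj_verlet = verlet(lambda r: -r, r0=1.0, v0=0.0, dt=0.01, n_steps=628)
```

Note that no velocity appears anywhere in the loop; only the current and previous positions are stored.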

There are, however, some issues with this numerical solution of Newton’s equation. On one hand, the new position is obtained by adding a difference in positions $(2r_i - r_{i-1})$ and a term proportional to $\Delta t^2$; with $\Delta t$ being inherently small and the difference in positions comparably large, this can lead to a loss of numerical precision. On the other hand, velocities do not appear explicitly in this algorithm, which would not be a problem if the ensemble used throughout this work did not have constant temperature. However, throughout this work, as will be explained later in this section, the simulations were run at constant temperature. As such, one has to apply another algorithm which solves these issues, the leap-frog algorithm 224. In this algorithm, the equivalent expression to obtain $r_{i+1}$ is as follows:

$$r_{i+1} = r_i + v_{i+\frac{1}{2}}\Delta t$$

where the velocity is calculated at half a time step $\Delta t$ after the current position $r_i$. In order to obtain this term:

$$v_{i+\frac{1}{2}} = v_{i-\frac{1}{2}} + a_i\Delta t$$

The use of explicit velocities allows the coupling to an external heat bath, with the disadvantage that, since velocities are obtained at half a time step after or before the current point, they are always out of phase of positions by half a time step.
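The two leap-frog updates can be sketched for a single coordinate as follows. The starting half-step velocity $v_{-1/2} = v_0 - \frac{1}{2}a_0\Delta t$ is an assumption made for this sketch, and the harmonic oscillator (a = -r) is a hypothetical test system.

```python
import math

def leapfrog(accel, r0, v0, dt, n_steps):
    """Leap-frog: v_{i+1/2} = v_{i-1/2} + a_i dt, then r_{i+1} = r_i + v_{i+1/2} dt."""
    r = r0
    v_half = v0 - 0.5 * accel(r0) * dt   # assumed estimate of v_{-1/2}
    positions = [r]
    half_step_velocities = []
    for _ in range(n_steps):
        v_half = v_half + accel(r) * dt  # velocity at t + dt/2
        r = r + v_half * dt              # position at t + dt
        half_step_velocities.append(v_half)
        positions.append(r)
    return positions, half_step_velocities

# Hypothetical test system: harmonic oscillator, exact solution r(t) = cos(t)
pos_lf, vel_lf = leapfrog(lambda r: -r, r0=1.0, v0=0.0, dt=0.01, n_steps=628)
```

The velocities are available explicitly, so they can be rescaled or coupled to a heat bath, but each stored velocity belongs to a time half a step away from the neighboring positions.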

In NAMD, the integration is done through a multiple-time-stepping method, the impulse-based Verlet-I. This allows for the force acting on each atom to be broken into two pieces: a local component, which varies quickly and consists of all bonded and van der Waals interactions, as well as the portion of electrostatic interactions for pairs of atoms separated by less than the local interaction distance, and a slower long-range component, consisting of the electrostatic interactions outside of the local interaction distance. The slower, long-range component does not need to be evaluated every time step, a fact that is exploited by the multiple time step integration, thus lowering the computational cost. 207

Ensemble Conditions

In order to correctly mimic the environment and conditions that the molecules, in this case biomolecules, are exposed to, it is possible to choose which quantities should remain constant throughout the simulation. This set of conditions that will remain stable forms the ensemble conditions. The most common ones will be briefly discussed in this section.

In the previous section, there was a brief mention of the concept of ensembles when discussing the issues with the Verlet algorithm, which works well when temperature is not constant. One of the most common ensembles and an example of this is the microcanonical ensemble (NVE), where the quantities which remain constant are the number of particles, volume and energy of the system. 225 This is the natural ensemble generated by standard MD simulations, wherein the total energy is a sum of the kinetic and potential energies, calculated directly from positions and velocities:

$$E_{tot} = \sum_{i=1}^{N} \frac{1}{2} m_i v_i^2 + V(r)$$

As a contrast to this ensemble, there are two others where the temperature is constant: the canonical ensemble (NVT), where the number of particles, volume and temperature are constant, and the isothermal-isobaric ensemble (NPT), where number of particles, pressure and temperature are constant. In order to attain these two ensembles, the velocities and positions have to be modified at each time step. As such, the aforementioned Newtonian equations of motion have to be modified, notably through the coupling of the system to a reservoir. This coupling can be deterministic or stochastic, with the latter being the implemented way in NAMD, due to its ease of implementation and to an enhancement of dynamical stability due to the friction terms.


The Boltzmann distribution for the canonical (NVT) ensemble is generated by the stochastic general Langevin equation 226:

$$M\dot{v} = F(r) - \gamma v + \sqrt{\frac{2\gamma k_B T}{M}}\,R(t)$$

with M being the mass, v the velocity, F the force, r the position, 훾 the friction, kB the Boltzmann constant, T the temperature and R(t) a univariate Gaussian random process. The coupling to the reservoir is modeled by adding the fluctuating (final term) and dissipative (−훾푣) forces to the Newtonian equations of motion. The integration of the Langevin equation is done through the Brünger-Brooks-Karplus (BBK) method (an extension of the Verlet method). 227

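$$r_{n+1} = r_n + \frac{1 - \frac{\gamma\Delta t}{2}}{1 + \frac{\gamma\Delta t}{2}}(r_n - r_{n-1}) + \frac{1}{1 + \frac{\gamma\Delta t}{2}}\Delta t^2\left[M^{-1}F(r_n) + \sqrt{\frac{2\gamma k_B T}{\Delta t\,M}}\,Z_n\right]$$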

Here, $Z_n$ is analogous to R(t) in the previous equation, but it is a set of Gaussian random variables with zero mean and unit variance. Only one random number is required for each degree of freedom and the error is proportional to $\Delta t^2$. 228
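A single BBK position update can be sketched as below. This is only an illustration of the update written above, not NAMD code: the function name and argument list are hypothetical, the Boltzmann constant is given in kcal/(mol K), and all other quantities are assumed to be expressed in mutually consistent units.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def bbk_step(r_n, r_prev, force, mass, gamma, temperature, dt, rng):
    """One BBK update of the positions for the Langevin equation."""
    z_n = rng.standard_normal(np.shape(r_n))   # zero mean, unit variance
    noise = np.sqrt(2.0 * gamma * K_B * temperature / (dt * mass)) * z_n
    damp = 0.5 * gamma * dt
    return (r_n
            + (1.0 - damp) / (1.0 + damp) * (r_n - r_prev)
            + dt**2 / (1.0 + damp) * (force(r_n) / mass + noise))

# With gamma = 0 both the friction and the noise vanish,
# and the update reduces to the plain Verlet recursion.
rng = np.random.default_rng(0)
r_next = bbk_step(1.0, 0.9, lambda r: -r, mass=1.0, gamma=0.0,
                  temperature=300.0, dt=0.1, rng=rng)
```

The check with $\gamma = 0$ makes the relation to the Verlet method explicit: the friction only rescales the displacement term and the random force enters as an extra acceleration.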

Finally, for the isothermal-isobaric (NPT) ensemble, the equations of motion are, once again, modified, with a numerical integrator implemented, based on the Langevin-piston method 223 and the Hoover method229–231, in order to keep the system at a constant pressure. The Langevin-piston equation is as follows:

$$\ddot{V} = \frac{1}{W}\left[P(t) - P_{ext}\right] - \gamma\dot{V} + R(t)$$

with V being the volume, W the piston mass, P(t) the instantaneous pressure, $P_{ext}$ the imposed pressure, $\gamma$ the collision frequency and R(t) a random force taken from a Gaussian distribution with zero mean and variance:

$$\langle R(0)R(t)\rangle = \frac{2\gamma k_B T\,\delta(t)}{W}$$

Time Step


The choice of the time step in MD simulations has to be carefully considered, since it affects every aspect of the simulation. There is always a compromise to be found between computational cost, accuracy and stability. Generally, the computational cost of simulating a given length of time is inversely proportional to the time step used. As such, it would be ideal to use the largest time step possible, in order to have the lowest computational cost. However, the use of a large time step introduces large amounts of instability in the system; this is due to the fact that forces are calculated from the positions at the beginning of each step and, as such, the calculated velocities will not correctly approximate the true velocities (which may change drastically from $t$ to $t + \Delta t$). The error in each newly integrated velocity can be compounded, with the system moving towards increasingly larger energy values. Furthermore, there is the question of accuracy, wherein the time step should be adequately adjusted to observe the phenomena one is studying. In practice, the chosen time step should be smaller than the limit which would introduce errors and small enough to capture high-frequency fluctuations accurately; typically, this corresponds to an order of magnitude lower than the period of the fastest vibration in the system. As the fastest vibrations are the stretching vibrations involving hydrogen atoms, with a period of around 10 fs, a time step of 1 fs is, usually, a good starting point. However, this is still a somewhat small time step, which involves a significantly high computational cost.

As such, there are strategies that can be applied to increase the time step. One of the most common methods to increase the time step is to constrain the hydrogen stretching vibrations at their equilibrium value, without affecting the remaining degrees of freedom. In practice, the C-C bond stretches are the next fastest vibrations, at around 20 fs, so one can already use a time step of 2 fs. In order to achieve this, one can, for example, use the SHAKE algorithm. 232,233 Freezing bond angles is also a possibility; however, this may lead to differences in the overall motion 234.

Another method to increase the time step is the use of multiple time step (MTS) schemes, as has been previously mentioned. In this type of scheme, the force components are divided into classes, which are updated at different time steps. Taking the specific example of NAMD, the MTS scheme based on the Verlet algorithm uses 4 fs for long-range electrostatic forces, 2 fs for short-range non-bonded forces and 1 fs for bonded forces. However, MTS can be combined with the aforementioned SHAKE algorithm to fix the bonds involving hydrogen atoms, with the time steps being, therefore, increased respectively to 6 fs, 2 fs and 2 fs. This involves a more difficult implementation, due to the fact that the different pair lists need to be generated and updated, but improves the computational cost significantly. 235,236

Solvation and periodic boundary conditions

There are different ways to introduce a solvent (in the case of biomolecules, usually water) to a model. These can be divided into two main groups, implicit and explicit solvent models. The former applies a homogeneous medium to represent the solvent, whereas the latter includes the solvent molecules in an atomistic fashion. This section will focus on the explicit solvent models, as these are the most suitable for the study of macromolecules.

Within the explicit solvent models, there are mainly two types of models that can be used: quantal and classical models. The former make use of QM and are usually used in QM MD simulations, in which the solvent may participate in the reaction and has to, as such, be modelled with high accuracy, at the cost of computational time. Since, throughout this work, we employed classical MD, we will only briefly discuss classical models in this section.

The classical models are MM based, as the name indicates. With the most common solvent being water, one could think that the choice of model would be simple and straightforward, as it is a fairly simple molecule that should be easily modelled. However, even with the stated premises being true, the widespread use of water molecules in the field has spawned many different models, with differing parameters and numbers of interacting sites. The simplest way to represent a water molecule in a classical simulation is to consider it as a Lennard-Jones sphere, containing two opposing charges, in order to mimic the dipole moment. Moving towards higher complexity, another type of model employs equal positive atomic charges on the hydrogen atoms and a negative charge on the symmetry axis or the lone pair regions, mimicking the dipole moment and the charge distribution. Some of the most popular models belong to this type, such as the transferable-intermolecular-potentials-3-, -4-, and -5-point-charge models (TIP3P, TIP4P237 and TIP5P238, respectively) or the simple point charge (SPC)239,240 model and modified forms of it241,242. Throughout this work, the semi-empirical TIP3P model was used, containing three point charges, one for each atom, but only one van der Waals radius, located on the oxygen atom. It does not contain an angle parameter on the oxygen, possessing, instead, a bond parameter between the two hydrogen atoms. This might seem like a small simplification; however, since there is a significantly high number of water molecules surrounding a protein in a simulation, a lot of computational time is spent on the water molecules. The absence of two van der Waals parameters, combined with a cheaper calculation of bonds in comparison with angles, makes this simplification of utmost importance in the saving of computational cost.

(Figure showing a water box)

The addition of explicit solvent molecules in the system introduces a supplementary issue. If the water molecules were to be let free throughout the simulation, with no additional restraints, there would be a significant “loss” of molecules, which would establish interactions with the protein, among themselves and, in the case of the molecules in the outer sphere, with vacuum. These molecules would, thus, not have a proper solvent environment, and the system would not correctly mimic the biological environment. As such, it is necessary to introduce periodic boundary conditions, to prevent the system from interacting with vacuum. Generally, one assumes that the system is inside a water box (unit cell), with this unit being infinitely replicated in all directions. The application of this method leads to each molecule inside a unit cell interacting with molecules in the same unit cell, as well as with molecules in neighboring ones. Furthermore, an added benefit is that, as a protein or water molecule moves, it may leave this box; with the aforementioned infinite replication, as a molecule moves out of the box on one side, it enters the same box on the precisely opposite side, allowing every participant in the system to freely cross the boundary, causing no issues in the simulation.
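The re-entry of a molecule on the opposite side of the box, and the interaction with the nearest periodic copy of a neighbor, can be sketched as follows. The sketch assumes an orthorhombic box with its origin at a corner; the function names are hypothetical.

```python
import numpy as np

def wrap(positions, box):
    """Map coordinates back into the unit cell [0, box) along each axis."""
    return positions - box * np.floor(positions / box)

def minimum_image(r_i, r_j, box):
    """Vector from atom i to the nearest periodic image of atom j."""
    delta = r_j - r_i
    return delta - box * np.round(delta / box)

box = np.array([10.0, 10.0, 10.0])            # box edges in angstrom
wrapped = wrap(np.array([10.5, -0.5, 3.0]), box)
nearest = minimum_image(np.array([0.5, 0.0, 0.0]),
                        np.array([9.5, 0.0, 0.0]), box)
```

An atom leaving through one face at x = 10.5 Å reappears at x = 0.5 Å, and two atoms sitting near opposite faces are treated as being 1 Å apart through the boundary instead of 9 Å apart inside the box.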

(Figure showing how a unit cell works)

As the system now possesses virtually infinite molecules (the “main” unit cell as well as the infinitely replicated ones), it also has to take into account virtually infinite interactions between all the interacting partners, with infinite contributions to the potential energy of the system. This is not a problem for the internal energy terms, as discussed in the force field section, as the number of atoms and their bonds remains the same (as such, the bonds, angles and dihedral terms are all the same in each box). However, pairwise non-bonded interactions will occur not only between molecules inside one box, but also between one molecule and all other molecules in all other boxes; as such, steps need to be taken in order to avoid a potentially infinite computational time.

A straightforward way to deal with this issue is to apply a cut-off, beyond which no van der Waals interaction is taken into account. Recalling the force field terms, the van der Waals contribution to the energy decays quickly with the increase of distance:

$$-\left(\frac{R^{min}_{ij}}{r_{ij}}\right)^{6}$$

As such, one defines a cut-off typically between 10 and 12 Å, beyond which the contribution of these non-bonded interactions is negligible. In the case of electrostatic interactions, the method used by NAMD is a variation of the Particle-Mesh Ewald (PME)243 method, the smooth PME (SPME) method. 244 The electrostatic interactions are divided into short-range and long-range interactions, with the former being calculated according to the force field equation and the latter using a fast Fourier transform.

The only issue left to take into account is what size of box to use. Each box dimension should be at least twice the cut-off radius, so that no atom interacts with more than one periodic image of another atom. Furthermore, since the protein is at a high concentration, when compared to the biological system being studied, the size of the box should be large enough to avoid self-interacting proteins.

Molecular Mechanics/Generalized Born Surface Area

The analysis of the contributions from electrostatic and van der Waals interactions and changes in solvation is an important tool to assess the binding of complexes. This is especially the case when one is dealing with models obtained through virtual mutagenesis.245

The Molecular Mechanics/Generalized Born Surface Area method (MM/GBSA) allows for the calculation of gas-phase energies, solvation free energies and entropic contributions, as an average over several snapshots from MD trajectories. Other methods, such as the analogous MM/PBSA method which uses the Poisson-Boltzmann approach 246,247, could be used, albeit in a somewhat slower way.

Accordingly, the binding free energy of association of a proteic complex can be written as follows:


$$\Delta G_{bind} = \langle G_{AB}\rangle - \langle G_A\rangle - \langle G_B\rangle$$

with $G_{AB}$ being the free energy of the complex and $G_A$ and $G_B$ being the free energy of each interacting partner, and $\langle\,\rangle$ denoting an average over snapshots taken from the MD simulation. When considering each individual free energy, it is as follows:

$$G = H_{gas} + H_{trans/rot} + G_{solv} - TS$$

wherein G is estimated from contributions of gas-phase energies, solvation free energies and entropies. The gas-phase contributions are obtained as a sum of internal energies (bond, angle, torsional, as previously described) and non-bonded (van der Waals and electrostatic) energies.

The energy due to the translational and rotational degrees of freedom (Htrans/rot) is 3RT.

The solvation free energy $G_{solv}$ is obtained from a sum of polar and non-polar contributions. The non-polar contribution to the solvation free energy, due to the van der Waals interactions between the solute and solvent, as well as the formation of cavities, is given by a term dependent on the solvent-accessible surface area (SASA):

$$G_{non-polar} = \gamma\,SA$$

with $\gamma$ being, in this case, a surface tension coefficient and SA the solvent-accessible surface area.

The entropy contribution (-TS in the free energy equation) is determined from changes in the degrees of freedom (translational, rotational and vibrational), through classical statistical thermodynamics. 248,249
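The bookkeeping behind these expressions can be sketched as follows. This is only an illustration of the averaging, not the script used in this work: the function names are hypothetical and the per-snapshot values are made up.

```python
import numpy as np

R_GAS = 0.0019872041  # gas constant in kcal/(mol K)

def snapshot_free_energy(h_gas, g_solv, t_s, temperature=300.0):
    """G = H_gas + H_trans/rot + G_solv - TS, with H_trans/rot = 3RT."""
    return h_gas + 3.0 * R_GAS * temperature + g_solv - t_s

def binding_free_energy(g_complex, g_a, g_b):
    """Delta G_bind = <G_AB> - <G_A> - <G_B>, averaged over snapshots."""
    return np.mean(g_complex) - np.mean(g_a) - np.mean(g_b)

# Made-up per-snapshot free energies (kcal/mol), for illustration only
dg_bind = binding_free_energy([-100.0, -102.0],
                              [-60.0, -60.0],
                              [-30.0, -32.0])
```

With these made-up numbers the averages are -101, -60 and -31 kcal/mol, giving a binding free energy of -10 kcal/mol.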

Add drawbacks of methods and mention/cite some of the highly accurate methods that can be used

(Figure showing a GBSA cycle? Complex AB and separated A and B, arrows, etc)


Molecular Docking

Molecular docking aims to predict and reproduce the binding mode(s) (orientation, interactions) of a ligand to a protein or, in some cases, a protein to another protein, through the lowest energy pathway. It is a very complex problem, with many different forces playing a pivotal role in the association of the two partners, namely hydrophobic, van der Waals, hydrogen bonding and electrostatic forces. The first one is considered to be the driving force in the binding, while the latter two are involved in the binding specificity. 250–252 Ideally, this sort of method would predict the perfect fit between the two molecules being considered. When there is a crystallographic structure with a ligand bound to its active site, for example, one can validate the method by removing the compound and re-docking it with the method being used; furthermore, after removing the ligand, other ligands may be docked to the active site. This is one of the principles of drug design, wherein one can identify potential hits through the docking of a ligand to a protein.

The process of studying the interaction between a ligand and a protein through computational methods presents a host of challenges, due to the number of degrees of freedom and to the neglect of solvent and entropic effects in the association. In order to solve the first issue, usually the receptor is kept rigid or semi-flexible (flexibility only on side-chains of residues in the target binding site, for example), whereas the ligand may be kept rigid or fully flexible. The lack of an explicit solvent effect has been mitigated in the scoring functions used for the molecular docking, which will be explained later.

During the docking process, the two studied molecules are physically brought closer, with the conformation of the ligand being changed according to the set parameters, and the binding energy being evaluated on each complex formed and subsequently minimized. The different conformations are then ranked according to a scoring function.

Search Algorithms

The search algorithms use approximations to evaluate all the receptor and ligand degrees of freedom, with the potential constraints discussed above, in order to generate different conformations for the ligand. The search algorithms can be divided into three categories, with different approximations, leading to differing computational costs and accuracy: systematic search, stochastic search and simulation.

In the first category, the systematic search algorithms can be further divided into different subgroups. The algorithms based on systematic conformational search probe all possible dihedral angles of a ligand over defined angular ranges, with specific incremental steps (e.g. 0 to 360° with steps of 10°). However, they are time consuming and may generate many unrealistic conformations. As for fragmentation methods, they divide the ligand into different fragments and then rebuild them iteratively on the target site, optimizing the binding mode based on energetic criteria. Some popular programs using this approach are Dock253, LUDI254,255, FlexX256 and ADAM257. Lastly, the structural database methods use libraries of pre-generated conformations, thus alleviating the issues mentioned for the conformational search methods. An example of this type of method is FLOG258.
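The cost of a systematic conformational search can be made concrete with a short sketch that enumerates a dihedral grid. It is a generic illustration (the function name is hypothetical) and does not reproduce the algorithm of any of the programs cited above.

```python
import itertools

def dihedral_grid(n_rotatable, step_deg=10):
    """Yield every combination of dihedral angles on a regular grid."""
    angles = range(0, 360, step_deg)   # 0, 10, ..., 350 degrees
    return itertools.product(angles, repeat=n_rotatable)

# A ligand with only 3 rotatable bonds already gives 36**3 combinations
n_conformers = sum(1 for _ in dihedral_grid(3))
```

The number of conformations grows as 36 to the power of the number of rotatable bonds for a 10° step, which is why this type of search quickly becomes time consuming.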

The stochastic algorithms, or random search algorithms, randomly change a ligand or ensemble of ligands, in order to probe the conformational space, with each change being evaluated and accepted or rejected, based on a probability function. The Monte Carlo (MC) methods use a Boltzmann probability function, using a simple energy function to evaluate the conformations (examples include LigandFit259, Prodock260 or MCDOCK261). Genetic algorithms (GA)262 use concepts from genetics and evolution theory to generate different conformations, with a starting population being generated randomly. Each individual conformation in this starting population is composed of a set of state variables, describing its binding pose, in relation to the protein receptor. The scoring function then selects the best conformations, with a new population being created. At any point in the process, the individuals in the population may undergo mutations, crossover or migration, until the process is ended after a set number of iterations. Many popular programs, such as AutoDock263 or GOLD264, use a genetic algorithm.
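The acceptance test based on a Boltzmann probability function can be sketched with the Metropolis criterion, which is assumed here as the specific form of the test; the programs cited above may differ in their details, and the function names are hypothetical.

```python
import math
import random

def acceptance_probability(delta_e, kT):
    """Metropolis rule: downhill moves are always accepted,
    uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / kT)

def metropolis_accept(delta_e, kT, rng=random):
    """Draw a uniform random number in [0, 1) and decide on the move."""
    return rng.random() < acceptance_probability(delta_e, kT)

# kT of about 0.6 kcal/mol corresponds to room temperature
p_uphill = acceptance_probability(0.6, kT=0.6)
```

Accepting some uphill moves is what allows a random search to leave a local minimum, at the cost of evaluating many trial conformations that end up rejected.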

The final category of search algorithms is particular, in the sense that it tries to solve the Newton equations of motion, extensively mentioned previously in this chapter, in order to produce different conformations of a ligand, correlated with each other. MD methods can be applied to this end; however, a significant limitation of this type of method is that it cannot adequately probe the conformational space, although it does allow the binding process to be observed along the simulation. An alternative to the MD methods is the energy minimization methods, which are usually used in conjunction with other methods, since they are only able to reach local minima.

(Check umbrella sampling for the last)

Scoring Functions

As was briefly mentioned before, a scoring function has the role of ranking the different poses obtained by the search algorithm, in terms of quality of binding of each conformation of the ligand to the protein. To this end, it approximates the free energy of binding of the ligand, identifying which poses are adequate and which ones are not. There is, however, a compromise to be found between the computational cost of the energy calculation and the accuracy obtained. To achieve this, there are three types of scoring functions: force field-based, empirical and knowledge-based.

The first class, force field-based, uses the force fields mentioned throughout this chapter in order to approximate the interaction energy between the receptor and the ligand and the internal energy of the ligand. These have the limitation of not taking into account the solvation and entropic effects, as well as not accurately treating the long-range effects. Examples of these scoring functions are GoldScore265 and the AutoDock scoring function266.

(example of scoring function equation)

Empirical scoring functions aim to reproduce experimental data, with the binding energy corresponding to the sum of uncorrelated terms. These terms can be, for example, the number of hydrogen bonds or of immobilized rotatable bonds. Although these methods are computationally cheap, they are dependent on the experimental data set used for the parameterization. Examples of these are ChemScore267 and X-SCORE268.

(example of scoring function equation)

Lastly, knowledge-based scoring functions use the information available in structural databases (e.g. the Cambridge Structural Database (CSD) (ref) and the Protein Data Bank (PDB) 269) containing thousands of structures, to derive some simple parameters, such as, for example, the potential of mean force between atom pairs. Through the analysis of the frequency of occurrence of certain protein-ligand atom pairs in the databases, these methods then rank the ligands based on the assumption that the pairs that appear more often in the experimental databases should have more importance. Examples of these scoring functions are Muegge’s Potential of Mean Force (PMF)270,271 and DrugScore272,273. (check the one used in GOLD)

The different scoring functions outlined in this section can be combined, in order to obtain a more robust ranking of the different analyzed conformations, through a consensus scoring274,275. In theory, by combining different scoring functions, one is able to compensate for errors found in individual ones; however, if the different scoring functions are similar among themselves, it is possible that the error is amplified instead of mitigated. This is something that one has to take into account when combining different scoring functions.

Throughout this work, LigandFit was employed to perform the molecular docking simulations. As such, we will now briefly go through the search algorithm and scoring functions used. Initially, the generation of ligand conformations is done through steepest descent minimization (as previously described), followed by a stochastic algorithm, which subjects a ligand to a change in the torsion angles of all rotatable bonds, up to a defined number of attempts. The values for this change depend on the number of rotating atoms, with a ratio being defined as:

$$\sigma = \left(0.25\left[\frac{n_{total}}{n_{rot}}\right]^{2}\right)$$

with $n_{total}$ being the total number of atoms in the ligand and $n_{rot}$ being the number of rotating atoms. The amount of rotation at each step will, therefore, be guided by the $\sigma$ value. The program allows steps in the array {1, 2, 5, 10, 20, 30, 45, 60, 90, 120, 180}. It also stores an array, Cut, which is defined by $0.5 \times [Step(i-1) + Step(i)]$, with i being in the interval [2, 11]. So, if we take the first two values in the stepping array, 1 and 2, the Cut value would be $0.5 \times [1 + 2] = 1.5$. After building the Cut array for all pairs of subsequent steps, an integer index is determined, such that $\sigma$ lies between two consecutive Cut values. A random modifier is then applied to the corresponding step value to obtain the final amount of rotation. The method allows for smaller changes in torsion angles as the number of rotating atoms increases, without preventing large rotations from occurring. The occurrence of these large rotations allows the system to escape from local minima.
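The construction of the Cut array and the selection of a step can be sketched as below. Two assumptions are made and should be read as such: the ratio is taken as 0.25 times the square of the atom ratio, and the selected step is the one whose neighboring Cut values bracket $\sigma$. The random modifier is left out, and the function names are hypothetical; this is not LigandFit code.

```python
import bisect

STEPS = [1, 2, 5, 10, 20, 30, 45, 60, 90, 120, 180]
# Cut(i) = 0.5 * [Step(i-1) + Step(i)] for each pair of consecutive steps
CUTS = [0.5 * (STEPS[i - 1] + STEPS[i]) for i in range(1, len(STEPS))]

def sigma(n_total, n_rot):
    """Ratio guiding the rotation (assumed reading of the expression)."""
    return 0.25 * (n_total / n_rot) ** 2

def base_step(n_total, n_rot):
    """Step whose neighboring Cut values bracket sigma (assumed mapping)."""
    index = bisect.bisect_right(CUTS, sigma(n_total, n_rot))
    return STEPS[index]

step_few_rotors = base_step(40, 10)    # sigma = 4.0
step_many_rotors = base_step(40, 30)   # sigma is about 0.44
```

For a hypothetical ligand of 40 atoms, 10 rotating atoms give $\sigma$ = 4.0, which falls between the Cut values 3.5 and 7.5 and selects a step of 5, whereas 30 rotating atoms select the smallest step of 1.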

After generating all the desired conformations, the docking is performed by aligning the ligand to the site, through the shape comparison between the conformation and the target site.276,277 For each pose, the docking energy is taken by

$$E_{dock} = E_{lig-prot} + E_{lig}$$

with $E_{lig-prot}$ being the interaction energy between the ligand and protein and $E_{lig}$ being the internal energy for the ligand. The first term is the sum of the van der Waals and electrostatic energies, such that:

$$E_{vdW} = \sum_{i,j} \varepsilon_{ij}\left[2\left(\frac{r^*_{ij}}{r_{ij}}\right)^{9} - 3\left(\frac{r^*_{ij}}{r_{ij}}\right)^{6}\right]$$

where $\varepsilon_{ij} = \sqrt{\varepsilon_i \varepsilon_j}$ and $r^*_{ij} = \sqrt{r^*_i r^*_j}$, and the letters i and j correspond to each ligand and protein atom, respectively. $r^*_i$ and $r^*_j$ are the van der Waals radii of each atom, whereas $\varepsilon_i$ and $\varepsilon_j$ are energy parameters, with the indices i and j indicating the provenance of the atom. $r_{ij}$ is the distance between the two atoms. The parameters are imported from a force field, such as CFF278 or Dreiding279. (put CFF and Dreiding on the force fields part)
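The 9-6 van der Waals term for a single ligand-protein atom pair can be sketched as follows. The function name is hypothetical and the parameter values in the example are made up; they are not taken from CFF or Dreiding.

```python
import numpy as np

def vdw_9_6(r_ij, eps_i, eps_j, rstar_i, rstar_j):
    """9-6 van der Waals energy of one atom pair with geometric combination rules."""
    eps_ij = np.sqrt(eps_i * eps_j)          # combined well depth
    rstar_ij = np.sqrt(rstar_i * rstar_j)    # combined van der Waals radius
    ratio = rstar_ij / r_ij
    return eps_ij * (2.0 * ratio**9 - 3.0 * ratio**6)

# Made-up parameters: at r_ij = r*_ij the energy equals -eps_ij
e_at_minimum = vdw_9_6(4.0, 0.1, 0.1, 4.0, 4.0)
```

The coefficients 2 and 3 place the minimum of the curve at $r_{ij} = r^*_{ij}$, where the energy is $-\varepsilon_{ij}$; the total term is the sum of this function over all ligand-protein atom pairs.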

The electrostatic energy of interaction is:

$$E_{ele} = 332.0716 \sum_{i,j} \frac{q_i q_j}{\varepsilon\, r_{ij}}$$

Once again, i and j designate, respectively, each ligand and protein atom, with 휀 being the dielectric constant. qi and qj are the respective charges, in atomic units, being defined as part of the force field used. Since this computation is rather time consuming, LigandFit uses a grid-based280 energy estimation of the interaction energy, in order to improve the speed. (expand on the grid-based part)
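A direct, pair-by-pair evaluation of the electrostatic term can be sketched as below; it is the slow reference calculation that a grid-based estimate is meant to replace, not the grid method itself. The function name and the example charges are hypothetical; distances are assumed to be in Å, which gives energies in kcal/mol with the constant 332.0716.

```python
import numpy as np

COULOMB_CONSTANT = 332.0716  # kcal Å / (mol e^2)

def electrostatic_energy(q_lig, xyz_lig, q_prot, xyz_prot, dielectric=1.0):
    """Sum of q_i q_j / (dielectric * r_ij) over all ligand-protein atom pairs."""
    q_lig, q_prot = np.asarray(q_lig, float), np.asarray(q_prot, float)
    xyz_lig, xyz_prot = np.asarray(xyz_lig, float), np.asarray(xyz_prot, float)
    # Pairwise distances, shape (n_ligand_atoms, n_protein_atoms)
    r_ij = np.linalg.norm(xyz_lig[:, None, :] - xyz_prot[None, :, :], axis=-1)
    return COULOMB_CONSTANT * np.sum(np.outer(q_lig, q_prot) / (dielectric * r_ij))

# Two opposite unit charges 1 Å apart, dielectric constant of 1
e_pair = electrostatic_energy([1.0], [[0.0, 0.0, 0.0]],
                              [-1.0], [[1.0, 0.0, 0.0]])
```

The cost of this double sum grows with the product of the numbers of ligand and protein atoms and has to be paid for every pose, which illustrates why a faster estimate is attractive.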

The energy function described above is a method to find ligand conformations with a favorable interaction energy with the protein. However, it is not designed to ideally predict ligands with good binding affinity, or to differentiate between ligands and rank them in relation to one another. As such, one has to employ scoring functions which are specifically designed for this end. Some of the scoring functions used by LigandFit include the aforementioned LUDI254,255 and PMF270,271, as well as PLP281. In addition, the developers of LigandFit have also created a specific scoring function, LigScore282, which uses a Genetic Function Approximation283 to train the function against protein-ligand complexes with crystallographic structures and available binding affinities. In practice, the binding affinity in LigScore is defined as:

$$pK_i = 0.517527 - 0.043650\,E_{vdW\text{-}soft} + 0.143901\,A_{+} - 0.00099039\,A_{tot}^{2}$$

In this estimation of the binding affinity ($pK_i$), $E_{vdW\text{-}soft}$ is a soft-potential modification applied to the previously mentioned vdW equation 284, in order to prevent the penalization of short ligand-protein contacts when the protein is rigid. $A_{+}$ is the surface area of the ligand involved in attractive polar interactions with the protein and $A_{tot}^{2} = A_{+}^{2} + A_{-}^{2}$, with $A_{-}$ being the ligand surface area involved in repulsive polar interactions with the protein.
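To make the use of this expression concrete, a minimal Python sketch follows. It assumes that the squared total polar area is the sum of the squares of the attractive and repulsive areas, as read from the definition above; the function name `ligscore_pki` and its argument names are hypothetical and do not belong to Discovery Studio.

```python
def ligscore_pki(e_vdw_soft, a_plus, a_minus):
    """Evaluate the LigScore linear model with the coefficients quoted in the text.

    e_vdw_soft: softened vdW interaction energy
    a_plus:     ligand surface area in attractive polar contact with the protein
    a_minus:    ligand surface area in repulsive polar contact with the protein
    """
    a_tot_squared = a_plus**2 + a_minus**2  # assumed reading of the definition
    return (
        0.517527
        - 0.043650 * e_vdw_soft
        + 0.143901 * a_plus
        - 0.00099039 * a_tot_squared
    )
```

A more negative (more favorable) soft vdW energy and a larger attractive polar area both raise the predicted pKi, whereas the quadratic term penalizes large polar surfaces of either kind.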

Virtual Screening


Virtual Screening (VS) consists of applying the methods described in the molecular docking section to a large number of compounds, in order to screen and rank them according to their binding energy towards a specific region in a given target, usually a protein. One can think of a VS campaign as a repetition of a protein-ligand docking for each of the ligands being studied. This introduces an additional issue that needs to be carefully considered. One of the many differences between the molecular docking methods mentioned in the previous section is their varying degree of complexity. Although higher-complexity methods are adequate when one is docking a single ligand to a region in a target protein, when this process is repeated thousands, or sometimes millions, of times, the cost of this complexity is compounded and becomes very significant. As such, it is usual to apply protocols in the lower tiers of complexity to these large libraries of compounds, with a consequent penalty in the accuracy of the final results. The results obtained in a VS campaign should therefore be taken as indicative: VS is an extremely useful filtering tool, an initial step in the full molecular modeling protocol which allows the number of compounds used in the subsequent steps to be reduced by a factor of a hundred or a thousand.

Pharmacophore

A method that can be used either instead of or concurrently with those described in the previous sections, and which can be focused on the ligand or the receptor, employs the concept of pharmacophores. In general, it should be used when structural information on the receptor being targeted is absent. By definition, a pharmacophore is an arrangement of particular properties, common to several drugs, which intervene in the biological activity of the drugs 285. (expand definition on the IUPAC one) One can then, through the definition of a pharmacophore, scan a large database of compounds and find potential candidates which display the corresponding properties. Following this, even if the scaffold of the molecule found is different, the interacting groups can be superposed onto the spatial positions of the defined properties. It is assumed that the binding activity of a ligand towards a specific receptor is due to the functional groups present in the ligand; as such, good candidates for binding to a specific receptor should possess similar functional groups. Alternatively, if the ligands do not possess similar functional groups, their functional groups should display the same properties in the same spatial configuration.

There are many different kinds of properties that can be included in a pharmacophore definition. These are mainly hydrogen-bond-forming groups (acceptors or donors), hydrophobic regions, and positively or negatively charged groups. One can approach the pharmacophore definition in two distinct ways: the properties can be derived from a ligand or a set of similarly binding ligands to form a ligand-based pharmacophore or, alternatively, the protein can be the starting point for the pharmacophore. For this latter method, it is imperative to analyze which functional groups are in the binding pocket or target area, corresponding to the amino acids of interest. In this case, it is the protein that guides the binding and, as such, this method forms a protein-based pharmacophore.

(Example on previous paragraph)

Although using only descriptors of functional groups in a database search, instead of an atomistic method, has a cost in accuracy, there is a significant benefit in the computational speed of these methods. On the other hand, in both types of pharmacophore building (ligand-based and protein-based), there is a problem with the rigidity of the molecule in question. Due to these issues, it is advisable to use a pharmacophore search as a first filtering tool, for example when significantly large databases are to be dealt with, after which the other methods described in the previous section will be more suitable to predict the binding modes and energies of the different ligands. Furthermore, to deal with the rigidity of the molecule used as a building block for the pharmacophore, it is possible either to introduce conformational changes to the ligands when a ligand-based pharmacophore is being built (as per the previous sections) or, when one is building a protein-based pharmacophore, to combine this method with other, higher-level methods, such as MD simulations, in order to generate different conformations of the protein and thus include the effect of the flexibility of the protein in the building of the pharmacophore.

(Ensemble Docking)

(Figure of a pharmacophore example)


(refs on Klebe Drug Design, Leach’s Molecular Modeling principles)

Methods

Model Building

The Protein Data Bank (PDB) archive stores the published structures for proteins and other macromolecules. 269 At the time of writing, this large database contains 151079 total structures, with most of the structures being obtained by X-ray crystallography (89%), solution nuclear magnetic resonance (NMR) (8%) or by electron microscopy (2%). In some cases, the same protein may have more than one structure in the database, differing in, among other features, resolution or modifications to the native structure.

The basis for any structural biology study is a protein model. Ideally, a structure for the protein being studied should be present in the PDB database; otherwise, one should be built using methods such as homology modelling. 286 Throughout the work presented here, there are two proteins being studied, which have been extensively described: IL-15 and IL-2. In both cases, there are crystallographic structures present in the PDB database.

For IL-15, the choice of PDB structure to use was fairly straightforward. In fact, only one structure of IL-15 bound to its three receptors (IL-15Rα, IL-2Rβ and γc) is present in the PDB (PDB ID 4GS7) 287 and, as such, this structure was used for all models of IL-15 except one, which will be described later. The structure of the IL-15 quaternary complex was published in 2012 and was obtained by molecular replacement using published models of all four chains. There are some particularities observed in this structure. Surface lysine residues were methylated, which often improves crystal diffraction 288, yielding a structure with a resolution of 2.35 Å. Furthermore, there are artifacts from the crystallization process present in the structure, such as acetate ions, 1,2-ethanediol and N-acetyl-D-glucosamine. The protein corresponds to the human protein, albeit expressed in Escherichia coli. In order to obtain the wild-type models used throughout this study, the lysine residues had to be demethylated and the crystallization artifacts needed to be removed. This process was common to all models and performed directly in PyMOL 289, through deletion of the relevant atoms.

In the process of X-ray crystallography, an X-ray beam crosses a crystal of the molecule being studied; a diffraction pattern is formed, and the structure is obtained from this pattern. The scattering is due to the electron density of the atoms; as such, small atoms with low electron density, namely hydrogen atoms, will not be visible in the structure. Thus, these need to be added by external software. In the case of hydrogen atoms bound to atoms without lone pairs, their addition is trivial, with their positions being known with a high degree of certainty; an example of this is an sp3 carbon, where the angles between the hydrogen atoms and the C-H bond distances are known. When hydrogen atoms are bound to atoms with lone pairs, these lone pairs will influence the angles formed between the hydrogen atoms and the atom they are bound to; such is the case of the water molecule. For these atoms, one can guess their location and, over the course of an MD simulation, they will move to their correct location, if they were not already positioned there. The last type of hydrogen atoms comprises those bound to atoms with unknown or uncertain protonation states. In this case, more than the position, it is necessary to know whether a hydrogen atom should be added at all. Indeed, even if the pKa for a given residue is well established, in a protein environment this choice is not as straightforward, as the pKa of a residue is highly dependent on the neighboring residues; this is especially an issue for amino acids which are not in contact with the solvent, such as buried residues or residues at a protein-protein interface. Residues such as histidine, which are more sensitive to the environment, should be given special care, as a wrong prediction of their protonation state could lead to wildly different results.
In practice, the program used to add hydrogens to the proteins takes all of this into account; in this case, the software used for this was the Prime program included in the Schrödinger Small-Molecule Drug Discovery Suite 2017-1 290,291, which uses PROPKA 292,293 in its protonation state assessment. This program was also used in order to perform other small structural additions, such as disulfide bridges and missing side-chain atoms.

Furthermore, in the aforementioned quaternary structure there was a missing loop, corresponding to residues 25 to 31 in the IL-15 structure. This loop is identical to a loop present in the structurally similar IL-2 and, as such, was added through homology modelling (based on PDB ID 2B5I for IL-2). 294 This was also performed with the Prime program; this template was chosen because, at that time, it was the only IL-2 quaternary structure containing this loop.

(Figure of the loops superposed)

Several wild-type models were then built, starting from the quaternary structure of IL-15. These models were built through the removal of certain receptors from the initial structure, in order to assess the influence of individual receptors on the interactions established at the different interfaces, as well as on the overall protein structure. Through this method, the IL-15/IL-2Rβ/γc model (lacking IL-15Rα) and the soluble IL-15 model were built. An exception to the use of the quaternary structure of IL-15 as the starting point for a model was the use of a binary IL-15/IL-15Rα structure already present in the PDB (PDB ID 2Z3Q) 295, published in 2007. This structure has a resolution of 1.85 Å and was used in order to build the IL-15/IL-15Rα model used in the MD simulations. As such, the final tally of IL-15 wild-type models used in the simulations was: the quaternary IL-15/IL-15Rα/IL-2Rβ/γc, the ternary IL-15/IL-2Rβ/γc, the binary IL-15/IL-15Rα and the soluble IL-15 model.

(Figure of the different models)

Using the quaternary IL-15 as a starting point, several different mutated models were built. Intending to study the influence of individual hot-spots on the different interfaces, these were built by simple replacement of the wild-type residue with the mutated one. These replacements were done using the mutator plugin present on PyMOL 289, with the most prevalent conformer being kept for each substitution. The mutated models were:

- D8S, S58K, D61K, E64K, N65K, I68K, L69R and N72K on the IL-15/IL-2Rβ interface;

- D30K and H32E on the IL-15/γc interface.

(Figure of the locations of the mutated residues)

All the steps in the protocol described above for the wild-type models were applied to the mutated models as well, with hydrogens, disulfide bridges and missing atoms being added with the Prime program.


For IL-2, the choice was not as straightforward as in the case of IL-15, since different structures were present in the PDB. The choice fell on the quaternary structure published in 2006 (PDB ID 2ERJ). 296 Although this structure presents a lower resolution than another one (PDB ID 2B5I) published in 2005 297 (3.0 vs 2.3 Å), the latter presented regions where the structure was not solved. Thus, we decided to sacrifice the higher resolution in order to have a structure wherein all the protein was already present, thereby introducing less error through the addition of atoms of unknown positions. Indeed, since there are Cα atoms missing, this task would need to be performed through homology modelling, as described previously for IL-15, which has its own inherent issues. The goal of using IL-2 is two-fold: on one hand, we intend to obtain IL-2 structures for the docking process which will be described later in this section; on the other hand, we also want to compare IL-2 to IL-15, namely the influence of the respective α receptor on the different interfaces and global parameters. As such, we built two models for IL-2: a quaternary model with the three receptors present (IL-2/IL-2Rα/IL-2Rβ/γc) and a ternary model in the absence of Rα (IL-2/IL-2Rβ/γc). The process for building these was similar to the one already described for IL-15.

Considering the preparation of the models described up to this point for MD simulations, the process was similar for all. In order to mimic the biological medium for each protein, water molecules were added, up to a distance of 12 Å from the protein edge, forming a rectangular box. Notably, all water molecules already present in the crystallographic structure, up to 4 Å from all residues at the interface, were kept. Counter-ions were added in order to neutralize the system. Both the addition of solvent and counter-ions were performed using CHARMM-GUI 298, a web-based interface which aids in the preparation of the systems and generation of certain input files. The final sizes of the systems range from X to Y atoms, corresponding to the IL-15 in solution and the quaternary IL-15 structure, respectively.

(Figure of a model inside waterbox)

Molecular Dynamics


In order to observe what happens along a certain timescale, as well as to adequately explore the conformational space the proteins under study may adopt and to extract snapshots which can be used in subsequent steps of this study, the models described in the previous section were subjected to an MD protocol. The parameters used for this protocol, as has been mentioned in the computational details section, were from the CHARMM36 all-atom force field. 299,300 All the simulations were run using NAMD. 207

MD conditions

Initially, the system should be brought to an energy minimum, prior to the actual molecular dynamics steps. This minimization was done in two stages: at first the protein was kept fixed, with the water molecules and counter-ions being allowed to move freely and be minimized (for 10 000 steps), followed by a minimization of the full system (for 10 000 steps). The cutoff distance, for all the steps, was set to 12 Å, with the switching function starting to take effect at 10 Å and the pairlist distance, for which electrostatics and vdW interactions are calculated, set to 14 Å. Periodic boundary conditions were used, with the particle mesh Ewald method being applied.

Following the two minimization steps, the proper MD part commences, also being divided into two separate stages. First, the system slowly heats up, from 0 K to 303.15 K, in order to prevent large shifts in the structure and to ease the system into the desired temperature. This equilibration/heating step is done in constant volume (NVT ensemble, as previously described). The biologically relevant production stage is the second one, wherein the system has been heated up and can now evolve in a constant temperature and pressure (NPT ensemble), allowing for a better representation of the solvent molecules density and providing a more accurate description of the biological phenomena. The heating step was performed for 200 ps and the production one for 200 ns. The choice of the length of the production step is important and consequential, due to the fact that a crystallographic structure is used. This structure corresponds to an average of structures observed in the crystal, wherein more populated states will contribute more to the final states than the less observed ones. Hence, the full Boltzmann distribution of the system is found in a single structure; therefore, the timescale should be large enough to allow for the relaxation and equilibration of the system.


The MD run was performed with a 2 fs timestep, as previously mentioned in the computational details chapter, with snapshots being saved every 2 ps (to a total of 100 000 snapshots for the full 200 ns). Energies were output every 0.25 ps. Bond lengths involving hydrogen atoms were fixed using the SHAKE algorithm and the non-bonded forces were calculated every step. The pressure was kept constant (1 bar) using a Langevin piston coupled to a heat bath, to keep the temperature constant at the aforementioned value.
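The step counts implied by these settings can be checked with a short calculation. The Python sketch below only restates the durations given in the text in femtoseconds; the variable names are chosen for this example and are not NAMD keywords.

```python
# All durations are expressed in femtoseconds to keep the arithmetic exact.
TIMESTEP_FS = 2
HEATING_FS = 200 * 1_000           # 200 ps heating stage
PRODUCTION_FS = 200 * 1_000_000    # 200 ns production stage
SNAPSHOT_INTERVAL_FS = 2 * 1_000   # coordinates saved every 2 ps
ENERGY_INTERVAL_FS = 250           # energies written every 0.25 ps

heating_steps = HEATING_FS // TIMESTEP_FS
production_steps = PRODUCTION_FS // TIMESTEP_FS
steps_per_snapshot = SNAPSHOT_INTERVAL_FS // TIMESTEP_FS
steps_per_energy_output = ENERGY_INTERVAL_FS // TIMESTEP_FS
n_snapshots = production_steps // steps_per_snapshot

print(heating_steps, production_steps, steps_per_snapshot,
      steps_per_energy_output, n_snapshots)
```

This gives 100 000 integration steps for the heating stage, 10^8 steps for the production stage, one snapshot every 1 000 steps and one energy record every 125 steps, in agreement with the 100 000 snapshots quoted above.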

(Figure with the “process”)

MD analyses

The studies performed with the use of MD simulations, throughout this work, can be divided into two parts. On one hand, the wild-type IL-15 models, in their different configurations concerning the presence or absence of certain receptors, were compared between each other, in order to evaluate the influence of the receptors on each individual IL-15/receptor interface. On the other hand, the IL-15 wild-type in the quaternary structure was compared to the mutated models described in the previous section, in order to evaluate the influence of every highlighted hot-spot in the overall properties of the system. The description of the analyses performed in each case was as follows.

Comparison of IL-15 in the different multimeric species

Root mean square deviation (RMSD) and root mean square fluctuation (RMSF) values were obtained for IL-15 by first aligning the coordinates of every snapshot from all the corresponding systems to the IL-15 chain Cα atoms, using the first frame as reference, in the VMD plugin Trajectory Tool. 301 Following this, the respective quantities were calculated through the aforementioned plugin, in the case of the RMSD, and an in-house TCL 302 script, in the case of the RMSF. The RMSD values were calculated taking into account only the Cα atoms of every residue, with the global value being further broken down into structural elements as follows: helix A (residues 1-19), A-B loop (residues 20-35), helix B (residues 36-54), helix C (residues 57-77), C-D loop (residues 78-95) and helix D (residues 96-111). The B-C loop was not considered, as it is composed of only three residues and would not, as such, bring significant information to the analyses. The RMSF values presented are per residue, with the value used for each residue being its Cα RMSF. In both cases, the values presented correspond to the full 200 ns of simulation.
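For reference, the two quantities can be written compactly. The following Python sketch assumes that the frames have already been aligned to the reference and that each frame is a list of Cα coordinates; it is an illustration only and does not reproduce the VMD plugin or the in-house TCL script, and the function names are hypothetical.

```python
import math


def rmsd(frame, reference):
    """RMSD between one aligned frame and the reference, over all atoms given."""
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(reference))


def rmsf(frames):
    """Per-atom RMSF: fluctuation of each atom around its own mean position."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for k in range(n_atoms):
        mean = [sum(f[k][d] for f in frames) / n_frames for d in range(3)]
        msf = sum(
            (f[k][d] - mean[d]) ** 2 for f in frames for d in range(3)
        ) / n_frames
        values.append(math.sqrt(msf))
    return values
```

The RMSD of a structural element, such as helix A, would be obtained by passing only the Cα atoms of residues 1-19 to `rmsd`, whereas `rmsf` returns one value per Cα atom, that is, one per residue.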

The former analysis allows for a broad view of the conformational changes throughout the simulations, also serving as a tool for the validation of the models and verification of the equilibrated state of the different systems. The latter allows for a detailed view, on a per-residue basis, of the fluctuations observed throughout the simulations, thus allowing a deeper analysis of the influence of the flexibility of the residues on the system as a whole.

Following this preliminary analysis and confirmation of the equilibrated state of the system, the total number of contacts between IL-15 and each individual receptor was calculated, for each frame of the trajectory in the interval 100-200 ns, interval in which the trajectory was considered to be equilibrated for all cases. This was performed using the nativecontacts command of the cpptraj trajectory analysis tool, present in the AmberTools18 collection of the Amber18 suite. 303 A contact was, thus, defined based on a distance cut-off (<3 Å), with both native (present in the reference frame, the 100 ns point of the simulation) and non-native (not present in the reference frame) contacts being considered equally.
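The contact definition used here can be illustrated with a short Python sketch. It is a simplified stand-in for the cpptraj nativecontacts command and not its implementation: atoms are plain coordinate tuples, the function names are hypothetical, and the reference set is assumed to be computed from the 100 ns frame.

```python
import math


def contact_pairs(chain_a, chain_b, cutoff=3.0):
    """Index pairs (i, j) of atoms from the two chains closer than `cutoff` angstroms."""
    return {
        (i, j)
        for i, p in enumerate(chain_a)
        for j, q in enumerate(chain_b)
        if math.dist(p, q) < cutoff
    }


def split_native(frame_pairs, reference_pairs):
    """Separate the contacts of a frame into native and non-native ones."""
    native = frame_pairs & reference_pairs
    non_native = frame_pairs - reference_pairs
    return native, non_native
```

Since native and non-native contacts are considered equally, the total number of contacts for a frame is simply `len(frame_pairs)`.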

Distance between residue pairs was analyzed based on the previously mentioned total number of contacts analysis. Individual (atom-atom pair) contacts were probed through the use of the aforementioned cpptraj tool, with all the atom pairs for each residue pair, at <3 Å from each other at any point during the simulation, being considered. The final distance taken into account was the minimum atom-atom distance in each frame at the 100-200 ns interval, for each residue pair. Further, average values of the atom-atom pair corresponding to the minimum distance were computed, as well as the percentage of the time wherein the residue-residue distance (as previously defined) remains at <3 Å. Lastly, the atoms considered when building the distance tables, as seen in Table X, for instance, were the most prevalent throughout the simulation.
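The residue-residue distance and its occupancy can be sketched as follows. This Python example simplifies the procedure described above by averaging the per-frame minimum distance, rather than following a single atom-atom pair; the function names and the input layout (one list of atom coordinates per residue and per frame) are assumptions made for the example.

```python
import math


def min_distance(residue_a, residue_b):
    """Minimum atom-atom distance between two residues in one frame."""
    return min(math.dist(p, q) for p in residue_a for q in residue_b)


def distance_summary(frames_a, frames_b, cutoff=3.0):
    """Mean of the per-frame minimum distance and percentage of frames below `cutoff`."""
    minima = [min_distance(a, b) for a, b in zip(frames_a, frames_b)]
    mean_distance = sum(minima) / len(minima)
    occupancy = 100.0 * sum(d < cutoff for d in minima) / len(minima)
    return mean_distance, occupancy
```

In the present protocol, the frames passed to `distance_summary` would be those of the 100-200 ns interval.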

The residency time of water molecules was calculated for the 100-200 ns interval, in each system, using the hbond command of the aforementioned cpptraj tool, with only inter-receptor (and hence, no intra-chain) solute-solvent hydrogen bonds being considered. The acceptor-donor heavy atom distance considered was 3.2 Å, and all the bridging water molecules with more than 10% residency time (defined as the ratio between the number of frames wherein each residue establishes a hydrogen bond with a non-specific water molecule and the total number of frames) were kept.
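The residency criterion amounts to a simple ratio followed by a threshold. The sketch below is an assumed post-processing step and not part of cpptraj: the input is taken to be, for each residue, the list of frame indices in which a water-mediated hydrogen bond was detected, and the function name and the residue labels in the usage example are hypothetical.

```python
def water_residency(bridged_frames_by_residue, n_frames, threshold=0.10):
    """Keep residues whose water-bridge residency exceeds `threshold`.

    Residency is the number of distinct frames with a hydrogen bond to any
    water molecule divided by the total number of frames.
    """
    kept = {}
    for residue, frame_indices in bridged_frames_by_residue.items():
        residency = len(set(frame_indices)) / n_frames
        if residency > threshold:
            kept[residue] = residency
    return kept


# Usage with made-up data: 10 frames in total.
example = water_residency({"Asp8": [0, 1, 2, 3, 4], "Ser58": [0]}, n_frames=10)
```

With these made-up data, the first residue has a residency of 50% and is kept, whereas the second sits exactly at 10% and is discarded because the criterion is strict.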

IL-15 under the influence of different mutations

In order to assess the influence of an individual mutation on the overall characteristics of the IL-15 system, some of the analyses described in the previous section were performed on each mutated model. These concern the aforementioned RMSD and RMSF analyses, in order to compare the stability and degree of change from the wild-type model to the mutated one. The contacts analysis was also performed as described, allowing the evaluation of the gain or loss of contacts due to the mutation.

Furthermore, MM/GBSA analyses were performed, based on the MD trajectories obtained for each mutated model. Recall the equation for the binding free energy of association, mentioned in the computational details chapter:

$$\Delta G_{bind} = \langle G_{AB} \rangle - \langle G_{A} \rangle - \langle G_{B} \rangle$$

In fact, in the calculation of the binding free energy of association, the designation A corresponds to IL-15, with B being different, depending on what is being taken into consideration. For instance, for mutations occurring in the residues in contact with the IL-2Rβ, B would correspond to the IL-2Rβ, whereas if the mutation is present in a residue in contact with γc, this would correspond to γc. In practice, this means that each individual interacting partner on the PPI (IL-15 and either IL-2Rβ or γc) is being taken into account separately. Thus, since the MD trajectories taken into account correspond to the quaternary structure, in the MM/GBSA calculation, the receptors which are not involved in the PPI are removed. Once again considering the two examples, for a mutation occurring in a residue in contact with IL-2Rβ, the IL-15Rα and γc receptors would be removed, and vice-versa. This method ensures that the obtained value in energy is due, exclusively, to the two interacting partners. The averages used for the calculation correspond to the time interval between 100-200 ns, wherein the simulations were considered to be equilibrated.
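The averaging in this expression can be made explicit with a minimal Python sketch. It assumes that per-frame free energies of the complex and of each partner have already been computed by an MM/GBSA program for the frames of the 100-200 ns interval; the function name and the numbers in the usage example are invented for illustration.

```python
def binding_free_energy(g_complex, g_a, g_b):
    """Binding free energy from per-frame energies of AB, A and B.

    Each argument is a sequence with one value per trajectory frame; the
    result is <G_AB> - <G_A> - <G_B>.
    """
    def mean(values):
        return sum(values) / len(values)

    return mean(g_complex) - mean(g_a) - mean(g_b)


# Usage with invented per-frame values (two frames only).
delta_g = binding_free_energy([-100.0, -102.0], [-40.0, -42.0], [-30.0, -30.0])
```

Here A would be IL-15 and B either IL-2Rβ or γc, with the remaining receptors stripped from each frame before the energies are evaluated.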


The buried surface area was also calculated for each mutated model, based on the MD trajectories. For this, the solvent accessible surface area (SASA) was probed for each monomer and for the dimeric complex being analyzed (similar to the logic explained in the previous paragraph). The buried surface area then corresponds to the sum of the SASA of the monomers minus the SASA of the complex. This calculation assumes that there would not be conformational changes in the monomers when assembled into the complex. Furthermore, the results obtained are divided by two, which assumes that the interfaces are symmetric: 304

$$A_{buried} = \frac{A_{A} + A_{B} - A_{AB}}{2}$$

Similarly to the previous MM/GBSA analysis, the subscript A corresponds to IL-15, with B corresponding to either IL-2Rβ or γc, depending on the analyzed mutation. These values were calculated using PyMOL for each individual snapshot, with the final value being the average of the values obtained for the interval 100-200 ns.
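The buried surface area calculation reduces to the following arithmetic, sketched here in Python. The SASA values are assumed to have been obtained beforehand for each snapshot (in the present work, with PyMOL); the function names and the numerical values in the usage example are hypothetical.

```python
def buried_area(sasa_a, sasa_b, sasa_ab):
    """Buried surface area per partner, assuming a symmetric interface."""
    return (sasa_a + sasa_b - sasa_ab) / 2.0


def mean_buried_area(per_frame_sasa):
    """Average buried area over frames; each item is (SASA_A, SASA_B, SASA_AB)."""
    values = [buried_area(a, b, ab) for a, b, ab in per_frame_sasa]
    return sum(values) / len(values)


# Usage with invented SASA values in square angstroms.
average = mean_buried_area([(5000.0, 8000.0, 11500.0), (5000.0, 8000.0, 11300.0)])
```

In this invented example the two frames bury 750 and 850 Å², giving an average of 800 Å².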

(Figure showing an illustration of the areas?)

Virtual Screening

The main goal of the work developed in the present document was to try and develop one or several compounds which would potentially inhibit IL-15 activity, in the pathological context discussed in the introduction chapter. To this end, there are two inevitable conditions: one should be able to identify one or more molecules which target IL-15; these molecules should not possess inhibitory activity towards IL-2. The first condition has been achieved in an already mentioned previous work, with a molecule being identified with potent activity towards IL-15. 305 However, this molecule did not fulfill the second condition. As such, the door remains open for the identification of a molecule which is able to simultaneously have IL-15 inhibitory activity and not have the same activity towards IL-2.

Throughout this study, in order to try and identify molecules with potential IL-15 inhibitory activity, the results obtained from the application of the methodologies described in the previous sections were applied towards the building of pharmacophores. In practice, the results from the MD simulations were used as starting points in the virtual screening protocol, which was developed using Discovery Studio (DS) 3.5. 306,307

Building a pharmacophore model

The creation of the pharmacophore model presumes that there is a region of interest already identified. In order to minimize the issue of specificity, an acceptable starting point is to try and target the interaction between IL-15 and IL-2Rβ, a receptor which is shared only with IL-2. However, as has been alluded to previously, if one were to try and target the whole interface between IL-15 and its receptor, the identified molecule would not be considered small, due to the size of the interface. As such, the scope needs to be narrowed and a more specific region needs to be pinpointed. This is the first area in this protocol where the MD simulations intervene.

In fact, through results that will be presented in the following chapter, there was a small cluster of residues which were identified as having significant importance in the establishment of the interface: Ser7, Lys11, Ser58 and Asp61. These correspond to a yet unexplored region on IL-15, shared between helices A (Ser7 and Lys11) and C (Ser58 and Asp61). Hence, the pharmacophore was built based on this region. The interacting residues on IL-2Rβ (His133, Tyr134, Asp68, Ser69, Gln70 and Lys71) were used as a reference for the pharmacophore, thus creating a binding site with a 10 Å radius.

(Figure with the IL-15 pharmacophore)

The same logic was then applied to IL-2, with the same residues on IL-2Rβ being used as a reference; the IL-2 residues analogous to those identified on IL-15 are Leu19, Met23, Leu80, Arg81 and Asp84. This amounts to a larger binding site, with a 13 Å radius.

(Figure with the IL-2 pharmacophore)

Several different tests for combinations of properties were performed, as well as for methods of building the pharmacophore, with the choice falling on the Structure Based Pharmacophore option present in DS. For IL-15, this meant that Ser7 and Ser58 would be Hydrogen Bond Donor/Acceptor (HBDA), Lys11 would be a Hydrogen Bond Acceptor (HBA) and Asp61 would be a Hydrogen Bond Donor (HBD). For IL-2: HBD for Asp84 and Arg81 (on the oxygen atom of the main chain), HBA on Arg81 and, in contrast to IL-15, a hydrophobic region near Met23 and Leu19. Exclusion spheres were included in order to avoid clashes of the potential molecules with atoms in the protein. Finally, through superposition of both pharmacophore hypotheses, it was found that, if both serine residues on IL-15 were kept as hybrid HBD/A features, they would neatly superpose with IL-2. This is not ideal, since it increases the probability that the compounds identified for IL-15 also show activity towards IL-2. As such, the property on Ser58 was changed to HBA only.

(Figure of both pharmacophores for cryst)

In order to further profit from the results of the MD simulations, and to introduce the notion of flexibility into this part of the study, four snapshots from the trajectories corresponding to the quaternary structures of both proteins (IL-15 and IL-2) were extracted. These four snapshots corresponded to:

- An average structure between 25 and 50 ns;

- An average structure between 50 and 75 ns;

- An average structure between 75 and 100 ns;

- The final snapshot (t=100 ns) in the MD simulation.

One aspect to note in this extraction is the method through which the average structure was decided upon. The three time windows were decided based on the number of structures to be used in the virtual screening work (five in total, the four aforementioned ones and the crystal structure), with the initial part of the simulation being rejected, in order to avoid taking a structure that would not correspond to a biologically relevant one. Furthermore, the average structures are not direct geometric averages: if one were to take the average of all coordinates over all the snapshots in a given time window, the result would not correspond to a “true” conformation. As such, the geometric average was computed, the RMSD between each snapshot and this geometric average was measured, and the snapshot with the lowest RMSD was extracted.
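The selection of a representative snapshot for a time window can be sketched as follows. This Python example assumes that the frames of the window are already aligned and given as lists of atomic coordinates; the function name `representative_frame` is hypothetical and the sketch does not reproduce the tool actually used for the extraction.

```python
import math


def representative_frame(frames):
    """Index of the snapshot closest (lowest RMSD) to the geometric average.

    The averaged coordinates are used only as a reference; the returned
    structure is always a snapshot that was actually sampled.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    mean = [
        tuple(sum(f[k][d] for f in frames) / n_frames for d in range(3))
        for k in range(n_atoms)
    ]

    def rmsd_to_mean(frame):
        squared = sum(math.dist(p, q) ** 2 for p, q in zip(frame, mean))
        return math.sqrt(squared / n_atoms)

    return min(range(n_frames), key=lambda i: rmsd_to_mean(frames[i]))
```

Applied to the snapshots between 25 and 50 ns, for instance, the function would return the index of the frame to be used as the "average structure" for that window.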

The pharmacophore for each frame was built, based on the previously mentioned properties. This means that, for each protein, there would be five pharmacophores, for a total of ten. However, in the case of IL-2, the hydrophobic region had to be split: if all the hydrophobic spheres were kept, no compounds would be identified. Depending on the frame of IL-2, there were therefore either two or three pharmacophores, bringing the total number to X. (More details on pharmacophores? Distances, etc)

Pharmacophore Filtering

The compounds used for this study came from a diverse set of databases, containing different types of molecules with different features:

- The Maybridge HitFinder 2016 collection, 14 399 compounds representing drug-like diversity for lead identification;

- The Chembridge DIVERSet Libraries, a structurally diverse set of compounds which provide great coverage of the pharmacophore space (DIVERSet CORE Library stock, 49 886 compounds, and DIVERSet EXPRESS-Pick collection, 49 885 compounds);

- The ASINEX PPI-focused 2015 library, 11 387 compounds which mimic the backbone geometry and the projection of side-chains as observed in peptides present in PPIs;

- The ChemDiv Eccentric PPI library (12 995 compounds), PPI Helix Turn 3D mimetics (25 775 compounds), PPI shape helix mimetics (8 885 compounds) and PPI 3D mimetics (2 108 compounds), libraries selected for their specific targeting of PPIs;

- The ENAMINE set targeting PPIs, 65 512 diverse compounds which were obtained from the analysis of more than twenty protein-protein complexes to define specific features of potential inhibitors;

- The Life Chemical PPI-focused set, containing a total of 22 932 molecules which were found, from the total pool of Life Chemical molecules, to have activity towards PPIs, through different ligand-based methods (machine learning, 2D similarity, rule of four).


The total set of databases consists of approximately 260 000 molecules, which were imported into the DS program. Duplicate structures were removed and the 3D coordinates were generated using the Catalyst tool available in the program. This tool also generated multiple 3D conformations for each ligand, thus building the database used in this study.

Each database was systematically probed by each pharmacophore, five times for IL-15 and X times for IL-2. For each protein, the molecules which fit the different pharmacophores were pooled for each database. Furthermore, the program assigned a FitValue for each pharmacophore filtering run: an additional criterion for keeping a compound was therefore defined, wherein only molecules which had a FitValue > 1 for at least one of the pharmacophore runs were kept. Additionally, compounds which fit the pharmacophores for IL-15 were compared to the compounds which fit the pharmacophores for IL-2; if a compound fit the pharmacophores of both proteins, it was eliminated.
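The pooling and selectivity filter described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and data layout are hypothetical, the actual filtering was done within DS, and it is assumed here that the FitValue > 1 criterion is applied to the IL-2 runs as well before the comparison.

```python
from typing import Iterable, Set, Tuple

# One pharmacophore hit: (compound identifier, FitValue). Hypothetical layout.
Hit = Tuple[str, float]


def pool_hits(runs: Iterable[Iterable[Hit]], min_fit: float = 1.0) -> Set[str]:
    """Pool compounds over several pharmacophore runs, keeping those whose
    FitValue exceeds min_fit in at least one run."""
    kept: Set[str] = set()
    for run in runs:
        for compound_id, fit_value in run:
            if fit_value > min_fit:
                kept.add(compound_id)
    return kept


def selective_hits(il15_runs: Iterable[Iterable[Hit]],
                   il2_runs: Iterable[Iterable[Hit]],
                   min_fit: float = 1.0) -> Set[str]:
    """Compounds retained for IL-15 that are not also retained for IL-2.
    Assumption: the same FitValue threshold is used for both proteins."""
    return pool_hits(il15_runs, min_fit) - pool_hits(il2_runs, min_fit)


# Toy data: two IL-15 runs and one IL-2 run.
il15_runs = [[("cpd1", 1.4), ("cpd2", 0.8)], [("cpd2", 1.2), ("cpd3", 2.0)]]
il2_runs = [[("cpd3", 1.5), ("cpd4", 3.0)]]
selected = selective_hits(il15_runs, il2_runs)
```

In this toy case `cpd3` is discarded because it also fits an IL-2 pharmacophore, while `cpd2` is kept because one of its two IL-15 runs exceeds the threshold.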

Virtual Screening

At this point, with a set of molecules filtered to contain only the ones which fit onto the interface between IL-15/IL-2Rβ and not the ones which fit onto the interface between IL-2/IL-2Rβ, the docking protocol itself could move forward. As mentioned in the computational details chapter, this was performed using LigandFit, available in DS.

In order to test different options for a ligand (which would then define the binding site for the molecule to attach to), a small subset of 200 compounds was extracted from the full database and docked onto the different tested grid regions. These were compared based on the number of compounds (out of 200) docked, the number of poses that had been rejected and the amount of time required for each run. An artificial ligand was built, based on the scaffold defined by the following residues of IL-2Rβ: Gln70, Lys71, His133 and Tyr134. This ligand was extended artificially, using -CH2-CH3, in order to reach Ser58 on IL-15. In this case, the volume of the binding site was X, with a grid containing X points.

(Figure with the ligand)

Considering the docking itself, the protocol was performed using the CFF force field, with a penalty of 200 kcal/mol being applied for molecules which fell outside of the defined grid, in order to bias against larger molecules which could potentially bind well onto the defined region but still extend outside of it (and thus potentially not be selective). A softened Lennard-Jones 9-6 potential was used for the calculation of the vdW energy, such that, as the separation distance approaches zero, the potential rises to a finite value instead of increasing without limit. As such, close protein-ligand contacts are less penalized. The energy calculation was truncated at a cutoff of 10 Å. The conformational search was performed using a Monte Carlo trial method, with the maximum internal energy set to 10 000 kcal/mol. The torsional step size for polar hydrogens was 30°. An additional step with 10 Steepest Descent rigid body minimizations was then performed, with 10 poses for each ligand being saved. These poses were clustered at 1.5 Å, which means that if two poses had an RMS deviation < 1.5 Å, only the one with the best DockScore was kept.
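The pose clustering rule can be illustrated with a short greedy sketch. It is not the LigandFit implementation: the function names are hypothetical, poses are assumed to share the receptor frame (so no superposition is applied), and a higher DockScore is assumed to be better.

```python
import math
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]
Pose = Tuple[float, Coords]  # (DockScore, heavy-atom coordinates)


def rmsd(a: Coords, b: Coords) -> float:
    """RMS deviation between two poses with identical atom ordering."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("poses must have the same, non-zero number of atoms")
    total = sum((pa[k] - pb[k]) ** 2 for pa, pb in zip(a, b) for k in range(3))
    return math.sqrt(total / len(a))


def cluster_poses(poses: List[Pose], threshold: float = 1.5) -> List[Pose]:
    """Visit poses from best to worst score and keep a pose only if it lies
    at least `threshold` angstroms from every pose already kept."""
    kept: List[Pose] = []
    for score, coords in sorted(poses, key=lambda p: p[0], reverse=True):
        if all(rmsd(coords, other) >= threshold for _, other in kept):
            kept.append((score, coords))
    return kept


# Toy example with single-atom "poses".
poses = [
    (50.0, [(0.0, 0.0, 0.0)]),
    (60.0, [(1.0, 0.0, 0.0)]),  # best score, 1.0 A away from the first pose
    (40.0, [(5.0, 0.0, 0.0)]),
]
representatives = cluster_poses(poses)
```

Here the pose scored 50 is dropped because it lies within 1.5 Å of the better-scored pose, while the distant pose scored 40 is retained.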

In order to check for potential biases in the scoring function, the molecular properties of every ligand were considered and the docking scores were compared against the molecular weights.

(Figure example of MW vs dockscore)

After confirming that there was no bias on the molecular weight, different scoring functions were used in order to rank the compounds: Ligscore1, Ligscore2,²⁸² PLP1, PLP2,³⁰⁸ Jain³⁰⁹ and PMF. Keeping only the best pose, the compound library was further filtered based on a consensus scoring: only the compounds which ranked in the top 10% for all six scoring functions (CS=6) were kept. These compounds were then subjected to a binding energy calculation, wherein some residues of IL-15 were kept flexible: Asn4, Ser7, Asp8, Lys10, Lys11, Ala57, Ser58, Asp61, Glu64, Asn65, Ile68, Leu69 and Asn72. Furthermore, the binding energy obtained at this step was divided by the number of heavy atoms of each ligand, thus obtaining the ligand efficiency. This yielded the final ranking of the filtered compound library.
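The consensus scoring and ligand efficiency steps can be sketched as below. The helper names are hypothetical and the scores are toy numbers; it is assumed that, for every scoring function, a higher value means a better pose.

```python
from typing import Dict, Set


def top_fraction(scores: Dict[str, float], fraction: float = 0.10) -> Set[str]:
    """Compounds ranked in the top `fraction` for one scoring function
    (assumption: higher score is better)."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=lambda cid: scores[cid], reverse=True)
    return set(ranked[:n_keep])


def consensus_score(all_scores: Dict[str, Dict[str, float]],
                    fraction: float = 0.10) -> Dict[str, int]:
    """Number of scoring functions for which each compound is in the top
    fraction. `all_scores` maps function name -> {compound: score}."""
    counts: Dict[str, int] = {}
    for scores in all_scores.values():
        for cid in top_fraction(scores, fraction):
            counts[cid] = counts.get(cid, 0) + 1
    return counts


def ligand_efficiency(binding_energy: float, n_heavy_atoms: int) -> float:
    """Binding energy divided by the number of heavy atoms of the ligand."""
    return binding_energy / n_heavy_atoms


# Toy library of ten compounds scored by six functions.
names = ["Ligscore1", "Ligscore2", "PLP1", "PLP2", "Jain", "PMF"]
compounds = [f"c{i}" for i in range(10)]
all_scores = {name: {c: float(i) for i, c in enumerate(compounds)} for name in names}
cs = consensus_score(all_scores)
kept = sorted(c for c, n in cs.items() if n == len(names))  # CS = 6
```

With ten compounds and a 10% cut, only the single best compound per function is retained, so `kept` contains the one compound that tops all six rankings.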

This process, described herein for the crystallographic structure of IL-15, was then repeated for the average structure obtained from the 75 to 100 ns window of the MD simulation. A ligand was built based on the same residues mentioned for the crystallographic structure, with a volume of 919.5 Å³ and a grid consisting of 7356 points, onto which the full library of compounds was then docked.

(Figure of the ligand)


In order to further filter out the compounds which would perform better for IL-2, the molecules obtained from the protocol described for the crystallographic structure of IL-15 were then docked onto the IL-2 crystallographic structure. To do this, a ligand was prepared for IL-2, based on the same conditions as the one for IL-15, resulting in a region with a volume of 929.1 Å³, containing 7433 points. The same principle was applied for the structures from the MD simulations, with the molecules resulting from the docking onto the MD snapshot of IL-15 being docked onto the snapshot from the same time frame for IL-2, with an analogous ligand being built (959.6 Å³ volume, 7677 points).

(Figure of ligands)

The final list of compounds results from a comparison between both crystallographic structures, each yielding a list ranked by ligand efficiency; the same compound will thus have a position in the ranking for IL-2 (RankIL2) and for IL-15 (RankIL15). Therefore, for each compound, one can compute ΔRank = RankIL2 - RankIL15. These values can then be normalized, with the lowest value corresponding to 0% and the highest to 100%. Finally, the chosen compounds are the ones which are over a certain threshold of normalized rank (we chose > 60%), meaning that they performed significantly better for IL-15 than for IL-2. This process is then repeated for the structures which originated from the MD simulations, with the same comparison, to obtain the final compounds which will be tested experimentally.
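The ΔRank selection can be written compactly as a min-max normalization followed by a threshold. The sketch below uses hypothetical function names and toy rankings, and assumes that rank 1 denotes the best ligand efficiency.

```python
from typing import Dict, List


def normalized_delta_rank(rank_il2: Dict[str, int],
                          rank_il15: Dict[str, int]) -> Dict[str, float]:
    """Min-max normalized (RankIL2 - RankIL15), in percent, for the compounds
    present in both rankings. Assumption: rank 1 is the best compound."""
    common = rank_il2.keys() & rank_il15.keys()
    delta = {c: rank_il2[c] - rank_il15[c] for c in common}
    if not delta:
        return {}
    lowest, highest = min(delta.values()), max(delta.values())
    if highest == lowest:
        return {c: 0.0 for c in delta}
    return {c: 100.0 * (d - lowest) / (highest - lowest) for c, d in delta.items()}


def select(normalized: Dict[str, float], threshold: float = 60.0) -> List[str]:
    """Compounds whose normalized delta rank exceeds the threshold."""
    return sorted(c for c, value in normalized.items() if value > threshold)


# Toy rankings for five compounds.
rank_il2 = {"A": 5, "B": 1, "C": 3, "D": 4, "E": 2}
rank_il15 = {"A": 1, "B": 5, "C": 3, "D": 2, "E": 4}
normalized = normalized_delta_rank(rank_il2, rank_il15)
chosen = select(normalized)
```

A compound ranked poorly for IL-2 and well for IL-15 receives a large positive ΔRank and therefore a normalized value close to 100%.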

Results and Discussion

The intricacies of identifying a compound which targets a specific protein have been extensively mentioned throughout this document. In fact, it is important to keep in mind the specific challenges inherent to the design of a small molecule capable of inhibiting the interaction between a protein and its receptor. In this case, the target in question is IL-15, either through its interaction with IL-2Rβ, preferably, or, less desirably, through its interaction with γc. To this end, the work performed herein was designed in three stages: first, the most relevant interactions between IL-15 and its receptors should be identified in significant detail, with key residues being defined as hot spots; second, the hot spots identified in the previous stage should be further explored, namely through the use of mutated models to evaluate the effect of changing each residue individually; lastly, the information gathered in the previous two stages should be used to perform a robust Virtual Screening campaign able to identify compounds which will, ideally, potently inhibit IL-15 while not having any activity towards IL-2. Throughout this chapter, these different phases will be detailed and expanded upon.

Multimeric

2.1. The quaternary structure impacts the flexibility of the IL-15 receptor interfaces

In order to probe the effect of the quaternary structure on the interface dynamics, we first compare the RMSDs of various IL-15 components (whole chain, interfacial sites) as a function of the multimeric state of the wild-type IL-15.

Figure X shows the deviations obtained along the simulation time for the whole IL-15 chain in the various multimeric species, with Table X evidencing the statistical analysis performed for this quantity in the different models.


Figure 2. RMSD plots of the Cα carbon atoms of the whole IL-15 chain over 200 ns of MD simulations.

Table 1. RMSD Statistics (min, max, average and corresponding standard deviation) for the various IL-15 multimeric models.

| Statistics | IL-15 | IL-15:IL-15Rα | IL-15:IL-2Rβ:γc | IL-15:IL-15Rα:IL-2Rβ:γc |
|---|---|---|---|---|
| Min | 0.77 | 0.72 | 0.63 | 0.90 |
| Max | 2.70 | 2.50 | 3.40 | 2.43 |
| Average | 1.85 | 1.82 | 2.55 | 1.91 |
| Std¹ | 0.36 | 0.20 | 0.48 | 0.24 |

¹ Standard deviation

It appears from these results that the flexibility of the IL-15 chain is strongly dependent on its quaternary structure. Through this global analysis, we observe that the IL-15/IL-15Rα dimer appears slightly more “restricted” in terms of conformational flexibility when compared to the free monomeric IL-15. In addition, the RMSD values of apo IL-15 and of its IL-15Rα-bound complexes behave rather similarly after 100 ns, while the absence of the IL-15Rα receptor, i.e. the trimeric form (IL-15/IL-2Rβ/γc), appears to induce greater deviations.
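For reference, the statistics reported in Table 1 are simple summaries of each RMSD time series. A minimal sketch is given below; the function names are hypothetical, the values are toy numbers rather than simulation data, and the population standard deviation is assumed (the sample formula would give nearly identical values for long trajectories).

```python
import statistics
from typing import Dict, Sequence


def summarize_rmsd(series: Sequence[float]) -> Dict[str, float]:
    """Min, max, average and standard deviation of an RMSD time series (in A)."""
    return {
        "min": min(series),
        "max": max(series),
        "average": statistics.fmean(series),
        "std": statistics.pstdev(series),  # assumption: population std
    }


def window(series: Sequence[float], times_ns: Sequence[float],
           start_ns: float) -> list:
    """RMSD values recorded at or after `start_ns`, e.g. to compare the
    behaviour of the models after 100 ns."""
    return [value for value, t in zip(series, times_ns) if t >= start_ns]


# Toy series sampled every 50 ns.
times = [0.0, 50.0, 100.0, 150.0, 200.0]
rmsd_values = [0.8, 1.6, 1.9, 2.1, 2.0]
stats_all = summarize_rmsd(rmsd_values)
stats_late = summarize_rmsd(window(rmsd_values, times, 100.0))
```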

In order to gain deeper insights, we broke the RMSD analysis down into the specific IL-15 structural elements. Figure X shows their RMSD in the four heteromeric models.


Figure 3. Average RMSD values together with their standard deviations, calculated for each specific structural element of the IL-15 chain over 200 ns of MD simulations.

Through the analysis of Figure X, different features are identified for the various structural elements considered (A-D helices and associated loops). The largest standard deviation associated with the average RMSD value is, in all cases but one (loop C-D), obtained for the free IL-15 chain (black dots/lines in Figure X), which is indeed expected to present the largest conformational freedom. Accordingly, we observed that for most structural elements, the smallest RMSD variation range is that of the tetramer (orange in Figure 3), in agreement with the fact that this structure is the most constrained, all the IL-15 interfaces being involved in their corresponding contacts. The average RMSD values also revealed interesting trends. If we consider helix B, whose residues are in contact with the IL-15Rα receptor, the RMSDs decrease in the order:

IL-15 (1.55 ± 0.45 Å) > IL-15/IL-15Rα (1.14 ± 0.28 Å) ~ IL-15/IL-15Rα/IL-2Rβ/γc (1.21 ± 0.14 Å).

However, interestingly, the trimeric form (IL-15/IL-2Rβ/γc), in which the helix B residues are “free of contacts”, shows the largest RMSD value (1.78 ± 0.46 Å). A similar behavior is apparent for helix A, which is known to participate in contacts with residues of the IL-2Rβ chain. Indeed, the corresponding order of the RMSD values is the following:

IL-15 (1.80 ± 0.45 Å) > IL-15/IL-2Rβ/γc (1.53 ± 0.21 Å) ~ IL-15/IL-15Rα/IL-2Rβ/γc (1.61 ± 0.19 Å).

For this element, the form with the largest RMSD value (1.99 ± 0.33 Å) is the IL-15/IL-15Rα dimer, for which helix A conserves its conformational flexibility, the interface being free. Similar trends could not be drawn from the corresponding values for the A-B and C-D loops, which showed the largest amplitude of structural variations, as should be expected for loops. One behavior, however, deserves special attention: the C-D loop presents a very large average RMSD (4.93 ± 1.15 Å) for the trimeric form compared to the other forms. This behavior can be rationalized by the fact that residues of this loop are involved in the IL-15Rα interface and therefore remain exposed to the solvent in the IL-15/IL-2Rβ/γc trimeric receptor, thus compensating for the constraints imposed by the presence of the two other receptor chains.

2.1.2. RMSF

To get a complementary picture of the flexibility of the various IL-15 components pointed out in the previous section, we then turned to RMSF analyses. Figure 4 shows the results obtained for the various multimeric models considered.
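As a reminder of what is being plotted, the RMSF of an atom is the square root of the time-averaged squared deviation of its position from its mean position. The following sketch uses a hypothetical function name and toy coordinates, and assumes that the trajectory frames have already been superposed onto a common reference.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[Tuple[float, float, float]]  # one (x, y, z) per C-alpha atom


def rmsf(frames: Sequence[Frame]) -> List[float]:
    """Per-atom root mean square fluctuation over pre-aligned frames."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over the trajectory.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


# Toy trajectory: atom 0 oscillates along x, atom 1 does not move.
trajectory = [
    [(0.0, 0.0, 0.0), (3.0, 3.0, 3.0)],
    [(2.0, 0.0, 0.0), (3.0, 3.0, 3.0)],
]
fluctuations = rmsf(trajectory)
```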


Figure 4. RMSF of Cα carbon atoms of IL-15 residues in the investigated multimeric models. The shaded areas indicate the IL-15 interface residues interacting with IL-15Rα (red), IL-2Rβ (green), and γc (orange). Helices A (dark blue), B (purple), C (light blue) and D (teal) of IL-15 are also represented.

Figure X confirms the trends drawn from the RMSD analyses, since a different flexible character can be discerned according to the position of the amino acid residues for the different studied models. Globally, the RMSF values are significantly larger for the free IL-15 structure (black line), in line with its larger flexibility as a monomer, being unrestricted at all interfaces. However, such a behavior is not observed for loops A-B and C-D.

The examination of the IL-15 specific structural elements in the three multimeric models also provides a complementary description to the previous RMSD analyses. The position of the loops is clearly visible in the RMSF plot, with maximal RMSF values corresponding to the A-B (residues 20 to 36) and C-D (residues 78 to 96) loops of IL-15. For helix A, the largest RMSF values are obtained for the free IL-15 chain, while the trimeric complex IL-15/IL-2Rβ/γc exhibits the lowest values, a trend which is consistent with the interaction of helix A residues with the IL-2Rβ chain (ref). In the same vein, the dimeric complex (IL-15/IL-15Rα) exhibits higher RMSF values for helix A, but they are lower than for the free IL-15 chain, suggesting a slight stabilizing effect of IL-15Rα. The situation is different for helix B, since the largest RMSF values are found for the IL-15/IL-2Rβ/γc trimer, followed by similar values for the free IL-15 chain and the IL-15/IL-15Rα dimer, with the lowest values being obtained for the tetrameric model. This behavior is in agreement with the fact that helix B residues are known to be involved in the interface with the IL-15Rα chain (ref). It is worth noting that a specific region corresponding to the end of helix B (residues 52 to 54) and the B-C loop (residues 55 and 56) shows large values in the trimer and the monomer. For helix C, a profile similar to the one obtained for helix A could be discerned, presenting an increased conformational stability, with fewer structural fluctuations. For helix D, only the free IL-15 chain behaves differently from the three other species: on average, the RMSF values are significantly higher for the IL-15 monomer compared to the three other models, which behave very similarly. Lastly, it is remarkable that for both the A-B and C-D loops, the largest structural fluctuations are obtained for the trimeric IL-15/IL-2Rβ/γc receptor, in agreement with the fact that those residues are involved in contacts with IL-15Rα chain amino acids and therefore have full conformational freedom in the trimer.

2.2. The quaternary structure impacts the structural features of the various interfaces

We then compare the number of contacts at the various interfaces between IL-15 and its receptors as a function of the investigated multimeric models. Table X clearly shows a different number of contacts at each interface. Indeed, the average number of contacts involving the IL-15Rα chain is higher than 40, whereas the corresponding values are around 30 and 20 for the IL-2Rβ and γc chains, respectively.

Table 2. Number of contacts at the various interfaces for the three multimeric (dimer, trimer and tetramer) forms of IL-15. The number in parentheses corresponds to the standard deviation.

| Interface | Dimer | Trimer | Tetramer |
|---|---|---|---|
| IL-15:IL-15Rα | 43 (4) | - | 41 (3) |
| IL-15:IL-2Rβ | - | 28 (4) | 26 (4) |
| IL-15:γc | - | 18 (5) | 18 (7) |

Such a significant difference can be correlated to the higher affinity of IL-15 for the IL-15Rα chain (between 7 and 40 pM) compared to the corresponding quantity for IL-2Rβ/γc complexes (13.5 nM) (Anderson, J Biol Chem 1995; Chirifu et al., Nature Immunol. 2007).
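To give a sense of scale for these affinities, a dissociation constant can be converted into a standard binding free energy through ΔG° = RT ln(Kd/c°). The sketch below is only an illustration: the temperature of 298.15 K and the 1 M standard state are assumptions, not values taken from this work, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def binding_free_energy(kd_molar: float, temperature: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant,
    relative to a 1 M standard state (assumption)."""
    return R_KCAL * temperature * math.log(kd_molar)


# Dissociation constants quoted in the text, converted to mol/L.
dg_alpha_low = binding_free_energy(7e-12)      # IL-15Ralpha, 7 pM
dg_alpha_high = binding_free_energy(40e-12)    # IL-15Ralpha, 40 pM
dg_beta_gamma = binding_free_energy(13.5e-9)   # IL-2Rbeta/gammac, 13.5 nM

# Free energy gap between the weakest IL-15Ralpha value and the IL-2Rbeta/gammac value.
gap = dg_beta_gamma - dg_alpha_high
```

A more negative value indicates tighter binding, so the picomolar complex lies several kcal/mol below the nanomolar one under these assumptions.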

2.3. Highlighting novel key structural features at IL-15 interfaces

In order to get deeper details from our MD simulations, the different interface features were then scrutinized using a pairwise amino acid analysis of the interactions across the various interfaces. Tables X, X and X detail the interactions observed in the different models.

Table S1. Residues, atoms and corresponding distances and percentage of presence along the simulation time for the interface contacts predicted by the MD simulations for the IL-15:IL-15Rα complex in the dimeric and tetrameric receptors.

Columns: IL-15 residue, IL-15 atom, IL-15Rα residue, IL-15Rα atom, d(H…A) in Å (dimer, tetramer), percentage of presence in % (dimer, tetramer).

Asp22 OD2 Arg24 HH11 2.27(1.42) 87

Asp22 OD1 Arg26 (NH1)H 1.83 (58) 1.74(16) 97 100

CB(HB) Arg26 (NH2)H 2.83(42) 2.80(33) 73 73

Thr24 (C)O Arg35 (NH2)H 2.68(47) 2.98(41) 74 50

Leu25 (CA)HA Arg35 (NH2)H 2.36(20) 2.45(20) 100 100

Tyr26 O(H) Lys34 CB(HB) 2.71(24) 2.97(32) 89 59

Tyr26 (O)H Arg35 O(C) 1.81(13) 1.79(12) 100 100

Tyr26 (CD2)H Ala37 (CB)HB2 2.49(23) 98

Tyr26 (CD2)H Ala37 (CA)HA 2.52(22) 98

Leu45 HD22 Ala37 CB(HB1) 2.48(37) 94

Leu45 (CB)HB Ala37 CB(HB3) 2.61(41) 84

Leu45 (C)O Gly38 (CA)HA2 2.54(21) 2.44(20) 98 99

Glu46 OE1 Arg35 (NH2)H 1.66(07) 1.66(07) 100 100

Glu46 OE2 Ala37 (CA)HA 2.55(25) 2.49(22) 96 98

Glu46 OE2 Gly38 HN 1.81(12) 1.78(12) 100 100

Gln48 (CB)HB2 Gly38 (CA)HA2 2.81(47) 3.66(94) 72 26

Val49 (CG2)HG22 Arg35 (NH1)HH12 2.70(34) 2.69(30) 84 84

Val49 (CG2)HG22 Gly38 (CA)HA2 2.35(25) 2.42(26) 98 97

Val49 (CG2)HG22 Thr39 (C)O 2.93(29) 2.89(27) 64 70

Val49 (CG2)HG11 Ser40 (CB)HB2 2.39(26) 2.22(22) 98 99

Leu52 (CB)HB2 Ser40 (CB)HB2 2.76(48) 2.59(34) 73 88

Leu52 (CD2)HD22 Leu42 (CB)HB2 2.93(75) 73

Leu52 (CD2)HD22 Ser60 (OG)HG1 2.78(47) 76

Glu53 (C)O Arg24 (NH1)HH12 3.35(1.35) 40

Glu53 OE1 Arg26 (NH2)H22 1.67(08) 1.67(08) 100 100

Glu53 OE2 Ser40 (OG)HG1 1.95(32) 1.88(19) 99 100

Glu53 (CG)HG2 Leu42 (CB)HB1 2.41(27) 2.42(23) 98 98

Cys88 (CB)HB1 Ala37 (CB)HB2 2.41(28) 2.67(29) 98 87

Glu89 (CB)HB1 Lys34 (CE)HE1 3.34(1.28) 2.25(35) 51 98

Glu89 (CG)HG2 Arg35 (C)O 3.06(39) 3.10(47) 51 49

Glu89 (CG)HG2 Lys36 (CA)HA 1.93(36) 1.91(34) 99 99

Glu89 OE2 Ala37 HN 1.94(21) 2.00(19) 100 100

Glu89 (CG)HG2 Ile64 (CD)HD3 2.41(30) 2.37(28) 96 96

Glu90 (CA)HA Lys34 HZ2 4.10(1.90) 2.50(84) 33 70

Glu90 (CG)HG2 Pro67 (OG)HG1 4.37(2.25) 49

Glu93 OE1 Arg35 (NE)HE 1.83(25) 2.07(75) 98 87


Table S2. Residues, atoms and corresponding distances and percentage of presence along the simulation time for the interface contacts predicted by the MD simulations for the IL-15:IL-2Rβ complex in the trimeric and tetrameric receptors.

Columns: IL-15 residue, IL-15 atom, IL-2Rβ residue, IL-2Rβ atom, d(H…A) in Å (trimer, tetramer), percentage of presence in % (trimer, tetramer).

Asn4 (ND2)HD22 Thr74 (OG1)HG1 3.21(1.20) 59

Asn4 (ND2)HD21 Tyr134 (CB)HB1 2.68(23) 2.73(61) 93 83

Ser7 (OG)HG1 His133 (CB)HB2 2.87(51) 68

Ser7 (CB)HB1 Tyr134 (CD2)HD2 2.95(67) 6.21(3.20) 66 32

Ser7 (OG)HG1 Glu136 OE2 3.70(2.15) 45

Asp8 OD1 Tyr134 (OH)HH 1.85(39) 2.57(97) 96 71

Lys11 (CB)HB2 His133 (CD2)HD2 3.15(1.15) 55

Ser58 (CA)HA Ser69 OG 3.53(74) 2.88(67) 27 73

Asp61 OD2 Leu69 (CG)HG1 2.18(82) 3.15(1.58) 82 54

Asp61 (CB)HB1 Gln70 (C)O 3.27(45) 3.14(47) 83 47

Asp61 OD2 Lys71 (NZ)HZ1 1.82(33) 1.92(75) 98 93

Glu64 OE1 Arg42 (NH1)HH11 2.18(60) 2.36(88) 85 80

Asn65 OD1 Arg42 (NH1)HH12 1.87(16) 1.85(13) 100 100

Asn65 (ND2)HD22 Gln70 (C)O 2.05(24) 2.16(32) 100 98

Asn65 (CB)HB1 Thr73 (CH)HG23 2.34(21) 2.44(21) 100 99

Asn65 (CB)HB1 Tyr134 (OH)HH 3.52(51) 3.57(1.26) 15 45

Ile68 (CD)HD2 Lys41 (CG)HG1 2.64(47) 85

Ile68 (CG2)HG21 Lys41 (CD)HD2 2.37(25) 98

Ile68 (CD)HD2 Arg42 NH2 3.25(58) 35

Ile68 (CD)HD3 Arg42 (CD)HD1 3.12(61) 50


Ile68 (CG2)HG21 Thr73 OG1 3.19(40) 35

Ile68 (CG2)HG23 Thr73 (OH1)HG1 3.11(45) 48

Ile68 (CG2)HG21 Val75 (CB)HB 2.44(25) 97

Ile68 (CG2)HG22 Val75 (CG2)HG22 2.81(37) 74

Leu69 (CD2)HD21 Thr73 (CB)HB 2.87(89) 73

Leu69 (CB)HB2 Thr73 (CG2)HG23 2.58(33) 88

Leu69 (CD)HD12 Thr74 (OG1)HG1 3.53(1.63) 51

Leu69 (CA)HA Val75 (CG2)HG22 2.46(36) 2.44(32) 90 94

Leu69 (CD1)HD12 Tyr134 (OH)HH 4.30(2.28) 34

Asp72 (CB)HB2 Val75 (CG1)HG12 2.35(21) 99

Table S3. Residues, atoms and corresponding distances and percentage of presence along the simulation time for the interface contacts predicted by the MD simulations for the IL-15:γc complex in the trimeric and tetrameric receptors.

Columns: IL-15 residue, IL-15 atom, γc residue, γc atom, d(H…A) in Å (trimer, tetramer), percentage of presence in % (trimer, tetramer).

Val3 (CA)HA Leu208 (CD2)HD22 2.83(62) 67

Val3 (CG2)HG22 Leu208 (CD2)HD22 3.05(91) 63

Ile6 (CG2)HG21 Pro207 (CB)HB2 4.13(1.46) 33

Ile6 (CD1)HD1 Leu208 (CA)HA 3.95(1.53) 39

Asp30 (CB)HB1 Lys70 (CB)HB1 3.18(1.53) 53

Asp30 OD1 Lys70 (NZ)HZ2 3.18(2.40) 69

Asp30 (CB)HB1 Asn71 (ND2)HD22 2.72(1.11) 6.04(2.50) 82 13

Asp30 OD1 Thr105 (OG1)HG1 3.68(2.15) 48

Asp30 OD2 Lys125 (NZ)HZ1 4.45(2.55) 34%

Val31 O Asn71 (ND2)HD21 5.64(2.07) 11%


Val31 O Asn71 (CB1)HB1 4.01(1.67) 32%

His32 (CA)HA Asn71 (CB1)HB1 3.12(1.57) 71%

Pro33 (CD2)HD2 Gln104 (NE2)HE21 3.78(1.40) 42%

His105 (CE1)HE1 Thr105 O 3.20(1.64) 69%

His105 NE2 Lys125 (CE)HE1 3.63(1.08) 37%

His105 NE2 Lys125 (NZ)HZ1 3.95(1.99) 45%

His105 (ND1)HD1 Gln127 OE1 3.63(1.64) 51%

His105 NE2 Asn128 (ND2)HD21 4.02(2.30) 52%

Gln108 (CB)HB1 Tyr103 (CE1)HE1 3.08(87) 7.56(2.81) 55% 13%

Gln108 (CB)HB1 Gln127 (NE2)HE22 3.33(1.17) 49%

Gln108 OE1 Gln127 (NE2)HE22 2.42(61) 85%

Gln108 (NE2)HE22 Pro207 O 4.50(1.47) 3.61(1.68) 11% 48%

Gln108 (CG)HG2 Leu208 O 3.02(95) 3.00(1.21) 63% 76%

Gln108 (CB)HB1 Cys209 (CA)HA 3.60(1.22) 36%

Interestingly, the majority of the interactions having the highest percentages of occurrence within our simulations are obtained for the IL-15/IL-15Rα interface. Indeed, the average percentages of occurrence of the various interactions for this interface are 78 and 86%, respectively, for the dimeric and tetrameric complexes, whereas the corresponding values are 64 and 67%, and 45 and 39%, for the IL-15 interfaces with the IL-2Rβ and γc chains in the trimeric and tetrameric complexes. In agreement with the crystallographic structures, both in the dimeric (2Z3Q, ref) and tetrameric (4GS7, ref) forms, the key role played by charged residues of IL-15 is evidenced by our simulations. In particular, the salt bridges between Asp22 (A-B loop), Glu46 and Glu53 (helix B) of IL-15 and the positively charged side chains of Arg26 and Arg35 of the IL-15Rα receptor were virtually conserved throughout the whole simulation (percentages higher than 97%) (Figure X). Moreover, short hydrogen bonds occurring during the full length of the calculations are worth noting. They correspond to interactions between the phenolic OH group of Tyr26 (A-B loop of IL-15) and the main-chain carbonyl of Arg35 (IL-15Rα), and between the carboxylate group of Glu53 (helix B of IL-15) and the alcohol OH group of Ser40. We also note the occurrence of van der Waals interactions between apolar amino acid side chains present on each side of this interface that proved to be conserved all along the simulation time (interactions between methylene groups of the side chains of Glu53 (helix B) and Glu89 (C-D loop) of IL-15 with Leu42 and Ile64 of IL-15Rα, respectively). The agreement between our theoretical results and the experimental data for the IL-15/IL-15Rα interface makes us confident in the validity of our model. Even more interestingly, new interactions between amino acid residue pairs that have never been reported in the experimental works are predicted. Two of them are short hydrogen bonds involving acidic amino acid side chains: the carboxylates of Glu46 (helix B) and Glu89 (C-D loop) of IL-15 with the main-chain NH of Gly38 and Ala37 of IL-15Rα, respectively. Another novel feature corresponds to a salt bridge between the carboxylate group of Glu93 of IL-15 (C-D loop) and the guanidinium group of Arg35. Weaker van der Waals interactions involving CH groups of amino acid side chains are also revealed by our work: between the CH of Leu25 (A-B loop of IL-15) and the NH of Arg35 (IL-15Rα), and between the methylene CH of Cys88 (C-D loop of IL-15) and Ala37 (IL-15Rα), on the one hand, and between Glu89 (C-D loop of IL-15) and the CH of Lys36 (IL-15Rα), on the other hand. Although these interactions do not appear in the earlier crystallographic structural analyses, they all involve amino acid residues already identified as contributors to interactions at the interface (ref). Therefore, for the IL-15/IL-15Rα interface, our simulations reveal new structural features and confirm established ones. In particular, our data confirm the relative importance of amino acid residues at the interface and clarify their role at an atomic level. It is interesting to note that all the interactions discussed above are systematically observed both in the dimer and the full receptor, with very similar features.
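The percentages of occurrence discussed here can be understood as the fraction of trajectory frames in which a given donor-acceptor distance stays below a cutoff, and the interface averages as the mean of these fractions. The sketch below is illustrative only: the 3.0 Å cutoff, the function names and the distances are hypothetical and do not reproduce the criteria of the methodology section.

```python
from typing import Dict, Sequence


def occupancy(distances: Sequence[float], cutoff: float = 3.0) -> float:
    """Percentage of frames in which the contact distance is within `cutoff`
    (the cutoff value is a placeholder assumption)."""
    if not distances:
        raise ValueError("at least one frame is required")
    present = sum(1 for d in distances if d <= cutoff)
    return 100.0 * present / len(distances)


def mean_occupancy(contacts: Dict[str, Sequence[float]], cutoff: float = 3.0) -> float:
    """Average occupancy over all contacts of one interface."""
    values = [occupancy(series, cutoff) for series in contacts.values()]
    return sum(values) / len(values)


# Toy distance series (in A) for two contacts over four frames.
contacts = {
    "Glu46(OE1)-Arg35(NH2)": [1.6, 1.7, 1.7, 1.6],
    "Glu53(C=O)-Arg24(NH1)": [2.1, 2.8, 4.5, 5.0],
}
per_contact = {name: occupancy(series) for name, series in contacts.items()}
interface_average = mean_occupancy(contacts)
```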


Figure 5. Representation of key interactions at the IL-15/IL-15Rα interface for the full system (left) and the dimeric system (right).

For the IL-15/IL-2Rβ interface, noticeable differences are obtained compared to the IL-15/IL-15Rα interface. First, as already mentioned, the percentage of occurrence of the various contacts is significantly lower compared to the IL-15/IL-15Rα interface, in line with the moderate affinity of this complex (Balasubramanian Int Immunol 1995). As a matter of fact, while the number of contacts in the trimer and tetramer forms remains similar, the interactions tend to be less conserved, the chemical fragments involved in the trimeric complex being in some situations different from the ones interacting in the tetramer. Remarkably, among the various interactions observed, salt bridges are much less numerous for this interface than in the case of IL-15/IL-15Rα. Indeed, only one interaction of this kind is observed, between the carboxylate group of Asp61 (helix C of IL-15) and the ammonium side chain of Lys71 (IL-2Rβ). The most frequent interactions are hydrogen bonds. Two residues of IL-15 appear to play a pivotal role in such interchain hydrogen-bond interactions: Asp8 (helix A) and Asn65 (helix C). Indeed, one hydrogen bond is kept all along the simulation time, involving the carboxylate group of IL-15 Asp8 and the phenolic OH of IL-2Rβ Tyr134 (Figure X). Another one concerns the OH of IL-15 Ser7 (helix A) and the carboxylate group of IL-2Rβ Glu136. Interestingly, the amide group of the Asn65 side chain uses both its hydrogen-bond donor and acceptor potential, through the NH2 group (with the main-chain carbonyl of Gln70) and the C=O group (with the guanidinium of Arg42). Furthermore, one of the methylene groups of the side chain of Asn65 is in van der Waals contact over most of the simulation time (99.7 and 98.6% in the trimer and the full tetramer, respectively) with a methyl group of Thr73. The importance of these residues for both IL-2 and IL-15 has already been highlighted by mutagenesis studies in 2012 (Nature Immunol. 2012). In addition, the Ile68 and Leu69 residues of IL-15 are predicted to be in van der Waals contact (through fragments of their aliphatic chains) with a significant occurrence (from 35 to 98%) along the simulation time with several IL-2Rβ residues (Lys41, Arg42, Thr73, Thr74, Val75). In this case, the fragments involved in the various contacts are not always identical in the trimer and the tetramer, highlighting the higher flexibility of these groups, in line with the weaker specificity of such interactions.

Figure 6. Representation of key interactions at the IL-15/IL-2Rβ interface for the full system (left) and the trimeric system (right).

The IL-15:γc complex clearly appears to be the least stabilized, with a significantly lower occurrence of the interactions highlighted compared to the other IL-15 interfaces, in line with the difficulty of measuring the affinity of this complex (Balasubramanian Int Immunol 1995). As a consequence, the differences obtained between the trimer and the full tetramer are the most significant. Two residues of IL-15 appear to be involved in hydrogen-bond interactions at this interface: His105 and Gln108. The first acts through the NH of its imidazole ring as hydrogen-bond donor, with the carbonyl oxygen of Gln127 (γc) as the hydrogen-bond acceptor; the second acts through the NH2 group of its amide side chain, with the main-chain carbonyl oxygen of Pro207 (γc). The respective occurrences of these interactions, 51% (His105:Gln127) and 48% (Gln108:Pro207), nevertheless highlight substantial structural rearrangements, in line with the significant flexibility of this interface. The importance of residue Gln108 of IL-15 and of its IL-2 counterpart (Gln126) has been confirmed by mutagenesis studies (Pettit JBC 1997, Collins PNAS 1988) and described in the earlier crystallographic structural analyses (Ring Nature Immunol 2012, Wang Science 2005). Other fragments of some of these residues (Gln108), together with other residues of IL-15, are involved in interactions with a more pronounced van der Waals character. For instance, the side chains of Ile111 and Asn112 of IL-15 are involved in contacts with apolar fragments of γc chain residues with significant occurrence (from 63 to 78%) (see Table X in Supporting Information).


Figure 7. Representation of key interactions at the IL-15/γc interface for the full system (left) and the trimeric system (right).

2.4. Water molecules are essential in the stabilization of the different interfaces

Previous crystallographic studies have emphasized the importance of water molecules in the high-affinity complex between IL-15 and some of its receptors, especially IL-15Rα [ref]. In this work, we have therefore scrutinized the structural features of water molecules in the vicinity of the various interfaces (see the methodology section). Table X presents the results obtained for the various interfaces in the various multimeric species.

Table 6. Percentage, along the simulation time, of water molecules in hydrogen-bond interactions with amino acid residues across the various interfaces of IL-15.

Bridged amino acid residues

IL-15 IL-15Rα % in the dimer % in the tetramer

Glu53 Ser41 76 83

Glu93 Arg35 48

Glu53 Glu44 42

Asp24 Arg26 28

Glu53 Arg24 52 26

Asp22 Arg24 34

Asp22 Arg26 31

Glu89 Ala37 22 19

Glu89 Lys34 22 15

Glu93 Arg35 19

Glu89 Lys36 18 20

Glu92 Lys34 16

Glu89+Leu91 Lys34 14

Glu90 Pro67 13

Tyr26+Glu93 Arg35 11

IL-15 IL-2Rβ % in the trimer % in the tetramer

Glu64 Arg43 63

Asn4 Tyr134 47 19

Asp8 Tyr134 40


Asn1 Thr74 38 17

Ile68 Arg41 36

Glu64 Trp44 26

Lys11 His133 26 38

Glu64 Arg41 35

Lys11 Asp68 23

Asp61 + Glu64 Arg42 16

Asn4 Gln188 13

Asp8 His133 13

Ser7 His133 15 13

Asn1+Asn4 Thr74 15

Asn72 Arg41 14

Asp8 Gln70 14

Glu64 Arg42 13

Asp61 Arg42 12

Asp61 Ser69 12 12

Asp61 Lys71 11 13

Glu64 Lys71 11

Asp61 Ala66 11

Thr62 Gln70 10 12

Asp61 Gln70 11

Glu64 Arg42 11

IL-15 γc % in the trimer % in the tetramer

Asp30 Lys70 19

Asp30 Asn71 30 15

Gln108 Leu208 14

Gln108 Pro207 13 18


Asp30 Thr105 16

His105 Gln104 12

Gln108 Gln127 12 12

His105 Tyr103 11

His105+Gln108 Gln127 10

Gln108 Ser211 11

Val31 Asn71 11

His107 Gln127 11

Asp30 Lys125 10

The trends highlighted by these results are in agreement with the ones pointed out through the other descriptors. Indeed, the highest percentages of presence of water molecules are obtained for the IL-15:IL-15Rα interface, in line with the very high affinity reported for this complex (Anderson, J Biol Chem 1995, Sakamoto J Mol Biol 2009) and the previously suggested key role of two water molecules. Interestingly, our results show a significant difference according to the quaternary structure: the number of contacts is significantly higher in the dimer (12 instead of 8), although some of the contacts are more frequent in the full system (around 83% of the simulation time) than in the dimer. Among the various hydrogen-bond networks established around these “conserved” bridging water molecules, a special role should be assigned to the one involving the Glu53 residue of IL-15 and the Ser41 residue of IL-15Rα (Figure X). Indeed, this specific water molecule coincides with the position occupied by a water molecule with a particularly low B factor in the crystallographic structure of IL-15:IL-15Rα (ref), this position being conserved, with a higher percentage (83 instead of 76%), in the full receptor. It is worth noting that the second most “occupied” water position at this interface corresponds to another important region of the IL-15:IL-15Rα interface, involving charged residues (Glu93:Arg35, almost 50% in the full receptor, and Glu89:Lys36, around 20%) and reported in the experimental studies (ref).
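The bridging-water percentages of Table 6 can be thought of as follows: in a given frame, a pair of residues is bridged if at least one water molecule is hydrogen-bonded to both, and the reported value is the percentage of frames in which this happens. The sketch below assumes that per-frame hydrogen-bond lists between waters and residues are already available; the data layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

HBond = Tuple[int, str]  # (water identifier, residue label)


def water_bridge_occupancy(frames: Sequence[Iterable[HBond]],
                           il15_residues: Set[str],
                           receptor_residues: Set[str]) -> Dict[Tuple[str, str], float]:
    """Percentage of frames in which a water molecule is hydrogen-bonded to
    both an IL-15 residue and a receptor residue."""
    if not frames:
        return {}
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for hbonds in frames:
        partners: Dict[int, Set[str]] = defaultdict(set)
        for water_id, residue in hbonds:
            partners[water_id].add(residue)
        bridged = set()
        for residues in partners.values():
            for a in residues & il15_residues:
                for b in residues & receptor_residues:
                    bridged.add((a, b))
        for pair in bridged:  # count each pair at most once per frame
            counts[pair] += 1
    return {pair: 100.0 * n / len(frames) for pair, n in counts.items()}


# Toy trajectory of four frames.
frames = [
    [(1, "Glu53"), (1, "Ser41")],
    [(2, "Glu53"), (2, "Ser41"), (3, "Glu53")],
    [(1, "Glu53")],
    [(1, "Glu53"), (5, "Ser41")],  # two different waters: not a bridge
]
bridges = water_bridge_occupancy(frames, {"Glu53"}, {"Ser41"})
```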

Figure 8. Representation of key bridging water molecules at the IL-15/IL-15Rα interface for the full system (left) and the dimeric system (right).

Remarkable features can also be pointed out for the IL-15:IL-2Rβ interface. Bridging water molecules have previously been evidenced at the IL-2/IL-2Rβ interface by X-ray crystallography, through His133 and Tyr134 of IL-2Rβ interacting with Asp20 of IL-2 (ref: Wang, Science 2005). Despite the resolution of the quaternary IL-15:IL-15Rα:IL-2Rβ:γc complex, no discussion of the specific behavior of water molecules at the various interfaces was carried out by Garcia and coworkers (Nature Immun. 2012). Interestingly, for the IL-15:IL-2Rβ interface, His133 and Tyr134 are involved in the contacts with the highest occupancy (Asp8:Tyr134, 40%; Lys11:His133, 38%) (Figure 9). Indeed, bridging water molecules involving these residues are predicted by the MD simulations, both in the trimer and in the tetramer. However, it was surprising to see that the number of contacts at the IL-15:IL-2Rβ interface appears to be significantly higher (14 compared to 8 in the quaternary IL-15 complex) than at the IL-15:IL-15Rα interface. Although the number of contacts is higher for this interface, the same residues are involved, which points to a more dynamic character of the interface and could be in line with the lower affinity of the IL-15:IL-2Rβ complex compared to IL-15:IL-15Rα (Table X). Another region, not mentioned in the relevant experimental studies, is apparent from our MD simulations. This area involves other polar and/or charged residues of both chains (IL-15: Asp61, Thr62, Glu64; IL-2Rβ: Arg42, Ser69, Gln70).

Figure 9. Representation of key bridging water molecules at the IL-15/IL-2Rβ interface for the full system (left) and the trimeric system (right).

Lastly, both in terms of number of contacts and of occupancy percentage, the water molecules at the IL-15:γc interface appear much less conserved, in agreement with the weaker stability of this interface. However, some relevant positions should be mentioned. In particular, Gln108 interacts with a plethora of different residues located on the γc receptor, with relevant interactions with Pro207 (15 and 13% for the two respective systems) and Gln127 (12% for both systems). Furthermore, a water molecule mediates the interaction of Gln108 with Leu208 exclusively in the trimeric system (14%) and with Ser211 exclusively in the tetrameric system (11%), further suggesting an important role of this residue in the establishment of the IL-15:γc interface. These results corroborate the ones outlined in the previous section.

Figure 10. Representation of key bridging water molecules at the IL-15/γc interface for the full system (left) and the trimeric system (right).

Mutations

The analysis of the different wild-type multimeric species described in the previous section led to the identification of key residues in the establishment of the different interfaces. These residues can now be mutated in order to clarify the effect their absence has on the overall structure of IL-15. Based on previous knowledge of the IL-15 system and on the results obtained in the first part of this study, and in collaboration with the biologist partners of this project, eight residues were chosen: Asp8, Ser58, Asp61, Glu64, Asn65, Ile68, Leu69 and Asn72. The models were built according to the procedure detailed in the previous chapter and were then compared one by one to the wild-type model, based on structural descriptors (RMSD, RMSF, individual interactions) as well as on energetic components obtained through MM/GBSA. These results, which will potentially allow for the clear identification of hot-spots, as described in previous sections, are presented next.

D8S

Of the mutations studied throughout this work, all located at the interface with IL-2Rβ, Asp8 is the only residue which is part of helix A. This residue has been shown to be key to the activity of IL-15 (Pettit, 1997) and is conserved in IL-2 (analogous to Asp20).

(Figure with WT on the left and mut on the right)

As can be seen in Figure X, the wild-type residue was mutated to a serine. The interactions at this position are therefore expected to be disrupted, since the replacing residue (serine, polar and uncharged) differs in its properties from the residue being replaced (aspartate, negatively charged). Figure Y shows the RMSD of the IL-15 chain for the mutant model versus the wild-type one throughout the MD simulations.

Indeed, even if the differences between the two models are not significant, some particularities should be mentioned. Upon introduction of the mutation in the model, the RMSD values increase immediately, as the structure compensates for the shifts brought about by the mutation itself. This can be observed through the visible differences at the initial stages of the simulation, wherein the mutated model presents a higher RMSD than the wild-type one. At the halfway point of the simulation, however, the two models present similar RMSD values, with both structures equilibrating at similar deviations from their initial models. Taking the simulation as a whole, the average value is 1.63 Å for the mutated model vs 1.39 Å for the wild-type, indicating that both models have a remarkably low RMSD ceiling but show a slight difference. Hence, the mutation introduces a disorder in the system which is discernible through this analysis.
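
The average RMSD values compared here can be illustrated with a minimal sketch. It assumes that every frame has already been superposed on the reference structure and that coordinates are plain (x, y, z) tuples in Å; the function names and the toy coordinates are hypothetical and do not reproduce the analysis tools used in this work.

```python
import math


def rmsd(coords, ref):
    """RMSD between two pre-aligned coordinate lists of equal length."""
    if len(coords) != len(ref) or not ref:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
        for (x, y, z), (xr, yr, zr) in zip(coords, ref)
    )
    return math.sqrt(squared / len(ref))


def mean_rmsd(trajectory, ref):
    """Average of the per-frame RMSD values over a whole trajectory."""
    values = [rmsd(frame, ref) for frame in trajectory]
    return sum(values) / len(values)


# Toy example (hypothetical): two atoms, two frames.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference: RMSD 0
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],  # both atoms shifted by 2 Å: RMSD 2
]
average = mean_rmsd(trajectory, reference)
print(f"mean RMSD: {average:.2f}")
```

Because the average runs over the whole trajectory, an early transient such as the one described for the mutant raises the mean even when both models converge later.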

Moving towards the per-residue RMSF analysis, the picture painted by this comparison is similar, as expected. In Figure X, once again only for the residues in the IL-15 chain, the residues in the mutated model tend to have a very slightly higher RMSF value than in the wild-type; this tendency is broken only for the A-B and C-D loops, wherein the wild-type shows slightly higher values.
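
For reference, the per-residue RMSF is the square root of the time-averaged squared deviation of a position from its own mean position. A minimal sketch follows, assuming one representative atom per residue (for instance Cα) and frames that are already aligned; the function name and the toy data are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as frames of (x, y, z) tuples."""
    n_frames = len(trajectory)
    if n_frames == 0:
        raise ValueError("empty trajectory")
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared fluctuation around that mean position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(msf))
    return values


# Toy example (hypothetical): atom 0 oscillates by 1 Å around its mean, atom 1 is fixed.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
fluctuations = rmsf(frames)
print(fluctuations)
```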

In the previous section, the interactions established at the different interfaces were thoroughly analyzed. In this section, key interactions established at the IL-15/IL-2Rβ interface are compared, on a more local level, between the wild-type bound to its trimeric receptor (IL-15/IL-15Rα/IL-2Rβ/γc) and the mutated model in the same multimeric state. In this case, we are interested in discerning the effects of the mutated residue on its neighboring residues. Indeed, Figure X illustrates part of what happens throughout the entire simulation, with Ser8 in the mutated model establishing a very persistent hydrogen bond with His133 of IL-2Rβ (for 100% of the simulation); on the other hand, Asp8 intervenes to a lesser extent in the wild-type, playing a role in the interaction with Tyr134 instead. Furthermore, in the mutated model, the neighboring Ser7 also establishes a strong interaction with His133, as well as with Tyr134.

(Figure Asp8-Tyr134 side by side with Ser7-His133+Tyr134/Ser8-His133)

Taking into account the MM/GBSA energies, there is no marked overall difference between the wild-type and mutated models, as Figure X illustrates. However, the wild-type seems to destabilize in terms of energy towards the end of the simulation, whereas the mutated model maintains a somewhat stable value throughout. Indeed, the average values over the final 100 ns of the simulation are somewhat different, with -27.2 kcal/mol for the wild-type vs -32.5 kcal/mol for the mutated model. This would seem to indicate that the mutated model is more stable than the wild-type and, thus, that this mutation is a stabilizing one.
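
Restricting the average to the final 100 ns, as done for the values above, amounts to a simple window selection on the energy time series. The sketch below assumes per-frame energies in kcal/mol with their time stamps in ns; the function name and the toy numbers are hypothetical and are not data from the simulations.

```python
def mean_over_final_window(times_ns, energies, window_ns=100.0):
    """Average the energies of all frames within the last `window_ns` of the run."""
    if len(times_ns) != len(energies) or not times_ns:
        raise ValueError("times and energies must be non-empty and of equal length")
    start = max(times_ns) - window_ns
    selected = [e for t, e in zip(times_ns, energies) if t >= start]
    return sum(selected) / len(selected)


# Toy series (hypothetical numbers): one value every 50 ns over a 200 ns run.
times = [0.0, 50.0, 100.0, 150.0, 200.0]
energies = [-20.0, -24.0, -26.0, -28.0, -30.0]
final_mean = mean_over_final_window(times, energies)
print(f"mean over the final 100 ns: {final_mean:.1f} kcal/mol")
```

Averaging only the last part of the trajectory reduces the weight of the initial relaxation, which matters when one model drifts late in the run, as described for the wild-type.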

As far as the total number of contacts goes, both models present similar values: 21.3 (wild-type) vs 21.6 (mutated).

Taken as a whole, this ensemble of results allows some interesting conclusions to be drawn. On the one hand, there are no discernible differences in some of the quantities (RMSD, RMSF, number of contacts). On the other hand, the potential energy values are different; however, they seem to show the opposite tendency to the one expected. In fact, the pStat5 signaling results seen in Figure X show that the D8S mutation inhibits this pathway. It is therefore useful to recall the results described for the specific interactions at the interface: the mutation leads to a reorganization of the interface, with the specific interactions being different in the mutated model. This could be the key to explaining the lack of signaling, as the novel interactions could prevent it from occurring.

(Figure with biology results, values? Bars?)

S58K

The Ser58 residue on helix C of IL-15 is an interesting one. Indeed, it had not been described in the literature as a potential interacting residue (and thus as a hot-spot). Nevertheless, as mentioned in the previous section, the MD simulation of the trimeric receptor model revealed this residue as an interesting prospect in the establishment of the IL-15/IL-2Rβ interface, an unexpected finding given its position at the end of helix C and its greater solvent exposure. As such, a model was built in which this residue was replaced with a lysine.

(Figure with the position of Ser58)

When comparing the stability of this mutant and the wild-type model, it is readily apparent that both models present the same tendencies, with the wild-type presenting a slightly higher RMSD value than the mutant from the 100 ns point onwards, as can be seen in Figure X. Indeed, the average deviation for the mutated model throughout the simulation, 1.38 Å, is virtually the same as the aforementioned 1.39 Å for the wild-type.

Further confirming these structural similarities between the S58K mutant and the wild-type models, the RMSF values obtained for the two simulations can be seen in Figure X. Indeed, the comparison of this quantity in the two models leads to the same conclusion as the previous one: both models are very similar on a global structural level, as far as IL-15 itself is concerned. Similarly to what was observed for D8S, the A-B loop has higher fluctuations in the wild-type model; in the C-D loop they are virtually the same.

Moving towards the specific residue-residue interactions of the mutated residue and its neighbors, in the wild-type Ser58 establishes an important interaction with Ser69 of the IL-2Rβ receptor for 46.4 % of the latter half of the simulation, a fact which led us to pay special attention to this residue. Interestingly, in the mutated model, the Lys58 which replaced the serine residue forms a salt bridge with Asp68 of IL-2Rβ, a residue which neighbors Ser69, in this case for 46.5 % of the final 100 ns. It is to be expected that Lys58, being positively charged, would establish a salt bridge with a negatively charged residue. The interactions established between Asp61 and the β receptor in the wild-type will be detailed in the next section; however, to compare them with the S58K mutated model, they are briefly mentioned here: Asp61 also interacts with Ser69 of IL-2Rβ (for 52.8 % of the final 100 ns) and, furthermore, interacts strongly with Lys71 (for 87.2 % of the analyzed period). In the mutated model, Asp61 establishes an interaction with the main chain O atom of Gln70 in IL-2Rβ for 45.8 % of the time, as well as with the aforementioned Lys71 for 45.4 % of the time. These interactions are detailed in Figure X.
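
Interaction percentages such as those quoted above can be obtained by counting, over the analyzed part of the trajectory, the frames in which a geometric criterion is met. The sketch below is only an illustration: it assumes precomputed per-frame distances in Å, uses a 3.5 Å distance cutoff (a common hydrogen-bond criterion, chosen here as an assumption and not taken from this work) and analyzes the latter half of the frames; the function name and the toy distances are hypothetical.

```python
def contact_occupancy(distances, cutoff=3.5, start_fraction=0.5):
    """Percentage of analyzed frames in which the distance is within the cutoff.

    distances: per-frame distances in Å (assumed to be precomputed).
    cutoff: distance criterion in Å (assumed value).
    start_fraction: fraction of the trajectory skipped before the analysis;
        0.5 keeps only the latter half of the frames.
    """
    first = int(len(distances) * start_fraction)
    window = distances[first:]
    if not window:
        raise ValueError("no frames left to analyze")
    hits = sum(1 for d in window if d <= cutoff)
    return 100.0 * hits / len(window)


# Toy distances (hypothetical): 8 frames, of which the last 4 are analyzed.
distances = [5.1, 4.8, 4.0, 3.9, 3.2, 3.6, 3.1, 4.4]
occupancy = contact_occupancy(distances)
print(f"interaction present for {occupancy:.1f} % of the analyzed frames")
```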

(Figure of WT Ser58-Ser69, mut Lys58-Ser69, WT Asp61-Ser69 and Asp61-Lys71, mut Asp61-Gln70 and Asp61-Lys71)

When taking into account the potential binding energy of IL-15 to its β receptor in both models, the same tendency as previously mentioned for D8S is observed, wherein the mutated model seems to be more stabilized than the wild-type one. Indeed, this is discernible in Figure X and corresponds to an average value of -33.9 kcal/mol for S58K vs the aforementioned -27.2 kcal/mol for the wild-type model.

Furthermore, the average number of contacts throughout the simulation tells a similar story, with S58K presenting 22.8 contacts vs 21.3 for the wild-type. Taken together, the results for this mutant suggest that the mutation is a stabilizing one, with a general improvement in all analyzed quantities. This is further corroborated by the results obtained from experimental studies, where the mutation of Ser58 shows no effect in a trans-presentation trimeric receptor context; in a cis-presentation context, however, it does show a significant effect.

(Figure of bio results)

D61K

Moving forward on helix C, the next residue which establishes interactions with IL-2Rβ is Asp61. This residue is conserved in IL-2 (analogous to Asp84) and is thus expected to behave similarly and to interact in the same way with IL-2Rβ. As can be seen in Figure X, this residue neighbors the aforementioned Ser58, as well as Glu64, which will be detailed next. The results obtained for the mutation of Asp61 to a lysine residue are detailed in this section.

(Figure with wt and mut)

There is a clear destabilizing effect, when analyzing the RMSD of the mutant in comparison to the wild-type, stemming from the introduction of the mutated residue at position 61. This is all the more apparent in the earlier stages of the simulation, as seen in Figure X, where the RMSD values for the mutated model show a very high instability. This is somewhat mitigated in the latter portion of the simulation, where these values end up stabilizing at a value not much higher than for the wild-type. Notably, the average RMSD is 1.70 Å for the mutated model vs 1.39 Å for the wild-type. This value is not high when taken by itself, but the comparison highlights the destabilization induced by the mutation.

When taking into account the RMSF values, there are some interesting points to note. Although the tendency already described for the A-B loop, where the wild-type seems to consistently present a higher RMSF value than whichever mutant is being analyzed, is maintained, in this case the instability introduced by the mutation is clear when looking at helices A and C. For the residues in the 1-19 and 57-77 regions (respectively corresponding to the two helices), there is a clear increase in the values of this quantity, a sign that the IL-15/IL-2Rβ interface is destabilized. This is to be expected, as this residue seems to play a clear role in the establishment of the interface. This effect extends to the C-D loop (residues 78-80), where the mutant also shows slightly higher RMSF values.

Considering the individual interactions at the interface established by the residue concerned by this mutation and by neighboring residues, it is important to note that this residue appears to have significant interactions which are stable throughout the simulation. Indeed, for the wild-type model, as alluded to in the previous section, Asp61 establishes a very strong interaction with Lys71 on the IL-2Rβ side (for 87.2 % of the simulation) and also interacts strongly with Ser69 (for 52.8 % of the time); in contrast, the mutated Lys61 interacts with the same Ser69 residue only 14.3 % of the time, also forming minor interactions with Asp68 for 10.9 % of the analyzed period. Even for neighboring residues there is a significant loss of interactions, with only Ser58 interacting with Asp68 for 38.8 % of the simulation (an interaction which does not appear in the wild-type). These interactions can be seen in Figure X.

(Figure with ints described above)

The results obtained from the MM/GBSA analyses seem to corroborate the loss of interactions hinted at by the previous analyses. Indeed, Figure X clearly shows a significant increase in the energy of the mutant compared to the wild-type, which is only partly compensated towards the end of the simulation. This is reflected in the average values, with the D61K model having an average of -20.5 kcal/mol vs the already mentioned -27.2 kcal/mol for the wild-type.

The confidence in the conclusion that D61K is destabilizing the interface increases when one compares the total number of contacts, where the mutated model has 15.0 average contacts throughout the simulation, vs 21.3 for the wild-type. This further cements the destabilizing role of this mutation and the important role of this residue. This is corroborated by the experimental results in Figure X.

(Figure bio results)

E64K

Moving ever further into helix C of IL-15, the residue which neighbors Asp61 is Glu64. Although this residue was not explicitly known to be important in the establishment of the interface, its location seems to suggest so, as seen in Figure X. It is located in a buried position, already somewhat far from the solvent-accessible surface, and as such should play a role in the interface. In order to build this model, the Glu64 residue was mutated to a lysine, a very different residue in terms of charge and size.

(Figure with mutation)

The suspicion that this residue might play an important role in the stabilization of the interface, and that its replacement might lead to higher instability, does not seem to be corroborated by the RMSD results. Indeed, in Figure X, the profiles of both models are fairly similar, with either model having the higher RMSD value at different points in the simulation. Towards the end of the simulation, the mutant has a slightly higher RMSD value. The average values for this quantity are 1.44 Å for the mutant vs 1.39 Å for the wild-type model, a difference which is too small to indicate instability.

When taking into account the RMSF values, the picture is somewhat clearer. There are, in fact, zones where the mutant shows higher fluctuation when compared to the wild-type. Remarkably, an attentive look at the two profiles reveals a high degree of similarity with the previous comparison for D61K; the regions wherein the mutant has higher fluctuation are exactly the same. The wild-type once more has higher fluctuation for the A-B loop, whereas the two helices interacting with IL-2Rβ (helix A, residues 1-19, and helix C, residues 57-77) show a higher degree of variability in the mutant. This seems to indicate that this mutation might be slightly destabilizing.

An analysis of the contacts at the interface shows that the Glu64 residue is key in the establishment of the interface; indeed, it interacts with Arg42 for 65.1 % of the time in the wild-type. This percentage contrasts with the 38.2 % observed for the interaction of the mutated Lys64 with Arg42; furthermore, this interaction should be much weaker, since both residues carry a positive charge. In terms of neighboring residues, both Asn65 (which will be detailed in the next section) and the aforementioned Asp61 maintain very strong interactions throughout, and do not seem to contribute to the instability.

(Figure showing the comparison)

Where the potential binding energy is concerned, there are notable differences between the mutated and the wild-type models. Indeed, even if the energy seems to be stable for the E64K model, it stabilizes at a higher energy value when compared to the wild-type. They converge and start tending towards the same value in the latter half of the simulation; however, towards the ending point, the wild-type model stabilizes further, thus supporting the slight instability observed in the previous analyses. Indeed, the average value for the potential binding energy is -21.7 kcal/mol for the mutant, compared to the already mentioned -27.2 kcal/mol for the wild-type.

Lastly, the average number of contacts is 20.6 for E64K vs 21.3 for the wild-type, which is not significantly different. The experimental results show a significantly lower pStat5 activity for this mutant, and our results seem to indicate that this effect is observable in the computational model, especially when taking into account the specific residue-pair interactions.

(Figure bio results)

N65K

Asn65 sits in the center of helix C and has a conserved analogous residue in IL-2, Asn88. It is already known, from previous mutagenesis studies, to be a key contact residue in IL-15, and it has been identified as making contacts with Arg42, Gln70 and Tyr134 on the IL-2Rβ side. The mutated model was built by replacing this residue with a lysine, which is positively charged and larger in size, as seen in Figure X.

(Figure with wt and mut)

When comparing the RMSD values, it is clear that the introduction of the mutation induces a quick conformational change from the beginning of the simulation. This is, however, smoothed out from the 100 ns point towards the end of the simulation. Nevertheless, it impacts the average RMSD value, which is 1.55 Å for the mutated model (1.39 Å for the wild-type). This is a significant difference, which supports a first hypothesis that this model undergoes significant destabilization.

Interestingly, the comparison of the RMSF values paints a similar picture to the analyses of the previous mutations, wherein the only region with a higher RMSF in the wild-type is the one corresponding to the A-B loop. Otherwise, both helices involved in the IL-15/IL-2Rβ interface and the C-D loop show a higher RMSF value for the mutated model, as can be seen in Figure X. This fact is in line with the hypothesized instability brought about by the N65K mutation.

The individual interactions established by the Asn65 residue have been briefly mentioned in this section. As expected, in the wild-type it interacts very strongly with Arg42 and Gln70 for the whole simulation, but also, interestingly, with Thr73 for 59.4 % of the simulation. In contrast, the most prevalent interaction in the mutated model is with the main chain O atom of Arg41 for 46.3 % of the simulation, with the expected residues showing much lower percentages (17.5 % for the interaction with Thr73 and 30.5 % for the interaction with Arg42). It is quite evident that there is a significant loss of stability in the interface, with most of the contacts being weakened, if not lost entirely. Figure X illustrates this fact.

(Figure with interactions)

It is already clear that N65K is a very destructive mutation in terms of the coherence of the IL-15/IL-2Rβ interface. In order to further investigate this, the potential binding energy was once again evaluated. As Figure X shows, the MM/GBSA values are consistently much higher (less favorable) throughout the simulation. It is remarkable to note that, at certain points in the simulation, the energy tends to zero. The average values make this abundantly clear, with the mutant having -12.0 kcal/mol vs the -27.2 kcal/mol of the wild-type. This is by far the largest difference observed so far among all the mutants. The fact that Asn65 establishes so many different interactions with IL-2Rβ, together with its central location in helix C, means that any destabilization at this residue will lead to a loss in the integrity of the interface.

Furthermore, the total number of contacts follows the same trends observed for the quantities analyzed up until this point: the N65K model has an average of 8.4 contacts, vs the 21.3 observed on the wild-type model, which illustrates rather blatantly the collapse of the interface. Indeed, this is in agreement with the experimental data, as can be seen on Figure X.

(Figure bio)

I68K

The next residue interacting with IL-2Rβ on helix C is Ile68. This residue is analogous to Val91 in IL-2 and forms interactions with Thr73 and Val75 of the receptor. In order to build the model, the hydrophobic isoleucine residue was replaced by a positively charged lysine, as can be seen in Figure X.

(Figure with wt and mut)

Turning towards the RMSD analysis, one can see that the values for the mutated model are consistently slightly higher than those of the wild-type structure. The mutation seems to induce some variation from the starting structure, albeit not to a significant degree. This difference leads to an average of 1.62 Å for the mutant vs 1.39 Å for the wild-type.

Taking into account the fluctuations observed through the analysis of the RMSF values, the residue-by-residue increase of the values for the mutant is more widespread than in previous cases. Instead of the previously observed restriction to the residues involved in the IL-15/IL-2Rβ interface, in this case most of the residues show higher fluctuations in the mutant model. This is the case for the aforementioned helices A and C, which form the interface with the receptor, but also, to a lesser extent, for helices B and D. It therefore seems that this mutation has a global rather than a local effect.

As far as the individual contacts are concerned, as has already been mentioned, Ile68 forms interactions with Val75, for 63.8 % of the simulation. It also interacts, to a lesser extent (28.6 %), with Arg41. On the other hand, the Lys68 residue in the mutated model actually forms a very stable salt bridge with Asp76, for 87.6 % of the simulation, and interacts with Val75 for 56.6 % of the simulation. Figure X highlights these interactions. The contacts established thus differ in nature, owing to the ability of lysine to establish salt bridges; however, there does not seem to be a loss of contacts stemming from this mutation.

(Figure with interactions)

In the analysis of the potential binding energy, there is no discernible difference between the wild-type and the mutated model. At some points in the simulation the former possesses a higher MM/GBSA energy value, whereas at others it is the latter which has a higher energy. This means that one cannot argue, based on energetic parameters, that the mutation negatively influences the stability of the interface. Indeed, through this analysis it does not seem to have any effect, with the average value being -31.9 kcal/mol for the mutant vs -27.2 kcal/mol for the wild-type.

Furthermore, the analysis of the number of contacts follows the same trend, wherein there is actually a slightly larger number of contacts, on average, for the mutated model (23.4) when compared to the wild-type (21.3). This is not consistent with the experimental data, wherein I68K induces an inhibition in the signaling.

(Figure with bio data)

L69R

In IL-15, the Leu69 residue plays a similar role to Ile92 in IL-2, being a hydrophobic residue which forms van der Waals interactions, mainly with Thr73 and Val75. In this case, the leucine residue was substituted by an arginine, a bulky, positively charged residue which is very different from the original hydrophobic one, as seen in Figure X.

(Figure wt mut)

Notably, it seems that this mutation induces a very significant degree of instability. Indeed, even though at the beginning of the simulation both models seem comparable in terms of RMSD, at around the 50 ns point there is a sudden shift in the mutated model, whose deviations increase abruptly. This is somewhat sustained throughout the simulation, with a peak at around 150 ns and a final destabilization occurring from the 175 ns point up until the end of the simulation. This result is highly divergent from what was seen previously for the other mutations, including N65K, where the mutation was deemed to be very destructive but its effect was not entirely apparent in the RMSD analysis. In this case, the average for the mutated model jumps to 1.70 Å, compared to the 1.39 Å of the wild-type.

As a matter of fact, the RMSF analysis clarifies why this staggering increase happens. It is interesting to note that the introduction of this mutation does not have a significant effect on most of the regions comprising the helices interacting with IL-2Rβ; instead, it destabilizes the end of helix C, also impacting the C-D loop. The mutation is, indeed, located at the end of the helix closest to the C-D loop, which can explain the observed increase.

On the topic of individual contacts, Leu69 interacts very strongly with Thr73 (73.4 %) and with Val75 (94.2 %), as previously mentioned. Arg69, in the mutated model, also interacts very strongly with Thr73 (82.6 %), while also being able to interact with Tyr134 for 70.0 % of the simulation. However, its interaction with Val75 is weakened compared to the wild-type model, being present only 46.4 % of the time. These interactions can be seen in Figure X. This more “zoomed in” picture of the interactions and their differences may explain the high RMSD values obtained for this region in the mutated model. A disruption of the natural interactions established by the residue at position 69, even if it still establishes strong interactions when mutated, leads to a rearrangement of the neighboring residues. Thus, this highly destabilizes the interface.

(Figure with interactions)

Moving towards the energetic analysis, the comparison of the potential binding energy makes it evident that the L69R mutation induces a destabilization at the energy level. This destabilization does not necessarily entail a higher overall MM/GBSA value for the mutant; instead, it is noticeable from the almost constant increases and decreases in energy observed. Indeed, the average energy value is -32.7 kcal/mol for the mutant vs -27.2 kcal/mol for the wild-type, which would hint that this mutation is stabilizing. However, when this value is taken together with the previous RMSD/RMSF and individual contacts analyses, this does not seem to be the case.

Globally, the average number of contacts for the L69R mutated model is 20.4, which is not significantly lower than the 21.3 observed for the wild-type. As such, no conclusion can be drawn from this value. However, when the results are taken as a whole, we can confidently say that they are in agreement with the experimental results seen in Figure X.

(Figure bio)

N72K

The final residue interacting in a significant way with IL-2Rβ on helix C is Asn72, near the final turn of the helix. This residue does not possess an analog in IL-2 and, in fact, has not hitherto been described as having a potential importance in this interface. The model was built by replacing the asparagine residue with a lysine, similarly to the N65K model, as can be seen in Figure X.

(Figure wt mut)

Looking at the results obtained for the RMSD, it is evident that the introduction of this mutation leads to an immediate destabilization. However, after this large initial increase the RMSD value remains stable throughout the simulation, and towards the end it is not so dissimilar to the value that the wild-type has reached. Nevertheless, because it remains so stably higher than the wild-type, the difference is apparent on average: 1.72 Å vs 1.39 Å for the wild-type.

The RMSF analysis reveals a similar profile to that observed for some of the previous mutations, with the residues on helix C presenting a discernibly higher value than in the wild-type. There is also some difference at the level of helix A, albeit to a lesser extent. These data are not conclusive as to the stability, or lack thereof, of this mutated model.

As such, it is necessary to look in detail at the interactions established by the residue at position 72. In the wild-type, the Asn72 residue interacts throughout the whole simulation with Val75 of IL-2Rβ, a residue which already appeared previously for Ile68 and Leu69. Interestingly, a novel interaction appears in the mutant, wherein the Lys72 residue establishes a salt bridge with Asp40 (a residue which had not, hitherto, been mentioned) for 91.1 % of the simulation. Furthermore, in the mutated model, Lys72 interacts with the main chain O atom of Val75 for 81.6 % of the simulation. These interactions can be seen in Figure X.

(Figure wt mut)

The MM/GBSA analysis paints an interesting picture. The mutated model starts in a similar vein to the wild-type, then increases in energy and seems as though it could evolve towards a higher energy point. However, at the 80 ns point the energy decreases abruptly, becoming much lower than that of the wild-type; it then increases and ends up stabilizing at a somewhat lower value. Indeed, the average values reflect this, with the N72K model having a much lower value of -38.8 kcal/mol vs -27.2 kcal/mol for the wild-type. This mutation seems to be highly stabilizing.

The total number of contacts follows the same trend, with the mutated model averaging 24.8 contacts against 21.3 for the wild-type. As mentioned, the results taken as a whole seem to indicate that this mutation stabilizes the protein. This is corroborated by the experimental results seen in Figure X.

(figure bio)

Virtual Screening

Having analyzed the pool of potential hot-spots for the establishment of the IL-15/IL-2Rβ interface, which clarified which residues intervene most directly in it, the choice of the residues targeted in this part of the work (alluded to in the Methods section) should now become clearer. If we take as criteria that the residues chosen should:

- Be hitherto unexplored;
- Have been shown, in the MD simulation studies, to establish strong interactions in the wild-type and/or to have an effect when absent (through the mutation studies);
- Not have significant overlap with IL-2;

the targetable residue pool becomes narrower. A similar work had already been performed by our collaborators on the Lys65/Leu69/Asp8 region, as seen in Figure X.

(Figure with Lys65/Leu69/Asp8 that agnes used in her paper)

This work showed promising results, with a compound being identified with dual IL-15 + IL-2 activity. Aiming to selectively disrupt IL-15 signaling whilst avoiding inhibition of IL-2, our work focused on a portion of helix A, as well as on the opposite side of helix C. As such, the residues chosen were Ser7 and Lys11 on helix A, which showed great potential in the wild-type MD simulations, and Ser58 and Asp61 on helix C, since they not only appeared to be important in the wild-type MD simulations, but were also shown in the mutation studies to be key to the PPI between the protein and its receptor. Of these four residues, only Asp61 has an equivalent residue on IL-2 (Asp84). The analogues of the other three residues are quite different in terms of properties, with Ser7 overlapping with Leu19 on IL-2, Lys11 sharing a position with Met23 on IL-2, and Ser58 corresponding to Arg81.

(Figure with region on IL-15 and corresponding one on IL-2)

Pharmacophore Filtering

Virtual Screening

After filtering the large library of compounds according to the previously defined criteria, the final pool totaled one hundred compounds, detailed in Table X. The results presented herein already take into account the experimental results obtained by our collaborators, which are highlighted in yellow in the following table. As such, the analysis will focus on the compounds which showed in vitro activity towards IL-15.

Index  Code  IDNUMBER       Supplier            Ligand Efficiency  Rank IL-15  Rank IL-2  Delta  Normalized
68     A1    LAS 51216378   ASINEX              -6.95297           22          145        123    60.40609
37     B1    LAS 51217045   ASINEX              -6.90758           25          146        121    60.23689
486    C11   F2509-0367     Life_Chemical       -6.57786           40          472        432    86.54823
485    D11   F2509-0393     Life_Chemical       -6.22041           67          476        409    84.60237
169    C4    9242137        Chembridge_dvs_exp  -6.18161           69          339        270    72.84264
319    G12   S13910         Maybridge           -6.12481           2           96         94     64.73354
487    E11   F1882-1468     Life_Chemical       -6.01036           80          357        277    73.43485
398    D9    Z95851300      Enamine             -5.77878           101         274        173    64.63621
183    D4    9213127        Chembridge_dvs_exp  -5.77659           102         231        129    60.9137
81     C1    BDC 23190654   ASINEX              -5.75625           106         301        195    66.49746
397    E9    Z97750328      Enamine             -5.72256           110         507        397    83.58714
184    E4    9262406        Chembridge_dvs_exp  -5.71565           113         345        232    69.62775
484    F11   F2509-0380     Life_Chemical       -5.6772            116         506        390    82.99493
153    F4    7644034        Chembridge_dvs_exp  -5.65457           120         249        129    60.9137
404    F9    Z1533046752    Enamine             -5.61124           128         395        267    72.58883
151    E2    72092261       Chembridge_dvs_cl   -5.58801           132         355        223    68.86633
175    G4    9275823        Chembridge_dvs_exp  -5.55829           137         425        288    74.36548
555    G11   F3406-5576     Life_Chemical       -5.53874           139         284        145    62.26734
210    F5    V005-4651      ChemDiv_Eccentrix   -5.50414           145         334        189    65.98985
189    H4    9209786        Chembridge_dvs_exp  -5.50368           146         419        273    73.09644
149    F2    19254365       Chembridge_dvs_cl   -5.4541            149         514        365    80.87987
399    G9    Z225665320     Enamine             -5.44817           150         332        182    65.39763
173    G8    G868-1219      ChemDiv_shape       -5.44567           7           71         64     60.03135
187    A5    9204150        Chembridge_dvs_exp  -5.4126            153         375        222    68.78172
339    H8    L482-0976      ChemDiv_shape       -5.39806           155         452        297    75.12691
174    B5    7973591        Chembridge_dvs_exp  -5.39297           157         378        221    68.69712
392    H9    Z424954482     Enamine             -5.38295           159         316        157    63.28257
194    C5    5715448        Chembridge_dvs_exp  -5.37628           161         397        236    69.96616
92     G2    16466623       Chembridge_dvs_cl   -5.36867           163         -          -163   100
140    H2    10051678       Chembridge_dvs_cl   -5.35027           167         287        120    60.15228
97     A3    40697304       Chembridge_dvs_cl   -5.33619           171         392        221    68.69712
42     D1    BDH 32367667   ASINEX              -5.28546           10          139        129    70.21944
471    A10   Z571215614     Enamine             -5.24002           178         313        135    61.42132
532    H11   F2323-1320     Life_Chemical       -5.20087           182         324        142    62.01353
292    C7    V019-7375      ChemDiv_PPI         -5.17993           184         460        276    73.35025
174    A9    D349-0812      ChemDiv_shape       -5.17736           14          78         64     60.03135
52     E1    BDD 25871803   ASINEX              -5.16258           190         509        319    76.98816
438    B10   Z302583658     Enamine             -5.15394           193         404        211    67.8511
224    G5    M459-1089      ChemDiv_Eccentrix   -5.13945           197         325        128    60.8291
310    D7    V014-1655      ChemDiv_PPI         -5.13756           198         487        289    74.45008
121    B3    68826761       Chembridge_dvs_cl   -5.10145           208         328        120    60.15228
69     F1    BDH 34021132   ASINEX              -5.08982           210         540        330    77.91879
469    C10   Z108724418     Enamine             -5.05844           218         341        123    60.40609
279    E7    P218-3171      ChemDiv_PPI         -5.03738           220         524        304    75.71912
112    C3    61293529       Chembridge_dvs_cl   -5.03703           221         380        159    63.45178
147    D3    14556752       Chembridge_dvs_cl   -5.03183           223         429        206    67.42809
106    E3    38618215       Chembridge_dvs_cl   -5.00402           227         554        327    77.66498
105    F3    12185662       Chembridge_dvs_cl   -5.00332           228         367        139    61.75973
236    H5    K284-3275      ChemDiv_Eccentrix   -4.98303           235         464        229    69.37394
27     G1    BDH 32346320   ASINEX              -4.90721           21          155        134    71.00314
381    D10   Z750768202     Enamine             -4.88352           252         451        199    66.83587
574    A12   F6151-0029     Life_Chemical       -4.87775           253         446        193    66.32825
123    G3    39150615       Chembridge_dvs_cl   -4.87693           254         465        211    67.8511
193    D5    9078532        Chembridge_dvs_exp  -4.84259           262         448        186    65.73604
150    H3    24325467       Chembridge_dvs_cl   -4.83275           267         500        233    69.71235
470    E10   Z275056332     Enamine             -4.83008           269         415        146    62.35194
316    H12   KM07753        Maybridge           -4.81684           26          100        74     61.59875
246    F10   Z971078160     Enamine             -4.80613           27          92         65     60.18809
544    B12   F3382-6798     Life_Chemical       -4.77206           280         457        177    64.97462
209    B9    L482-1792      ChemDiv_shape       -4.7543            31          158        127    69.90595
312    C12   F1264-2375     Life_Chemical       -4.74988           32          122        90     64.10658
228    A6    K784-7860      ChemDiv_Eccentrix   -4.73148           287         497        210    67.76649
290    F7    T652-0127      ChemDiv_PPI         -4.70001           293         420        127    60.7445
582    D12   F2724-2219     Life_Chemical       -4.69277           294         566        272    73.01184
254    G10   Z385163426     Enamine             -4.6593            37          104        67     60.50157
150    G7    S595-0072      ChemDiv_PPI         -4.65325           38          222        184    78.84013
118    H7    S427-0862      ChemDiv_PPI         -4.62541           41          206        165    75.86207
238    B6    4896-2330      ChemDiv_Eccentrix   -4.6161            314         545        231    69.54314
458    H10   Z65715641      Enamine             -4.59004           323         492        169    64.2978
86     H1    BDH 33903876   ASINEX              -4.57139           327         466        139    61.75973
217    C6    K784-7866      ChemDiv_Eccentrix   -4.56965           328         477        149    62.60575
518    E12   F1065-0411     Life_Chemical       -4.5309            337         518        181    65.31303
222    D6    K279-1435      ChemDiv_Eccentrix   -4.5284            338         505        167    64.12859
107    E6    P349-0895      ChemDiv_Eccentrix   -4.481             55          191        136    71.31661
96     F6    K284-2064      ChemDiv_Eccentrix   -4.47688           56          121        65     60.18809
87     A4    53922729       Chembridge_dvs_cl   -4.45769           355         478        123    60.40609
35     A2    BDH 33516187   ASINEX              -4.39838           65          311        246    88.55799
317    C9    D400-2467      ChemDiv_shape       -4.38067           370         527        157    63.28257
13     B2    BDF 25425120   ASINEX              -4.37634           373         559        186    65.73604
88     B4    78468186       Chembridge_dvs_cl   -4.36574           376         515        139    61.75973
152    A8    S510-1312      ChemDiv_PPI         -4.36418           70          287        217    84.01254
234    G6    V007-5804      ChemDiv_Eccentrix   -4.35426           380         529        149    62.60575
122    B8    S558-0321      ChemDiv_PPI         -4.34209           71          199        128    70.0627
147    C8    S427-0838      ChemDiv_PPI         -4.31083           74          239        165    75.86207
95     H6    E613-0049      ChemDiv_Eccentrix   -4.28977           78          150        72     61.28527
23     C2    BDH 32356746   ASINEX              -4.28856           79          192        113    67.7116
123    D8    S558-0248      ChemDiv_PPI         -4.27571           81          187        106    66.61442
146    E8    S510-0843      ChemDiv_PPI         -4.25076           82          285        203    81.81818
28     D2    LAS 51217227   ASINEX              -4.23483           84          202        118    68.4953
276    F12   F2507-0345     Life_Chemical       -4.22448           86          168        82     62.85266
105    A7    3118-0063      ChemDiv_Eccentrix   -4.22258           87          252        165    75.86207
115    F8    S586-1251      ChemDiv_PPI         -4.20414           90          163        73     61.44201
465    A11   Z14009973      Enamine             -4.18813           419         561        142    62.01353
196    E5    9210164        Chembridge_dvs_exp  -4.17787           422         550        128    60.8291
263    B11   Z1014964394    Enamine             -4.13059           98          218        120    68.80878
101    B7    E613-0441      ChemDiv_Eccentrix   -4.12898           99          197        98     65.3605
592    A1B   HTS12253       Maybridge           -3.64843           522         -          1182   100
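The way the Delta and Normalized columns are obtained is not spelled out in this section. The tabulated values are, however, consistent with Delta being the rank on IL-2 minus the rank on IL-15, and with Normalized being a linear rescaling of Delta to a 0-100 scale over the largest possible rank difference in a pool of N ligands. The sketch below encodes this inferred relationship; it is an assumption reconstructed from the numbers, the function names are hypothetical, and the pool sizes used in the example (592 and 320) are those quoted in the Conclusion for the crystallographic and MD-derived structures.

```python
def rank_delta(rank_il15, rank_il2):
    """Rank difference: positive when the ligand ranks better on IL-15."""
    return rank_il2 - rank_il15


def normalized_delta(delta, pool_size):
    """Rescale a rank difference to 0-100 (inferred, not stated in the text).

    The extreme differences in a pool of `pool_size` ranked ligands are
    -(pool_size - 1) and +(pool_size - 1); 50 means no difference.
    """
    span = pool_size - 1
    return 100.0 * (delta + span) / (2 * span)
```

With these definitions, compound A1 (ranks 22 and 145, pool of 592) gives a value of about 60.41, and compound G12 (ranks 2 and 96, pool of 320) gives about 64.73, matching the table to roughly four decimal places. The two rows without an IL-2 rank do not follow this relationship.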

Out of the 97 compounds selected to be studied in vitro (through the pharmacophore filtering, followed by the virtual screening protocol itself as described in the Methods section, and subsequent criteria based on the binding energy of the docked compounds), 24 showed inhibitory activity towards IL-15 in vitro, i.e. roughly one in four, which is a rather impressive result. Of these 24, we selected the top 10 compounds based on Ligand Efficiency, in order to detail the interactions established by each with the IL-15 and IL-2 proteins. This allows a further clarification of the binding modes of the potential IL-15 inhibitors to the protein itself.

A1

Compound A1 was provided by the ASINEX database and has MW = 467.64. Its structure can be seen in Figure X. In the docking protocol, it bound to IL-15 with a potent LE = -6.95, being ranked 22nd in terms of this quantity. With regards to IL-2, it was ranked poorly, in 145th, which suggests that it would bind more strongly to IL-15.

Considering the specific interactions established by this compound, Figure X shows that it targets some of the hot-spots mentioned in previous sections. It establishes a carbon hydrogen bond between the carboxyl group in Asp8 and the hexane in A1, as well as a conventional hydrogen bond between the Lys11 amine group and the carbonyl group in A1. The last of the aforementioned hot-spots involved is Asp61, whose carboxyl group forms a hydrogen bond with the amine in A1. The compound also interacts significantly with Asn4, through its sulfonyl group, with Thr62 (the hydroxyl group in the residue with the alkyl group in the compound) and with Leu69, establishing an alkyl interaction with its side-chain.

Contrastingly, it seems to interact less strongly with IL-2, only establishing Pi-sulfur interactions between the sulfonyl group in A1 and the side-chain of His16, as well as Pi-alkyl interactions and a hydrogen bond with the same residue; Asp84 (equivalent to Asp61 on IL-15) establishes a carbon hydrogen bond with the alkyl functions in the compound, whereas Val91 and Leu19 only establish alkyl interactions. All these interactions are detailed in Figure X.

Notably, the difference in scope of interactions, both in number and in strength, seems to correlate directly with the differences in binding energy observed for the two proteins. In the pStat5 studies, however, the compound behaves similarly towards the two proteins, as can be seen in Figure X. Indeed, it shows ED50 values of 12.9 ± 1.2 µM and 13.4 ± 1.2 µM for IL-15 and IL-2, respectively. As such, this compound is not specific to IL-15.

(Figure: compound PIRAMID3 - A1; pSTAT5 response (%) as a function of compound concentration (µM). IL-15E53K: ED50 = 12.9 µM; IL-2: ED50 = 13.4 µM; GM-CSF: ED50 > 100 µM)

B1

Moving on to compound B1, which comes from the same database as the previous one (ASINEX), it presents a somewhat higher molecular weight (527.14). It is rather similar in structure to the previous compound, with the sulfonyl group neighboring the two pyridine groups, as seen in Figure X. A fact to keep in mind is the presence of a thiazole group. This molecule shows an LE = -6.91, similar to the previous one, with similar rankings in the docking to IL-15 and IL-2 (25th and 146th, respectively). Thus, results similar to those of the previous compound are to be expected.

Consider now the specific interactions established between this compound and the targeted area on both proteins. As in the previous example, it is possible to clearly identify some of the mentioned hot-spots for IL-15. Asp8 establishes a carbon hydrogen bond through its carboxyl group; Lys11, once again, participates in a hydrogen bond with the carbonyl group in B1, and Ser58 makes an appearance here, forming a carbon hydrogen bond with the same group. Similarly to what was previously observed, Asn4 establishes a hydrogen bond with the sulfonyl group on B1, whereas Leu69 establishes an alkyl interaction with the alkyl group in the compound. Finally, Thr62 also interacts through a carbon hydrogen bond, similarly to the previous case.

In the case of IL-2, several residues interact at positions analogous to those of IL-15. Leu19 and Met23 (analogous to Ser7 and Lys11 on IL-15, respectively) establish Pi-alkyl interactions with the pyridine end of B1; Asp20 (Asp8 on IL-15) forms a hydrogen bond with an amine group, while Asp84 (Asp61 on IL-15) only forms a carbon hydrogen bond with the pyrimidine region, similar to the previous compound. Asn88 forms a conventional hydrogen bond with the sulfonyl group, as evidenced in Figure X.

In the case of molecule B1, the gap between the interactions established with IL-15 and IL-2 is not as evident. It seems that IL-2 establishes significant interactions with the compound, even if the LE ranking does not reflect this. The pStat5 assay confirms this lack of difference, similarly to the previous case, with ED50 values of 15.1 ± 2.4 µM and 19.9 ± 7.1 µM for IL-15 and IL-2, respectively, as seen in Figure X.

(Figure: compound PIRAMID3 - B1; pSTAT5 response (%) as a function of compound concentration (µM). IL-15E53K: ED50 = 15.1 µM; IL-2: ED50 = 19.9 µM; GM-CSF)

C11

The third compound considered comes from Life Chemical and has MW = 442.49. Structurally, it is somewhat different from the previous two compounds, with a higher benzene content. Nevertheless, it shares some similarities, such as the presence of a sulfonyl group and a pyridazine. It is not surprising, hence, that it ranked highly in its docking to IL-15 (40th) and poorly to IL-2 (472nd), with an LE = -6.58.

The relevant individual interactions for IL-15 include Pi-cation bonds between Lys11 and the benzene and a conventional hydrogen bond between this residue and the sulfonyl group; Asp8 establishes Pi-anion interactions with the pyridine part of the molecule, while Ala57 (hitherto not shown interacting) forms an amide-Pi stacking with C11.

Comparing this with IL-2, Asn88 (comparable to Asn65 in IL-15) establishes a hydrogen bond with the sulfonyl group in C11, whereas Asp84 establishes one with the neighboring amine group. Asp20 establishes a carbon hydrogen bond, with Met23 forming a Pi-alkyl bond with this compound. All of these interactions are shown in Figure X.

Being a less “extended” molecule, C11 seems to establish a lower number of interactions with both proteins. As mentioned, the LE value and corresponding rankings are, in fact, slightly weaker. Taking the experimental results into consideration once again, the ED50 values are 41.7 ± 9.6 µM and 60.0 ± 16.1 µM for IL-15 and IL-2, respectively, as shown by the curves in Figure X. These values are, yet again, not significantly different, indicating that we are once more in the presence of a non-specific ligand.

(Figure: compound PIRAMID3 - C11; pSTAT5 response (%) as a function of compound concentration (µM). IL-15E53K: ED50 = 41.7 µM; IL-2: ED50 = 60.0 µM; GM-CSF)

E9

Compound E9 originates from the Enamine database, from which no compound had appeared up until this point. It has a molecular weight of 429.92 and contains a thiazole group, as already seen for compound B1.

In terms of the individual interactions shown in Figure X, IL-15 is characterized by an abundance of conventional hydrogen bonds. Indeed, Lys11, Ala57, Asp61 and Thr62 all interact in this way with E9. In the case of Ala57, it is through the oxygen atom on its main chain, interacting with the carboxamide group on E9. Asp61 also interacts with this same group, whereas the two remaining residues (Lys11 and Thr62) interact with both this group and other carboxyl groups. Lys11 also participates in a Pi-alkyl interaction with the thiazole group. Lastly, Asp8 extensively interacts through carbon hydrogen bonds, similarly to Ser58.

IL-2 interacts comparatively less with this compound. Leu19 establishes a Pi-alkyl interaction with the benzene group, whereas Leu85 forms an alkyl interaction with the piperidine group. Asp84 forms a Pi-anion interaction with the other benzene group, as well as carbon hydrogen bonds through its main-chain oxygen atom.

Indeed, it seems once again that IL-15 interacts more robustly with compound E9 than IL-2 does, a fact which is reflected in the relative ranking the compound occupies on both proteins. However, when taking into account the experimental results shown in Figure X, the picture is rather different. Following the trend observed so far, both proteins respond similarly to the presence of this compound, with ED50 values of 10.8 ± 2.1 µM vs 10.4 ± 0.1 µM, which indicates that this molecule is not specific.

(Figure: compound PIRAMID3 - E9; pSTAT5 response (%) as a function of compound concentration (µM). IL-15E53K: ED50 = 10.8 µM; IL-2: ED50 = 10.4 µM; GM-CSF: ED50 > 100 µM)

F11

Molecule F11 also comes from the Life Chemical database, similarly to C11. It has a molecular weight similar to that of the previously analyzed molecule, at 428.47. Interestingly, the only factor that distinguishes this molecule from the previously mentioned C11 is the absence, in this case, of a CH3 group bound to the benzene which neighbors the sulfonyl group. It could therefore be expected that this compound would behave very similarly to C11. However, this small difference leads to a non-negligible difference in LE, with this compound possessing -5.68 (vs -6.58). This translates to it being ranked 116th (vs 40th) on IL-15 and 506th (vs 472nd) on IL-2. Indeed, this compound seems to perform slightly worse than its close analogue.

Comparing the individual interactions, it is interesting to note that, for IL-15, this compound seems to dock in such a way that it interacts with more residues than C11. Indeed, the Pi-anion interaction of Asp8, the conventional hydrogen bond of Lys11 and the Pi-alkyl interaction of Ala57 are maintained from the other compound; however, there is a novel interaction established by Asn4, as well as an extra Pi-alkyl interaction provided by Leu69.

In the case of IL-2, the picture is reversed. The F11 compound seems to establish fewer interactions with this protein than the C11 molecule, with the carbon hydrogen bond with Asp20 and the conventional hydrogen bond with Asp84 being lost, due to the different binding poses of the two molecules. All the interactions are detailed in Figure X.

In the case of this molecule, for IL-15 the number of interactions established runs contrary to the relative LE value: the number of interactions increases, whereas the LE value is higher (less negative) for this compound, which hints at a worse binding. In the case of IL-2, the number of interactions follows the expected tendency, as the compound establishes fewer interactions and has a worse LE value. The ED50 values are 53.9 ± 2.3 µM and 61.2 ± 7.0 µM for IL-15 and IL-2, respectively, higher than the ones observed for C11 (41.7 µM and 60.0 µM), most clearly for IL-15. This was predicted by the calculated LE value and provides a hint that this value is a better predictor of the actual ED50 than a structural inference might imply. However, as can be seen in Figure X, this also means that this molecule is not specific for IL-15.
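To summarize the lack of specificity across the five compounds discussed, one convenient figure is the ratio between the ED50 measured for IL-2 and the one measured for IL-15, where a value close to 1 indicates no preference. The sketch below uses the ED50 values reported in this section (in µM); the ratio itself and the function name are illustrative additions, not a quantity computed in the original analysis, and the experimental uncertainties are ignored.

```python
# ED50 values in µM, as (IL-15, IL-2), taken from the pStat5 results above
ED50_UM = {
    "A1": (12.9, 13.4),
    "B1": (15.1, 19.9),
    "C11": (41.7, 60.0),
    "E9": (10.8, 10.4),
    "F11": (53.9, 61.2),
}


def selectivity_ratio(ed50_il15, ed50_il2):
    """ED50(IL-2) / ED50(IL-15); values above 1 favor IL-15 inhibition."""
    return ed50_il2 / ed50_il15


ratios = {name: selectivity_ratio(*values) for name, values in ED50_UM.items()}

# List the compounds from most to least IL-15-selective
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ratio:.2f}")
```

All five ratios fall below 1.5, which is in line with the conclusion that none of these compounds discriminates between the two interleukins.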

(Figure: compound PIRAMID3 - F11; pSTAT5 response (%) as a function of compound concentration (µM). IL-15E53K: ED50 = 53.9 µM; IL-2: ED50 = 61.2 µM; GM-CSF)

Conclusion

Throughout this work, we set out to identify compounds with potential inhibitory activity towards IL-15. In order to achieve this, different factors had to be taken into account and several challenges needed to be addressed.

IL-15 exerts its biological activity through its interaction with a receptor dimer (IL-2Rβ/γc), also binding to its specific receptor (IL-15Rα). Another structurally similar interleukin, IL-2, also binds to the receptor dimer (IL-2Rβ/γc) in order to perform its biological effects. As such, when one wants to develop a specific molecule targeting IL-15, one also has to consider its effect on IL-2.

The choice of the IL-15/receptor interface to target is a fairly straightforward one. On one hand, one can discard the IL-15/IL-15Rα interface, since IL-15 can perform its activity in the absence of this receptor; on the other, one can also discard the IL-15/γc interface, as this receptor is shared with other interleukins, namely IL-2, IL-4, IL-7, IL-9 and IL-21. As such, the IL-15/IL-2Rβ interface was chosen to be targeted, as it is only shared with IL-2.

After having built the different relevant models, it was important to determine which residues contributed the most to the establishment of this interface, as well as to understand how its features evolve over time and how it can be influenced by the presence or absence of the other two receptors. To this end, different multimeric computational models were built, allowing the study of these features through MD simulations.

Indeed, this work allowed the study of the influence of the quaternary structure on the variation of the backbone of the different receptor chains, through RMSD analyses. It was possible to discern a larger effect in the trimeric form (IL-15/IL-2Rβ/γc), where the absence of IL-15Rα led to a lower stability. Furthermore, the IL-15/IL-15Rα dimer showed a lower conformational flexibility when compared to soluble IL-15, which further suggests that the contacts at this interface stabilize the interleukin. Following a more global analysis of the trends at the interfaces, the changes observed in IL-15 were broken down into the different structural elements, allowing the conclusion that the tetrameric structure was more stable for all elements, as all the interfacial residues were establishing their interactions with the three receptors. The soluble IL-15 showed the reverse tendency, with the absence of the receptors leading to instability in the different structural elements. More specifically, helix A shows its largest RMSD value (1.99 ± 0.33 Å) for the IL-15/IL-15Rα dimer, in which the residues of this element would interact with IL-2Rβ, were it present. Helix B, which possesses residues in contact with IL-15Rα, showed its highest RMSD value (1.78 ± 0.46 Å) for the trimeric structure (IL-15/IL-2Rβ/γc). In contrast, no significant trends can be gleaned for the C and D helices. Both analyzed loops, A-B and C-D, show the largest flexibility, a trend which is consistent across all analyzed systems. The RMSF analyses corroborated the RMSD results and further permitted a per-residue understanding of them. Through the analysis of the number of contacts at each interface, it was possible to rank the interfaces consistently: the IL-15/IL-15Rα interface established the largest number of contacts, followed by IL-15/IL-2Rβ and, lastly, IL-15/γc.

The analysis of the specific pairwise amino acid interactions confirmed the validity of the model, since the contacts previously identified as important through crystallographic analysis were also found in our model, with a high percentage of presence along the simulation time. Furthermore, it was possible to highlight the importance of novel residues or fragments of known ones. On the IL-15/IL-15Rα interface, we identified Glu46 and Glu89 as important, with a salt bridge established between Glu93 on IL-15 and Arg35 on IL-15Rα being described; on IL-15/IL-2Rβ, we were able to discern that both Ser58 and Asn65 played key roles; on IL-15/γc, we clearly highlighted the role of residues such as Asp30, His105 and Gln108 for the establishment of the interface. Finally, the role of water molecules was evidenced for the interfaces involving IL-15Rα and γc, as these two had been previously unexplored.

Moving on to the computational mutagenesis studies, the models were built based on the results explored above, with the key residues identified for IL-2Rβ being mutated in order to further clarify their effect. In close contact with our biologist partners, we were able to exploit the results obtained through pStat5 and proliferation studies, in order to directly compare our results to the in vitro ones. We were thus able to reproduce the experimental results fairly consistently. The most interesting cases of study were the N65K and N72K mutations, with the former showing very clearly its destructive effect in our model, and the latter highlighting why it showed no significant effect in the in vitro analyses. As such, we were able to further clarify which residues qualified as hot spots in this interface.

The final step in attaining the goal we set out to achieve was the Virtual Screening campaign. This was preceded by the building of different pharmacophores, taking advantage of our MD simulation results. The pharmacophores themselves were built on a region identified as being of potential interest through our results, with residues Ser7, Lys11, Ser58 and Asp61 being specifically targeted. Furthermore, snapshots from the MD simulations (both for IL-15 and IL-2) were extracted, in order to introduce conformational flexibility into this method and allow different distances for the same sets of properties. The very large library (approximately 260 000 compounds) was then subjected to the pharmacophore filtering step, restricting the pool to around 33 000 compounds. Following this step, the compounds were docked onto the crystallographic structure of IL-15, as well as onto one extracted from the MD simulation, further reducing the number of compounds to around 26 000. This was another step in which the MD simulations were taken advantage of. A filter was then set on the still large pool of molecules, wherein only the compounds ranking in the top 10 % for the six scoring functions used were kept. This left us with 592 ligands for the crystallographic structure and 320 for the MD simulation one. These ligands were then docked onto IL-2, in order to assess how well they bound to this protein. The difference in ranking between the same compound bound to IL-15 and to IL-2 was used to keep only 100 compounds, namely those that performed best on IL-15 and worst on IL-2. These compounds were then ordered, and in vitro studies were performed on them. Of these 100 compounds, 24 showed inhibitory activity towards IL-15 signaling. However, these compounds also showed inhibitory activity towards IL-2, rendering them non-specific.
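The last two filtering steps summarized above (consensus over the scoring functions, then selection by rank difference) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, every scoring function is assumed to be "lower is better", and a compound is assumed to be kept only if it falls in the top fraction for all scoring functions at once. The actual protocol is the one described in the Methods section.

```python
import math


def top_fraction_ids(scores, fraction=0.10):
    """IDs of the best-scoring fraction of compounds (lower score = better)."""
    n_keep = max(1, math.floor(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)
    return set(ranked[:n_keep])


def consensus_filter(score_tables, fraction=0.10):
    """Keep compounds that are in the top fraction for every scoring function.

    `score_tables` is a list of dicts, one per scoring function,
    each mapping compound ID -> score.
    """
    kept = None
    for scores in score_tables:
        top = top_fraction_ids(scores, fraction)
        kept = top if kept is None else kept & top
    return kept or set()


def select_by_rank_difference(rank_il15, rank_il2, n_keep=100):
    """Order compounds by (rank on IL-2 - rank on IL-15), largest first.

    A large positive difference means a good rank on IL-15 and a poor
    rank on IL-2, i.e. the profile sought for a selective ligand.
    """
    shared = rank_il15.keys() & rank_il2.keys()
    ordered = sorted(
        shared, key=lambda cid: rank_il2[cid] - rank_il15[cid], reverse=True
    )
    return ordered[:n_keep]
```

In this sketch the six scoring functions would each contribute one dictionary to `score_tables`, and the two rank dictionaries would come from the docking onto IL-15 and IL-2, respectively.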

The further steps of this work involve improving the final stage of the results described herein. In fact, the goal of this project was to identify compounds which would selectively inhibit IL-15, which was not entirely achieved. Even though the number of compounds with IL-15 inhibitory activity we were able to find was rather high, these compounds were also IL-2 inhibitors. In terms of the molecular modelling, it would be interesting to take the docked compounds and study them through MD simulations with both proteins, in order to further clarify the reasons why they inhibit both proteins, leading to greater insights as to how we could avoid targeting IL-2. Furthermore, as the pool of compounds is now 24, we could further try and rationally optimize them in order to potently target IL-15, while avoiding IL-2 entirely.

1. Kim, P. M., Lu, L. J., Xia, Y. & Gerstein, M. B. Relating three-dimensional structures to protein networks provides evolutionary insights. Science (2006). doi:10.1126/science.1136174

2. Ewing, R. M. et al. Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol. Syst. Biol. (2007). doi:10.1038/msb4100134

3. Yip, A. M. & Horvath, S. Gene network interconnectedness and the generalized topological overlap measure. BMC Bioinformatics (2007). doi:10.1186/1471-2105-8-22

4. Rachlin, J., Cohen, D. D., Cantor, C. & Kasif, S. Biological context networks: A mosaic view of the interactome. Mol. Syst. Biol. (2006). doi:10.1038/msb4100103

5. Hart, G. T., Ramani, A. K. & Marcotte, E. M. How complete are current yeast and human protein-interaction networks? Genome Biol. (2006). doi:10.1186/gb-2006-7-11-120

6. Kaladhar, D. S. et al. Towards an understating of signal transduction protein interaction networks. Bioinformation (2012). doi:10.6026/97320630008437

7. Arkin, M. M. R. & Wells, J. A. Small-molecule inhibitors of protein-protein interactions: Progressing towards the dream. Nature Reviews Drug Discovery (2004). doi:10.1038/nrd1343

8. Keskin, O., Gursoy, A., Ma, B. & Nussinov, R. Principles of Protein−Protein Interactions: What are the Preferred Ways For Proteins To Interact? Chem. Rev. (2008). doi:10.1021/cr040409x

9. Suzuki, A. et al. The Structure of Bowman-Birk Type Protease Inhibitor A-II from Peanut (Arachis hypogaea) at 3.3 Å Resolution. J. Biochem. 101, 267–274 (1987).

10. Jones, S. & Thornton, J. M. Review Principles of protein-protein interactions. Proc. Natl. Acad. Sci. (1996).

11. Thiel, P., Kaiser, M. & Ottmann, C. Small-molecule stabilization of protein-protein interactions: An underestimated concept in drug discovery? Angewandte Chemie - International Edition (2012). doi:10.1002/anie.201107616

12. Stumpf, M. P. H. et al. Estimating the size of the human interactome. Proc. Natl. Acad. Sci. (2008). doi:10.1073/pnas.0708078105

13. Venkatesan, K. et al. An empirical framework for binary interactome mapping. Nat. Methods (2009). doi:10.1038/nmeth.1280

14. Zhang, Q. C. et al. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature (2012). doi:10.1038/nature11503

15. Meyer, M. J. et al. Interactome INSIDER: a structural interactome browser for genomic studies. Nat. Methods 15, 107 (2018).

16. Matveev, V. V. Native aggregation as a cause of origin of temporary cellular structures needed for all forms of cellular activity, signaling and transformations. Theoretical Biology and Medical Modelling (2010). doi:10.1186/1742-4682-7-19

17. García-Sáez, A. J. The secrets of the Bcl-2 family. Cell Death and Differentiation (2012). doi:10.1038/cdd.2012.105

18. Kale, J., Liu, Q., Leber, B. & Andrews, D. W. Shedding light on apoptosis at subcellular membranes. Cell (2012). doi:10.1016/j.cell.2012.11.013

19. Huart, A. S., MacLaine, N. J., Narayan, V. & Hupp, T. R. Exploiting the MDM2-CK1α protein-protein interface to develop novel biologics that induce UBL-kinase-modification and inhibit cell growth. PLoS One (2012). doi:10.1371/journal.pone.0043391

20. Roberts, K. E., Cushing, P. R., Boisguerin, P., Madden, D. R. & Donald, B. R. Computational design of a PDZ domain peptide inhibitor that rescues CFTR activity. PLoS Comput. Biol. (2012). doi:10.1371/journal.pcbi.1002477

21. De Simone, A. et al. Intrinsic disorder modulates protein self-assembly and aggregation. Proc. Natl. Acad. Sci. (2012). doi:10.1073/pnas.1118048109

22. Fields, S. & Song, O. K. A novel genetic system to detect protein-protein interactions. Nature (1989). doi:10.1038/340245a0

23. Williams, N. E. Chapter 23 Immunoprecipitation Procedures. Methods Cell Biol. 62, 449–453 (1999).

24. Phizicky, E. M. & Fields, S. Protein-protein interactions: methods for detection and analysis. Microbiology (1995).

25. Hansen, J. C., Lebowitz, J. & Demeler, B. Analytical Ultracentrifugation of Complex Macromolecular Systems. Biochemistry (1994). doi:10.1021/bi00249a001

26. Doyle, M. L. Characterization of binding interactions by isothermal titration calorimetry. Curr. Opin. Biotechnol. (1997). doi:10.1016/S0958-1669(97)80154-1

27. Lakey, J. H. & Raggett, E. M. Measuring protein-protein interactions. Curr. Opin. Struct. Biol. (1998). doi:10.1016/S0959-440X(98)80019-5

28. Mankoff, D. A. A definition of molecular imaging. J. Nucl. Med. (2007).

29. Pfleger, K. D. G. et al. Extended bioluminescence resonance energy transfer (eBRET) for monitoring prolonged protein-protein interactions in live cells. Cell. Signal. (2006). doi:10.1016/j.cellsig.2006.01.004

30. Pfleger, K. D. G., Seeber, R. M. & Eidne, K. A. Bioluminescence resonance energy transfer (BRET) for the real-time detection of protein-protein interactions. Nat. Protoc. (2006). doi:10.1038/nprot.2006.52

31. Giepmans, B. N. G., Adams, S. R., Ellisman, M. H. & Tsien, R. Y. The fluorescent toolbox for assessing protein location and function. Science (2006). doi:10.1126/science.1124618

32. Massoud, T. F., Paulmurugan, R. & Gambhir, S. S. A molecularly engineered split reporter for imaging protein-protein interactions with positron emission tomography. Nat. Med. (2010). doi:10.1038/nm.2185

33. Massoud, T. F., Paulmurugan, R., De, A., Ray, P. & Gambhir, S. S. Reporter gene imaging of protein-protein interactions in living subjects. Current Opinion in Biotechnology (2007). doi:10.1016/j.copbio.2007.01.007

34. Villalobos, V., Naik, S. & Piwnica-Worms, D. Current State of Imaging Protein-Protein Interactions In Vivo with Genetically Encoded Reporters. Annu. Rev. Biomed. Eng. (2007). doi:10.1146/annurev.bioeng.9.060906.152044

35. Vaynberg, J. & Qin, J. Weak protein-protein interactions as probed by NMR spectroscopy. Trends in Biotechnology (2006). doi:10.1016/j.tibtech.2005.09.006

36. Cheng, Y. Single-particle Cryo-EM at crystallographic resolution. Cell (2015). doi:10.1016/j.cell.2015.03.049

37. Stites, W. E. Protein-protein interactions: Interface structure, binding thermodynamics, and mutational analysis. Chem. Rev. (1997). doi:10.1021/cr960387h

38. Young, L., Jernigan, R. L. & Covell, D. G. A role for surface hydrophobicity in protein‐protein recognition. Protein Sci. (1994). doi:10.1002/pro.5560030501

39. Bogan, A. A. & Thorn, K. S. Anatomy of hot spots in protein interfaces. J. Mol. Biol. (1998). doi:10.1006/jmbi.1998.1843

40. Conte, L. Lo, Chothia, C. & Janin, J. The atomic structure of protein-protein recognition sites. J. Mol. Biol. (1999). doi:10.1006/jmbi.1998.2439

41. Archakov, A. I. et al. Protein-protein interactions as a target for drugs in proteomics. Proteomics (2003). doi:10.1002/pmic.200390053

42. Tsai, C.-J., Lin, S. L., Wolfson, H. J. & Nussinov, R. Studies of protein-protein interfaces: A statistical analysis of the hydrophobic effect. Protein Sci. (1997). doi:10.1002/pro.5560060106

43. Janin, J., Miller, S. & Chothia, C. Surface, subunit interfaces and interior of oligomeric proteins. J. Mol. Biol. (1988). doi:10.1016/0022-2836(88)90606-7

44. Fuentes, E. J., Gilmore, S. A., Mauldin, R. V. & Lee, A. L. Evaluation of Energetic and Dynamic Coupling Networks in a PDZ Domain Protein. J. Mol. Biol. (2006). doi:10.1016/j.jmb.2006.08.076

45. James, L. C., Roversi, P. & Tawfik, D. S. Antibody multispecificity mediated by conformational diversity. Science (2003). doi:10.1126/science.1079731

46. Lindner, A. B., Eshhar, Z. & Tawfik, D. S. Conformational changes affect binding and catalysis by ester-hydrolysing antibodies. J. Mol. Biol. (1999). doi:10.1006/jmbi.1998.2309

47. Popovych, N., Sun, S., Ebright, R. H. & Kalodimos, C. G. Dynamically driven protein allostery. Nat. Struct. Mol. Biol. (2006). doi:10.1038/nsmb1132

48. Cunningham, B. C. & Wells, J. A. High-resolution epitope mapping of hGH-receptor interactions by alanine-scanning mutagenesis. Science (1989). doi:10.1126/science.2471267

49. Clackson, T. & Wells, J. A. A hot spot of binding energy in a hormone-receptor interface. Science (1995). doi:10.1126/science.7529940

50. DeLano, W. L. Unraveling hot spots in binding interfaces: Progress and challenges. Current Opinion in Structural Biology (2002). doi:10.1016/S0959-440X(02)00283-X

51. Moreira, I. S., Fernandes, P. A. & Ramos, M. J. Computational Determination of the Relative Free Energy of Binding – Application to Alanine Scanning Mutagenesis. in Molecular Materials with Specific Interactions – Modeling and Design (2007). doi:10.1007/1-4020-5372-x_6

52. Moreira, I. S., Fernandes, P. A. & Ramos, M. J. Hot spots - A review of the protein-protein interface determinant amino-acid residues. Proteins: Structure, Function and Genetics (2007). doi:10.1002/prot.21396

53. Rosenberg, S. A. et al. Observations on the systemic administration of autologous lymphokine-activated killer cells and recombinant interleukin-2 to patients with metastatic cancer. N. Engl. J. Med. 313, 1485–1492 (1985).

54. Selinger, M. J. et al. -induced synthesis of serum amyloid A protein by hepatocytes. Nature 285, 498 (1980).

55. Dumonde, D. C. et al. "Lymphokines": Non-antibody mediators of cellular immunity generated by lymphocyte activation. Nature 224, 38 (1969).

56. Cohen, S., Bigazzi, P. E. & Yoshida, T. Similarities of T cell function in cell-mediated immunity and antibody production. Cell. Immunol. 12, 150–159 (1974).

57. Dinarello, C. A. Proinflammatory cytokines. Chest (2000). doi:10.1378/chest.118.2.503

58. Vacchelli, E. et al. Trial watch: Immunostimulatory cytokines. OncoImmunology (2012). doi:10.4161/onci.20459

59. Comerford, I. & McColl, S. R. Mini-review series: focus on chemokines. Immunol. Cell Biol. (2011). doi:10.1038/icb.2010.164

60. Lyman, G. H. & Dale, D. C. Introduction to the hematopoietic growth factors. Cancer Treatment and Research (2011). doi:10.1007/978-1-4419-7073-2_1

61. O’Neill, L. A. J. The Interleukin-1 Receptor/Toll-like Receptor Superfamily: Signal Transduction During Inflammation and Host Defense. Sci. Signal. (2013). doi:10.1126/stke.442000re1

62. Fickenscher, H. et al. The interleukin-10 family of cytokines. Trends in Immunology (2002). doi:10.1016/S1471-4906(01)02149-4

63. Iwakura, Y., Ishigame, H., Saijo, S. & Nakae, S. Functional Specialization of Interleukin-17 Family Members. Immunity (2011). doi:10.1016/j.immuni.2011.02.012

64. Chang, S. H. & Dong, C. Signaling of interleukin-17 family cytokines in immunity and inflammation. Cellular Signalling (2011). doi:10.1016/j.cellsig.2010.11.022

65. Katze, M. G., He, Y. & Gale, M. Viruses and interferon: A fight for supremacy. Nature Reviews Immunology (2002). doi:10.1038/nri888

66. González-Navajas, J. M., Lee, J., David, M. & Raz, E. Immunomodulatory functions of type I interferons. Nature Reviews Immunology (2012). doi:10.1038/nri3133

67. Pietras, K., Sjöblom, T., Rubin, K., Heldin, C. H. & Östman, A. PDGF receptors as cancer drug targets. (2003). doi:10.1016/S1535-6108(03)00089-8

68. Schmierer, B. & Hill, C. S. TGFβ-SMAD signal transduction: Molecular specificity and functional flexibility. Nature Reviews Molecular Cell Biology (2007). doi:10.1038/nrm2297

69. Wang, S. & El-Deiry, W. S. TRAIL and apoptosis induction by TNF-family death receptors. Oncogene (2003). doi:10.1038/sj.onc.1207232

70. Palladino, M. A., Bahjat, F. R., Theodorakis, E. A. & Moldawer, L. L. Anti-TNF-α therapies: The next generation. Nature Reviews Drug Discovery (2003). doi:10.1038/nrd1175

71. Bazan, J. F. Structural design and molecular evolution of a cytokine receptor superfamily. Proc. Natl. Acad. Sci. U. S. A. 87, 6934–6938 (1990).

72. de Vos, A. M., Ultsch, M. & Kossiakoff, A. A. Human growth hormone and extracellular domain of its receptor: crystal structure of the complex. Science 255, 306–312 (1992).

73. Timmermann, A., Kuster, A., Kurth, I., Heinrich, P. C. & Muller-Newen, G. A functional role of the membrane-proximal extracellular domains of the signal transducer gp130 in heterodimerization with the leukemia inhibitory factor receptor. Eur. J. Biochem. 269, 2716–2726 (2002).

74. Wang, X., Lupardus, P., LaPorte, S. L. & Garcia, K. C. Structural biology of shared cytokine receptors. Annu. Rev. Immunol. 27, 29–60 (2009).

75. Rochman, Y., Spolski, R. & Leonard, W. J. New insights into the regulation of T cells by γc family cytokines. Nat. Rev. Immunol. 9, 480 (2009).

76. Sakaguchi, S., Yamaguchi, T., Nomura, T. & Ono, M. Regulatory T cells and immune tolerance. Cell 133, 775–787 (2008).

77. Waldmann, T. A. The biology of interleukin-2 and interleukin-15: implications for cancer therapy and vaccine design. Nat. Rev. Immunol. 6, 595 (2006).

78. Holgate, S. T. & Polosa, R. Treatment strategies for allergy and asthma. Nat. Rev. Immunol. 8, 218 (2008).

79. Spits, H. et al. Innate lymphoid cells—a proposal for uniform nomenclature. Nat. Rev. Immunol. 13, 145 (2013).

80. Hauber, H.-P., Bergeron, C. & Hamid, Q. IL-9 in allergic inflammation. Int. Arch. Allergy Immunol. 134, 79–87 (2004).

81. Lu, Y. et al. Th9 cells promote antitumor immune responses in vivo. J. Clin. Invest. 122, 4160–4171 (2012).

82. Kennedy, M. K. et al. Reversible defects in natural killer and memory CD8 T cell lineages in interleukin 15-deficient mice. J. Exp. Med. 191, 771–780 (2000).

83. Spolski, R. & Leonard, W. J. Interleukin-21: a double-edged sword with therapeutic potential. Nat. Rev. Drug Discov. 13, 379 (2014).

84. Burton, J. D. et al. A lymphokine, provisionally designated interleukin T and produced by a human adult T-cell leukemia line, stimulates T-cell proliferation and the induction of lymphokine-activated killer cells. Proc. Natl. Acad. Sci. 91, 4935–4939 (1994).

85. Grabstein, K. H. et al. Cloning of a T cell growth factor that interacts with the beta chain of the interleukin-2 receptor. Science 264, 965–968 (1994).

86. Hanick, N. A. et al. Elucidation of the interleukin-15 binding site on its alpha receptor by NMR. Biochemistry 46, 9453–9461 (2007).

87. Chirifu, M. et al. Crystal structure of the IL-15-IL-15Ralpha complex, a cytokine-receptor unit presented in trans. Nat. Immunol. 8, 1001–1007 (2007).

88. Wang, X., Rickert, M. & Garcia, K. C. Structure of the Quaternary Complex of Interleukin-2 with Its α, β, and γc Receptors. Science 310, 1159–1163 (2005).

89. Zurawski, S. M. et al. Definition and spatial location of mouse interleukin-2 residues that interact with its heterotrimeric receptor. EMBO J. 12, 5113–5119 (1993).

90. Onu, A., Pohl, T., Krause, H. & Bulfone-Paus, S. Regulation of IL-15 secretion via the leader peptide of two IL-15 isoforms. J. Immunol. 158, 255–262 (1997).

91. Tagaya, Y. et al. Generation of secretable and nonsecretable interleukin 15 isoforms through alternate usage of signal peptides. Proc. Natl. Acad. Sci. 94, 14444–14449 (1997).

92. Gaggero, A. et al. Differential intracellular trafficking, secretion and endosomal localization of two IL-15 isoforms. Eur. J. Immunol. 29, 1265–1274 (1999).

93. Kurys, G., Tagaya, Y., Bamford, R., Hanover, J. A. & Waldmann, T. A. The long signal peptide isoform and its alternative processing direct the intracellular trafficking of interleukin-15. J. Biol. Chem. 275, 30653–30659 (2000).

94. Nishimura, H. et al. Differential roles of interleukin 15 mRNA isoforms generated by alternative splicing in immune responses in vivo. J. Exp. Med. 191, 157–170 (2000).

95. Fehniger, T. A. & Caligiuri, M. A. Interleukin 15: Biology and relevance to human disease. Blood (2001). doi:10.1182/blood.V97.1.14

96. Granucci, F. et al. Inducible IL-2 production by dendritic cells revealed by global analysis. Nat. Immunol. 2, 882 (2001).

97. Doherty, T. M., Seder, R. A. & Sher, A. Induction and regulation of IL-15 expression in murine macrophages. J. Immunol. 156, 735–741 (1996).

98. Musso, T. et al. Human monocytes constitutively express membrane-bound, biologically active, and interferon-γ-upregulated interleukin-15. Blood 93, 3531–3539 (1999).

99. Leclercq, G., Debacker, V., De Smedt, M. & Plum, J. Differential effects of interleukin-15 and interleukin-2 on differentiation of bipotential T/natural killer progenitor cells. J. Exp. Med. 184, 325–336 (1996).

100. Mrózek, E., Anderson, P. & Caligiuri, M. A. Role of interleukin-15 in the development of human CD56+ natural killer cells from CD34+ hematopoietic progenitor cells. Blood 87, 2632–2640 (1996).

101. Murray, A. M., Simm, B. & Beagley, K. W. Cytokine gene expression in murine fetal intestine: potential for extrathymic T cell development. Cytokine 10, 337–345 (1998).

102. Kawai, K. et al. Requirement of the IL-2 receptor β chain for the development of Vγ3 dendritic epidermal T cells. J. Invest. Dermatol. 110, 961–965 (1998).

103. Mohamadzadeh, M. et al. Ultraviolet B radiation up-regulates the expression of IL-15 in human skin. J. Immunol. 155, 4492–4496 (1995).

104. Waldmann, T. A. Targeting the interleukin-15/interleukin-15 receptor system in inflammatory autoimmune diseases. Arthritis Res Ther 6, 174 (2004).

105. Azimi, N. et al. Human T cell lymphotropic virus type I Tax protein trans-activates interleukin 15 gene transcription through an NF-κB site. Proc. Natl. Acad. Sci. 95, 2452–2457 (1998).

106. Ogasawara, K. et al. Requirement for IRF-1 in the microenvironment supporting development of natural killer cells. Nature 391, 700 (1998).

107. Ohteki, T. et al. The Interferon Regulatory Factor 1 (IRF-1) Is Important during the Maturation of Natural Killer 1.1+ T Cell Receptor-α/β+ (NK1+ T) Cells, Natural Killer Cells, and Intestinal Intraepithelial T Cells. J. Exp. Med. 187, 967–972 (1998).

108. Kozak, M. Regulation of translation in eukaryotic systems. Annu. Rev. Cell Biol. 8, 197–225 (1992).

109. Mortier, E., Woo, T., Advincula, R., Gozalo, S. & Ma, A. IL-15Rα chaperones IL-15 to stable membrane complexes that activate NK cells via trans presentation. J. Exp. Med. 205, 1213–1225 (2008).

110. Takeshita, T. et al. Cloning of the gamma chain of the human IL-2 receptor. Science 257, 379–382 (1992).

111. Thèze, J., Alzari, P. M. & Bertoglio, J. Interleukin 2 and its receptors: recent advances and new immunological functions. Immunol. Today 17, 481–486 (1996).

112. Bosco, M. C. et al. Regulation by interleukin-2 (IL-2) and of IL-2 receptor gamma chain gene expression in human monocytes. Blood 83, 2995–3002 (1994).

113. Dukovich, M. et al. A second human interleukin-2 binding protein that may be a component of high-affinity interleukin-2 receptors. Nature 327, 518 (1987).

114. Tsudo, M., Kozak, R. W., Goldman, C. K. & Waldmann, T. A. Demonstration of a non-Tac peptide that binds interleukin 2: a potential participant in a multichain interleukin 2 receptor complex. Proc. Natl. Acad. Sci. 83, 9694–9698 (1986).

115. Hatakeyama, M. et al. Interleukin-2 receptor beta chain gene: generation of three receptor forms by cloned human alpha and beta chain cDNA’s. Science 244, 551–556 (1989).

116. Kim, H. P., Imbert, J. & Leonard, W. J. Both integrated and differential regulation of components of the IL-2/IL-2 receptor system. Cytokine Growth Factor Rev. 17, 349–366 (2006).

117. Anderson, D. M. et al. Functional Characterization of the Human Interleukin-15 Receptor α Chain and Close Linkage of IL15RA and IL2RA Genes. J. Biol. Chem. 270, 29862–29869 (1995).

118. Krause, H. et al. Genomic structure and chromosomal localization of the human interleukin 15 gene (IL-15). Cytokine 8, 667–674 (1996).

119. Giri, J. G. et al. Identification and cloning of a novel IL-15 binding protein that is structurally related to the alpha chain of the IL-2 receptor. EMBO J. 14, 3654–3663 (1995).

120. Anderson, D. M. et al. Chromosomal assignment and genomic structure of Il15. Genomics 25, 701–706 (1995).

121. Hanisch, U.-K. et al. Mouse brain microglia express interleukin-15 and its multimeric receptor complex functionally coupled to Janus kinase activity. J. Biol. Chem. 272, 28853–28860 (1997).

122. Kurowska, M., Rudnicka, W., Maślińska, D. & Maśliński, W. Expression of IL-15 and IL-15 receptor isoforms in select structures of human fetal brain. Ann. N. Y. Acad. Sci. 966, 441–445 (2002).

123. De Jong, J. L., Farner, N. L., Widmer, M. B., Giri, J. G. & Sondel, P. M. Interaction of IL-15 with the shared IL-2 receptor beta and gamma c subunits. The IL-15/beta/gamma c receptor-ligand complex is less stable than the IL-2/beta/gamma c receptor-ligand complex. J. Immunol. 156, 1339–1348 (1996).

124. Burkett, P. R. et al. Coordinate expression and trans presentation of interleukin (IL)-15Rα and IL-15 supports natural killer cell and memory CD8+ T cell homeostasis. J. Exp. Med. 200, 825–834 (2004).

125. Dubois, S., Mariner, J., Waldmann, T. A. & Tagaya, Y. IL-15Rα recycles and presents IL-15 in trans to neighboring cells. Immunity 17, 537–547 (2002).

126. Mortier, E. et al. Macrophage- and dendritic-cell-derived interleukin-15 receptor alpha supports homeostasis of distinct CD8+ T cell subsets. Immunity 31, 811–822 (2009).

127. Marçais, A. et al. Regulation of mouse NK cell development and function by cytokines. Front. Immunol. 4, 450 (2013).

128. Cooper, M. A. et al. In vivo evidence for a dependence on interleukin 15 for survival of natural killer cells. Blood 100, 3633–3638 (2002).

129. Koka, R. et al. Cutting edge: murine dendritic cells require IL-15Rα to prime NK cells. J. Immunol. 173, 3594–3598 (2004).

130. Prlic, M., Blazar, B. R., Farrar, M. A. & Jameson, S. C. In vivo survival and homeostatic proliferation of natural killer cells. J. Exp. Med. 197, 967–976 (2003).

131. Ranson, T. et al. IL-15 is an essential mediator of peripheral NK-cell homeostasis. Blood 101, 4887–4893 (2003).

132. Huntington, N. D. et al. Interleukin 15-mediated survival of natural killer cells is determined by interactions among Bim, Noxa and Mcl-1. Nat. Immunol. 8, 856 (2007).

133. Carson, W. E. et al. Interleukin (IL) 15 is a novel cytokine that activates human natural killer cells via components of the IL-2 receptor. J. Exp. Med. 180, 1395–1403 (1994).

134. Matsuda, J. L. & Gapin, L. Developmental program of mouse Vα14i NKT cells. Curr. Opin. Immunol. 17, 122–130 (2005).

135. Gordy, L. E. et al. IL-15 regulates homeostasis and terminal maturation of NKT cells. J. Immunol. 187, 6335–6345 (2011).

136. Kanegane, H. & Tosato, G. Activation of naive and memory T cells by interleukin-15. Blood 88, 230–235 (1996).

137. Li, X. C. et al. IL-15 and IL-2: a matter of life and death for T cells in vivo. Nat. Med. 7, 114 (2001).

138. Bulfone-Paus, S. et al. Interleukin-15 protects from lethal apoptosis in vivo. Nat. Med. 3, 1124 (1997).

139. Perera, L. P., Goldman, C. K. & Waldmann, T. A. IL-15 induces the expression of chemokines and their receptors in T lymphocytes. J. Immunol. 162, 2606–2612 (1999).

140. Edelbaum, D., Mohamadzadeh, M., Bergstresser, P. R., Sugamura, K. & Takashima, A. Interleukin (IL)-15 promotes the growth of murine epidermal γδ T cells by a mechanism involving the β- and γc-chains of the IL-2 receptor. J. Invest. Dermatol. 105, 837–843 (1995).

141. Yu, Q. et al. MyD88-dependent signaling for IL-15 production plays an important role in maintenance of CD8αα TCRαβ and TCRγδ intestinal intraepithelial lymphocytes. J. Immunol. 176, 6180–6185 (2006).

142. Ebert, E. C. Interleukin 15 is a potent stimulant of intraepithelial lymphocytes. Gastroenterology 115, 1439–1445 (1998).

143. Badolato, R., Ponzi, A. N., Millesimo, M., Notarangelo, L. D. & Musso, T. Interleukin-15 (IL-15) induces IL-8 and chemotactic protein 1 production in human monocytes. Blood 90, 2804–2809 (1997).

144. Saikh, K. U., Kissner, T. L., Nystrom, S., Ruthel, G. & Ulrich, R. G. Interleukin-15 increases vaccine efficacy through a mechanism linked to dendritic cell maturation and enhanced antibody titers. Clin. Vaccine Immunol. 15, 131–137 (2008).

145. Masuda, A., Matsuguchi, T., Yamaki, K., Hayakawa, T. & Yoshikai, Y. Interleukin-15 prevents mouse apoptosis through STAT6-mediated Bcl-xL expression. J. Biol. Chem. 276, 26107–26113 (2001).

146. Armitage, R. J., Macduff, B. M., Eisenman, J., Paxton, R. & Grabstein, K. H. IL-15 has stimulatory activity for the induction of B cell proliferation and differentiation. J. Immunol. 154, 483–490 (1995).

147. Girard, D., Paquet, M.-E., Paquin, R. & Beaulieu, A. D. Differential effects of interleukin-15 (IL-15) and IL-2 on human neutrophils: modulation of phagocytosis, cytoskeleton rearrangement, gene expression, and apoptosis by IL-15. Blood 88, 3176–3184 (1996).

148. Hoontrakoon, R. et al. Interleukin-15 Inhibits Spontaneous Apoptosis in Human Eosinophils via Autocrine Production of Granulocyte Macrophage-Colony Stimulating Factor and Nuclear Factor-κB Activation. Am. J. Respir. Cell Mol. Biol. 26, 404–412 (2002).

149. Yang, L., Thornton, S. & Grom, A. A. Interleukin-15 inhibits sodium nitroprusside-induced apoptosis of synovial fibroblasts and vascular endothelial cells. Arthritis Rheum. Off. J. Am. Coll. Rheumatol. 46, 3010–3014 (2002).

150. Busquets, S., Figueras, M., Almendro, V., López-Soriano, F. J. & Argilés, J. M. Interleukin-15 increases glucose uptake in skeletal muscle: An antidiabetogenic effect of the cytokine. Biochim. Biophys. Acta (BBA)-General Subj. 1760, 1613–1617 (2006).

151. Quinn, L. S., Strait-Bodey, L., Anderson, B. G., Argilés, J. M. & Havel, P. J. Interleukin-15 stimulates adiponectin secretion by 3T3-L1 adipocytes: evidence for a skeletal muscle-to-fat signaling pathway. Cell Biol. Int. 29, 449–457 (2005).

152. Shinozaki, M. et al. IL-15, a survival factor for kidney epithelial cells, counteracts apoptosis and inflammation during nephritis. J. Clin. Invest. 109, 951–960 (2002).

153. Briard, D., Brouty-Boyé, D., Azzarone, B. & Jasmin, C. Fibroblasts from human spleen regulate NK cell differentiation from blood CD34+ progenitors via cell surface IL-15. J. Immunol. 168, 4326–4332 (2002).

154. Gómez-Nicola, D., Valle-Argos, B., Pallas-Bazarra, N. & Nieto-Sampedro, M. Interleukin-15 regulates proliferation and self-renewal of adult neural stem cells. Mol. Biol. Cell 22, 1960–1970 (2011).

155. Huang, Y.-S. et al. Effects of interleukin-15 on neuronal differentiation of neural stem cells. Brain Res. 1304, 38–48 (2009).

156. Baslund, B. et al. Targeting interleukin-15 in patients with rheumatoid arthritis: a proof-of-concept study. Arthritis Rheum. 52, 2686–2692 (2005).

157. Avice, M.-N. et al. IL-15 promotes IL-12 production by human monocytes via T cell-dependent contact and may contribute to IL-12-mediated IFN-γ secretion by CD4+ T cells in the absence of TCR ligation. J. Immunol. 161, 3408–3415 (1998).

158. Möttönen, M. et al. Interleukin-15 up-regulates the expression of CD154 on synovial fluid T cells. Immunology 100, 238 (2000).

159. Liu, T., Nishimura, H., Matsuguchi, T. & Yoshikai, Y. Differences in interleukin-12 and -15 production by dendritic cells at the early stage of Listeria monocytogenes infection between BALB/c and C57 BL/6 mice. Cell. Immunol. 202, 31–40 (2000).

160. Ohta, N. et al. IL-15-dependent activation-induced cell death-resistant Th1 type CD8αβ+ NK1.1+ T cells for the development of small intestinal inflammation. J. Immunol. 169, 460–468 (2002).

161. Malamut, G. et al. IL-15 triggers an antiapoptotic pathway in human intraepithelial lymphocytes that is a potential new target in celiac disease-associated inflammation and lymphomagenesis. J. Clin. Invest. 120, 2131–2143 (2010).

162. Steel, J. C., Waldmann, T. A. & Morris, J. C. Interleukin-15 biology and its therapeutic implications in cancer. Trends Pharmacol. Sci. 33, 35–41 (2012).

163. Barzegar, C. et al. IL-15 is produced by a subset of human melanomas, and is involved in the regulation of markers of progression through juxtacrine loops. Oncogene 16, 2503 (1998).

164. Doucet, C. et al. Role of interleukin (IL)-2 and IL-15 in the tumour progression of a melanoma cell line MELP, derived from an IL-2 progressor patient. Melanoma Res. 7, S7–S17 (1997).

165. Nicot, C. Current views in HTLV-I-associated adult T-cell leukemia/lymphoma. American Journal of Hematology (2005). doi:10.1002/ajh.20307

166. Azimi, N. et al. Human T cell lymphotropic virus type I Tax protein trans-activates interleukin 15 gene transcription through an NF-κB site. Proc. Natl. Acad. Sci. (1998). doi:10.1073/pnas.95.5.2452

167. Trentin, L. et al. Interleukin-15 promotes the growth of leukemic cells of patients with B-cell chronic lymphoproliferative disorders. Blood 87, 3327–3335 (1996).

168. Hinshelwood, C. N. & Pauling, L. Amedeo Avogadro. (1956).

169. Benfey, O. T. August Kekule and the birth of the structural theory of organic chemistry in 1858. J. Chem. Educ. (2009). doi:10.1021/ed035p21

170. Kaufmann, S. H. E. Paul Ehrlich: Founder of chemotherapy. Nature Reviews Drug Discovery (2008). doi:10.1038/nrd2582

171. Serturner, F. W. Ueber das Morphium, eine neue salifahige Grundlage, und die Mekonsaure als Hauptbestandtheile des Opiums. Ann. Phys. 25, 56–89 (1817).

172. Merck, G. Vorläufige Notiz über eine neue organische Base im Opium. Justus Liebigs Ann. Chem. 66, 125–128 (1848).

173. Fleming, A. On the antibacterial action of cultures of a penicillium, with special reference to their use in the isolation of B. influenzae. Br. J. Exp. Pathol. 10, 226 (1929).

174. Meldrum, N. U. & Roughton, F. J. W. Carbonic anhydrase. Its preparation and properties. J. Physiol. 80, 113–142 (1933).

175. Colebrook, L. & Kenny, M. Treatment of human puerperal infections, and of experimental infections in mice, with prontosil. Lancet 227, 1279–1281 (1936).

176. Langley, J. N. On the reaction of cells and of nerve-endings to certain poisons, chiefly as regards the reaction of striated muscle to nicotine and to curari. J. Physiol. 33, 374–413 (1905).

177. Buske, C., Feuring-Buske, M., Unterhalt, M. & Hiddemann, W. Monoclonal antibody therapy for B cell non-Hodgkin's lymphomas: Emerging concepts of a tumour-targeted strategy. Eur. J. Cancer 35, 549–557 (1999).

178. Kaldor, S. W. et al. Viracept (nelfinavir mesylate, AG1343): a potent, orally bioavailable inhibitor of HIV-1 protease. J. Med. Chem. 40, 3979–3985 (1997).

179. Patel, D. V. & Gordon, E. M. Applications of small-molecule combinatorial chemistry to drug discovery. Drug Discov. Today 1, 134–144 (1996).

180. Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).

181. Testa, B., Carrupt, P.-A., Gaillard, P., Billois, F. & Weber, P. Lipophilicity in molecular modeling. Pharm. Res. 13, 335–343 (1996).

182. Moriguchi, I., Hirono, S., Liu, Q., Nakagome, I. & Matsushita, Y. Simple method of calculating octanol/water partition coefficient. Chem. Pharm. Bull. 40, 127–130 (1992).

183. Abraham, M. H., Chadha, H. S., Whiting, G. S. & Mitchell, R. C. Hydrogen bonding. 32. An analysis of water-octanol and water-alkane partitioning and the Δlog P parameter of Seiler. J. Pharm. Sci. 83, 1085–1100 (1994).

184. Paterson, D. A., Conradi, R. A., Hilgers, A. R., Vidmar, T. J. & Burton, P. S. A non-aqueous partitioning system for predicting the oral absorption potential of peptides. Quant. Struct. Relationships 13, 4–10 (1994).

185. Babine, R. E. & Bender, S. L. Molecular recognition of protein-ligand complexes: Applications to drug design. Chem. Rev. 97, 1359–1472 (1997).

186. Villoutreix, B. O. et al. Drug-Like Protein-Protein Interaction Modulators: Challenges and Opportunities for Drug Discovery and Chemical Biology. Mol. Inform. 33, 414–437 (2014).

187. Hopkins, A. L. & Groom, C. R. The druggable genome. Nat. Rev. Drug Discov. 1, 727 (2002).

188. Wells, J. A. & McClendon, C. L. Reaching for high-hanging fruit in drug discovery at protein-protein interfaces. Nature 450, 1001 (2007).

189. Thanos, C. D., DeLano, W. L. & Wells, J. A. Hot-spot mimicry of a cytokine receptor by a small molecule. Proc. Natl. Acad. Sci. 103, 15422–15427 (2006).

190. DeLano, W. L., Ultsch, M. H., Wells, J. A. & others. Convergent solutions to binding at a protein-protein interface. Science 287, 1279–1283 (2000).

191. Villoutreix, B. O., Labbé, C. M., Lagorce, D., Laconde, G. & Sperandio, O. A leap into the chemical space of protein-protein interaction inhibitors. Curr. Pharm. Des. 18, 4648–4667 (2012).

192. Lipinski, C. A. Rule of five in 2015 and beyond: Target and ligand structural limitations, ligand chemistry structure and drug discovery project decisions. Adv. Drug Deliv. Rev. 101, 34–41 (2016).

193. Kuenemann, M. A., Bourbon, L. M. L., Labbé, C. M., Villoutreix, B. O. & Sperandio, O. Which three-dimensional characteristics make efficient inhibitors of protein-protein interactions? J. Chem. Inf. Model. 54, 3067–3079 (2014).

194. Vogler, M., Dinsdale, D., Dyer, M. J. S. & Cohen, G. M. Bcl-2 inhibitors: small molecules with a big impact on cancer therapy. Cell Death Differ. 16, 360 (2009).

195. Sleebs, B. E. et al. Quinazoline sulfonamides as dual binders of the proteins B-cell lymphoma 2 and B-cell lymphoma extra long with potent proapoptotic cell-based activity. J. Med. Chem. 54, 1914–1926 (2011).

196. Tanaka, Y. et al. Discovery of potent Mcl-1/Bcl-xL dual inhibitors by using a hybridization strategy based on structural analysis of target proteins. J. Med. Chem. 56, 9635–9645 (2013).

197. Rickert, M., Wang, X., Boulanger, M. J., Goriatcheva, N. & Garcia, K. C. The structure of interleukin-2 complexed with its alpha receptor. Science 308, 1477–1480 (2005).

198. Tilley, J. W. et al. Identification of a small molecule inhibitor of the IL-2/IL-2Rα receptor interaction which binds to IL-2. J. Am. Chem. Soc. 119, 7589–7590 (1997).

199. Braisted, A. C. et al. Discovery of a potent small molecule IL-2 inhibitor through fragment assembly. J. Am. Chem. Soc. 125, 3714–3715 (2003).

200. Kalsoom, S. et al. In vitro and in silico exploration of IL-2 inhibition by small drug-like molecules. Med. Chem. Res. 22, 5739–5751 (2013).

201. Ruchatz, H., Leung, B. P., Wei, X., McInnes, I. B. & Liew, F. Y. Soluble IL-15 receptor α-chain administration prevents murine collagen-induced arthritis: a role for IL-15 in development of antigen-induced immunopathology. J. Immunol. 160, 5654–5660 (1998).

202. Baslund, B. et al. A novel human monoclonal antibody against IL-15 (HuMax-IL15) in patients with active rheumatoid arthritis (RA): Results of a double-blind, placebo-controlled phase I/II trial. in Arthritis and Rheumatism 48, S653 (2003).

203. Quéméner, A. et al. Discovery of a Small-Molecule Inhibitor of Interleukin 15: Pharmacophore-Based Virtual Screening and Hit Optimization. J. Med. Chem. (2017). doi:10.1021/acs.jmedchem.7b00485

204. Schlick, T. Molecular Modeling and Simulation: An Interdisciplinary Guide. vol. 21 (Springer New York, 2010).

205. Saunders, M. G. & Voth, G. A. Coarse-Graining Methods for Computational Biology. Annu. Rev. Biophys. 42, 73–93 (2013).

206. Brooks, B. R. et al. CHARMM: The biomolecular simulation program. J. Comput. Chem. (2009). doi:10.1002/jcc.21287

207. Phillips, J. C. et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry (2005). doi:10.1002/jcc.20289

208. Case, D. A. et al. The Amber biomolecular simulation programs. Journal of Computational Chemistry (2005). doi:10.1002/jcc.20290

209. Van Der Spoel, D. et al. GROMACS: Fast, flexible, and free. Journal of Computational Chemistry (2005). doi:10.1002/jcc.20291

210. Brooks, B. R. et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. (1983). doi:10.1002/jcc.540040211

211. Jones, J. E. On the Determination of Molecular Fields. II. From the Equation of State of a Gas. Proc. R. Soc. A Math. Phys. Eng. Sci. 106, 463–477 (1924).

212. Levitt, M. & Lifson, S. Refinement of protein conformations using a macromolecular energy minimization procedure. J. Mol. Biol. (1969). doi:10.1016/0022-2836(69)90421-5

213. Wiberg, K. B. A Scheme for Strain Energy Minimization. Application to the Cycloalkanes 1. J. Am. Chem. Soc. 87, 1070–1078 (1965).

214. Fletcher, R. & Reeves, M. Function minimization by conjugate gradients. The Computer Journal (1964). doi:10.1093/comjnl/7.2.149

215. Polak, E. & Ribiere, G. Note sur la convergence de méthodes de directions conjuguées. ESAIM Math. Model. Numer. Anal. - Modélisation Mathématique Anal. Numérique 3, 35–43 (1969).

216. Hestenes, M. R. & Stiefel, E. Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49, 409 (1952).

217. Chu, J.-W., Trout, B. L. & Brooks, B. R. A super-linear minimization scheme for the nudged elastic band method. J. Chem. Phys. 119, 12708–12717 (2003).

218. Fletcher, R. Practical Methods of Optimization. (John Wiley & Sons, Ltd, 2000). doi:10.1002/9781118723203

219. McCammon, J. A., Gelin, B. R. & Karplus, M. Dynamics of folded proteins. Nature 267, 585–590 (1977).

220. Karplus, M. & McCammon, J. A. Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 9, 646–652 (2002).

221. Chavent, M., Duncan, A. L. & Sansom, M. S. P. Molecular dynamics simulations of membrane proteins and their interactions: From nanoscale to mesoscale. Current Opinion in Structural Biology (2016). doi:10.1016/j.sbi.2016.06.007

222. Jensen, F. Introduction to Computational Chemistry. (John Wiley & Sons, Incorporated, 2016).

223. Paterlini, M. G. & Ferguson, D. M. Constant temperature simulations using the Langevin equation with velocity Verlet integration. Chem. Phys. 236, 243–252 (1998).

224. Hockney, R. W. & Eastwood, J. W. Computer simulation using particles. (A. Hilger, 1988).

225. Rao, M., Berne, B. J. & Kalos, M. H. Computer simulation of the nucleation and thermodynamics of microclusters. J. Chem. Phys. 68, 1325–1336 (1978).

226. Kubo, R., Toda, M. & Hashitsume, N. Statistical Physics II : Nonequilibrium Statistical Mechanics. (Springer Berlin Heidelberg, 1991).

227. Brünger, A., Brooks, C. L. & Karplus, M. Stochastic boundary conditions for molecular dynamics simulations of ST2 water. Chem. Phys. Lett. 105, 495–500 (1984).

228. Mishra, B. & Schlick, T. The notion of error in Langevin dynamics. I. Linear analysis. J. Chem. Phys. 105, 299–318 (1996).

229. Hoover, W. G. Computational Statistical Mechanics. (Elsevier, 1991).

230. Hoover, W. G. Constant-pressure equations of motion. Phys. Rev. A 34, 2499–2500 (1986).

231. Hoover, W. G. Canonical dynamics: Equilibrium phase-space distributions. Phys. Rev. A 31, 1695–1697 (1985).

232. Andersen, H. C. Rattle: A “velocity” version of the shake algorithm for molecular dynamics calculations. J. Comput. Phys. 52, 24–34 (1983).

233. Ryckaert, J.-P., Ciccotti, G. & Berendsen, H. J. C. Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341 (1977).

234. Van Gunsteren, W. F. & Karplus, M. Effect of constraints on the dynamics of macromolecules. Macromolecules 15, 1528–1544 (1982).

235. Humphreys, D. D., Friesner, R. A. & Berne, B. J. A multiple-time-step Molecular Dynamics algorithm for macromolecules. J. Phys. Chem. (1994). doi:10.1021/j100078a035

236. Zhao, Y., Kormos, B. L., Beveridge, D. L. & Baranger, A. M. Molecular dynamics simulation studies of a protein-RNA complex with a selectively modified binding interface. Biopolymers (2006). doi:10.1002/bip.20408

237. Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926–935 (1983).

238. Mahoney, M. W. & Jorgensen, W. L. A five-site model for liquid water and the reproduction of the density anomaly by rigid, nonpolarizable potential functions. J. Chem. Phys. 112, 8910–8922 (2000).

239. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F. & Hermans, J. Interaction Models for Water in Relation to Protein Hydration. in Intermolecular Forces (1981). doi:10.1007/978-94-015-7658-1_21

240. Berendsen, H. J. C., Grigera, J. R. & Straatsma, T. P. The missing term in effective pair potentials. J. Phys. Chem. (1987). doi:10.1021/j100308a038

241. Glättli, A., Daura, X. & Van Gunsteren, W. F. Derivation of an improved simple point charge model for liquid water: SPC/A and SPC/L. J. Chem. Phys. (2002). doi:10.1063/1.1476316

242. Glättli, A., Daura, X. & Van Gunsteren, W. F. A novel approach for designing simple point charge models for liquid water with three interaction sites. J. Comput. Chem. (2003). doi:10.1002/jcc.10235

243. Darden, T., York, D. & Pedersen, L. Particle mesh Ewald: An N·log(N) method for Ewald sums in large systems. J. Chem. Phys. (1993). doi:10.1063/1.464397

244. Essmann, U. et al. A smooth particle mesh Ewald method. J. Chem. Phys. (1995). doi:10.1063/1.470117

245. Wang, W., Donini, O., Reyes, C. M. & Kollman, P. A. Biomolecular Simulations: Recent Developments in Force Fields, Simulations of Enzyme Catalysis, Protein-Ligand, Protein-Protein, and Protein-Nucleic Acid Noncovalent Interactions. Annu. Rev. Biophys. Biomol. Struct. (2002). doi:10.1146/annurev.biophys.30.1.211

246. Kollman, P. A. et al. Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Acc. Chem. Res. (2000). doi:10.1021/ar000033j

247. Srinivasan, J., Cheatham, T. E., Cieplak, P., Kollman, P. A. & Case, D. A. Continuum solvent studies of the stability of DNA, RNA, and phosphoramidate-DNA helices. J. Am. Chem. Soc. (1998). doi:10.1021/ja981844+

248. McQuarrie, D. A. Statistical Mechanics. (Harper & Row, 1975).

249. Gohlke, H., Kiel, C. & Case, D. A. Insights into protein-protein binding by binding free energy calculation and free energy decomposition for the Ras-Raf and Ras-RalGDS complexes. J. Mol. Biol. (2003). doi:10.1016/S0022-2836(03)00610-7

250. Fersht, A. R. Basis of biological specificity. Trends Biochem. Sci. (1984). doi:10.1016/0968-0004(84)90122-1

251. Fersht, A. R. et al. Hydrogen bonding and biological specificity analysed by protein engineering. Nature (1985). doi:10.1038/314235a0

252. Street, I. P., Armstrong, C. R. & Withers, S. G. Hydrogen Bonding and Specificity. Fluorodeoxy Sugars as Probes of Hydrogen Bonding in the Glycogen Phosphorylase-Glucose Complex. Biochemistry (1986). doi:10.1021/bi00368a028

253. Ewing, T. J. A. & Kuntz, I. D. Critical evaluation of search algorithms for automated molecular docking and database screening. Journal of Computational Chemistry (1997). doi:10.1002/(SICI)1096-987X(19970715)18:9<1175::AID-JCC6>3.0.CO;2-O

254. Böhm, H. J. The computer program LUDI: A new method for the de novo design of enzyme inhibitors. J. Comput. Aided. Mol. Des. (1992). doi:10.1007/BF00124387

255. Böhm, H. J. LUDI: rule-based automatic design of new substituents for enzyme inhibitor leads. J. Comput. Aided. Mol. Des. (1992). doi:10.1007/BF00126217

256. Rarey, M., Kramer, B., Lengauer, T. & Klebe, G. A fast flexible docking method using an incremental construction algorithm. J. Mol. Biol. (1996). doi:10.1006/jmbi.1996.0477

257. Mizutani, M. Y., Tomioka, N. & Itai, A. Rational automatic search method for stable docking models of protein and ligand. J. Mol. Biol. (1994). doi:10.1006/jmbi.1994.1656

258. Miller, M. D., Kearsley, S. K., Underwood, D. J. & Sheridan, R. P. FLOG: A system to select ‘quasi-flexible’ ligands complementary to a receptor of known three-dimensional structure. J. Comput. Aided. Mol. Des. (1994). doi:10.1007/BF00119865

259. Venkatachalam, C. M., Jiang, X., Oldfield, T. & Waldman, M. LigandFit: A novel method for the shape-directed rapid docking of ligands to protein active sites. J. Mol. Graph. Model. (2003). doi:10.1016/S1093-3263(02)00164-X

260. Trosset, J. Y. & Scheraga, H. A. PRODOCK: Software package for protein modeling and docking. J. Comput. Chem. (1999). doi:10.1002/(SICI)1096-987X(199903)20:4<412::AID-JCC3>3.0.CO;2-N

261. Liu, M. & Wang, S. MCDOCK: A Monte Carlo simulation approach to the molecular docking problem. J. Comput. Aided. Mol. Des. (1999). doi:10.1023/A:1008005918983

262. Genetic algorithms in search, optimization, and machine learning. Choice Rev. Online (2013). doi:10.5860/choice.27-0936

263. Morris, G. M. et al. AutoDock Version 4.2. Citeseer (2012).

264. Jones, G., Willett, P., Glen, R. C., Leach, A. R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. (1997). doi:10.1006/jmbi.1996.0897

265. Verdonk, M. L., Cole, J. C., Hartshorn, M. J., Murray, C. W. & Taylor, R. D. Improved protein-ligand docking using GOLD. Proteins Struct. Funct. Genet. (2003). doi:10.1002/prot.10465

266. Morris, G. M. et al. Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J. Comput. Chem. (1998). doi:10.1002/(SICI)1096-987X(19981115)19:14<1639::AID-JCC10>3.0.CO;2-B

267. Eldridge, M. D., Murray, C. W., Auton, T. R., Paolini, G. V. & Mee, R. P. Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J. Comput. Aided. Mol. Des. (1997). doi:10.1023/A:1007996124545

268. Wang, R., Lai, L. & Wang, S. Further development and validation of empirical scoring functions for structure-based binding affinity prediction. J. Comput. Aided. Mol. Des. (2002). doi:10.1023/A:1016357811882

269. Gilliland, G. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

270. Muegge, I. Effect of Ligand Volume Correction on PMF Scoring. J. Comput. Chem. (2001). doi:10.1002/1096-987X(200103)22:4<418::AID-JCC1012>3.0.CO;2-3

271. Muegge, I. & Martin, Y. C. A general and fast scoring function for protein-ligand interactions: A simplified potential approach. J. Med. Chem. (1999). doi:10.1021/jm980536j

272. Gohlke, H., Hendlich, M. & Klebe, G. Knowledge-based scoring function to predict protein-ligand interactions. J. Mol. Biol. (2000). doi:10.1006/jmbi.1999.3371

273. Velec, H. F. G., Gohlke, H. & Klebe, G. DrugScoreCSD-knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J. Med. Chem. (2005). doi:10.1021/jm050436v

274. Clark, R. D., Strizhev, A., Leonard, J. M., Blake, J. F. & Matthew, J. B. Consensus scoring for ligand/protein interactions. in Journal of Molecular Graphics and Modelling (2002). doi:10.1016/S1093-3263(01)00125-5

275. Charifson, P. S., Corkery, J. J., Murcko, M. A. & Walters, W. P. Consensus scoring: A method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J. Med. Chem. (1999). doi:10.1021/jm990352k

276. Oldfield, T. J. X-ligand: An application for the automated addition of flexible ligands into electron density. Acta Crystallogr. Sect. D Biol. Crystallogr. (2001). doi:10.1107/S0907444901003894

277. Oldfield, T. J. A number of real-space torsion-angle refinement techniques for proteins, nucleic acids, ligands and solvent. Acta Crystallogr. Sect. D Biol. Crystallogr. (2001). doi:10.1107/S0907444900014098

278. Lifson, S. & Warshel, A. Consistent Force Field for Calculations of Conformations, Vibrational Spectra, and Enthalpies of Cycloalkane and n-Alkane Molecules. J. Chem. Phys. (1968). doi:10.1063/1.1670007

279. Mayo, S. L., Olafson, B. D. & Goddard, W. A. DREIDING: A generic force field for molecular simulations. J. Phys. Chem. (1990). doi:10.1021/j100389a010

280. Luty, B. A. et al. A molecular mechanics/grid method for evaluation of ligand–receptor interactions. J. Comput. Chem. (1995). doi:10.1002/jcc.540160409

281. Korb, O., Stützle, T. & Exner, T. E. Empirical scoring functions for advanced Protein-Ligand docking with PLANTS. J. Chem. Inf. Model. (2009). doi:10.1021/ci800298z

282. Krammer, A., Kirchhoff, P. D., Jiang, X., Venkatachalam, C. M. & Waldman, M. LigScore: A novel scoring function for predicting binding affinities. J. Mol. Graph. Model. (2005). doi:10.1016/j.jmgm.2004.11.007

283. Rogers, D. & Hopfinger, A. J. Application of Genetic Function Approximation to Quantitative Structure-Activity Relationships and Quantitative Structure-Property Relationships. J. Chem. Inf. Comput. Sci. (1994). doi:10.1021/ci00020a020

284. Levitt, M. Protein folding by restrained energy minimization and molecular dynamics. J. Mol. Biol. (1983). doi:10.1016/S0022-2836(83)80129-6

285. Wermuth, C. G. Pharmacophores: Historical Perspective and Viewpoint from a Medicinal Chemist. in Pharmacophores and Pharmacophore Searches (2006). doi:10.1002/3527609164.ch1

286. Arnold, K., Bordoli, L., Kopp, J. & Schwede, T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22, 195–201 (2006).

287. Pande, V. S. et al. Mechanistic and structural insight into the functional dichotomy between interleukin-2 and interleukin-15. Nat. Immunol. 13, 1187–1195 (2012).

288. Walter, T. S. et al. Lysine methylation as a routine rescue strategy for protein crystallization. Structure 14, 1617–1622 (2006).

289. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8. (2015).

290. Jacobson, M. P., Friesner, R. A., Xiang, Z. & Honig, B. On the Role of the Crystal Environment in Determining Protein Side-chain Conformations. J. Mol. Biol. 320, 597–608 (2002).

291. Jacobson, M. P. et al. A hierarchical approach to all-atom protein loop prediction. Proteins Struct. Funct. Bioinforma. 55, 351–367 (2004).

292. Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions. J. Chem. Theory Comput. 7, 525–537 (2011).

293. Søndergaard, C. R., Olsson, M. H. M., Rostkowski, M. & Jensen, J. H. Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values. J. Chem. Theory Comput. 7, 2284–2295 (2011).

294. Wang, X., Rickert, M. & Garcia, K. C. Structure of the Quaternary Complex of Interleukin-2 with Its α, β, and γc Receptors. Science 310, 1159–1163 (2005).

295. Chirifu, M. et al. Crystal structure of the IL-15–IL-15Rα complex, a cytokine-receptor unit presented in trans. Nat. Immunol. 8, 1001 (2007).

296. Stauber, D. J., Debler, E. W., Horton, P. A., Smith, K. A. & Wilson, I. A. Crystal structure of the IL-2 signaling complex: paradigm for a heterotrimeric cytokine receptor. Proc. Natl. Acad. Sci. U. S. A. 103, 2788–2793 (2006).

297. Wang, X., Rickert, M. & Garcia, K. C. Structure of the Quaternary Complex of Interleukin-2 with Its α, β, and γc Receptors. Science 310, 1159–1163 (2005).

298. Jo, S., Kim, T., Iyer, V. G. & Im, W. CHARMM-GUI: A web-based graphical user interface for CHARMM. J. Comput. Chem. 29, 1859–1865 (2008).

299. Brooks, B. R. et al. CHARMM: The biomolecular simulation program. J. Comput. Chem. 30, 1545–1614 (2009).

300. Huang, J. & MacKerell, A. D. Jr. CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. J. Comput. Chem. 34, 2135–2145 (2013).

301. Humphrey, W., Dalke, A. & Schulten, K. VMD: Visual molecular dynamics. J. Mol. Graph. 14, 33–38 (1996).

302. Welch, B. B., Jones, K. & Hobbs, J. Practical Programming in Tcl/Tk. (Prentice Hall Professional, 2003).

303. Case, D. A. et al. AMBER 2018: San Francisco. (2018).

304. Chen, J., Sawyer, N. & Regan, L. Protein–protein interactions: General trends in the relationship between binding affinity and interfacial buried surface area. Protein Sci. 22, 510–515 (2013).

305. Quéméner, A. et al. Discovery of a Small-Molecule Inhibitor of Interleukin 15: Pharmacophore-Based Virtual Screening and Hit Optimization. J. Med. Chem. 60, 6249–6272 (2017).

306. Discovery Studio 3.5. (Accelrys Software Inc., San Diego, 2013).

307. Discovery Studio Visualizer, Release 3.5. (Accelrys Inc., San Diego, CA, USA, 2012).

308. Gehlhaar, D. K. et al. Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary programming. Chem. Biol. 2, 317–324 (1995).

309. Jain, A. N. Scoring functions for protein-ligand docking. Curr. Protein Pept. Sci. 7, 407–420 (2006).

French title: Nouvelles données structurales sur l’Interleukine-15 par simulations de Dynamique Moléculaire : vers la mise au point rationnelle d’inhibiteurs spécifiques (New structural data on Interleukin-15 from Molecular Dynamics simulations: towards the rational design of specific inhibitors)

Keywords (French version): Interleukin-15 (IL-15), Protein-Protein Interactions (PPI), Molecular Dynamics (MD), Pharmacophore Filtering, Molecular Docking, Virtual Screening

Abstract (French version): Interleukin-15 (IL-15) is a cytokine involved in a large number of cellular functions. In particular, it takes part in the development and activation of the immune response. IL-15 has therefore emerged as a potential target for various therapeutic applications. The structure of this cytokine is based on a quaternary complex between IL-15 and its α (IL-15Rα), β (IL-2Rβ) and γ (γc) receptors. The functional modulation of IL-15 is linked to its interaction with these receptors, in particular with IL-2Rβ. Since interleukin-2 (IL-2) shares two of its three receptors (IL-2Rβ and γc) with IL-15, the search for specific IL-15 inhibitors must take these features into account.

In this work, using Molecular Modeling approaches, in particular Molecular Dynamics (MD), we have: (i) determined the influence of the complexed form of IL-15 (dimer, trimer or tetramer) on the properties of the interfaces; (ii) identified the key amino acids of the various interfaces; (iii) studied the impact of mutating some of these amino acids; (iv) used this information to develop a pharmacophore that subsequently enabled the discovery of new low-molecular-weight compounds able to specifically target one of the interfaces (IL-15/IL-2Rβ). All the data from this work were compared with biological results obtained within the project.

Title: Structural insights on Interleukin-15 through Molecular Dynamics simulations: Towards the rational design of specific inhibitors

Keywords: Interleukin-15 (IL-15), Protein-Protein Interactions (PPI), Molecular Dynamics (MD), Pharmacophore Filtering, Molecular Docking, Virtual Screening

Abstract: Interleukin-15 (IL-15) is a cytokine involved in a plethora of different cellular functions. It participates, for instance, in the development and activation of immune responses. IL-15 has therefore emerged as a potential target for several therapeutic applications. The structure of this cytokine is based on a quaternary complex between IL-15 and its α (IL-15Rα), β (IL-2Rβ) and γ (γc) receptors. The key to the functional modulation of IL-15 lies in its interaction with its receptors and, more particularly, with IL-2Rβ. Because interleukin-2 (IL-2) shares two of its three receptors (IL-2Rβ and γc) with IL-15, the search for specific IL-15 inhibitors has to take these features into account.

In this work, through various Molecular Modeling approaches, specifically Molecular Dynamics (MD) simulations, we have (i) determined the influence of the complexed form of IL-15 (dimer, trimer or tetramer) on the interface properties; (ii) highlighted the key amino acids (“hot spots”) of the various interfaces; (iii) studied the impact of mutations of selected residues; (iv) used this information to design a pharmacophore which has allowed, in a subsequent step, the discovery of new low-molecular-weight compounds able to specifically target one of the IL-15 interfaces (IL-15/IL-2Rβ). The theoretical data have been compared to the results of biological experiments carried out in the framework of the project.