Molecular Mechanics/Coarse-grain simulations as a structural prediction tool
for GPCRs/ligand complexes
Francesco Musiani 1, Alejandro Giorgetti 2,3,4,* , Paolo Carloni 3,4,*
1 Scuola Internazionale Superiore di Studi Avanzati (SISSA/ISAS), Trieste, Italy.
2 Department of Biotechnology, University of Verona, Ca’ Vignal 1, Verona, Italy.
3 Computational Biophysics, German Research School for Simulation Sciences, Jülich,
Germany.
4 Institute for Advanced Simulation, Forschungszentrum Jülich, Jülich, Germany.
* To whom correspondence should be addressed: Alejandro Giorgetti, Department of
Biotechnology, University of Verona, Ca’ Vignal 1, Strada le Grazie 15, I-37134 Verona, Italy.
Phone: +39 045 8027905, Fax: +39 045 8027929, Mail: [email protected]. Paolo
Carloni, Computational Biophysics, German Research School for Simulation Sciences, Wilhelm-
Johnen-Straße, D-52425 Jülich, Germany, and Institute for Advanced Simulation,
Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany; Phone: +49 2461
618941; Fax: +49 2461 614823; Mail: [email protected].
Abstract
G-protein coupled receptors (GPCRs) are the most common family of transmembrane receptors in humans. Bioinformatics-based approaches have provided accurate structural predictions of these proteins in complex with their agonists/antagonists when a reliable template could be identified. Unfortunately, in the majority of cases the sequence identity between a GPCR and its closest available template is below 20%. In these cases, the target selection and alignment required for homology modelling are nontrivial, and subsequent standard docking procedures may suffer from severe limitations. A hybrid “Molecular Mechanics/Coarse-Grained” (MM/CG) scheme, developed by some of us and reviewed here, has been shown to improve the quality of structural predictions in a few cases and holds promise for high-throughput investigations of GPCR/ligand complexes that lack a highly reliable structural template.
1. Introduction
G protein-coupled receptors (GPCRs) form the largest family of membrane-bound receptors expressed in humans (encompassing ca. 4% of the protein-coding genome) (Schoneberg et al. 2004). They are of paramount importance for pharmaceutical intervention (ca. 40% of currently marketed drugs target GPCRs) (Overington, Al-Lazikani, and Hopkins 2006). GPCRs are located in the plasma membrane and transduce signals through their interactions with both extracellular ligands (or light, in the case of rhodopsin) and intracellular heterotrimeric guanine nucleotide-binding proteins (G proteins), initiating signalling cascades that allow cells to react to changes in their environment (Audet and Bouvier 2012). The resulting responses regulate a broad range of cellular processes, including cell proliferation, differentiation, motility and apoptosis. Chemical and light sensing also rely on GPCR signalling. These proteins are also involved in a plethora of inflammatory diseases (Sun and Ye 2012), cardiovascular diseases, neurological disorders and cancer (Dorsam and Gutkind 2007).
2. Structural determinants of GPCRs
GPCRs share a common scaffold comprising an extracellular N-terminus (N-term), followed by seven transmembrane (TM) α-helices (TM1 to TM7) connected by intracellular (IL) and extracellular (EL) loops, and an intracellular C-terminus (C-term) (Fig. 1A) (Venkatakrishnan et al. 2013). The tertiary structure of GPCRs resembles a barrel, with the seven transmembrane helices forming a cavity within the plasma membrane that serves as the ligand-binding domain and is often covered by EL2. In several cases, GPCRs exist as homo- or hetero-dimers or higher-order oligomers during their life cycle in vivo (Gurevich and Gurevich 2008).
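The topology described above can be written down as an ordered list of segments. The following minimal Python sketch is purely illustrative (the function name `gpcr_topology` is hypothetical): it generates the canonical N-term, TM1, IL1, TM2, EL1, ..., TM7, C-term order, in which each helix crosses the membrane once, so that the connecting loops alternate between the intracellular and extracellular sides.

```python
def gpcr_topology():
    """Return the canonical order of GPCR segments from N- to C-terminus.

    Seven TM helices alternate with three intracellular (IL) and three
    extracellular (EL) loops; the N-terminus is extracellular and the
    C-terminus intracellular.
    """
    segments = ["N-term"]
    for i in range(1, 8):
        segments.append(f"TM{i}")
        if i < 7:
            # TM1, TM3 and TM5 end on the cytoplasmic side, so the loop that
            # follows them is intracellular; TM2, TM4 and TM6 end outside.
            kind = "IL" if i % 2 == 1 else "EL"
            segments.append(f"{kind}{(i + 1) // 2}")
    segments.append("C-term")
    return segments


if __name__ == "__main__":
    topo = gpcr_topology()
    # EL2, which often covers the binding cavity, connects TM4 and TM5.
    print(topo[topo.index("TM4") - 1 : topo.index("TM5") + 1])
```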
Currently (as of October 2013), the PDB contains 23 unique experimental GPCR structures, 18 of which are from Homo sapiens (as reported on the http://blanco.biomol.uci.edu/mpstruc web site) (Rosenbaum, Rasmussen, and Kobilka 2009; Topiol and Sabio 2009; Sprang 2011; Venkatakrishnan et al. 2013). Twenty of them belong to the rhodopsin family, one to the frizzled/taste2 family (Wang et al. 2013) and two to the secretin family (Hollenstein et al. 2013; Siu et al. 2013) (Fig. 1B). A major effort is currently under way to extend our structural knowledge of this receptor superfamily [e.g. the GPCR Network (Stevens et al. 2013)].
--- Place figure 1 here ---
Extensive molecular dynamics (MD) simulations based on the crystal structures available so far have often been used to gain insights into the dynamical properties of GPCRs [see refs. (Vanni et al. 2009, 2010, 2011; Dror et al. 2011) for some examples on the adrenergic receptors and refs. (Scarabelli et al. 2013; Provasi, Johnston, and Filizola 2010; Provasi, Bortolato, and Filizola 2009; Yuan, Vogel, and Filipek 2013) for the opioid receptors]. When no crystal structure is available, homology modelling is the method of choice for the structural characterization of GPCRs.
3. Predictions based on templates from the rhodopsin subfamily
Several excellent bioinformatics studies have elucidated structure/function relationships of members of the rhodopsin subfamily (Petrel et al. 2004; de Graaf and Rognan 2009; Bhattacharya et al. 2010; Niv et al. 2006; Niv and Filizola 2008; Ivanov, Barak, and Jacobson 2009; Kufareva et al. 2011)1.
1 Because of the large number of studies, this Section cannot be exhaustive and only some of them are reported.
The GPCR Dock assessments (Michino et al. 2009; Kufareva et al. 2011) are community-wide, blind structural predictions of agonists/antagonists in complex with GPCRs (so far, human members of the rhodopsin subfamily). The predictions are then compared with the X-ray structures, which are released after the assessment. In the first edition, GPCR Dock 2008 (Michino et al. 2009), twenty-nine groups predicted the structure of the human A2A adenosine receptor bound to the ligand ZM241385. Precise modelling of the extracellular loops, together with the location of the disulphide bond and an accurate alignment of the TM regions, turned out to be crucial for an accurate prediction. In the last reported competition, GPCR Dock 2010 (Kufareva et al. 2011), thirty-five groups predicted the structures of two GPCRs. The first was the dopamine D3 receptor in complex with the antagonist eticlopride. Its structural determinants were predicted fairly well using the adrenergic receptors as templates (sequence identity ca. 40%). The second target was CXCR4 in complex with the isothiourea antagonist IT1t and the cyclic peptide antagonist CVX15. The available structural templates are distant homologues and, not unexpectedly, the prediction was less accurate. This shows that, in the absence of a suitable template, GPCR modelling still remains very challenging. It will be highly interesting to see whether the accuracy of the predictions increases significantly in the latest competition, run this year (http://gpcr.scripps.edu/GPCRDock2013).
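Blind predictions of this kind are scored by comparing the predicted complex with the X-ray structure, and one common measure for the ligand is the heavy-atom root-mean-square deviation (RMSD) between the predicted and the experimental pose. The sketch below illustrates that metric only; it is not the assessment's actual protocol. It assumes the two poses are already expressed in the same frame (e.g. after superimposing the receptors) and that atoms are matched one-to-one, so no symmetry correction or re-alignment is applied. The function name `ligand_rmsd` is hypothetical.

```python
import math


def ligand_rmsd(predicted, reference):
    """Heavy-atom RMSD between two ligand poses (same units as the input, e.g. Å).

    Assumes both poses are already in the same reference frame and list
    matching atoms in the same order; no superposition is performed here.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("poses must be non-empty and have matching atom counts")
    # Sum of squared coordinate differences over all atoms and x, y, z.
    sq = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(sq / len(predicted))


if __name__ == "__main__":
    # Toy two-atom pose in which every atom is shifted by 1 Å along z.
    pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    xray = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(ligand_rmsd(pred, xray))  # 1.0
```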
The power of state-of-the-art structural predictions is illustrated, for instance, by a recent study by Gutiérrez-de-Terán and co-workers (Rodriguez, Pineiro, and Gutierrez-de-Teran 2011). Comparison with experiments shows that their prediction of the binding cavity of the A2B adenosine receptor is very accurate. The sequence identity (SI) between the target and the template (the A2A adenosine receptor) was around 60%. This approach was recently implemented in the GPCR-ModSim web server (Gutierrez-de-Teran, Bello, and Rodriguez 2013). The same procedure was also used for the successful structural prediction of the human neuropeptide Y receptor Y2 (Fallmar et al. 2011). In another relevant example, Carlsson and collaborators docked over 3.3 million molecules against a homology model of the dopamine D3 receptor before its crystal structure was solved (Carlsson et al. 2011). They then experimentally tested the 26 top-ranked molecules. One of these novel ligands was subsequently optimized and pursued as a potential drug candidate. This shows that predictions may be reliable enough for drug design when based on a template of the same subfamily (Carlsson et al. 2011).
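At the ranking step, the virtual screening workflow described for the D3 receptor (dock a large library, then pick the best-ranked compounds for experimental testing) is a top-k selection over docking scores. The minimal sketch below assumes that lower (more negative) scores are more favourable, as in many physics-based scoring functions; the function name `top_ranked`, the compound identifiers and the score values are made up for illustration.

```python
import heapq


def top_ranked(scores, k):
    """Return the identifiers of the k compounds with the lowest docking scores.

    `scores` maps a compound identifier to its docking score; lower is assumed
    to be better. The result is ordered from best to worst.
    """
    best = heapq.nsmallest(k, scores.items(), key=lambda item: item[1])
    return [name for name, _ in best]


if __name__ == "__main__":
    # Hypothetical scores for a tiny library.
    scores = {"cpd1": -30.2, "cpd2": -45.1, "cpd3": -12.0, "cpd4": -41.7}
    print(top_ranked(scores, 2))  # ['cpd2', 'cpd4']
```

Using `heapq.nsmallest` avoids sorting the whole library when only a handful of candidates (e.g. 26) is needed, which matters when the library contains millions of entries.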
4. Predictions based on other templates
The SI between most GPCRs and their best templates for homology modelling is lower than 20% (Rayan 2010). Such receptors include all olfactory and taste receptors (overall, more than 400 receptors). At such low SI, side-chain orientations, including those in the binding site, are bound to be poorly predicted (Eswar et al. 2007). This problem, along with the difficulties associated with the target selection and alignment required for homology modelling, as well as the limitations of docking procedures1, calls for experimental validation (Khafizov et al. 2007; Biarnes et al. 2010; Carlsson et al. 2011; Levit et al. 2012; Mobarec, Sanchez, and Filizola 2009; Yarnitzky, Levit, and Niv 2010; Brockhoff et al. 2010; Slack et al. 2010; Marchiori et al. 2013)2 and/or structural refinement based on molecular simulation. The next Section focuses on our effort to address this issue.
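Because the argument hinges on sequence identity, it may help to make the quantity concrete. The sketch below computes percent identity from a pairwise alignment. It assumes gaps are written as '-' and counts identity only over positions where neither sequence is gapped; this is one of several conventions in use (others divide by the length of the shorter sequence or of the whole alignment), so numbers can differ between tools. The 20% cut-off mirrors the figure quoted in the text; the names `sequence_identity`, `is_low_si` and `LOW_SI_THRESHOLD` are hypothetical.

```python
LOW_SI_THRESHOLD = 20.0  # per cent; the rough cut-off quoted in the text


def sequence_identity(aligned_target, aligned_template):
    """Percent identity between two aligned sequences ('-' marks a gap).

    Identity is computed over aligned positions where neither sequence
    has a gap.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_target, aligned_template)
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def is_low_si(si):
    """Flag target/template pairs at about 20% SI or lower."""
    return si <= LOW_SI_THRESHOLD


if __name__ == "__main__":
    si = sequence_identity("ACDEFGHIK", "ACDQFG-IR")
    print(si, is_low_si(si))  # 75.0 False
```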
5. A simulation approach to structural predictions of targets with low SI to their templates
As discussed in the previous Section, we basically do not know where side chains are located when the SI between template and target is about 20% or lower (Tramontano et al. 2008). Hence, it might actually be better not to include them in the model at all than to include them in the wrong orientations. With this in mind, we have developed a computational tool aimed at improving the quality of structural predictions of GPCR/ligand complexes. This is a hybrid “Molecular Mechanics/Coarse-Grained” (MM/CG) scheme, in which different parts of the system are modelled at different levels of theory, with care taken to describe the coupling at their interface suitably (Neri et al. 2005; Neri et al. 2008; Leguebe et al. 2012).
1 Standard, automatic docking procedures applied to homology models built on such templates, such as those used in refs. (Garcia-Perez et al. 2011; Kothandan, Gadhe, and Cho 2012), may suffer from severe limitations. These include neglecting the presence of explicit solvent (Camacho 2005). This is particularly important for GPCRs, as water molecules can be found in the binding site of these receptors and may be crucial for stabilizing the ligand (Angel, Chance, and Palczewski 2009; Nygaard et al. 2010).
2 One may identify residues that are important for ligand binding and validate the predictions with agonist/antagonist binding assays on mutants of the target GPCR (Costanzi 2013; Marchiori et al. 2013).
In other words, the ligand, the binding site and the surrounding water molecules are treated with an atomistic force field, whilst the rest of the protein frame is described at the CG level using a Go-like model (Go and Abe 1981) (Fig. 2). This model includes only the Cα atoms of the protein, which makes the method much cheaper computationally than full-atom MD simulations (Leguebe et al. 2012).
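The split into an atomistic region around the ligand and a Cα-only CG region, with an interface (I) layer between them, can be pictured as a distance-based assignment of residues. The sketch below is a hypothetical illustration, not the published MM/CG setup: it uses the distance from each residue's Cα atom to the nearest ligand atom and placeholder cut-offs of 6 Å and 10 Å, and the function name `partition_residues` is invented for this example.

```python
import math


def partition_residues(ca_coords, ligand_coords, r_mm=6.0, r_interface=10.0):
    """Assign each residue to the MM, interface (I) or CG region.

    ca_coords: dict mapping residue id -> (x, y, z) of its C-alpha atom, in Å.
    ligand_coords: list of (x, y, z) ligand atom positions, in Å.
    r_mm, r_interface: placeholder cut-offs (hypothetical, not published values).
    """
    regions = {}
    for res_id, ca in ca_coords.items():
        # Distance from this C-alpha to the closest ligand atom.
        d = min(math.dist(ca, atom) for atom in ligand_coords)
        if d <= r_mm:
            regions[res_id] = "MM"  # atomistic binding-site residue
        elif d <= r_interface:
            regions[res_id] = "I"  # interface layer coupling MM and CG
        else:
            regions[res_id] = "CG"  # C-alpha-only Go-like description
    return regions


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0)]
    cas = {"A1": (3.0, 0.0, 0.0), "A2": (8.0, 0.0, 0.0), "A3": (20.0, 0.0, 0.0)}
    print(partition_residues(cas, ligand))  # {'A1': 'MM', 'A2': 'I', 'A3': 'CG'}
```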
--- Place figure 2 here ---
Theory of the MM/CG method. The potential energy function in the MM/CG scheme reads:
$$E = E_{MM} + E_{I} + E_{I/MM} + E_{CG} + E_{CG/I} \qquad \text{(Eq. 1)}$$
where $E_{MM}$, $E_{I}$ and $E_{CG}$ are the potential energies of the all-atom (MM) region, the interface (I) and the coarse-grained (CG) region, respectively. $E_{I/MM}$ and $E_{CG/I}$ describe the interaction energy between the interface and the MM region and that between the interface and the CG region, respectively. $E_{MM}$, $E_{I}$ and $E_{I/MM}$ take the form of the GROMOS96 force field (Scott et al. 1999), whereas $E_{CG}$ and $E_{CG/I}$ take the form of the Go-like model. $E_{CG/I}$ ensures the integrity of the protein backbone: it includes the bonded interactions between the CG atoms and the Cα atoms in the interface, as well as the non-bonded interactions between the CG atoms and the Cα and Cβ atoms in the interface. The CG region is described by the Go-like model used in the MM/CG scheme, whose energy term reads: