ANALYSIS doi:10.1038/nature14663

Universal allosteric mechanism for Ga activation by GPCRs Tilman Flock1, Charles N. J. Ravarani1*, Dawei Sun2,3*, A. J. Venkatakrishnan1{, Melis Kayikci1, Christopher G. Tate1, Dmitry B. Veprintsev2,3 & M. Madan Babu1

G -coupled receptors (GPCRs) allosterically activate heterotrimeric G and trigger GDP release. Given that there are 800 human GPCRs and 16 different Ga , this raises the question of whether a universal allosteric mechanism governs Ga activation. Here we show that different GPCRs interact with and activate Ga proteins through a highly conserved mechanism. Comparison of Ga with the small Ras reveals how the evolution of short segments that undergo disorder-to-order transitions can decouple regions important for allosteric activation from receptor binding specificity. This might explain how the GPCR–Ga system diversified rapidly, while conserving the allosteric activation mechanism.

proteins bind guanine nucleotides and act as molecular switches almost 30 A˚ away from the GDP binding region5 and allosterically trig- in a number of signalling pathways by interconverting between ger GDP release to activate them. 1,2 G a GDP-bound inactive and a GTP-bound active state . They The high-resolution structure of the Gas-bound b2-adrenergic recep- 3 5 consist of two major classes: monomeric small G proteins and hetero- tor (b2AR) provided crucial insights into the receptor–G protein inter- trimeric G proteins4. While small G proteins and the a-subunit (Ga)of face and conformational changes in Ga upon receptor binding6,7. Recent 6 8 heterotrimeric G proteins both contain a GTPase domain (G-domain), studies described dynamic regions in Gas and Gai , the importance of ˚ Ga contains an additional helical domain (H-domain) and also forms a displacement of helix 5 (H5) of Gas and Gat by up to 6 A into the complex with the Gb and Gc subunits. Although they undergo a similar receptor5, the extent of helical domain opening during GDP release9,10, 7,10 signalling cycle (Fig. 1), their activation differs in one important aspect. and identified residues that contribute to Gai activation . These stud- The guanine nucleotide exchange factors (GEFs) of small G proteins are ies focused on single, specific Ga proteins; however, in humans there are largely cytosolic proteins, whereas the GEFs of Ga proteins are usually 16 different Ga genes, with at least 21 isoforms4 that can be grouped into membrane-bound GPCRs. While GEFs of small G proteins interact four functional subfamilies (Gas,Gai,Gaq,Ga12), which each regulate 11 directly with the GDP binding region1,3, GPCRs bind to Ga at a site different signalling pathways . Although they belong to the same protein fold, they have diverged significantly in their sequence such that each Ga protein can be specifically activated by one or several of the ,800 human Pi 4 GAP-bound state Inactive state GPCRs . Thus, a fundamental question is whether there is a universal mechanism of allosteric activation that is conserved across all Ga protein GAP α GTPase domain γ β α 10 (G-domain) types . Allosteric communication in proteins is mediated through con-

α-helical domain formational changes, which are facilitated by the re-organization of non- 40 structures (H-domain) 11 structures covalent contacts between residues. Thus, studying these contacts can GDP provide detailed insights into the mechanism of allostery12–16.Onthebasis of a comprehensive analysis, here we propose that GPCRs interact and Active state GEF-bound state Extracellular activate Ga subunits through a conserved mechanism. We describe molecular details of the key structural transitions and pinpoint residues Effector GPCR α GPCR that constitute the ‘common core’ of Ga activation.

Intracellular γ β α Common Ga numbering and residue contact networks 25 structures α G We created a structural and sequence alignment of 80 Ga structures from diverse organisms and 973 sequences from 66 species that have a

GTP 1 structure GPCR–G protein system (auto-activating plant Ga proteins were not considered; Methods). To enable the comparison of any residue/posi- Figure 1 | Ga signalling states and activation. Heterotrimeric G proteins (1) tion between different Ga proteins, we devised the common Ga num- release GDP upon binding to a guanine nucleotide exchange factor (GEF), bering (CGN) system (Fig. 2a). The CGN provides an ‘address’ for every which are G-protein-coupled receptors, (2) bind GTP and recruit downstream effectors, and (3) hydrolyse GTP, promoted by a GTPase activating protein residue in the DSP format, referring to: (1) the domain (D); (2) the (GAP), leading to (4) the inactive, GDP-bound state. Structures of the Ga consensus secondary structure (S); and (3) the position (P) within the subunit (blue) bound to GDP ( (PDB) accession 1got; secondary structure element. For instance, phenylalanine 336 in Gai1 is G.H5.8 inactive state; top) and bound to the b2-AR (grey) (PDB 3sn6; active state; denoted as Phe336 as it is the eighth residue within the consensus bottom) are shown. helix H5 of the G-domain. The corresponding position in Gas2 is

1MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK. 2Laboratory of Biomolecular Research, Paul Scherrer Institut, 5232 Villigen, Switzerland. 3Department of Biology, ETH Zurich, 8039 Zurich, Switzerland. {Present address: Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, California 94305, USA. *These authors contributed equally to this work.

00 MONTH 2015 | VOL 000 | NATURE | 1

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

a α Secondary structure element (consensus) (e.g., H5) G s orthologue alignment GNAS2 α G G s2-chimp HN GNAL s Gα -rat ...... s2 H2 GNAI2 omain S elementosition S1 GNAI1 Human Gα alignment Gα -cow D S P S5 s2

S3 GNAI3 Gα ... S4 s2

S6 H3 H4 e.g. S2 GNAT2 Gα H5 G i2 GNAT1 i

... α 973 sequences G s2: G-domain (G) GNAT3 H1 + f α GNAO 66 organisms G i2: GNAZ 16 orthologue alignments f ... HG GNAQ HF α α G 13 G 13: omain (e.g., G) GNA11 Gq 16 human Gα genes D HA GNA14 4 Gα subfamilies Common Gα

HD GNA15 HE GNA12 G numbering GNA13 12 Position in the consensus HC

H-domain (H) HB secondary structure element (SSE) mapped www.mrc-lmb.cam.ac.uk/CGN to the human reference alignment

b PDB-11: 1zcb (Gα13) Residue contact network α G s2 for each of the 11 structures Gα PDB-2: 1gp2 (Gαi1) i2 PDB-1: 1got (Gαt1) PDB-11: 1got (Gα ) t1 Identify conserved ... Phe332 residues in all Gα s,i,q,13 Val Phe (561 complete sequences)

Gα Ala 13 e.g. G.H1.12 e.g. G.H5.8 Inactive Ile 72 highly conserved residues ... His G.H5.8 G.H5.8 11 structures, G 11 His53

PDB-1: 3sn6 (Gαs2) Xaa Phe Xaa Phe s 1,246 contacts - 326 residues Use CGN to establish topological equivalence ... Xaa Xaa α Ile Ile PDB-1: 1zcb (G 13) Phe359 His His Glu Phe GEF-bound one structure, G G.H1.12 G.H1.12 Pro State consensus State consensus network between PDB-25: 4gnk (Gαq) Ile residue contact network universially conserved residues His 260 long-range contacts (535 total) 43 contacts PDB-2: 1gia (Gαi1) between 188 residues (289 total) 44 residues PDB-1: 1azt ((Gαs2) His72

s,i,q 1,292 contacts - 318 residues Active

25 structures, G ...

PDB-3: 3cx7 (Gα13)

PDB-2: 4ekc (Gαq) PDB-1: 2gtp (Gαi1) i,q,13

α GAP-bound For example, residue interaction networks of 11 G For example, inactive state consensus network For example, inactive state consensus network

40 structures, G ... structures of the inactive state between residues conserved across all Gα subfamilies

>100,000 (2,436 unique) residue contacts 457 unique long-range residue contacts 79 unique long-range residue contacts 80 structures between 386 positions between 247 shared positions between 70 universally conserved residues

Figure 2 | The common Ga numbering (CGN) system and Ga conserved belonging to all the 4 subfamilies from 66 species allowed the identification of contact networks. a, Every position in a Ga is denoted by its domain (D), equivalent residues. An online webserver allows mapping of any Ga sequence the consensus secondary structure element where it is present (S), and position or structure to the CGN system (http://www.mrc-lmb.cam.ac.uk/CGN). (P) within the consensus SSE. Names of SSEs are shown in the cartoon; loops b, Computation of consensus residue contact networks between conserved are named with lowercase letters of the flanking SSEs (for example, h1ha; residues for different Ga signalling states. See Methods, Supplementary see Extended Data Fig. 1 for all SSEs). An alignment of all 973 Ga proteins Note and Supplementary Data.

Phe376G.H5.8. Loops are labelled in lowercase letters of their flanking the Supplementary Note and Supplementary Information. Below, we secondary structure elements (SSE); for example, s6h5 refers to the loop describe the major findings pertaining to Ga activation. We first focus connecting strand S6 with helix H5 (see Extended Data Fig. 1, Methods on the GPCR–Ga interface and then describe the molecular details of and Supplementary Note). A CGN mapping webserver is available at how the non-covalent contacts are re-organized and propagated to the http://www.mrc-lmb.cam.ac.uk/CGN. GDP binding pocket, leading to GDP release. The Ga structures were assigned to the four major signalling states (Fig. 1) and the non-covalent contacts between residues were calculated GPCR–Ga protein interface for each structure. We computed the consensus non-covalent residue Analysis of the buried surface area (BSA) and residue contacts between 5 ˚ 2 contacts from all Ga structures of the same signalling state. Using the the b2AR–Gas interface shows that H5 contributes ,70% (845 A ; CGN, we integrated information on evolutionary conservation for every 15 residues) of the total BSA. Other SSEs (s2s3, h4hg, H4, h4s6, S6) position and derived the consensus contacts mediated by universally cover ,20% (289 A˚ 2; 14 residues), and the amino-terminal membrane- conserved residues for each signalling state (Fig. 2b). We find that each anchored helix HN and its loop with strand S1 contribute ,10% step of the signalling cycle undergoes contact re-organization to variable (120 A˚ 2; 5 residues) of the total BSA (Fig. 3a). H5 is the key interface extents. Since these conformational changes involve conserved residues, element that contacts residues in transmembrane helices (TM3, TM5 5 the observed contact re-organization is likely to be universal for all Ga and TM6) and intracellular loops 2 and 3 (ICL2 and ICL3) of the b2AR . proteins. A description and additional interpretations are provided in Contacts from the other Ga regions are mainly restricted to ICL3 of the

2 | NATURE | VOL 000 | 00 MONTH 2015 G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH interface residues a b Highly conserved Total BSA GPCR: 1.0 SSE element 60 2 TM5 1,264 Å HN TM3 H5.8 TM3 H5.13 H5.20

40 H4 H5.25

TM6 H5.15 H5 S3.3

ICL2 20 Conserved 0.8 BSA (%) Others H5.11

0 H5.19 ICL3 ICL2 TM3 TM5ICL3 TM6 0.6 GPCRG TM5 HN TM6 HN H5 TM3

ICL3 Specificity conferring ICL2 S3.1 interface residues S1.2 H5.12 H4 H5.16 hns1.3 0.4 H5.23

H4 H5.17

H5 Sequence identity HNHN Gα H4.26 H5.26 0.2 Total BSA Gα: 60 1,255.4 Å2

40 h4s6.3

Variable 0 20 BSA (%) ~ 70% GPCR interface 0 0 0.2 0.4 0.6 0.8 1.0 G protein interface HN Other H4 H5 ExposedRelative BSA (%) Buried c

Inactive state GPCR−bound state 1 2 Inter 3 Intra 4

G.H5.1 2 4 6 8 10 12 14 16 18 20 22 24 26G.H5.1 2 4 6 8 10 12 14 16 18 20 22 24 26 5 Number of contacts Transmission module Interface module Transmission module Interface module Position on helix H5 Position on helix H5

TM5 TM6 TM3 ICL3 ICL2 H5 γ β α GPCR S3 H5 γ β α S2 H5 H5 S3 H1 S2

Disorder-to-order transition H1

Figure 3 | Helix 5 contains the conserved interface region and is comprised normalized BSA highlights the conserved and variable interface residues. of two modules. a, Inter Ga–GPCR residue contact network (inset) and buried c, Consensus contact rewiring between the inactive and the GPCR-bound state surface area (BSA) analysis of the heterotrimeric Gas–b2AR structure (3sn6). by H5 residues. Positions mediating intra Ga protein contacts (blue) and The line width between the nodes (SSE) denotes the number of consensus receptor-mediated contacts (red) are shown. The circle size represents the residue contacts. GPCR positions are denoted by extending the Ballesteros– number of contacts. H5 can be divided into transmission and interface module. Weinstein numbering system (note parts of ICL3 become extended TM5 in the The disorder-to-order transition of the H5 C-terminal region upon receptor active state of the receptor). b, Scatter-plot of Ga sequence conservation and binding and SSE contact rewiring are shown.

receptor. An analysis of the contacts of the Gat carboxy-terminal pep- The role of helix H5 in Ga activation tides bound to rhodopsin17–20 shows that conserved residues in H5 tend In the 79 structures of Ga not bound to a GPCR, the C-terminal residues to interact with the corresponding, topologically equivalent residues in of H5 are characterized by missing electron density (Extended Data (Supplementary Data). Fig. 2b). This region undergoes a disorder-to-order transition and 5,6,8,20,23 We mapped evolutionary conservation onto the b2AR–Gas interface extends H5 upon receptor binding as shown for Gas/t/i . and found that H5 is the only interface region that harbours residues Analysis of H5 from 561 full-length Ga homologues suggests that the that are highly conserved across species and Ga protein types (,27% of higher disorder propensity of the last eight residues compared to the rest H5; 7 residues; Fig. 3b). These residues have significantly higher BSA of H5 is a universal feature (Extended Data Fig. 2b). Within this dis- compared to the non-conserved H5 interface residues. Several highly ordered region, hydrophobic positions IleH5.15, LeuH5.20 and LeuH5.25 conserved Ga interface residues interact with conserved interface resi- contact the GPCR, fold into a helix and are conserved between human dues on the GPCR (Extended Data Fig. 2a). Computational energy proteins and the yeast homologue Gpa1 (,1,200 million years (Myr) calculations show that these residues make the highest interface energy ago). This highly conserved peptide motif24 is involved in the disorder- contribution, suggesting that they are important for complex formation to-order transition upon receptor binding and suggests that this struc- (Extended Data Fig. 2a). Thus, the universally conserved residues on H5 tural transition mediated by the three key conserved interface residues is might form the conserved ‘interaction hotspots’ for different Ga pro- likely to be a universal feature of all G proteins. teins to interact with their cognate receptors in a similar binding mode. To understand the effect of GPCR binding on contact re-organization We also found that two-thirds of the H5 residues are variable but half of within Ga, we analysed the consensus contacts in the inactive state (11 these (8 residues) still contact the receptor. Thus, H5 harbours distinct structures) and the active state (b2AR–Gas structure). Although we had sets of interface residues that are either conserved or variable across the only one structure of the receptor–Ga protein complex, identifying the different Ga proteins. The variable positions on H5 together with the differences between the contacts of the active state and the consensus other interface regions (Fig. 3b) could be important for selective coup- contacts of the inactive state allowed us to focus on residues that are ling to different receptors, as shown for individual Ga proteins21,22. This re-organized upon receptor binding (and hence likely to be relevant for suggests that the conserved Ga interface positions provide the basis for a all Ga types; Supplementary Note). In all inactive state structures, the common mode of receptor binding, while the variable positions might N-terminal part of H5 makes extensive contacts with a number of confer selectivity in receptor coupling (Supplementary Note). SSEs within the G-domain of Ga. Two universally conserved positions

00 MONTH 2015 | VOL 000 | NATURE | 3 G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

(PheH5.8 and ValH5.7) on H5 contact conserved positions in H1, S2 and flexible, the conserved consensus contacts between H1, GDP and the S3 (left lobe), and S5 and S6 (right lobe), respectively (Extended Data H-domain hinge region are lost; this results in the loss of a significant Fig. 3). In the GPCR-bound state, H5 loses 20% of its intra-Ga contacts fraction of all contacts made with GDP, thereby weakening its binding (primarily with H1), and gains 27 intermolecular residue contacts with affinity (Fig. 4b), and results in increasing the likelihood of H-domain the GPCR (Fig. 3c). Upon receptor binding, the H5 contacts with the opening. Since the entire sequence of H1, the contacting residues on H5, right lobe are not lost, but are re-organized to accommodate the struc- and the H-domain hinge positions are highly conserved across all Ga tural changes and might be important for the stability of the receptor- proteins, the mechanism of GDP release is likely to be universal for all bound complex (for discussion, see Supplementary Note and ref. 25). Ga proteins. Thus, H5 is composed of two highly conserved modules with distinct functions; that is, an interface module important for receptor binding, Universal mechanism of Ga activation and a transmission module that harbours intra-G-protein contacts, While variable interface residues in H5 and elsewhere allow specific which are re-organized upon receptor binding (Fig. 3c). binding to distinct GPCRs, we find that H5 primarily harbours con- In contrast to H5, the non-conserved interface regions from H4, h4s6 served positions that might allow a common mode of receptor binding and h4hg undergo less marked re-organization of intra-Ga contacts and a conserved mechanism of allosteric activation. The contact re- upon receptor binding (Extended Data Fig. 3). This suggests that the organization between conserved residues links the disorder-to-order conserved mechanism of allosteric activation is primarily mediated by transition of H5 upon receptor binding to a change in structural stability the movement of H5, thereby breaking the contacts between H5 and H1. of H1, ultimately leading to GDP release. More specifically, H5 is As the residues that form these contacts are conserved in all 16 Ga divided into: (1) The N-terminal transmission module, which forms a proteins, the described contact re-organization is likely to be universal p–p cluster linking H5, S2, S3 and H1 via universally conserved residues H5.8 H1.12 S2.6 S3.3 for all Ga proteins. Phe , His , Phe and Phe in the inactive state; and (2) the C-terminal interface module, which undergoes a disorder-to-order The role of helix H1 in Ga activation transition in the intracellular cavity of the receptor via universally con- In the inactive state, H1 acts as a structural ‘hub’ by linking different served positions. This structural transition results in a displacement of the H5 transmission module, thereby interrupting the p–p cluster. The functional regions of Ga. H1 contacts the N-terminal part of H5 (trans- H5.8 S3.3 mission module), H-domain and GDP through universally conserved re-organized residues in the cluster (Phe and Phe ) contact con- residues (Fig. 4a). In this manner, H1 links the H-domain and the GDP served residues within ICL2 of the receptor (extrapolated Ballesteros– b 5 binding site with the conserved residues in the H5 transmission module, Weinstein numbering: 3.58 and 3.57 of 2AR ), as confirmed recently a 26 p p which in turn is physically linked to the H5 interface module that binds for the CB2 receptor–G i complex . Since the conserved – cluster is the receptor. The conserved consensus contacts between H5 and H1 important for the structural integrity of H1, its disruption leads to an increased flexibility of H1. H1 has a central role in the inactive state by seem essential for the structural integrity of H1. Computational calcula- forming contacts both to GDP (Extended Data Fig. 4) via the Walker A tions of the per-residue contribution to protein stability for the 79 non- motif2,27 and to the H-domain (through a ‘cation–p hinge’ motif; Extended receptor-bound structures are consistent with their role in stabilizing H1 Data Fig. 5). Thus, the increased flexibility due to the partial unfolding of (Extended Data Fig. 4). Upon GPCR binding, the H5–H1 contacts are H1 facilitates GDP release and H-domain opening. The only other con- lost, affecting the structural integrity of H1. The contact-mediating served inter-domain contact is an ‘ionic latch’ between the C-terminal loop positions in H1 have missing electron density in the b AR–Ga structure5 2 of helix HG of the G-domain and the hChD loop of the H-domain. This and hydrogen/deuterium exchange experiments have shown that this contact is broken upon receptor binding, which might be a result of the region is dynamic in Ga upon receptor binding6. Upon becoming s reorganization of the right Ga lobe (Extended Data Fig. 3). In addition to H1, residues around HG and within the s6h5 loop a b GPCR (TCAT motif) contact the GDP (Extended Data Fig. 3). The conserved 15 conserved consensus guanine-contacting TCAT motif preserves many of its contacts within Gα contacts H5 Ga upon receptor binding, although TCAT-to-H1 contacts are lost and H4 H5 H4 S3 new contacts are formed between H5, S5, S6 and the TCAT motif (that s6h5 is, the re-organized right lobe). Likewise, residues that contact the gua- S3 s6h5 S2 s1h1 nosine moiety, including the interaction between the TCAT motif and HG HG HG, s1h1 (P-loop), S5 and S6, are not extensively re-organized during S2 0.4

hfs2 Ga activation. Whether this arrangement poises Ga for GTP binding

HF (which differs from GDP by a single phosphate and whose physiological GDP H1 H1 concentration exceeds GDP several-fold22), and whether GTP has the capacity to stabilize H1 on its own and trigger Ga release from the hfs2 hfs2 h1ha h1ha 0.2 receptor, remains to be addressed. An analysis of the GTP-bound Ga HF HF s5hg H1 reveals that the presence of the third phosphate facilitates additional Inactive state GPCR bound state S5 contacts with the switch regions (Extended Data Fig. 4c). s6h5 a Fraction of total consensus contacts with GDP Conceptually, the GPCR-bound G conformation can be considered GPCR HG Loss of structural 0 as a metastable Ga transition state that is stable only due to interactions integrity of H1 γ β α γ β α with the GPCR. Thus, the lost intra-Ga contacts between H1 and the H1 T A HG transmission module of H5 are compensated by the helix extension of TC regionHingeregion H5, the gained receptor interface contacts, and some re-organized con- tacts in the right lobe of Ga (H5 with S5, S6; see role of conserved Figure 4 | Helix H1 is the key SSE that contacts H5, GDP and the H-domain. Y320G.S6.2 in ref. 25). In this manner, H5 and H1 act as the primary a , Consensus SSE contacts involving H1. The line width between the nodes conduits of information transfer between the receptor interaction inter- (SSE) denotes the number of consensus residue contacts. Upon receptor binding, H5 is displaced and crucial contacts between H1 and H5 (dark blue) face (input) and the GDP binding site (output). The residues of the are lost. This might explain the increased flexibility of H1 in the GPCR-bound structural motifs and functional elements described here are conserved state, which results in the loss of GDP contacts (green) and the H-domain in all the different families of Ga proteins (Extended Data Fig. 5a). Thus, hinge region (formed by H1, h1ha and HF; light blue). b, The extent of GDP the mechanism described here is likely to be universal for activation of consensus contacts mediated by the different SSEs. See Extended Data Fig. 4. all cognate GPCR and Ga protein pairs.

4 | NATURE | VOL 000 | 00 MONTH 2015 G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

Disease and engineered Ga mutations significantly affects Gai1 stability in the GDP-bound form, but not We analysed disease-causing mutations in the human population and in the receptor-bound form (see ref. 25). Taken together, our analysis, found three key positions that are mutated (Supplementary Table 3), the CGN numbering system and the universal activation mechanism resulting in constitutive Ga activity: (1) a variant of the transmission described here provides a unified framework to relate and interpret a module residue PheH5.8 in Ga11 causes autosomal dominant hypocal- number of independent experimental and disease mutations in differ- caemia type 2 by becoming constitutively active28, possibly by desta- ent Ga subfamilies. s6h5.3 bilizing the contact between H5 and H1; (2) Gai variant Ala Ser, a position important for H1 stabilization and GDP binding, causes tes- Evolution of Ga activation mechanism totoxicosis by constitutively activating adenylate cyclase29; and (3) To understand how allosteric regulation might have evolved in Ga,we Ga11 variant ArgH1.9Cys causes autosomal dominant hypo-parathyr- compared the crystal structures of each equivalent signalling state of Ga oidism30, possibly by affecting H-domain opening and GDP release. and Ras. We found that H5 and H1 have significantly changed their We also analysed previously performed perturbation experiments on functional role. In Ras, H5 has a disordered extension (hyper-variable 31 different Ga types using the CGN system and can explain how these region) that is post-translationally modified for membrane anchoring . mutations might affect the activation mechanism (Fig. 5a and In Ga, the equivalent region forms the GPCR interface module 32 Supplementary Note). Furthermore, comprehensive alanine-scanning (the N-terminal disordered region of HN is the membrane anchor ). The nucleotide-binding N-terminal part of H1 is conserved in its mutagenesis performed on Gai1, coupled with thermostability assays of the mutants in the GDP-bound or nucleotide-free state coupled to sequence and its structural orientation in both Ras and Ga (Fig. 6 and rhodopsin25, is also consistent with the analysis performed here Extended Data Fig. 6). In contrast, the central part of H1, which contacts (Fig. 5b). For example, mutations in the H5 interface module mainly the H-domain hinge, is only conserved in Ga. The C-terminal part of H1 affect Ga –rhodopsin complex stability, whereas mutations in the H5 has a different role in Ras and Ga. While the last turn of H1 in Ras folds i1 p p transmission module primarily affect the nucleotide-bound state back to bind the guanine moiety via a conserved – stack, the equival- a of Ga . Alanine mutations in H1 highly destabilize the GDP-bound ent region in G forms the metastable part of H1 that remains helical in i1 the GDP-bound inactive state due to contacts with the N-terminal state, but not the Gai1–rhodopsin complex. In addition, mutating H5.8 transmission module of H5. These contacts are missing in Ras, as the residues in the conserved p–p cluster that interact with Phe C terminus of H1 and the N terminus of H5 are each three residues shorter, and H1 in Ras is stable without the interactions with H5. This means that although Ras and Ga are evolutionarily related and share the a Position Gα type Effect References same architecture, minor but crucial differences in the number and H5 Gi Changing H5 helicity induces GDP release 35 Gly -spacer or Ala -spacer between H5 modules pattern of non-covalent contacts between H5 and H1 have allowed H5 modules G 5 4 36,37 t decouples Gα activation from GPCR binding the emergence of an allosteric mechanism for GDP release in Ga. Ala and Cys mutations accelerate GDP exchange 10,38 PheG.H5.8 G t,i,1 Autosomal dominant hypocalcaemia type 2 28 Small extensions in H1 and H5 permit H1 to sense whether H5 is bound G.H5.1 Thr Gt Ala mutation causes constitutive GDP/GTP exchange 38 to the GPCR. The disordered C-terminal tail of H5 provides both con- G.H5.6 Val Gt Ala mutation causes constitutive GDP/GTP exchange 38 served and variable interface residues that allow for a conserved Ga G.s6h5.3 G Ser mutation results in constitutive activation activation mechanism and yet permit the evolution of receptor-binding Ala s and causes testotoxicosis 29 G.S3.3 specificity. Phe Gi Cys mutation accelerates GDP exchange rate 10 Leu or Cys mutations cause autosomal dominant ArgG.H1.9 G 11 hypoparathyroidism 30 Discussion b Gα –GDP stability Gα –GPCR complex stability i1 i1 Our analysis suggests that GPCRs interact and activate Ga subunits G.H5.26 Interface module G.H5.25 through a highly conserved mechanism in which the interruption of G.H5.24 G.H5.23 the contacts between H1 and H5 is a key step for GDP release. In this G.H5.22 G.H5.21 sense, while H1 is the molecular switch for GDP release, H5 is the distal G.H5.20 G.H5.19 G.H5.18 trigger that is ‘pulled’ upon receptor binding. Given that Ga proteins G.H5.17 Interface module G.H5.16 belong to the same fold, the existence of evolutionarily conserved resi- G.H5.15 G.H5.14 H5 dues per se is not surprising. However, the observation that (1) the Transmission module H5 G.H5.13 G.H5.12 conserved residues form a network of non-covalent contacts that links G.H5.11 G.H5.10 the GPCR-binding site with the GDP-binding pocket and that (2) this G.H5.9 G.H5.8 network of contacts is consistently re-organized upon receptor binding * G.H5.7 G.H5.6 G.H5.5 suggests that this mechanism might constitute the common conserved G.H5.4 G.H5.3 set of structural changes for the allosteric release of GDP (Supple- G.H5.2 Transmission module Transmission G.H5.1 mentary Note). While the conserved residue contacts are crucial for G.H1.12 Ga activation, non-conserved positions can still be important for allos- G.H1.11 10 γ β α G.H1.10 R* teric activation in distinct Ga proteins . Thus, the identified residues are G.H1.9

G.H1.8 H1 necessary but not sufficient for G-protein activation. The variable inter- G.H1.7 α H1 G.H1.6 γ β G.H1.5 face residues, as well as the bc subunits, could have important roles in G.H1.4 G.H1.3 receptor binding specificity for individual proteins. Thus, the conserved G.H1.2 G.H1.1 universal mechanism probably represents the ‘skeleton’ that can be –15 –10 –5 0 –50 –25 0 25 incorporated into different contexts in different Ga proteins to maintain ΔT m (ºC) Change in relative complex stability (%) a conserved mechanism of allosteric activation and yet permit specific binding to the receptor. Figure 5 | Mutational studies support the universal Ga activation A comparison to small G proteins revealed how Ga evolved to bind mechanism. a, Disease and engineered mutations can be explained by the model of Ga activation in different Ga subfamilies. References10, 28–30, 35–38 are GPCRs at a site that is distal to the GDP binding pocket. Emergence of short regions in H5 and H1 that can undergo structural transitions cited in this figure. b, Comparison of the stability of Ga–GDP (DTm; uC) and the Gabc–GPCR complex (change in relative complex stability; %) by Gai1 seem to have been co-opted to make a new GEF (receptor) interface alanine mutagenesis of every position in H1 and H5. Asterisk indicates that and link it to the GDP binding site. In such a system, displacement mutant FH5.8A is not stable in the receptor-free state but can still form the of a secondary structure element (H5) upon receptor binding can complex with the receptor. transmit information by re-organizing key non-covalent contacts

00 MONTH 2015 | VOL 000 | NATURE | 5 G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

a b Disorder-to-order N-terminal lipid anchor GEF Gα H5 GAP H5 H5

Effect H1 α Order-to-disorder Displacement or

γ β H1 H1 Gα–GEF interface

C-terminal No significant change lipid anchor Ras H5 GAP H5 H5

Effe c Ras H1 tor No significant change

H1 H1 GEF Ras–GEF interface Inactive state GEF-bound state c Helical region Gα Gα H1 H5 Disordered region Ras Ras Variable/missing position Figure 6 | H5–H1 interaction permits the allosteric activation mechanism. helical region is highlighted in a grey background and the H5 disordered a, GEF interaction surfaces (red) for Ga (3sn6) and the small G protein human region is shown in a dashed grey box. Conserved residues that contact GDP HRas (PDB 1bkd) in the same orientation. b, H1 and H5 of the inactive (green), the H-domain (light blue) or form contacts between H5 and H1 (dark (1got, 4q21) and GEF-bound state (3sn6, 1bkd) for Ga (blue) and HRas (grey). blue) are highlighted. See Extended Data Fig. 6. c, Consensus sequences of equivalent residues of H5 and H1 in Ga and Ras. The between conserved residues that connect different secondary structure 4. Anantharaman, V., Abhiman, S., de Souza, R. F. & Aravind, L. Comparative genomics uncovers novel structural and functional features of the heterotrimeric elements. Thus, a common ancestor of the GTPase fold might have GTPase signaling system. 475, 63–78 (2011). provided the structural framework that can be perturbed by GPCR 5. Rasmussen, S. G. et al. Crystal structure of the b2 adrenergic receptor–Gs protein binding through the interruption of H5–H1 contacts. This ‘new’ allos- complex. Nature 477, 549–555 (2011). 6. Chung, K. Y. et al. Conformational changes in the G protein Gs induced by the b2 teric site for GEFs is physically separated from the Ga effector and adrenergic receptor. Nature 477, 611–615 (2011). regulator binding interfaces (Fig. 6a), and could have provided the basis 7. Preininger, A. M., Meiler, J. & Hamm, H. E. Conformational flexibility and structural for the expansion of the GPCR family without affecting the down- dynamics in GPCR-mediated G protein activation: a perspective. J. Mol. Biol. 425, 2288–2298 (2013). stream signalling factors. 8. Oldham, W. M., Van Eps, N., Preininger, A. M., Hubbell, W. L. & Hamm, H. E. Our findings suggest that in addition to evolving extensive inter- Mechanism of the receptor-catalyzed activation of heterotrimeric G proteins. faces and allosteric ‘wires’ as observed in other proteins13,33,another Nature Struct. Mol. Biol. 13, 772–777 (2006). 9. Westfield, G. H. et al. Structural flexibility of the Gas a-helical domain in the b2- solution for allosteric communication is evolving short segments that adrenoceptor Gs complex. Proc. Natl Acad. Sci. USA 108, 16086–16091 (2011). undergo disorder-to-order transitions upon a trigger (for example, 10. Alexander, N. S. et al. Energetic analysis of the rhodopsin-G-protein complex links receptor binding). This mechanism involves the re-organization of a the a5 helix to GDP release. Nature Struct. Mol. Biol. 21, 56–63 (2014). 11. Neves, S. R., Ram, P. T. & Iyengar, R. G protein pathways. Science 296, 1636–1639 network of existing contacts to induce conformational changes that (2002). affect a distal site without the requirement of directly contacting resi- 12. Chothia, C. & Lesk, A. M. Helix movements and the reconstruction of the haem dues but linked by the same secondary structure (for example, the pocket during the evolution of the cytochrome c family. J. Mol. Biol. 182, 151–158 (1985). interface and transmission module of H5 make distinct sets of contacts 13. Su¨el, G. M., Lockless, S. W., Wall, M. A. & Ranganathan, R. Evolutionarily conserved but are linked through the protein backbone; like a puppet on a string). networks of residues mediate allosteric communication in proteins. Nature Struct. Since disordered and loop regions tolerate more sequence changes Biol. 10, 59–69 (2003). 34 14. del Sol, A., Fujihashi, H., Amoros, D. & Nussinov, R. Residues crucial for maintaining than structured regions , an important implication is that such seg- short paths in network communication mediate signaling in proteins. Mol. Syst. ments allow for the independent evolution of regions that are import- Biol. 2 (2006). ant for binding specificity but still maintain a conserved allosteric 15. Kornev, A. P., Haste, N. M., Taylor, S. S. & Eyck, L. F. Surface comparison of active and inactive protein kinases identifies a conserved activation mechanism. Proc. activation mechanism. Generalizing this principle, we suggest that Natl Acad. Sci. USA 103, 17783–17788 (2006). disordered segments that can undergo structural transitions (regulated 16. Zhang, X., Perica, T. & Teichmann, S. A. Evolution of protein structures and folding or unfolding) and thereby re-organize existing networks of interactions from the perspective of residue contact networks. Curr. Opin. Struct. contacts within structured regions of proteins could have an important Biol. 23, 954–963 (2013). 17. Deupi, X. et al. Stabilized G protein binding site in the structure of constitutively role in other protein families and may be exploited in protein engin- active metarhodopsin-II. Proc. Natl Acad. Sci. USA 109, 119–124 (2012). eering applications. 18. Standfuss, J. et al. The structural basis of agonist-induced activation in constitutively active rhodopsin. Nature 471, 656–660 (2011). Online Content Methods, along with any additional Extended Data display items 19. Choe, H. W. et al. Crystal structure of metarhodopsin II. Nature 471, 651–655 and Source Data, are available in the online version of the paper; references unique (2011). to these sections appear only in the online paper. 20. Scheerer, P. et al. Crystal structure of opsin in its G-protein-interacting conformation. Nature 455, 497–502 (2008). Received 5 November 2014; accepted 16 June 2015. 21. Slessareva, J. E. et al. Closely related G-protein-coupled receptors use multiple and distinct domains on G-protein a-subunits for selective coupling. J. Biol. Chem. 278, Published online 6 July 2015. 50530–50536 (2003). 22. Oldham, W. M. & Hamm, H. E. activation by G-protein- 1. Vetter, I. R. & Wittinghofer, A. The guanine nucleotide-binding switch in three coupled receptors. Nature Rev. Mol. Biol. 9, 60–71 (2008). dimensions. Science 294, 1299–1304 (2001). 23. Dratz, E. A. et al. NMR structure of a receptor-bound G-protein peptide. Nature 2. Leipe, D. D., Wolf, Y. I., Koonin, E. V. & Aravind, L. Classification and evolution of 363, 276–281 (1993). P-loop and related . J. Mol. Biol. 317, 41–72 (2002). 24. Tompa, P., Davey, N. E., Gibson, T. J. & Babu, M. M. A million peptide motifs for the 3. Rojas, A. M., Fuentes, G., Rausell, A. & Valencia, A. The Ras protein superfamily: molecular biologist. Mol. Cell 55, 161–169 (2014). evolutionary tree and role of conserved amino acids. J. Cell Biol. 196, 189–201 25. Sun, D. et al. Probing Gai1 protein activation at single amino acid resolution. Nature (2012). Struct. Mol. Biol. (in the press).

6 | NATURE | VOL 000 | 00 MONTH 2015 G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

26. Mnpotra, J. S. et al. Structural basis of G protein-coupled receptor-Gi protein 38. Marin, E. P., Krishna, A. G. & Sakmar, T. P. Rapid activation of by interaction: formation of the cannabinoid CB2 receptor-Gi protein complex. J. Biol. mutations distant from the nucleotide-binding site: evidence for a mechanistic Chem. 289, 20259–20272 (2014). model of receptor-catalyzed nucleotide exchange by G proteins. J. Biol. Chem. 276, 27. Hanson, P. I. & Whiteheart, S. W. AAA1 proteins: have engine, will work. Nature Rev. 27400–27405 (2001). Mol. Cell Biol. 6, 519–529 (2005). 28. Nesbit, M. A. et al. Mutations affecting G-protein subunit a11 in hypercalcemia and Supplementary Information is available in the online version of the paper. hypocalcemia. N. Engl. J. Med. 368, 2476–2486 (2013). Acknowledgements We thank U. F. Lang, C. Chothia, R. Weatheritt, S. Balaji, 29. Iiri, T., Herzmark, P., Nakamoto, J. M., van Dop, C. & Bourne, H. R. Rapid GDP release H. Harbrecht, A. Morgunov, G. Murshudov and N. S. Latysheva for their comments from G in patients with gain and loss of endocrine function. Nature 371, 164–168 sa on this work. This work was supported by the Medical Research Council (1994). (MC_U105185859; M.M.B., T.F., M.K., A.J.V. and C.N.J.R.; MC_U105197215; C.G.T.), 30. Li, D. et al. Autosomal dominant hypoparathyroidism caused by germline the Swiss National Science Foundation grants 141898, 133810, 31-135754 (D.B.V.), mutation in GNA11: phenotypic and molecular characterization. J. Clin. the MRC Centenary Award (A.J.V.), the AFR scholarship from the Luxembourg National Endocrinol. Metab. 99, E1774–E1783 (2014). Research Fund (C.N.J.R.) and the Boehringer Ingelheim Fond (T.F.). M.M.B. is also a 31. de la Vega, M., Burrows, J. F. & Johnston, J. A. Ubiquitination: Added complexity in Lister Institute Research Prize Fellow. Ras and Rho family GTPase function. Small GTPases 2, 192–201 (2011). 32. Oldham, W. M. & Hamm, H. E. Structural basis of function in heterotrimeric G Author Contributions T.F. collected data, wrote scripts and performed all the analysis. proteins. Q. Rev. Biophys. 39, 117–166 (2006). C.N.J.R. helped with data analysis and writing the scripts for structure analysis with T.F. 33. Cui, Q. & Karplus, M. Allostery and cooperativity revisited. Protein Sci. 17, A.J.V. helped T.F. with data analysis. D.S. and D.B.V. contributed experimental results on 1295–1307 (2008). alanine scanning. M.K. prepared the CGN webserver. C.G.T. helped with aspects of data 34. Brown, C. J., Johnson, A. K., Dunker, A. K. & Daughdrill, G. W. Evolution and disorder. interpretation. T.F. and M.M.B. designed the project, analysed the results and wrote the Curr. Opin. Struct. Biol. 21, 441–446 (2011). manuscript. All authors read and provided their comments on the draft. M.M.B. 35. Tanaka, T. et al. a helix content of G protein or subunit is decreased upon activation supervised the project. by receptor mimetics. J. Biol. Chem. 273, 3247–3252 (1998). 36. Natochin, M., Moussaif, M. & Artemyev, N. O. Probing the mechanism of rhodopsin- Author Information Reprints and permissions information is available at catalyzed transducin activation. J. Neurochem. 77, 202–210 (2001). www.nature.com/reprints. The authors declare no competing financial interests. 37. Marin, E. P., Krishna, A. G. & Sakmar, T. P. Disruption of the a5 helix of transducin Readers are welcome to comment on the online version of the paper. Correspondence impairs rhodopsin-catalyzed nucleotide exchange. Biochemistry 41, 6988–6994 and requests for materials should be addressed to T.F. ([email protected]) or (2002). M.M.B. ([email protected]).

00 MONTH 2015 | VOL 000 | NATURE | 7 G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

METHODS from Homo sapiens, and the most ancestral one-to-many orthologue No statistical methods were used to predetermine sample size. extends back to Opisthokonta (yeast; Saccharomyces cerevisiae), sepa- Generation of sequence and structure data sets. Identification of relevant Ga rated by 1,215 million years from human. In this work, we only inves- protein structures. All structures related to the Ga protein family (Pfam tigated G proteins from organisms that have a GPCR–G-protein system. family: PF00503) were collected from Pfam (release 27.0) and Ensembl39 Since plants do not encode GPCRs and the heterotrimeric G proteins are using the R BioMart interface40. In addition, the identified 973 Ga known to be auto-activated, we did not consider the plant G proteins in homologue sequences (see below) were scanned against the entire our analysis. Protein Data Bank (PDB) database using the BLAST algorithm41 to Development of a common Ga numbering (CGN) system. Common Ga ensure all Ga-containing structures were identified. 91 PDB entries numbering system. Comparative analyses of different protein structures (146 Ga chains) were identified, of which two were obsolete (2pz3 and sequences to infer general principles of a protein family require a retracted, 2ebc superseded by 3umr). Crystallographic data and coord- way of relating structural, genomic, or experimental data from different inate files were retrieved from the RCSB PDB API (Tuesday 4 February studies to each topologically equivalent position on homologous pro- 54 2014 at 16.00 PST). Ga structures from the parasite Entamoeba histo- teins. For GPCRs, the Ballesteros–Weinstein numbering scheme lytica (4fid) and Arabidopsis thaliana (2xtz), as well as non-full-length enables the referencing of positions in the transmembrane helices of Ga (1aqg and 1lvz are solution NMR studies of the C-terminal helix of different GPCRs, not considering loop regions. We sought to develop Ga, 3rbq contains the 11 amino acid N-terminal part of transducin a common G protein numbering (CGN) system that includes loop bound to UNC119), were excluded from the analyses. Four structures regions and describes Ga residues in three levels of detail (DSP), similar to a postal address. D refers to the structural domain and is optional of the last 10 C-terminal amino acid residues of Gat bound to rhodopsin (2x72, 3dqb and 3pqr) or meta-rhodopsin (4a4m) were used for the (catalytic GTPase domain: G; helical domain: H); S stands for one of the GPCR–Ga interface analysis. Five PDB entries had no publication 37 consensus secondary structure elements (including loops) of the associated and were manually traced back to their original articles: conserved Ga topology; and P relates to the corresponding residue 3umr was published in Johnston et al.42 and 4g5o, 4g5q, 4g5r, 4g5s were position within the consensus secondary structure element mapped to discussed in a study by Jia et al.43. The final set of structures in our an alignment of all ‘canonical’ human Ga sequences (Extended Data Fig. analyses span orthologues from human, mouse, rat and cow and encom- 1). For a detailed guide of how to use the CGN and map any Ga protein, pass twelve different Ga genes from eight different Ga subfamilies please refer to the CGN webserver (http://www.mrc-lmb.cam. (GNAI1, GNAI3, GNAO, GNAS2, GNAT1, GNAQ, GNA12, ac.uk/CGN) and Supplementary Note. Mapping structures to Uniprot sequences. Since several Ga protein struc- GNA13), thereby representing all Ga families (Gas,Gai,Gaq and Ga12). A full list of all retrieved PDBs is provided in Supplementary tures represent chimaeric G proteins, have peptide tags, or contain point Table 1. mutations, each residue/position in a PDB structure was mapped to its Identification of canonical human Ga protein sequences and paralogue Uniprot sequence(s) using the Structure Integration with Function, 55 alignment. All relevant human Ga protein isoforms and variants were Taxonomy and Sequence (SIFTS) webserver followed by a manual obtained from Ensembl39 using R (full list in Supplementary Table 1). validation for missing positions. This allowed assigning residue posi- The ‘canonical’ protein sequences for each of the 16 human Ga genes, as tions of each Ga structure to their equivalent positions in the human defined by Uniprot44, were used as representative sequences for each paralogue alignment and the orthologue alignments. human Ga gene throughout this work. The sequences were aligned Determination of domain D and consensus secondary structure S. using MUSCLE45 and were manually refined using the consensus sec- Secondary structure assignments were made for each Ga structure using ondary structure as a guide (see below). Phylogenetic relationships of Ga the STRIDE algorithm56. The consensus secondary structure elements were obtained from Treefam46 (family TF300673). The cladogram of the (SSEs) were determined by considering the most prominent secondary 16 canonical human Ga protein alignment was built with the structure type at each topologically equivalent Ga position when com- Phylogeny.fr web service47 using the PhyML v3.0 algorithm48 with the paring the secondary structure assignment of all 80 Ga structures (mean SH-like Approximate Likelihood-Ratio Test using the Jones–Taylor– and standard deviation of secondary structure type at each CGN posi- Thornton substitution matrix and TreeDyn49 for visualization. tion were calculated). Topologically equivalent positions had a high Orthologue alignments of one-to-one Ga orthologues of 16 human Ga agreement in their SSE assignment and showed well-defined flanking genes. Phylogenetic relationships of Ga sequences were collected from regions (Supplementary Note). In addition, the assigned consensus SSEs TreeFam46, the Orthologous MAtrix (OMA) database50 and were manually confirmed through a 3D-structure alignment using EnsemblCompara GeneTrees (Compara)51 using R scripts. Compara MUSTANG57, from which the domains (G-domain and H-domain) had the highest fraction of complete Ga sequences for each human were defined. The Ga SSE nomenclature follows a standardized expan- 58 Ga gene, except for Gas, for which OMA had a better sequence coverage. sion of the previously defined nomenclature : uppercase letters H and S In total, 973 genes from 66 organisms were used, of which 773 were one- represent helices or sheets, respectively. SSEs of the G domain follow a to-one orthologues. To build an accurate, low-gap alignment of such a numerical identifier (H1, H2, …, H5 and S1, S2, …, S6 with the excep- number of sequences, 16 independent orthologous alignments for each tion of HG and HN); SSEs of the H domain have an alphabetical identi- human Ga gene were first created by aligning one-to-one orthologue fier (HA, HB, …, HF), starting from the N- to the C terminus of Ga. The groups using the PCMA algorithm52 followed by manual refinement. N-terminal region that forms a membrane-anchored helix is defined as Subsequently, each orthologue alignment was cross-referenced to the HN. Systematic identifiers for historical names of some loop regions CGN (see below) by referencing its respective human sequence to the (switch regions, P-loop, etc.) were derived by concatenating the flanking human paralogue alignment. Conservation scores of each CGN position SSE names using lowercase letters; for instance s6h5 refers to the loop were calculated using both sequence identity and sequence similarity, between S6 and H5. A reference table including the historical loop based on the BLOSUM62 substitution matrix (Supple- names is provided in Extended Data Fig. 1 and Supplementary Table 2. mentary Note) using all complete sequences of the cross-referenced Determination of position P. P describes the amino acid position within alignments (561 sequences). Sequence conservation was mapped onto an SSE, as determined by mapping the consensus secondary structure to PDB structures (Supplementary Data) and visualized by generating the human paralogue alignment (Extended Data Fig. 1 and PDB files with B-factors substituted by conservation scores. Supplementary Table 2). Insertions in orthologues are annotated P-i, Phylogenetic distances. The evolutionary distance of the retrieved where i stands for the number of inserted residues after position P, for sequences relative to human was evaluated with TimeTree53.Ga one- instance Arg334H4.27-2 for the second amino acid of an insertion after to-one orthologues extend back to chordate (sea squirts; Ciona savignyi position 27 of helix H4, found in pufferfish (Tetraodon nigroviridis)Gas and Ciona intestinalis for Ga15), separated around 722.5 million years (Supplementary Note).

G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

Consensus non-covalent contact networks between conserved residues. Non- with a sequence identity .90% were shown as ‘consensus contacts’ covalent residue contact networks. Non-covalent contacts between resi- between conserved residues—this threshold was chosen based on the dues of a protein define its topology, conformation and stability. For bimodal distribution of contact occurrence (Supplementary Fig. 2). In each of the 80 Ga protein structures, a local version of the RINerator 0.5 addition, only long-range interactions (.i 1 4) are shown for the con- package from 201459 was used to calculate H-bonds and van der Waals sensus RCNs. It is important to note that these cut-offs were applied interactions between residues. Matrices of the all-against-all atomic only for visualization purposes, while for the analysis, no cut-off was distances of all residue contacts within each structure were computed needed. All relevant consensus contacts were additionally visually 60 using R and the bio3d package . Non-canonical interaction such as p–p inspected for each of the 80 PDB structures by creating automated 61 stacking were identified with NCI . All other calculations, analysis and PyMol sessions from R that superimpose all the 80 structures. To gen- processing were performed using custom written scripts in R. erate RCNs between SSEs, the sum of all contacts of the respective SSE as Assignment of Ga structures to signalling states. Structural differences defined by the consensus SSE of the CGN were computed. Chimera65 between Ga seem to arise from a convolution of the conformational was used to manually re-evaluate atomic contacts, and PyMol was used state, binding partner, and Ga protein type and species to create publication-quality images. (Supplementary Note). To identify the non-covalent contacts of a struc- Interface analysis. Buried surface area and inter-Ga–GPCR residue con- ture that are crucial for each signalling state, and independent of the Ga tact networks. Inter-chain RCNs between Ga and the receptor (Gas and protein type and species, all Ga structures were assigned to one of the b2AR chains A and R in 3sn6, Gat C-terminal peptide and rhodopsin four different Ga signalling states depending on (1) the bound ligand, from chains B and A in 2x72, 3dqb, 3pqr, 4a4m) were calculated as and (2) the interaction partner (Supplementary Table 1). The four states described above. The buried surface area (BSA) was obtained from the are (1) heterotrimeric GDP-bound state (inactive state), (2) nucleotide- PDBe PISA (Proteins, Interfaces, Structures and Assemblies)66 XML free heterotrimeric receptor bound complex (GEF-bound state), (3) repository and normalized by the accessible surface area for each residue GTPcS and potentially ‘effector’-bound state (active state), and (4) position. BSA and Ga–GPCR RCNs were mapped to the CGN and the RGS-bound GDP1ALF hydrolysis transition state (GAP-bound state). Ballesteros–Weinstein numbering, respectively. Sequence conservation Eleven structures are in the inactive state, one full-length structure (and from 561 complete Ga homologue sequences and 249 human non- a four structures of the C-terminal G peptide in complex with a GPCR) olfactory class A GPCRs was mapped onto the interface to determine c in the GPCR-bound state, 25 have GTP S bound or/and are co-crystal- the conserved ‘hotspot’ residues in the interface and visualized in PyMol a lized with their downstream effectors, and 40 structures have G in the (Supplementary Data). The BSA histogram, the visualization of the GTP-hydrolysis transition state with GDP and aluminium fluoride residue interaction network per secondary structure elements, and the (ALF) bound (GDP1ALF) and/or are co-crystalized with their RGS correlation of BSA per residue versus conservation were produced in R or a GTP-hydrolysis promoting peptide mimicking the RGS binding and ggplot2. interface (for example, Go-Loco motif). Two structures (1cip, 1svs) had Force-field-based energy estimations. The per-residue energy contribu- non-standard Ga ligands bound, and 2zjz62 did not have a detailed tions to Ga monomer and Ga–GPCR complex stability were calculated description of its biochemical relevance, and thus were not assigned to using FoldX 3.0, which uses energy terms weighted by empirical data any signalling state. Eleven structures were identified as chimaeras and from protein engineering experiments to provide a quantitative estima- 21 included mutations (Supplementary Note). The publications describ- ing the protein structure of each PDB entry were checked to confirm the tion of each residue’s contribution to protein stability and protein com- relevance of the assigned signalling state. plex stability (http://foldx.crg.es/). For the interface analysis, the 3sn6 Consensus contacts between conserved residues. To compare residue structure was energy minimized with the FoldX ‘repair pdb’ function contact networks (RCNs) from different structures, topologically equi- and subsequently, the per-residue energy contributions for both the valent positions were cross-referenced with the CGN system. All RCN Gas–b2AR complex and the monomers in isolation were calculated analyses, consensus RCN calculation, and conservation analysis were using the FoldX ‘sequence detail’, ‘analyse complex’, and ‘stability’ func- conducted using customized R scripts: matrices representing the tions at 298K, pH 7.0, and 0.05M ionic strength. The per-residue energy absence or presence of non-covalent contacts between each possible pair contributions to complex stability were calculated as the difference of CGN residues in each PDB structure were computed (Supplementary between the energy contributions of each residue in the monomer and Fig. 1). The consensus contacts of each signalling state were computed as complex (DDGinterface) and visualized with R (Extended Data Fig. 2a). the probability to find a contact in all structures of the state. Structure For energy contributions of each residue within Ga monomers models can differ in their number of equivalent residues due to missing (Extended Data Fig. 4b), the average energy contribution and standard electron density, not fully fitted models, or truncations for crystal- deviation for each Ga position was computed after running the FoldX lographic purposes. Thus, each consensus contact probability was nor- ‘stability’ and ‘sequence detail’ functions at 298K, pH 7.0, and 0.05M malized by the number of structures of the state that have the respective ionic strength for each of the 79 non-complex structures. residue pair, in order to distinguish the absence of a contact from the Disorder propensity calculations for all Ga homologue sequences and absence of an equivalent position in a single PDB. To expand the struc- structures. The disorder propensity of each of the 561 complete Ga tural analysis to other Ga proteins for which only sequence data was homologue sequences was calculated with IUPred67 (prediction-type- available, sequence conservation was mapped to each CGN residue (see setting: ‘short disorder’). The missing structure positions were identified above). with bio3d package60 (Extended Data Fig. 2b). Visualization of consensus contacts and identification of universal struc- New and published mutational studies of different Ga classes. Identification tural motifs. The consensus contacts between conserved residues in the of mutations, mutant structures and chimaeras. Additional literature on different signalling states were visualized to investigate the contact re- Ga mutations was retrieved with the text mining tool MutationMapper68 organization in detail. For 2D visualization, the respective consensus and manually validated and filtered for correct hits/search results. RCNs were exported to Cytoscape63 using the RCytoscape interface64. Disease mutations were retrieved from the Database of Single For 3D visualization, R was used to create consensus RCNs in PyMol Nucleotide Polymorphisms (dbSNP) and the Catalogue of Somatic (The PyMOL Molecular Graphics System, Version 1.5.0.4 Schro¨dinger, Mutations in Cancer (COSMIC)69 with biomaRt40, and from the LLC.) by creating pseudo PDB structure coordinate files that show Human Gene Mutation Database (HGDM)70. Mutations, chimaeras residues as spheres from their C-alpha atoms and lines/edges between and peptide tags in the analysed structures were identified by comparing them using CONECT entries. Information on sequence conservation their Uniprot sequence to their PDB sequence using SIFTS55 and mining was mapped via the B-factor field of the pseudo PDB structures. For PDBe annotations. All mutation data were mapped back to the CGN simplification, only contacts present in more than 90% of all structures and visualized on their respective human Ga structure.

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

50. Altenhoff, A. M., Schneider, A., Gonnet, G. H. & Dessimoz, C. OMA 2011: orthology Alanine scanning and stability of Gai. The alanine scanning expression 71 inference among 1000 complete genomes. Nucleic Acids Res. 39, D289–D294 library of Gai1 was prepared as reported before . The recombinant Gai1 (2011). alanine mutants were expressed in 24 well plates, purified by standard 51. Vilella, A. J. et al. EnsemblCompara GeneTrees: Complete, duplication-aware Ni-NTA affinity chromatography followed by buffer exchange using 96- phylogenetic trees in vertebrates. Genome Res. 19, 327–335 (2009). 52. Pei, J., Sadreyev, R. & Grishin, N. V. PCMA: fast and accurate multiple sequence well filter plates. The bovine rhodopsin and bc subunits were prepared alignment based on profile consistency. Bioinformatics 19, 427–428 (2003). from bovine retinas72. The melting temperature of each alanine mutant 53. Hedges, S. B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of upon addition of GDP or GTPcS was measured by differential scanning divergence times among organisms. Bioinformatics 22, 2971–2972 (2006). 54. Ballesteros, J. A. & Weinstein, H. Receptor Molecular Biology Vol. 25 (Elsevier, 1995). fluorimetry assay. The effect of each alanine Gai1 mutant on R*–Gi 55. Velankar, S. et al. SIFTS: Structure Integration with Function, Taxonomy and complex formation and complex stability were measured by a high- Sequences resource. Nucleic Acids Res. 41, D483–D489 (2013). throughput assay based on native gel electrophoresis. Detailed methods 56. Heinig, M. & Frishman, D. STRIDE: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res. 32, and protocols are provided in ref. 25. W500–W502 (2004). Ga versus Ras comparison. The Ras conformational cycle was featured in 57. Konagurthu, A. S., Whisstock, J. C., Stuckey, P. J. & Lesk, A. M. MUSTANG: a multiple the RSCB PDB73 April 2012 PDB-101 Molecule of the Month by David structural alignment algorithm. Proteins 64, 559–574 (2006). ˚ Goodsell (http://dx.doi.org/10.2210/rcsb_pdb/mom_2012_4), with 58. Noel, J. P., Hamm, H. E. & Sigler, P. B. The 2.2 A crystal structure of transducin-a complexed with GTPcS. Nature 366, 654–663 (1993). high-resolution structures showing human HRas in its active GTPcS- 59. Doncheva, N. T., Klein, K., Domingues, F. S. & Albrecht, M. Analyzing and visualizing bound state (PDB 5p2174) and the GDP-bound inactive state (PDB residue networks of protein structures. Trends Biochem. Sci. 36, 179–182 (2011). 4q2175). 1bdk76 was used as representative of the HRas GEF-bound state. 60. Grant, B. J., Rodrigues, A. P., ElSawy, K. M., McCammon, J. A. & Caves, L. S. Bio3d: an R package for the comparative analysis of protein structures. Bioinformatics 22, These Ras representative structures were combined with an alignment of 2695–2696 (2006). all human HRas paralogues identified from the OMA database50.A 61. Babu, M. M. NCI: A server to identify non-canonical interactions in protein structural alignment between the identified active and inactive Ras structures. Nucleic Acids Res. 31, 3345–3348 (2003). 62. Morikawa, T. et al. Crystallization and preliminary X-ray crystallographic analysis of structures and the corresponding active and inactive Ga (1got and the receptor-uncoupled mutant of Ga1. Acta Crystallogr. F. 63, 139–141 (2007). 3ums) was used to accurately map Ras positions to the CGN 63. Lopes, C. T. et al. Cytoscape Web: an interactive web-based network browser. (Supplementary Data) despite the low sequence identity (,6%) between Bioinformatics 26, 2347–2348 (2010). 64. Shannon, P. T., Grimes, M., Kutlu, B., Bot, J. J. & Galas, D. J. RCytoscape: tools for Ras and Ga. The number of atomic non-covalent contacts between helix exploratory network analysis. BMC Bioinform. 14, 217 (2013). H5 and helix H1 in the Ras and Ga structures was manually compared in 65. Pettersen, E. F. et al. UCSF Chimera–a visualization system for exploratory research Chimera for the structures 1got and 4q21. and analysis. J. Comput. Chem. 25, 1605–1612 (2004). 66. Krissinel, E. & Henrick, K. Inference of macromolecular assemblies from crystalline 39. Flicek, P. et al. Ensembl 2014. Nucleic Acids Res. 42, D749–D755 (2014). state. J. Mol. Biol. 372, 774–797 (2007). 40. Kasprzyk, A. BioMart: driving a paradigm change in biological data management. 67. Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated Database 2011, bar049 (2011). energy content. Bioinformatics 21, 3433–3434 (2005). 41. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment 68. Vohra, S. & Biggin, P. C. Mutationmapper: a tool to aid the mapping of protein search tool. J. Mol. Biol. 215, 403–410 (1990). mutation data. PLoS ONE 8, e71711 (2013). 42. Johnston, C. A. et al. Structural determinants underlying the temperature-sensitive 69. Forbes, S. A. et al. COSMIC: exploring the world’s knowledge of somatic mutations a nature of a G mutant in asymmetric cell division of Caenorhabditis elegans. J. Biol. in human cancer. Nucleic Acids Res. 43, D805–D811 (2015). Chem. 283, 21550–21558 (2008). 70. Stenson, P. D. et al. The Human Gene Mutation Database: 2008 update. Genome 43. Jia, M. et al. Crystal structures of the scaffolding protein LGN reveal the general Med 1, 13 (2009). mechanism by which GoLoco binding motifs inhibit the release of GDP from Gai. 71. Sun, D. et al. AAscan, PCRdesign and MutantChecker: a suite of programs for J. Biol. Chem. 287, 36766–36776 (2012). primer design and sequence analysis for high-throughput scanning mutagenesis. 44. Magrane, M. & Consortium, U. UniProt Knowledgebase: a hub of integrated protein PLoS ONE 8, e78878 (2013). data. Database (Oxford) 2011, bar009 (2011). 72. Maeda, S. et al. Crystallization scale preparation of a stable GPCR signaling 45. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high complex between constitutively active rhodopsin and G-protein. PLoS ONE 9, throughput. Nucleic Acids Res. 32, 1792–1797 (2004). e98714 (2014). 46. Ruan, J. et al. TreeFam: 2008 Update. Nucleic Acids Res. 36, D735–D740 (2008). 73. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000). 47. Dereeper, A. et al. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. 74. Pai, E. F. et al. Refined crystal structure of the triphosphate conformation of H-ras Nucleic Acids Res. 36, W465–W469 (2008). p21 at 1.35 A˚ resolution: implications for the mechanism of GTP hydrolysis. EMBO 48. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood J. 9, 2351–2359 (1990). phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 75. Milburn, M. V. et al. Molecular switch for : structural differences (2010). between active and inactive forms of protooncogenic ras proteins. Science 247, 49. Chevenet, F., Brun, C., Banuls, A. L., Jacq, B. & Christen, R. TreeDyn: towards 939–945 (1990). dynamic graphics and annotations for analyses of trees. BMC Bioinform. 7, 439 76. Boriack-Sjodin, P. A., Margarit, S. M., Bar-Sagi, D. & Kuriyan, J. The structural basis (2006). of the activation of Ras by Sos. Nature 394, 337–343 (1998).

G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

Extended Data Figure 1 | Human paralogue reference alignment for position in the SSE of the human reference alignment (P) are shown on top of common Ga numbering system. a, Reference alignment of all canonical the alignment. b, Reference table of the definitions of SSEs used in the CGN human Ga paralogues. The domain (D), consensus secondary structure (S) and nomenclature.

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

Extended Data Figure 2 | Energy estimation of the GPCR–Ga residue plot for all Ga proteins. The mean value of the disorder propensity of all full- contributions and Ga disorder propensity. a, Energy contribution of single length Ga sequences (561 sequences; homologous to all 16 human Ga proteins) interface residues to the Gas–b2AR complex calculated with FoldX (T 5 298K, is shown as a black line; the standard deviation at each position is shown as pH 5 7.0, ionic strength 5 0.05M). Conserved Ga residues (blue sequence light red ribbon. The colour tone of the line indicates the number of gaps at logo) that were identified to form receptor–Ga inter-protein contacts with an aligned position (black, no gaps). The left inset shows the disorder conserved GPCR residues (red sequence logo) are shown. The contact network propensity of H1. The right inset highlights that H5 is highly structured in its between residues of the b2AR and Gas is shown (red, conserved receptor N terminus, and has increased disorder propensity towards the C terminus, residue; blue, conserved Ga residue; grey, variable residues; spheres represent which is in agreement with the missing electron density in the 79 structures. Ca positions and links represent non-covalent contact). b, Consensus disorder

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

Extended Data Figure 3 | Rewiring of consensus contacts between highlighted with a blue background (G-domain darker blue, H-domain conserved Ga residues upon receptor binding. CGN numbers and sequence light blue). This figure highlights the most important consensus residue logo for consensus contacts within Ga in the inactive state (left) and GPCR- contacts between conserved residues. Additional contacts in the right lobe are bound state (right) are shown. Receptor residues are shown in red; H5 residues discussed in the Supplementary Note and in ref. 25. For a full list of residue in dark blue; H1 residues in light blue; and GDP in green. The domains are contacts, please refer to Supplementary Data.

G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

Extended Data Figure 4 | Details of helix H1 linking H5, GDP and the from all four Ga subfamilies in the non-receptor-bound signalling states using H-domain. a, This figure expands Fig. 4 from the main text to provide residue- FoldX (T 5 298K, pH 5 7.0, ionic strength 5 0.05M). The average energy level details of the role of helix H1. Residues forming contacts with H5 are contribution is shown as dots, the standard deviation as bars. c, Per residue shown in blue, with the H-domain in light blue and with GDP in green. Non- detail of Ga-GDP and Ga-GSP (non-hydrolysable GTP analogue) consensus covalent consensus contacts between universally conserved residues at the SSE contacts. The bar-plot shows the frequency of finding a contact mediated by level (left) and per residue-level (centre). Lines denote non-covalent contacts topologically equivalent positions with GDP/GSP. Number of side-chain between residues. The degree of conservation is shown as sequence logo. and main-chain contacts are shown as dark grey and light grey bars, Residues are numbered according to the CGN. Helix H1 is almost 100% respectively. The degree of conservation of contacting residues (calculated conserved across all 16 Ga types and forms three structural motifs for from the 561 complete Ga sequences) is represented in the right panel and interactions with H5, the H-domain and GDP (right). b, Average per residue the consensus sequence for each position is shown. energy contribution to Ga protein stability as calculated from 79 structures

G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

Extended Data Figure 5 | Conserved structural motifs of Ga and known (AspH.hdhe.5). The hinge region is formed by H1, the loop between H1 and HA, disease and engineered mutations. a, A universally conserved cluster of and HF. H1 interacts via (1) a cation–p interaction mediated by a universally p–p and hydrophobic interactions between S2 (PheG.S2.6) and S3 (PheG.S3.3), H1 conserved residue with the loop connecting H1 and HA (LysG.H1.6 and (MetG.H1.8 and HisG.H1.12) and H5 (PheG.H5.8) links H5 and H1 in the absence of TyrG.h1ha.4) and (2) a hydrophobic interaction with HF (LysG.H1.9 and the receptor. Upon receptor binding, residues within this motif (PheG.H5.8 LeuH.HF.5). b, Disease and engineered mutations that can be explained by the and PheG.S3.3) interact with the conserved Pro and Leu of ICL2 of the receptor universal Ga activation mechanism mapped on a Ga protein. Ca position as has been shown for Gas (3sn6) and Gai (ref. 26). Interrupting the contacts of residues are shown as spheres; mutations at green positions cause between H5 and H1 seems to be the trigger for transmitting the signal of spontaneous GDP release by interrupting consensus contacts between GPCR binding to helix H1 (which interacts with GDP and the H-domain.) The conserved residues, thereby ‘mimicking’ the effect of receptor binding to Ga. only conserved residue contact between the H-domain and the G-domain Pink positions have also been reported to cause disease by constitutively that is not in the hinge region is formed by a universally conserved salt bridge activating Ga. Insertion of an Ala4 or Gly5 after the yellow position separate the (H-domain ionic latch) between the very N-terminal end of HG of the H5 transmission and interface module, thereby allowing GPCR binding G-domain (LysG.s5hg.1) and the loop connecting HD and HE in the H domain without triggering GDP release.

G2015 Macmillan Publishers Limited. All rights reserved RESEARCH ANALYSIS

G2015 Macmillan Publishers Limited. All rights reserved ANALYSIS RESEARCH

Extended Data Figure 6 | Helix H5–H1 interaction in Ga provides the residues of H1 in Ga and Ras is also depicted. b, Comparison of the residue allosteric GEF activation mechanism. a, Schematic representation of contact network between topologically equivalent residues in H5 and H1 in the structural motifs on H1 that are shared or unique to Ga and Ras. While the part corresponding inactive GDP-bound states of Ga (1got) and Ras (4q21). of H1 with the phosphate-binding motif is conserved across both protein The weight of the link between SSEs denotes the number of atomic contacts. families, the C-terminal part is conserved only in Ga.H1inGa has three c, Sequence alignments of H1 and H5 of human Ga and Ras paralogues. The additional residues that allow for extensive residue contacts between H1 and sequence alignment was obtained based on cross-referencing the alignments H5. In Ras, these interactions are missing and H5 and H1 are both 3 residues using the structures of Ga and Ras. shorter. The consensus sequence and secondary structure of equivalent

G2015 Macmillan Publishers Limited. All rights reserved