Stereochemical Preferences Modulate Affinity and Selectivity

View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector Structure Article

Stereochemical Preferences Modulate Afﬁnity and Selectivity among Five PDZ Domains that Bind CFTR: Comparative Structural and Sequence Analyses

Jeanine F. Amacher,1 Patrick R. Cushing,1 Lionel Brooks 3rd,2 Prisca Boisguerin,3,4 and Dean R. Madden1,* 1Department of Biochemistry, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA 2Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA 3Institute of Medical Immunology, Charite´ , 10115 Berlin, Germany 4Centre de Recherches de Biochimie Macromoleculaire, CRBM-CNRS, UMR-5237, UM1-UM2, University of Montpellier, Department of Molecular Biophysics and Therapeutics, 34293 Montpellier, France *Correspondence: [email protected] http://dx.doi.org/10.1016/j.str.2013.09.019

SUMMARY binding cleft is the core of the interaction, conferring critical target specificity. As a result, the underlying sequence:affinity PDZ domain interactions are involved in signaling relationships are essential to understand connectivity in PDZ- and trafficking pathways that coordinate crucial mediated protein networks. cellular processes. Alignment-based PDZ binding To identify the binding preferences of individual domains, early motifs identify the few most favorable residues at studies compared the sequences of known binding partners and highlighted the importance of the amino acids at the P0 (extreme certain positions along the peptide backbone. How- 2 ever, sequences that bind the CAL (CFTR-associated C terminus) and P positions (e.g., Songyang et al., 1997). High- throughput screens of phage-display and peptide-array libraries ligand) PDZ domain reveal only a degenerate motif have since revealed more complex ‘‘motifs’’ involving varying that overpredicts the true number of high-affinity in- combinations of up to seven C-terminal residues, consistent teractors. Here, we combine extended peptide-array with stereochemical interactions observed in individual com- motif analysis with biochemical techniques to show plexes (Doyle et al., 1996; Laura et al., 2002; Schultz et al., that non-motif ‘‘modulator’’ residues influence CAL 1998; Skelton et al., 2003; Stiffler et al., 2007; Tonikian et al., binding. The crystallographic structures of 13 CAL: 2008). Furthermore, although few motifs have been investigated peptide complexes reveal defined, but accommoda- beyond the P6 position, biochemical and structural studies ting stereochemical environments at non-motif posi- have identified affinity contributions and stereochemical con- 10 tions, which are reflected in modulator preferences tacts extending as far as the P position (reviewed in Luck uncovered by multisequence substitutional arrays. et al., 2012). These preferences facilitate the identification of Even with extended motifs, PDZ binding motifs on average constrain fewer than four of the potential interacting residues high-affinity CAL binding sequences and differen- and often accommodate similar residues at a given position (Da- tially affect CAL and NHERF PDZ binding. As a result, vey et al., 2012). As a result, PDZ domains are frequently promis- they also help determine the specificity of a PDZ cuous, binding multiple target proteins and sharing targets with domain network that regulates the trafficking of other PDZ domains. This is illustrated by a set of PDZ proteins CFTR at the apical membrane. that regulate the intracellular trafficking and localization of the cystic fibrosis transmembrane conductance regulator (CFTR): the CFTR-associated ligand (CAL) and the NHE3 regulatory fac- INTRODUCTION tor proteins NHERF1 and NHERF2 (Cheng et al., 2002; Guerra et al., 2005; Wolde et al., 2007). Consistent with shared target If each residue is uniquely specified, a decapeptide can encode specificity, sequence alignments revealed overlapping motifs more than 1013 distinct sequences, providing a versatile and at the P0 and P2 positions, and sequence optimization in the specific mechanism to encode protein:protein interactions. four C-terminal residues led to only 10-fold selectivity for the Indeed, domains recognizing short linear motif (SLiM) peptides CAL PDZ (CALP) domain (Vouilleme et al., 2010). Using a pep- are found throughout the human genome, including the abun- tide-array approach to extend the sequence iteratively toward dant family of PDZ domains first recognized in the proteins the N terminus, we engineered a CALP inhibitor (iCAL36; PSD-95, Dlg, and ZO-1 (Davey et al., 2012; Harris and Lim, sequence ANSRWPTSII) with 170-fold selectivity (Cushing 2001; Lee and Zheng, 2010; Nourry et al., 2003). PDZ target in- et al., 2010; Vouilleme et al., 2010). teractions are governed by a variety of constraints, including Here we characterize the interaction of the CAL PDZ domain local concentration, multidentate binding interfaces, and interac- with these N-terminal (‘‘upstream,’’ i.e., P4 to P9) binding deter- tions with additional proteins (Luck et al., 2012). Nevertheless, minants and with the P1 and P3 side chains. Although not the recognition of a generally C-terminal peptide by the PDZ captured by sequence alignment techniques, they act as binding

AB Figure 1. Sequence Determinants of CAL PDZ Binding and CAL/NHERF Speciﬁcity (A) WebLogo analysis (Crooks et al., 2004) of the top 100 sequences that bound CALP in a 6223HumLib peptide array experiment reveals clear preferences at the P0 and P2 positions, similar to the short, degenerate binding motifs seen with the NHERF PDZ domains (Figure S1A). The C-terminal residue (P0) position is labeled 0; adjacent residues are 1, 2, etc. (B) For each PDZ domain, free energies of binding (kcal/mol) were calculated from the binding afﬁnities determined by FP for ten different iCAL36/CFTR chimeric peptide sequences (Table 1). The number of intersection points in pairwise free-energy plots (Figure S2) were tabulated to reveal rank-order exchanges as a function of sequence.

‘‘modulators’’ for each domain, exerting a modest influence indi- position (e.g., S > E > T for N1P2 and E > S > D for N2P2) are slightly vidually, but a powerful impact collectively. Crystallographic anal- stronger than those observed for CALP. At least in the case of the ysis of 13 new CALP:peptide complexes spanning a wide range of N2P2 domain, the presence of acidic residues is broadly consis- affinities reveals an accommodating pharmacophore binding tent with a P3 preference for Asp reported in earlier phage- model. In-depth stereochemical and substitutional array analysis display studies (Tonikian et al., 2008). However, as for CALP, no provides clues to the identification of preferences that remain hid- strong signatures were observed upstream of P3. den from alignment motifs. These modulatory effects facilitate the identification of high-affinity binding sequences and contribute to Chimeric Peptides Reveal Positive and Negative Affinity inhibitor specificity in the context of CFTR trafficking. Modulators Although our experience suggested that residues outside the RESULTS P0/P2 motif positions significantly enhanced binding selectivity of iCAL36 for CAL versus NHERF domains, we had not focused Sequence Alignments Identify only Core Binding Motifs on CALP affinity per se, which changed relatively little in the later for CAL and NHERF stages of iCAL36 sequence engineering (Vouilleme et al., 2010). Based on the demonstrated contributions of upstream residues Given the lack of upstream binding motifs, we decided to explore to the selectivity of our engineered iCAL36 peptide, we attempted the free-energy contributions of non-motif residues using nested to identify extended binding motifs for the CAL and NHERF PDZ sets of chimeric peptides based on reference sequences with domains. We performed WebLogo analysis of all ten C-terminal both high (iCAL36; Ki =23mM) and low (CFTR; Ki = 420 mM) positions in peptides identified by screening a 6223HumLib affinity for the CALP domain (Amacher et al., 2013). Using fluo- array (Cushing et al., 2008) with CALP or with the NHERF1 and rescence polarization (FP) competition experiments, we deter-

NHERF2 PDZ1 and PDZ2 domains (N1P1, N1P2, N2P2, and mined Ki values for each of the chimeric peptides binding to N2P2, respectively). However, the results (Figure 1A; Figure S1A CALP (Table 1). CALP has modest preferences for an 9ANSR6 9 6 available online) did not highlight any significant upstream sequence over TEEE (iCAL36VQDTRL versus CFTR), Trp over 0 5 preferences outside the previously examined core residues P Val at P (iCAL36QDTRL versus iCAL36VQDTRL), and Gln over Pro 3 4 to P (Roberts et al., 2012; Vouilleme et al., 2010). at P (iCAL36 versus iCAL36Q4). A positive CALP preference for Weak secondary CALP preferences are observed at a few aP1 Arg is also suggested by the 5-fold higher affinity of the 3 positions (e.g., S > E > V at P ; Figure 1A). To test the robust- iCAL36TRL sequenced compared to iCAL36L and the approxi- ness of such signatures, we calculated motifs using a varying mate equivalence of the Ser/Thr motif at P2 (Vouilleme et al., number of input sequences, and observed that both the rank 2010). However, the most significant effect is seen at P3. 5 3 5 3 order and stereochemical nature of the residues changed. For Replacing WPT with VQD in iCAL36VQD3 dramatically 3 5 example, at P , the top ten CALP-binding sequences yielded reduces CALP affinity (Ki > 1 mM). Because neither the P nor the pattern E > S > T, whereas the top 50 sequences yielded P4 substitutions account for such a large effect, we synthesized 3 S>I>Q(Figure S1B). Furthermore, while the top ten CALP-bind- a CFTR peptide with a single substitution at P . CFTRT3 1 ing sequences show modest signals, e.g., for Arg at P or Ser at (TEEEVQTTRL; Ki =18mM) binds 23-fold more tightly than the P7, these grow progressively weaker or shift with the addition of native CFTR sequence (Table 1). more sequences (Figures 1A and S1B). To evaluate the possibil- To test the hypothesis that non-motif residues also affect ity of correlated binding requirements across multiple ligand po- NHERF1 and NHERF2 affinities, we measured the binding con- sitions, we reprobed our top 100 sequences using the MUltiple stants of the chimeric peptides for each of the four PDZ domains Specificity Identifier (MUSI) program, which uses both single (N1P1, N1P2, N2P2, and N2P2, respectively) in FP displacement and multiple position weight matrices to determine specificity assays (Table 1). The NHERF PDZ domains bind the CFTR and patterns (Kim et al., 2012a). Overall, the results again confirm iCAL36 peptides with high affinity for CFTR (Ki <2mM) and low the highly degenerate CALP binding motif originally seen at the affinity for iCAL36 (Ki > 1 mM; Amacher et al., 2013). Among 0 2 1 P and P positions (Figure S1C). these domains, substitutions at P (iCAL36TRL versus iCAL36L), 3 4 6 9 Similar patterns are observed for the NHERF domains (Fig- P (CFTRT3), P (iCAL36Q4), and P P (CFTR versus 3 ure S1A). In some cases, secondary preferences seen at the P iCAL36VQDTRL) each affect individual Ki values.

Table 1. FP Binding Afﬁnities of CALP and NHERF PDZ Domains for iCAL36/CFTR Chimeric Peptides

Ki (mM) Peptidea Sequenceb CALP CALP-E317A N1P1 N1P2 N2P1 N2P2 iCAL36TRL ANSRWPTTRL 4.5 ± 0.8 2.2 ± 1.2 8.0 ± 1.4 110 ± 10 8.7 ± 3.1 16.5 ± 3.9 iCAL36Q4 ANSRWQTSII 14.8 ± 3.5 ND 280 ± 100 >1,000 300 ± 100 450 ± 150

CFTRT3 TEEEVQTTRL 18 ± 5.5 15.1 ± 2.0 0.76 ± 0.51 4.6 ± 1.6 1.3 ± 0.4 0.45 ± 0.25 iCAL36c ANSRWPTSII 22.6 ± 8.0 15.1 ± 2.1 >1,000 >1,000 >1,000 >1,000 c iCAL36L ANSRWPTSIL 23.6 ± 2.2 ND 230 ± 100 >1,000 76.8 ± 1.6 530 ± 120 iCAL36QDTRL ANSRWQDTRL 56.2 ± 5.6 16.2 ± 3.5 1.5 ± 0.2 6.7 ± 2.4 3.1 ± 1.4 1.1 ± 0.6 iCAL36VQDTRL ANSRVQDTRL 230 ± 30 140 ± 40 2.5 ± 0.3 6.5 ± 2.3 3.8 ± 1.4 0.94 ± 0.42 CFTRc TEEEVQDTRL 420 ± 80 64.7 ± 1.8 0.47 ± 0.2 1.8 ± 0.5 0.83 ± 0.33 0.07 ± 0.05 c CFTRI TEEEVQDTRI 490 ± 120 ND 10.7 ± 2.8 39.8 ± 1.0 19.1 ± 2.9 2.5 ± 0.43 iCAL36VQD3 ANSRVQDSII >1,000 770 ± 50 >1,000 >1,000 >1,000 420 ± 120 ND, value was not measured. aSubstitutions noted in subscript; the ﬁnal substituted position is C terminal unless numbered. bBold indicates residue(s) from the CFTR sequence; non-bold indicates residues from iCAL36. cValues previously reported in Amacher et al., 2013.

Finally, to explore the potential of these non-motif differences IPDZ values, several side chain atoms are occluded at least to contribute to PDZ inhibitor selectivity, we developed an 50% upon CALP docking (atoms colored white in Figure 2D). approach to identify substitutions that switched rank order of Of course, contact mapping cannot by itself distinguish affinity. We first plotted the binding free energies for the series between favorable and unfavorable binding relationships. It of peptides along a vertical axis for each domain. When placed also does not capture long-range (e.g., electrostatic) interactions side-by-side for a given pair of PDZ domains, with straight line or shifts in the free energy of the unbound state. Nevertheless, connections between the values for each peptide, rank-order the observed steric interactions are consistent with evidence switches appear as points of intersection (Figure S2). Compared that non-motif residues can affect CALP affinity. to each NHERF PDZ domain, the CALP affinities exhibit 19–25 pairwise affinity crossovers (Figure 1B) that presumably account Defined CALP Binding Sites Interact with Each Residue for the engineered selectivity of iCAL36. Thus, although the CAL Position and NHERF binding motifs fail to highlight preferences at P-1, Given the characteristically shallow peptide-binding pocket and P-3 through P-9, these residues are capable of ‘‘modulating’’ of the CALP domain, we wished to determine whether our both the affinity and selectivity of peptides for these PDZ substituted peptides adopt significantly variable conformations, domains. Furthermore, despite high sequence identity (58%– which could complicate stereochemical analysis and obscure 72%) between the NHERF PDZ domains, pairwise comparisons modulator preferences in sequence alignments. To address reveal two to six crossovers (Figure 1B), suggesting potential this point, we crystallized and solved the structures of CALP non-motif contributions to selectivity even in this tightly related bound to four chimeric iCAL36/CFTR peptides, selected to cover cluster. a range of affinities: iCAL36TRL (Ki = 4.5 mM), iCAL36Q4 (Ki = 14.8 mM), iCAL36QDTRL (Ki = 56.2 mM), and iCAL36VQD3 (Ki > All Six C-Terminal Positions Form Side-Chain- 1 mM). To further explore sequence and conformational space, Dependent Interactions we also crystallized CALP bound to two additional low-affinity Despite the absence of motif signatures at four of the six resi- binding partners with unrelated sequences. HPV E6 viral onco- dues from P0 to P5, each residue directly contacts the surface protein C-terminal peptides have been studied in complex of the CALP binding site, as illustrated in Figure 2A for one of with other PDZ domains with nuclear magnetic resonance and the two protomers in our previously published CALP:iCAL36 X-ray crystallography (Charbonnier et al., 2011; Liu et al., 2007; structure (Amacher et al., 2013). Corresponding contact sur- Mischo et al., 2013; Zhang et al., 2007). Here, we generated faces on the peptide are highlighted by comparison of the sol- CALP complexes with the decameric E6 C-terminal peptides vent-accessible surface of the bound peptide in the absence from HPV16 (SSRTRRETQL) and HPV18 (RLQRRRETQV; Jeong

(Figure 2C) and presence of the CALP domain (Figure 2D). et al., 2007; Pim et al., 2012), with Ki values of 340 ± 70 mM and The non-motif contacts also represent potentially significant 490 ± 20 mM, respectively. stereochemical constraints. The fraction of solvent-accessible Structure determination and refinement protocols were de- surface area (SASA) buried serves as an interaction index (IPDZ) signed to avoid phase bias (Table S1; Figures S3 and S4). Final that gauges the extent of side-chain engagement at each posi- data and refinement statistics are shown in Table 2. In general, tion along the backbone (Figure 2B). At the non-motif position the iCAL-based structures are very similar. When the PDZ back- 5 P ,IPDZ values (0.87 and 0.63 for the A and B protomers, bone atoms of each structure are aligned to the CALP:iCAL36 B- respectively) are comparable to those for the motif residue P-2 protomer, they exhibit an average root-mean-square deviation ˚ ˚ (0.73 and 0.64). The other side chains exhibit IPDZ values ranging (rmsd) value of 0.34 ± 0.09 A, with a maximum of only 0.52 A from 0.26 to 0.60, and even for residues with lower side chain (B-protomer of CALP:iCAL36TRL). Visually, the superposition of

A -5 -6 B can extend previous models (e.g., Kang et al., 2003) to define P P -5 0 5 -8 a pharmacophore model in which positions P to P each -4 P 0 5 P P -7 -4 interact with cognate sites S to S (Figure 3C). According to -9 this model, each CALP site exhibits a characteristic set of stereo- P -3 -2 chemical constraints that favor the binding of a corresponding P P -3 -2 subset of complementary amino acid(s). If the constraints are -1 iCAL36 B-protomer stringent, they may be present in enough sequences to appear P 0 P -1 iCAL36 A-protomer Position from C-terminus as part of a motif. If they are more accommodating, they may 0 not, but may still contribute to affinity as modulators. 0 0.2 0.4 0.6 0.8 IPDZ Positional SubAna Analysis Confirms Multiple -6 Modulatory Residues C -5 R D W N -8 As a further test of the idea that each peptide side chain contacts -4 -7 P S a defined stereochemical environment relatively independent -2 S of sequence context, we performed substitutional analysis A-9 -3 (SubAna) using peptide arrays derived from a variety of starting T sequences. Each SubAna array contains a single peptide -1 135° 0 I 135° sequence with all 20 natural L-amino acids individually I substituted at each position along the chain (Boisguerin et al., 2007). We chose to use SubAnas based on ten starting se- Figure 2. Six Peptide Positions Interact with the CALP Binding Cleft quences that encompassed a number of different CALP-binding (A) The van der Waals surface of the CALP protein is shown together with the attributes (sequences are described in detail in Figure S5). bound iCAL36 peptide (stick figure), with individual residues labeled and colored by position. The CALP surface (cyan) is colored at contact sites Except for the artificial iCAL36 sequence, these peptides are according to the closest peptide residue: P0 Ile (red), P1 Ile (purple), P2 Ser derived from endogenous proteins, include known interacting (gray), P3 Thr (orange), P-4 Pro (yellow), P5 Trp (blue), P6 Arg (green), P7 partners (Cheng et al., 2002; Wente et al., 2005), cover a wide Ser (black), P8 Asn (pink), and P9 Ala (forest green). range of affinities for CALP (Cushing et al., 2008), and provide 2 (B) The peptide side-chain interaction index (IPDZ) is shown as a function of a variety of binding motif residue combinations (S/T at the P residue position for the peptides interacting with the A- (gray), and B- (black) position, I/L/V at the P0 position) for comparison. Furthermore, CALP protomers of the asymmetric unit. while the amount of protein bound is measured at a single con- (C and D) The van der Waals surface of the bound conformation of the iCAL36 peptide is shown in the same perspective as in (A, left) and following rotation by centration under nonequilibrium conditions, we can qualitatively 135 around the vertical axis (right). The surface is colored by residue as in (A). compare trends across arrays (Boisguerin et al., 2011). To

To illustrate the surface contacted by CALP, individual atoms with IPDZ values compare positional preferences across the full set of peptides, > 0.5 are colored white in (D), involving residues P0P 5. we separated each SubAna array into six strips, corresponding to the residues from P0 to P5, and then aligned all of the strips these CALP domains (Figure 3A) confirms that the framework of for P0, all of the strips for P1, etc. The resulting collage (Figure 4) the binding cleft is largely independent of ligand sequence. shows the variability of side chain preference(s) at a given posi- This domain superposition also reveals strong conformational tion. Consistent selective preferences at the P0 and P2 posi- similarity of the peptides. We first compared the complexes tions reflect their status as motif residues, largely independent formed by derivatives of the iCAL sequence, including the of peptide context. chimeric peptides described above, as well as earlier structures At the other positions, a more heterogeneous picture of iCAL36 and iCAL36L (Amacher et al., 2013). Despite widely emerges, consistent with the lack of motif constraints observed varying binding affinities, all peptide co-complexes reflect previously (Figure 1B). A residue may be favored for certain canonical Class I PDZ domain-target interactions: the P0 residue sequences, and disfavored for others. For example, a P4 Phe with the carboxylate-binding loop, the P2 residue with His349, residue appears to be strongly favored for CFTR (row 7) and and conserved b sheet interactions (Figure 3B). Pairwise com- other sequences, while disfavored for VIPR2 (row 9). However, parisons of the peptides reveal Ca variations less than 1.1 A˚ for despite this variability, certain trends are observed that apply the P0 to P3 positions, and up to 4.4 A˚ for the P4 and P5 to most or all sequences. For example, our positional arrays positions. These latter variations may partially reflect conforma- confirm positive preferences for a P1 Arg and for a P5 Trp tional changes due to differential crystal packing (Figure S3), but residue. They also reveal negative preferences, including P3 the side chains at each position still contact relatively similar sur- Asp (Figure 4), consistent with the affinity difference observed faces on CALP. Thus, even the substitution of half of the residues between CFTR and CFTRT3 (Table 1). CALP also appears to in the peptide has only a limited effect on the stereochemical disfavor a Gly or Pro at P1 and Pro at P3. environment experienced by the side chain at a given position along the backbone. Single Substitutions at P–5 Modulate Binding Affinity Comparison of iCAL36 with the HPV E6 peptides also revealed To investigate the stereochemistry of preferential accommoda- similar peptide backbone conformations, despite very low tion of a modulator residue, we focused on the impact of single sequence similarity (Figure 3C). Thus, in all of our structures, residue substitutions in iCAL36 at the P5 position. Competition the side chains at a given peptide position are oriented similarly, FP experiments confirm a range of affinities for P5-substituted contacting a common CALP local environment. As a result, we peptides (Table 3). The Leu- and Tyr-substituted peptides bind

Table 2. Data Collection and Reﬁnement Statistics for CALP Co-Crystals with iCAL36/CFTR Chimeric Peptides

iCAL36Q4 iCAL36TRL iCAL36VQD3 iCAL36QDTRL iCAL36QDTRL HPV16 E6 HPV18 E6 CALP CALP CALP CALP CALP-E317A CALP CALP Data collection

Space group (number) P 21 21 21 (19) P 1 (1) P 21 (4) P 63 2 2 (182) P 63 2 2 (182) P 21 (4) P 21 (4) Unit cell dimensions a,b,c (A˚ ) 36.0,47.7,98 30.8,50.2,55.2 36.3,48.8,54.9 61.6,61.6,97.6 61.8,61.8,97.4 33.1,48.4,52 33.2,48.0,52.9 a,b,g () 90,90,90 68.8,75.8,87.9 90,92.8,90 90,90,120 90,90,120 90,101.6,90 90,102.0,90 Resolution (A˚ )a 19.9–1.48 19.3–1.75 19.6–1.90 19.7–1.50 19.1–1.80 19.4–1.80 19.3–1.34 (1.58–1.48) (1.79–1.7) (2.03–1.9) (1.58–1.5) (1.85–1.8) (1.90–1.8) (1.42–1.34) b Rsym (%) 5.9 (45.0) 4.5 (37.2) 6.5 (36.0) 4.9 (61.5) 11.0 (61.7) 5.9 (52.3) 3.8 (26.5) c I/sI 34.60 (6.93) 15.48 (2.60) 15.58 (3.59) 49.46 (6.32) 26.01 (6.53) 20.37 (3.16) 36.69 (7.53) Completeness (%) 99.7 (99.2) 94.8 (91.1) 99.6 (99.5) 99.9 (100.0) 99.9 (100.0) 99.4 (98.1) 94.2 (77.8) Refinement Total no. of reflections 28,816 30,585 15,162 18,179 10,780 14,953 34,482 Reflections in test set 1,460 1,493 763 920 560 758 1,729 d e Rwork /Rfree (%) 18.1/20.9 17.2 /22.1 17.9/22.7 19.7/22.1 18.1/22.1 18.3/22.3 17.2/18.1 Number of atoms Protein 1,467 2,927 1,426 774f 772f 1,431 1,507 Water 180 267 141 69 69 147 258 Ramachandran plotg (%) 96.8/3.2/0/0 95.6/4.4/0/0 98.0/2.0/0/0 97.3/2.7/0/0 100/0/0/0 96.6/3.4/0/0 97.4/2.6/0/0 ˚ 2 Bav (A ) Protein 13.27 18.96 22.38 19.36 16.32 23.40 9.86 Solvent 24.62 28.73 31.27 30.03 27.43 31.13 23.10 Bond length rmsd (A˚ ) 0.006 0.006 0.007 0.007 0.008 0.007 0.007 Bond angle rmsd () 1.078 1.006 1.097 1.169 1.203 1.064 1.167 a ValuesP in parenthesesP areP forP data in the highest-resolution shell. b Rsym = h ijIðhÞIiðhÞj= h iIiðhÞ , where Ii(h) and I(h) values are the i-th and mean measurements of the intensity of reflection h. c = j ð + Þ ð Þj= SigAnoPF -F - s. P d Rwork = kFobsjh jFcalckh= jFobsjh , h e {working set}. e Rfree is calculated as Rwork for the reflections h e {test set}. fIncluding sulfocysteine at position 319 (see Figure S4). gCore/allowed/generously allowed/disallowed.

5 CALP with Ki % 100 mM, whereas the Ala- and His-substituted hydrophobic ledge that interacts with the P Trp in iCAL36 peptides each show R 10-fold losses in affinity. A P5 His side (Amacher et al., 2013). However, despite this shared environ- chain results in the weakest affinity of the set, at 450 mM, compa- ment (Figure 5B; Figure S6), there is no obvious correlation of rable to that of CALP:CFTR binding. the observed range of affinities with stereochemical parameters, All six peptides were crystallized in complex with CALP. Struc- including electrostatic interactions or buried surface area met- tures were determined using the methods described above for rics. For example, although the highest affinity peptide (Trp) is the chimeric peptide complexes (Table S1 and Supplemental associated with the largest contact surface area, the second Experimental Procedures), with data and refinement statistics highest-affinity peptide (Leu) has a smaller contact surface reported in Table 4. For structural comparisons, we aligned the than the significantly weaker His, Phe, or Tyr peptides (Table 3). main-chain atoms of the P5-substituted peptide:CALP com- However, more sophisticated computational approaches have plexes, using CALP:iCAL36 as a reference. Because the super- shown promise in capturing CALP affinity determinants: the position of chimeric peptides had shown slight variations at the K* algorithm used energy minimization of conformational C-terminal core (positions P0 to P3), we first compared the po- ensembles to estimate partition functions for the free and sitions of the Ca atoms in these residues. In this region, the rmsd bound partners. Interestingly, eight of the top 11 K*-designed for the P5-substituted peptides is less than the coordinate error sequences contained a P1 Arg, including kCAL01 (WQVTRV), (Figure 5A). There is a larger degree of variability at the P-4 and which has the highest affinity observed to date for a hexameric P5 positions (% 3A˚ ), but the side chains remain clustered, again sequence (Roberts et al., 2012). suggesting that modulations in binding affinity at the P5 residue are due to local binding effects, not to large-scale alterations of Electrostatic Potential Contributes to Modulator residue interactions (Figure 5A). Preferences at P–1 and P–3 As seen for the chimeric peptides (Figure 3C), all P5 residues Our SubAna comparisons (Figure 4) also revealed prom- interact with S5, comprised of the same His309-Gly310-Val311 inent modulatory preferences at P1 (favoring Arg) and P3

AB310 C His349 S-4 0 -2 0 -2 -4 SS 350 PP P Leu299 2.7 Å 2.8 Å 330 340 2.9 Å 290 Gly300 360 300 -5 320 -1 -3 -5 S PP P SS-1 -3

Figure 3. Peptide Residues Lie in Distinct CALP Binding Sites

(A) Ca traces are shown for CALP protomers from each of four different space-groups (Table S1), following superposition of main-chain atoms on the B-protomer of CALP:iCAL36. Traces are colored by space group: P212121 (black), P1 (green), P21 (cyan), and P6322 (pink). Every tenth residue is indicated with a black circle and residue number. The tight clustering reveals no gross conformational changes in the PDZ domain based on lattice packing, peptide sequence, or affinity. See also Figure S3. (B) Following superposition of PDZ main-chain atoms, chimeric iCAL36/CFTR peptides (stick figures) bind CALP (cartoon, gray) in similar conformations, despite sequence and affinity differences (Table 1). Characteristic PDZ hydrogen bonds are shown as dashed lines, including carboxylate interactions with residues Gly300 299 -2 349 and Leu and a P interaction with His (stick figures). iCAL36 (green carbons) and the highest (iCAL36TRL; cyan) and lowest (iCAL36VQD3; orange) affinity peptides are highlighted for reference. Other peptides (gray) include iCAL36QDTRL, iCAL36Q4, iCAL36L, HPV16 E6, and HPV18 E6. See also Figures S8A–S8G. (C) Following superposition of PDZ main-chain atoms, bound peptide conformations are shown for C-terminal peptides derived from the HPV16 (black) and HPV18 (pink) E6 proteins together with iCAL36 (green). Binding sites are shown schematically for each position (S0,S1, etc.). Noncarbon atoms are colored by element (red = O, blue = N).

(disfavoring Asp), both of which involve charged residues. As a each of the NHERF1/2 PDZ domains (Table 1), confirming their result, we investigated the electrostatic environment of the cor- modest preference for the P3 Asp. responding binding sites. Rendered at 10 kBT/e, the CALP van der Waals surface potential reveals distinct positive and negative Using Modulator Preferences to Identify High-Affinity surfaces on opposite sides of the peptide-binding cleft (Fig- Sequences for CAL ure 5C; Figure S7A). In the CALP:iCAL36QDTRL structure, both Lastly, we wished to evaluate the ability of modulator prefer- the P1 Arg and the P3 Asp residues point toward a ridge of ences to improve our classification of peptide affinities for the negative potential, comprised of the b2, b3, a1, and b4 second- CALP domain. The human deca-C-terminome, the set of C-ter- ary structure elements, as well as connecting loops. This elec- minal decapeptides for all 84,888 protein isoforms listed in the trostatic environment could thus account for both of these UniProt database (UniProt Consortium, 2012), contains a total modulator preferences (Figure 5D). of 57,632 unique sequences—Of these sequences, 2,197 Specifically, the P1 Arg is hydrogen-bonded with Gln322, conform to the CALP binding motif (Figure 6A). However, facilitating a favorable interaction between the positively CALP only weakly binds some of the peptide sequences that charged arginine guanidino group and an area of negative poten- satisfy its motif, including iCAL36VQD3 and the CFTR C terminus tial (Figure 5D). In contrast, the negatively charged P3 Asp side- (Table 1), and others previously tested (Cushing et al., 2008; chain carboxylate is positioned directly adjacent to the negative Roberts et al., 2012). potential surface, creating a Coulombic repulsion, even though To detect candidate high affinity sequences, we applied the the nearest charged residue is > 5 A˚ away. To test the role of positive modulator preferences at P-1 and P-5. Of the 2,197 motif this side chain in determining modulator preferences, we sequences, 141 and 36 contain an Arg and a Trp side chain, expressed and purified a CALP-E317A mutant. Unfortunately, respectively (Figure 6B). Only four sequences contain both of a crystal structure of the mutant (Table 2; Figures S7B and our positive modulator preferences. Although they may not be S7C, and Supplemental Experimental Procedures) showed physiological partners, all have CALP affinities better than or that the region of negative electrostatic potential was only comparable to previously reported endogenous CALP ligands modestly attenuated relative to wild-type (Figure 5C, insets). (Cushing et al., 2008): insulin-like growth factor-binding protein

Correspondingly small (< 7-fold) effects were observed on pep- 3 receptor (TM219_HUMAN; RESHWSRTRL; Ki =29±8mM), tide binding afﬁnities (Table 1). semaphorin 4G isoform 3 (SEM4G_HUMAN; PHSPWSFSRV;

However, a clearer picture is provided by analysis of the Ki =46±14mM), histone H4 transcription factor (HINFP_ NHERF PDZ domains, which do not share the charge distribution HUMAN; CGCSWFATRV; Ki =46±24mM), and proteasome of CALP. For example, in N1P1, the residue corresponding to subunit beta type-9 (B4DZW2_HUMAN; GFETWEGSRL; Ki = CALP-Glu317 is Leu41, a nonpolar residue. Consistently, all four 120 ± 60 mM). Moreover, the proteasomal sequence contains a NHERF1/2 PDZ domains make favorable hydrogen bonds with P3 Gly residue (underlined), which has a negative influence in the P3 Asp, which is in a positive environment (Figures S7D– the positional arrays (Figure 4) and thus may account for its S7G). Indeed, in contrast to the dramatic increase in binding somewhat lower affinity. Likewise, endogenous targets contain- affinity for CALP upon substitution of the P3 Asp to Thr, ing the strongest negative modulator (Asp at P3) are among the

CFTRT3 afﬁnity was 2- to 6-fold weaker than that of CFTR for weakest CALP interactors (Figure 6B). Taken together, our data

0 -1 Figure 4. Position-Specific Peptide Arrays P wtACDE FGH I KLMNPQRSTVWY P wtACDE FGH I K LMNPQRSTVWY Reveal Modulator Preferences for CALP iCAL36 I I SubAna arrays of ten peptides are clustered by BCR V E residue position, with each row labeled according GPR19 V F to the background sequence of the source array MAP4 I S (see key for sequence information). The WT resi- L CAC1D T dues are listed to the right of each row. Residues in GPER V A the initial sequence are substituted by all 20 L- CFTR L R SSR5 L K amino acids, as indicated at top (see text). The 0 2 VIPR2 I V motif positions P and P reveal clear preferences ZO-3 L D for only a few residues. Positive modulator preferences are seen as consistently stronger binding -2 -3 than WT following substitution of e.g., Arg at P1 or P wtACDEFGH I KLMNPQRS TVWY P wtACDEFGH I KLMNPQRSTVWY Trp at P 5, independent of the background pep- iCAL36 S T tide sequence. Negative preferences are also seen BCR T S GPR19 T N as consistently weaker binding compared to WT, 1 MAP4 T E e.g., following substitution of Gly at P or Asp at CAC1D T I P 3. See also Figure S5. GPER S S CFTR T D SSR5 S T VIPR2 S T ZO-3 T A sequences. For example, while the Lin-7 -4 -5 homolog A (Lin7A) PDZ domain exhibits P wtACDEFGH I KLMNPQRSTVWY P wtACDEFGH I KLMNPQRSTVWY a highly selective motif with up to 6 deter- P W iCAL36 minants (Tonikian et al., 2008), fewer than BCR F L GPR19 P P four of these motif constraints are satis- MAP4 Q I fied in any single Lin7A target sequence CAC1D C I (UniProt Consortium, 2012), and no GPER F R CFTR Q V annotated human C-terminal sequence SSR5 Q M matches the complete Lin7A PDZ binding T VIPR2 E motif. In contrast, although the binding ZO-3 P G motif for the CAL PDZ domain is highly Sequence Key degenerate (Figure 1B; Figure S1A), iCAL36: WPTSII GPER: RFSSAV many motif-compatible sequences actu- BCR: LFSTEV CFTR: VQDTRL ally have negligible affinity for the domain GPR19: PPNTFV SSR5: MQTSKL (Roberts et al., 2012). IQETSI TETSVI MAP4: VIPR2: Overall, it appears that PDZ domains CAC1D: ICITTL ZO-3: GPATDL achieve sufficient binding affinity by har- nessing strong free energy contributions confirm the value of modulator preferences in assessing the at a few anchor positions, supplemented by additional stereo- affinity of candidate CALP-binding sequences. chemical interactions that may involve either additional motif constraints or non-motif ‘‘modulator’’ residues. Structural com- DISCUSSION parisons suggest that such modulator preferences reflect relatively loose pharmacophore sites, each of which engages the As illustrated by the weak CALP binding of C-terminal peptides side chain at a particular position (Figure 3C). In fact, although from the known endogenous interactors CFTR and b-catenin only a limited number of other PDZ domains have been co-crys- (Cushing et al., 2008), there is abundant evidence that SLiM pep- tallized with more than one peptide ligand (e.g., Babault et al., tide affinity is not always the sole determinant of in vivo interac- 2011; Grembecka et al., 2006; Kang et al., 2003; Nomme et al., tions, which also depend on molecular and cellular context (Luck 2011), as for CALP, superpositions in these cases also suggest et al., 2012). Nevertheless, PDZ:peptide binding remains a key that each domain imposes a characteristic binding conformation element of specificity. Thus, deciphering the relationship be- on its ligands, and can thus define a set of domain-specific tween peptide sequence and PDZ affinity is a core requirement sequence preferences (Figure 3B; Figure S8). for understanding the networks of interactions formed by these Whether these preferences are readily captured in alignment ubiquitous protein-protein interaction domains. Over the past motifs depends on a number of factors, including the stringency 15 years, studies have identified binding motifs and contacts of the preference itself, as well as the nature of the experiment, from the C terminus to the P10 position (Appleton et al., 2006; limitations of the technique (e.g., number of sequences tested), Luck et al., 2012; e.g., Skelton et al., 2003; Smith and Kortemme, and the baseline affinity of the domain for the peptide backbone. 2010; Thomas et al., 2008). In many cases, preferences are subtle and lead only to marginal However, recent evidence suggests that PDZ sequence: enrichment relative to random sequences. For example, our P5 affinity relationships can involve either more or fewer residues residue analysis suggests that all seven substituted amino acids than indicated by motifs defined by alignments of high-affinity (Trp/Leu/Tyr/Phe/Val/Ala/His) could have been found in peptides

Table 3. CALP Binding Affinities for P-5-Substituted iCAL36 chemical interactions (e.g., Serrano et al., 1990). Whatever the Peptides source of accommodation, alignments are unlikely to achieve a ˚ 2 sufficient enrichment to create an obvious motif constraint. Ki (mM) Contact Surface Area (A ) Negative contributions are particularly difficult to detect robustly Peptide Sequenceb CALP Peptide P 5 Position by alignment, since the expected observation frequency of a iCAL36c ANSRWPTSII 22.6 ± 8.0 533.4 109.5 ‘‘neutral’’ residue is its already low rate of natural occurrence, L-iCAL36 ANSRLPTSII 68 ± 20 492.2 73.9 associated with a correspondingly high level of random Y-iCAL36 ANSRYPTSII 85.3 ± 1.5 518.2 99.6 variability. F-iCAL36 ANSRFPTSII 110 ± 20 506.9 86.6 Our data (Figure 4) suggest that multisequence substitution V-iCAL36 ANSRVPTSII 190 ± 40 485.8 66.4 analysis represents an effective extension of mutagenetic ap- A-iCAL36 ANSRAPTSII 275 ± 80 461.9 42.5 proaches (e.g., Wiedemann et al., 2004) to highlight preferences H-iCAL36 ANSRHPTSII 450 ± 10 505.5 79.2 that are insufficiently robust or sequence-independent to estab- a -5 lish clear motif signatures. SubAna arrays can also readily detect Contact surface area for each P position includes all atoms of the res- 3 idue. negative contributions (e.g., Asp at P ) that have important impli- 3 bSubstitutions at the P5 position are indicated in bold. cations for the biological specificity of PDZ interactions. The P cValues previously reported in Amacher et al., 2013. Asp side chain at the C terminus of CFTR significantly limits its affinity for CAL, which targets CFTR to the lysosome, but not for NHERF1 and NHERF2, which functionally stabilize CFTR at the apical membrane (Table 1). Thus, the CFTR sequence at this po- that were included in the binding motif analysis, as the sition specifically favors interactions with PDZ proteins that pro- substituted iCAL peptides all bind CALP with affinity greater mote endocytic recycling over degradation (Cheng et al., 2002). than that of CFTR, a top 100 sequence hit. This variability may This example underscores the potential for modulator resi- reflect the attenuated specificity of surface-exposed stereo- dues to differentially affect interactions with PDZ domains that

Table 4. Data Collection and Reﬁnement Statistics for CALP Co-Crystals with P–5-Substituted iCAL36 Peptides A-iCAL36 L-iCAL36 V-iCAL36 H-iCAL36 F-iCAL36 Y-iCAL36 Data collection

Space group P 21 21 21 P 21 21 21 P 21 21 21 P 21 21 21 P 21 21 21 P 21 21 21 Unit cell dimensions a,b,c (A˚ ) 36.7,47.8,97.6 36.2,47.6,97.3 36.7,47.9,97.8 36.2,47.8,98.1 36.2,48.1,97.6 36.2,47.8,97.5 a,b,g () 90,90,90 90,90,90 90,90,90 90,90,90 90,90,90 90,90,90 Resolution (A˚ )a 18.5–1.14 34.0–1.2 19.7–1.46 19.5–1.47 19.6–1.2 18.5–1.09 (1.19–1.14) (1.24–1.2) (1.54–1.46) (1.58–1.47) (1.26–1.2) (1.15–1.09) b Rsym (%) 11.6 (39.3) 6.3 (51.5) 6.8 (50.4) 9.3 (50.6) 8.4 (47.1) 7.5 (34.5) c I/sI 10.14 (3.99) 20.98 (4.24) 16.71 (3.05) 21.17 (6.39) 13.28 (3.86) 15.57 (4.98) Completeness (%) 99.4 (99.9) 99.9 (99.9) 98.1 (88.7) 98.3 (97.2) 98.3 (95.7) 98.2 (92.4) Refinement Total no. of reflections 63,011 53,398 30,171 28,702 53,211 70,084 Reflections in the test set 3,144 2,674 1,516 1,446 2,658 3,525 d e Rwork /Rfree (%) 18.5/19.5 18.9/20.1 17.9/19.5 18.0/20.7 18.5/18.8 18.2/18.8 Number of atoms Protein 1,502 1,483 1,472 1,428 1,522 1,520 Water 236 230 180 170 227 220 Ramachandran plot (%)f 94.8/3.4/0/0 96.6/3.4/0/0 97.3/2.7/0/0 96.6/3.4/0/0 97.4/2.6/0/0 97.3/2.7/0/0 ˚ 2 Bav (A ) Protein 12.35 12.27 14.85 9.77 12.20 11.95 Solvent 25.17 23.39 27.11 21.72 24.24 23.52 Bond length rmsd (A˚ ) 0.005 0.006 0.006 0.006 0.006 0.007 Bond angle rmsd () 1.108 1.102 1. 043 1.063 1.107 1.073 a ValuesP in parenthesesP areP forP data in the highest-resolution shell. b Rsym = h ijIðhÞIiðhÞj= h iIiðhÞ , where Ii(h) and I(h) values are the i-th and mean measurements of the intensity of reflection h. c = j ð + Þ ð Þj= SigAnoPF -F - s. P dR = k j j k = j j work Fobs h Fcalc h Fobs h ,he {working set}. e Rfree is calculated as Rwork for the reflections he {test set}. fCore/allowed/generously allowed/disallowed.

Figure 5. Stereochemical Analysis of Modulator Preferences (A) The clustering of CALP-bound P-5-substituted iCAL36 peptides (stick figures) is shown following main-chain superposition of the CALP B-protomers. Peptides are colored by sequence: iCAL36 (W; green carbons), F- (red), Y- (black), H- (yellow), A- (cyan), L- (orange), and V- (purple). Rmsd offsets at the P0P3 residues 4 5 were within coordinate error, while modest variations were observed at the P and P positions (black arrow). The largest difference is 3.2 A˚ , between the Ca of the L- and Ca of the A-substituted peptides. (B) Each substituted P5 side chain (colored stick figure) is shown, as docked against its CALP binding interface (surface, cartoon and stick figure). The residues that contact the P5 residue (His309, Gly310, and Val311) are labeled in the CALP:F-iCAL36 structure. Each panel is labeled with the P5 residue identity. See also Figure S6. (C) The electrostatic potential surface of WT CALP (left) reveals a highly polarized distribution on either side of the peptide binding cleft. The electrostatic potential 3 surface of CALP-E317A (right) shows a reduced, although still negative, interaction surface for the P Asp residue of iCAL36QDTRL (insets). See also Figure S7. 1 3 (D) Electrostatic interactions are shown for P Arg (left) and P Asp (right) residues of the iCAL36QDTRL peptide (stick figure, yellow carbons) bound to CALP 1 314 (gray Ca trace). At the top, nearby charged CALP residues are shown (stick figures, green carbons). The P Arg forms a hydrogen bond (dashed line) with Gln . The closest charged residue for the P3 Asp is Glu317 (d > 5 A˚ ). In the lower panels, the corresponding electrostatic potential surfaces are shown (rendered at 1 3 10 kBT/e) underscoring the negative electrostatic environments of both the P and the P side chains. share overlapping motifs. Recent work by a number of groups domain interaction networks and for the development of specific suggests that PDZ interactions are a finely tuned product of evo- peptide inhibitors of individual nodes. lution operating both on the PDZ domains (Ernst et al., 2009; Kaneko et al., 2011; McLaughlin et al., 2012) and on the target EXPERIMENTAL PROCEDURES C-terminal sequences themselves (Kim et al., 2012b). If degenerate two- or three-residue motifs were sufficient to determine Protein Expression and Purification target affinity for PDZ domains such as CAL or NHERF, such The CALP E317A mutant was generated using the Quikchange Lightning evolution would be limited by a very small set of combinatorial (Agilent Technologies) protocol. The wild-type (WT) and mutant CALP possibilities, as would the engineering of highly selective (Amacher et al., 2011) and NHERF PDZ domains (Cushing et al., 2008) were sequences. While individual modulator residues may contribute expressed and purified as previously described. For fluorescence polarization (FP) experiments, proteins were dialyzed overnight into storage buffer only modestly to target affinity, collectively they have the poten- (150 mM NaCl; 0.02% [w/v] sodium azide, 25 mM sodium phosphate pH tial to significantly differentiate PDZ specificity profiles. As a 8.0, 0.2 mM tris[2-carboxyethyl]phosphine [TCEP], 5% [v/v] glycerol). For result, characterizing these previously cryptic binding determi- CALP crystallization, the protein was stored in GF2 buffer (25 mM Tris, nants may be critical both for a fuller understanding of PDZ 150 mM sodium chloride, 0.02% [w/v] sodium azide, 0.1 mM TCEP, 0.1 mM

A A1BG_HUMAN Alpha-1B-glycoprotein Figure 6. Impact of Binding Motif and Proteins and protein fragments ATX2L_HUMAN Ataxin-2-like-protein Modulator Residues on PDZ Binding Speci- in the human proteome A4GAT_HUMAN Lactosylceramide 4... ficity 1A01_HUMAN HLA class I histoc... (A) Evaluation of the number of distinct C-terminal . decamers in the human proteome that satisfy the . 84,888 CALP binding motif (red residues in example . sequences listed at right). (B) A schematic pharmacophore model of the series of side chain-binding sites along the CAL Unique C-terminal decamers PDZ cleft is shown, some of which are identified as binding motif residues by high-throughput analysis (Figure 1A). Positive (green) and negative (red) 57,632 preferences are shown at non-motif sites, together ANO9_HUMAN (SIFSARSTDV) with the frequencies of these residues in the pool PRS38_HUMAN (AVPMTKDTLV) of potential CALP binders. FP displacement titra- F92A1_HUMAN (VWSEITVSLL) tions confirm the expected affinities for represen- Sequences that match CALP binding motif tative peptides that contain positive (left panel, A2A296_HUMAN (PVDQKFQSIV) green) and negative (right, red) modulator resi- . dues, suggesting that modulators can significantly 2,197 . restrict the number of binding partners by adjust- . ing the overall affinity balance of the PDZ:target B Of the sequences that match the CALP binding motif interaction. See also Figure S8H.

36 sequences 96 sequences 141 sequences TRP ASP ARG crystallized, data collected, and structures determined as previously described (Amacher et al., 2011, 2013).

Structure Analysis Electrostatic potential surface calculations and MOTIF MOTIF structure alignments were performed in PYMOL -5 -4 -3 -2-1 0 (DeLano, 2008). The contact surface of each pep- P PPPPP tide residue was depicted using CHIMERA (Pet- tersen et al., 2004). SASA, including contact, was calculated using AREAIMOL (Lee and Richards, high low 1971), part of the CCP4 suite of programs (CCP4, 1994). Contact surface area for each peptide

residue atom was calculated as [SASA]COMPLEX Positive modulators affinity Negative modulators [SASA]ALONE. The Interaction Index (IPDZ)is deﬁned as the fractional change in solvent acces- 0.6 0.6 sible surface area upon ligand binding, using the equation:

SASAcomplex : 0.4 0.4 IPDZ = 1 SASAfree

0.2 iCAL36TRL 0.2 iCAL36TRL Fluorescence Polarization SEM4G CFTR Fluorescence polarization assays were performed TM219 β-catenin as previously described (Cushing et al., 2008; 0 DMSO 0 DMSO Vouilleme et al., 2010). Triplicate experiments -7 -5 -3 -7-5 -3 were performed, using separate protein and Fraction of Reporter Bound Fraction of Reporter Bound log ([Peptide]/M) log ([Peptide]/M) reporter stock solutions in independent peptide

titrations. The Ki values were determined using 4 sequences contain P -5 CFTR and β-catenin a SOLVER-based least-squares fitting algorithm -1 -3 in EXCEL, as described previously (Amacher Trp and P Arg residues contain P Asp residues et al., 2013; Cushing et al., 2008; Vouilleme et al., 2010). The following reporter peptides were used: F*-iCAL36 (F*-ANSRWPTSII) for CALP and E317A CALP; F*-CFTR6 (F*-VQDTRL) ATP, 5% [v/v] glycerol; Amacher et al., 2011), prior to dialysis into crystalliza- for N1P1 and N1P2; and F*-CFTR10 (F*-TEEEVQDTRL) for N2P1 and tion buffer. Peptides for co-crystallization and FP experiments were synthe- N2P2. F* corresponds to a fluorescein group linked to the peptide N terminus sized and purified by the Tufts Core Peptide Facility. via an amino-hexanoic acid linker. The concentrations used for each PDZ

domain were 1.5 3 Kd, where the Kd values of each PDZ domain for their

Crystallization and Structure Determination respective reporter peptides were: CALP (Kd = 0.97 mM), E317A CALP Detailed methods for crystallization and determination of each co-crystal (0.61 mM), N1P1 (0.37 mM), N1P2 (1.08 mM), N2P1 (0.32 mM), and N2P2 structure and accession codes are presented in Table S1. Complexes were (0.23 mM).

Peptide Arrays of SPOT synthesis to analyze PDZ domain regulation mechanisms. The full set of 6223HumLib sequences is described elsewhere (Roberts et al., ChemBioChem 8, 2302–2307. 2012; see Supplemental Experimental Procedures). Position-specific SubAna Boisguerin, P., Madden, D.R., and Volkmer, R. (2011). Target-oriented peptide arrays were generated by juxtaposing data for a given peptide position from a arrays in a palliative approach to cystic fibrosis (CF). In Methods in Protein series of SubAna arrays based on distinct peptide sequences. Individual Sub- Biochemistry, H. Tschesche, ed. (Berlin: de Gruyter), pp. 249–269. Ana arrays were synthesized and binding analyzed as previously described CCP4; Collaborative Computational Project, Number 4. (1994). The CCP4 (Boisguerin et al., 2007; Vouilleme et al., 2010). suite: programs for protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 50, 760–763. Proteome Analysis of Sequences The Python scripting language (Van Rossum, 2009) was used to map all occur- Charbonnier, S., Nomine´ , Y., Ramı´rez, J., Luck, K., Chapelle, A., Stote, R.H., rences of the CALP binding motif and the frequency distribution of individual Trave´ , G., Kieffer, B., and Atkinson, R.A. (2011). The structural and dynamic modulator preferences in the set of all decameric C-terminal peptides present response of MAGI-1 PDZ1 with noncanonical domain boundaries to the bind- 406 in the FASTA file provided by UniProt Knowledgebase release 2012_10 ing of human papillomavirus E6. J. Mol. Biol. , 745–763. (UniProt Consortium, 2012). Cheng, J., Moyer, B.D., Milewski, M., Loffing, J., Ikeda, M., Mickle, J.E., Cutting, G.R., Li, M., Stanton, B.A., and Guggino, W.B. (2002). A Golgi-asso- ACCESSION NUMBERS ciated PDZ domain protein modulates cystic fibrosis transmembrane regulator plasma membrane expression. J. Biol. Chem. 277, 3520–3529. The Protein Data Bank accession numbers for the coordinates and structure Crooks, G.E., Hon, G., Chandonia, J.M., and Brenner, S.E. (2004). WebLogo: a factors reported in this paper are 4JOE, 4JOF, 4JOG, 4JOH, 4JOJ, 4JOK, sequence logo generator. Genome Res. 14, 1188–1190. 4JOP, 4JOR, 4K6Y, 4K72, 4K75, 4K76, and 4K78. Cushing, P.R., Fellows, A., Villone, D., Boisgue´ rin, P., and Madden, D.R. (2008). The relative binding affinities of PDZ partners for CFTR: a biochemical SUPPLEMENTAL INFORMATION basis for efficient endocytic recycling. Biochemistry 47, 10084–10098. Cushing, P.R., Vouilleme, L., Pellegrini, M., Boisguerin, P., and Madden, D.R. Supplemental Information includes Supplemental Experimental Procedures, (2010). A stabilizing influence: CAL PDZ inhibition extends the half-life of eight figures, and one table and can be found with this article online at DF508-CFTR. Angew. Chem. Int. Ed. Engl. 49, 9907–9911. http://dx.doi.org/10.1016/j.str.2013.09.019. Davey, N.E., Van Roey, K., Weatheritt, R.J., Toedt, G., Uyar, B., Altenberg, B., ACKNOWLEDGMENTS Budd, A., Diella, F., Dinkel, H., and Gibson, T.J. (2012). Attributes of short linear motifs. Mol. Biosyst. 8, 268–281. This work was supported by the National Institutes of Health (R01-DK075309 DeLano, W. (2008). The PyMOL molecular graphics system. The PyMOL and R01-HG004499, training grants T32-GM008704 to JFA and T32- molecular graphics system. (Palo Alto, CA: DeLano Scientific LLC). DK007301 to PRC, and IDeA awards P20-RR018787 and P20-GM103413 to Doyle, D.A., Lee, A., Lewis, J., Kim, E., Sheng, M., and MacKinnon, R. (1996). the Dartmouth Lung Biology COBRE), the Cystic Fibrosis Foundation RDP Crystal structures of a complexed and peptide-free membrane protein-binding (STANTO11R0), a Hitchcock Foundation PPG, and by the Deutsche For- domain: molecular basis of peptide recognition by PDZ. Cell 85, 1067–1076. schungsgemeinschaft (DFG grant VO 885/3-2). We thank the entire Madden Ernst, A., Sazinsky, S.L., Hui, S., Currell, B., Dharsee, M., Seshagiri, S., Bader, lab and Dr. James Fraser (UCSF) for advice and discussions, as well as G.D., and Sidhu, S.S. (2009). Rapid evolution of functional complexity in a Dr. Karl Griswold and Dr. Tom Scanlon (Dartmouth) for insight on rendering domain family. Sci. Signal. 2, ra50. electrostatic potential maps. We would also like to thank Edwin Lazo, Dr. Vivian Stojanoff and Dr. Jean Jakoncic (NSLS) for assistance with data Grembecka, J., Cierpicki, T., Devedjiev, Y., Derewenda, U., Kang, B.S., collection and analysis. Bushweller, J.H., and Derewenda, Z.S. (2006). The binding of the PDZ tandem of syntenin to target proteins. Biochemistry 45, 3674–3683. Received: May 15, 2013 Guerra, L., Fanelli, T., Favia, M., Riccardi, S.M., Busco, G., Cardone, R.A., Revised: September 14, 2013 Carrabino, S., Weinman, E.J., Reshkin, S.J., Conese, M., and Casavola, V. Accepted: September 16, 2013 (2005). Na+/H+ exchanger regulatory factor isoform 1 overexpression modu- Published: December 3, 2013 lates cystic fibrosis transmembrane conductance regulator (CFTR) expression and activity in human airway 16HBE14o- cells and rescues DeltaF508 CFTR REFERENCES functional expression in cystic fibrosis cells. J. Biol. Chem. 280, 40925–40933. Harris, B.Z., and Lim, W.A. (2001). Mechanism and role of PDZ domains in Amacher, J.F., Cushing, P.R., Weiner, J.A., and Madden, D.R. (2011). signaling complex assembly. J. Cell Sci. 114, 3219–3231. Crystallization and preliminary diffraction analysis of the CAL PDZ domain in Jeong, K.W., Kim, H.Z., Kim, S., Kim, Y.S., and Choe, J. (2007). Human papil- complex with a selective peptide inhibitor. Acta Crystallogr. Sect. F Struct. lomavirus type 16 E6 protein interacts with cystic fibrosis transmembrane Biol. Cryst. Commun. 67, 600–603. regulator-associated ligand and promotes E6-associated protein-mediated Amacher, J.F., Cushing, P.R., Bahl, C.D., Beck, T., and Madden, D.R. (2013). ubiquitination and proteasomal degradation. Oncogene 26, 487–499. Stereochemical determinants of C-terminal specificity in PDZ peptide-binding Kaneko, T., Sidhu, S.S., and Li, S.S. (2011). Evolving specificity from variability domains: a novel contribution of the carboxylate-binding loop. J. Biol. Chem. for protein interaction domains. Trends Biochem. Sci. 36, 183–190. 288, 5114–5126. Appleton, B.A., Zhang, Y., Wu, P., Yin, J.P., Hunziker, W., Skelton, N.J., Kang, B.S., Cooper, D.R., Devedjiev, Y., Derewenda, U., and Derewenda, Z.S. Sidhu, S.S., and Wiesmann, C. (2006). Comparative structural analysis (2003). Molecular roots of degenerate specificity in syntenin’s PDZ2 domain: 11 of the Erbin PDZ domain and the first PDZ domain of ZO-1. Insights reassessment of the PDZ recognition paradigm. Structure , 845–853. into determinants of PDZ domain specificity. J. Biol. Chem. 281, 22312– Kim, T., Tyndel, M.S., Huang, H., Sidhu, S.S., Bader, G.D., Gfeller, D., and Kim, 22320. P.M. (2012a). MUSI: an integrated system for identifying multiple specificity 40 Babault, N., Cordier, F., Lafage, M., Cockburn, J., Haouz, A., Prehaud, C., Rey, from very large peptide or nucleic acid data sets. Nucleic Acids Res. , e47. F.A., Delepierre, M., Buc, H., Lafon, M., and Wolff, N. (2011). Peptides target- Kim, J., Kim, I., Yang, J.S., Shin, Y.E., Hwang, J., Park, S., Choi, Y.S., and Kim, ing the PDZ domain of PTPN4 are efficient inducers of glioblastoma cell death. S. (2012b). Rewiring of PDZ domain-ligand interaction network contributed to Structure 19, 1518–1524. eukaryotic evolution. PLoS Genet. 8, e1002510. Boisguerin, P., Ay, B., Radziwill, G., Fritz, R.D., Moelling, K., and Volkmer, R. Laura, R.P., Witt, A.S., Held, H.A., Gerstner, R., Deshayes, K., Koehler, M.F., (2007). Characterization of a putative phosphorylation switch: adaptation Kosik, K.S., Sidhu, S.S., and Lasky, L.A. (2002). The Erbin PDZ domain binds

with high affinity and specificity to the carboxyl termini of delta-catenin and specificity. Structure determination and mutagenesis of the Erbin PDZ domain. ARVCF. J. Biol. Chem. 277, 12906–12914. J. Biol. Chem. 278, 7645–7654. Lee, B., and Richards, F.M. (1971). The interpretation of protein structures: Smith, C.A., and Kortemme, T. (2010). Structure-based prediction of the pep- estimation of static accessibility. J. Mol. Biol. 55, 379–400. tide sequence space recognized by natural and synthetic PDZ domains. Lee, H.J., and Zheng, J.J. (2010). PDZ domains and their binding partners: J. Mol. Biol. 402, 460–474. 8 structure, specificity, and modification. Cell Commun. Signal. ,8. Songyang, Z., Fanning, A.S., Fu, C., Xu, J., Marfatia, S.M., Chishti, A.H., Liu, Y., Henry, G.D., Hegde, R.S., and Baleja, J.D. (2007). Solution structure of Crompton, A., Chan, A.C., Anderson, J.M., and Cantley, L.C. (1997). the hDlg/SAP97 PDZ2 domain and its mechanism of interaction with HPV-18 Recognition of unique carboxyl-terminal motifs by distinct PDZ domains. papillomavirus E6 protein. Biochemistry 46, 10864–10874. Science 275, 73–77. Luck, K., Charbonnier, S., and Trave´ , G. (2012). The emerging contribution of Stiffler, M.A., Chen, J.R., Grantcharova, V.P., Lei, Y., Fuchs, D., Allen, J.E., sequence context to the specificity of protein interactions mediated by PDZ Zaslavskaia, L.A., and MacBeath, G. (2007). PDZ domain binding selectivity domains. FEBS Lett. 586, 2648–2661. is optimized across the mouse proteome. Science 317, 364–369. McLaughlin, R.N., Jr., Poelwijk, F.J., Raman, A., Gosal, W.S., and Thomas, M., Dasgupta, J., Zhang, Y., Chen, X., and Banks, L. (2008). Analysis Ranganathan, R. (2012). The spatial architecture of protein function and adap- of specificity determinants in the interactions of different HPV E6 proteins with 491 tation. Nature , 138–142. their PDZ domain-containing substrates. Virology 376, 371–378. Mischo, A., Ohlenschla¨ ger, O., Hortschansky, P., Ramachandran, R., and Tonikian, R., Zhang, Y., Sazinsky, S.L., Currell, B., Yeh, J.H., Reva, B., Held, Go¨ rlach, M. (2013). Structural insights into a wildtype domain of the oncopro- H.A., Appleton, B.A., Evangelista, M., Wu, Y., et al. (2008). A specificity map tein E6 and its interaction with a PDZ domain. PLoS ONE 8, e62584. for the PDZ domain family. PLoS Biol. 6, e239. Nomme, J., Fanning, A.S., Caffrey, M., Lye, M.F., Anderson, J.M., and Lavie, A. UniProt Consortium. (2012). Reorganizing the protein space at the Universal (2011). The Src homology 3 domain is required for junctional adhesion mole- Protein Resource (UniProt). Nucleic Acids Res. 40, D71–D75. cule binding to the third PDZ domain of the scaffolding protein ZO-1. J. Biol. Chem. 286, 43352–43360. Van Rossum, G. (2009). The Python Language Reference, F.L. Drake, Jr., ed. (Godalming, UK: Network Theory Ltd). Nourry, C., Grant, S.G., and Borg, J.P. (2003). PDZ domain proteins: plug and play! Sci. STKE 2003, RE7. Vouilleme, L., Cushing, P.R., Volkmer, R., Madden, D.R., and Boisguerin, P. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., (2010). Engineering peptide inhibitors to overcome PDZ binding promiscuity. 49 Meng, E.C., and Ferrin, T.E. (2004). UCSF Chimera—a visualization system Angew. Chem. Int. Ed. Engl. , 9912–9916. for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612. Wente, W., Stroh, T., Beaudet, A., Richter, D., and Kreienkamp, H.J. (2005). Pim, D., Bergant, M., Boon, S.S., Ganti, K., Kranjec, C., Massimi, P., Subbaiah, Interactions with PDZ domain proteins PIST/GOPC and PDZK1 regulate intra- V.K., Thomas, M., Tomaic, V., and Banks, L. (2012). Human papillomaviruses cellular sorting of the somatostatin receptor subtype 5. J. Biol. Chem. 280, and the specificity of PDZ domain targeting. FEBS J. 279, 3530–3537. 32419–32425. Roberts, K.E., Cushing, P.R., Boisguerin, P., Madden, D.R., and Donald, B.R. Wiedemann, U., Boisguerin, P., Leben, R., Leitner, D., Krause, G., Moelling, K., (2012). Computational design of a PDZ domain peptide inhibitor that rescues Volkmer-Engert, R., and Oschkinat, H. (2004). Quantification of PDZ domain CFTR activity. PLoS Comput. Biol. 8, e1002477. specificity, prediction of ligand affinity and rational design of super-binding 343 Schultz, J., Hoffmu¨ ller, U., Krause, G., Ashurst, J., Macias, M.J., Schmieder, peptides. J. Mol. Biol. , 703–718. P., Schneider-Mergener, J., and Oschkinat, H. (1998). Specific interactions Wolde, M., Fellows, A., Cheng, J., Kivenson, A., Coutermarsh, B., Talebian, L., between the syntrophin PDZ domain and voltage-gated sodium channels. Karlson, K., Piserchio, A., Mierke, D.F., Stanton, B.A., et al. (2007). Targeting Nat. Struct. Biol. 5, 19–24. CAL as a negative regulator of DeltaF508-CFTR cell-surface expression: an Serrano, L., Horovitz, A., Avron, B., Bycroft, M., and Fersht, A.R. (1990). RNA interference and structure-based mutagenetic approach. J. Biol. Estimating the contribution of engineered surface electrostatic interactions Chem. 282, 8099–8109. to protein stability by using double-mutant cycles. Biochemistry 29, 9343– Zhang, Y., Dasgupta, J., Ma, R.Z., Banks, L., Thomas, M., and Chen, X.S. 9352. (2007). Structures of a human papillomavirus (HPV) E6 polypeptide bound to Skelton, N.J., Koehler, M.F., Zobel, K., Wong, W.L., Yeh, S., Pisabarro, M.T., MAGUK proteins: mechanisms of targeting tumor suppressors by a high-risk Yin, J.P., Lasky, L.A., and Sidhu, S.S. (2003). Origins of PDZ domain ligand HPV oncoprotein. J. Virol. 81, 3618–3626.