Supporting Information

Liban et al. 10.1073/pnas.1619170114 (www.pnas.org/cgi/content/short/1619170114)

SI Materials and Methods

Protein Expression and Purification. Human DP1 (residues 199 to 350) and (residues 91 to 198) or (residues 124 to 232) proteins were coexpressed from pET-derived vectors containing N-terminal GST and N-terminal 6×His tags, respectively. Escherichia coli BL21(DE3) cells cotransformed with both vectors were grown to an OD600 of 0.6 to 0.8 and induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG), with protein expression taking place overnight at 18 °C. Cells were pelleted at 3,470 × g for 10 min and resuspended in a lysis buffer containing 25 mM Hepes, 100 mM NaCl, 5 mM DTT, 5 mM EDTA, 10% glycerol, and 1 mM PMSF (pH 8). A cell homogenizer was used to lyse the cells and, following centrifugation at 25,200 × g for 30 min, the resulting soluble fraction was purified using glutathione Sepharose affinity chromatography followed immediately by Ni Sepharose affinity chromatography. Following the affinity chromatography columns, the eluate was further purified by anion-exchange chromatography and cleaved with GST–TEV (Tobacco Etch Virus) protease overnight at 4 °C. To remove cleaved GST and His tags, proteins were again passed over glutathione and nickel Sepharose resin, and finally the heterodimer was subjected to size-exclusion chromatography to achieve the final pure sample. All p107C and RbC constructs were expressed and purified essentially as described (19).

Protein Phosphorylation. Phosphorylation of p107C peptides was achieved by adding 10% by mass Cdk2–Cyclin A to the p107C substrate in 25 mM Tris, 150 mM NaCl, 5 mM DTT, 10 mM MgCl2, and 1 mM ATP. The kinase was purified away from phosphorylated p107C using size-exclusion chromatography. Incorporation of phosphate was determined quantitatively using electrospray mass spectrometry. Samples were desalted and directly injected into the spectrometer.

Protein Crystallization. Before crystallization trials, the E2F4–DP1CM heterodimer was purified over an SD75 Superdex column in 20 mM Tris (pH 8.0), 100 mM NaCl, and 4 mM Tris(2-carboxyethyl)phosphine (TCEP). Crystal trays were set at 13 mg/mL, using a 1:1 protein-to-buffer ratio at 18 °C. Diffraction-quality crystals appeared within 3 to 4 d in 100 mM Mes, 100 mM NaCl, and 16% PEG 6000 (pH 6.3). Crystals were harvested, incubated in the above condition plus 25% ethylene glycol as a cryoprotectant, and flash-frozen in liquid nitrogen before data collection.

To crystallize the trimeric protein complex, p107C was mixed in a 2:1 molar ratio with purified DP1–E2F5. The complex was isolated using an SD75 Superdex column equilibrated in 20 mM Hepes (pH 7.0), 100 mM NaCl, and 4 mM TCEP. Crystal trays were set at 15 mg/mL and incubated at 18 °C. Diffracting crystals were grown in 100 mM Hepes (pH 7.0), 7% PEG 5000, 5% 1-propanol, and 2% 2-propanol. Crystals were flash-frozen in liquid nitrogen in a buffer containing the crystallization condition supplemented with 25% glycerol. Both native and selenomethionine-containing crystals were grown using this process. All crystals in this study were grown using the sitting-drop vapor-diffusion method.

To incorporate selenomethionine into E2F5–DP1CM protein complexes, cotransformed BL21 cells were grown in M9 minimal media to an OD600 of 0.6. The methionine pathway was inhibited through addition of lysine, phenylalanine, and threonine at 100 mg/L and isoleucine and valine at 50 mg/L, and the culture was supplemented with 100 mg of selenomethionine per L. Twenty minutes after addition of the amino acids, cells were induced with 0.5 mM IPTG. E2F5–DP1CM containing selenomethionine was otherwise expressed and purified in the same way as unlabeled protein.

X-Ray Structure Determination. Data were collected from single crystals under cryogenic conditions (100 K) at the Advanced Light Source (beamline 8.3.1) and Advanced Photon Source (APS) (beamline 23IDB). For E2F4–DP1, a molecular replacement solution was found with Phaser (37) using PDB ID code 2AZE as a search model. We could not find a molecular replacement solution for p107C–E2F5–DP1 native diffraction data, perhaps because of the different angles between the coiled-coil and marked-box domains (Fig. 2B). Phases were instead calculated from a single-wavelength anomalous diffraction (SAD) experiment using a selenomethionine derivative. SAD data were collected at the selenium peak energy. Six selenium sites and an initial structural model were generated by using the program AutoSol (38) from within the Phenix (39) interface. Although we expected incorporation of 12 selenium atoms (6 per molecule in the asymmetric unit), only 8 methionine residues were visible in the structure. High-resolution native data and the initial model from AutoSol were then used for further model building and refinement by using Coot (40) and Phenix, respectively. Coordinates and structure factors for the E2F4–DP1CM and p107C–E2F5–DP1CM structures were deposited in the Protein Data Bank under ID codes 5TUU and 5TUV, respectively.

Isothermal Titration Calorimetry. Before ITC experiments, protein samples were dialyzed overnight at 4 °C in 20 mM Tris, 100 mM NaCl, and 1 mM β-mercaptoethanol. Binding experiments involving p107C were conducted at pH 8.0, and experiments involving RbC were performed at pH 7.0. Typically, 0.5 to 1 mM peptide was injected into 20 to 40 μM DP1–E2FCM using a MicroCal VP-ITC. Origin software (www.originlab.com) was used to calculate binding constants by fitting the data to a one-site binding model. The error associated with the reported binding constants is the SD calculated from two to four independent binding experiments.

Homolog Detection and Phylogenetic Analyses. We used the HMMER 3 (41) package and profile-hidden Markov models (profile-HMMs) built previously for eukaryotic-wide phylogenetic analyses (35) to retrieve homologs (E-value threshold of 1E-10) of the E2F–DP and pocket protein family from 52 genomes with particular focus on the Metazoa. Some detected homologs were discarded because they were incomplete or too divergent to be included in our phylogenies. All reliable homologs were aligned using MAFFT-L-INS-i (-maxiterate 1000) (42). Resulting alignments were masked using probabilistic alignment masking with ZORRO (43). ProtTest 3 (44) was used to determine the empirical amino acid evolutionary model that best fit each of our protein datasets using several criteria: Akaike information criterion, corrected Akaike information criterion, Bayesian information criterion, and decision theory (E2F–DP: JTT+I+G+F; Rb: LG+I+G+F). Last, for each dataset (Datasets S1 and S2) and its best-fitting model, we ran different phylogenetic programs that use maximum-likelihood methods with different algorithmic approximations (RAxML and PhyML) to reconstruct the phylogenetic relationships between proteins. For RAxML (45) analyses, the best-likelihood tree was obtained from five independent maximum-likelihood runs started from randomized parsimony trees using the empirical evolutionary model provided by ProtTest. We assessed branch support via rapid bootstrapping (RBS) with 100 pseudoreplicates. PhyML 3.0 (46) phylogenetic trees were obtained from five independent randomized starting neighbor-joining trees using the best topology from both NNI (nearest neighbor interchanges) and SPR (subtree pruning and regrafting) moves. Nonparametric Shimodaira–Hasegawa–like approximate-likelihood ratio tests (SH-aLRTs) and parametric à la Bayes aLRTs (aBayes) were calculated to determine branch support from two independent PhyML 3.0 runs. The Rb phylogeny (Fig. S6) and E2F phylogeny (Fig. S7) were then used to classify orthologs and paralogs to create Fig. S5. Taxa silhouettes in Fig. 5 were obtained from www.phylopic.org.
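The information criteria used for model selection in the phylogenetic analyses can be illustrated with a short calculation. The sketch below is not ProtTest code; it only evaluates the standard textbook formulas AIC = 2k - 2 ln L, AICc = AIC + 2k(k + 1)/(n - k - 1), and BIC = k ln n - 2 ln L. The model names, log-likelihoods, parameter counts, and sample size in the example are hypothetical placeholders, not values from this study.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion for log-likelihood log_lik and k free parameters."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; n is the sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical candidates: name -> (log-likelihood, number of free parameters).
candidates = {
    "model_A": (-12345.6, 190),
    "model_B": (-12350.2, 171),
}
n_sites = 980  # hypothetical sample size (alignment length is a common choice)

for name, (log_lik, k) in candidates.items():
    print(name,
          round(aic(log_lik, k), 1),
          round(aicc(log_lik, k, n_sites), 1),
          round(bic(log_lik, k, n_sites), 1))

# Lower is better for all three criteria.
best_by_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print("best by BIC:", best_by_bic)
```

Because BIC penalizes parameters more heavily than AIC once n is moderately large, the criteria can disagree, which is one reason to report several of them.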

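[Fig. S1 gel image. Lanes: p107 CTD constructs 994-1038, 994-1031, 1000-1038, 1000-1031, and ---, each with unbound (U) and bound (B) fractions. MW markers (kDa): 116/97, 66, 45, 31, 21.5, 14.4. Labeled bands: GST-p107, DP1, E2F4.]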

Fig. S1. Coprecipitation assay identifying a minimal p107 CTD fragment that binds E2F4–DP1CM. One hundred micrograms of the indicated GST–p107 fragment and 100 μg of purified E2F4–DP1CM were incubated on ice for 1 h in a reaction volume of 200 μL containing 100 mM NaCl, 25 mM Tris, and 5 mM DTT (pH 8.0). Proteins were affinity-precipitated with glutathione Sepharose resin, washed, and eluted in the binding buffer plus 10 mM glutathione. For each binding reaction, the eluate containing bound proteins (B) was loaded onto an SDS polyacrylamide gel along with a sample of the unbound (U) reaction.

[Fig. S2 image, panels A to E. Labels recovered from the panels: E2F1, E2F4, E2F5, and DP1 (A to D); E2F, DP1, β5, and a 90° rotation between two views (E).]

Fig. S2. Conservation of the E2F–DP CM structure. Comparison of CM β-sandwich domains reveals similar structures. (A) Alignment of E2F4–DP1CM and E2F1–DP1CM structures. The root-mean-square deviation (rmsd) of the Cα position is 0.77 Å. (B) Alignment of E2F5–DP1CM and E2F1–DP1CM structures. Rmsd is 1.24 Å. (C) Alignment of E2F5–DP1CM and E2F4–DP1CM structures. Rmsd is 1.35 Å. (D) Alignment of two different E2F5–DP1CM heterodimers in the asymmetric unit of E2F5–DP1–p107C crystals. Rmsd is 1.14 Å. (E) E2F4–DP1CM is shown with residues that are conserved among E2F paralogs in yellow. For sequence alignment, see Fig. 1B. Most of the 20 residues that are identical among family members are critical for the structural core of the domain. A group of highly conserved residues forms the last E2F strand (β5) and the preceding loop. Their side chains form a surface on the edge of the sandwich opposite the edge that binds pocket proteins (Fig. 2A). The conservation here suggests that the exposed hydrophobic cleft along this sandwich edge is a potential protein interaction surface common to all E2F family members.
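For readers unfamiliar with the rmsd values quoted in this legend, the quantity is the square root of the mean squared distance between matched Cα atoms. The sketch below is a minimal illustration that assumes the two coordinate sets are already superposed and listed in the same residue order; it does not perform the structural alignment itself, and it is not the software used to produce the values in the legend. The function name and toy coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. Å) between two equal-length lists
    of (x, y, z) Cα coordinates that are already superposed and paired."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom in set_b is shifted by exactly 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(ca_rmsd(set_a, set_b))  # 1.0
```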

[Fig. S3 image: p107C residues K1012, I1021, and R1022 labeled on the structure.]

p107 (994-1031) binding to E2F4-DP1CM

p107 variant    Kd (μM)
WT              1.1 ± 0.2
I1021A          33 ± 6
I1021M          2.5 ± 0.1
K1012E          1.6 ± 0.2
R1022C          3.4 ± 0.1
R1022H          2.7 ± 0.4

Fig. S3. Cancer-associated mutations in p107C. We used the Cancer Genome Atlas (https://cancergenome.nih.gov) to identify cancer-associated mutations in p107 and p130 that are localized to the CTD, and we mapped these mutations onto the crystal structure. Several missense mutations have been found at I1021 in p107 (I1092 in p130), which lies on the p107C helix and is buried in the E2F5–DP1CM sandwich core. We found that an I1021A mutation weakens the affinity of p107C for E2F4–DP1CM 30-fold, whereas an I1021M mutation found in colorectal cancer has a relatively modest 2-fold effect. We used E2F4 in our binding measurements because E2F4 is more abundant in cells and expresses well as a recombinant protein. E2F4 and E2F5 are highly conserved in the β3-strand that binds p107C (Fig. 1B), and both bind wild-type p107C with similar affinity (Fig. 3D). Most of the other mutations map to the surface of the p107C strand and helix that does not form the interface with E2F5–DP1CM. We did find that two mutations found in uterine cancer, K1012E in p107 and R1093C in p130 (R1022C in p107), and one mutation found in both colorectal and prostate cancer, R1093H in p130 (R1022H in p107), have two- to threefold effects on the complex affinity. We conclude that these mutations only slightly impair the ability of p107/p130 to bind E2F. However, an interesting alternative possibility, given the localization of these residues to the exposed face of the p107C helix, is that these mutations interfere with other protein interactions. Notably, this region of p107/p130 has been implicated in regulation of protein stability, and mutation of K774 in Drosophila (K1012 in human p107) leads to a developmental defect from loss of regulation of protein levels (23). In addition to E2F binding, the C-terminal helix may mediate protein interactions that control p107/p130 degradation.
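The fold effects quoted in this legend are ratios of mutant to wild-type Kd, and such a ratio can also be expressed as a change in binding free energy through the standard relation ΔΔG = RT ln(Kd,mut/Kd,WT). The sketch below uses the Kd values tabulated for Fig. S3. The temperature of 298.15 K is an assumption made for illustration, since the legend does not state one, and the function names are hypothetical. The fold change itself does not depend on the concentration unit.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
T_KELVIN = 298.15   # assumed temperature; not stated in the legend

# Kd values as tabulated for Fig. S3 (same concentration unit throughout).
KD = {
    "WT": 1.1,
    "I1021A": 33.0,
    "I1021M": 2.5,
    "K1012E": 1.6,
    "R1022C": 3.4,
    "R1022H": 2.7,
}


def fold_change(kd_mut, kd_wt):
    """How many times weaker (values > 1) the mutant binds than wild type."""
    return kd_mut / kd_wt


def ddg_kcal(kd_mut, kd_wt, temperature=T_KELVIN):
    """Binding free-energy change in kcal/mol; positive means weaker binding."""
    return R_KCAL * temperature * math.log(kd_mut / kd_wt)


for name, kd in KD.items():
    if name == "WT":
        continue
    print(f"{name}: {fold_change(kd, KD['WT']):.1f}-fold, "
          f"ddG = {ddg_kcal(kd, KD['WT']):+.2f} kcal/mol")
```

For I1021A the ratio 33/1.1 reproduces the 30-fold weakening stated in the legend.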

Rb (771-928) binding

E2F-DP complex    Kd (μM)
E2F1-DP1CM        0.11 ± 0.05
E2F3-DP1CM        0.15 ± 0.06
E2F4-DP1CM        0.2 ± 0.1

Rb pocket binding to E2F transactivation domains, Kd (μM)

           Rb pocket, wild type    Rb pocket, R467E/K548E
E2F1 TD    0.07 ± 0.01             13 ± 3
E2F3 TD    1.2 ± 0.1               No binding signal

Fig. S4. RbC associates similarly with E2F1–DPCM and E2F3–DPCM, but the Rb pocket domain binds the transactivation domains (TDs) differently. It has been proposed that the RbC association is specific to E2F1 (11, 25, 26). One observation supporting this specificity is that a mutation in the pocket domain (R467E/K548E) at the transactivation-binding site inhibits Rb complex formation in cell extracts with E2F3 but not E2F1 (25, 26). We measured the affinity of RbC771–928 for E2F3–DP1CM and found that it is similar to the RbC affinity for E2F1–DP1CM and E2F4–DP1CM that we previously reported (Fig. 1C) (19). We considered that the Rb–E2F1 specificity observed in coimmunoprecipitation experiments in cell extracts might instead be due to differences in other interactions outside the C terminus. For example, the E2F1 and TDs bind the Rb pocket domain with significantly higher affinity than other E2Fs (27). We measured the affinity of the R467E/K548E Rb mutant for both the E2F1 and E2F3 TDs by ITC. Measurements of the Rb pocket domain (residues 380 to 787) mutant for the E2F1 TD (residues 409 to 426) and E2F3 TD (residues 432 to 449) are reported here. Wild-type measurements were previously reported (27). We find that whereas the mutations abolish E2F3 binding completely, we still observe some weak affinity for E2F1. We suggest that such differences in transactivation domain binding, and not E2F–DPCM binding, explain the preference of Rb for E2F1 in cells when probed using Rb constructs that contain E2F-binding mutations in the pocket domain (25, 26). We believe that the correct interpretation of the specific affinity previously observed between the pocket domain (R467E/K548E) mutant and E2F1 in cells is that the mutant still has some residual affinity for the E2F1 TD and not that only E2F1 interacts with RbC.
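To give a rough sense of what residual affinity means in this argument, the sketch below computes the fraction of a protein that is bound at equilibrium under a simple 1:1 binding model, using the exact quadratic solution for the complex concentration. It takes the tabulated Kd values for the E2F1 TD as micromolar, which is an assumption, and it uses hypothetical total concentrations of 1 μM for each partner; these are not cellular concentrations or conditions from this study. The function names are illustrative only.

```python
import math


def complex_conc(p_total, l_total, kd):
    """Equilibrium concentration of the 1:1 complex PL, given total
    concentrations of P and L and the dissociation constant (same units)."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / 2.0


def fraction_bound(p_total, l_total, kd):
    """Fraction of P that is in complex with L."""
    return complex_conc(p_total, l_total, kd) / p_total


# Hypothetical totals of 1 μM each; Kd values for the E2F1 TD from the table.
for label, kd in [("wild-type pocket", 0.07), ("R467E/K548E pocket", 13.0)]:
    print(f"{label}: fraction bound = {fraction_bound(1.0, 1.0, kd):.2f}")
```

Under these assumed concentrations the mutant pocket still retains a small bound fraction, whereas a partner with no detectable binding would retain none; the actual outcome in cells depends on concentrations that this sketch does not model.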

Fig. S5. Summary of pocket protein and E2F/DP protein families in analyzed genomes. Rows represent different sequenced genomes, which have been organized from premetazoan (green) to metazoan (yellow and red). Columns denote the number of homologs discovered in each genome, where different paralog subgroups were classified using our pocket protein phylogeny (Fig. S6) and E2F/DP phylogeny (Fig. S7) (SI Materials and Methods). Gray rows denote changing names and column positions of paralogs that have arisen from gene duplication at the two major transitions (green to yellow and yellow to red). (Bottom) The gray row reflects the pocket protein and E2F/DP gene names in Homo sapiens.

Fig. S6. Pocket protein phylogeny. Phylogeny depicts evolutionary relationships between 95 sequences of the pocket protein family in the metazoan and closely related lineages. Columns with the top 25% ZORRO score (980 positions) were used in our alignment. Confidence at nodes was assessed with multiple support metrics using different phylogenetic programs under the LG+I+G+F model of evolution (aBayes and SH-aLRT metrics with PhyML; RBS with RAxML). The tree from RAxML is shown, and colored dots at branches indicate corresponding branch supports. Thick branches indicate significant support by at least two metrics, one parametric and one nonparametric; branch support thresholds are shown in the center of the figure (SI Materials and Methods). For species abbreviations, refer to Fig. S5. aRB, ancestral RB.

Fig. S7. E2F and DP protein phylogeny. Phylogeny depicts evolutionary relationships between 264 sequences of the E2F and DP protein family in the metazoan and closely related lineages. Columns with the top 10% ZORRO score (488 positions) were used in our alignment. Confidence at nodes was assessed with multiple support metrics using different phylogenetic programs under the JTT+I+G+F model of evolution (aBayes and SH-aLRT metrics with PhyML; RBS with RAxML). The tree from RAxML is shown, and colored dots at branches indicate corresponding branch supports. Thick branches indicate significant support by at least two metrics, one parametric and one nonparametric. Branch support thresholds are shown in the center of the figure (SI Materials and Methods). For species abbreviations, refer to Fig. S5. The fact that Petromyzon marinus E2F1236 forms a well-supported clade with E2F1, E2F2, and E2F3 suggests that the odd location of the E2F6 lineage outside of E2F1236 can be explained by what is called a long-branch attraction artifact (47). Nevertheless, improved sampling of the jawless fish and cartilaginous fish should clarify further the order of duplication events and support during this key evolutionary transition, particularly the relationships at the base of the E2F123 lineages. DEL (DP–E2F–like), ancestral E2F78 in plants.

Fig. S8. Alignment of pocket protein and E2F sequences. Sequences in the pocket protein C terminus around the structured core and RbCnter are shown. E2F sequences containing the β3-strand in the E2FCM domain are shown. Full organism names can be found in Fig. S5.

Fig. S9. Alignment of pocket protein sequences showing the emergence of K475 and H555. It has been demonstrated that K475 and H555 (human Rb numbering) contribute to the higher affinity of Rb for activator E2F transactivation domains compared with p107 (27). Alignment of sequences in the pocket domains of pocket proteins demonstrates that K475 and H555 first appear in the cartilaginous fish (Callorhinchus milii, Leucoraja erinacea, Squalus acanthias) and are maintained through to the human sequence. Full organism names can be found in Fig. S5.

Table S1. Data collection and refinement statistics

                          E2F4–DP1                  E2F5–DP1–p107 (SAD)       E2F5–DP1–p107

Data collection           APS                       APS                       APS
Space group               I2                        P21                       P21
Unit-cell dimensions
  a, b, c, Å              73.57, 37.54, 109.9       62.47, 56.17, 99.30       60.98, 57.34, 99.20
  α, β, γ, °              90, 103.4, 90             90, 93.7, 90              90, 96.4, 90
Resolution range, Å       53.96–2.25 (2.32–2.25)    99.09–4.09 (4.58–4.09)    60.57–2.90 (3.08–2.90)
Wavelength, Å             1.02                      1.02                      1.02
Total observations        46,394 (2,904)            146,148 (41,730)          71,791 (11,049)
Unique reflections        13,397 (1,302)            5,606 (1,578)             15,301 (1,529)
Completeness, %           95.2 (93.6)               99.9 (99.7)               99.0 (99.2)
Rmerge                    5.6 (30.8)                11.9 (21.7)               10.4 (41.3)
I/σ(I)                    20.7 (2.9)                22.6 (14.9)               10.6 (5.5)
CC1/2                     0.997 (0.679)             0.999 (0.998)             0.992 (0.921)
Redundancy                3.5 (2.4)                 26.1 (26.4)               4.7 (4.5)

Refinement
Rwork/Rfree, %            19.68/23.77                                         21.67/26.38
Number of atoms           1,976                                               3,539
  Protein                 1,928                                               3,539
  Ligand/ion              12
  Water                   36
B factor (Wilson)         44.60                                               60.88
Unmodeled residues        5                                                   148
Rmsd bond length, Å       0.010                                               0.003
Rmsd bond angle, °        1.21                                                0.92

Values in parentheses are for the highest-resolution shell.
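As a quick consistency check on Table S1, redundancy (multiplicity) is by its standard definition the number of total observations divided by the number of unique reflections. The sketch below recomputes the overall values from the tabulated counts. It is only an arithmetic illustration, not output from the data-processing software, and it covers the overall statistics only, not the highest-resolution shell. The dictionary keys and function name are labels chosen here.

```python
# (total observations, unique reflections, redundancy reported in Table S1)
DATASETS = {
    "E2F4-DP1": (46394, 13397, 3.5),
    "E2F5-DP1-p107 (SAD)": (146148, 5606, 26.1),
    "E2F5-DP1-p107": (71791, 15301, 4.7),
}


def multiplicity(total_observations, unique_reflections):
    """Average number of times each unique reflection was measured."""
    return total_observations / unique_reflections


for name, (total, unique, reported) in DATASETS.items():
    computed = multiplicity(total, unique)
    print(f"{name}: computed {computed:.1f}, reported {reported}")
```

The high multiplicity of the SAD dataset is typical of anomalous-phasing experiments, where repeated measurements improve the accuracy of the small anomalous differences.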

Other Supporting Information Files

Dataset S1 (TXT)
Dataset S2 (TXT)
