bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Mutations in disordered regions cause disease by creating endocytosis motifs

Katrina Meyer1, Bora Uyar2, Marieluise Kirchner3, Jingyuan Cheng1, Altuna Akalin2, Matthias Selbach1,4

1 Proteome Dynamics, Max Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, D-13092 Berlin, Germany

2 Bioinformatics Platform, Max Delbrück Center for Molecular Medicine, Robert-Rössle- Str. 10, D-13092 Berlin, Germany

3 BIH Core Facility Proteomics, Robert-Rössle-Str. 10, D-13092 Berlin, Germany

4 Charité-Universitätsmedizin Berlin, 10117 Berlin, Germany

Corresponding author: Matthias Selbach Max Delbrück Centrum for Molecular Medicine Robert-Rössle-Str. 10, D-13092 Berlin, Germany Tel.: +49 30 9406 3574 Fax.: +49 30 9406 2394 email: [email protected]

Word count: 2741

bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Abstract

2 Mutations in intrinsically disordered regions (IDRs) of can cause a wide

3 spectrum of diseases. Since IDRs lack a fixed three-dimensional structure, the

4 mechanism by which such mutations cause disease is often unknown. Here, we employ

5 a proteomic screen to investigate the impact of mutations in IDRs on -protein

6 interactions. We find that mutations in disordered cytosolic regions of three

7 transmembrane proteins (GLUT1, ITPR1 and CACNA1H) lead to an increased binding

8 of clathrins. In all three cases, the mutation creates a dileucine motif known to mediate

9 clathrin-dependent trafficking. Follow-up experiments on GLUT1 (SLC2A1), a glucose

10 transporter involved in GLUT1 deficiency syndrome, revealed that the mutated protein

11 mislocalizes to intracellular compartments. A systematic analysis of other known

12 disease-causing variants revealed a significant and specific overrepresentation of

13 gained dileucine motifs in cytosolic tails of transmembrane proteins. Dileucine motif

14 gains thus appear to be a recurrent cause of disease.

1 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

15 Genome sequencing technologies have greatly facilitated the discovery of human

16 protein variants. In many cases it is not known if such variants cause disease, and even

17 when associations have been discovered, determining the mechanisms by which this

18 happens remains a major challenge 1. Most disease-causing missense mutations affect

19 evolutionarily conserved amino acids within structured regions of a protein and

20 destabilize its structure 2,3. However, over 20% of human disease mutations occur in so

21 called intrinsically disordered regions (IDRs) 4. Contrary to the traditional understanding

22 of protein structure and function, it is now clear that IDRs represent a functionally

23 important and abundant part of eukaryotic proteomes 5,6. Yet, since IDRs lack a defined

24 tertiary structure and are typically poorly conserved, the classical structure-function

25 paradigm cannot explain how mutations in IDRs cause disease.

26

27 One way to approach this issue is by analyzing protein-protein interactions (PPIs),

28 which can help to understand how mutations cause disease 7,8. The impact of PPIs on

29 disease is highlighted by the enrichment of missense mutations on interaction interfaces

30 of proteins associated with the corresponding disorders 9. Moreover, comparing the

31 interaction partners of wild-type proteins and their disease-associated variants can

32 reveal disease mechanisms 10,11. We therefore sought to systematically investigate how

33 mutations in IDRs affect PPIs.

34

35 IDRs often harbour short linear motifs (SLiMs) which mediate their function 12,13. These

36 SLiMs typically fall into two major classes -- motifs that mediate interactions with

37 globular domains and/or motifs which harbor posttranslational modification sites 14.

2 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

38 Mutations in IDRs can cause disease by disrupting such motifs or by creating novel

39 ones. A number of examples of such pathogenic changes in motifs have been reported

40 in the literature 15–18. Additionally, computational studies have revealed that pathogenic

41 mutations often target SLiMs 19–21. Despite these insights, however, there has not yet

42 been a systematic experimental analysis of the way disease-causing mutations in IDRs

43 affect interactions. One reason for this is that the small binding area between SLiMs and

44 cognate domains results in low binding affinities, which makes it difficult to study these

45 interactions 22.

46

47 A peptide-based interaction screen on disease-causing mutations

48 We reasoned that quantitative interaction proteomics with immobilized synthetic

49 peptides should enable us to systematically assess the impact of mutations in IDRs.

50 Such peptide pull-downs can maintain specificity even with low affinity interactions 23.

51 We therefore designed a scalable proteomic screen that employs peptides synthesized

52 on cellulose membranes (Fig. 1 A). These membranes carry peptides with a length of

53 15 amino acids that correspond to IDRs in both the wild-type and mutant form.

54 Membranes are incubated with cell extracts to pull-down interacting proteins. After

55 washing, peptide spots are excised and the proteins associated with them are identified

56 and quantified by shotgun proteomics. The main challenge in such interaction screens

57 is to distinguish specific interaction partners from non-specific contaminants 24–27. We

58 addressed this challenge through the use of two levels of quantification. First, two

59 replicates of a pull-down with a specific peptide sequence are compared to all other

60 peptide pull-downs via label-free quantification (LFQ) 28. This LFQ-filter selects proteins 3 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

61 that bind specifically to a given peptide. Secondly, the screen uses SILAC-based

62 quantification 29 to identify differential interaction partners of the wild-type and disease-

63 causing form of a peptide. This requires incubating the two replicates of the membrane

64 with cell lysates that have been differentially SILAC labeled. Wild-type peptide spots

65 from the heavy pull-down are combined with spots from the light pull-down that

66 correspond to the mutant forms of the same peptide, and vice versa. SILAC ratios give

67 a measure of the degree to which each particular mutation affects a specific interaction.

68

69 For the screen we selected 128 mutations in IDRs which are known to cause

70 neurological diseases (Fig. S1, Table S1). We included a peptide from an IDR in the

71 SOS1 protein that contains a proline-rich motif by which it is known to recruit several

72 specific binders via their SH3 domains 23. We analyzed the 2 x 129 pull-down samples

73 by using high-resolution shotgun proteomics in 45 min runs, resulting in a total

74 measurement time of about eight days. Replicates of the same peptide clustered with

75 each other with a median correlation coefficient of 0.87 (Pearson’s R), indicating good

76 reproducibility (Fig. S2). The LFQ data identified nine specific interactors of the SOS1

77 peptide, including four of the five that were previously known (Fig. 1 B). In the

78 corresponding SILAC data, seven of the nine LFQ-specific binders show preferential

79 binding to the wild-type compared to the mutant which has a disrupted proline-rich motif

80 (Fig. 1 C). Importantly, all interactors that are both specific (LFQ) and differential

81 (SILAC) contain SH3 domains. To further assess the relationship between peptide

82 motifs and cognate domains we also analyzed all pull-downs combined. We found that

83 mutations which disrupt a predicted SLiM in the peptide tend to reduce binding of

4 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

84 proteins with cognate domains (Fig. S3). Conversely, the gain of a SLiM in a peptide

85 tends to increase its binding to proteins with matching domains. In summary, these data

86 demonstrate that our screen efficiently detects how mutations in IDRs affect interactions

87 mediated by SLiMs.

88

89 A quantitative interaction network for disease-associated IDRs

90 Individual pull-downs typically led to the identification of ~ 400 proteins. If all of these

91 proteins were considered specific binders, this would correspond to ~ 400 binary

92 interactions per pulldown and a total of more than 100,000 interactions. However, since

93 many proteins are in fact background binders, we applied our quantitative filters with

94 cut-offs derived from the SOS1 control (Fig. 2 A). About half of the 2x128 peptides

95 showed at least one specific binder according to the LFQ-filter (Fig. S4). Applying the

96 LFQ-filter dramatically reduced the total number of interactions to 618. All of these 618

97 interactions are specific for the wild-type and/or the mutant form of a peptide as

98 compared to all other peptides in the screen (Table S2). However, not all of these

99 specific interactions are differential, i.e. affected by the mutation. Therefore, we next

100 applied the SILAC-filter, which led to a final list of 180 differential interactions (Table

101 S3). 111 of these interactions are lost through mutations in the peptide, while 69 are

102 gained. Of note, since pull-downs can also capture indirect binders, not all of these

103 interactions are necessarily direct.

104

105 To provide an overview of the data we displayed all of the differential interactions as a

106 network (Fig. 2 B). This revealed that several wild-type or mutant peptides shared 5 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

107 differential interactors, suggesting functional similarities. Moreover, subnetworks were

108 enriched in specific ontology terms (Fig. S5). The figure highlights two

109 subnetworks that we find particularly interesting. One is enriched in proteins that are

110 connected to clathrin-coated vesicles (see below). The other is enriched in splicing

111 factors (insets in Fig. 2 B) that interact with an IDR corresponding to amino acids 512-

112 526 of Fused in sarcoma (FUS). These interactions are disrupted by the R521C mutant.

113 FUS is an RNA-binding protein that is best known for its role in amyotrophic lateral

114 sclerosis (ALS) 30. The R521C and other mutations in the C-terminal region of the

115 protein are thought to be pathogenic because they disrupt a nuclear localisation signal

116 31. Our data suggests that impaired binding of splicing factors could be an

117 additional/alternative explanation for the pathogenicity of this mutation. This observation

118 is interesting because FUS has already been implicated in splicing 32–34. In fact, the C-

119 terminal region of the protein was found to interact with SRSF10 even before

120 pathogenic mutations in this region were identified 35.

121

122 Recruitment of clathrins through gains of dileucine motifs

123 The finding we considered most interesting is that mutated IDRs from CACNA1H,

124 GLUT1/SLC2A1 and ITPR1 led to specific interactions with clathrins (Fig. 3 A). The

125 corresponding SILAC data revealed that in all three cases clathrin had a strong

126 preference for the mutant form of the peptides over the wild-type (Fig. 3 B). Clathrins

127 mediate endocytosis and intracellular trafficking of transmembrane proteins. They are

128 recruited to the membrane by adaptor proteins that recognize specific cargo, form

129 clathrin-coated pits which are then pinched off by dynamin 36,37. Our finding that 6 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

130 clathrins are specifically recruited to mutated IDRs suggests that the mutations might

131 affect protein trafficking. Intriguingly, the three mutations share several other features

132 beyond an increased affinity for clathrin. First, all three mutations affect transmembrane

133 proteins -- a (CACNA1H) and a glucose transporter (GLUT1) residing

134 in the plasma membrane and an inositol 1,4,5-trisphosphate receptor (ITPR1) in the ER

135 (Fig. 3 C). Second, all three mutations affect disordered regions exposed to the cytosol,

136 which makes them accessible to cytosolic adaptor proteins that mediate clathrin

137 recruitment. Third, all three mutations involve the change of a proline to a leucine

138 residue and thereby result in the appearance of a novel dileucine motif (“LL”) in the IDR

139 (Fig. 3 D). Such motifs are known to recruit clathrin to the plasma membrane or

140 intracellular locations 38. The classical dileucine motif for clathrin-dependent endocytosis

141 is [D/E]XXXL[L/I] 39, but variations of this scheme are common 38,40,41.

142

143 The P485L mutation causes mislocalization of the glucose transporter

144 GLUT1

145 The data presented so far are derived from our artificial in vitro screen based on short

146 peptides. We therefore selected the P485L mutation in GLUT1 for follow-up

147 experiments. This mutation causes GLUT1 deficiency syndrome (G1DS), a disorder

148 characterized by seizures and developmental delays 42,43. GLUT1 facilitates glucose

149 transport into the brain across the blood-brain barrier. GLUT1 mutations in G1DS

150 patients impair this glucose transport which gives rise to the disease phenotype.

151

152 To determine the impact of the P485L mutation on the subcellular localization, we 7 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

153 generated stable inducible cell lines expressing epitope tagged full-length wild-type or

154 mutant GLUT1. While the wild-type protein mainly localized to the plasma membrane,

155 the P485L mutant showed a more vesicular pattern (Fig. 4 A). Next we tested whether

156 the mutant form of GLUT1 was taken up via endocytosis by adding fluorescently labeled

157 transferrin to GLUT1 expressing cells. In contrast to wild-type GLUT1, the mutant

158 extensively colocalized with endocytosed transferrin (Fig. 4 B). To systematically

159 characterize the endocytic compartment in which GLUT1_P485L resides, we used

160 BioID as a proximity labeling method 44. We performed this experiment in a comparative

161 manner for both wild-type and mutant GLUT1 with SILAC-based quantification (see

162 Methods). GLUT1_P485L showed increased colocalization with proteins involved in

163 clathrin-mediated endocytosis, the trans-Golgi network (TGN), retrograde endosome-to-

164 TGN transport and lysosomes (Fig. 4 C). Of note, AP2B1 also showed increased

165 association with mutated GLUT1. This protein is part of the adaptor protein 2 (AP2)

166 complex that recognizes dileucine motifs to mediate endocytosis 38,40,41. Together, these

167 data indicate that the P485L mutation causes internalization of Glut1 via AP2-mediated

168 clathrin-dependent endocytosis.

169

170

171 Gains in dileucine motifs as a general disease mechanism

172 We next wondered if dileucine motif gains might be a more general mechanism by

173 which diseases could arise. To test this, we conducted a search of missense mutations

174 known to cause disease and occurring within disordered cytosolic regions of

175 transmembrane proteins. We found four additional pathogenic dileucine motif gains 8 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

176 (Fig. 5 A, Table S4). These mutations affect different proteins and cause a range of

177 diseases. Since we focused our follow-up experiments on GLUT1, we cannot state with

178 certainty if the other dileucine motif gains also cause protein mistrafficking. Alternatively,

179 the mutations could cause disease by different mechanisms and might just create

180 dileucine motifs as a by-product. If that was the case dileucine motifs should be

181 homogeneously distributed between disease-causing mutations and non-pathogenic

182 polymorphisms. Alternatively, if dileucine motif gains are responsible for pathogenesis,

183 they would be predicted to occur more often in disease than in non-pathogenic variants.

184 Moreover, pathogenic dileucine motif gains should be specific for cytosolic regions of

185 transmembrane proteins since this is where they exert their function. To test these

186 predictions, we compared the frequency of dileucine motif gains that have been found in

187 disease-causing mutations to their appearance in non-pathogenic polymorphisms. A

188 global survey of all disordered regions of the entire proteome revealed that gains in

189 dileucine motif gains occurred at about the same rate in disease and non-pathogenic

190 variants (OR = 0.81, p-value = 0.319, two-sided Fisher’s Exact Test). In the cytosolic

191 tails of transmembrane proteins, however, we observed a 3.7-fold enrichment of

192 dileucine motifs implicated in disease (OR = 3.7, p-value = 0.017, two-sided Fisher’s

193 Exact Test, Fig. 5 B). Disordered extracellular regions of transmembrane proteins do

194 not show this enrichment. We conclude that dileucine motif gains in disordered regions

195 of the cytosolic segments of transmembrane proteins are significantly and specifically

196 enriched in disease. To further assess the significance of this finding, we systematically

197 searched within cytosolic regions of transmembrane proteins for all other annotated

198 SLiMs contained in the ELM database 39. Intriguingly, of all 263 SLiMs tested, the

9 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

199 dileucine motif (LIG_diLeu_1) was the only significantly enriched motif in disease (Fig. 5

200 C).

201

202 Discussion

203 Understanding the functional relevance of protein variants is a major challenge in the

204 era of personal genomics -- especially for missense mutations in disordered regions.

205 Our proteomic screen provides a first systematic experimental analysis of how

206 mutations in disordered regions affect protein-protein interactions. Our results show that

207 the method can (i) capture known interactions, (ii) detect how mutations in SLiMs affect

208 binding of cognate domains and (iii) provide novel mechanistic insights into

209 pathogenesis. The peptide-based method is especially useful for mutations in proteins

210 that are otherwise difficult to study, such as large transmembrane proteins.

211 Nevertheless, it is also important to consider the intrinsic limitations of the approach.

212 Most importantly, in vitro pull-downs do not necessarily reflect physiological interactions

213 in vivo. For example, artifactual binding can occur when combining peptides and

214 proteins that never see each other in the cell 45. Moreover, taking IDRs out of the

215 context of the full length protein and immobilizing them as short peptides can affect

216 interactions. Finally, amino acids within IDRs often carry posttranslational modifications

217 -- a possibility which we did not consider here. In the future, it will be interesting to

218 include modified peptides, especially since mutations often affect modification sites 19,20.

219

220 Our screen revealed that a mutation in the cytosolic tail of the glucose transporter Glut1

221 generates a dileucine motif, which leads to clathrin-dependent endocytosis of the

10 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

222 protein. Consistently, we found that mutant Glut1 mislocalizes from the plasma

223 membrane to endocytic compartments. The finding that pathogenic mutations in

224 cytosolic tails of other transmembrane proteins also create dileucine motifs is

225 particularly intriguing. It is tempting to classify diseases caused by such motif gains as

226 “dileucineopathies”. Whether the other dileucine motif gains cause protein

227 mislocalization similar to Glut1 remains to be investigated. The observation that these

228 mutations are significantly and specifically enriched in cytosolic tails suggests that at

229 least some of them are functional. Why are dileucine motif gains a recurrent cause of

230 disease? We think this is due to a combination of several factors: First, the dileucine

231 motifs are not very complex and can thus arise by chance. Second, proline codons can

232 mutate to leucine codons by changing a single base. Third, proline is overrepresented in

233 IDRs, which are also the sites where the motif needs to be located in order to be

234 functional.

235

236 Bioinformatic studies have established that pathogenic mutations in disordered regions

237 often affect SLiMs 19–21. However, whether these predicted motif changes affect protein-

238 protein interactions is not clear. Moreover, many motifs have not yet been defined and

239 thus escape computational predictions 14. The biochemical approach presented here

240 provides a useful complementary strategy to computational studies. Key advantages of

241 our set-up are its scalability (by using synthetic peptides) and specificity (by employing

242 two quantitative filters). While we focused on Mendelian neurological disorders here, the

243 approach can also be applied to other types of variants such as somatic mutations in

244 cancer and non-pathogenic polymorphisms.

11 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

245 Acknowledgements

246 We would like to thank Michael Krauß, Volker Haucke (both FMP Berlin) and Philip Kim

247 (University of Toronto) for helpful discussions and Russ Hodge (MDC Berlin) for

248 valuable comments on the manuscript. We also thank Daniel Perez Hernandez (MDC

249 Berlin) and Ulf Reimer (JPT) for practical suggestions in setting up the screen. Markus

250 Landthaler (MDC Berlin) kindly provided stable cell lines and the Advanced Light

251 Microscopy Technology Platform (MDC Berlin) helped with microscopy.

252

253 Author contributions

254 K.M. and M.K. established the screen and selected the candidates. K.M. performed

255 most of the wet-lab experiments with contribution from M.K. and J.C.. K.M. processed

256 and analyzed the mass spectrometric data. B.U. carried out most of the remaining

257 bioinformatic analyses (mainly motif based analyses) supervised by A.A.. M.S.

258 conceived and supervised the work and wrote the manuscript with input from all

259 authors.

260

261 Competing financial interest statement 262 263 The authors declare no competing financial interest.

12 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

264 Figure Legends

265 Fig. 1: Quantitative interaction screen with disease-associated disordered regions 266 A, Cellulose membranes with synthetic wild-type (circles) and mutated (stars) peptides 267 are incubated with lysate from light (light red) or heavy (signal red) SILAC labeled cells 268 to pull-down interacting proteins. Spots are excised, corresponding wild-type/mutant 269 pairs are combined and analyzed by quantitative shotgun proteomics. Label free 270 quantification (LFQ) identifies specific interactors by comparing both replicates to all 271 other pull-downs. SILAC identifies differential binders by directly comparing 272 corresponding wild-type and mutant pairs. B, C Results for a SOS1-derived peptide with 273 a proline-rich motif as a benchmark. B, Volcano plot from the LFQ data for wild-type 274 SOS1. Specific binders are shown as red dots. 4 out of 5 known binders (red gene 275 names) are detected. C, SILAC log2 fold changes for differential binders of the wild-type 276 and mutant SOS1 peptide. Proteins with SH3 domains are shown with black outlines. 277 278 Fig. 2: Differential interactors of wild-type and mutant IDRs 279 A, Quantitative filters to select specific and differential interactions. Only a minor fraction 280 of all detected interactions is specific (LFQ filter). Moreover, only a fraction of specific 281 interactions is differential (SILAC filter), i.e. show preferential binding to the wild-type or 282 mutant form of a peptide. Mutation-induced interaction losses are more frequent than 283 mutation-induced gains. 284 B, Network of all differential interactions. Peptides (rectangles) and interacting proteins 285 (ovals) are presented as nodes. The edges indicate preferential binding to the wild-type 286 (blue) or mutant (red) form of a peptide (edge width indicates SILAC ratios). Highlighted 287 subnetworks are enriched in splicing regulators and clathrin-coated vesicle proteins 288 (see text).

13 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

289 290 Fig. 3: Recruitment of clathrins by recurrent gains of dileucine motifs 291 A, Volcano plots for pull-downs with mutated peptides derived from CACNA1H, GLUT1 292 and ITPR1. Specific binders (relative to all other pull-downs) are highlighted in red. All 293 three peptides specifically interact with clathrins. 294 B, Corresponding SILAC plots show that clathrins and related proteins preferentially 295 bind to the mutant form of peptides (relative to the wild-type). 296 C, Graphical representation of the mutation sites. All three mutations affect cytosolic 297 regions of transmembrane proteins. CACNA1H and GLUT1 are located in the plasma 298 membrane and ITPR1 in the ER. 299 D, Aligning the three peptide sequences reveals a common gain of a dileucine motif. 300 301 302 Fig. 4: A mutation-induced dileucine motif gain causes mislocalization of the 303 glucose transporter GLUT1 304 A, Confocal images of GLUT1 localization in Hek cells stably expressing FLAG-GLUT1 305 reveal that the wild-type is localized mainly at the cell membrane while the P485L 306 mutant is mislocalized to cytoplasmic compartments. 307 B, GLUT1 expressing HeLa cells are incubated with fluorescently labeled transferrin for 308 10 min before fixation. Mutant but not wild-type GLUT1 extensively co-localizes with 309 endocytosed transferrin. 310 C, Comparison of proteins co-localizing with wild-type and mutant GLUT1 by proximity 311 labeling (BioID). The upper panel shows SILAC log2 fold changes from two replicate 312 experiments with swapped isotope labels. Proteins are colored according to their typical 313 subcellular compartment (lower panel). The P485L mutant shows increased 314 colocalization with proteins involved in in clathrin-dependent endocytosis (CDE), the 315 trans-Golgi network (TGN), retrograde endosome-to-TGN transport and lysosomes. 316 Figure adapted from 46.

14 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

317 Fig. 5: Mutation-induced gains of dileucine motifs are a significant cause of 318 disease 319 A, A systematic bioinformatic search revealed four additional pathogenic mutations in 320 cytosolic segments of transmembrane proteins that create dileucine motifs. 321 B, Relative frequency of dileucine motif gains in disease mutations and polymorphisms 322 in different disordered regions (IUPred Score >= 0.4) of the proteome. Dileucine motif 323 gain is significantly enriched only in disordered regions of the cytoplasmic domains of 324 transmembrane proteins (two-sided Fischer’s exact test). 325 C, Comparison of gained motifs in disordered regions of cytoplasmic tails of 326 transmembrane proteins reveals the dileucine motif to have the most significant specific 327 enrichment when compared with polymorphisms.

15 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

328 References

329

330 1. Cooper, G. M. & Shendure, J. Needles in stacks of needles: finding disease-causal

331 variants in a wealth of genomic data. Nat. Rev. Genet. 12, 628–640 (2011).

332 2. Subramanian, S. & Kumar, S. Evolutionary anatomies of positions and types of disease-

333 associated and neutral amino acid mutations in the . BMC Genomics 7, 306

334 (2006).

335 3. Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in

336 monogenic disease. J. Mol. Biol. 353, 459–473 (2005).

337 4. Vacic, V. et al. Disease-associated mutations disrupt functionally important regions of

338 intrinsic protein disorder. PLoS Comput. Biol. 8, e1002709 (2012).

339 5. Wright, P. E. & Jane Dyson, H. Intrinsically disordered proteins in cellular signalling and

340 regulation. Nat. Rev. Mol. Cell Biol. 16, 18–29 (2014).

341 6. Uversky, V. N., Oldfield, C. J. & Dunker, A. K. Intrinsically disordered proteins in human

342 diseases: introducing the D2 concept. Annu. Rev. Biophys. 37, 215–246 (2008).

343 7. Ryan, C. J. et al. High-resolution network biology: connecting sequence with function. Nat.

344 Rev. Genet. 14, 865–879 (2013).

345 8. Wang, P. I. & Marcotte, E. M. It’s the machine that matters: Predicting gene function and

346 phenotype from protein networks. J. Proteomics 73, 2277–2289 (2010).

347 9. Wang, X. et al. Three-dimensional reconstruction of protein networks provides insight into

348 human genetic disease. Nat. Biotechnol. 30, 159–164 (2012).

349 10. Zhong, Q. et al. Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol.

350 5, 321 (2009).

351 11. Hosp, F. et al. Quantitative interaction proteomics of neurodegenerative disease proteins.

16 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

352 Cell Rep. 11, 1134–1146 (2015).

353 12. Van Roey, K. et al. Short linear motifs: ubiquitous and functionally diverse protein

354 interaction modules directing cell regulation. Chem. Rev. 114, 6733–6778 (2014).

355 13. Fuxreiter, M., Tompa, P. & Simon, I. Local structural disorder imparts plasticity on linear

356 motifs. Bioinformatics 23, 950–956 (2007).

357 14. Tompa, P., Davey, N. E., Gibson, T. J. & Babu, M. M. A million peptide motifs for the

358 molecular biologist. Mol. Cell 55, 161–169 (2014).

359 15. Kadaveru, K., Vyas, J. & Schiller, M. R. Viral infection and human disease--insights from

360 minimotifs. Front. Biosci. 13, 6455–6471 (2008).

361 16. Silvis, M. R. et al. A mutation in the cystic fibrosis transmembrane conductance regulator

362 generates a novel internalization sequence and enhances endocytic rates. J. Biol. Chem.

363 278, 11554–11560 (2003).

364 17. Cordeddu, V. et al. Mutation of SHOC2 promotes aberrant protein N-myristoylation and

365 causes Noonan-like syndrome with loose anagen hair. Nat. Genet. 41, 1022–1026 (2009).

366 18. Vogt, G. et al. Gains of glycosylation comprise an unexpectedly large group of pathogenic

367 mutations. Nat. Genet. 37, 692–700 (2005).

368 19. Radivojac, P. et al. Gain and loss of phosphorylation sites in human cancer. Bioinformatics

369 24, i241–i247 (2008).

370 20. Narayan, S., Bader, G. D. & Reimand, J. Frequent mutations in acetylation and

371 ubiquitination sites suggest novel driver mechanisms of cancer. Genome Med. 8, 55

372 (2016).

373 21. Uyar, B., Weatheritt, R. J., Dinkel, H., Davey, N. E. & Gibson, T. J. Proteome-wide analysis

374 of human disease mutations in short linear motifs: neglected players in cancer? Mol.

375 Biosyst. 10, 2626–2642 (2014).

376 22. Neduva, V. & Russell, R. B. Linear motifs: Evolutionary interaction switches. FEBS Lett.

17 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

377 579, 3342–3345 (2005).

378 23. Schulze, W. X. & Mann, M. A novel proteomic screen for peptide-protein interactions. J.

379 Biol. Chem. 279, 10756–10764 (2004).

380 24. Gingras, A.-C. & Raught, B. Beyond hairballs: The use of quantitative mass spectrometry

381 data to understand protein-protein interactions. FEBS Lett. 586, 2723–2731 (2012).

382 25. Gstaiger, M. & Aebersold, R. Applying mass spectrometry-based proteomics to genetics,

383 genomics and network biology. Nat. Rev. Genet. 10, 617–627 (2009).

384 26. Meyer, K. & Selbach, M. Quantitative affinity purification mass spectrometry: a versatile

385 technology to study protein–protein interactions. Front. Genet. 6, (2015).

386 27. Smits, A. H. & Vermeulen, M. Characterizing Protein-Protein Interactions Using Mass

387 Spectrometry: Challenges and Opportunities. Trends Biotechnol. 34, 825–834 (2016).

388 28. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization

389 and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–

390 2526 (2014).

391 29. Mann, M. Functional and quantitative proteomics using SILAC. Nat. Rev. Mol. Cell Biol. 7,

392 952–958 (2006).

393 30. Deng, H., Gao, K. & Jankovic, J. The role of FUS gene variants in neurodegenerative

394 diseases. Nat. Rev. Neurol. 10, 337–348 (2014).

395 31. Dormann, D. et al. ALS-associated fused in sarcoma (FUS) mutations disrupt Transportin-

396 mediated nuclear import. EMBO J. 29, 2841–2857 (2010).

397 32. Rogelj, B. et al. Widespread binding of FUS along nascent RNA regulates alternative

398 splicing in the brain. Sci. Rep. 2, 603 (2012).

399 33. Ishigaki, S. et al. Position-dependent FUS-RNA interactions regulate

400 events and transcriptions. Sci. Rep. 2, 529 (2012).

401 34. Qiu, H. et al. ALS-associated mutation FUS-R521C causes DNA damage and RNA splicing

18 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

402 defects. J. Clin. Invest. 124, 981–999 (2014).

403 35. Yang, L., Embree, L. J., Tsai, S. & Hickstein, D. D. Oncoprotein TLS Interacts with Serine-

404 Arginine Proteins Involved in RNA Splicing. J. Biol. Chem. 273, 27761–27764 (1998).

405 36. McMahon, H. T. & Boucrot, E. Molecular mechanism and physiological functions of

406 clathrin-mediated endocytosis. Nat. Rev. Mol. Cell Biol. 12, 517–533 (2011).

407 37. Ferguson, S. M. & De Camilli, P. Dynamin, a membrane-remodelling GTPase. Nat. Rev.

408 Mol. Cell Biol. 13, 75–88 (2012).

409 38. Pandey, K. N. Functional roles of short sequence motifs in the endocytosis of membrane

410 receptors. Front. Biosci. 14, 5339–5360 (2009).

411 39. Dinkel, H. et al. ELM 2016—data update and new functionality of the eukaryotic linear motif

412 resource. Nucleic Acids Res. 44, D294–D300 (2015).

413 40. Staudt, C., Puissant, E. & Boonen, M. Subcellular Trafficking of Mammalian Lysosomal

414 Proteins: An Extended View. Int. J. Mol. Sci. 18, (2016).

415 41. Traub, L. M. Tickets to ride: selecting cargo for clathrin-regulated internalization. Nat. Rev.

416 Mol. Cell Biol. 10, 583–596 (2009).

417 42. De Vivo, D. C. et al. Defective Glucose Transport across the Blood-Brain Barrier as a

418 Cause of Persistent Hypoglycorrhachia, Seizures, and Developmental Delay. N. Engl. J.

419 Med. 325, 703–709 (1991).

420 43. Pascual, J. M. et al. Structural signatures and membrane helix 4 in GLUT1: inferences from

421 human blood-brain glucose transport mutants. J. Biol. Chem. 283, 16732–16742 (2008).

422 44. Roux, K. J., Kim, D. I., Raida, M. & Burke, B. A promiscuous biotin ligase fusion protein

423 identifies proximal and interacting proteins in mammalian cells. J. Cell Biol. 196, 801–810

424 (2012).

425 45. Gibson, T. J., Dinkel, H., Van Roey, K. & Diella, F. Experimental detection of short

426 regulatory motifs in eukaryotic proteins: tips for good practice as well as for bad. Cell

19 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

427 Commun. Signal. 13, (2015).

428 46. Raiborg, C. & Stenmark, H. The ESCRT machinery in endosomal sorting of ubiquitylated

429 membrane proteins. Nature 458, 445–452 (2009).

20 a Defining specific interactions

SILAC Li 1.5 ght

p-value

bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright1.0 holder for this preprint (which was not certified by peer review)SILAC is the He author/funder. All rights reserved. No reuse allowed without permission. avy Log10 -

0.5 LFQ specific

0.0 Log2FC Mut/other peptides Log2FC Wt/other peptides SILAC differential Defining differential interactions Quantitative Peptide array shotgun proteomics

FC Wt/Mut FC

Ratio H/L Log2 H/L Ratio

Ratio L/H Log2FC Wt/Mut

b SNX9 PACSIN3 c PACSIN2 BIN1 NCK1 GRB2 EPS15L1 CD2AP RCN2 SH3 domain 8 1.5 6 GRB2 CRKL CD2AP NCK1 Mut 4 SNX9 PACSIN3 CAPZA1 PACSIN2 CLTC 1.0 2 PFN2

C Wt/ C p value p BIN1

0

-log[10] -log[10] L Log2F L -2 0.5 -4

Ratio H/ Ratio -6

0.0 -8

-10 -8 -6 -4 -2 0 2 4 6 8 10 -8 -6 -4 -2 0 2 4 6 8 Log2FC Wt/other peptides Ratio L/H Log2FC Wt/Mut

Fig.1 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a Peptide-protein interactions Fig. 2

lost

gained

101,765 618 180 detected specific differential interactions interactions interactions

CCDC137 RPL39P5 C19orf53 MORF4L2 ZNF768 RUVBL2 RUVBL1 CSNK1D LANCL1 b IPO11 regulation of mRNA splicing, SUPT5H TINF2_K280E DDOST via spliceosome (p = 3.9e-12) TUBB2B RPUSD3 CACNA1A_R2135C DARS2_Y629C PCNA SRRM2 SRSF1 SRSF2 UBA1_S547G CSNK1A1 SLTM SSR4 RPN2 TRA2B EARS_E96K SPAST_P293L DDX39B PDIA3 SRSF3 RTL1 AURKA CALU MAP1B YWHAG PPM1G TPM3 FUS_R521C HDGFRP2 ALYREF HIST1H1D SEC63 L1CAM_S1194L EMG1 YWHAH COPA SRSF9 NASP SRP14 SRSF6 CCDC47 RBMX ZC4H2_R213W FAM91A1 CTNNBL1 PDIA6 HNRNPK NXF1 H1FX TRPV4_R315W SFPQ SRSF5 NOC2L RBBP4 TCOF1 MAPRE1 DARS2 SRSF10 HNRNPA1 AP3B1 NAP1L1 FUS_R216C DDX49 HDGF BCKDK CHMP2B_D148Y CASR_R898Q GJB1_R230C YWHAZ G3BP2 HNRNPA0 PANK2_E134G C1QBP HSP90AA1 PDIA4 CALR FTSJ3 HSP90B1 CKB NONO

EARS2_R168G PTMA MAP2 Clathrin-coated vesicle (p = 1.4e-05) PDCD11 P4HB TCF4_R565W NME4 ITPR1_P1059L UBL5 HUWE1 CACNA1H_P648L G3BP TMEM240_P170L POLRMT CHTOP AF1Q

PYCR1 SET PRIC295 ERCC6_P1042L ANP32A MDN1 WDR11 CLTC CLTB SOD1_I152T ZC2HC1A AP2A1 CLTA WWOX_P47T CC2D2A_T1114M XPO1 CC2D1A DYNC1H1 RPUSD4 PHGDH ADAR_K999N CKMT1A

SETX_R1294C RRP12 SLC2A1_P485L ANP32E ZEB2_Q1119R FLOT2 ANP32B DEPD5_S1073R

ACADM DKC1_G402R UBR4 CLN6_R6T DECR1 CLINT1

GJB1_R264C ERLIN2 CD2AP IPO9 FLOT1 SETBP1_I871T GJB1_C280G IF140T_E664K MATR3_T622A MECP2_G161V SMPD1_A196P EGR2_I268N GCH1_P23L TYMP_R44Q CDKL5_N399T CACNA1H_A748V AAAS_Q15K

CAPZA1 IPO7 DNCL1 POLDIP2 TRAF2 SPTAN1 GPHN RCN2 ERH VCP SSBP1 MYCBP2 a CACNA1H P648L GLUT1 P485L ITPR1 P1059L CLTA CLINT1 CLTC CLTC CLTA CLTB CLTA CLTB DECR1

1.5 1.5 1.5

bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1.0 1.0 1.0 -log[10]pvalue -log[10]pvalue -log[10]pvalue

0.5 0.5 0.5

0.0 0.0 0.0

-8 -6 -4 -2 0 2 4 6 8 -8 -6 -4 -2 0 2 4 6 8 -8 -6 -4 -2 0 2 4 6 8 Log2FC Mut/other peptides Log2FC Mut/other peptides Log2FC Mut/other peptides b

-2 -2 -2 CLINT1 DECR1 -4 CLTA -4 -4 CLTC CLTC CLTB -6 -6 CLTB -6 CLTC Ratio H/L Log2FC Wt/Mut Log2FC Ratio H/L Ratio H/L Log2FC Wt/Mut Log2FC Ratio H/L CLTA Wt/Mut Log2FC Ratio H/L -8 -8 -8 CLTA

-8 -6 -4 -2 0 2 4 6 8 -8 -6 -4 -2 0 2 4 6 8 -8 -6 -4 -2 0 2 4 6 8 Ratio L/H Log2FC Wt/Mut Ratio L/H Log2FC Wt/Mut Ratio L/H Log2FC Wt/Mut c d

GLUT1 CACNA1H wild-type mutant CACNA1H PGTGGHGPLSLNSPD PGTGGHGLLSLNSPD GLUT1 TPEELFHPLGADSQV TPEELFHLLGADSQV

L L L P648L ITPR1 GGSEENTPLDLDDHG GGSEENTLLDLDDHG P485L L

2.0 2.0 ITPR1 L L FG H F H P1059L 1.0 G E E D D 1.0 G E G D D bits E PGS GHH G SPG HH P cytoplasm D P S G G G ER E L G E LDS 0.0 TPTGLNT SANDQV TGL S QV L 0.0 TP NT AND 5 P 10 15 LL 5 10 15

Fig. 3 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. a c FLAG-GLUT1 Wt FLAG-GLUT1 P485L 4

2

0 TFRC AP2B1 ST7 AGFG1 APP CLCN7 -2 CPD IGF2R ITM2B ITM2C

Ratio M/H Log2FC Wt/Mut Log2FC Ratio M/H CLINT1 SMAP2

b -4 SNX1 FLAG-GLUT1 Wt Transferrin Overlay -6 -4 -2 0 2 4 6 Ratio H/M Log2FC Wt/Mut

Plasma membrane

Cytosol

CIE Fast recycling

CDE Slow recycling

Recycling endosome Early Endocytic endosome FLAG-GLUT1 P485L Transferrin Overlay degradation Late endosome TGN

MVE Golgi

Endoplasmic reticulum

Lysosome Nucleus

Fig. 4 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. a L1CAM RET CACNA1H

S1149L L L L L P618L P189L L

L P1039L

L RHBDF2 L

ER cytoplasm

b c LIG_diLeu_1 p = 0.017

1.5

3

n.s. 1 2

n.s.

-log10(pval)

Odds Ratio dileucine Ratio Odds gain disease mutations disease polymorphisms vs

n.s. 1 0.5

0 All proteins Transmembrane Cytosplasmic Extracellular proteins domains domains -2 0 2 of of transmembrane transmembrane log2oddsRatio proteins proteins polymorphism disease

Fig. 5 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Materials and Methods

2 Peptide-protein interaction screen

3 Candidate selection

4 Disease mutations in humans were taken from UniProt annotations 1 of Online 5 Mendelian Inheritance in Man, OMIM®. McKusick-Nathans Institute of Genetic Medicine, 6 Johns Hopkins University (Baltimore, MD), https://omim.org/. This dataset consists of 7 experimentally validated missense mutations that contribute to inherited diseases. 8 Inherited disease mutations were downloaded from UniProt 9 (http://www.uniprot.org/docs/humsavar.txt, Release: 2015_07 of 24-Jun-2015, 2). Only 10 mutations that were associated to ‘Disease’ were kept. ‘Unclassified’ mutations or 11 ‘Polymorphisms’ were excluded. The 26,649 disease mutations were further filtered by 12 applying a disorder cut-off. Disorder tendencies of 15 amino acids (AAs) long peptides, 13 with the AA mutated in disease if possible located at position eight, were predicted 14 using IUPred 3 using the ‘SHORT’ profile considering sequential neighborhood of 25 15 residues. IUPred disorder scores above 0.5 denote regions of the proteins that have 16 95% likelihood to be disordered. For filtering, the mean disorder score for all 15AA as 17 well as the mutation position were required to be >0.5. This resulted in 1,878 disease 18 mutations in disordered regions. Next we assigned disease classes to 3,119 different 19 diseases included in the humsavar database by combining a manual approach with 20 automatic annotation with the Human Phenotype Ontology database, HPO 4. We 21 selected 305 mutations causing neurological diseases. After manual inspection, we 22 remained with 128 mutations causing 124 distinct neurological diseases that were used 23 for the peptide-protein interaction screen.

24

25 Experimental setup

26 Peptides of 15 AAs, in total 128 wild-type peptide and 128 related peptides containing 27 the disease causing mutation (256 peptides) plus one control peptide pair were bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

28 synthesized in situ on cellulose membrane using PepTrack™ techniques (JPT Peptide 29 Technologies, Berlin, Germany). Two of those peptide filters were moistened in cell lysis 30 buffer [50 mM HEPES pH 7.6 at 4°C, 150mM NaCl, 1 mM EGTA, 1mM MgCl2, 10% 31 Glycerol, 0.5% Nonidet P-40, 0.05% SDS and 0.25% Sodiumdeoxycholate, 32 supplemented with protease inhibitor (Roche) and benzonase (Merck)]. In order to 33 reduce unspecific binding, the membrane was incubated with 1 mg/ml yeast t-RNA 34 (Invitrogen) for 10 min and then washed twice with cell lysis buffer. The entire peptide 35 libraries were incubated with 15 ml of light or heavy SILAC labeled cell lysate (5 mg/ml) 36 from SH-SY5Y cells for 2h. Membranes were washed three times and air dried.

37 38 Cell culture 39 SH-SY5Y, Hek293 and HeLa cells were cultured under standard cell culture conditions. 40 In brief, cells were cultured in DMEM (life technologies) complemented with 10% fetal 41 calf serum (Pan-Biotech). 42 Cells used for SILAC based experiments were cultured in SILAC DMEM (life 43 technologies) complemented with glutamine (Glutamax, life technologies), Pyruvate (life 44 technologies), non-essential amino acids (life technologies) and 10% dialyzed fetal calf 45 serum (Pan-Biotech). The SILAC DMEM was supplemented with standard L-arginine 46 (Arg0, Sigma-Aldrich) and L-lysine (Lys0, Sigma-Aldrich)(“light”) as in 5. Alternatively, 47 Arg6 and Lys4 (“medium-heavy”) or Arg10 and Lys8 (“heavy”) were added in place of

48 their light counterparts. Cells were cultured at 37°C and 5% CO2.

49

50 Sample preparation for mass spectrometric analysis

51 Single spots were punched out from cellulose membrane with a 2mm diameter ear 52 punch (Carl Roth) and SILAC pairs were placed together in a 96-well plate (Thermo 53 Scientific) prepared with 30 µl of denaturation buffer [6M urea (Sigma-Aldrich), 2M 54 thiourea (Sigma-Aldrich), 10mM HEPES, pH 8]. Samples were reduced by incubating 55 with 10µl of 3.3 mM DTT (Sigma-Aldrich) for 30 min at RT, followed by an alkylation 56 step using 10µl of 18.3 mM iodoacetamide (IAA) (Sigma-Aldrich) for 60 min at RT. The 57 samples were first digested using 1µg endopeptidase LysC (Wako, Osaka, Japan) for 4 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

58 hours. The samples were diluted by adding 100 µl of 50mM ammonium bicarbonate (pH 59 = 8.5), and finally digested with 1µg trypsin (Promega) for 16h. The digestion was 60 stopped by acidifying each sample to pH < 2.5 by adding 10% trifluoroacetic acid 61 solution. The peptide extracts were purified and stored on stage tips according to 6.

62

63 LC-MS/MS analysis

64 Peptides were eluted using Buffer B (80% Acetonitrile and 0.1% formic acid) and 65 organic solvent was evaporated using a speedvac (Eppendorf). Samples were diluted in 66 Buffer A (5% acetonitrile and 0.1% formic acid). Peptides were separated on a 67 reversed-phase column with 45 min gradient with a 250 nl/min flow rate of increasing 68 Buffer B concentration on a High Performance Liquid Chromatography (HPLC) system 69 (ThermoScientific). Peptides were ionized using an electrospray ionization (ESI) source 70 (ThermoScientific) and analyzed on a Q-exactive plus Orbitrap instrument 71 (ThermoScientific). Dynamic exclusion for selected precursor ions was 30 s. The mass 72 spectrometer was run in data dependent mode selecting the top 10 most intense ions in 73 the MS full scans, selecting ions from 300 to 1700 m/z (Orbitrap resolution: 70,000; 74 target value: 1,000,000 ions; maximum injection time of 120 ms). The resulting MS/MS 75 spectra from the Orbitrap had a resolution of 17,500 after a maximum ion collection time 76 of 60 ms with a target of reaching 100,000 ions.

77 78 Data analysis 79 The resulting raw files were analyzed using MaxQuant software version 1.5.2.8 7. 80 Default settings were kept except that ‘match between runs’ and ‘re-quantify’ was turned 81 on. Lys8 and Arg10 were set as labels and oxidation of methionines and n-terminal 82 acetylation were defined as variable modifications. Carbamidomethyl of cysteines was 83 set as fixed modification. The in silico digests of the human Uniprot database (2015-12), 84 a fasta file containing all peptides used for pull-down and a database containing 85 common contaminants were done with Trypsin/P. The false discovery rate was set to 86 1% at both the peptide and protein level and was assessed by in parallel searching a 87 database containing the reversed sequences from the Uniprot database. Following bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

88 statistics and figures were done using R (R version 3.2.1, RStudio Version 1.0.143). 89 The resulting text files were filtered to exclude reverse database hits, potential 90 contaminants, and proteins only identified by site. Missing LFQ-intensity values were 91 imputed with random noise simulating the detection limit of the mass spectrometer 8. To 92 this end, imputed values were taken from a log normal distribution with 0.25× the 93 standard deviation of the measured, logarithmized values, down-shifted by 1.8 standard 94 deviations. In this way, a distribution of quantitative values for each protein across 95 samples is obtained. For determination of specific interactions, two replicated pull- 96 downs for the same peptide were tested against all other pull-downs, excluding the 97 corresponding variant peptide, by the nonparametric Mann–Whitney U test. Resulting p- 98 values and fold-changes (log2 space) have been plotted as volcano plots to determine 99 cut-offs. For cut-offs, an approach was used that employs a graphical formula to 8 10( ) 100 combination a fold-change and p-value cut-off : | | with x: enrichment 𝑐𝑐 101 factor of a protein, p: p-value of the Mann–Whitney−𝑙𝑙𝑙𝑙𝑙𝑙 U test𝑝𝑝 calculated≥ 𝑥𝑥 −𝑥𝑥0 from replicates, x0: 102 fixed minimum enrichment, c: curvature parameter. The curvature parameter c 103 determines the maximum acceptable p-value for a given enrichment x.

104 The parameters c and x0 can be optimized based on prior knowledge of known true and 105 false positives 8. Here, cut-offs were chosen according to known interaction partners of 8,9 106 the SOS1 control peptide . This resulted in x0=0, c=8. 107 This cut-off was applied to all other pull-downs to separate specific binders from 108 background. SILAC ratios were normalized by subtracting the median SILAC ratio of 109 every experiment from all SILAC ratios in that experiment. To define interaction partners 110 that bind differentially to wild-type and mutant peptide, a SILAC cut-off has been 111 defined. For wild-type specific interaction partners, the mean SILAC ratio of the two 112 replicates needed to be >1 and none of the two ratios <0 (mutant specific mean SILAC 113 ratio < -1 and none of the two ratios >0). Resulting figures were modified in Inkscape 114 (0.91). bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

115 Follow-up on GLUT1 116 117 Generation of vectors 118 We purchased SLC2A1 (GLUT1) from Harvard Plasmid repository (HsCD00378964). A 119 stop codon has been added to the gene with the following primers 120 Fw:TCCCAAGTGTAATTGCCAACTTTCTTGTACAAAGTTG, 121 Rev:ATCAGCCCCCAGGGGATG. 122 P485L Mutation has been introduced by changing c.1454 C>T 10 with Q5® Site-Directed 123 Mutagenesis Kit (NEB) Fw:CTGTTCCATCtCCTGGGGGCT, 124 Rev:CTCCTCGGGTGTCTTGTCAC. 125 SLC2A1 has been further cloned into two destination vectors adding either an N- 126 terminal FLAG-HA Tag or a BirA-FLAG Tag with Gateway cloning strategy (Thermo 127 Fisher Scientific).

128 BioID

129 Light, medium-heavy and heavy labelled Hek293 cells have either been mock- 130 transfected (light cells served as a control for background binding) or transiently 131 transfected with BirA-FLAG-GLUT1 wt or mutant (medium-heavy and heavy labeled 132 cells). SILAC labeling allowed for quantitative comparison of proteins that have been 133 proximity labelled by the transiently expressed constructs (Forward experiment: Light - 134 Control, Medium-heavy - wt, Heavy - mut; Label swap experiment: Light - Control, 135 Medium-heavy - mut, Heavy - wt). All three cell lines have been incubated for 24h in cell 136 culture medium containing biotin. BioID experiment has been performed essentially as 137 in 11, with minor adaptations. 138 Mass spec setup and analysis was done similarly as to samples from peptide pull- 139 downs, but digested peptides were separated on a 2,000 mm monolithic column with a 140 100-µm inner diameter filled with C18 material that was kindly provided by Yasushi 141 Ishihama (Kyoto University) using a 4 hr linear gradient with a 300 nl/min flow rate of 142 increasing Buffer B concentration on a High Performance Liquid Chromatography 143 (HPLC) system (ThermoScientific). bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

144 GLUT1 localization

145 Hek293 cells stably expressing either FLAG-tagged GLUT1 wt or P485L mutant 146 exhibiting tetracycline-inducible expression were generated using the Flp-In system 147 developed by Life Technologies according to the manufacturer’s protocol. After 148 induction for 24h in doxycyclin (0.1 µg/ml) containing media, cells have been stained 149 against FLAG (F1804, SIGMA). Nucleus has been stained with DAPI (Sigma).

150 Transferrin uptake 151 HeLa cells seeded on coverslips coated with poly-l-lysine (Sigma) have been transiently 152 transfected with FLAG-GLUT1-wt or -P485L construct. After 24h they were serum- 153 starved for 1 h and used for Transferrin (Tf) uptake. For Tf uptake, cells were treated 154 with 20 μg ml−1 Tf-Alexa568 (life technologies) for 10 min at 37 °C.

155 Fluorescence Microscopy 156 Cells were cultured on poly-l-lysine (Sigma) coated coverslips and fixed with 4% PFA. 157 Standard procedures were used for immunostaining. Images were acquired by confocal 158 microscopy (Transferrin uptake: Zeiss Observer.Z1; GLUT1 localization: Leica, 159 DMI6000). Images were further processed with Fiji 12. 160

161 Analysis of human missense variants and short linear motifs 162 (SLiMs)

163 SLiM regular expression patterns

164 262 annotated SLiM class definitions (regular expression patterns) were downloaded 165 from the Eukaryotic Linear Motif (ELM) database 13. In order to analyse dileucine motifs, 166 an additional motif ‘.LL.’ was added to this compilation and named ‘LIG_diLeu_1’ in 167 order to conserve the naming convention followed by the ELM database. 168 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

169 Pathogenic and non-pathogenic missense variants

170 For the analysis of the missense variants that lead to de novo SLiM instances in protein 171 sequences Uniprot Humsavar dataset (version 12-Apr-2017) 2 was downloaded and 172 filtered for missense variants. Variants that are classified as ‘Disease’ or ‘Polymorphism’ 173 in this dataset were selected. 174

175 Protein domains

176 PFAM domain annotations of proteins were downloaded from the PFAM database 177 (ftp://ftp.ebi.ac.uk/pub/databases/Pfam//releases/Pfam30.0/proteomes/9606.tsv.gz) 14. 178

179 SLiM - PFAM associations

180 PFAM domains and SLiM classes that are known to interact were downloaded from the 181 ELM database (http://elm.eu.org/interactiondomains). 182 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

183 Analysis of gain of SLiMs via missense variants in disordered regions

184 For each reviewed human protein from Uniprot (20191 proteins), the disorder scores of 185 each residue were calculated using IUPred (using the ‘short’ setting). Using a IUPred 186 disorder score cut-off of 0.4, the missense variants in disordered regions were selected. 187 The missense variants that overlap PFAM domains were further filtered out based on 188 the PFAM domain annotations found in the protein feature files downloaded from 189 Uniprot in GFF format (e.g. the link to the GFF file for GLUT1 is 190 http://www.uniprot.org/uniprot/P11166.gff). These protein feature files were also used to 191 detect the transmembrane proteins and their cytoplasmic/extracellular regions. The 192 missense variants in disordered regions and not overlapping any PFAM domains were 193 further classified as variants from 1) the whole proteome, 2) the transmembrane 194 proteins (only those that have annotation of at least one cytoplasmic domain or an 195 extracellular domain, in total 3836 proteins), 3) the cytoplasmic domains of 196 transmembrane proteins, and 4) extracellular domains of transmembrane proteins. For 197 each of these classes, the number of disease-causing variants and the number of 198 polymorphisms that lead to a gain of SLiMs was counted and a two-sided Fisher’s Exact 199 Test was applied to see if there is a statistically significant difference for the likelihood of 200 a given class of SLiMs to be gained via disease-causing variant compared to that of 201 polymorphisms.

202 Peptide-Protein Interaction Network Analysis

203 180 peptide-protein interactions that passed the strict LFQ filter and showed significant 204 differential SILAC ratios between wild-type and mutant forms of the peptides were used 205 to compose a peptide-protein interaction network. The network was visualized using 206 Cytoscape 3.5.1 15. Sub-graphs of the significant interactions were generated using R 207 package igraph (version 1.0.1) 16 (using fastgreedy.community function) and visualized 208 using the R packages ggnetwork 17 and ggplot2 18. Enriched GO terms for each sub- 209 graph were calculated using the topGO R package 19. 210 211 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

212 References 213

214 1. UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource

215 (UniProt). Nucleic Acids Res. 40, D71–5 (2012).

216 2. Famiglietti, M. L. et al. Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins

217 and outs of expert manual curation. Hum. Mutat. 35, 927–935 (2014).

218 3. Dosztányi, Z., Csizmok, V., Tompa, P. & Simon, I. IUPred: web server for the prediction of

219 intrinsically unstructured regions of proteins based on estimated energy content.

220 Bioinformatics 21, 3433–3434 (2005).

221 4. Köhler, S. et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res. 45, D865–

222 D876 (2017).

223 5. Schwanhäusser, B. et al. Global quantification of mammalian control.

224 Nature 473, 337–342 (2011).

225 6. Rappsilber, J., Ishihama, Y. & Mann, M. Stop and go extraction tips for matrix-assisted

226 laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in

227 proteomics. Anal. Chem. 75, 663–670 (2003).

228 7. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized

229 p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol.

230 26, 1367–1372 (2008).

231 8. Keilhauer, E. C., Hein, M. Y. & Mann, M. Accurate protein complex retrieval by affinity

232 enrichment mass spectrometry (AE-MS) rather than affinity purification mass spectrometry

233 (AP-MS). Mol. Cell. Proteomics 14, 120–135 (2015).

234 9. Schulze, W. X. & Mann, M. A Novel Proteomic Screen for Peptide-Protein Interactions. J.

235 Biol. Chem. 279, 10756–10764 (2003).

236 10. Slaughter, L., Vartzelis, G. & Arthur, T. New GLUT-1 mutation in a child with treatment- bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

237 resistant epilepsy. Epilepsy Res. 84, 254–256 (2009).

238 11. Couzens, A. L. et al. Protein interaction network of the mammalian Hippo pathway reveals

239 mechanisms of kinase-phosphatase interactions. Sci. Signal. 6, rs15 (2013).

240 12. Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat.

241 Methods 9, 676–682 (2012).

242 13. Dinkel, H. et al. ELM 2016--data update and new functionality of the eukaryotic linear motif

243 resource. Nucleic Acids Res. 44, D294–300 (2016).

244 14. Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future.

245 Nucleic Acids Res. 44, D279–85 (2016).

246 15. Shannon, P. et al. Cytoscape: a software environment for integrated models of

247 biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

248 16. Csardi, G. & Nepusz, T. The igraph software package for complex network research.

249 InterJournal, Complex Systems 1695 (2006).

250 17. Briatte, F. ggnetwork: Geometries to Plot Networks with ‘ggplot2’, R package version 0.5.1.

251 CRAN (2016).

252 18. Wilkinson, L. ggplot2: Elegant Graphics for Data Analysis by WICKHAM, H. Biometrics 67,

253 678–679 (2011).

254 19. Alexa, A. & Rahnenfuhrer, J. topGO: Enrichment Analysis for . R package

255 version 2.24.0. CRAN (2016).

256 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 Supplementary figure legends 2 3 Fig. S1: Candidate selection for peptide-protein interaction screen 4 Candidates were selected from missense disease mutations in the Humsavar 5 database (Uniprot) by selecting mutations in disordered regions that cause 6 neurological diseases. Only mutations in disordered regions of a protein were kept. 7 Mutations were further filtered for those that cause neurological diseases and the 8 final candidate set was manually selected. 9 10 Fig. S2: Reproducibility of technical replicates 11 Pearson’s R shows significantly higher correlation of technical replicates than 12 correlations between all pull-downs. 13 14 Fig. S3: SILAC ratio distributions of detected interactions that can be 15 explained by presence of SLiMs in the peptides and PFAM domains in the 16 interaction partners 17 Peptide-protein interactions detected in the screen were classified as ‘gained’ or 18 ‘lost’ according to the following criteria: An interaction is classified as ‘gained’ if the 19 mutant peptide sequence matches a SLiM pattern that doesn’t match the wild-type 20 peptide sequence and the mutant peptide has an interaction partner that contains a 21 compatible PFAM domain to bind that SLiM instance. On the other hand, an 22 interaction is classified as ‘lost’ if the wild-type peptide sequence matches a SLiM 23 pattern that is not matched in the mutant peptide sequence and the wild-type peptide 24 has an interaction partner that contains a compatible PFAM domain to bind that 25 SLiM instance. Gained and lost interactions are further sub-classified as ‘LFQ 26 positive’ and ‘LFQ negative’ depending on whether the peptide-protein interaction 27 has an LFQ value that passes the loose LFQ cut-off. The median SILAC ratio 28 distributions (wild-type versus mutant) of each of these four categories of interactions 29 (‘Gained interactions - LFQ negative’, ‘Gained interactions - LFQ positive’, ‘Lost 30 interactions - LFQ negative’, and ‘Lost interactions - LFQ positive’) are compared 31 with the median SILAC ratio distributions of all detected interactions from the array 32 using a Wilcoxon-Mann-Whitney test. Compared to the background distribution of 33 median SILAC ratios (in red), the gained interactions that pass the LFQ filter (in bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 34 green) show a significant negative skew while the lost interactions that pass the LFQ 35 filter (in purple) show a significant positive skew. 36 37 Fig. S4: Impact of specificity cut-off (LFQ) and differential cut-off (SILAC) on 38 peptide candidates 39 After applying the specificity cut-off (derived from control peptide, see Methods) on 40 all interactions, only about half of the 2x128 peptides showed at least one specific 41 binder according to the LFQ-filter (left pie chart, red). More than one third of all 42 peptide pairs with specific interactors for wild-type and/or mutant show differential 43 interactions after applying the SILAC cut-off (see Methods) (right pie chart, red). 44 45 Fig. S5: GO Term analysis of subclusters of the peptide-protein interaction 46 network composed of significant differential interactions 47 180 peptide-protein interactions that passed the strict LFQ filter and showed 48 significant differential SILAC ratios between wild-type and mutant forms of the 49 peptides were used to compose a peptide-protein interaction network. Communities 50 (sub-graphs) of this network was extracted using fastgreedy.community function of 51 the igraph R package (See Methods). GO term enrichment was calculated for the 52 nodes of each subgraph of the network. Within the subgraphs, the edges 53 (interactions) are colored differentially depending on whether the interaction is with a 54 mutant or wild-type peptide. The peptides are depicted as triangles while the proteins 55 are depicted as circles and they are differentially colored. Below the subgraph figure, 56 a bar plot of the top 10 enriched GO terms are displayed (differentially colored by the 57 GO term category as ‘Biological Process’, ‘Molecular Function’ or ‘Cellular 58 Compartment), in which the x axis shows the log10 p-values corrected for multiple- 59 testing correction using the Benjamini Hochberg method. bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Fig. S1

Mutation database 26,649 Humsavar UniProt annotations of OMIM

Disorder prediction 1,878 IUPred

Disease classification 305 Neurological diseases Human Phenotype Ontology, Manual Manual selection 128 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Fig. S2 p < 2.2e-16

0.9

0.8

0.7 Pearson's R

0.6

0.5

All Replicates bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 3.4e−11

0.052 10 0.00064

0.6

5

0 Median.SILAC.ratio.Wt/Mut

−5

N: 99526 N: 38 N: 15 N: 112 N: 40

All interactions

Fig. S3 Lost interactions (specific (LFQ)) Gained interactions (specific Lost(LFQ)) interactions (unspecific (LFQ)) Gained interactions (unspecific (LFQ)) All peptides bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted MayPeptide 24, 2017. pairs The with copyright specific interactors holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

n = 136 n = 120 n = 76 n = 44

peptides with specific interactors peptide pairs with differential interactions

Fig. S4 bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

RTL1 ● HDGFRP2 A ● ● ● ● MAP1B ● NOC2L AP3B1 TCOF1 ● CCDC47 SRRM2 ● TRPV4_R315W PDIA6 NASP ●SEC63 ●● TINF2_K280E PDIA3 ● DKFZp686J1372 ● PCNA YWHAG CTNNBL1 ● ● CALU TRA1 ● ● RBBP4 ● DDX39B MAPRE1● PPM1G ● ● CALR● ● NAP1L1 YWHAH ● SPAST_P293L ZC4H2_R213W ● TM240_P170L ●HDGF CXB1_R230C P4HB PDIA4 ● ● ● ● HNRNPK SUPT5H PTMA ● ● ● ● YWHAZ CKB HSP90AA1

mutant−edge peptide protein wt−edge

physiological muscle hypertrophy positive regulation of cardiac muscle ce... regulation of DNA damage checkpoint U4 snRNP positive regulation of DNA−templated tra... U6 snRNA binding Term U4 snRNA binding transcription export complex RNA binding poly(A) RNA binding 0 5 10 15 −log10(bh)

Biological_Process Fig. S5 Cellular_Compartment Molecular_Function bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

DARS2 L1CAM_S1194L B CC2D1A● ● G3BP FAM91A1 ● SRP14 ● ● ● ●NXF1 G3BP2 ● H1FX EMG1●● RPUSD4 ALYREF● ● HNRNPA0 AP2A1● RBMX ● CHM2B_D148Y HIST1H1D ●HNRNPA1 ● NONO ZC2HC1A ● BCKDK ● FTSJ3 ● MAP2● ● NME4 ● ● ● COPA DDX49 ● ● PYCR1 PDCD11 ● POLRMT SFPQ WDR11● ● ● CHTOP

mutant−edge peptide protein wt−edge

macromolecular complex organic cyclic compound binding heterocyclic compound binding transcription export complex membrane−enclosed lumen organelle lumen Term intracellular organelle lumen nucleic acid binding RNA binding poly(A) RNA binding 0 3 6 9 −log10(bh)

Cellular_Compartment Molecular_Function bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

C19orf53● C ● ● ● RPL39P5 AURKA CCDC137 ● DDOST ZNF768 ● ● ALYREF SYEM_E96K LANCL1● ● ● EMG1 SLTM CAC1A_R2135C ●MORF4L2 CSNK1D ● RPUSD3 ● ● ● CSNK1A1 RPN2

mutant−edge peptide protein wt−edge

positive regulation of proteasomal prote... Term

positive regulation of proteasomal ubiqu...

0.0 0.3 0.6 0.9 −log10(bh)

Biological_Process bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● D PANK2_E134G ANP32B ANP32E● SYEM_R168G ● SETX_R1294C C1QBP ●ANP32A DSRAD_K999N FUS_R216C ●SET

WWOX_P47T CASR_R898Q ITF2_R565W ● PTMA TRA1 HUWE1● ●

mutant−edge peptide protein wt−edge

nuclear part

nuclear lumen

membrane−enclosed lumen Term organelle lumen

intracellular organelle lumen

0.0 0.5 1.0 1.5 2.0 −log10(bh)

Cellular_Compartment bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● E ACADM ●CKMT1A ● ZEB2_Q1119R RRP12 ●XPO1 ● C2D2A_T1114M DEPD5_S1073R UBR4 PRIC295●

CLN6_R6T MDN1 ● ● POLRMT

mutant−edge peptide protein Term

−log10(bh) bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● F ● DKFZp686F18120 SRSF2 ● SRSF1 ● SRSF6 SRSF9 FUS_R521C ● ● SRSF3 ●SRSF10 SRSF5●

peptide protein wt−edge

mRNA transport RNA splicing, via transesterification re... mRNA splicing, via spliceosome RNA splicing, via transesterification re... termination of RNA polymerase II transcr... ribonucleoprotein complex localization Term RNA export from nucleus ribonucleoprotein complex export from nu... mRNA−containing ribonucleoprotein comple... mRNA export from nucleus 0.0 2.5 5.0 7.5 10.0 12.5 −log10(bh)

Biological_Process bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● G CLTB ITPR1_P1059L ●CLTC

DECR1● GTR1_P485L CLTA●

CAC1H_P648L CLINT1●

mutant−edge peptide protein

Golgi−associated vesicle coated pit coated vesicle Golgi−associated vesicle membrane vesicle coat clathrin coat Term clathrin−coated vesicle trans−Golgi network transport vesicle clathrin vesicle coat clathrin binding 0 2 4 6 −log10(bh)

Cellular_Compartment Molecular_Function bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● H RUVBL1 RUVBL2 ● IPO11 ●

SYDM_Y629C SSR4● ●CALU UBA1_S547G

PCNA TUBB2B● ●

peptide protein wt−edge

acetyltransferase complex protein acetyltransferase complex SWI/SNF superfamily−type complex histone acetyltransferase complex histone methyltransferase complex MLL1 complex Term MLL1/2 complex INO80−type complex DNA helicase complex Ino80 complex 0.0 0.5 1.0 1.5 2.0 −log10(bh)

Cellular_Compartment bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● I DYNC1H1 ●PHGDH

ERCC6_P1042L

SODC_I152T ●AF1Q UBL5●

mutant−edge peptide protein wt−edge Term

−log10(bh) bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● J FLOT2

●ERLIN2 DKC1_G402R

CXB1_R264C

FLOT1●

mutant−edge peptide protein wt−edge

positive regulation of toll−like recepto... cortical actin cytoskeleton positive regulation of myoblast fusion positive regulation of heterotypic cell−... cell−cell adhesion mediated by cadherin membrane biogenesis Term glutamate receptor binding membrane assembly ionotropic glutamate receptor binding membrane raft organization 0 5 10 15 20 −log10(bh)

Biological_Process Cellular_Compartment Molecular_Function bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

CAC1H_A748V K●CD2AP

CAPZA1●

peptide protein wt−edge Term

−log10(bh) bioRxiv preprint doi: https://doi.org/10.1101/141622; this version posted May 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

● L IPO9

AAAS_Q15K

IPO7●

mutant−edge peptide protein

GTPase binding

small GTPase binding

Ras GTPase binding

Term histone binding

protein transporter activity

Ran GTPase binding

0 1 2 −log10(bh)

Molecular_Function