TRAINING SCHOOL 1, 24-28 MAY 2021

Computational methods to study Phase Separation

Lecture: Short Linear Motifs - Concepts, identification and prediction - Bálint Mészáros

This content is shared under the license CC BY-NC-SA

Short Linear Motifs

Concepts, identification and prediction

Bálint Mészáros
Structural and Computational Biology Unit
European Molecular Biology Laboratory
26/05/2021

Short linear motifs as models of protein-protein interactions

What?

“The sequences of many proteins contain short, conserved motifs that are involved in recognition and targeting activities, often separate from other functional properties of the molecule in which they occur. These motifs are linear, in the sense that three-dimensional organization is not required to bring distant segments of the molecule together to make the recognizable unit.”

Hunt TiBS 1990, Protein sequence motifs involved in recognition and targeting: a new series

Why?

• Condensation is driven by various forces

The realm of short linear motifs

Ryan & Fawzi TiNS 2019

Short linear motifs / SLiMs

[Figure labels: Nuclear glucocorticoid receptor (PDB: 1m2z); Interacting IDPs; SLiM building]

Protein       Region     Sequence
PRGC1_HUMAN   143-149    AEEPS LLKKLLL APANT
PA2G4_HUMAN   353-359    EVQDA ELKALLQ SSASR
CBP_HUMAN     69-75      ASKHK QLSELLR GGSGS
CBP_HUMAN     357-363    KLIQQ QLVLLLH AHKCQ
NRIP1_HUMAN   20-26      SIVLT YLEGLLM HQAAG
NRIP1_HUMAN   132-138    KQDST LLASLLQ SFSSR
NRIP1_HUMAN   184-190    GVASS HLKTLLK KSKVK
HAIR_HUMAN    565-571    SGLGD RLCRLLR REREA
HAIR_HUMAN    757-763    PLPCP SLCELLA STAVK
Q90ZL7_DANRE  69-75      GEKSN VLRKLLK RANSY
NCOA6_MOUSE   1494-1500  REAPT SLSQLLD NSGAP
NCOA2_HUMAN   640-646    SKGQT KLLQLLT TKSDQ
NCOA2_HUMAN   689-695    KEKHK ILHRLLQ DSSSP
NCOA2_HUMAN   744-750    KKENA LLRYLLD KDDTK
MED1_HUMAN    603-609    VSQNP ILTSLLQ ITGNG
MED1_HUMAN    644-650    TKNHP MLMNLLK DNPAQ
                                *  **

regular expression: .L..LL.

sequence logo: [figure]
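As a minimal sketch of how the regular expression relates to the instances listed above, the following Python snippet checks that every 7-residue motif instance matches `.L..LL.` and rebuilds the same expression by keeping only the invariant alignment columns. The helper names (`column_counts`, `consensus_regex`) are illustrative assumptions, and real motif definitions are curated rather than derived this mechanically.

```python
import re
from collections import Counter

# Motif instances (7 residues each) copied from the list above, flanks omitted.
INSTANCES = [
    "LLKKLLL", "ELKALLQ", "QLSELLR", "QLVLLLH", "YLEGLLM", "LLASLLQ",
    "HLKTLLK", "RLCRLLR", "SLCELLA", "VLRKLLK", "SLSQLLD", "KLLQLLT",
    "ILHRLLQ", "LLRYLLD", "ILTSLLQ", "MLMNLLK",
]

MOTIF = re.compile(r".L..LL.")


def column_counts(instances):
    """Count the residues observed at each aligned position."""
    return [Counter(column) for column in zip(*instances)]


def consensus_regex(instances):
    """Keep a residue where the column is invariant, use '.' elsewhere."""
    return "".join(
        next(iter(counts)) if len(counts) == 1 else "."
        for counts in column_counts(instances)
    )


all_match = all(MOTIF.fullmatch(seq) is not None for seq in INSTANCES)
derived = consensus_regex(INSTANCES)
print(all_match, derived)
```

The three invariant columns correspond to the positions marked with asterisks under the alignment.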

SLiMs are autonomous functional units

Short linear motifs / SLiMs

1/3 of the human proteome, containing…

…100,000 transient, conditional, regulatory interaction modules…

…and 70% of PTMs

Tompa et al. Mol. Cell. 2014

Short linear motifs / SLiMs

• SLiMs are:
  • Short (3-10 residues, close in the sequence)
  • Mediating interaction with a given domain
  • Residing in disordered regions
  • Evolutionarily conserved

Linear and compact

Specific stabilizing interactions

Structural adaptation

Preserved function (in a generally quickly evolving sequence region)

SLiMs are often conserved

[Figure: relative conservation score]

Key residues of SLiMs tend to stand out in alignments
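To illustrate how key residues can stand out in an alignment, here is a small sketch that scores each alignment column with a Shannon-entropy-based conservation value (1 = invariant). This is a generic, swapped-in measure and not necessarily the relative conservation score used in the cited work. The five aligned sequences are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical alignment of one motif region in five orthologs (invented data).
ALIGNMENT = [
    "SQILHRLLQDS",
    "TQVLHKLLQET",
    "SHILRRLLEDA",
    "AEMLQRLLQNS",
    "SQILHKLLSDG",
]


def column_conservation(column, alphabet_size=20):
    """Return 1 - normalized Shannon entropy of one alignment column."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


scores = [column_conservation(col) for col in zip(*ALIGNMENT)]
# 1-based positions of fully conserved columns.
conserved = [i + 1 for i, s in enumerate(scores) if s >= 0.999]
print(conserved)
```

In this toy alignment only the three leucine columns are invariant, mirroring the `.L..LL.` pattern, while the flanking and spacer positions vary.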

Davey et al. Mol. BioSyst. 2012

Mészáros et al. Sci Sig. 2021

Describing SLiMs

SLiM representations

Regular expressions (RegEx)

F[EDQS][MILV][ED][MILV]((.{0,1}[ED])|($))

Defines the key positions for the interaction

Advantages:
• Makes key positions clear
• Can have conditionals
• Easy to implement in programming languages
• Can encode PTMs

Disadvantages:
• Binary description
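The "easy to implement" point can be sketched with Python's standard `re` module. The pattern is the one quoted above. The two peptide sequences and the `find_motif` helper are hypothetical and chosen only to exercise both branches of the final group (an acidic residue after an optional spacer, or the C-terminus).

```python
import re

# Regular expression quoted on the slide above.
PATTERN = re.compile(r"F[EDQS][MILV][ED][MILV]((.{0,1}[ED])|($))")


def find_motif(sequence, pattern=PATTERN):
    """Return (1-based start, 1-based end, matched text) for each match."""
    return [(m.start() + 1, m.end(), m.group(0)) for m in pattern.finditer(sequence)]


# Hypothetical peptides: one internal match, one match ending at the C-terminus.
internal = find_motif("MKTAFEVDLDEGS")
c_terminal = find_motif("GGSFSLEV")
print(internal)
print(c_terminal)
```

The result is binary: a sequence either matches or it does not, which is the disadvantage listed above.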

http://elm.eu.org/infos/help.html

SLiM representations

Position specific scoring matrices (PSSMs)

Defines a score for the occurrence of each residue in each position

The better the fit, the higher the score

Sum of scores = fit of the SLiM

Advantages:
• Better resolution (not binary)
• Can handle residue preferences

Disadvantages:
• Cannot handle flexible distances (e.g. regex gaps such as .{0,2})
• Cannot handle conditional positions
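A minimal sketch of PSSM scoring, assuming a sliding window and a simple sum over positions as described above. The 4-position matrix, its scores, the default penalty and the test sequence are all invented for illustration and do not correspond to any real motif.

```python
# Hypothetical 4-position PSSM: one dict of residue scores per motif position.
PSSM = [
    {"F": 2.0, "Y": 1.0, "W": 0.5},
    {"E": 1.5, "D": 1.5, "S": 0.5},
    {"L": 2.0, "I": 1.5, "V": 1.0, "M": 1.0},
    {"E": 1.0, "D": 1.0},
]
DEFAULT = -1.0  # assumed penalty for residues not listed at a position


def score_window(window, pssm=PSSM, default=DEFAULT):
    """Sum the per-position scores: the better the fit, the higher the sum."""
    return sum(col.get(res, default) for col, res in zip(pssm, window))


def scan(sequence, pssm=PSSM, default=DEFAULT):
    """Score every window of the sequence; returns (1-based start, window, score)."""
    width = len(pssm)
    return [
        (i + 1, sequence[i:i + width], score_window(sequence[i:i + width], pssm, default))
        for i in range(len(sequence) - width + 1)
    ]


def best_hit(sequence):
    """Return the highest-scoring window."""
    return max(scan(sequence), key=lambda hit: hit[2])


hits = scan("GAFELDKYSVE")
top = best_hit("GAFELDKYSVE")
print(top)
```

Unlike a regular expression, the scan ranks windows: a weaker candidate still receives a positive but lower score than the best hit. The fixed window width also shows why flexible spacers do not fit this representation.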

http://slim.icr.ac.uk/pssmsearch/

Structural complexity is reflected in SLiMs

• Some SLiM definitions are more complex than others – why?

LIG_AP2alpha_1: F.D.F

vs

LIG_LIR_Gen_1: [EDST].{0,2}[WFY][^RKPG][^PG][ILV]

vs

LIG_PCNA_yPIPBox_3: ([KR].{0,6}[QN].[^FHWY][LIVM][^P][^PFWYMLIV][FYLMWV][FYLMWVI])|([QN].[^FHWY][LIVM][^P][^PFWYMLIV][FYLMWV][FYLMWVI].{0,6}[KR])
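One source of the added complexity is the variable-length spacer. The sketch below uses the LIG_LIR_Gen_1 expression quoted above and a zero-width lookahead so that overlapping matches are also reported (plain `re.finditer` only returns non-overlapping ones). The peptide and the `overlapping_matches` helper are assumptions made for illustration.

```python
import re

# LIG_LIR_Gen_1 expression as quoted above.
LIR = r"[EDST].{0,2}[WFY][^RKPG][^PG][ILV]"


def overlapping_matches(pattern, sequence):
    """Return (1-based start, matched text) for every start position, overlaps included."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


# Illustrative peptide with three acidic residues upstream of the aromatic position.
hits = overlapping_matches(LIR, "SDDDWTHLS")
print(hits)
```

The same aromatic core is reached with spacers of two, one and zero residues, which a fixed-width pattern such as F.D.F could not express.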

Averaging over homologous domains

Domain heterogeneity presents in variations in binding preference

Complexity in motif definition

[Figure panels: yeast; rat; Trypanosoma brucei]

Noda et al., FEBS Lett 2010;584(7):1379-85.

Structural complexity is reflected in SLiMs

• Compensating positions

mandatory

either / or

Conditional positions cannot be captured in a single regex / logo

Hertz et al., Mol Cell 2016;63(4):686-695

Disorder causes problems (with the model)

• Ideal motif model: disordered partner

fully ordered domain with a single, deep binding pocket

Simple motif, captured by regex

Cladosporin:lysyl-tRNA synthetase NCBD:ACTR E-cadherin:β-catenin

[Figure labels: too long; too many; too much]

Functions of SLiMs

• CLV – cleavage sites
  • Sites recognized by proteases, e.g. PAP cleavage site: [ILV]..R[VF][GS].
• DOC – docking sites
  • Specific sites recognized by modifier enzymes, e.g. MAPK JIP1 docking motif: [RK]P[^P][^P]L.[LIVMF]
• MOD – post-translational modification sites
  • Sites that are modified by PTM enzymes, e.g. GSK3 phospho-site: ...([ST])...[ST]
• TRG – targeting sites
  • Sites recognized by subcellular transport, e.g. NLS: [^DE]((K[RK])|(RK))[KRP][KR][^DE]
• DEG – degrons
  • Sites recognized by E3 ubiquitin ligases, e.g. SCF^FBW7 recognition site: [LIVMP].{0,2}(T)P..([ST])
• LIG – other binding sites
  • Generic protein-protein interaction sites, e.g. clathrin binding motif: .[NP]W[DES].W

SLiMs are heavily regulated
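The six functional classes and their example expressions (CLV, DOC, MOD, TRG, DEG, LIG) can be combined into a simple scanner. This is a sketch under stated assumptions: the patterns are the ones quoted in the class list, while the dictionary layout, the `scan_classes` helper and the test sequence are hypothetical.

```python
import re

# One example expression per functional class, as quoted in the class list.
CLASS_PATTERNS = {
    "CLV": r"[ILV]..R[VF][GS].",
    "DOC": r"[RK]P[^P][^P]L.[LIVMF]",
    "MOD": r"...([ST])...[ST]",
    "TRG": r"[^DE]((K[RK])|(RK))[KRP][KR][^DE]",
    "DEG": r"[LIVMP].{0,2}(T)P..([ST])",
    "LIG": r".[NP]W[DES].W",
}


def scan_classes(sequence, patterns=CLASS_PATTERNS):
    """Return {class: [(1-based start, matched text), ...]} for classes with hits."""
    result = {}
    for name, pattern in patterns.items():
        hits = [(m.start() + 1, m.group(0)) for m in re.finditer(pattern, sequence)]
        if hits:
            result[name] = hits
    return result


# Invented sequence containing a basic stretch and a TP..S stretch.
found = scan_classes("GAPKKKRKVAGGLPTPGGSAA")
print(found)
```

Note that the loosely defined MOD expression also hits the region matched by the DEG expression. Such short patterns match often by chance, so hits like these are candidates rather than confirmed functional sites.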

LIG_LIR_Gen_1

[EDST].{0,2}[WFY][^RKPG][^PG][ILV]

regulation often shows in the motif definitions

Van Roey et al., Curr Opin Struct Biol 2012

SLiMs driving phase separation

• A growing number of systems are known to be driven by SLiM-mediated interactions

Phase transition correlates with biochemical activity transition in the nephrin–NCK–N-WASP system

• The dominant forces are SLiM-domain interactions
• Phospho-regulated SLiMs
• Multivalency is crucial

SLiM-driven condensation typically involves several proteins – co-driver systems

Li et al. Nature 2012

SLiMs in disease

Viruses hijack host processes using motifs

Davey et al. TiBS 2011

SLiMs in disease

• Cancer modifies/deletes/creates motifs

Experimental identification of SLiMs

Low throughput experimental identification of SLiMs

SPOT arrays

Ian Cushman: Utilizing peptide SPOT arrays to identify protein interactions, Current Protocols in Protein Science 2008

High throughput experimental identification of SLiMs

HTP Proteomic peptide-phage display

N.E. Davey, M.-H. Seo, V.K. Yadav, J. Jeon, S. Nim, I. Krystkowiak, C. Blikstad, D. Dong, N. Markova, P.M. Kim, Y. Ivarsson, Discovery of short linear motif-mediated interactions through phage display of intrinsically disordered regions of the human proteome, FEBS J., 284 (2017) 485–498.

Computational resources in SLiM research

Eukaryotic Linear Motif resource - ELM

http://elm.eu.org/

Total of 295 motifs with 3,600+ examples
• CLV – 11 motifs (89 examples)
• DEG – 25 motifs (188 examples)
• DOC – 32 motifs (450 examples)
• MOD – 37 motifs (786 examples)
• TRG – 22 motifs (281 examples)
• LIG – 168 motifs (1,821 examples)

Identifying domains with Pfam

http://pfam.xfam.org/

Pfam is cross-referenced in UniProt – easy to check domain architectures

PhosphoSitePlus

Collection of post-translational modifications in human, mouse and rat proteins
• Phosphorylation, methylation, acetylation, ubiquitination, SUMOylation
• Partitioned into low-throughput (reliable) and high-throughput (more data)

https://www.phosphosite.org/

Acknowledgements

Toby J. Gibson, Hugo Sámano-Sánchez, Manjeet Kumar, Jesús Alvarado-Valverde, Jelena Calyseva, Renato Alves