MS-based Interactomics: Computational resources and tools for studying the physical

ASMS MS Interest Group Wednesday Evening Workshop Isabell Bludau & Bill Noble What is ‘interactomics’ and why do we discuss it?

• Many MS- studies focus on cataloging and quantifying individual molecules of a particular type • e.g. quantitative or metabolite matrix

• Most biological molecules don’t operate in isolation but they interact with each other • protein complexes • activity regulation via metabolite/drug binding

• ‘Interactome’ = comprehensive set of molecular in biological system • here we focus only on physical (not functional) interactions

Current MS-based techniques for large-scale interactomics

Protein-protein (PPI) networks: • Affinity-purification MS (AP-MS) ➭ Interaction network • Proximity-dependent labeling: APEX, BioID

Protein-protein complexes: • Protein co-fractionation MS (CoFrac-MS) ➭ Protein complexes

protein intensity fractions

Structural information on PPIs: • Cross-linking MS ➭ Structure: interacting protein residues Current MS-based techniques for large-scale interactomics

Protein-metabolite/drug interactions: • Thermal profiling (TPP) / ➭ Protein-ligand Cellular Thermal Shift Assay (CETSA) +/- drug protein interactions non-denatured • Limited proteolysis-coupled MS (LipMS) temperature

Protein-RNA interactions: ➭ Protein-RNA • Protein-RNA crosslinking interactions at residue resolution Available resources and databases

Databases based on: Databases: • High-throughput methods Protein-protein interactions (PPIs) • Low-throughput assays • STRING (https://string-db.org/) • Computational predictions • BioPlex (http://bioplex.hms.harvard.edu/) • PrePPI (https://honiglab.c2b2.columbia.edu/PrePPI/)

Goals: Protein complexes • Expand current knowledge • CORUM (https://mips.helmholtz-muenchen.de/corum/) • Use knowledge from existing databases • Complex Portal (https://www.ebi.ac.uk/complexportal/home) • Look at interactome changes and dynamics Protein-metabolite/drug interactions • STITCH (http://stitch.embl.de/)

General interactions: • IntAct (https://www.ebi.ac.uk/intact/) Computational challenges and what we would like to discuss today

• Protein-protein interaction networks: affinity-purification MS (AP-MS) • Many reciprocal pull-downs to map the full PPI network of a Eduard • 100s - 1000s of MS measurements > prevent error accumulation Huttlin • Confidently distinguish true from false interactions

• Protein-protein complexes: Co-fractionation MS (CoFrac-MS) • Comprehensive interactome map from a single experiment Isabell Bludau (< 100 MS measurements) • Distinguish true interactions from random co-elutions

• Protein-drug interactions: thermal proteome profiling (TTP) Dominic • Many metabolites are tested against thousands of Helm • How to estimate significance? Edward Huttlin Harvard Medical School From IP’s to : Computational Analysis of Large-scale AP-MS Data

Ed Huttlin Harvard Medical School Department of Cell Boston, Massachusetts The Achilles Heel of AP-MS: Which Interactions are Real? Computational Strategies for Identifying Interacting Proteins

COMPARISON WITH NEGATIVE CONTROLS COMPARISON ACROSS UNRELATED IP’s

Negative control IP’s are used to Background is assumed to be constant across IP’s. Define background. Proteins present Interacting proteins are found by seeking proteins At higher abundance than background Whose abundance is increased above their Are assumed to be interactors. Average across many IP’s of unrelated baits.

Examples: QUBIC, SAINT, many others Examples: CompPASS, SAINT, MiST, HGScore, Socio-affinity Index CompPASS Scoring Algorithm

Stats Table Bait 1 Bait 2 Bait 3 Bait 4 Bait k

Interactor 1 X1,1 X2,1 X3,1 X4,1 Xk,1

Interactor 2 X1,2 X2,2 X3,2 X4,2 Xk,2

Interactor 3 X1,3 X2,3 X3,3 X4,3 Xk,3

Interactor 4 X1,4 X2,4 X3,4 X4,4 Xk,4

Interactor m X1,m X2,m X3,m X4,m Xk,m

Xi,j = Spectral counts for interactor j with bait i Z-Score WDN-Score How frequently Is the protein more is this protein abundant than usual detected? Is X - X N S p in this IP? i,j N X It detected Z = WD = n i,j S X reproducibly?

X = Average spectral counts for protein across baits N = Number of runs Xi,j = Spectral counts for interactor j with bait i n = Number of runs in which the protein is found S = Standard Deviation spectral counts across baits p = Number of replicates observed (1 or 2)

CompPASS: Sowa et al. (2009) Cell, 138:389-403. Mat Sowa Wade Harper An R Implementation of CompPASS

https://github.com/dnusinow/cRomppass David Nusinow http://bioplex.hms.harvard.edu/comppass/ From IP’s to Interactomes: Meeting the Challenges of Scale

CHALLENGES OF SCALE 1. 2. 3. 4. BioPlex 1.0:How Huttlin to doet al. (2015) Cell, 162:425How- 440.to ensure quality How to minimize errors How to rapidly and accurately BioPlexall 2.0:those Huttlin searches? et al. (2017) Nature,across545:505 thousands-509. of runs? due to false positive ID’s? identify interactors? BioPlex 3.0: In Preparation. Data processing overview

Backup Server Interactome Server

Raw Auto File Loader CORE Peptide Protein Sequest Filtering Mass Filtering Spectrometer

Quality Control

Sample LIMS Prep. CompPASS

Existing Technology Modified for Interactome Project New for Interactome Project Automated and Adaptive Run QC

GOOD

BAD

TRAINING DATA

GOOD BAD Cycle repeats For every run. CLASSIFICATION PREDICTIVE MODEL ? ? GOOD BAD

NEW RUN EXPERT PREDICTION VALIDATION Automatic Notification of Problematic Runs Distinguishing Interactors from Background and False Positives

Relative Protein Abundances across IP’s

Both bona fide interactors and false positives appear in AP-MS experiments as “rare” events that score as likely interactions. Unless precautions are taken, enrichment of these false positives can cause surprisingly high false discovery rates among AP-MS datasets. Distinguishing Interactors from Background and False Positives

• A key challenge of AP-MS is distinguishing the few true interacting proteins (~1-2%) from a much larger number of background proteins (~98%), false positives (~1%), and other experimental artifacts.

• While existing algorithms for AP-MS distinguish enriched interacting proteins from nonspecific background, CompPASS-Plus uniquely accounts for false positive ID’s. CompPASS-Plus for Interactor Identification

• CompPASS-Plus is a Naive Bayes classifier that extends CompPASS for improved Identification of interacting proteins.

• Features include standard CompPASS scores as well as customized scores.

• Training data is obtained from STRING/GeneMania and from Target/Decoy methods

• Leave-one-out cross-validation is incorporated at the 96-well-plate level for classification. CORUM Validation BioPlex Associates New Proteins with Known Complexes

PAH

Phe Tyr

DNAJC12

Trp-5OH Trp Tyr L-dopa TPH1/2 TH Serotonin Dopamine

Chantranupong (2016) Cell, 165:153. Anikster et al. (2017) Saxton (2016) Nature, 536:229. Am. J. Hum. Genet. 100, 257. Miles (2017) eLife, 6:e22693.

Gu et al. (2017) Science, 358:813. Nguyen (2017) Science, 357:eaan0218. Rebsamen (2015) Nature, 519:477. Summary

• Separating true interacting protein partners from background is a serious challenge for AP-MS

• A variety of algorithms have been developed to address this that accommodate a variety of experimental designs

• Performing AP-MS at truly large scale leads to additional challenges that must be addressed.

• Our CompPASS-Plus algorithm has enabled us to efficiently identify interacting proteins and produce reliable maps of the interactome in multiple cell types. Isabell Bludau ETH Zurich Underlying technology for data acquisition: Protein co-fractionation

Native complex fractionation (SEC, IEX, BN, ...)

Bottom-up LC-MS/MS Protein intensity Fractions

Andersen et al. (2003), Dunkley et al. (2004), Foster et al. (2006), Scott et al. (2015), Wan et al. (2015)

Underlying technology for data acquisition: Protein co-fractionation mass spectrometry

Native complex fractionation (SEC, IEX, BN, ...)

Bottom-up LC-MS/MS Benefits: + Independent of genetic engineering or antibody availability + Parallel detection of protein complexes on a

proteome-wide scale Protein intensity Fractions

Andersen et al. (2003), Dunkley et al. (2004), Foster et al. (2006), Scott et al. (2015), Wan et al. (2015)

Established data analysis strategy: Protein correlation profiling

Protein correlation profiling Pairwise scoring Clustering

1 0.9 0.8 0.9

0.9 1 0.9 0.1

0.8 0.9 1 0.2

Protein intensity Fractions 0.9 0.1 0.2 1 PrInCE (Stacey et al. 2017)

Data properties: • Profiles for thousands of proteins (~5000) • Limited peak capacity (~20-30)

Consequences: • Random co-elution of proteins Protein intensity Fractions • Limited selectivity and sensitivity

Scott et al. (2015), Wan et al. (2015), Stacey et al. (2017) Targeted anaysis strategy: Complex-centric proteome profiling

Native complex fractionation (SEC, IEX, BN, ...)

Bottom-up LC-MS/MS Protein intensity

SEC fractions

Heusel & Bludau et al. (2019) Targeted anaysis strategy: Complex-centric proteome profiling

Prior information: protein complex DB CCprofiler Native complex https://github.com/CCprofiler/ fractionation (SEC, IEX, BN, ...)

Bottom-up LC-MS/MS Protein intensity

SEC fractions

ü Automated software for targeted complex-centric analysis ü Parallel and sensitive protein complex detection ü Complex-level FDR estimation

Heusel & Bludau et al. (2019) Complex-centric proteome profiling: CCprofiler Decoy based FDR estimation https://github.com/CCprofiler/ Targets Decoys a) Defined hypotheses (e.g. CORUM) • Generate matching decoy for each target (same complex size distribution) • Require minimum network distance between decoy proteins to avoid overlap b) Interaction network with targets (e.g. AP-MS, BioPlex, StringDB) Select 1st neighbors of each Inf protein as one target hypothesis 3 Inf Complex-centric proteome profiling: CCprofiler Benchmark https://github.com/CCprofiler/

Complex Signal scoring hypotheses

CORUM manual annotation

Decoy generation CCprofiler

CORUM automated scoring MORUC CCprofiler Complex-centric proteome profiling: CCprofiler Benchmark https://github.com/CCprofiler/

Complex Signal scoring Manual benchmark hypotheses

CORUM manual annotation 0.15 high quality 1.0

all 0.8 0.10 0.6

Decoy generation TPR 0.4

0.05 TPR_high

CCprofiler manualFDR

ManualFDR 0.2 CORUM automated scoring 0.00 0.0 MORUC CCprofiler 0.00 0.05 0.10 0.15 0.00 0.05 0.10 0.15 qvalue_cutoff qvalue_cutoff Decoy FDR Decoy FDR Targeted anaysis strategy: Complex-centric proteome profiling

Prior information: protein complex DB CCprofiler Native complex https://github.com/CCprofiler/ fractionation (SEC, IEX, BN, ...)

Bottom-up LC-MS/MS Protein intensity

SEC fractions

ü Automated software for targeted complex-centric analysis ü Parallel and sensitive protein complex detection ü Complex-level FDR estimation

Heusel & Bludau et al. (2019) Targeted anaysis strategy: Complex-centric proteome profiling

† These authors contributed equally to this work.

ü Automated software for targeted complex-centric analysis ü Parallel and sensitive protein complex detection ü Complex-level FDR estimation

Heusel & Bludau et al. (2019) Complex-centric proteome profiling by SEC-SWATH-MS

ü Detect and quantify hundreds HEK293 soluble proteome of protein complexes - SEC fractionation: 1 mg, Yarra-SEC-4000, 81 fractions - SWATH-MS: Triple-TOF 5600, 64 vw, 120 min gradient ➤ Consistent quantification of 4916 proteins

Detection of 572 out of 1753 CORUM complexes (5% FDR)

572 complexes

Heusel & Bludau et al. (2019) Complex-centric proteome profiling by SEC-SWATH-MS

ü Detect and quantify hundreds HEK293 soluble proteome 26S proteasome of protein complexes Annotated subunits: 22 Subunits with signal: 22 26S proteasomeMax. coeluting subunits: 22 Max. completeness: 1 AnnotatedProteasome assembly subunits: 22 Subunits with signal: 22 ü Investigate sub-complexes Max. coeluting subunits: 22 Max. completeness:molecular 1 weight (kDa) apparent molecular weight [kDa] molecular weight (kDa) and assembly intermediates 11724 4130 1455 513 181 64 22 8 3 11724 4130 1455 513 181 64 22 8 3 1000000 monomers 26S proteasome id 1000000 idSubunits: PRS10 PSA6 PRS10 PSA6 PRS4 PSA7 PRS4 PSA7 750000750000 PRS6A PSB1 PRS6A PSB1 PRS6B PSB2 PRS6B PSB2 PRS7 PSB3 500000 PRS8 PSB4 PRS7 PSB3 intensity 500000 PSA1 PSB5 PRS8 PSB4

intensity PSA2 PSB6 PSA1 PSB5 250000protein intensity PSA3 PSB7 PSA2 PSB6 250000 PSA4 PSD13 PSA5 PSMD4 PSA3 PSB7 0 PSA4 PSD13 1 11 21 31 41 51 61 71 81 fraction PSA5 PSMD4 0

1 11 21 31 41 51 61 71 81 SEC SEC fractions fractionfraction number

Heusel & Bludau et al. (2019) Complex-centric proteome profiling by SEC-SWATH-MS

ü Detect and quantify hundreds HEK293 soluble proteome 26S proteasome of protein complexes Annotated subunits: 22 Subunits with signal: 22 26S proteasomeMax. coeluting subunits: 22 Max. completeness: 1 AnnotatedProteasome assembly subunits: 22 Subunits with signal: 22 ü Investigate sub-complexes Max. coeluting subunits: 22 Max. completeness:molecular 1 weight (kDa) apparent molecular weight [kDa] molecular weight (kDa) and assembly intermediates 11724 4130 1455 513 181 64 22 8 3 11724 4130 1455 513 181 64 22 8 3 1000000 monomers 26S proteasome id 1000000 idSubunits: PRS10 PSA6 PRS10 PSA6 PRS4 PSA7 PRS4 PSA7 late early 750000750000 PRS6A PSB1 PRS6A PSB1

G4 PRS6B PSB2 POMP α5 α1 α1 PRS6B PSB2 α3 G3 α5 α3 α7 α6 β3 α7 PRS7 PSB3 G1 α4 G1 protein class G2 α2 β6 α4 PRS7 PSB3 β7 β2 G2 500000 PRS8 PSB4 intensity 500000 PSA1 PSB5 PRS8 PSB4

intensity PSA2 PSB6 PSA1 PSB5 250000protein intensity PSA3 PSB7 PSA2 PSB6 250000 PSA4 PSD13 PSA5 PSMD4 PSA3 PSB7 0 PSA4 PSD13 1 11 21 31 41 51 61 71 81 fraction PSA5 PSMD4 0

1 11 21 31 41 51 61 71 81 SEC fractions

Intermediatesin SEC-SWATH-MS SEC fractionfraction number

Heusel & Bludau et al. (2019) Complex-centric proteome profiling by SEC-SWATH-MS

ü Detect and quantify hundreds HEK293 soluble proteome of protein complexes Assembled ü Investigate sub-complexes 66% and assembly intermediates only ü Evaluate global proteome monomeric 34% assembly characteristics ➤ The majority of the proteins appear in at least one assembled state

log(MW) multiple assembled monomeric peaks 27% 5503 elution peaks

intensity for 4065 proteins single peak 73% Protein SEC fractions ➤ Many proteins are observed in multiple distinct assembly states

Heusel & Bludau et al. (2019) Current developments: Quantitative comparison of protein complexes across conditions intensity Condition A Protein

SEC fractions intensity Condition B Protein

SEC fractions Quantitative comparison of protein complexes across conditions Current developments:

Condition B Condition A

Protein intensity Protein intensity SEC fractions SEC fractions

intensity 0e+00 1e+06 2e+06 3e+06 0e+00 1e+06 2e+06 3e+06 SMN complex 33625 1 ➤ ➤ Splicing efficient and PRPF8 depleted human cells

protein intensity

intensity SMN complex Spliceosome biogenesis is down-regulated 110 out of 553 complexes are differentially abundant (FDR < 0.05) 0e+00 1e+06 2e+06 3e+06 0e+00 1e+06 2e+06 3e+06 SMN complex 9396 33625 11 1 9396 11 2311 21 2311 21 GEMIN2 DDX20 GEMIN2 DDX20 646 31 646 31 SEC fractions SEC fractions SNRPD2 SNRPD1 SNRPD2 SNRPD1 MW (kDa) MW (kDa) fraction fraction 159 41 SNRPF SNRPD3 159 41 SNRPF SNRPD3 SNRPG 51 44 SNRPG 51 44 61 11

71 3

61 11

depleted control

71 3

depleted control ➭ ➭ ‘ Data-driven detection of assembly specific proteoforms Current developments: ü Proteoforms

specific proteoforms proteoforms based on unique peptides available in SEC-SWATH-MS Parallel detection of 1,378 assembly Distinguish assembly specific Make use of peptide-level information

Poster ’ = protein variants that originate from the but that have a ThP626 unique amino acid sequence and post-translational modifications

peptide intensity Eukaryotic initiation factor 3 subunit C (EIF3C) same gene , SEC fractions SEC fractions Current developments: Network-centric analysis

SEC analysis toolkit (SECAT) ➭ Determine perturbed nodes in the PPI interaction network

Rosenberger et al. (in preparation) Take home message: Complex-centric analysis of CoFrac-MS data ü Consistent detection and quantification of 100s protein complexes on a proteome-wide scale ü Controlled complex-level FDR Protein intensity ü Sub-complex resolution SEC fractions

Requirements: CCprofiler • Quantitative peptide or protein matrix (DIA or DDA) https://github.com/CCprofiler/ • Annotation table that matches MS runs to the sampled fractions • Prior protein connectivity information (e.g. CORUM, BioPlex, StringDB)

ü Coming next: quantitative complex comparison, proteoform detection, network-centric analysis Thank you for your attention!

Aebersold lab External collaborators • Ruedi Aebersold • Hannes Röst • Ben Collins • Yuija Cai • Moritz Heusel • Vihandha Wickramasinghe • Max Frank • Ashok Venkitaraman • George Rosenberger • Claudia Martinelli • Peng Xue • Robin Hafen • Amir Banaei- Esfahani • Audrey van Drogen • Yansheng Liu • Matthias Gstaiger Dominic Helm EMBL Heidelberg Thermal proteome profiling for interactomics ASMS 2019 – MS-Based interactomics dominic helm

15/07/2019 Acknowledgement

Martin Beck Peer Bork Nassos Typas Wolfgang Huber Eileen Furlong Christoph Mueller Natalie Romanov Malte Paulsen Marcus Bantscheff Gerard Drewes Thilo Werner Cellzome Friedrich Reinhard Holger Franken Celia Berkers (Utrecht University) Sindhuja Sridharan, Nils Kurzawa, Isabelle Becher, Frank Stein, Dominic Helm, Tomi Määttä, Andre Mateus, Per Haberkant, Mandy Rettel, Jianguo Zhu, Henrik Hammaren, Core Facility Clement Potel, Malay Sha, Vallo Varik, Kerri Malone, Flow Cytometry Core Facility 15/07/2019 Cecilia Perez-Borrajero, Matteo Perino Advanced Light Microscopy Core Facility Former members: Johannes Hevler, Maike Schramm Mechanical Workshops Prelude: Isobarquant

Franken et al., 2015, Nature Protocols Protein/drug interactions in living cells proteome-wide

Savitski MM et al Science 2014 Protein/drug interactions in living cells proteome-wide

Savitski MM et al Science 2014 Protein/drug interactions in living cells proteome-wide

Savitski MM et al Science 2014 Protein/drug interactions in living cells proteome-wide

Savitski MM et al Science 2014

Protein-Drug interaction: Staurosporine targets

denatured

- fraction non fraction

Savitski MM et al Science 2014 Melting curve fits

• Fit parametric sigmoidal curves for each condition • Estimate melting points • Compare melting points between the treatment conditions using a z-Test

Franken et al., 2015, Nature Protocols Problems with the melting point (Tm) comparison

Several reasons can lead to Tm shift being insufficient to detect ligand effect: • Small but reproducible shifts (BTK) • Shifts in non-centered curve areas (PRKCB) • Melting points outside the temperature range (NQO2)

Childs*, Bach*, Franken* et al. (2018) bioRxiv Our solution: Functional melting curve analysis

Fit two competing models per protein R R Compute F-statistic:

RSS0 − RSS1 퐹 = RSS1

RSS: residual sum of squares

Karsten Bach Dorothee Childs Holger Franken

Childs*, Bach*, Franken* et al. (2018) bioRxiv Functional melting curve analysis

New method captures old targets + cases with more subtle effects

Childs*, Bach*, Franken* et al. (2018) bioRxiv More sensitive experimental design: 2D-TPP Compound concentration-dependent profiling over a range of temperatures

Instead of statistics: More experiments – more data – more targets

1 .5 P A H p a n o b in o s ta t 1

d e

r p a n o b in o s ta t 2

u t

a v e h ic le 1 n

e 1 .0 v e h ic le 2

d

-

n

o

n

n 0 .5

o

i

t

c

a

r f

0 .0 PAH 4 0 5 0 6 0 7 0 o

te m p e ra tu re ( C ) temperature

compound concentration

Becher I, … , Bantscheff M*, Savitski MM* Nature Chemical Biology 2016 Protein-Metabolite interaction: Adenosine triphosphate (ATP)

Model system Metabolite and MS-analysis Data analysis heat treatment

FDR controlled dose-response assessment

pEC50 – measure of affinity 0 2 ATP (mM)

Sridharan S#, Kurzawa N#, Werner T, Guenthner I, Helm D, Huber W, Bantscheff M*, Savitski MM* Nature Communications 2019 Functional 2D-TPP data analysis using sliding temperature windows

• Fit models to adjacent temperatures RSS RSS • H0 Model: intercept model (blue) • H1 Model: dose-response curve (orange) • Compare goodness of fit

• Summarize per protein score Fcomb RSS RSS • Estimate the FDR for given scores by repeating the procedure with permuted data and ranking results jointly RSS: residual sum of squares Validation of the approach on drug datasets with known targets

• Ampicillin treated E. coli lysate • beta-lactam antibotic, inhibting bacterial cell- 12 YHJX DACB ● ● André Mateus wall synthesis ● NHAB ● • Known targets: MrcA, FtsI, DacB & PbpG 9 MURA

• Other binders: AmpC b

m 6 DLD o ● RPLT c ● AMPC ●

F ● RPSN ● ● ATPA MRCA ● ● ● ● ● 3 ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ●● ●●● ●●● ● ● ● ●● ● ● ● ● ● ●● ● ● ●●● ●● ●●●●● ● ● ● ● ● ●●●● ●● ● ● ● ●●●● ●●●●● ● ●● ● ●●●●●●●● ●● ●●● ● ● ● ● ● ●●●●● ●●●● ●●●●●● ● ● ●● ● ● ●●●●●● ●●●● ● ● ● ● ●●●●●●●●●●●●●●● ●●●●●● ●●●●●●● ● ● ● ●●●●●●●●●●●●●●● ●●●●●●● ●● ●●● ● ●●● ● ●●●●●●●●●●●●● ● ●●●●●● ●●●● ● ●●● ●● ● ●●● ●●●●●●●●● ●●●●●●●● ● ● ● ● ●● ● ●● ● ●●●●●●●●●●●●●● ●●●●●●●●●● ●● ● ● ● ● ●●●● ●● ●●●●●●●●●●●●●●● ●●●●●●●●●● ● ● ● ● ● ● ● ● ● ●● ● ●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●● ●●●●●● ●● ●● ● ● ● ● ● ● ● ●● ● ●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●●●●● ● ● ● ● ● ● ●●● ●●●● ●●● ● ●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●● ● ●● ● ● ● 0 ● ● ●●● ● ●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●● ●●●●●●●●●● ●● ● ●●● ● ●● ● −3 −2 −1 0 1 2 3 log2 FCmax

Mateus et al. Mol. Sys Biol. 2018 Thermal proteome profiling

• Unbiased

• Proteome wide

• Versatile Thank you for your attention!

15/07/2019