A SARS-CoV-2-Human Protein-Protein Interaction Map Reveals Drug Targets and Potential Drug-Repurposing

Extended Data

[Figure: clustered heatmap of pairwise Pearson correlations between all MS runs. Rows and columns are labelled by bait, replicate and a numeric run identifier (for example nsp14−1, 64). Baits include the SARS-CoV-2 proteins, the nsp5_C145A mutant, and the EV and EGFP controls. Color scale: −1 to 1.]

Extended Data Figure 1. Clustering analysis of AP-MS dataset reveals biological replicates of individual baits are well correlated. All MS runs were compared and clustered using artMS (David Jimenez-Morales, Alexandre Rosa Campos and John Von Dollen. (2019). artMS: Analytical R tools for Mass Spectrometry. R package version 1.3.9. https://github.com/biodavidjm/artMS). This figure depicts all Pearson’s pairwise correlations between MS runs, and is clustered according to similar correlation patterns.
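The pairwise comparison described in this caption can be illustrated with a minimal sketch. It does not use artMS; it only shows what a Pearson correlation matrix between runs looks like. The run names follow the figure labels, but the intensity values are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(runs):
    """Return {(run_a, run_b): r} for every ordered pair of MS runs."""
    names = list(runs)
    return {(a, b): pearson(runs[a], runs[b]) for a in names for b in names}

# Hypothetical log-intensities of four proteins in three runs (invented values).
runs = {
    "nsp14-1": [20.1, 18.4, 25.0, 22.3],
    "nsp14-2": [20.3, 18.1, 24.8, 22.6],
    "EV-1": [19.0, 23.5, 18.2, 20.1],
}
matrix = correlation_matrix(runs)
```

In this toy input the two nsp14 replicates correlate more strongly with each other than either does with the control run, which is the pattern the figure is meant to reveal. Clustering the rows of such a matrix would be a separate step that is not shown here.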

[Figure: heatmap of enriched GO terms (for example nuclear transport, viral life cycle, cell cycle G1/S phase transition, tRNA metabolic process, protein deglycosylation) against SARS-CoV-2 bait proteins (N, M, E, S, Orf6, Orf8, Nsp1, Nsp9, Nsp8, Nsp7, Nsp4, Orf9b, Orf3a, Nsp6, Orf10, Nsp13, Nsp10, Nsp14, Protein14, Nsp5_C145A). Color scale ticks: 0, 2, 4, 6.]

Extended Data Figure 2. Biological Process Enrichments for SARS-CoV-2 Host Factors. We performed GO biological process enrichments (Methods) for the host factors identified as binding to each SARS-CoV-2 viral protein and show here the five most significant terms for each viral protein.

[Legend: −log(P-value); ticks at 10, 20, 30.]
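Selecting the five most significant terms per viral protein can be sketched as follows. The term names are taken from the figure, but the p-values are invented, and the use of a base-10 logarithm is an assumption, since the legend only says −log(P-value).

```python
from math import log10

def top_terms(enrichments, n=5):
    """Return the n terms with the smallest p-values as (term, -log10 p) pairs."""
    ranked = sorted(enrichments.items(), key=lambda item: item[1])
    return [(term, -log10(p)) for term, p in ranked[:n]]

# Hypothetical enrichment p-values for one bait (invented values).
orf6_enrichments = {
    "nuclear transport": 1e-6,
    "protein import": 1e-4,
    "nuclear envelope": 5e-4,
    "viral life cycle": 1e-3,
    "protein targeting": 2e-3,
    "microtubule based process": 0.2,
}
orf6_top5 = top_terms(orf6_enrichments)
```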

[Figure: PFAM domains enriched in AP-MS preys, plotted against the SARS-CoV-2 protein used as AP-MS bait (S, E, M, N, Nsp1, Nsp2, Nsp4, Nsp5, Nsp5_C145A, Nsp6, Nsp7, Nsp8, Nsp9, Nsp10, Nsp11, Nsp12, Nsp13, Nsp14, Nsp15, Orf3a, Orf3b, Orf6, Orf7a, Orf8, Orf9b, Orf9c, Orf10). Domains shown: CYTH, ECSIT_C, PIG-S, BCS1_N, CIA30, ECSIT, DUF747, Gaa1, KA1, RRM_1, PRKCSH, AAA_lid_7, AAA_lid_5, Nup96, Nucleoporin2, Peptidase_M16_C, Peptidase_M16, BET, Band_7_C, ARL6IP6, VPS11_C, MCLC, Erf4, USP8_interact, Melibiase_2_C, Melibiase_2, TLE_N, DUF3697, TBCA, Alpha_adaptin_C, GrpE, cEGF, KH_6, Ras, ERG2_Sigma1R, TRM, Hist_deacetyl, zf-Tim10_DDP, WASH-7_C, WASH-7_N, WASH-7_mid, DNA_pol_alpha_N, zf-DNA_Pol, Pol_alpha_B_N, DNA_primase_lrg.]

Extended Data Figure 3. Pfam Protein Families Enrichments for SARS-CoV-2 Host Factors. The enrichment of individual PFAM domains was calculated using a hypergeometric test where success is defined as the number of domains, and the number of trials is the number of individual preys pulled down with each viral bait. The population values were the numbers of individual PFAM domains in the human proteome. To make sure that the p-values that signify enrichment were meaningful, we only considered PFAM domains that were pulled down at least three times with any SARS-CoV-2 protein and that occur in the human proteome at least five times. Here, we show the PFAM domains with the lowest p-value for each viral bait protein.
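A hypergeometric enrichment test of this kind can be sketched as below. The mapping of the arguments to domains and preys is an assumption made for illustration: `k` is the number of times a domain is seen among the preys of one bait, `n` the number of preys for that bait, `K` the number of occurrences of the domain in the proteome, and `N` the population size. The function names are hypothetical, and the filter mirrors the two thresholds stated in the caption.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n items without replacement from N items,
    K of which count as successes."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

def passes_filter(pulldown_count, proteome_count):
    """Keep domains pulled down at least three times and present
    at least five times in the proteome."""
    return pulldown_count >= 3 and proteome_count >= 5

# Toy numbers: 3 of 4 preys carry a domain that occurs 5 times among 100 proteins.
p_value = hypergeom_upper_tail(k=3, n=4, K=5, N=100)
```

The upper tail is used because enrichment asks how likely it is to see at least the observed number of domain occurrences by chance.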

Extended Data Figure 4. Lung mRNA expression of the interacting human proteins relative to other proteins. Gene expression in the lung was higher for the interacting proteins than for all other proteins (blue, interacting proteins: median = 25.52 TPM; grey, all other proteins: median = 3.198 TPM; p = 0.0007).

Extended Data Figure 5. Gene-level expression values and specificity for the lung. Scatterplot of lung mRNA expression (TPM) versus enrichment of lung mRNA expression (lung TPM/median all-tissue TPM) for human interacting proteins. Red points denote drug targets and are labelled with their gene names. Points above the horizontal blue line represent interacting proteins that are enriched in lung expression; most human interacting proteins fall above this line.
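The enrichment axis, lung TPM divided by the median TPM across all tissues, can be computed as in the following sketch. The tissue names and TPM values are invented, and including the lung itself in the median is an assumption.

```python
from statistics import median

def lung_enrichment(tpm_by_tissue, tissue="lung"):
    """Ratio of the TPM in one tissue to the median TPM across all tissues."""
    return tpm_by_tissue[tissue] / median(tpm_by_tissue.values())

# Hypothetical expression profile for one gene (invented TPM values).
example_gene = {"lung": 30.0, "liver": 10.0, "brain": 20.0}
ratio = lung_enrichment(example_gene)
```

A ratio above 1 means the gene is expressed more strongly in the lung than in the median tissue.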

Extended Data Figure 6. Evolutionary Conservation of SARS-CoV-2 Interacting Proteins. We calculated the observed/expected ratio from gnomAD¹ for loss-of-function, missense, and synonymous mutations across all RefSeq genes. Compared to other genes (RefSeq genes, grey), interacting proteins (blue) had lower observed/expected ratios (indicating stronger intolerance of mutations) for missense mutations (median RefSeq score: 0.49, median interacting score: 0.36, p = 2.46 × 10⁻¹¹; t-test) and loss-of-function mutations (median RefSeq score: 0.89, median interacting score: 0.85, p = 9.12 × 10⁻⁷; t-test), but no difference for synonymous mutations (median RefSeq score: 1.009, median interacting score: 1.004, p = 0.48; t-test). Collectively, these results indicate that the interacting proteins have reduced genetic variation in human populations.
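The comparison in this caption rests on two quantities: an observed/expected ratio per gene and a t statistic comparing two groups of genes. The sketch below shows both. The caption does not say which t-test variant was used, so Welch's unequal-variance form is an assumption, and all scores are invented.

```python
from math import sqrt
from statistics import mean, variance

def oe_ratio(observed, expected):
    """Observed/expected variant count; lower values suggest stronger constraint."""
    return observed / expected

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical missense observed/expected scores (invented values).
interacting = [0.30, 0.36, 0.41, 0.33, 0.38]
other_genes = [0.45, 0.52, 0.49, 0.55, 0.47]
t_stat = welch_t(interacting, other_genes)
```

A negative statistic here means the interacting group has the lower mean ratio. Turning it into a p-value requires the t distribution, which is left out of this sketch.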

[Figure, panels a-c. Panel a node labels: Nsp5, HDAC2, Nsp5 C145A, TRMT1, GPX1.]

Panel b annotations:

HDAC2 (total length 488 aa): HDAC domain (InterPro) 29-318; predicted nsp5 cleavage site (VQMQ|AIPE) 380-387; NLS (KKGAKK) 389-444.

TRMT1 (659 aa): MTS, SAM-MT, Zn-finger, NLS; predicted nsp5 cleavage site (PRLQ|ANFT) 527-534.

HDAC2 sequence segment shown in the figure:

IFKPIISKVMEMYQPSAVVLQCGADSLSGDRLGCFNLTVKGHAKCVEVVKTFNLPLLMLG
GGGYTIRNVARCWTYETAVALDCEIPNELPYNDYFEYFGPDFKLHISPSNMTNQNTPEYM
EKIKQRLFENLRMLPHAPGVQMQAIPEDAVHEDSGDEDGEDPDKRISIRASDKRIACDEE
FSDSEDEGEGGRRNVADHKKGAKKARIEEDKKETEDKKTDVKEEDKSKDNSGEKTDTKGT
KSEQLSNP

Extended Data Figure 7. Candidate targets for the viral Nsp5 protease. (a) Interactomes of nsp5 WT and nsp5 C145A (catalytically dead mutant). (b) Domain maps of HDAC2 and TRMT1 illustrating predicted cleavage sites (using Coronanet 1.0). HDAC: Histone Deacetylase Domain, NLS: Nuclear Localization Sequence, MTS: Mitochondrial Targeting Sequence, SAM-MT: S-adenosylmethionine-Dependent Methyltransferase Domain. (c) Peptide docking of predicted cleavage peptides into the crystal structure of SARS-CoV nsp5.
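The residue range quoted for the predicted HDAC2 cleavage site can be checked against the sequence segment printed in panel b. The sketch below does not predict cleavage sites; it only locates the octapeptide and converts its position to residue numbering. It assumes that the printed segment ends at the last residue of the 488 aa protein, so that its first residue is number 241. That offset is an inference and is not stated in the figure.

```python
# HDAC2 sequence segment as printed in panel b (248 residues).
HDAC2_SEGMENT = (
    "IFKPIISKVMEMYQPSAVVLQCGADSLSGDRLGCFNLTVKGHAKCVEVVKTFNLPLLMLG"
    "GGGYTIRNVARCWTYETAVALDCEIPNELPYNDYFEYFGPDFKLHISPSNMTNQNTPEYM"
    "EKIKQRLFENLRMLPHAPGVQMQAIPEDAVHEDSGDEDGEDPDKRISIRASDKRIACDEE"
    "FSDSEDEGEGGRRNVADHKKGAKKARIEEDKKETEDKKTDVKEEDKSKDNSGEKTDTKGT"
    "KSEQLSNP"
)
PROTEIN_LENGTH = 488
# Assumption: the segment is the C-terminal end of the protein.
SEGMENT_START = PROTEIN_LENGTH - len(HDAC2_SEGMENT) + 1

def locate(peptide, segment, start):
    """Return the 1-based (first, last) residue numbers of peptide, or None."""
    index = segment.find(peptide)
    if index < 0:
        return None
    first = start + index
    return first, first + len(peptide) - 1

cleavage_site = locate("VQMQAIPE", HDAC2_SEGMENT, SEGMENT_START)
```

Under that assumption the peptide VQMQAIPE maps to residues 380-387, which agrees with the annotation in the figure.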

Reference sequence (1): ref|YP_009724394.1
Identities normalised by aligned length.
Colored by: identity

 # ID                    cov    pid sequence (alignment columns 1-61)
 1 ref|YP_009724394.1 100.0% 100.0% MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID
 2 gb|QIG55989.1      100.0%  98.4% MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTVNKYSQLDEEQPMEID
 3 gb|AVP78035.1      100.0%  93.4% MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKPPTENNCSQLDEEQPMEID
 4 gb|AGZ48837.1      100.0%  73.8% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDMIISSIVRQLFKPLTKNKYSELDDEEPMEID
 5 gb|ALK02462.1      100.0%  72.1% MFHLVNFQVTIAEILIIIMRTFRIAIWNLDMIISSIVRQLFKPLTKNKYSELDDEEPMEID
 6 gb|ACU31045.1      100.0%  70.5% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVIISSIVRQLFKPLTKKKYSELDDEEPMELD
 7 gb|AGZ48811.1      100.0%  70.5% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDMIISSIVRQLFKPLTKKKYSELDDEEPMELD
 8 gb|ABD75318.1      100.0%  68.9% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVLISSIVRQLFKPLTKKKYPQLDDEEPMELD
 9 gb|ASO66813.1      100.0%  67.2% MFHLVDFQVTIAEMLIIIMRTFRIAIWNLDVLISSIVRQLFKPLTKKKYPQLDDEEPMELD
10 sp|Q0Q471.1        100.0%  68.9% MFHLVDFQVTIAEILIIIMKTFRVAIWNLDILISSIVRQLFKPLTKKKYSELDDEEPMELD
11 gb|AHX37562.1      100.0%  70.5% MFHLVDFQVTIAEILIIIMRTFKIAIWNLDVIISSIVRQLFKPLTKKNYSELDDEEPMEIE
12 ref|NP_828856.1    100.0%  68.9% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVIISSIVRQLFKPLTKKNYSELDDEEPMELD
13 gb|ATO98150.1      100.0%  68.9% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDMIISSIVRQLFKPLTKKNYSELDDEEPMELD
14 gb|AEA11018.1      100.0%  68.9% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVIISSIVRQLLKPLTKKNYSELDDEEPMELD
15 gb|ATO98137.1      100.0%  68.9% MFHLVDFQVTIAEILIIIMRTFRITIWNLDMIISSIVRQLFKPLTKKNYSELDDEEPMELD
16 gb|ATO98186.1      100.0%  68.9% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDIIISSIVRQLFKPLTKKNYSELDDEEPMELD
17 gb|AAS01069.1      100.0%  67.2% MFHLVDFQVTVAEILIIIMRTFRIAIWNLDVIISSIVRQLFKPLTKKNYSELDDEEPMELD
18 gb|ACZ71831.1      100.0%  67.2% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVVISSIVRQLFKPLTKKNYSELDDEEPMELD
19 gb|AAP72979.1      100.0%  67.2% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVIISSIVRQLFKPLTKKNYSELDDEEPMELB
20 gb|ARO76386.1      100.0%  67.2% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVLISSIVRQLFKPLTKKNYSELDDEEPMELD
21 gb|AKZ19091.1      100.0%  70.5% MFHLVDFQVTIAEILVIIMRTFRIAIWNLDMITSSIVTQLFKPLTKKKYSELDDEVPMEID
22 gb|ATO98162.1      100.0%  67.2% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDMIISSIVRQLFKPLTKKNYPELDDEEPMELD
23 gb|ACZ72113.1      100.0%  67.2% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVIISSIVRQLFKPLTKKNYSELDDEEPMKLD
24 sp|Q3LZX8.1        100.0%  67.2% MFHLVDFQVTIAEILIIIMKTFRVAIWNLDILISSIVRQLFKPLTKKNYSELDDEEPMELD
25 gb|AKZ19080.1      100.0%  70.5% MFHLVDFQVTIAEILVIIMRTFRIAIWNLDMITSSIVTQLFKPLTKKKYSELDDEVPMEID
26 gb|AGT21083.1      100.0%  67.2% MFHLVDFQVTIAEISIIIMRTFRIAIWNLDVIISSIVRQLLKPLTKKNYSELDDEEPMELD
27 gb|ANA96032.1      100.0%  65.6% MFHLVDFQVTIAEMLIIIMRTFRIAILNLDVLISSIVRQLFKPLTKKKYPQLDDEEPMELD
28 gb|AAP30035.1      100.0%  67.2% MFHLVDFQVTIAEILIIIMRTFRIAIWNLDVIISSIVRQLFKPLTKKNYSELDDEELMELD
29 gb|AIA62282.1      100.0%  62.3% MFHLVDFQVTIAEMLIIIMRTFRIAILDLDVLISSIVRQSFKPLTKKKYPQLDDEEPMELD
30 gb|AIA62334.1       82.0%  68.0% MFHPVDFQVTIAEILIIIMRTFRIAIWNLDVIISSIVRQLFKPLTKKNYS-----------
31 gb|APO40583.1      100.0%  50.8% MFSLVEFQVTIAELLIIIMRSLGIGLVQFQIRMIALLKIISKHLDRNQHSKLDEEVPMEID
32 dbj|BAC81367.1      49.2%  86.7% MFHLVDFQVTIAEILIIIMRTFRIAIWNLD-------------------------------
33 ref|YP_003858588.1  98.4%  47.5% MFSLVAFQVTVAELLILIMKSFGLALTHIQIGIVSLLKILTNRL-DRRYSKLDEEEPMEID
34 gb|AAS44598.1       65.6%  55.0% ---------------------FRIAIWNLDVIISSIVRQLFKPLTKKNYSELDDEEPMELD
35 gb|AAS44653.1       65.6%  55.0% ---------------------FRIAIWNLDVIISSIVRQLFKPLTKKNYSKLDDEEPMELD
consensus/100%                      .....................htlslhphp...............................
consensus/90%                       MFpLVsFQVTlAEhLlIIM+oF+luIhNLDhlhs.Il+pL.KsLTcppYspLD-E.PMEl-
consensus/80%                       MFHLVDFQVTIAEILIIIMRTFRIAIWNLDhlISSIV+QLhKPLTK+pYSpLDDEpPMElD
consensus/70%                       MFHLVDFQVTIAEILIIIMRTFRIAIWNLDhlISSIVRQLFKPLTKKpYScLDDEEPMElD

Extended Data Figure 8. Consensus analysis of SARS-CoV-2 Orf6 homologs. Multiple sequence alignment of 34 homologs of SARS-CoV-2 Orf6. Sequence coverage (cov) and percent identity (pid) are shown for each homologous sequence. Amino acids are colored according to identity with SARS-CoV-2 Orf6 (sequence 1, ref|YP_009724394.1). Colors indicate chemical properties of amino acids: small and/or hydrophobic (bright green: A, V, L, I, M, P, G), bulky (dark green: F, Y, W, H), basic (red: R, K), acidic (blue: D, E), neutral (purple: Q, N), polar (light blue: S, T), cysteine (yellow: C). The key methionine M58 and the acidic residues E55, D59 and D61 of the putative NUP98-RAE1 binding motif are highly conserved. The multiple sequence alignment was visualized using the MView web server²: https://www.ebi.ac.uk/Tools/msa/mview/.
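The cov and pid columns can be reproduced from the aligned sequences with a short sketch. The definitions used here are assumptions chosen to match the alignment header ("Identities normalised by aligned length"): coverage is the share of reference residues aligned to a residue in the homolog, and identity is the share of those aligned positions that match. Sequence 32 is padded with gap characters to the 61-column alignment width.

```python
def coverage_and_identity(reference, homolog, gap="-"):
    """Return (cov, pid) in percent, rounded to one decimal place."""
    if len(reference) != len(homolog):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for r in reference if r != gap)
    aligned = [(r, h) for r, h in zip(reference, homolog) if r != gap and h != gap]
    if not aligned:
        raise ValueError("sequences share no aligned positions")
    identical = sum(1 for r, h in aligned if r == h)
    cov = 100.0 * len(aligned) / reference_length
    pid = 100.0 * identical / len(aligned)
    return round(cov, 1), round(pid, 1)

# Sequences 1, 2 and 32 from the alignment.
REFERENCE = "MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID"
HOMOLOG_2 = "MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTVNKYSQLDEEQPMEID"
HOMOLOG_32 = "MFHLVDFQVTIAEILIIIMRTFRIAIWNLD" + "-" * 31

stats_2 = coverage_and_identity(REFERENCE, HOMOLOG_2)
stats_32 = coverage_and_identity(REFERENCE, HOMOLOG_32)
```

With these definitions the sketch returns the values listed in the alignment for both sequences, which suggests the assumed definitions match those used by the viewer for this figure.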

Experimental and predicted structural features of BRD2 and Protein E

BRD2 (residues 1-801): BD1 73-194; BD2 344-455; NET 621-750.

Protein E (residues 1-76): block spanning residues 8-65; two marked sites (*).

Extended Data Figure 9. Experimental and predicted structural features of BRD2 and Protein E. Domain names are indicated for BRD2 along with residue numbers. Areas of protein E where its SARS-CoV homolog has been described by high-resolution structural models (PDB ID: 2MM4) are shown as white blocks and thick black lines above the sequence line. The ribbons indicate alpha-helical sections. Protein E is predicted to have two N-linked glycosylation sites³ (red stars on the sequence line). Structural predictions, generated using PredictProtein⁴, are shown below the sequence line. The first row indicates secondary structure predictions for alpha helices (red) and beta sheets (blue). The second row beneath the sequence line indicates areas expected to be buried (yellow) and exposed (blue). At the bottom, green bars indicate areas predicted to be disordered.

Extended Data References

1. Karczewski, K. J. et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv 531210 (2019) doi:10.1101/531210.

2. Brown, N. P., Leroy, C. & Sander, C. MView: a web-compatible database search or multiple alignment viewer. Bioinformatics 14, 380–381 (1998).

3. Fung, T. S. & Liu, D. X. Post-translational modifications of coronavirus proteins: roles and function. Future Virol. 13, 405–430 (2018).

4. Yachdav, G. et al. PredictProtein--an open resource for online prediction of protein structural and functional features. Nucleic Acids Res. 42, W337–43 (2014).