SARS-CoV-2 Nucleocapsid attenuates stress granule formation and alters expression via direct interaction with host mRNAs

Syed Nabeel-Shah1,2,5, Hyunmin Lee1,3,5, Nujhat Ahmed1,2,5, Edyta Marcon1,5, Shaghayegh Farhangmehr1,2, Shuye Pu1, Giovanni L. Burke1,2, Kanwal Ashraf1, Hong Wei4, Guoqing Zhong1, Hua Tang1, Jianyi Yang4, Benjamin J. Blencowe1,2, Zhaolei Zhang1,2,3, Jack F. Greenblatt1,2,*

1. Donnelly Centre, University of Toronto, Toronto, M5S 3E1, Canada
2. Department of Molecular Genetics, University of Toronto, Toronto, M5S 1A8, Canada
3. Department of Computer Sciences, University of Toronto, Toronto, M5S 1A8, Canada
4. School of Mathematical Sciences, Nankai University, Tianjin 300071, China
5. These authors contributed equally to this work
* Corresponding author: Jack F. Greenblatt; Email: [email protected]

Keywords: SARS-CoV-2, COVID-19, Nucleocapsid N protein, Stress granules, G3BP1 and G3BP2, regulation, host mRNA-binding

Supplemental Figures

Extended Figure 1: Dot plot overview of the interaction partners identified with 21 SARS-CoV-2 proteins expressed in HEK293 cells. The inner circle color represents the average spectral count, the circle size maps to the relative prey abundance across all samples shown, and the circle outer edge represents the SAINT FDR. A legend is provided.

Extended Figure 2: KEGG pathway (A) and biological processes (B) enrichment analysis using cellular proteins identified as high-confidence (FDR<1%) interaction partners for the SARS-CoV-2 proteins. Only the top 10 most significant terms are shown (Q ≤ 0.05). Darker nodes represent more significantly enriched gene sets. Bigger nodes represent larger gene sets. Thicker edges represent more overlapped genes.

Extended Figure 3: (A) Pie chart representing the distribution of N iCLIP peaks across various RNA categories. The majority of the peaks were identified in the 3' untranslated region (3'UTR) of mRNAs. (B) GO enrichment analysis using N target genes as identified by iCLIP-seq. The dot plot on the left represents enrichment for biological processes, and the network on the right indicates KEGG pathways that were found to be enriched in the N targets. A legend is provided for the dot plot. (C) Left: the five most significantly enriched sequence motifs identified by the MEME software using N iCLIP peaks. E-values are provided below each motif to indicate the statistical significance over background sequences. Right: CentriMo output plot indicating the distribution of the top two motifs around N's crosslinking sites on the RNAs as identified by iCLIP. Motifs were found to be centrally enriched.

Extended Figure 4: Principal component analysis showing the correlation between RNA-seq replicates for different conditions. A figure legend is provided.

Extended Figure 5: (A) GO enrichment analysis using genes that were significantly upregulated in N cells in comparison to the GFP cells. Only the top 10 most significant terms related to biological processes are shown (Q ≤ 0.05). Darker nodes represent more significantly enriched gene sets. Bigger nodes represent larger gene sets. Thicker edges represent more overlapped genes. (B) GO enrichment analysis using genes that were significantly downregulated in N cells in comparison to the GFP cells. The figure legend is the same as described for A.

Extended Figure 6: Cumulative distribution analysis of changes in the abundance of iCLIP target mRNAs shared by G3BP1, G3BP2 and N after N overexpression in HEK293 cells. A legend is provided.

Extended Figure 1

[Dot plot of Bait versus Prey. Baits: NSP2, NSP4, NSP5, NSP6, NSP7, NSP8, NSP9, NSP10, NSP12, NSP13, NSP14, NSP15, NSP16, S, N, ORF3B, ORF6, ORF7B, ORF8, ORF9B, ORF9BWU. Prey: host proteins. Legend: AvgIntensity (0 to 50); relative abundance; BFDR (≤ 0.01, ≤ 0.05, > 0.05).]

Extended Figure 2

[Panel A: KEGG pathway enrichment. Panel B: Biological processes enrichment.]

Extended Figure 3

[Panel A: Pie chart of N iCLIP peaks by feature: antisense, lincRNA, miRNA, Mt_rRNA, Mt_tRNA, protein_coding, pseudogene.]

[Panel B: Dot plot of GO biological processes (x-axis: GeneRatio, 0.02 to 0.05; dot size: Count, 80 to 200; color: p.adjust, 5.0e−09 to 2.0e−08) and a KEGG pathway network. GO terms shown: posttranscriptional regulation of gene expression; mRNA processing; proteasomal protein catabolic process; regulation of cellular amide metabolic process; RNA catabolic process; nuclear transport; nucleocytoplasmic transport; mRNA catabolic process; regulation of mRNA metabolic process; RNA localization; nucleobase-containing compound transport; nuclear-transcribed mRNA catabolic process; nucleic acid transport; RNA transport; nuclear export; establishment of RNA localization; protein export from nucleus; regulation of mRNA catabolic process; RNA export from nucleus.]

[Panel C: Sequence logos for Motif 1 to Motif 5. E-values shown: 6.5e-1603, 2.5e-1248, 2.5e-1042, 3.2e-974, 3.0e-892. Positional distribution plot for Motif-1 and Motif-2 by feature.]

Extended Figure 4

[PCA plot. x-axis: PC1 (70% variance); y-axis: PC2 (25% variance). Legend: Type (GFP, N); Treatment (Ar, No).]

Extended Figure 5

[Panels A and B: GO enrichment networks.]

Extended Figure 6

[Cumulative distribution plot. x-axis: log2fc (−5 to 5); y-axis: Cumulative Probability (0.00 to 1.00). Status: G3BP1_bound, G3BP2_bound, N_bound, Not_bound. P value ~= 0.]
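
The cumulative distribution comparison in Extended Figure 6 can be illustrated with a minimal sketch. It groups log2 fold changes by binding status and builds one empirical cumulative distribution function (ECDF) per group. The function names, the toy values, and the use of a two-sample Kolmogorov-Smirnov statistic to compare curves are assumptions made for illustration; the statistical test behind the reported P value is not stated in this section.

```python
from bisect import bisect_right
from collections import defaultdict


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("ecdf needs at least one value")
    return lambda x: bisect_right(ordered, x) / n


def ecdf_by_status(records):
    """Build one ECDF per binding status from (status, log2fc) pairs."""
    groups = defaultdict(list)
    for status, log2fc in records:
        groups[status].append(log2fc)
    return {status: ecdf(vals) for status, vals in groups.items()}


def ks_statistic(a, b):
    """Largest vertical gap between the ECDFs of two samples.

    This is the two-sample Kolmogorov-Smirnov statistic; using it here
    is an illustrative assumption, not the method from the paper.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in sorted(set(a) | set(b)))


# Hypothetical toy data: (status, log2 fold change after N overexpression).
records = [
    ("N_bound", 0.5), ("N_bound", 1.0), ("N_bound", 1.5), ("N_bound", 2.0),
    ("Not_bound", -1.0), ("Not_bound", -0.5), ("Not_bound", 0.0), ("Not_bound", 0.5),
]
curves = ecdf_by_status(records)
```

Evaluating each curve on a shared grid of log2fc values gives the points of a plot like Extended Figure 6; a curve shifted to the right indicates a group whose transcripts are more abundant after N overexpression.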