Supplement for TWO-SIGMA-G

Eric Van Buren, Ming Hu, Liang Chen, John Wrobel, Kirk Wilhelmsen, Lishan Su, Yun Li, Di Wu S1 Additional Type-I Error and Power Results

Independent (No IGC) Genes Simulated with IGC (A) No -Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Test Size = 30, Ref. Size = 30 0.03 0.03

0.02 0.02

0.01 0.01 Type-I Error Type-I Error Observed Set-Level Observed Set-Level 0.00 0.00 0.0000 0.0025 0.0050 0.0075 0.0100 0.0000 0.0025 0.0050 0.0075 0.0100 Significance Threshold Significance Threshold

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G

Genes Simulated with IGC Genes Simulated with IGC (C) Gene-Level Random Effects Present (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Incorrectly Absent 0.03 Test Size = 30, Ref. Size = 30 0.03

0.02 0.02

0.01 0.01 Type-I Error Type-I Error Observed Set-Level 0.00 Observed Set-Level 0.00 0.0000 0.0025 0.0050 0.0075 0.0100 0.0000 0.0025 0.0050 0.0075 0.0100 Significance Threshold Significance Threshold

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S1: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G as significance threshold varies from 0 to .01. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Each plot combines six different settings, and 10 replicates per setting, which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. See the Methods section of the main text for more details regarding the simulation procedure.

1 Independent Genes (No IGC) Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Test Size = 30, Ref. Size = 30 0.100 0.100

0.075 0.075

0.050 0.050

0.025 0.025 Type-I Error Type-I Error Observed Set-Level Observed Set-Level 0.000 0.000 0.00 0.01 0.02 0.03 0.04 0.05 0.00 0.01 0.02 0.03 0.04 0.05 Significance Threshold Significance Threshold

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G

Genes Simulated with IGC Genes Simulated with IGC (C) Gene-Level Random Effects Present (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Incorrectly Absent 0.100 Test Size = 30, Ref. Size = 30 0.100

0.075 0.075

0.050 0.050

0.025

Type-I Error 0.025 Type-I Error Observed Set-Level 0.000 Observed Set-Level 0.000 0.00 0.01 0.02 0.03 0.04 0.05 0.00 0.01 0.02 0.03 0.04 0.05 Significance Threshold Significance Threshold

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S2: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G as significance threshold varies from 0 to .05. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Each plot combines six different settings, and 10 replicates per setting, which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. See the Methods section of the main text for more details regarding the simulation procedure.

2 Independent Genes (No IGC) Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Test Size = 30, Ref. Size = 100

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0.0 0.0 Set-Level Type-I Error Unadjusted Adjusted Set-Level Type-I Error Unadjusted Adjusted p-values p-values

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G

Genes Simulated with IGC Genes Simulated with IGC (C) Gene-Level Random Effects Present (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Incorrectly Absent Test Size = 30, Ref. Size = 100 0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0.0 0.0 Set-Level Type-I Error

Unadjusted Adjusted Set-Level Type-I Error Unadjusted Adjusted p-values p-values

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S3: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G using a reference set size of 100 genes. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Within each panel, both unadjusted and adjusted set-level p-values are plotted (unadjusted p-values are unavailable for MAST). Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure. 3 Genes Simulated with IGC Genes Simulated with IGC (A) No Gene-Level Random Effects (B) Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Incorrectly Absent 0.3 Test Size = 30, Ref. Size = 30 0.3

0.2 0.2

0.1 0.1

Set-Level0.0 Type-I Error 0.0 T0, T20, T50, Set-Level Type-I Error T0, T20, T50, R0 R20 R50 R0 R20 R50 Configuration (% of Genes DE in Configuration (% of Genes DE in Test/Reference Sets) Test/Reference Sets)

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G

Genes Simulated with IGC Genes Simulated with IGC (C) No Gene-Level Random Effects (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Incorrectly Absent 0.3 Test Size = 30, Ref. Size = 100 0.3

0.2 0.2

0.1 0.1

Set-Level0.0 Type-I Error 0.0 T0, T20, T50, Set-Level Type-I Error T0, T20, T50, R0 R20 R50 R0 R20 R50 Configuration (% of Genes DE in Configuration (% of Genes DE in Test/Reference Sets) Test/Reference Sets)

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S4: Type-I error performance of TWO-SIGMA-G, CAM- ERA, and MAST for various set-level null hypotheses. Each panel varies the refer- ence set size and the presence of gene-level random effect terms in gene-level model. Scenarios along the x-axis of each panel vary the percentage of genes that are differentially expressed (with the same effect size) in the test and reference sets. Each boxplot aggregates six differ- ent settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure.

4 Genes Simulated with IGC Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Test Size = 30, Ref. Size = 30 0.100 0.100

0.075 0.075

0.050 0.050

0.025 0.025

Set-Level0.000 Type-I Error Set-Level0.000 Type-I Error T0, T20, T50, T0, T20, T50, R0 R20 R50 R0 R20 R50 Configuration (% of Genes DE in Configuration (% of Genes DE in Test/Reference Sets) Test/Reference Sets)

iDEA PAGE TWO-SIGMA-G iDEA PAGE TWO-SIGMA-G Supplementary Figure S5: Type-I error performance of iDEA, PAGE, and TWO- SIGMA for various set-level null hypotheses for genes simulated with IGC. Refer- ence set sizes of 100 and 30 are shown. Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure.

5 Independent Genes (no IGC) Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Test Size = 30, Ref. Size = 100 1.00 1.00

0.75 0.75

0.50 0.50

0.25 0.25

Set-Level0.00 Power Set-Level0.00 Power T100,T100, T80, T80, T80, T50, T50, T20, T100,T100, T80, T80, T80, T50, T50, T20, R0 R50 R0 R20 R50 R0 R20 R0 R0 R50 R0 R20 R50 R0 R20 R0 Configuration (% of Genes Configuration (% of Genes DE in Test/Reference Sets) DE in Test/Reference Sets)

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G

Genes Simulated with IGC Genes Simulated with IGC (C) Gene-Level Random Effects Present (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Incorrectly Absent 1.00 Test Size = 30, Ref. Size = 100 1.00 0.75 0.75 0.50 0.50

0.25 0.25 Set-Level Power

0.00 Set-Level0.00 Power T100,T100, T80, T80, T80, T50, T50, T20, T100,T100, T80, T80, T80, T50, T50, T20, R0 R50 R0 R20 R50 R0 R20 R0 R0 R50 R0 R20 R50 R0 R20 R0 Configuration (% of Genes Configuration (% of Genes DE in Test/Reference Sets) DE in Test/Reference Sets)

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S6: Set-level power of CAMERA, MAST, and TWO-SIGMA- G using a reference set size of 100 genes (corresponds to Figure 3 in main text). Each panel varies the existence of IGC between genes in the test set and the presence of gene- level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Scenarios along the x-axis of each panel vary the percentage of genes that are differentially expressed (with the same effect size) in the test and reference sets. For example, “T80,R50” corresponds to the configuration under the alternative hypothesis in which 80% of test set genes are DE and 50% of reference set genes are DE. Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See 6 the Methods section of the main text for more details regarding the simulation procedure. Genes Simulated with IGC Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Test Size = 30, Ref. Size = 30 1.00 1.00

0.75 0.75

0.50 0.50

0.25 0.25 Set-Level Power Set-Level Power

0.00 0.00 T100M,T100, T80M, T80, T80M, T80, T50M, T50, T20M, T20, T100M,T100, T80M, T80, T80M, T80, T50M, T50, T20M, T20, R50 R50 R50 R50 R20 R20 R20 R20 R0 R0 R50 R50 R50 R50 R20 R20 R20 R20 R0 R0 Configuration Configuration

CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S7: Set-level power of CAMERA, MAST, and TWO-SIGMA- G using differing DE magnitudes at the gene-level. Genes are simulated with IGC, and reference set sizes of 100 and 30 are used for gene set testing. Scenarios along the x- axis of each panel vary the percentage of genes that are differentially expressed (with the same effect size) in the test and reference sets. For example, “T80,R50” corresponds to the configuration under the alternative hypothesis in which 80% of test set genes are DE and 50% of reference set genes are DE. Within some test sets, the amount of DE is mixed: with 50% of genes having twice as large of an effect size as the other half (see, e.g., “T100M, R50”). Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure.

7 Genes Simulated with IGC Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Test Size = 30, Ref. Size = 30 1.00 1.00

0.75 0.75

0.50 0.50

0.25 0.25

Set-Level0.00 Power Set-Level0.00 Power T100, T80, T80, T50, T100, T80, T80, T50, R50 R20 R50 R20 R50 R20 R50 R20 Configuration (% of Genes Configuration (% of Genes DE in Test/Reference Sets) DE in Test/Reference Sets)

iDEA PAGE TWO-SIGMA-G iDEA PAGE TWO-SIGMA-G Supplementary Figure S8: Set-level power of iDEA, PAGE, and TWO-SIGMA-G using differing DE magnitudes at the gene-level. Genes are simulated with IGC using reference set sizes of 100 and 30. Scenarios along the x-axis of each panel vary the percentage of genes that are differentially expressed (with the same effect size) in the test and reference sets. For example, “T80,R50” corresponds to the configuration under the alternative hypothesis in which 80% of test set genes are DE and 50% of reference set genes are DE. Because iDEA performed poorly in scenarios involving “R0”, they were excluded. Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure.

8 S2 Additional HIV Data Results

BROWNE INTERFERON RESPONSIVE GENES, BROWNE INTERFERON RESPONSIVE GENES, Cell Type: NK, Cell Type: Erythroid cells, HIV only vs Mock HIV only vs Mock 6.6 2.4 Enrichment Enrichment 0 0 Up Up Down Down 0.4 1.0 1.7 3.1 -3.0 -2.0 -1.2 -0.7 -0.1 49.8 0.09 0.68 1.50 2.81 -27.4 -2.89 -2.06 -1.46 -0.95 -0.46 16.95 -19.88 Statistic Statistic

BROWNE INTERFERON RESPONSIVE GENES, BROWNE INTERFERON RESPONSIVE GENES, Cell Type: ILC, Cell Type: B cells, HIV only vs Mock HIV only vs Mock 6.8 6.5 Enrichment Enrichment 0 0 Up Up Down Down 0.05 0.41 0.79 1.32 2.14 0.08 0.51 1.01 1.80 -1.91 -1.15 -0.68 -0.30 -2.41 -1.57 -1.06 -0.64 -0.29 31.93 24.09 -25.26 -21.95 Statistic Statistic

BROWNE INTERFERON RESPONSIVE GENES, BROWNE INTERFERON RESPONSIVE GENES, Cell Type: mDC, Cell Type: pDC, HIV only vs Mock HIV only vs Mock 6.8 6.6 Enrichment Enrichment 0 0 Up Up Down Down 0.39 0.86 1.44 2.55 -2.55 -1.50 -0.90 -0.46 -0.03 29.57 4e-01 8e-01 -22.61 -8e-01 -4e-01 -3e-03 1e+00 2e+00 2e+01 -1e+01 -2e+00 -1e+00 Statistic Statistic

Supplementary Figure S9: Barcode plots showing gene-level statistics for the six most common cell types in the HIV dataset for the gene set “BROWNE INTERFERON RESPONSIVE GENES.” The x-axis shows the concentration of gene-level statistics (each bar represents one gene-level statistic), and the enrichment shown in the y-axis demonstrates that gene-level statistics tend to be overwhelmingly positive and large in magnitude for all cell types except Erythroid cells.

9 S3 Comparing Early Stage AD Patients to Control

Gene-Level Log-FC for Set: KEGG OXIDATIVE PHOSPHORYLATION Early Stage AD Patients vs. Control

Mic (N = 1491, set rank = 2271) 0.4

Ex (N = 29018, set rank = 3) 0.2

In (N = 7621, set rank = 1) 0

Opc (N = 2207, set rank = 1299) -0.2

Oli (N = 14806, set rank = 486) -0.4

Ast (N = 2840, set rank = 1797) NDUFA5 ATP6V0D1 ATP6V1C1 SDHA COX7A2L NDUFB5 CYC1 NDUFS8 COX7A2 NDUFAB1 UQCRQ NDUFA8 NDUFV2 UQCRFS1 LHPP SDHC NDUFV3 ATP6V0B ATP6V1D NDUFB7 ATP6V1E1 NDUFS1 NDUFB1 ATP6V0A1 COX11 ATP6V1B2 COX6B1 PPA1 NDUFA2 NDUFA1 COX5A NDUFB2 COX6C COX7C UQCRB UQCRHL NDUFB4 ATP6AP1 UQCRH UQCRC1 ATP6V1A SDHB NDUFC1 NDUFA10 PPA2 NDUFS4 NDUFS2 ATP6V0E1 NDUFB6 ATP6V1F COX4I1 NDUFA4 UQCR10 NDUFB9 NDUFS7 COX5B NDUFB10 COX7B ATP6V0E2 NDUFV1 NDUFS5 NDUFS3 NDUFB3 ATP6V1H UQCRC2 ATP6V1G1

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ex -4

In -6

Opc -8

Oli -10

Ast

Supplementary Figure S10: Cell-type specific variation in gene-level significance for genes in the KEGG OXIDATIVE PHOSPHORYLATION pathway comparing early stage AD patients to controls. Gene names that are bolded are signifcant over all cell types after FDR-adjustment of the Fisher’s method p-value.

10 Gene-Level Log-FC for Set: MOOTHA VOXPHOS Early Stage AD Patients vs. Control

Mic (N = 1491, set rank = 3595) 0.4

Ex (N = 29018, set rank = 8) 0.2

In (N = 7621, set rank = 5) 0

Ast (N = 2840, set rank = 2123) -0.2

Oli (N = 14806, set rank = 779) -0.4

Opc (N = 2207, set rank = 1581) NDUFA2 NDUFA1 CYCS NDUFB7 NDUFS8 COX7A2 NDUFAB1 SDHA NDUFB5 CYC1 COX7A2L COX11 NDUFB1 NDUFS1 GATB SDHC BCL2L1 UQCRQ NDUFA8 UQCRC1 SDHB NDUFC1 NDUFA10 CNOT7 NDUFS2 NDUFS4 NDUFB6 UQCRC2 COX5B COX7B NDUFS7 NDUFS5 COX5A NDUFS3 NDUFB2 COX6C NDUFB4 COX7C UQCRB NDUFB3 UQCRH NDUFA5 COX4I1 NDUFA4 UQCR10 UQCRFS1 COX6B1 TBCB

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ex -4

In -6

Ast -8

Oli -10

Opc Supplementary Figure S11: Cell-type specific variation in gene-level significance for genes in the MOOTHA VOXPHOS pathway comparing early stage AD patients to controls. Gene names that are bolded are signifcant over all cell types after FDR- adjustment of the Fisher’s method p-value.

11 Gene-Level Log-FC for Set: KEGG OXIDATIVE PHOSPHORYLATION Late vs. Early Stage AD Patients

Mic (N = 955, set rank = 632) 0.4

Ast (N = 1830, set rank = 17) 0.2

Opc (N = 1290, set rank = 26) 0

Oli (N = 9035, set rank = 14) -0.2

Ex (N = 17878, set rank = 11) -0.4

In (N = 4371, set rank = 1) NDUFS8 NDUFB7 COX5B NDUFA4 COX6B1 COX7A2 NDUFB2 COX4I1 NDUFB9 UQCRH UQCRFS1 UQCRHL COX5A NDUFS7 ATP6V0B UQCRB UQCRC1 NDUFB6 NDUFC1 NDUFA8 NDUFA2 ATP6V1G1 NDUFA1 NDUFS5 NDUFV2 NDUFB3 NDUFS3 COX7C COX6C NDUFB4 COX7B NDUFB10 UQCR10 UQCRQ ATP6V1F NDUFB5 NDUFAB1 LHPP UQCRC2 SDHA NDUFS1 ATP6V1A PPA2 ATP6V1H NDUFS4 SDHB COX11 ATP6V0E1 ATP6AP1 SDHC NDUFS2 NDUFV1 NDUFA5 NDUFA10 ATP6V0A1 CYC1 ATP6V0D1 ATP6V1C1 ATP6V1E1 NDUFV3 ATP6V1B2 PPA1 ATP6V0E2 NDUFB1 COX7A2L ATP6V1D

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ast -4

Opc -6

Oli -8

Ex -10

In

Supplementary Figure S12: Cell-type specific variation in gene-level significance for genes in the KEGG OXIDATIVE PHOSPHORYLATION pathway comparing late stage AD patients to early stage AD patients. Gene names that are bolded are signifcant over all cell types after FDR-adjustment of the Fisher’s method p-value.

12 Gene-Level Log-FC for Set: MOOTHA VOXPHOS Late vs. Early Stage AD Patients

Mic (N = 955, set rank = 1057) 0.4

Ast (N = 1830, set rank = 5) 0.2

Opc (N = 1290, set rank = 38) 0

Oli (N = 9035, set rank = 7) -0.2

Ex (N = 17878, set rank = 14) -0.4

In (N = 4371, set rank = 2) NDUFB4 COX7B NDUFB6 NDUFC1 NDUFA8 NDUFA2 NDUFS5 NDUFA1 NDUFB3 NDUFS3 COX7C COX6C COX5A UQCRB NDUFS7 CYC1 UQCRFS1 COX7A2L TBCB UQCRH CYCS UQCRQ UQCR10 NDUFS8 NDUFB7 COX5B NDUFA4 COX6B1 COX7A2 NDUFB2 COX4I1 NDUFB5 NDUFAB1 SDHB COX11 UQCRC1 GATB NDUFS4 BCL2L1 NDUFS2 SDHC NDUFA10 NDUFA5 CNOT7 NDUFB1 UQCRC2 NDUFS1 SDHA

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ast -4

Opc -6

Oli -8

Ex -10

In

Supplementary Figure S13: Cell-type specific variation in gene-level significance for genes in the MOOTHA VOXPHOS pathway comparing late stage AD patients to early stage AD patients. Gene names that are bolded are signifcant over all cell types after FDR-adjustment of the Fisher’s method p-value.

13 S4 Comparing Late Stage AD Patients to Control Heatmap of Set-Level Average Log FC by Cell Type (Late Stage AD vs. Control Patients)

Mic (N = 1394) 0.4

Ex (N = 23056) 0.2

In (N = 6400) 0

Oli (N = 12629) -0.2

Ast (N = 2114) -0.4

Opc (N = 1757) FARMER STANHILL REACTOME REACTOME MOREIRA REACTOME REACTOME REACTOME KEGG REACTOME REACTOME REACTOME PID YAO ZHAN BEGUM ROZANOV DAZARD ABE REACTOME REACTOME REACTOME CHICAS LI FLOTHO HSIAO REACTOME HILLION POMEROY SESTO ZHANG DANG REACTOME REACTOME REACTOME DIRMEIER LINDSTEDT CREIGHTON REACTOME GUTIERREZ GARGALOVIC WANG SASSON BORCZUK SCHLOSSER SUH MIKKELSEN REACTOME HALMOS KIM BROWNE GAVIN AMPLIFIED FOXM1 COEXPRESSED GERMINAL INNER TEMPORAL IL2 V1 IOOE(ak=10.5) = (rank RIBOSOME MYC HOUSEKEEPING RESPONSE TUMOR TARGETS PROLIFERATING CEBPA RB1 RESPONSE INTERFERON PEDIATRIC HMGA1B UV BREAST RESPONSIVE LATE AHA rn 2518) = PATHWAY (rank RESPONSE HRAS MALIGNANT MMP14 MEDULLOBLASTOMA MEF LMP1 A rn 51.5) = (rank EAR TARGETS DENDRITIC RESPONSE NONSENSE INFLUENZA TAK1 ATTENUATION CELLULAR RHO IRE1ALPHA EUKARYOTIC ACTIVATION REGULATION EUKARYOTIC SRP HSF1 SELENOAMINO RRNA DEFECTS MYC MULTIPLE TARGETS RESPONSE AKT1 CENTER RESPONSE TARGETS IN INVASIVENESS DIFFERENTIATION HCP OF DEPENDENT TRANSFROMATION RESPONSE RESPONSE GTPASES WITH LUNG TARGETS CIAIN(ak=16) = (rank ACTIVATION CANCER ACTIVATES TO TARGETS AGT rn 136) = (rank TARGETS RCSIG(ak=20) = (rank PROCESSING PAX3 SIGNALING TO WITH RESPONSIVE ALL FOXP3 TO UV IN T P(ak=47) = (rank UP LOW ID1 GONADOTROPHINS N(ak=831.5) = (rank DN OF MYELOMA RESPONSE MEDIATED EE rn 23) = (rank GENES HELPER MESOTHELIOMA FOXO1 ACR(ak=27) = (rank CANCER CLUSTER ACTIVATES CELL NETO rn 7) = (rank INFECTION VITAMIN 32M3(ak=854) = (rank H3K27ME3 TSA VS THERAPY 0(ak=33) = (rank C0 OF TO OF AND TRANSLATION TRANSLATION EIF2AK4 ACTIVATE TARGETS CLUSTER HS rn 41) = (rank PHASE REPRESSED ACID EU rn 48.5) = (rank SERUM TO LATE N(ak=511) = (rank DN UECN rn 113) = (rank QUIESCENT THE OXIDIZED HSF1 P(ak=121) = (rank UP NFKB COTRANSLATIONAL MATURATION FUSION P(ak=56) = (rank UP ID2 VIA N(ak=1447) = (rank DN PROGESTERONE EE rn 661) = (rank GENES DESMOPLASIC EAOIM(ak=15) = (rank METABOLISM MRNA P(ak=65.5) = (rank UP 1(ak=32) = (rank G1 P(ak=589.5) = (rank UP P(ak=219.5) = (rank UP AND TO GENES MEDIATED DECAY MTOR GCN2 RESPONSE P(ak=1129) = (rank UP HPRNS(ak=284) = (rank CHAPERONES BY P(ak=90) = (rank UP rn 539) = (rank 1 N(ak=2438.5) = (rank DN T1(ak=185) = (rank KTN1 HEAT PHOSPHOLIPIDS COFACTOR PHOSPHORYLATION BY UPON P(ak=108.5) = (rank UP LNAIN(ak=4.5) = (rank ELONGATION NTAIN(ak=4.5) = (rank INITIATION TO P(ak=334.5) = (rank UP P(ak=174) = (rank UP M rn 14) = (rank NMD P(ak=224) = (rank UP EU rn 117) = (rank SERUM TES(ak=63) = (rank STRESS rn 397) = (rank B AMINO HEAT P(ak=54) = (rank UP BINDING VS CLUSTER CLASSIC EAOIM(ak=209) = (rank METABOLISM SHOCK ACID BLUE OF 0(ak=85) = (rank 10 EIINY(ak=10.5) = (rank DEFICIENCY N(ak=169) = (rank DN H. rn 10.5) = (rank TH... EPNE(ak=69) = (rank RESPONSE A..(ak=4.5) = (rank TAR... N. rn 102) = (rank AN... P(ak=99) = (rank UP

Heatmap of Set-Level log10 p-values (Unadjusted) by Cell Type

Mic (N = 1920)

-1 Ex (N = 34976)

In (N = 9196) -2

Oli (N = 18235) -3

Ast (N = 3392) -4

Opc (N = 2627)

Supplementary Figure S14: Heatmap of the most significant gene sets (and their corresponding p-values) comparing late state AD patients to controls by cell type. Sets plotted are among the top 10 in significance for at least once cell type. Sets in bold are significant over all cell types after FDR-adjustment of the Fisher’s method p-value.

14 Gene-Level Log-FC for Set: KEGG OXIDATIVE PHOSPHORYLATION Late Stage AD Patients vs. Control

Mic (N = 1394, set rank = 3662) 0.4

Ast (N = 2114, set rank = 251) 0.2

Ex (N = 23056, set rank = 164) 0

In (N = 6400, set rank = 84) -0.2

Oli (N = 12629, set rank = 843) -0.4

Opc (N = 1757, set rank = 273) NDUFA8 ATP6V1C1 ATP6V0D1 NDUFA10 ATP6V1G1 LHPP UQCRC2 NDUFS1 ATP6V1E1 SDHB NDUFS2 UQCRFS1 ATP6V1A ATP6V0E2 NDUFAB1 NDUFS7 NDUFB5 NDUFB9 NDUFB1 UQCRQ COX5A COX5B NDUFB2 UQCRHL UQCRB COX6B1 NDUFA2 NDUFB4 NDUFA1 COX7C COX7A2 ATP6V1F NDUFS8 COX7B NDUFB7 UQCRH NDUFA4 COX7A2L ATP6V1D NDUFS3 PPA1 ATP6AP1 COX4I1 NDUFS5 ATP6V1H NDUFS4 ATP6V0E1 PPA2 SDHC NDUFV1 ATP6V0A1 SDHA NDUFA5 COX11 NDUFV3 ATP6V1B2 CYC1 NDUFV2 UQCRC1 ATP6V0B COX6C UQCR10 NDUFC1 NDUFB3 NDUFB6 NDUFB10

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ast -4

Ex -6

In -8

Oli -10

Opc

Supplementary Figure S15: Cell-type specific variation in gene-level significance for genes in the KEGG OXIDATIVE PHOSPHORYLATION pathway comparing late stage AD patients to controls. Gene names that are bolded are signifcant over all cell types after FDR-adjustment of the Fisher’s method p-value.

15 Gene-Level Log-FC for Set: MOOTHA VOXPHOS Late Stage AD Patients vs. Control

Mic (N = 1394, set rank = 3912) 0.4

Ast (N = 2114, set rank = 84) 0.2

Opc (N = 1757, set rank = 324) 0

Oli (N = 12629, set rank = 362) -0.2

Ex (N = 23056, set rank = 136) -0.4

In (N = 6400, set rank = 56) NDUFA2 NDUFB4 NDUFA1 COX7B NDUFB7 COX7C COX7A2 UQCRC1 COX6C UQCR10 SDHC COX7A2L NDUFS5 NDUFS4 COX4I1 BCL2L1 SDHA NDUFA5 GATB COX11 NDUFC1 CYC1 NDUFB6 NDUFB3 NDUFS8 CNOT7 NDUFB1 UQCRC2 SDHB NDUFS2 UQCRFS1 NDUFS1 NDUFA10 NDUFS3 UQCRB COX6B1 NDUFB2 CYCS TBCB UQCRH NDUFA4 NDUFA8 COX5B COX5A UQCRQ NDUFB5 NDUFAB1 NDUFS7

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ast -4

Opc -6

Oli -8

Ex -10

In

Supplementary Figure S16: Cell-type specific variation in gene-level significance for genes in the MOOTHA VOXPHOS pathway comparing late stage AD patients to controls. Gene names that are bolded are signifcant over all cell types after FDR- adjustment of the Fisher’s method p-value.

16 S5 Comparing AD Patients (Early and Late Stage) to Control

Heatmap of Set-Level Average Log FC by Cell Type (AD vs. Control)

Mic (N = 1920) 0.4

Oli (N = 18235) 0.2

Ex (N = 34976) 0

In (N = 9196) -0.2

Ast (N = 3392) -0.4

Opc (N = 2627) CERIBELLI MATZUK REACTOME CAFFAREL VALK BIOCARTA MEDINA REACTOME BURTON VERRECCHIA REACTOME FARMER GRANDVAUX ONO CREIGHTON REACTOME QI WEINMANN ZHAN CHIBA DELASERNA MARTENS FLECHNER BIOCARTA KEGG FLOTHO GERHOLD ABE BILANGES DIRMEIER PID BIOCARTA BIOCARTA PID RODWELL ZHAN KYNG REACTOME BORCZUK FAELT LAIHO LIN RICKMAN BROWNE MIKKELSEN WALLACE HOLLEMAN YAMASHITA ROZANOV URS OSMAN STAMBOLSKY KIM OSADA HALMOS HYPOXIA NPAS4 S1P AP1 ADIPOCYTE GERMINAL INNER FOXP3 AML V1 MULTIPLE PRION ENVIRONMENTAL B COLORECTAL RESPONSE ASCL1 BLADDER S1P4 AHA rn 14.5) = PATHWAY (rank SMARCA4 CLL PEDIATRIC ADIPOGENESIS CEBPA ETLZTO rn 1808.5) = (rank FERTILIZATION LATE INTERFERON HEAD TRETINOIN PROSTATE MALIGNANT BREAST MMP14 ADIPOGENESIS AGING TARGETS TFF ERK5 PTEN BCELLSURVIVAL CLUSTER RESPONSE PROMOTERS BIOPSY A rn 72) = (rank EAR PREDNISOLONE INTERFERON COLLAGEN COLLAGEN AFLATOXIN ESTROGEN TARGETS ADAPTATION RAPAMYCIN MEF LMP1 LIVER AKT1 TARGETS MYOD AHA rn 82) = PATHWAY (rank IRF3 WITH IESS(ak=23) = (rank DISEASES RESPONSE TARGETS TARGETS DIFFERENTIATION AHA rn 23) = PATHWAY (rank CENTER TARGETS AND AHA rn 254) = PATHWAY (rank MYELOMA AHA rn 47.5) = PATHWAY (rank DIFFERENTIATION HCP KIDNEY TARGETS TO CANCER TARGETS SIGNALING AGT rn 918) = (rank TARGETS CANCER RESPONSE KIDNEY VH3 TARGETS CANCER ALL N(ak=1124.5) = (rank DN 0(ak=1316) = (rank 10 CANCER CANCER TSA NECK N(ak=1258) = (rank DN TO WITH RESPONSE OF RESPONSIVE MESOTHELIOMA DEPENDENT OMTO rn 8.5) = (rank FORMATION BIOSYNTHESIS 21 OF P(ak=142) = (rank UP ACTIVATION INACTIVE PEAK STRESS TO T THERAPY THC ALPHA TO N(ak=272.5) = (rank DN HIF1A P(ak=1316) = (rank UP NO N(ak=584.5) = (rank DN P(ak=98) = (rank UP HELPER P(ak=476) = (rank UP SENSITIVE MUTATED AHA rn 221) = PATHWAY (rank N(ak=121.5) = (rank DN CANCER TRANSPLANT MF WITH RESISTANCE P(ak=619) = (rank UP 32M3(ak=152) = (rank H3K27ME3 TGFB1 HYPOXIA N(ak=1570.5) = (rank DN BLOOD SERRATED VIA RACE 8HR AT CLUSTER P(ak=1525.5) = (rank UP LATE AND RESPONSE P(ak=71) = (rank UP BETA H rn 23) = (rank 2HR GENES EPCAM MTOR AND 5 RESPONSE 5(ak=325) = (rank C5 N(ak=404) = (rank DN N(ak=46) = (rank DN EE rn 13) = (rank GENES P(ak=1808.5) = (rank UP GENE N(ak=188) = (rank DN rn 319) = (rank E AND OA rn 43) = (rank FOXA2 P(ak=476) = (rank UP TP53 P(ak=3.5) = (rank UP AND INLN rn 388) = (rank SIGNALING N(ak=692) = (rank DN BOUND EE rn 3.5) = (rank GENES P(ak=1405.5) = (rank UP B P(ak=8.5) = (rank UP N(ak=92) = (rank DN P(ak=1275) = (rank UP rn 1) = (rank 1 OK P(ak=87) = (rank UP EOIIAIN(ak=274.5) = (rank DETOXIFICATION XRSIN(ak=233) = (rank EXPRESSION ALL N(ak=251.5) = (rank DN P(ak=1525.5) = (rank UP MODIFYING VS P(ak=92) = (rank UP BY P(ak=39) = (rank UP DONOR F rn 251.5) = (rank NFY NYE rn 200.5) = (rank ENZYMES N(ak=448.5) = (rank DN

Heatmap of Set-Level log10 p-values (Unadjusted) by Cell Type

Mic (N = 1920) -0.5

Oli (N = 18235) -1 -1.5 Ex (N = 34976) -2

In (N = 9196) -2.5 -3 Ast (N = 3392) -3.5

Opc (N = 2627)

Supplementary Figure S17: Heatmap of the most significant gene sets (and their corresponding p-values) comparing AD patients (early and late stage) to controls by cell type. Sets plotted are among the top 10 in significance for at least once cell type. Sets in bold are significant over all cell types after FDR-adjustment of the Fisher’s method p-value.

17 Gene-Level Log-FC for Set: KEGG OXIDATIVE PHOSPHORYLATION AD (Early or Late Stage) vs. Control

Mic (N = 1920, set rank = 4827) 0.4

Ast (N = 3392, set rank = 3558) 0.2

Ex (N = 34976, set rank = 452) 0

In (N = 9196, set rank = 291) -0.2

Oli (N = 18235, set rank = 4792) -0.4

Opc (N = 2627, set rank = 4216) UQCRQ NDUFA8 ATP6V1A ATP6V0E2 NDUFB9 NDUFS7 SDHB COX5B NDUFB2 COX5A ATP6V1G1 UQCRC2 ATP6V0D1 NDUFS2 ATP6V1C1 NDUFA2 NDUFA1 COX7A2L COX11 NDUFB5 NDUFA5 NDUFV3 NDUFB3 NDUFV2 ATP6V0B NDUFB6 SDHA CYC1 NDUFS8 COX7A2 NDUFB7 SDHC ATP6V1B2 NDUFS4 ATP6V1H COX6C NDUFAB1 ATP6V1D PPA2 ATP6V0A1 UQCRC1 COX6B1 NDUFC1 ATP6V0E1 LHPP UQCRFS1 NDUFB10 NDUFV1 UQCR10 COX4I1 NDUFS5 ATP6V1F NDUFA4 NDUFB1 NDUFS1 ATP6V1E1 PPA1 NDUFB4 COX7C UQCRHL UQCRB NDUFS3 ATP6AP1 UQCRH COX7B

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ast -4

Ex -6

In -8

Oli -10

Opc

Supplementary Figure S18: Cell-type specific variation in gene-level significance for genes in the KEGG OXIDATIVE PHOSPHORYLATION pathway comparing AD patients (early and late stage) to controls. Gene names that are bolded are signifcant over all cell types after FDR-adjustment of the Fisher’s method p-value.

18 Gene-Level Log-FC for Set: MOOTHA VOXPHOS AD (Early or Late Stage) vs. Control

Mic (N = 1920, set rank = 4565) 0.4

Ex (N = 34976, set rank = 665) 0.2

In (N = 9196, set rank = 502) 0

Ast (N = 3392, set rank = 2068) -0.2

Oli (N = 18235, set rank = 3166) -0.4

Opc (N = 2627, set rank = 3963) UQCRQ NDUFA8 NDUFS7 SDHB COX5B NDUFA2 NDUFA1 COX7A2L COX11 GATB NDUFA5 NDUFB5 CYCS NDUFB3 BCL2L1 SDHA CYC1 NDUFS8 NDUFS2 UQCRC2 UQCRFS1 CNOT7 NDUFB1 COX5A NDUFB2 TBCB NDUFB4 COX7C UQCRB NDUFS3 NDUFA4 UQCRH COX7B NDUFS1 UQCR10 NDUFS5 COX4I1 NDUFS4 COX6C NDUFAB1 NDUFC1 UQCRC1 COX6B1 SDHC NDUFB6 COX7A2 NDUFB7

Heatmap of Gene-Level log10 p-values (Unadjusted) by Cell Type 0 Mic -2 Ex -4

In -6

Ast -8

Oli -10

Opc

Supplementary Figure S19: Cell-type specific variation in gene-level significance for genes in the MOOTHA VOXPHOS pathway comparing AD patients (early and late stage) to controls. Gene names that are bolded are signifcant over all cell types after FDR-adjustment of the Fisher’s method p-value.

19 S6 Additional Real Data Figures

HIV Dataset Ref Set 0.25 All Other Same Size

0.20 Gene-Level RE FALSE TRUE

0.15

0.10

0.05

0.00 % of Sets Rejected (FDR-adjusted p-values) 2-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-99 >100 (N=909) (N=1065) (N=708) (N=430) (N=330) (N=229) (N=175) (N=119) (N=124) (N=97) (N=829) Number of Genes Present Per Gene Set (mSigDB c2 Collection) Alz Dataset Ref Set 0.15 All Other Same Size Gene-Level RE FALSE TRUE 0.10

0.05

0.00 % of Sets Rejected (FDR-adjusted p-values) 2-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-99 >100 (N=841) (N=1087) (N=671) (N=461) (N=327) (N=262) (N=181) (N=159) (N=105) (N=109) (N=871) Number of Genes Present Per Gene Set (mSigDB c2 Collection)

Supplementary Figure S20: Percentage of sets rejected in the HIV and Alzheimer’s datasets. Fisher’s-method p-values adjusted for FDR were used for testing in four settings varying the choice of reference set between the complement set of genes (“All Other”) or a random reference of the same size as the test set (“Same Size”), and with and without random effects present at the gene-level. The presence of gene-level random effects in the model does not greatly affect the percentage of sets rejected in either the HIV dataset (top) or the Alzheimer’s dataset (bottom).

20 HIV Dataset

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

Percent of0.2 Genes Present

0.1

0.0 2-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-99 >100 (N=390) (N=929) (N=669) (N=496) (N=386) (N=284) (N=219) (N=201) (N=172) (N=103)(N=1166) Set Size (mSigDB c2 Collection) Alz Dataset

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

Percent of0.2 Genes Present

0.1

0.0 2-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90-99 >100 (N=422) (N=939) (N=682) (N=500) (N=391) (N=283) (N=217) (N=200) (N=173) (N=102)(N=1165) Set Size (mSigDB c2 Collection)

Supplementary Figure S21: Percentage of genes present by set size in the HIV dataset (top) and the Alzheimer’s dataset (bottom).

21