Supplement for TWO-SIGMA-G S1 Additional Type-I Error and Power Results
Total Page:16
File Type:pdf, Size:1020Kb
Supplement for TWO-SIGMA-G Eric Van Buren, Ming Hu, Liang Chen, John Wrobel, Kirk Wilhelmsen, Lishan Su, Yun Li, Di Wu S1 Additional Type-I Error and Power Results Independent Genes (No IGC) Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Test Size = 30, Ref. Size = 30 0.03 0.03 0.02 0.02 0.01 0.01 Type-I Error Type-I Error Observed Set-Level Observed Set-Level 0.00 0.00 0.0000 0.0025 0.0050 0.0075 0.0100 0.0000 0.0025 0.0050 0.0075 0.0100 Significance Threshold Significance Threshold CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Genes Simulated with IGC Genes Simulated with IGC (C) Gene-Level Random Effects Present (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Incorrectly Absent 0.03 Test Size = 30, Ref. Size = 30 0.03 0.02 0.02 0.01 0.01 Type-I Error Type-I Error Observed Set-Level 0.00 Observed Set-Level 0.00 0.0000 0.0025 0.0050 0.0075 0.0100 0.0000 0.0025 0.0050 0.0075 0.0100 Significance Threshold Significance Threshold CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S1: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G as significance threshold varies from 0 to .01. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Each plot combines six different settings, and 10 replicates per setting, which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. See the Methods section of the main text for more details regarding the simulation procedure. 1 Independent Genes (No IGC) Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Test Size = 30, Ref. Size = 30 0.100 0.100 0.075 0.075 0.050 0.050 0.025 0.025 Type-I Error Type-I Error Observed Set-Level Observed Set-Level 0.000 0.000 0.00 0.01 0.02 0.03 0.04 0.05 0.00 0.01 0.02 0.03 0.04 0.05 Significance Threshold Significance Threshold CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Genes Simulated with IGC Genes Simulated with IGC (C) Gene-Level Random Effects Present (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Incorrectly Absent 0.100 Test Size = 30, Ref. Size = 30 0.100 0.075 0.075 0.050 0.050 0.025 Type-I Error 0.025 Type-I Error Observed Set-Level 0.000 Observed Set-Level 0.000 0.00 0.01 0.02 0.03 0.04 0.05 0.00 0.01 0.02 0.03 0.04 0.05 Significance Threshold Significance Threshold CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S2: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G as significance threshold varies from 0 to .05. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Each plot combines six different settings, and 10 replicates per setting, which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. See the Methods section of the main text for more details regarding the simulation procedure. 2 Independent Genes (No IGC) Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Test Size = 30, Ref. Size = 100 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0.0 0.0 Set-Level Type-I Error Unadjusted Adjusted Set-Level Type-I Error Unadjusted Adjusted p-values p-values CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Genes Simulated with IGC Genes Simulated with IGC (C) Gene-Level Random Effects Present (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Incorrectly Absent Test Size = 30, Ref. Size = 100 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0.0 0.0 Set-Level Type-I Error Unadjusted Adjusted Set-Level Type-I Error Unadjusted Adjusted p-values p-values CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S3: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G using a reference set size of 100 genes. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Within each panel, both unadjusted and adjusted set-level p-values are plotted (unadjusted p-values are unavailable for MAST). Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure. 3 Genes Simulated with IGC Genes Simulated with IGC (A) No Gene-Level Random Effects (B) Gene-Level Random Effects Test Size = 30, Ref. Size = 30 Incorrectly Absent 0.3 Test Size = 30, Ref. Size = 30 0.3 0.2 0.2 0.1 0.1 Set-Level0.0 Type-I Error 0.0 T0, T20, T50, Set-Level Type-I Error T0, T20, T50, R0 R20 R50 R0 R20 R50 Configuration (% of Genes DE in Configuration (% of Genes DE in Test/Reference Sets) Test/Reference Sets) CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Genes Simulated with IGC Genes Simulated with IGC (C) No Gene-Level Random Effects (D) Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Incorrectly Absent 0.3 Test Size = 30, Ref. Size = 100 0.3 0.2 0.2 0.1 0.1 Set-Level0.0 Type-I Error 0.0 T0, T20, T50, Set-Level Type-I Error T0, T20, T50, R0 R20 R50 R0 R20 R50 Configuration (% of Genes DE in Configuration (% of Genes DE in Test/Reference Sets) Test/Reference Sets) CAMERA MAST TWO-SIGMA-G CAMERA MAST TWO-SIGMA-G Supplementary Figure S4: Type-I error performance of TWO-SIGMA-G, CAM- ERA, and MAST for various set-level null hypotheses. Each panel varies the refer- ence set size and the presence of gene-level random effect terms in gene-level model. Scenarios along the x-axis of each panel vary the percentage of genes that are differentially expressed (with the same effect size) in the test and reference sets. Each boxplot aggregates six differ- ent settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure. 4 Genes Simulated with IGC Genes Simulated with IGC (A) No Gene-Level Random Effects (B) No Gene-Level Random Effects Test Size = 30, Ref. Size = 100 Test Size = 30, Ref. Size = 30 0.100 0.100 0.075 0.075 0.050 0.050 0.025 0.025 Set-Level0.000 Type-I Error Set-Level0.000 Type-I Error T0, T20, T50, T0, T20, T50, R0 R20 R50 R0 R20 R50 Configuration (% of Genes DE in Configuration (% of Genes DE in Test/Reference Sets) Test/Reference Sets) iDEA PAGE TWO-SIGMA-G iDEA PAGE TWO-SIGMA-G Supplementary Figure S5: Type-I error performance of iDEA, PAGE, and TWO- SIGMA for various set-level null hypotheses for genes simulated with IGC. Refer- ence set sizes of 100 and 30 are shown. Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated.