Supplement for TWO-SIGMA-G S1 Additional Type-I Error and Power Results

Eric Van Buren, Ming Hu, Liang Chen, John Wrobel, Kirk Wilhelmsen, Lishan Su, Yun Li, Di Wu

S1 Additional Type-I Error and Power Results

[Figure S1, test size = 30, ref. size = 30. Panels: (A) independent genes (no IGC), no gene-level random effects; (B) genes simulated with IGC, no gene-level random effects; (C) genes simulated with IGC, gene-level random effects present; (D) genes simulated with IGC, gene-level random effects incorrectly absent. x-axis: significance threshold (0 to 0.01); y-axis: observed set-level type-I error; methods: CAMERA, MAST, TWO-SIGMA-G.]

Supplementary Figure S1: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G as significance threshold varies from 0 to 0.01. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Each plot combines six different settings, and 10 replicates per setting, which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. See the Methods section of the main text for more details regarding the simulation procedure.

[Figure S2, test size = 30, ref. size = 30. Panels (A) to (D) as in Figure S1. x-axis: significance threshold (0 to 0.05); y-axis: observed set-level type-I error; methods: CAMERA, MAST, TWO-SIGMA-G.]

Supplementary Figure S2: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G as significance threshold varies from 0 to 0.05. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Each plot combines six different settings, and 10 replicates per setting, which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. See the Methods section of the main text for more details regarding the simulation procedure.

[Figure S3, test size = 30, ref. size = 100. Panels (A) to (D) as in Figure S1. x-axis: unadjusted and adjusted p-values; y-axis: set-level type-I error; methods: CAMERA, MAST, TWO-SIGMA-G.]

Supplementary Figure S3: Type-I error performance of CAMERA, MAST, and TWO-SIGMA-G using a reference set size of 100 genes. Each panel varies the existence of IGC between genes in the test set and the presence of gene-level random effect terms in gene-level model (CAMERA never includes gene-level random effect terms). Within each panel, both unadjusted and adjusted set-level p-values are plotted (unadjusted p-values are unavailable for MAST). Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation (where applicable) in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure.

[Figure S4, genes simulated with IGC, test size = 30. Panels: (A) no gene-level random effects, ref. size = 30; (B) gene-level random effects incorrectly absent, ref. size = 30; (C) no gene-level random effects, ref. size = 100; (D) gene-level random effects incorrectly absent, ref. size = 100. x-axis: configuration (% of genes DE in test/reference sets): T0, R0; T20, R20; T50, R50; y-axis: set-level type-I error; methods: CAMERA, MAST, TWO-SIGMA-G.]

Supplementary Figure S4: Type-I error performance of TWO-SIGMA-G, CAMERA, and MAST for various set-level null hypotheses. Each panel varies the reference set size and the presence of gene-level random effect terms in gene-level model. Scenarios along the x-axis of each panel vary the percentage of genes that are differentially expressed (with the same effect size) in the test and reference sets. Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated. See the Methods section of the main text for more details regarding the simulation procedure.

[Figure S5, genes simulated with IGC, no gene-level random effects, test size = 30. Panels: (A) ref. size = 100; (B) ref. size = 30. x-axis: configuration (% of genes DE in test/reference sets): T0, R0; T20, R20; T50, R50; y-axis: set-level type-I error; methods: iDEA, PAGE, TWO-SIGMA-G.]

Supplementary Figure S5: Type-I error performance of iDEA, PAGE, and TWO-SIGMA for various set-level null hypotheses for genes simulated with IGC. Reference set sizes of 100 and 30 are shown. Each boxplot aggregates six different settings which vary both the magnitude of the average inter-gene correlation in the test set and the nature of the correlation structure via the introduction of other individual-level covariates. Such settings are intended to represent the diversity seen in real data sets to paint an accurate picture of testing properties over a wide range of gene sets. Each of the six settings is further composed of 10 replicates which vary only random seed to mimic the impact of a different starting pool of cells from which genes were simulated.
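
As a rough illustration of the quantity plotted on the y-axis of Supplementary Figures S1 and S2, the sketch below computes observed type-I error as the fraction of null p-values at or below each significance threshold. It is not the authors' simulation code: the function name observed_type1_error is hypothetical, and the uniform random draws are only a stand-in for set-level p-values generated under the null.

```python
import random


def observed_type1_error(null_pvalues, thresholds):
    """Return, for each threshold, the fraction of null p-values <= threshold."""
    n = len(null_pvalues)
    return [sum(p <= t for p in null_pvalues) / n for t in thresholds]


# Hypothetical stand-in data: uniform draws play the role of set-level
# p-values from simulations in which the null hypothesis holds.
rng = random.Random(0)
null_pvalues = [rng.random() for _ in range(1000)]

# Threshold grid matching the tick marks on the Figure S1 x-axis.
thresholds = [0.0, 0.0025, 0.005, 0.0075, 0.01]
rates = observed_type1_error(null_pvalues, thresholds)

for t, r in zip(thresholds, rates):
    print(f"threshold={t:.4f}  observed type-I error={r:.4f}")
```

For a well-calibrated test, the observed rate should stay close to the threshold itself, which is why a curve lying near the diagonal in these plots indicates controlled type-I error.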
