Supplementary Information for

Inflammatory and antiviral expression in Add Health: Molecular pathways to social disparities in disease emerge by young adulthood

Steven W. Cole, Michael J. Shanahan, Lauren Gaydosh, & Kathleen Mullan Harris

Corresponding authors: Kathleen Mullan Harris and Steven W. Cole

Email: [email protected], [email protected]

This PDF file includes:

Supplementary text

Figures S1 to S2

Tables S1 to S2

Legends for Datasets S1 to S2

SI References

Other supplementary materials for this manuscript include the following:

Datasets S1 to S2

www.pnas.org/cgi/doi/10.1073/pnas.1821367117

Methods

Sample and survey procedures. Data come from the National Longitudinal Study of Adolescent to Adult Health (Add Health), a nationally representative study of U.S. adolescents who were in grades 7-12 at Wave I in 1994-1995 and were followed into adulthood over five waves of data collection. We analyze data from the nationally representative Sample 1 subsample of Wave V, conducted in 2016-2017 when respondents were aged 32-42. Add Health administered Wave V using continuous interviewing over three years (2016-2018), and the sampling design selected 3 random subsamples of eligible Wave V respondents for interview in each year. The 1126 participants analyzed here are those who consented to provide a blood specimen for RNA analysis during the Sample 1 physical examination visit. Add Health developed this three-subsample design in order to release preliminary Wave V data before the entire Wave V sample had been interviewed (which took 3 years), and released Sample 1 survey data in 2017. Sample 1 is therefore nationally representative, because a randomly selected subsample of a nationally representative sample is itself nationally representative (1). The general sampling design, interview procedures, and demographic and biobehavioral variable assessments have been previously described (2, 3).

Blood transcriptome profiling. Venipuncture whole blood samples were collected into PAXgene RNA tubes and frozen at -80°C prior to single-pass processing of the first 1143 samples collected during Add Health Wave V (Sample 1). Total RNA was extracted using automated nucleic acid processing systems (Qiagen QIAcube) and tested for suitable mass (RiboGreen RNA > 300 ng; achieved mean = 2,716 ± SD 742) and integrity (Agilent TapeStation RIN > 3; achieved mean = 7.7 ± 0.7) prior to conversion of polyadenylated RNA to cDNA using the QuantSeq 3’ FWD system (Lexogen).
Sample-barcoded cDNA libraries were sequenced in multiplex on an Illumina HiSeq 4000 system in the UCLA Neuroscience Genomics Core Laboratory. All assay procedures followed the manufacturers’ standard protocols for this workflow. Multiplex sequencing targeted > 10⁷ single-stranded 65 bp reads per sample. Quality-annotated sequences (FASTQ) derived from Illumina HiSeq Control Software (3.4.0) were mapped to the ENSEMBL hg38 human transcriptome sequence and quantified at the gene level using STAR 2.5.3a. Post-sequencing quality control tested for expected read depth > 10⁷ reads/sample (achieved mean = 11.6 ± 2.6 × 10⁶ mapped reads per sample), > 90% of reads aligning to the human transcriptome (achieved mean = 94.0% ± 3.9% mapped), and profile consistency with other samples (average Pearson r with 95 adjacent samples > .85; achieved mean r = .94 ± .02). Samples were assayed in 3 sets of 384 (4 x 96-well plates). Among 1131 unique samples assayed, none failed quality control criteria based on poor input RNA quality and 5 (.44%) failed based on poor endpoint quality metrics (profile consistency r < .85).

Data analysis and bioinformatics. Transcript abundance values for each gene were pre-normalized to transcripts per million (TPM), standardized on average expression of 11 pre-specified reference genes (4), floored at 1 TPM to suppress spurious variability, and log2-transformed for analysis by standard linear statistical models relating transcript abundance to individual demographic characteristics (age, sex, race/ethnicity), contextual demographic conditions (U.S. region, family poverty status), biobehavioral factors (BMI, smoking, alcohol consumption), and technical covariates (sample RIN, assay plate, sequencing depth / total mapped reads, and profile consistency with other samples).
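The normalization sequence described above (reference-gene standardization, flooring at 1 TPM, log2 transformation) can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in the study. The scaling rule (dividing by each sample's mean reference-gene TPM and rescaling to a common target value) is an assumption, as are the function name, the placeholder reference genes REF1 and REF2, and the toy values.

```python
import math

def normalize_sample(tpm, reference_genes, target_reference_mean, floor=1.0):
    """Reference-standardize, floor, and log2-transform one sample's TPM values.

    ASSUMPTION: standardization rescales the sample so that the mean TPM of its
    reference genes equals target_reference_mean. The source does not spell out
    the exact rule.
    """
    ref_mean = sum(tpm[g] for g in reference_genes) / len(reference_genes)
    scale = target_reference_mean / ref_mean
    # Values below the floor are raised to it before taking log2, so that
    # near-zero abundances do not generate large spurious negative values.
    return {gene: math.log2(max(value * scale, floor)) for gene, value in tpm.items()}

# Hypothetical toy sample: two reference genes and two genes of interest.
sample = {"REF1": 100.0, "REF2": 300.0, "IL6": 0.2, "TNF": 16.0}
normalized = normalize_sample(sample, ["REF1", "REF2"], target_reference_mean=400.0)
```

In this toy sample the reference mean is 200 TPM, so every value is doubled. IL6 (0.4 TPM after scaling) is floored to 1 TPM and maps to a log2 value of 0, while TNF (32 TPM after scaling) maps to 5.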
Linear models were estimated using SAS PROC GLM with the following specification:

proc glm;
  class Plate;  /* assay plate enters as a nominal (class) variable */
  model Expression = RIN Plate TotalMappedReads AvgCorr Sex Age Black Hispanic Asian OthRace Poverty USRegion2 USRegion3 USRegion4 BMI Smoke Drink HeavyDrink / solution;  /* solution prints the parameter estimates */
run;

In this syntax, “Expression” represents RNA transcript abundance measures as outlined below, and regressor variables were coded as follows: age (continuous self-reported years); sex (self-reported sex assigned at birth, with male coded by an indicator relative to the female reference category); race/ethnicity (self-identified Asian, non-Hispanic Black, Hispanic, and Other race/ethnicity, each coded by an indicator relative to the non-Hispanic White reference category); US region (census regions 2-4: Midwest, South, and West, each coded by an indicator relative to reference region 1, Northeast); family poverty status (self-reported household income at or below the 2015 US Federal poverty level for the household's size, coded by an indicator relative to non-poverty status); BMI (continuous kg/m², derived from self-reported continuous height and weight); smoking history (self-reported ever smoked, coded by an indicator relative to the never-smoked reference category); alcohol consumption (represented as 2 variables: a “regular drinking” variable indicating whether participants self-reported drinking beer, wine, or liquor every day or almost every day, relative to less frequent drinking, during the past 12 mo; and a “binge drinking” ordinal variable reflecting the number of days during the past 12 mo on which participants drank 4 or more (female) or 5 or more (male) drinks in a row, coded none = 0, 1-2 d/yr = 1, 3-12 d/yr (up to 1 d/mo) = 2, 2-3 d/mo = 3, 1-2 d/wk = 4, 3-5 d/wk = 5, every day or almost every day = 6); assay batch (nominal indicators for plates 1-11 relative to reference plate 12); sample RNA integrity (continuous 0-10 RIN); total mapped reads per sample (continuous, in units of 10⁶ reads); read alignment rate (continuous %); and profile consistency (average Pearson r with 95 adjacent samples). Among the 1143 total samples assayed, 17 came from subsequent re-assessments of a given individual and were deleted, yielding a sample of 1126 unique participants.
Among these 1126 participants, 57 were missing data on one or more of the demographic and behavioral variables analyzed, leaving a total of 1,069 individuals in the final analytic sample.

For Level 1 analyses of pre-specified gene sets, inflammatory and Type I interferon composite scores were derived from previous research (5) and computed by averaging z-score standardized RNA abundance values for 19 gene transcripts involved in inflammation (CXCL8, FOS, FOSB, FOSL1, FOSL2, IL1A, IL1B, IL6, JUN, JUNB, JUND, NFKB1, NFKB2, PTGS1, PTGS2, REL, RELA, RELB, TNF) and 30 gene transcripts involved in Type I interferon responses (GBP1, IFI16, IFI27, IFI27L1, IFI27L2, IFI30, IFI35, IFI44, IFI44L, IFI6, IFIH1, IFIT1, IFIT2, IFIT3, IFIT5, IFITM1, IFITM2, IFITM3, IFITM4P, IFITM5, IFNB1, IRF2, IRF7, IRF8, MX1, MX2, OAS1, OAS2, OAS3, OASL). For consistency with the use of the Type I interferon composite in previous research (5), we also included in that score two gene transcripts that were originally implicated in antibody production (JCHAIN, IGLL1) but have since been discovered to co-express with Type I interferon genes in dendritic cells (6-8), giving 32 Type I interferon indicator genes in total. A composite score assessing the CTRA profile was computed as the difference between the average standardized value of the 19 pro-inflammatory indicator genes and the average standardized value of the 32 Type I interferon indicator genes. Standard OLS linear statistical models were employed to quantify the magnitude of variation in inflammatory, interferon, and CTRA composite score expression as a function of the demographic, biobehavioral, and technical factors (using the SAS PROC GLM syntax above).
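The composite scoring described above can be illustrated with a short Python sketch. It is a hedged illustration only: the use of the sample standard deviation for z-scoring, the function names, and the three-gene toy data are assumptions made for the example, not details taken from the study's SAS code.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize one gene's values across participants (mean 0, SD 1)."""
    m, s = mean(values), stdev(values)  # ASSUMPTION: sample SD (n - 1 denominator)
    return [(v - m) / s for v in values]

def composite(expr, genes):
    """Average the z-scored values of the listed genes for each participant."""
    z_by_gene = [zscores(expr[g]) for g in genes]
    n_participants = len(z_by_gene[0])
    return [mean([z[i] for z in z_by_gene]) for i in range(n_participants)]

def ctra_score(expr, inflammatory_genes, interferon_genes):
    """CTRA composite: inflammatory composite minus Type I interferon composite."""
    inflammatory = composite(expr, inflammatory_genes)
    interferon = composite(expr, interferon_genes)
    return [a - b for a, b in zip(inflammatory, interferon)]

# Hypothetical log2 abundance values for three participants.
expr = {"IL6": [1.0, 2.0, 3.0], "TNF": [2.0, 4.0, 6.0], "MX1": [3.0, 2.0, 1.0]}
scores = ctra_score(expr, ["IL6", "TNF"], ["MX1"])
```

With these toy values the first participant has low inflammatory and high interferon expression (score of -2), and the third has the opposite pattern (score of +2).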
A global omnibus F ratio was used to simultaneously test the aggregate contributions of ten dimensions of demographic variation, including six dimensions of individual demographic variation (age, sex, and four indicator parameters representing variation as a function of race/ethnicity) and four dimensions of contextual demographic variation (indicator variables for poverty status and variation across 4 US Census regions), across the three gene sets analyzed (9-12). Contingent on a significant omnibus test of global sociodemographic variation in gene set expression, we conducted interpretive follow-up analyses testing for significant sociodemographic variation in expression of each gene composite in isolation (with False Discovery Rate (13) correction for multiple testing). For gene sets showing a significant omnibus test of global sociodemographic variation, we present the individual parameter estimates underlying those global results for descriptive/interpretive purposes and conduct follow-up nested hypothesis tests to assess the respective effects of individual vs. contextual demographic factors (again with False Discovery Rate correction for multiple testing), as well as ancillary hypotheses involving biobehavioral factors that might potentially confound sociodemographic effects. Individual parameter estimates are presented for descriptive purposes only and do not serve as the analytic basis for primary substantive conclusions.

To identify more fine-grained empirical co-regulatory modules within the broad pre-specified 19-gene inflammatory composite and the 32-gene Type I interferon composite, we conducted exploratory principal factor analysis (14) of the inter-gene correlation matrix for each gene set with initial factor extraction by principal components analysis, the number of retained factors determined by Kaiser’s criterion of eigenvalue > 1, and gene-specific loadings derived from varimax-rotated factor patterns (14).
Analyses were conducted using SAS PROC FACTOR and results are summarized in Dataset S1 along with association coefficients relating each identified sub-component to demographic factors using the same analytic model as outlined above.

For Level 2 and Level 3 analyses of empirical variation in genome-wide transcriptional profiles, the SAS PROC GLM statistical model specified above was applied to each assayed gene transcript to identify all transcripts showing > 20% difference in expression as a function of a binary indicator variable (e.g., male vs. female) or across a 4-SD range of variation in a continuous variable (i.e., spanning the range from a low value 2 SD below the mean value to a high value 2 SD above the mean). Genes were pre-screened to remove un-named transcripts and those showing minimal expression level (median = 0) or variability (SD = 0), leaving a total of 13,668 gene transcripts for analyses of transcriptome variation. Differential expression of individual gene transcripts was tested for significance using false discovery rate-corrected p-values from linear model parameter t statistics (i.e., a q-value), with significance defined by a false discovery-corrected q < .05 assuming dependence among genes (15) (SAS PROC MULTTEST). Transcript-specific parameter estimates and related statistics are reported in Dataset S2.

In Level 2 analyses of transcription factor activity, all gene transcripts identified as empirically up- or down-regulated by > 20% were analyzed for differential activity of five pre-specified transcription factors using TELiS promoter-based bioinformatics analyses (16) by comparing the prevalence of TFBMs in the promoters of up- vs. down-regulated genes.
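The > 20% screening rule described above can be expressed on the log2 scale used for the linear models. The Python sketch below is illustrative only. It assumes the rule is applied symmetrically on the log scale (at least 1.2-fold up, or at most 1/1.2-fold down), which the text does not state explicitly; the function name and example coefficients are hypothetical.

```python
import math

# A 20% difference corresponds to a 1.2-fold ratio, i.e. about 0.263 log2 units.
LOG2_THRESHOLD = math.log2(1.2)

def classify(beta, predictor_sd=None):
    """Label a transcript from its log2-scale regression coefficient.

    beta is the coefficient for a 0/1 indicator, or the slope per unit of a
    continuous predictor. For a continuous predictor, pass its SD so that the
    contrast spans 4 SD (from 2 SD below the mean to 2 SD above it).
    ASSUMPTION: the 20% rule is symmetric on the log2 scale.
    """
    span = 1.0 if predictor_sd is None else 4.0 * predictor_sd
    log2_difference = beta * span
    if log2_difference > LOG2_THRESHOLD:
        return "up"
    if log2_difference < -LOG2_THRESHOLD:
        return "down"
    return "unchanged"

binary_call = classify(0.30)                         # indicator such as male vs. female
continuous_call = classify(0.01, predictor_sd=7.0)   # slope of 0.01 per unit, SD = 7 units
```

A slope of 0.01 log2 units per unit of a predictor with SD = 7 gives 0.28 log2 units across 4 SD, which just exceeds the threshold of about 0.263.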
These analyses used maximum likelihood point estimates of differential expression to identify TELiS input genes based on previous findings that quantitative measures of absolute effect size yield more replicable gene lists and more accurate downstream bioinformatics results than do gene lists screened by p- or q-values (which correspond to the lower end of a confidence interval rather than the most accurate estimate of the parameter value) (5, 17-20). Analyses were performed for two inflammation-related transcription factors (NF-κB, assessed by the TRANSFAC (21) position-specific weight matrix V$CREL_01; and AP-1, assessed by V$AP1_Q6), the interferon-stimulated response element (ISRE, V$ISRE_01), and 2 neural/endocrine-related transcription factors (the CREB transcription factor involved in mediating sympathetic nervous system-induced β-adrenergic signaling, V$CREB_01; and the glucocorticoid receptor involved in mediating cortisol signaling from the hypothalamus-pituitary-adrenal axis, V$GR_Q6). Analyses were performed using three different stringencies for detecting transcription factor-binding motifs (TFBMs; TRANSFAC mat_sim values .80, .90, and .95), with each computed over three different definitions of core promoter scope (-300 bp, -600 bp, and -1000 to +200 bp relative to the RefSeq transcription start site) (16). Log2-transformed TFBM prevalence ratios from the nine parametric combinations were averaged, and statistical significance of that average was derived from a standard error estimated by bootstrap resampling of participant-specific transcript abundance vectors (to account for the correlation among genes) (22).
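The averaging of log2 TFBM prevalence ratios across parameter combinations can be sketched as follows. This is a simplified illustration, not the TELiS implementation: it assumes that "prevalence" means the mean number of motif hits per promoter, it uses made-up counts, and it shows only two of the nine stringency-by-promoter-length combinations. The bootstrap standard error is omitted.

```python
import math
from statistics import mean

def log2_prevalence_ratio(up_counts, down_counts):
    """log2 of mean motif hits per promoter in up- vs. down-regulated genes.

    ASSUMPTION: prevalence is the mean hit count per promoter. A down-regulated
    set with no hits at all would need separate handling (division by zero).
    """
    return math.log2(mean(up_counts) / mean(down_counts))

def averaged_log2_ratio(scans):
    """Average the log2 ratio over all (mat_sim, promoter scope) combinations."""
    return mean([log2_prevalence_ratio(up, down) for up, down in scans.values()])

# Hypothetical motif hit counts per promoter, keyed by (mat_sim, promoter scope).
scans = {
    (0.80, "-300 bp"): ([2, 2], [1, 1]),  # twice as prevalent in up-regulated genes
    (0.90, "-300 bp"): ([1, 1], [1, 1]),  # equally prevalent
}
average = averaged_log2_ratio(scans)
```

A positive average indicates that the motif is over-represented in promoters of up-regulated genes; here the two toy combinations give log2 ratios of 1 and 0, for an average of 0.5.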
In Level 3 analyses, the same sets of differentially expressed genes were analyzed by Transcript Origin Analysis to identify shared cellular origins based on over-representation analysis of cell type diagnosticity scores computed as previously described (23) based on reference transcriptome profiles derived from isolated samples of B lymphocytes, CD4+ T lymphocytes, CD8+ T lymphocytes, NK cells, neutrophils, CD16- classical monocytes, CD16+ non-classical monocytes, CD1C+ (BDCA1+) dendritic cells, CLEC4C+ (BDCA2+) dendritic cells, and THBD+ (BDCA3+) dendritic cells (GSE101489) (24). This analysis only infers the specific cellular origins of the empirically observed differences in gene expression; it does not estimate the prevalence of specific leukocyte subsets or determine whether observed transcriptome differences stem from differences in cell type abundance, from per-cell differences in cellular activation (i.e., absent differences in cell prevalence), or both. Statistical testing of mean cell type diagnosticity scores was based on standard errors derived from bootstrap resampling as described above.

All analyses were performed using SAS 9.4 on a Linux x64 operating system.

[Figure S1 image. Panel A: scree plots of eigenvalue (log scale, 0.1 to 100) against eigenvector number (0 to 32). Panels B and C: heat maps of the rotated factor loadings tabulated below, with a color scale running from -1.0 to +1.0.]

B. Inflammatory gene sub-component modules

Gene        F1    F2    F3    F4    F5    F6    F7
JUNB       .76   .27  -.06  -.07   .15   .05  -.13
FOSL2      .76   .01   .26   .04  -.01  -.04   .04
RELB       .60  -.41   .34   .32  -.02   .08   .00
RELA       .51  -.33   .19   .51   .16   .00  -.04
CXCL8     -.01   .71   .02   .01  -.12   .10   .11
FOS        .01   .68  -.31  -.15   .17  -.03  -.01
PTGS2      .36   .52   .15   .16  -.25  -.14   .00
REL        .12  -.11   .78  -.05  -.05  -.03   .03
IL1B       .30   .26   .55  -.29  -.04  -.04  -.13
NFKB1      .12  -.05   .52   .18   .36  -.03  -.02
NFKB2      .05  -.34   .47   .21  -.17   .20   .07
JUN        .05   .03   .08   .65  -.14  -.22   .18
FOSB       .01  -.13  -.14   .53   .12   .28  -.08
TNF        .01   .30   .07   .48   .15  -.05  -.44
JUND       .21  -.01  -.18  -.09   .78   .07  -.06
PTGS1     -.09  -.05   .14   .10   .72  -.10   .14
IL1A       .11   .01  -.07  -.08  -.09   .78   .18
FOSL1     -.26   .04   .25   .11   .09   .49  -.31
IL6       -.09   .16   .03   .08   .13   .06   .79

C. Interferon (IFN & Ab) gene sub-component modules

Gene        F1    F2    F3    F4    F5    F6    F7
IFI6       .83   .27   .07  -.01   .10   .00  -.07
IFIT3      .83   .30   .01  -.07   .03  -.01  -.11
IFI44L     .82   .36  -.08   .03   .05   .01  -.03
MX1        .80   .30   .14   .05  -.03  -.03   .06
OAS3       .76   .31  -.11  -.08   .07   .01   .05
IFIT1      .75   .34  -.14  -.04   .02   .01  -.16
IFITM3     .72  -.09   .26   .02  -.14   .03  -.07
OAS1       .71   .27  -.16   .00   .04   .06   .07
IRF7       .65   .01  -.03   .11  -.14  -.05   .11
IFI27      .65  -.06  -.01   .08  -.08   .13  -.07
OAS2       .62   .31  -.04   .05  -.03   .01   .17
IFI35      .58  -.05   .22   .06   .13  -.17   .05
OASL       .53   .29   .31   .03  -.09  -.02  -.01
IFI16      .21   .81   .16   .03  -.01  -.06   .04
IFIH1      .14   .80  -.08   .03  -.02   .03   .01
IFIT2      .58   .62  -.06  -.14   .01  -.04  -.10
GBP1       .46   .59   .17   .01  -.02  -.03  -.06
MX2        .50   .53   .34  -.10   .00  -.04   .04
IFI44      .50   .50  -.28   .08   .11  -.02  -.05
IFIT5      .35   .48  -.38   .16   .04  -.04  -.08
IRF2       .13   .44   .37  -.02  -.31   .00   .07
IFITM2     .21   .06   .81   .01  -.04   .00  -.01
IFI30      .04   .12   .51   .24   .32  -.02   .10
JCHAIN     .03   .13  -.49   .32   .15   .14  -.11
IFITM1     .36  -.17  -.54   .04  -.14  -.07  -.02
IRF8      -.04   .07  -.74  -.09   .03  -.04   .16
IFI27L1    .02  -.03   .03   .74  -.22   .06   .01
IFI27L2    .05   .04   .02   .72   .15  -.03  -.06
IGLL1     -.01  -.06   .02  -.04   .77   .02  -.02
IFITM5     .04  -.04   .03  -.09  -.18   .77  -.09
IFITM4P   -.01  -.02  -.03   .15   .23   .62   .12
IFNB1      .00  -.01  -.02  -.07  -.02   .02   .91

Fig. S1. Co-regulatory structure of inflammatory and Type I interferon gene sets. Principal factor analysis was applied to identify co-regulated gene modules within the a priori-specified sets of 19 representative inflammatory genes and 32 representative Type I interferon response genes assessed on n = 1069 study participants in Level 1 analyses. (A) Eigenvalue scree plots indicate 7 sub-components with eigenvalue > 1 (black circles) within the global inflammatory gene set (labeled F1-F7) and 7 sub-components with eigenvalue > 1 (gray diamonds) within the global Type I interferon gene set. (B) Rotated loading patterns describing the modular structure of co-expression for inflammatory indicator genes. (C) Rotated loading patterns describing the modular structure of co-expression for Type I interferon indicator genes. High (red) loadings indicate module membership, and low (blue) loadings identify genes showing negative correlation with a given gene module. Dataset S1 contains underlying numerical data and association measures quantifying demographic variation in expression of each gene module.

Fig. S2. Demographic variation in empirical transcriptome profiles. Histogram of p-values from transcriptome-wide association analyses relating demographic and biobehavioral characteristics to relative abundance of 13,668 expressed human gene transcripts (results detailed in Dataset S2). Association estimates are derived from standard linear statistical models with adjustment for all other characteristics shown as well as assay technical factors. “5% FDR” indicates the number of genes reaching genome-wide statistical significance after control for multiple testing by dependent False Discovery Rate analysis. The magnitude of asymmetry in small p-values (histogram bars on the far left) relative to intermediate and high p-values quantifies the global magnitude of transcriptomic alteration associated with each demographic factor. Thus, graphs showing few or no 5% FDR results and relatively uniform (rectangular) p-value distributions indicate little transcriptome-wide impact of a given demographic factor, whereas highly skewed (left-shifted) distributions indicate comparatively large transcriptome-wide differences associated with a given demographic factor.
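For readers who want to see how a dependence-adjusted false discovery rate can be computed from a vector of p-values, the following Python sketch implements the Benjamini-Yekutieli step-up adjustment (15). It is offered as an illustration of the general procedure under the assumption that it matches the "dependent False Discovery Rate analysis" named above; it is not the SAS PROC MULTTEST code used in the study, and the function name and example p-values are hypothetical.

```python
def by_qvalues(pvalues):
    """Benjamini-Yekutieli adjusted p-values (FDR control under dependence)."""
    m = len(pvalues)
    # Harmonic-sum penalty that distinguishes this from the independence-based
    # Benjamini-Hochberg adjustment.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, enforcing monotone adjusted values
    # and capping them at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m * c_m / rank)
        q[idx] = running_min
    return q

# Hypothetical p-values for four transcripts.
example_p = [0.001, 0.5, 0.02, 0.9]
example_q = by_qvalues(example_p)
n_significant = sum(1 for value in example_q if value < 0.05)
```

In this toy example only the first transcript remains below q = .05; the second-smallest p-value (.02) adjusts to roughly .083.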

Table S1. Linear model parameter estimates supporting main text Fig 1 (Demographic variation in expression of inflammation- and Type I interferon-related genes).

Gene composite

                        Inflammatory                  Type I interferon             CTRA

Age (yrs)                0.0050 (0.0047) p = .2855     0.0117 (0.0085) p = .1711    -0.0066 (0.0089) p = .4547
Male (vs F)             -0.0490 (0.0169) p = .0038    -0.1978 (0.0305) p < .0001     0.1488 (0.0318) p < .0001
Asian (vs White)         0.0650 (0.0410) p = .1133     0.2992 (0.0742) p < .0001    -0.2342 (0.0772) p = .0025
Black (vs White)         0.0549 (0.0238) p = .0212     0.0899 (0.0431) p = .0372    -0.0349 (0.0449) p = .4363
Hispanic (vs White)      0.0132 (0.0276) p = .6329     0.0035 (0.0499) p = .9441     0.0097 (0.0520) p = .8523
Other Race (vs White)   -0.0223 (0.0421) p = .5974     0.0035 (0.0762) p = .9634    -0.0258 (0.0794) p = .7457

Midwest (vs NE)         -0.0210 (0.0244) p = .3907    -0.0218 (0.0442) p = .6224     0.0008 (0.0460) p = .9862
South (vs NE)           -0.0045 (0.0241) p = .8527     0.0394 (0.0435) p = .3653    -0.0439 (0.0453) p = .3331
West (vs NE)            -0.0114 (0.0290) p = .6949    -0.0157 (0.0525) p = .7646     0.0043 (0.0547) p = .9367
Poverty (vs non)         0.0044 (0.0249) p = .8593     0.0164 (0.0451) p = .7155    -0.0120 (0.0470) p = .7980

BMI (kg/m²)              0.0052 (0.0011) p < .0001     0.0010 (0.0020) p = .6092     0.0042 (0.0021) p = .0488
Smoke (vs non)           0.0468 (0.0171) p = .0063     0.0380 (0.0309) p = .2201     0.0088 (0.0322) p = .7843
Drink (vs non)          -0.0502 (0.0423) p = .2360     0.0747 (0.0765) p = .3295    -0.1249 (0.0797) p = .1176
Binge Drink (0-6)       -0.0009 (0.0063) p = .8888     0.0048 (0.0114) p = .6712    -0.0057 (0.0118) p = .6299

Values are linear statistical model parameter estimates (standard error) and 2-tailed p-values.
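The relationship between each estimate, its standard error, and its 2-tailed p-value can be checked approximately with a few lines of Python. The sketch below uses a normal approximation to the t distribution, which is an assumption that should be reasonable here given the large analytic sample (n = 1069), and it works from the rounded values printed in the table, so small discrepancies from the reported p-values are expected. The function name is hypothetical.

```python
import math

def two_tailed_p(estimate, standard_error):
    """Approximate 2-tailed p-value for estimate / SE.

    ASSUMPTION: normal approximation to the t distribution (large residual df).
    """
    z = estimate / standard_error
    # For a standard normal variable, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Inflammatory composite, Male (vs F): -0.0490 (0.0169), reported p = .0038.
p_male = two_tailed_p(-0.0490, 0.0169)
# Inflammatory composite, Age (yrs): 0.0050 (0.0047), reported p = .2855.
p_age = two_tailed_p(0.0050, 0.0047)
```

Both approximations land close to the tabled values, with the remaining differences attributable to rounding of the printed estimates and the normal approximation.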

Table S2. Linear model parameter estimates quantifying association between Level 1 gene expression composite scores and household income or educational attainment.

Gene composite

                                    Inflammatory                Type I interferon           CTRA

Household income (binned $/year)     0.002 (0.003) p = .3710    -0.006 (0.005) p = .2828     0.008 (0.006) p = .1322

Household income (linear)            0.013 (0.013) p = .3147    -0.027 (0.024) p = .2650     0.040 (0.025) p = .1138
Household income (quadratic)        -0.001 (0.001) p = .3283     0.001 (0.002) p = .4194    -0.002 (0.002) p = .2218

Household income <20K/yr             -                           -                           -
Household income [20-50K/yr)         0.012 (0.027) p = .6503     0.025 (0.049) p = .6006    -0.013 (0.051) p = .7940
Household income [50-100K/yr)        0.015 (0.023) p = .5210    -0.040 (0.041) p = .3256     0.055 (0.043) p = .1984
Household income >100K/yr            0.002 (0.023) p = .9259    -0.070 (0.042) p = .0944     0.072 (0.043) p = .0977

Personal income <20K/yr              -                           -                           -
Personal income [20-50K/yr)          0.001 (0.022) p = .9776    -0.011 (0.040) p = .7768     0.012 (0.412) p = .7739
Personal income [50-100K/yr)         0.017 (0.023) p = .4433    -0.010 (0.041) p = .8109     0.027 (0.043) p = .5239
Personal income >100K/yr            -0.020 (0.030) p = .5142    -0.036 (0.054) p = .5094     0.016 (0.057) p = .7741

Education: High School or less       -                           -                           -
Education: College                   0.005 (0.023) p = .8148    -0.003 (0.042) p = .9489     0.008 (0.044) p = .8523
Education: Advanced Degree          -0.013 (0.028) p = .6465    -0.024 (0.050) p = .6236     0.012 (0.052) p = .8202

Values are linear statistical model parameter estimates (standard error) and 2-tailed p-values, from the same analytic model used in the main text, with the 1/0 poverty indicator replaced by the other SES measures indicated above. A dash (-) indicates the contrast reference category.

Dataset S1 (separate file). Inflammatory and interferon sub-components.

Dataset S2 (separate file). Empirical transcript associations.

References

1. R. M. Groves, F. Fowler Jr, M. Couper, J. Lepkowski, E. Singer and R. Tourangeau, Survey methodology (2nd Edition) (John Wiley and Sons, 2009).
2. K. M. Harris, An integrative approach to health. Demography 47, 1-22 (2010).
3. K. M. Harris, C. T. Halpern, E. A. Whitsel, J. M. Hussey, L. A. Killeya-Jones, J. Tabor and S. C. Dean, Cohort Profile: The National Longitudinal Study of Adolescent to Adult Health (Add Health). Int J Epidemiol (2019).
4. E. Eisenberg and E. Y. Levanon, Human housekeeping genes, revisited. Trends Genet 29, 569-74 (2013).
5. B. L. Fredrickson, K. M. Grewen, K. A. Coffey, S. B. Algoe, A. M. Firestine, J. M. Arevalo, J. Ma and S. W. Cole, A functional genomic perspective on human well-being. Proc Natl Acad Sci U S A 110, 13684-9 (2013).
6. E. Kallberg and T. Leanderson, A subset of dendritic cells express joining chain (J-chain). Immunology 123, 590-9 (2008).
7. W. Cao, L. Zhang, D. B. Rosen, L. Bover, G. Watanabe, M. Bao, L. L. Lanier and Y. J. Liu, BDCA2/Fc epsilon RI gamma complex signals through a novel BCR-like pathway in human plasmacytoid dendritic cells. PLoS Biol 5, e248 (2007).
8. M. C. Rissoan, T. Duhen, J. M. Bridon, N. Bendriss-Vermare, C. Peronne, B. de Saint Vis, F. Briere and E. E. Bates, Subtractive hybridization reveals the expression of immunoglobulin-like transcript 7, Eph-B1, granzyme B, and 3 novel transcripts in human plasmacytoid dendritic cells. Blood 100, 3295-303 (2002).
9. J. Cao and S. Zhang, Multiple comparison procedures. JAMA 312, 543-544 (2014).
10. A. D. Althouse, Adjust for Multiple Comparisons? It's Not That Simple. Ann Thorac Surg 101, 1644-5 (2016).
11. R. Bender and S. Lange, Adjusting for multiple testing--when and how? J Clin Epidemiol 54, 343-9 (2001).
12. R. J. Feise, Do multiple outcome measures require p-value adjustment? BMC Med Res Methodol 2, 8 (2002).
13. Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289-300 (1995).
14. S. A. Mulaik, Foundations of Factor Analysis (CRC Press, 2010).
15. Y. Benjamini and D. Yekutieli, The Control of the False Discovery Rate in Multiple Testing under Dependency. Annals of Statistics 29, 1165-1188 (2001).
16. S. W. Cole, W. Yan, Z. Galic, J. Arevalo and J. A. Zack, Expression-based monitoring of transcription factor activity: The TELiS database. Bioinformatics 21, 803-810 (2005).
17. S. W. Cole, Z. Galic and J. A. Zack, Controlling false-negative errors in microarray differential expression analysis: a PRIM approach. Bioinformatics 19, 1808-16 (2003).
18. L. Shi, W. D. Jones, R. V. Jensen, S. C. Harris, R. G. Perkins, F. M. Goodsaid, L. Guo, L. J. Croner, C. Boysen, H. Fang, F. Qian, S. Amur, W. Bao, C. C. Barbacioru, V. Bertholet, X. M. Cao, T. M. Chu, P. J. Collins, X. H. Fan, F. W. Frueh, J. C. Fuscoe, X. Guo, J. Han, D. Herman, H. Hong, E. S. Kawasaki, Q. Z. Li, Y. Luo, Y. Ma, N. Mei, R. L. Peterson, R. K. Puri, R. Shippy, Z. Su, Y. A. Sun, H. Sun, B. Thorn, Y. Turpaz, C. Wang, S. J. Wang, J. A. Warrington, J. C. Willey, J. Wu, Q. Xie, L. Zhang, S. Zhong, R. D. Wolfinger and W. Tong, The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies. BMC Bioinformatics 9, S10 (2008).
19. D. M. Witten and R. Tibshirani, A comparison of fold-change and the t-statistic for microarray data analysis (Stanford University Technical Report, 2007).
20. A. W. Norris and C. R. Kahn, Analysis of gene expression in pathophysiological states: balancing false discovery and false negative rates. Proc Natl Acad Sci U S A 103, 649-53 (2006).
21. E. Wingender, P. Dietze, H. Karas and R. Knuppel, TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res 24, 238-41 (1996).
22. B. Efron and R. J. Tibshirani, An introduction to the bootstrap (Chapman & Hall, 1993).
23. S. W. Cole, L. C. Hawkley, J. M. Arevalo and J. T. Cacioppo, Transcript origin analysis identifies antigen-presenting cells as primary targets of socially regulated gene expression in leukocytes. Proc Natl Acad Sci U S A 108, 3080-5 (2011).
24. D. S. Black, S. W. Cole, G. Christodoulou and J. C. Figueiredo, Genomic mechanisms of fatigue in survivors of colorectal cancer. Cancer 124, 2637-2644 (2018).