
Supporting Information
Corrected September 30, 2013
Fredrickson et al. 10.1073/pnas.1305419110

SI Methods

Participants and Study Procedure. A total of 84 healthy adults were recruited from the Durham and Orange County regions of North Carolina by community-posted flyers and e-mail advertisements, followed by telephone screening to assess eligibility criteria, including age 35 to 64 y, written and spoken English, and absence of chronic illness or disability. Following written informed consent, participants completed online assessments of hedonic and eudaimonic well-being [Short Flourishing Scale, e.g., in the past week, how often did you feel... happy? (hedonic), satisfied? (hedonic), that your life has a sense of direction or meaning to it? (eudaimonic), that you have experiences that challenge you to grow and become a better person? (eudaimonic), that you had something to contribute to society? (eudaimonic); answered on a six-point frequency metric whereby 0 indicates never, 1 indicates once or twice, 2 indicates approximately once per week, 3 indicates two or three times per week, 4 indicates almost every day, and 5 indicates every day] (1, 2) and depressive symptoms [per Center for Epidemiological Studies–Depression (CES-D)] (3). Participants then attended a late-afternoon laboratory session in which they were assessed for weight, height, and blood pressure, and provided a 20-mL venipuncture blood sample under resting conditions. Age, sex, race/ethnicity, smoking history, alcohol consumption, and 2-wk history of 13 minor illness symptoms (e.g., headache, upset stomach; each rated on a frequency scale with values ranging from 0, indicating not at all, to 8, indicating very frequently) were assessed as potential confounders. All study procedures were approved by the institutional review board of the University of North Carolina at Chapel Hill.

Analysis of Differential Gene Expression. Quantile-normalized gene expression values derived from Illumina GenomeStudio software were transformed to log2 for general linear model analyses to produce a point estimate of the magnitude of association between each of the 34,592 assayed gene transcripts and (continuous z-score) measures of hedonic and eudaimonic well-being (each adjusted for the other) after control for potential confounders known to affect PBMC gene expression profiles (i.e., age, sex, race/ethnicity, BMI, alcohol consumption, smoking, minor illness symptoms, and leukocyte subset distributions). Sex, race/ethnicity (white vs. nonwhite), alcohol consumption, and smoking were represented by 0/1 dummy codes. Age, BMI, and minor illness symptoms were treated as continuous regressors, as were observed (log2-transformed) expression levels for transcripts marking T lymphocyte subsets (CD3D, CD3E, CD4, and CD8A), B lymphocytes (CD19), natural killer cells (CD16/FCGR3A and CD56/NCAM1), and monocytes (CD14) (4). The primary “low-level” analysis model for point estimation of transcript–phenotype associations is as follows:

$$
\log_2(\text{transcript abundance}) = \text{Intercept} + \text{Hedonic} + \text{Eudaimonic} + \text{Age} + \text{Sex} + \text{White/Nonwhite} + \text{Alcohol} + \text{Smoking} + \text{IllnessSymptoms} + \text{BMI} + \text{CD3D} + \text{CD3E} + \text{CD4} + \text{CD8A} + \text{CD19} + \text{FCGR3A} + \text{NCAM1} + \text{CD14} + \text{residual} \tag{S1}
$$
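Eq. S1 is an ordinary least-squares model fitted once per transcript, with only the Hedonic and Eudaimonic coefficients carried forward as point estimates. The sketch below illustrates that per-transcript step in dependency-free Python. It assumes the design matrix has already been assembled (intercept, z-scored well-being scores, 0/1 dummy codes, continuous covariates, and log2 marker-gene levels); the function names, the normal-equations solver, and the toy data are hypothetical illustrations, not the authors' analysis code. The fold_difference helper further assumes the well-being regressors are z-scores, so that a log2 slope b per SD implies a fold difference of 2^(4b) across [−2 SD, +2 SD].

```python
import math
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so each row operation updates both sides.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_transcript(design: Sequence[Sequence[float]],
                   log2_expr: Sequence[float]) -> List[float]:
    """OLS coefficients for one transcript from the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, log2_expr))
           for i in range(p)]
    return solve_linear(xtx, xty)


def fold_difference(beta_per_sd: float) -> float:
    """Fold difference across [-2 SD, +2 SD] implied by a log2 slope per SD."""
    return 2.0 ** (4.0 * beta_per_sd)


# Hypothetical toy data (not study data): intercept, hedonic z, eudaimonic z, age.
hedonic = [-1.0, 0.0, 1.0, 0.5, -0.5, 2.0]
eudaimonic = [0.5, -1.0, 0.0, 1.5, -0.5, 1.0]
age = [40.0, 52.0, 61.0, 38.0, 45.0, 57.0]
design = [[1.0, h, e, a] for h, e, a in zip(hedonic, eudaimonic, age)]
true_coef = [5.0, 0.1, -0.2, 0.01]

# Noise-free expression values on the linear scale, then log2-transformed.
raw_expr = [2.0 ** sum(b * x for b, x in zip(true_coef, row)) for row in design]
log2_expr = [math.log2(v) for v in raw_expr]

coef = fit_transcript(design, log2_expr)
print("point estimates:", [round(c, 6) for c in coef])
print("eudaimonic fold difference over 4 SD:", round(fold_difference(coef[2]), 4))
```

Under that assumption, the 1.5-fold screening threshold described for the TELiS analyses corresponds to |b| ≥ log2(1.5)/4 ≈ 0.146 log2 units per SD. Solving the normal equations directly is adequate for a small, well-conditioned design like this one; a fit repeated over all 34,592 transcripts would more typically use a QR- or SVD-based least-squares routine, because forming X'X squares the condition number of the design.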

This primary low-level analysis model is used solely to provide point estimates of transcript–phenotype associations to serve as inputs into subsequent gene set expression analyses. No statistical testing is performed at the level of individual transcript–phenotype associations because the goal of this study is not to discover reliable associations between the expression of individual transcripts and measured levels of hedonic or eudaimonic well-being. The goal of this study is to test associations between eudaimonic and hedonic well-being and average levels of expression of specific sets of genes selected a priori for analysis here based on their previously observed involvement in the conserved transcriptional response to adversity (CTRA) (e.g., as representative proinflammatory genes, IFN-related genes, and antibody synthesis-related genes) as summarized in recent surveys of this research area (5, 6).

Analysis of Affective and Demographic Variables. Associations between affective and demographic/behavioral characteristics and Short Flourishing Scale measures of hedonic and eudaimonic well-being (treated as continuous variables) were quantified for continuous variables [age, body mass index (BMI), CES-D total scores, scores on the affective and vegetative symptom subscales of the CES-D, and minor illness symptoms], with association strength summarized by the Pearson correlation coefficient, and by one-way ANOVA for categorical variables (race/ethnicity, smoking history, alcohol history). Differences in average levels of hedonic and eudaimonic well-being were tested by repeated-measures ANOVA, and the qualitative predominance of eudaimonic vs. hedonic well-being was assessed by sign test. In all analyses, residuals were checked for normal distribution and absence of outliers, and two-tailed P values <0.05 served as the criterion for statistical significance. Analyses were conducted in SAS version 9.3.

Transcriptome Profiling. Genome-wide transcriptional profiling was carried out on peripheral blood mononuclear cells (PBMCs) isolated by Ficoll density gradient centrifugation of 20-mL anticoagulated blood samples drawn by antecubital venipuncture from all 80 participants who consented to provide specimens. Total RNA was extracted (RNeasy; Qiagen), tested for suitable mass (>100 ng as determined by Nanodrop ND1000) and integrity (RNA integrity number ≥7.0; Bioanalyzer; Agilent), and converted to fluorescent cRNA for hybridization to Illumina Human HT-12 v4 BeadArrays following the manufacturer’s standard protocol in the University of California, Los Angeles, Neuroscience Genomics Core Laboratory. Data are deposited in the Gene Expression Omnibus as series GSE45330.

Primary Analysis: CTRA Gene Set Expression. The primary focus of this study involves testing potential differences in association of hedonic vs. eudaimonic well-being with average expression of one set of 53 genes selected a priori as indicators of the leukocyte CTRA gene expression profile (5, 6): (i) 19 proinflammatory genes (IL1A, IL1B, IL6, IL8, TNF, PTGS1, PTGS2, FOS, FOSB, FOSL1, FOSL2, JUN, JUNB, JUND, NFKB1, NFKB2, REL, RELA, and RELB), which are up-regulated on average in the CTRA; (ii) 31 genes involved in type I IFN responses (GBP1, IFI16, IFI27, IFI27L1-2, IFI30, IFI35, IFI44, IFI44L, IFI6, IFIH1, IFIT1-3, IFIT5, IFIT1L, IFITM1-3, IFITM4P, IFITM5, IFNB1, IRF2, IRF7-8, MX1-2, OAS1-3, and OASL), which are down-regulated on average in the CTRA; and (iii) three genes involved in antibody synthesis (IGJ, IGLL1, and IGLL3), which are down-regulated on average in the CTRA (5, 6). To provide

Fredrickson et al. www.pnas.org/cgi/content/short/1305419110 1of9

a single summary measure of overall CTRA profile expression, a weighted average association estimate was formed by multiplying each transcript association point estimate by an appropriate signed contrast coefficient (+1 proinflammatory, −1 IFN, −1 antibody). The resulting average gene set association measure was tested for statistically significant departure from the null hypothesis (0 difference in log2 metric) by using a single-sample t test with the SE estimated by nonparametric bootstrap analysis [200 cycles to estimate a stable SE, as recommended (7)]. A two-tailed P value < 0.05 served as the criterion for statistical significance. Following a statistically significant omnibus test of CTRA composite gene set association with well-being phenotypes, ancillary analyses were conducted to identify the specific CTRA subsets driving that overall association. These separate analyses of the 19-gene proinflammatory subset, the 31-gene IFN-related subset, and the three-gene antibody-related subset paralleled the structure of the test for the overall CTRA composite, but did not require weighting because of the homogeneous composition of each subset.

Secondary Analysis: CTRA-Related Transcription Factors. Previous research has implicated several specific transcription control pathways in regulating CTRA gene expression, including up-regulated activity of proinflammatory NF-κB and AP-1 transcription factors and down-regulated activity of IRF-1/2 and STAT family transcription factors (5, 6). To determine whether these pathways might also contribute to the differential gene expression dynamics associated with hedonic or eudaimonic well-being, we conducted Transcription Element Listening System (TELiS) promoter-based bioinformatics analyses (8) on genes empirically observed in this study to show ≥1.5-fold magnitude of differential expression across the range from −2 SD to +2 SD relative to the mean level of eudaimonic or hedonic well-being (with each well-being dimension adjusted for the other and for the covariates listed earlier; Eq. S1). An effect size of 1.5-fold across the range [−2 SD, +2 SD] was selected as a threshold for generally reliable results based on cross-validation studies of alternative effect size thresholds that used the machine learning algorithm PRIM to optimize a threshold forecasting >80% replication of observed differences in gene expression (detailed in ref. 9). We did not use statistical significance criteria such as linear-model P values or false discovery rate q-values to identify differentially expressed genes because previous studies (and analytic results from this study; reported later and in Figs. S1–S5) show that fold-difference criteria yield more replicable gene lists than do statistical test-based criteria (9–14). Dataset S1 lists genes meeting the 1.5-fold criterion for hedonic well-being, and Dataset S2 lists genes meeting that criterion for eudaimonic well-being. The PRIM-derived threshold estimates the general replicability of a gene set–phenotype association, but does not provide any information about the replicability of any association between a phenotype and an individual gene transcript. Transcripts listed in Datasets S1 and S2 should not be treated as individually replicable results, and are presented there only to record the empirical point estimates of association that served as inputs into high-level bioinformatics analyses testing for gene set associations with observed phenotypes. More information on the statistical distinctions between low-level analyses of individual transcripts and high-level analyses of gene sets is provided in previous publications (9, 15).

High-level TELiS bioinformatics analyses were conducted in an a priori hypothesis-testing format using the following TRANSFAC position-specific weight matrices to assess the prevalence of transcription factor-binding motifs (TFBMs) within the proximal promoter sequences of up-regulated and down-regulated gene sets: NF-κB, V$CREL_01; AP-1, V$AP1_Q4; IRF-1, V$IRF1_01; IRF-2, V$IRF2_01; STAT1, V$STAT_01 (8). These specific transcription factors were selected for analysis based on their recurrent emergence as gene regulatory correlates of the CTRA profile in multiple previous studies spanning a diverse range of stressors, including observational studies of human adverse life circumstances (4, 16–20), animal models of experimentally manipulated social stress (21–23), and randomized experiments examining the effect of stress-buffering interventions in humans (20, 24, 25). No other transcription factor hypotheses were examined in this study. Analyses were formatted as a “transcriptional asymmetry test” comparing TFBM prevalence in promoters of up-regulated genes with that observed in promoters of down-regulated genes (to ensure that all analyzed transcripts could potentially be expressed in the PBMC pool, and thereby exclude transcripts that could not be expressed in this cell type but are nevertheless present in the human genome as a whole). TFBM frequencies were assessed by using three parametric variations in the size of the proximal promoter sequence scanned ([−300 bp, +0 bp] relative to the RefSeq transcription start site, [−600, +0], and [−1,000, +200]) and three parametric variations in the TFBM detection stringency (TRANSFAC MatInspector algorithm, mat_sim = 0.80, 0.90, 0.95). Results from each parametric scan were summarized as a (log2-transformed) ratio of TFBM prevalence in up-regulated vs. down-regulated promoters (8), and results were averaged over the nine parametric combinations of promoter length and scan stringency for statistical testing with a dependent measures t test. Nominal P values for each of the five TFBM targets are reported in Fig. 3A, with asterisks indicating results that reach statistical significance at P < 0.05 after Bonferroni correction for multiple testing. The main text interprets only results that would be significant after correction for multiple testing. Statistical reliability of TELiS results was also assessed by empirical split-half replication analyses and Monte Carlo analyses of statistical power and replication rates as described later (Figs. S1–S5). High-level TELiS results from samples of similar and smaller sizes have been shown to be valid in previous studies replicating TELiS indications using direct measures of transcription factor activity (21, 26–28) and experimental manipulation of transcription factor activity (8, 21). As noted earlier, previous TELiS analyses have already implicated the specific transcription factors analyzed here (NF-κB, AP-1, IRF-1, IRF-2, and STAT-1) in driving the CTRA gene expression profile (4, 16), and multiple studies have replicated those results in the confirmatory testing format used here to explore the potential role of these factors in structuring the gene set transcriptional correlates of hedonic and eudaimonic well-being (17–21, 23–25).

Secondary Analysis: CTRA-Related Cell Types. Previous research has implicated several specific cell subsets within the PBMC pool as potential mediators of CTRA-related differences in gene expression, including up-regulated activity of monocytes and plasmacytoid dendritic cells (pDCs) and down-regulated activity of B lymphocytes (5, 6, 29). To determine whether these leukocyte subsets might also contribute to the differential gene expression dynamics associated with hedonic and eudaimonic well-being, we conducted transcript origin analysis (TOA) using the low-level transcript association point estimates in Datasets S1 and S2 as input, following the general analytic format previously described (29). Briefly, TOA uses external reference data on isolated cell populations (30) to generate a cell type-specific diagnosticity score for each gene that reflects the extent to which that gene is expressed solely or predominantly by a given cell type (e.g., monocyte, pDC, or B cell) (29). TOA diagnosticity scores are z-scores indicating the extent to which a gene is expressed at a notably higher level in one PBMC subset than in all other PBMC subsets (i.e., a z-score indicating the extent of distinctive expression in one cell type) (29). For each cell type examined, average TOA diagnosticity scores were computed for the gene set under study and tested for significant elevation (i.e., difference from the genome-wide average z-score value) using a z-test

(because the population SD of TOA z-scores for a given cell type is already known from the census of human genes surveyed in the external reference study) (29, 30). Significant elevation in the average diagnosticity score across an empirically defined gene set has been shown to provide an accurate indication of cell-specific transcriptional dynamics in previous validation studies (29) [e.g., TOA indications of monocyte involvement in the CTRA (5, 6, 20, 29) have been confirmed in direct studies of immunomagnetically isolated monocytes (16, 19)]. In the present analyses, we first screened for potential involvement of monocytes, pDCs, and B cells by using a first-stage omnibus test assessing average TOA diagnosticity scores for all differentially expressed genes (i.e., the pool of up- and down-regulated genes). Following a significant first-stage test statistic for a given cell type, second-stage ancillary analyses tested specific cell type contributions to the up-regulated gene set and the down-regulated gene set separately. Fig. 3B presents results of second-stage analyses, and all results presented there and interpreted in the text passed first-stage significance testing at P < 0.05 with Bonferroni correction for multiple hypothesis testing. Statistical reliability of TOA results was also assessed by empirical split-half replication analyses (Figs. S6 and S7) and Monte Carlo analyses of statistical power and replication rates as described later (Figs. S1–S5).

Ancillary Analysis of Split-Half Replication. To assess the replicability of high-level results derived from empirically defined gene sets (i.e., TELiS and TOA results based on the transcripts listed in Datasets S1 and S2), we conducted split-half replication studies in which analytic results were derived in the first half of the sample and then tested for concordance with results derived independently from the second half of the sample. Replicability was gauged by correlation of quantitative point estimates of effect size. (Comparison of quantitative statistical significance levels in split-half samples is technically invalid because the 50% reduction in each sample’s size results in substantial underestimation of the true magnitude of statistical significance derived from the whole sample.) Results for TELiS transcription factor analyses showed a correlation of +0.76 in estimated effect size across the five TFBMs analyzed. (We assessed TELiS effect size reliability only for results on the eudaimonic-associated gene set because that set of TELiS results was the only one to emerge as statistically reliable in the total-sample analyses reported in Results. As noted there, no reliable TELiS results emerged for the hedonic-associated gene set, and no consistency would be expected in split-half analyses.) Results for TOA analyses are presented in Figs. S6 and S7 and show good concordance for the first-stage omnibus test of all differentially expressed transcripts (r = 0.73 for eudaimonic-associated genes and r = 0.63 for hedonic-associated genes; Fig. S6) and for second-stage analyses involving separate tests of up- and down-regulated gene sets (r = 0.82 for eudaimonic and r = 0.79 for hedonic; Fig. S7). Thus, despite the known negative bias of split-half replication analyses (as a result of loss of statistical power in half-sized samples), these results indicate that the main bioinformatics findings of this study are stable. However, because split-half replication studies underestimate the true reliability of results derived from the whole sample, we also conducted Monte Carlo analyses of statistical power and result replicability to provide more accurate estimates of the true reliability of the present full-sample results.

Ancillary Monte Carlo Analysis of Statistical Power and Replicability. To provide accurate estimates of the present results’ replicability, we conducted standard Monte Carlo analyses of statistical power and replication rates, as previously described (9, 31, 32). (Monte Carlo studies use massively repeated analysis of simulated datasets to test the accuracy of statistical analysis results when the true effect size is known and analytic results can thus be determined definitively to be true or false.) These analyses also compared the reliability of individual transcript–phenotype association tests with the gene set association tests used in this study to assess the CTRA profile and CTRA-related transcription factors (TELiS) and cell types (TOA). Monte Carlo analyses used the general data characteristics of the present dataset (i.e., sample size, n = 80; differential gene expression effect sizes of 1.1–1.5-fold over the range [−2 SD, +2 SD] of well-being scores; 10,000 significantly expressed genes as estimated by Illumina GenomeStudio software Detection P-Values); stochastic variation was modeled as a normally distributed random variable with mean 0 and dispersion (SD) ranging from 0.1 to 2.0 log2-expression units (i.e., spanning the general range of transcript-specific SDs observed for the genes listed in Datasets S1 and S2, as indicated by horizontal axis box plots in Figs. S1, S2, S4, and S5). Each analysis cycle involved: (i) generation of 80 individual “observed data values” from the standard linear regression model (i.e., ObservedExpressionValue = TrueEffect + RandomVariation, where TrueEffect = TrueEffectSize * RegressorValue, with RegressorValue modeled as a uniformly distributed variable over the range [0, 1] and TrueEffectSize specified as a [log2-transformed] 1.5-fold difference over the range [−2 SD, +2 SD] on an a priori-selected subset of 50 “truly affected genes” corresponding to the general size of the target gene sets analyzed here in analyses of CTRA, TELiS TFBM gene sets, and the set of highly diagnostic genes in TOA, and TrueEffectSize = 0 for the remaining “truly unaffected genes”; and RandomVariation = StochasticVariationMagnitude * RandomNormalDeviate, with StochasticVariationMagnitude ranging from 0.2 to 2.0 on the log2 expression metric and RandomNormalDeviate representing a random observation from a standard normal distribution with a mean value of 0 and an SD of 1); (ii) analysis of the resulting 80 (subject) × 10,000 (expressed gene) data matrix using: (a) a statistical test of individual transcript–phenotype association (i.e., a standard linear regression model testing for association between ObservedExpressionValue and RegressorValue, with P values corrected for multiple testing over 10,000 genes); (b) a simple point estimate of transcript–phenotype association (i.e., comparing the observed point estimate of association between ObservedExpressionValue and RegressorValue to an a priori-selected threshold such as the 1.5-fold criterion used in the present study); (c) a statistical test of gene set association based on genes declared to be differentially expressed according to the transcript–phenotype statistical test (i.e., P value) criterion of ii(a), with the gene set association analysis testing for relative enrichment of the 50 truly affected genes within the set of all genes declared differentially expressed based on the statistical criterion of ii(a) using a binomial test against the expected probability under independence of 50 truly affected genes/10,000 total expressed genes; (d) a statistical test of gene set association based on genes declared differentially expressed according to the point estimate criterion of ii(b) [with gene set association tested using the same binomial analysis described earlier for ii(c); this is the analytic format of the TELiS and TOA hypothesis tests performed on empirically observed point estimates of differential gene expression in Datasets S1 and S2]; and (e) a statistical test of gene set association based on the average point estimate of association for an a priori-selected set of specific genes using the dependent-measures t test approach used to assess association between the 53-gene CTRA composite and well-being phenotypes; followed by (iii) quantification of analytic error rates for each of the association tests in ii(a) to ii(e), with errors defined as follows: (a) “false positives,” defined as the fraction of truly nonaffected genes (i.e., TrueEffectSize = 0) that are declared by the analysis to be differentially expressed; (b) “false negatives,” defined as the fraction of truly affected genes (i.e., TrueEffectSize > 0) that are declared by the analysis to be not differentially expressed; and (iv) quantification of observed replication rates by concordance of results

from tests ii(a) to ii(e) in one Monte Carlo cycle with those of a second Monte Carlo cycle. We performed 1,000 cycles of Monte Carlo analysis and averaged results to estimate the general analytic performance of each test. Analyses were conducted using a Java implementation of the general Monte Carlo, matrix algebra, and numerical analysis algorithms previously described (9, 31, 32). Results are presented in Figs. S1–S5 and summarized in the following sections.

Replication of Gene Set Association. Results shown in Fig. S1 (solid black circles) show that high-level gene set associations of the size observed here for the CTRA composite (∼1.1-fold) are highly replicable when derived from a sample of the present size (N = 80) using the “gene set pooled point estimate” analysis used here [i.e., approach ii(e)]. Results are robust across the range of stochastic variation in gene expression levels observed in this study, with replication rates >80% over the interquartile range of gene-specific residual SDs (represented in Fig. S1 by solid gray boxes ranging from the 25th-percentile value of 0.53 log2 units to the 75th-percentile value of 0.95). These high replication rates emerged from a priori gene set differential expression analysis [test ii(e)] despite the fact that the replicability of individual transcript–phenotype associations (Fig. S1, gray diamonds) was modest by comparison and trended to 0 over much of the observed range of stochastic variation in transcript-specific expression values (i.e., over the interquartile range marked by the gray box in Fig. S1). High replication rates are also observed for the a priori gene set enrichment analysis used for TELiS inferences of transcription factor activity and TOA inferences of cellular origin (Fig. S2A). Replication rates for gene set associations (Fig. S2A, black circles) again exceeded 90% over the range of observed signal-to-noise ratios (Fig. S2A, box-plot), and again proved reliable despite the fact that the individual transcript–phenotype associations it took as inputs were themselves unstable (Fig. S2A, black diamonds). In this high-level gene set enrichment analysis, use of the low-level point estimate test to identify differentially expressed genes [test ii(b)] contributed significantly to the stability of high-level results. “Screening in” genes based on the low-level statistical test of individual transcript–phenotype association [i.e., test ii(a)] resulted in substantially less replicable high-level gene set enrichment results (Fig. S2B, gray circles). How is it that associations between gene sets and phenotypes can reach high levels of reliability when the associations between individual genes and phenotypes from which they are derived do not? High-level bioinformatics tests can capitalize on the “law of large numbers” and the related Central Limit Theorem (33, 34) to substantially reduce the magnitude of sampling variability in estimates of gene set averages (reflecting the shared biological features of a gene set, such as common involvement in a biological process like the CTRA, common regulation by a given transcription factor, or common expression by a given cell type). (Ref. 15 provides a fuller discussion of the differing statistical properties of high-level and low-level gene expression tests.) Stabilization of sampling variability in high-level bioinformatic inferences exemplifies the broader statistical theme that accumulation of many individually noisy indicator variables can yield highly stable estimates of the underlying factors they share in common (e.g., CTRA involvement, transcription factor regulation, cell type of origin) (34, 35). To clarify the mechanisms by which alternative low-level screening criteria affected the reliability of subsequent high-level bioinformatics results, we compared analytic error rates for high-level tests based on gene sets derived from the transcript–phenotype statistical testing criterion [ii(a)] and the point estimation criterion [ii(b)].

Analytic Performance and Statistical Power. Results from Monte Carlo analyses of false-positive and false-negative analytic errors (Fig. S3) showed that the superior performance of high-level analyses based on low-level point estimates did not stem from reduced false-positive errors (i.e., fewer “truly unaffected” transcripts being erroneously forwarded to the high-level analysis). As shown in Fig. S3A, point estimate-based screening actually yielded higher rates of false-positive errors than did statistical testing criteria. Instead, the differing performance of the two low-level screening strategies stemmed from substantially greater rates of false-negative error incurred by the statistical testing criterion (i.e., fewer truly affected transcripts advanced to the high-level analysis). As shown in Fig. S3B, false-negative error rates for the statistical test-based screen (P-value criterion; Fig. S3B, gray diamonds) increased rapidly with increasing stochastic variability in gene expression and exceeded 90% by the median intratranscript SD (0.72 log2 units). In contrast, false-negative error rates in the low-level point estimate-based screen (Fig. S3B, black circles) were maintained at a lower and stable level throughout the range of observed intratranscript SDs. (In Fig. S3B, the near-perfect stability of false-negative error rates with increasing stochastic variability is a byproduct of the fact that the dataset presented used 1.5-fold both as the criterion for point estimate-based screening and as the data-generating TrueEffectSize. However, the superior false-negative control of point estimate-based screening is a general phenomenon and not limited to this particular situation; Fig. S4 provides an example in which the TrueEffectSize is substantially smaller than the point estimate-based screening threshold.) As a consequence of false-negative error rates that approach 100% in the statistical test-based low-level screen, the number of truly affected genes that are advanced into the high-level bioinformatics analyses approaches 0 throughout the range of observed signal-to-noise ratios. This is a consequence of (unnecessary) efforts to control false-positive error rates at the level of individual transcript–phenotype associations. However, as shown by the high replicability of high-level results derived from point estimate-based screening (i.e., Figs. S1, S2A, and S5), stringent control of false-positive errors in low-level association screening is not necessary for the discovery of reliable gene set–phenotype associations. The result-stabilizing (i.e., noise-suppressing) effects of the Central Limit Theorem generally increase the stability of gene set-based averages to yield replicable high-level results across the range of observed low-level signal-to-noise ratios (interquartile boxes in Figs. S1, S2A, and S5). That is true in large part because the high-level statistical tests maintain their own control over analytic error rates and thus render stringent low-level control redundant. If the low-level transcript association inputs into the high-level gene set analyses do not stem from any true association (i.e., they reflect only false positives), the high-level tests will appropriately fail to declare evidence of association (33, 34). That will be true regardless of whether (i) the low-level input gene set has been stringently stripped of false positives [e.g., to the level <0.001% per gene, or 5% errors over a set of 10,000 genes, as in Fig. S3A, with gray diamonds representing the individual transcript–phenotype association test of ii(a)], (ii) the low-level input gene set is subject to fractional contamination by false positives [e.g., 20–30%, as in Fig. S3A, with black circles representing the point estimate-based screen of ii(b)], or (iii) the low-level input gene set is 100% contaminated with false-positive errors (33, 34). In each of these cases, the high-level statistical test maintains accurate control over false-positive error rates in high-level results regardless of the fraction of false-positive errors present in the low-level input gene set (33, 34). This is empirically verified in Fig. S3C, which shows that the high-level gene set enrichment test maintains exactly 5% false-positive error rates regardless of whether the input gene set shows minimal false-positive contamination [e.g., gray diamonds representing input from the low-level statistical association test ii(a)] or more substantial false-positive contamination [e.g., black circles representing high-level results based

Fredrickson et al. www.pnas.org/cgi/content/short/1305419110 4of9 on the low-level point effect criterion of ii(b), the results of which signal-to-noise ratios. As shown in Fig. S5, the sensitivity of high- fall so exactly on 5% across the range of observed signal-to-noise level association tests based on the low-level point estimate screen ratios that they occlude the underlying gray diamonds]. The will eventually degrade as noise levels increase and as the mag- analytic results in Fig. S3C are based on the binomial gene set nitude of true signal (TrueEffectSize) decreases (compare results enrichment test [tests ii(c) and ii(d)], but identical results occur in Fig. S5A for a comparatively large TrueEffectSize with those for the continuous gene set differential expression test [test ii(e)]. in Fig. S5B for a smaller TrueEffecSize, holding constant the These results verify that the basic statistical validity of significant range of stochastic variation). However, the point estimate-based results from a high-level gene set association test is not affected screen is substantially more robust than statistical test-based by the magnitude of truly unaffected (i.e., false positive) genes screening to the joint effects of high stochastic variability and that contaminate its input dataset. massively parallel testing. This is particularly true over the range When substantive interest focuses only on testing a priori bi- of stochastic variation values representative of gene expression ological hypotheses represented by gene lists (as opposed to data (box-plotted interquartile ranges). Fig. S5 shows this effect discovering new associations between phenotypes and specific in the context of the high-level test of continuous gene set as- individual transcripts), low-level statistical testing of individual sociation [ii(e)], and Fig. 
S1 illustrates the same general effects of gene–phenotype associations is not only unnecessary but can increasing and decreasing signal on the high-level test actually undermine the identification of replicable gene set– of discrete gene set enrichment analysis [ii(c) and ii(d)]. phenotype associations. Fig. S4D shows one way this happens as the high-level bioinformatic test incurs a substantially higher rate Implications of the Present Results. Both Monte Carlo analyses of of false-negative errors when the input gene set is subject to analytic performance and split-half replication studies support the aggressive false-positive error control in low-level screening. In reliability of the results observed in our primary analyses. The this case, the high-level test fails to identify the (true) gene set– present study is underpowered for analyses that seek to discover phenotype association because the stringent control of false- new reliable associations between individual transcript expression positive errors in the low-level screen substantially reduces the and the well-being phenotypes assessed here. However, this study number of genes entering into the high-level test, and thus de- maintains ample statistical power for the focused testing of prives the high-level test of the statistical power required to ac- specific a priori hypotheses relating well-being phenotypes to curately estimate the magnitude of observed gene set–phenotype variations in the aggregate expression of CTRA-related gene sets association and distinguish it from the null hypothesis (33, 34). of size 50 or greater. Moreover, the CTRA-related gene sets Despite the fact that the alternative point estimate-based screen showing significant association with the present well-being phe- admits a greater false-positive contamination into the input gene notypes have previously been shown to be replicably associated set (Fig. 
S4A), it also advances substantially greater amounts of with other major indicators of human ill-being, including the truly informative data and thus yields a lower false-negative rate CTRA-defining proinflammatory, type I IFN-related, and anti- (Fig. S4D). The second way that low-level overtesting under- body synthesis-related genes (4, 16–23, 29); TELiS promoter-based mines the discovery of replicable gene set associations involves a gene sets used in analysis of immunoregulatory transcription reduction in the pool of true results that are available to be con- factor activity (4, 16, 20, 23, 25); and cell type-diagnostic gene cordantly observed (i.e., replicated) in future studies. Whereas sets used in TOA analyses of mediating cell types (20, 23, 25, 29). nonreplicated associations are often assumed to reflect false-posi- Thus, the present data add to a growing body of results relating tive errors in an initial discovery study, statistical overtesting in the these specific gene sets to major dimensions of individual well- second study can also prevent the appropriate replication of a true being vs. ill-being, and the distinctive associations of hedonic and association that was accurately identified in a discovery study. eudaimonic well-being with the CTRA “gene expression refer- Although point estimate-based screening of low-level as- ence point” clarifies which specific component of well-being sociations significantly improves the reliability of higher-order contributes most directly to the observed epidemiological asso- bioinformatic inferences, it is not immune to the effects of low ciation between overall well-being and human health (36, 37).
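The two statistical points made above can be checked with a small Monte Carlo sketch. First, averaging many noisy per-gene estimates stabilizes the gene set estimate, with its SD shrinking roughly as 1/sqrt(k). Second, the high-level test keeps its nominal false-positive rate even when every input gene is a false positive.

This sketch is illustrative only. It assumes independent Gaussian noise with a common SD. It also assumes that the high-level test is a one-sample t-test of the mean of k = 50 log2 point estimates against zero. The function names and parameter values are hypothetical and are not the analysis code used for Figs. S1–S5.

```python
import math
import random

# Two-sided 5% critical value of Student's t with 49 df.
# Valid only for gene sets of k = 50 (df = k - 1).
T_CRIT_DF49 = 2.0096


def mean_and_sd(values):
    """Sample mean and (n - 1)-denominator standard deviation."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var)


def null_gene_set_fpr(n_sims=4000, k=50, noise_sd=0.5, seed=1):
    """Share of gene sets declared associated when no member gene is truly associated.

    Every per-gene log2 point estimate is pure noise, so the input gene set is
    100% false positives. The hypothetical high-level test is a one-sample
    t-test of the gene set mean against zero.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        estimates = [rng.gauss(0.0, noise_sd) for _ in range(k)]
        m, sd = mean_and_sd(estimates)
        if abs(m) / (sd / math.sqrt(k)) > T_CRIT_DF49:
            hits += 1
    return hits / n_sims


def sd_of_gene_set_mean(n_sims=4000, k=50, noise_sd=0.5, seed=2):
    """Empirical SD of the gene set average; theory gives noise_sd / sqrt(k)."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0.0, noise_sd) for _ in range(k)) / k
             for _ in range(n_sims)]
    return mean_and_sd(means)[1]


if __name__ == "__main__":
    print(f"false-positive rate under the null: {null_gene_set_fpr():.3f}")
    print(f"SD of gene set mean: {sd_of_gene_set_mean():.4f} "
          f"(theory {0.5 / math.sqrt(50):.4f})")
```

Under these assumptions, the null rejection rate should land near 0.05. The SD of the gene set mean should land near 0.5/sqrt(50), or about 0.071, which is roughly seven times smaller than the per-gene noise SD.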

1. Keyes C (2006) The Mental Health Continuum-Short Form (MHC-SF) for adults. Available at http://calmhsa.org/wp-content/uploads/2013/06/MHC-SFEnglish.pdf. Accessed July 12, 2013.
2. Lamers SM, Westerhof GJ, Bohlmeijer ET, ten Klooster PM, Keyes CL (2011) Evaluating the psychometric properties of the Mental Health Continuum-Short Form (MHC-SF). J Clin Psychol 67(1):99–110.
3. Radloff LS (1977) The CES-D scale: A self-report depression scale for research in the general population. Appl Psychol Meas 1:386–401.
4. Cole SW, et al. (2007) Social regulation of gene expression in human leukocytes. Genome Biol 8(9):R189.
5. Irwin MR, Cole SW (2011) Reciprocal regulation of the neural and innate immune systems. Nat Rev Immunol 11(9):625–632.
6. Cole S (2012) Social regulation of gene expression in the immune system. The Oxford Handbook of Psychoneuroimmunology, ed Segerstrom S (Oxford Univ Press, New York), pp 254–273.
7. Efron B, Tibshirani RJ (1993) An Introduction to the Bootstrap (Chapman and Hall, New York).
8. Cole SW, Yan W, Galic Z, Arevalo J, Zack JA (2005) Expression-based monitoring of transcription factor activity: The TELiS database. Bioinformatics 21(6):803–810.
9. Cole SW, Galic Z, Zack JA (2003) Controlling false-negative errors in microarray differential expression analysis: A PRIM approach. Bioinformatics 19(14):1808–1816.
10. Shi L, et al.; MAQC Consortium (2006) The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 24(9):1151–1161.
11. Guo L, et al. (2006) Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotechnol 24(9):1162–1169.
12. Witten DM, Tibshirani R (2007) A Comparison of Fold-Change and the t-Statistic for Microarray Data Analysis (Stanford Univ Press, Stanford, CA).
13. Shi L, et al. (2008) The balance of reproducibility, sensitivity, and specificity of lists of differentially expressed genes in microarray studies. BMC Bioinformatics 9(suppl 9):S10.
14. Shi L, et al.; MAQC Consortium (2010) The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 28(8):827–838.
15. Cole SW (2010) Elevating the perspective on human stress genomics. Psychoneuroendocrinology 35(7):955–962.
16. Miller GE, et al. (2008) A functional genomic fingerprint of chronic stress in humans: Blunted glucocorticoid and increased NF-kappaB signaling. Biol Psychiatry 64:266–272.
17. Miller GE, et al. (2009) Low early-life social class leaves a biological residue manifested by decreased glucocorticoid and increased proinflammatory signaling. Proc Natl Acad Sci USA 106(34):14716–14721.
18. Chen E, Miller GE, Kobor MS, Cole SW (2011) Maternal warmth buffers the effects of low early-life socioeconomic status on pro-inflammatory signaling in adulthood. Mol Psychiatry 16(7):729–737.
19. O’Donovan A, et al. (2011) Transcriptional control of monocyte gene expression in post-traumatic stress disorder. Dis Markers 30(2-3):123–132.
20. Antoni MH, et al. (2012) Transcriptional modulation of human leukocytes by cognitive-behavioral stress management in women undergoing treatment for breast cancer. Biol Psychiatry 71:366–372.
21. Cole SW, et al. (2010) Computational identification of gene-social environment interaction at the human IL6 locus. Proc Natl Acad Sci USA 107(12):5681–5686.
22. Tung J, et al. (2012) Social environment is associated with gene regulatory variation in the rhesus macaque immune system. Proc Natl Acad Sci USA 109(17):6490–6495.
23. Cole SW, et al. (2012) Transcriptional modulation of the developing immune system by early life social adversity. Proc Natl Acad Sci USA 109(50):20578–20583.
24. Creswell JD, et al. (2012) Mindfulness-Based Stress Reduction training reduces loneliness and pro-inflammatory gene expression in older adults: A small randomized controlled trial. Brain Behav Immun 26(7):1095–1101.
25. Black DS, et al. (2013) Yogic meditation reverses NF-κB and IRF-related transcriptome dynamics in leukocytes of family dementia caregivers in a randomized controlled trial. Psychoneuroendocrinology 38(3):348–355.
26. Irwin MR, et al. (2008) Sleep loss activates cellular inflammatory signaling. Biol Psychiatry 64(6):538–540.
27. Irwin MR, Wang M, Campomayor CO, Collado-Hidalgo A, Cole S (2006) Sleep deprivation and activation of morning levels of cellular and genomic markers of inflammation. Arch Intern Med 166(16):1756–1762.
28. Brown HJ, et al. (2010) Gene expression and transcription factor profiling reveal inhibition of transcription factor cAMP-response element-binding protein by gamma-herpesvirus replication and transcription activator. J Biol Chem 285(33):25139–25153.
29. Cole SW, Hawkley LC, Arevalo JM, Cacioppo JT (2011) Transcript origin analysis identifies antigen-presenting cells as primary targets of socially regulated gene expression in leukocytes. Proc Natl Acad Sci USA 108(7):3080–3085.
30. Su AI, et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101:6062–6067.
31. Kroese DP, Taimre T, Botev ZI (2011) Handbook of Monte Carlo Methods (Wiley, Hoboken, NJ).
32. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2002) Numerical Recipes in C: The Art of Scientific Computing (Cambridge Univ Press, New York).
33. Miller RG (1986) Beyond ANOVA: Basics of Applied Statistics (Wiley, New York).
34. Casella G, Berger RL (1990) Statistical Inference (Brooks-Cole, Belmont, CA).
35. Dillon WR, Goldstein M (1984) Multivariate Analysis: Methods and applications (Wiley, New York).
36. Friedman EM (2012) Well-being, aging, and immunity. The Oxford Handbook of Psychoneuroimmunology, ed Segerstrom S (Oxford Univ Press, New York), pp 37–62.
37. Ryff CD, et al. (2006) Psychological well-being and ill-being: Do they have distinct or mirrored biological correlates? Psychother Psychosom 75(2):85–95.

Fig. S1. Replicability of results from gene set differential expression analyses and gene-specific association analyses. Replicability of high-level gene set differential expression results was tested in 1,000 Monte Carlo analyses modeling the empirical dataset analyzed here. True effect sizes ranged from 1.1-fold to 2.0-fold, and magnitudes of gene-specific stochastic variation were representative of those observed for genes in Datasets S1 and S2. These magnitudes are summarized by horizontal-axis box-and-whisker plots, with boxes spanning the interquartile range of intragene SDs and whiskers spanning the 1st to the 99th percentiles. Gene set differential expression analyses tested the average value of 50 linear regression-based point estimates of transcript–phenotype association. Consistency of statistical significance results across replicate independent Monte Carlo experiments was expressed as percent replication of positive associations. These associations were derived from low-level regression analyses of individual transcript–phenotype associations (gray diamonds) and high-level analyses of gene set–phenotype associations (black circles). Representative results are provided for true association effect sizes of 1.25-fold (A) and 1.10-fold (B). Additional details are specified in SI Methods, Ancillary Monte Carlo Analysis of Statistical Power and Replicability.
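The replication comparison summarized in Fig. S1 can be sketched as follows, under deliberately simplified assumptions:

- the per-gene standard error (`se`, on the log2 scale) is known;
- sampling error is independent and Gaussian;
- the gene-level tests use a Bonferroni correction across 10,000 transcripts;
- the gene set test is a z test of the mean of 50 point estimates.

Replication is counted here when an association that is significant in one simulated experiment is again significant, in the same direction, in an independent one. The function `replication_rates` and its defaults are hypothetical and are not meant to reproduce the published simulations.

```python
import math
import random
from statistics import NormalDist


def replication_rates(n_sims=1000, k=50, fold=1.25, se=0.1,
                      n_transcripts=10_000, alpha=0.05, seed=7):
    """Estimated replication rates for gene-level vs. gene set-level tests (illustrative).

    Two independent experiments measure the same k truly affected genes.
    Gene-level test: two-sided z test of each log2 estimate, Bonferroni-corrected
    over n_transcripts (an assumed correction). Gene set test: two-sided z test
    of the mean of the k estimates at level alpha. Returns, for each level, the
    proportion of experiment-1 hits that are significant in the same direction
    in experiment 2.
    """
    rng = random.Random(seed)
    beta = math.log2(fold)                 # true effect on the log2 scale
    z_gene = NormalDist().inv_cdf(1 - alpha / (2 * n_transcripts))
    z_set = NormalDist().inv_cdf(1 - alpha / 2)
    se_set = se / math.sqrt(k)             # SE of an average of k estimates
    gene_hits = gene_reps = set_hits = set_reps = 0
    for _ in range(n_sims):
        exp1 = [rng.gauss(beta, se) for _ in range(k)]
        exp2 = [rng.gauss(beta, se) for _ in range(k)]
        for b1, b2 in zip(exp1, exp2):
            if abs(b1) / se > z_gene:
                gene_hits += 1
                if abs(b2) / se > z_gene and (b1 > 0) == (b2 > 0):
                    gene_reps += 1
        m1, m2 = sum(exp1) / k, sum(exp2) / k
        if abs(m1) / se_set > z_set:
            set_hits += 1
            if abs(m2) / se_set > z_set and (m1 > 0) == (m2 > 0):
                set_reps += 1
    gene_rate = gene_reps / gene_hits if gene_hits else float("nan")
    set_rate = set_reps / set_hits if set_hits else float("nan")
    return gene_rate, set_rate


if __name__ == "__main__":
    g, s = replication_rates()
    print(f"gene-level replication: {g:.2f}; gene set replication: {s:.2f}")
```

With these settings, each gene-level test must clear a very stringent threshold, so its replication rate stays far below that of the gene set mean. The gene set mean has a standard error smaller by a factor of sqrt(50) and is significant in essentially every replicate.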

Fig. S2. Effect of low-level gene screening on replicability of results from high-level gene set analyses. Replication rates for high-level gene set enrichment tests based on alternative low-level association screens: (A) comparing a low-level linear regression point estimate of transcript–phenotype association to an a priori-defined threshold (1.5-fold shown) and (B) statistical testing of low-level transcript–phenotype association by linear regression with correction for multiple hypothesis testing across 10,000 analyzed transcripts. Box-and-whisker plots indicate the empirical range of stochastic variation in gene expression, and replicability of high-level gene set enrichment results (circles) is compared with replicability of individual transcript–phenotype associations (diamonds) for each type of low-level screen.
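The two low-level screens compared in Figs. S2–S5 can be sketched below, together with a binomial enrichment test of the general kind referred to as tests ii(c) and ii(d). This is an illustration under stated assumptions, not the original implementation:

- The multiple-testing correction is taken to be Bonferroni, consistent with the example of 5% errors over a set of 10,000 genes.
- Per-transcript tests are z tests with known standard errors.
- The enrichment test asks whether more gene set members pass the screen than an assumed background pass rate would predict.

All function names are hypothetical.

```python
import math
from statistics import NormalDist


def point_estimate_screen(log2_estimates, fold=1.5):
    """Screen (A): keep indices whose |log2 point estimate| >= log2(fold)."""
    cut = math.log2(fold)
    return [i for i, b in enumerate(log2_estimates) if abs(b) >= cut]


def corrected_test_screen(log2_estimates, std_errors, alpha=0.05):
    """Screen (B): two-sided z test per transcript, Bonferroni-corrected (assumed)."""
    m = len(log2_estimates)
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return [i for i, (b, s) in enumerate(zip(log2_estimates, std_errors))
            if abs(b) / s > z_crit]


def binomial_enrichment_p(n_set, n_set_passing, background_rate):
    """Upper-tail P(X >= n_set_passing) for X ~ Binomial(n_set, background_rate).

    n_set: number of genes in the a priori gene set.
    n_set_passing: how many of them pass the low-level screen.
    background_rate: assumed genome-wide proportion of genes passing the screen.
    """
    return sum(math.comb(n_set, x) * background_rate ** x
               * (1 - background_rate) ** (n_set - x)
               for x in range(n_set_passing, n_set + 1))
```

The two screens differ only in how a transcript earns entry into the high-level test. Screen (A) compares the point estimate with a fixed fold-change threshold. Screen (B) divides by the standard error and then applies a threshold that grows more stringent as the number of tested transcripts increases.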

Fig. S3. Analytic error rates for high- and low-level association tests using alternative low-level association screening criteria. Monte Carlo analyses of high-level gene set enrichment (as in Fig. S2) quantified two error rates. False-positive errors are truly unaffected transcripts declared to be associated with phenotype. False-negative errors are truly affected transcripts declared not to be associated with phenotype. Two alternative low-level association screens are compared: point estimate comparison with a preselected threshold (1.5-fold shown; circles), and statistical testing by linear regression (P < 0.05 with correction for multiple testing; diamonds). Their effects are shown for low-level tests of individual transcript–phenotype association (A and B) and for high-level tests of gene set enrichment (C and D).
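The two error rates defined in the Fig. S3 legend can be tallied from simulated truth and test decisions, as in this minimal sketch. The denominators used here are our reading of the legend: all truly unaffected transcripts for the false-positive rate, and all truly affected transcripts for the false-negative rate. The function name `error_rates` is hypothetical.

```python
def error_rates(truly_affected, declared_associated):
    """False-positive and false-negative rates for one simulated experiment.

    Both arguments are equal-length sequences of booleans, one per transcript.
    Assumed denominators:
      FP rate = unaffected transcripts declared associated / all unaffected
      FN rate = affected transcripts declared not associated / all affected
    Returns (fp_rate, fn_rate); a rate is NaN if its denominator is zero.
    """
    if len(truly_affected) != len(declared_associated):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(truly_affected, declared_associated))
    n_unaffected = sum(1 for t, _ in pairs if not t)
    n_affected = len(pairs) - n_unaffected
    fp = sum(1 for t, d in pairs if not t and d)
    fn = sum(1 for t, d in pairs if t and not d)
    fp_rate = fp / n_unaffected if n_unaffected else float("nan")
    fn_rate = fn / n_affected if n_affected else float("nan")
    return fp_rate, fn_rate
```

Averaging these per-replicate rates across Monte Carlo replicates, separately for each level of stochastic variation, would yield summaries of the kind plotted in the figure.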

Fig. S4. Analytic error rates for alternative low-level transcript–phenotype association tests in the context of a small true effect size. (A) False-positive and (B) false-negative error rates for a 1.25-fold effect screened by using the point estimate criterion of 1.5-fold (compare with Fig. S3 A and B). Box-and-whisker plots indicate the empirical range of stochastic variation in gene expression values. Data points show replicability of low-level transcript–phenotype associations based on the point estimate criterion (circles) or on statistical testing by linear regression with P values corrected for multiple testing (diamonds).

Fig. S5. Effect of true effect size on replicability of gene set enrichment results derived from alternative low-level association screens. Replication rates were assessed for high-level tests of gene set enrichment in the presence of a 1.5-fold (A) or 1.25-fold (B) true effect. Box-and-whisker plots indicate the empirical range of stochastic variation in gene expression values. Data points indicate replicability of high-level gene set enrichment results based on low-level screening by point estimate criteria (circles) or statistical testing criterion (diamonds).

Fig. S6. Split-half replication analysis of omnibus transcript origin results. TOA assessed the cellular origins of genes showing differential expression (up- and down-regulated) in association with eudaimonic well-being (A) and hedonic well-being (B) in separate analyses of each split-half sample (indicated by black and gray bars, with 40 subjects each). Data represent the mean (±SEM) TOA cell-type diagnosticity score for each PBMC subset, with P values indicating significance of results derived from each half-sample, and consistency of effect size point estimates quantified by Pearson correlation coefficient.
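The split-half consistency check described for Figs. S6 and S7 can be sketched in three steps. First, subjects are partitioned into two halves (40 each, as in the legend). Second, the TOA analysis is run separately in each half (not shown here). Third, the per-cell-type point estimates from the two halves are correlated.

The legend does not state how the halves were formed. Random assignment, the seed, and the helper names below are assumptions made for illustration.

```python
import math
import random


def split_half(subject_ids, seed=0):
    """Randomly partition subjects into two halves of (nearly) equal size."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def pearson_r(x, y):
    """Pearson correlation between paired point estimates from the two halves."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In use, each argument to `pearson_r` would hold one diagnosticity score per PBMC subset from one half-sample, listed in the same cell-type order for both halves.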

Fig. S7. Split-half replication analysis of transcript origin results for up-regulated and down-regulated genes. Split-half replication consistency was assessed as in Fig. S6 for results from separate transcript origin analyses of up- and down-regulated transcripts for genes differentially expressed in association with eudaimonic well-being (A) and hedonic well-being (B).

Other Supporting Information Files

Dataset S1 (XLS) Dataset S2 (XLS)
