ENVIRONMENTAL HEALTH ehp PERSPECTIVES ehponline.org
450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking During Pregnancy
Bonnie R. Joubert, Siri E. Håberg, Roy M. Nilsen, Xuting Wang, Stein E. Vollset, Susan K. Murphy, Zhiqing Huang, Cathrine Hoyo, Øivind Midttun, Lea A. Cupul-Uicab, Per M Ueland, Michael C. Wu, Wenche Nystad, Douglas A. Bell, Shyamal D. Peddada, Stephanie J. London
http://dx.doi.org/10.1289/ehp.1205412
Online 31 July 2012
National Institutes of Health U.S. Department of Health and Human Services Page 1 of 41
450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related
to Maternal Smoking During Pregnancy
Bonnie R. Joubert 1, Siri E. Håberg 2, Roy M. Nilsen 3, Xuting Wang 1, Stein E. Vollset 2,4 , Susan
K. Murphy 5, Zhiqing Huang 5, Cathrine Hoyo 5,
Øivind Midttun 6, Lea A. Cupul Uicab 1, Per M Ueland 4, Michael C. Wu 7, Wenche Nystad 2,
Douglas A. Bell 1, Shyamal D. Peddada 1, Stephanie J. London 1*
1 Division of Intramural Research, National Institute of Environmental Health Sciences, National
Institutes of Health, Department of Health and Human Services, Research Triangle Park, North
Carolina, USA; 2 Norwegian Institute of Public Health, Oslo, Norway; 3 Haukeland University
Hospital, Bergen, Norway; 4 University of Bergen, Bergen, Norway; 5 Duke University School
of Medicine, Durham NC, USA; 6 Bevital A/S, Laboratoriebygget, Bergen, Norway; 7 University
of North Carolina, Chapel Hill, NC, USA
* Correspondence should be addressed to: SJ London, 111 T.W. Alexander Drive, Rall Bldg, A306, RTP, NC 27709 Tel: (919) 541 5772, Fax: (919) 541 2511 [email protected]
Short title: DNA Methylation in Newborns of Smoking Mothers
Key words: Epigenetics, epigenome wide, in utero , maternal smoking, methylation
1
Page 2 of 41
Acknowledgments
We are grateful to the participating families in Norway who take part in this ongoing cohort study. We thank Rolf Erik Kolstad, Arild Sunde, and Kari Harbak from the MoBa
Biobank, and Huiling Li, Gary Pittman, Laura Wharey, and Kevin Gerrish from the NIEHS for expert technical assistance. We acknowledge Grace Chiu of Westat, Inc., Shuangshuang Dai of
SRA International, Inc., and John Grovenstein of the NIEHS, for computing support and Dmitri
Zaykin, Paul Wade, and Min Shi of the NIEHS for helpful comments on the manuscript. O.M. is employed by Bevital A/S, Loboratoriebygget, Bergen, Norway.This research was supported in part by the Intramural Research Program of the NIH, NIEHS (Z01 ES 49019). The Norwegian
Mother and Child Cohort Study is supported by the Norwegian Ministry of Health and the
Ministry of Education and Research, NIH/NIEHS (contract no NO ES 75558), NIH/NINDS
(grant no.1 UO1 NS 047537 01), and the Norwegian Research Council/FUGE (grant no.
151918/S10). The Newborn Epigenetics STudy is supported by NIH grants R21ES01497,
R01ES016772, R01DK085173, P30ES011961 and by the Duke Comprehensive Cancer Center.
None of the authors has any actual or potential competing financial interests.
Abbreviations 450K: Infinium HumanMethylation450 BeadChip CpG: cytosine guanine dinucleotide DMR: differentially methylated region MN: mononuclear cells MoBa: Norwegian Mother and Child Cohort Study NEST: Newborn Epigenetics Study PAH: polycyclic aromatic hydrocarbons PM: polymorphonuclear cells XRE: xenobiotic response element
2
Page 3 of 41
Abstract Background : Epigenetic modifications, such as DNA methylation, due to in utero exposures
may play a critical role in early programming for childhood and adult illness. Maternal smoking
is a major risk factor for multiple adverse health outcomes in children but the underlying
mechanisms are unclear.
Objective : We investigated epigenome wide methylation in cord blood of newborns in relation
to maternal smoking during pregnancy.
Methods : We examined maternal plasma cotinine (an objective biomarker of smoking)
measured during pregnancy in relation to DNA methylation at 473,844 CpG sites (CpGs) in
1,062 newborn cord bloods from the Norwegian Mother and Child Cohort Study (MoBa) using
the Infinium HumanMethylation450 BeadChip (450K).
Results : We found differential DNA methylation at epigenome wide statistical significance (p
value<1.06x10 7) for 26 CpGs mapped to 10 genes. We replicated findings for CpGs in AHRR ,
CYP1A1 , and GFI1 at strict Bonferroni corrected statistical significance in a U.S. birth cohort.
AHRR and CYP1A1 play a key role in the aryl hydrocarbon signaling pathway, which mediates
the detoxification of the components of tobacco smoke. GFI1 is involved in diverse
developmental processes but has not previously been implicated in responses to tobacco smoke.
Conclusions : We identified a set of genes with methylation changes present at birth in children
whose mothers smoked during pregnancy. This is the first study of differential methylation
across the genome in relation to maternal smoking during pregnancy using the 450K platform.
Our findings implicate epigenetic mechanisms in the pathogenesis of the adverse health
outcomes associated with this important in utero exposure.
3
Page 4 of 41
Introduction
Maternal smoking during pregnancy is a major risk factor for adverse health outcomes in children including low birth weight, some childhood cancers, reduced lung function and early respiratory illnesses (Office of the Surgeon General 2006). Recent evidence suggests that maternal smoking during pregnancy leads to obesity and elevated blood pressure in children
(Brion et al. 2008; Cupul Uicab et al. 2012). The underlying mechanisms for the diverse effects of maternal smoking during pregnancy on offspring may involve epigenetic modifications such as DNA methylation.
DNA cytosine methylation plays a key role in modulating the transcriptional potential of the genome and may influence the development of complex human diseases (Feinberg 2010).
Changes to DNA methylation can occur throughout life but much of the epigenome is established during embryogenesis and early development of the fetus (Reik 2007).
Data from animal models demonstrate that maternal exposures, such as dietary methyl donors (Hollingsworth et al. 2008) and bisphenol A (Dolinoy et al. 2007) can impact offspring phenotypes via DNA methylation. A few human studies have examined epigenetic alterations in relation to maternal smoking during pregnancy and reported it to be associated with global methylation of leukocyte DNA using a [ 3H] methyl acceptance assay (Terry et al. 2008), global
(LINE 1 and AluYb8) methylation in human placenta (Wilhelm Benartzi et al. 2012), or differential methylation at cytosine guanine dinucleotide (CpG) sites [CpG specific methylation using the Illumina Infinium HumanMethylation27 Beadchip (27K) measuring approximately
27,000 CpGs] in human placenta (Suter et al. 2011). Maternal tobacco use during pregnancy has also been associated with global (LINE 1 and AluYb8) and CpG specific methylation (Illumina
GoldenGate Cancer methylation panel I measuring 1,505 CpGs) in buccal cells from children
4
Page 5 of 41
(Breton et al. 2009). To date, the effect of maternal smoking on differential DNA methylation
has not been evaluated with more comprehensive epigenomic coverage than offered by the 27K
platform. Using the Infinium HumanMethylation450 Beadchip (450K), which measures CpG
methylation at over 470,000 CpGs, we evaluated the relationship between maternal smoking and
DNA methylation in 1,062 infant cord blood samples from a birth cohort in Norway. We
assessed maternal smoking objectively by measuring cotinine, a sensitive biomarker, in maternal
plasma samples, and replicated our findings in an independent birth cohort study from the U.S.
To our knowledge, this is the largest human study of any in utero exposure in relation to DNA
methylation at birth using the 450K platform with improved epigenome wide coverage.
Methods
Participants in the current analysis were selected from a substudy of the Norwegian
Mother and Child Cohort Study (MoBa) (Magnus et al. 2006; Ronningen et al. 2006) that
evaluated the association between maternal plasma folate during pregnancy and childhood
asthma status at age 3 years (Haberg et al. 2011). We analyzed 1,062 participants of this study
who had available cord blood samples and non missing data for maternal cotinine and covariates.
Umbilical cord blood samples were collected at birth and frozen at 80 0C. All biological
material was obtained from the biobank of the MoBa study (Ronningen et al. 2006). The MoBa
study has been approved by the Regional Committee for Ethics in Medical Research, the
Norwegian Data Inspectorate and the Institutional Review Board of the National Institute of
Environmental Health Sciences, USA, and written informed consent was provided by all
participants. Participants in the replication analysis were part of the Newborn Epigenetics Study
(NEST) and were recruited from prenatal clinics in Durham, North Carolina (Hoyo et al. 2011;
5
Page 6 of 41
Murphy et al. 2012). We selected cord blood DNA samples from 18 newborns born to mothers who reported smoking during pregnancy and 18 whose mothers reported no smoking, all
Caucasians. The NEST study has been approved by the Duke University Institutional Review
Board and written informed consent was provided by all participants.
Bisulfite conversion was performed using the EZ 96 (MoBa samples) or EZ (NEST samples) DNA Methylation kit (Zymo Research Corporation, Irvine, CA) according to manufacturer instructions. We measured methylation at 485,577 CpGs in cord blood using
Illumina’s Infinium HumanMethylation450 BeadChip (Bibikova et al. 2011; Sandoval et al.
2011). Illumina’s GenomeStudio® Methylation module version 1.0 (Illumina Inc., San Diego,
CA, 2012) was used to calculate the methylation level at each CpG as the beta value (β=intensity of the methylated allele (M) /(intensity of the unmethylated allele (U) + intensity of the methylated allele (M) + 100)) (Bibikova et al. 2011). Beta values were then transformed to obtain the logratio, defined as log[β/(1 β)]. We report the detection p value for each beta, which represents a statistical test for the difference between the signal for a given probe and background (the average for all negative controls).
Maternal plasma cotinine concentrations in the MoBa samples were measured using liquid chromatography tandem mass spectrometry (LC MS/MS) (Midttun et al. 2009). With guidance from a previous study with data on circulating cotinine in pregnant women (Shaw et al.
2009), we created four categories of exposure (undetectable: ≤0 nmol/L; low: >0 56.8 nmol/L; moderate: >56.8 – 388 nmol/L; high: >388 nmol/L), where 56.8 nmol/L indicates active maternal smoking and participants with levels above 56.8 nmol/L were categorized at the median value for this group (388 nmol/L). Smoking in the NEST samples was assessed by maternal self report of smoking during pregnancy and verified by medical records.
6
Page 7 of 41
Quality control protocols are described in detail in Supplemental Material (see
Supplemental Material, pg 3 5). After quality control, 1,062 MoBa subjects and 473,844 CpGs
were analyzed. We examined the association between maternal plasma cotinine and methylation
in cord blood at each CpG using robust linear regression to account for any potential outliers or
heteroskedasticity in the data (Fox and Weisberg 2011). We evaluated maternal age (continuous
variable), maternal education ( more as the referent group), maternal pre pregnancy body mass index (<18.5, 18.5 to <25, 25 to <30, ≥30), maternal physical activity during pregnancy (none, moderate, vigorous), maternal plasma folate (log transformed continuous variable), parity (0 as the referent group, 1, 2, and 3 or more), and gender of the child for associations with maternal plasma cotinine. The final model included variables that were associated with cotinine (p<0.1) and plausibly related to methylation levels, specifically, maternal age, maternal education, and parity. Because our study was a subset of Haberg et al. (Haberg et al. 2011), where cotinine was measured among asthmatic children in a later analytic batch, we adjusted for childhood asthma status at age 3 years although this made little difference in the results. Overall, the crude and adjusted results were extremely similar. We applied Bonferroni correction, adjusting the level of significance from 0.05 to 1.06x10 7. Additional adjustment for principle components to address potential population structure (Rakyan et al. 2011) was explored but not retained because it did not influence the results. All NEST samples and CpGs passed quality control. We used unadjusted linear regression models to examine the association between maternal smoking during pregnancy and methylation in cord blood at each of the 26 CpGs that were significantly associated with plasma cotinine (p<1.06x10 7) in the MoBa population, and calculated a 1 sided p value for each CpG. 7 Page 8 of 41 We applied Bonferroni correction for 26 tests, which adjusted the level of significance to 0.0019. We used the Kolmogorov test to test the null hypothesis that the p values are uniformly distributed on (0,1) against the alternative that the p values were stochastically smaller than uniform distribution on (0,1). Because CpGs corresponding to the same gene may not be independent, and since some genes were represented by multiple CpGs, for such genes we chose the CpG that had the largest p value when using the Kolmogorov test, thus making the Kolmogorov test more conservative against the alternative hypothesis. To assess the potential impact of variation according to white blood cell subtype, we measured DNA methylation using the 450K platform in 21 cord blood samples collected at the same facilities as the NEST samples that had been separated while fresh into mononuclear cells (MN) and polymorphonuclear cells (PM) using Lympholyte ® poly (Cedarlane Laboratories Limited, Hornby, Ontario). We used a paired t test to evaluate differential methylation between PM and MN cell types for our top 26 CpGs. To compare our results with those from other studies, we employed a single CpG lookup approach where we looked at the result in our data for the CpG of interest and did not correct for multiple comparisons. A CpG with a p value<0.05 was considered to be statistically significant. All statistical analyses were performed using R (R Development Core Team 2010) and Bioconductor (Gentleman et al. 2004) packages. Results Epigenome-wide analysis We plotted the –log10(p values) from the robust linear regression for 473,844 CpGs across the genome in 1,062 cord blood samples of participants in the MoBa study (Figure 1). 8 Page 9 of 41 The mean age of study participants was 29.5 years (±4.3), 12.8% had plasma cotinine levels consistent with active smoking (Shaw et al. 2009), and 11.7% reported current smoking during pregnancy (Table 1). The methylation intensities showed bimodal distribution when displayed across all probes (see Supplemental Material, Figure S1A B for beta and logratio values, respectively) but approximately normal distribution for most CpGs when they were considered individually (see Supplemental Material, Figure S1C D). Using conservative Bonferroni correction for 473,844 tests, we observed epigenome wide statistically significant associations (p value <1.06x10 7) between maternal plasma cotinine and DNA methylation for 26 CpGs mapped to 10 genes in both unadjusted and covariate adjusted analyses (Figure 1, Table 2). Four genes included at least four Bonferroni significant CpGs. Among the 26 CpGs, eight were within the coding region of growth factor independent 1 transcription repressor ( GFI1 ) on chromosome 1, four were within the coding region of aryl hydrocarbon receptor repressor ( AHRR ) on chromosome 5, four were in a region upstream of cytochrome P450 isoform CYP1A1 on chromosome 15, and four were within the coding region of myosin 1G ( MYO1G ). Methylation levels of AHRR cg05575921, the CpG with the smallest p value in the analysis, decreased with cotinine in a dose dependent manner and the trend was statistically significant (see Supplemental Material, Figure S2, Jonkheere Terpstra trend test p < 2.2x10 16 ). All other statistically significant AHRR CpGs (see Supplemental Material, Figure S3A), had lower methylation with increasing cotinine levels with the exception of cg23067299, which is upstream of the other significant AHRR CpGs and had higher methylation with increasing cotinine (see Supplemental Material, Table S1). Conversely, CYP1A1 cg05549655 (p=2.38x10 10 ) and other statistically significant CYP1A1 CpGs (see Supplemental Material, Figure S3B), 9 Page 10 of 41 had higher methylation with increasing levels of cotinine. As with AHRR , cotinine was inversely related to methylation for GFI1 cg09935388 (p=2.68x10 31 ) and other statistically significant GFI1 CpGs. All four CpGs in MYO1G had higher methylation with increasing levels of cotinine (Table 2). Other genes in which methylation was associated with cotinine levels at epigenome wide statistical significance are HLA-DPB2 , ENSG00000225718, TTC7B, CNTNAP2, EXT1, and RUNX1 (Table 2). Coefficients, p values, locations and other information for the 100 most statistically significant CpGs are provided in Supplemental Material Table S1. We obtained similar DNA methylation differences when the metric of exposure was maternal self report of smoking during pregnancy rather than measured maternal cotinine levels. Specifically, self reported maternal smoking was related to lower methylation of AHRR CpGs cg03991871, cg23916896, cg05575921, and cg21161138, with similar regression coefficients as in the cotinine analysis, and was epigenome wide statistical significant for cg05575921 (regression coefficient (coef)= 0.131, standard error (se)=0.017, p=2.40 15 ). Self reported maternal smoking was associated with higher methylation of the CYP1A1 CpGs shown in Table 2 but did not achieve epigenome wide statistical significance (e.g. cg05549655 coef=0.039, se=0.010, p=1.72x10 4). Self reported maternal smoking was associated with epigenome wide statistically significant lower methylation of seven of the eight GFI1 CpGs that had significant (p<1.06x10 7) inverse associations with maternal cotinine. Higher methylation of the four MYO1G CpGs statistically significantly associated with maternal cotininewas observed with maternal smoking but did not reach epigenome wide statistical significance. Replication analysis All twenty six CpGs showing epigenome wide statistical significance for the association between plasma cotinine and methylation in MoBa were followed up for replication analysis in 10 Page 11 of 41 cord blood DNA samples from offspring of 18 mothers who reported smoking during pregnancy and 18 mothers who denied smoking during pregnancy from the NEST study (North Carolina, USA) (Table 3). The direction of effect (differential methylation by smoking status) in the NEST replication study was consistent with the discovery study (MoBa) for all 26 CpGs, and the magnitudes of the differences between smokers and non smokers in NEST and those between women with plasma cotinine >56.8 nmol/L versus ≤56.8 nmol/L in MoBa were very similar (Table 3, Figure 2, Spearman’s correlation coefficient = 0.965). The most statistically significant association in both MoBa and NEST data was observed for AHRR cg05575921 with lower methylation for smokers compared to non smokers (Table 3). In the NEST study, a total of five CpGs ( AHRR cg05575921, CYP1A1 cg05549655 and cg11924019, and GFI1 cg09935388 and cg12876356) reached statistical significance after strict Bonferroni correction for 26 tests (p<0.0019), despite the much smaller sample size of the replication study (18 newborns of smoking mothers compared to 18 newborns of non smoking mothers), and a total of 21 out of the 26 CpGs gave a p value < 0.05 (Table 3). The Kolmogorov test showed that the replication p values were systematically smaller than would be expected by chance (p<0.00011). This suggests that it is exceedingly unlikely that the replication findings are false positives and confirms the high degree of replication that we observed. The magnitudes of the differences in methylation between PM and MN cell types (Table S2) were much smaller than the differences in methylation between smokers and non smokers in both the MoBa and NEST study populations (Table 3) and were not statistically significantly different between cell types (p<0.0019 after Bonferroni correction for 26 tests) for the 5 replicated CpGs. 11 Page 12 of 41 Discussion Our study of maternal smoking in relation to epigenome wide DNA methylation in newborns in the MoBa cohort is the largest and most extensive that we know of. In addition to the large sample size, we used a highly reproducible platform that assesses methylation at over 470,000 individual CpGs, providing more comprehensive coverage of the epigenome than other studies of maternal smoking published to date. Further, we assessed maternal smoking with a sensitive assay for cotinine, a well validated biomarker for tobacco smoke. We observed epigenome wide statistically significant associations between maternal smoking in pregnancy, assessed by plasma cotinine levels and methylation in cord blood at 26 CpGs mapping to 10 genes in the Norwegian Birth Cohort (MoBa). In an independent birth cohort from the US, we found a striking degree of replication for our findings. In the NEST replication population, the direction of differential methylation in relation to maternal smoking was consistent with the direction in relation to maternal plasma cotinine for all of the 26 CpGs that were significant (p<1.06x10 7) in the discovery study. In addition, despite the more modest sample size of the replication set (18 newborns born to smoking mothers and 18 born to nonsmokers) estimates for 21 of 26 CpGs had p values less than 0.05. Five CpGs out of the 26 met strict Bonferroni corrected statistical significance in the replication study (p<0.0019): two in CYP1A1 and one in AHRR, genes known to be involved in the detoxification of compounds from tobacco smoke via the AhR signaling pathway, and two CpGs in GFI1 , a gene that has not previously been implicated in responses to tobacco smoke. Our most statistically significant finding in both the replication and discovery analyses was lower methylation with higher levels of self reported or cotinine based evidence of maternal smoking at cg05575921 in AHRR . Remarkably, a recent study in adults, using the same 450K 12 Page 13 of 41 platform, showed lower methylation at this same CpG (cg05575921) in smokers compared with non smokers at epigenome wide statistical significance (Monick et al. 2012). That study observed lower methylation in smokers at this CpG for both lymphoblasts and pulmonary alveolar macrophages (Monick et al. 2012). The authors also studied the functional implications of this methylation change and found that methylation at AHRR cg05575921 was associated with AHRR expression. Thus our data show that a methylation change found in adult smokers and implicated as functionally important in AHRR , a gene involved in a key pathway of response to tobacco smoke components, is already present at birth in newborns due to maternal smoking in pregnancy. Our findings for genes in the AhR pathway make sense biologically; the pathway is known to mediate the effects of toxicants such as polycyclic aromatic hydrocarbons (PAH) in tobacco smoke. PAH bind to AhR causing its translocation to the nucleus and the formation of a heterodimer with the AhR nuclear transporter. This complex binds DNA regulatory sequences, termed xenobiotic response elements (XRE)s, and initiates the expression of CYP1A1 and other genes involved in detoxification of these chemicals (Nguyen and Bradfield 2008). The AhR repressor (AHRR) acts as a negative regulator of AhR activity, suppressing CYP1A1 transcription (Harper et al. 2006). In our study, maternal smoking, assessed objectively by cotinine, displayed a dose dependent association with lower methylation of AHRR CpGs and higher methylation of CYP1A1 CpGs in cord blood. The contrasting effects of maternal smoking during pregnancy on methylation at CpGs in AHRR and CYP1A1 are notable because of the opposing function these genes have in the AhR pathway (Kawajiri and Fujii Kuriyama 2007). While the role of the AhR pathway in response to toxicants is well known, there is increasing identification of the importance of this pathway in the regulation of other processes, 13 Page 14 of 41 such as immune function (Lawrence and Sherr 2012). In addition, AhR has also recently been found to play a crucial role in regulating cigarette smoke extract induced apoptosis in fibroblasts (lung and embryonic) and lung epithelial cells in culture from the mouse (Rico de Souza et al. 2011). We also replicated our novel findings for GFI1, which has not previously been implicated in response to tobacco smoke. GFI1 plays an essential role in diverse developmental processes including hematopoiesis and the development of the inner ear and pulmonary neuroendocrine cells (Duan et al. 2005; Khandanpour et al. 2011). GFI1 influences numerous cellular events such as proliferation, apoptosis, differentiation, lineage decisions, and oncogenesis (Jafar Nejad and Bellen 2004). This gene is part of a complex that enables histone modifications and may also control alternative pre mRNA splicing (Moroy and Khandanpour 2011). Given the pivotal involvement of GFI1 in fundamental development processes, a role in diverse effects of maternal smoking on the offspring is biologically plausible. Although our findings for RUNX1 did not meet strict Bonferroni statistical significance in the replication population (NEST p>0.00019), there were four RUNX1 CpGs in the top 100 results in the MoBa discovery population (see Supplemental Material, Table S1). RUNX1 (also known as AML1 ) is involved in the development of normal hematopoiesis as well as leukemia (Kumano and Kurokawa 2010). Of note, RUNX1 , AhR, and GFI1 are all involved in the regulation of hematopoietic stem cells (Boitano et al. 2010; Khandanpour et al. 2011; Oshima et al. 2011), suggesting the possibility that cross talk between these genes may impact smoking related health outcomes in the offspring. Correlation of DNA methylation at CpGs within the same gene may contribute to the finding of genes with multiple significant CpGs in our analyses. However, if CpGs are not truly 14 Page 15 of 41 independent, then using strict Bonferroni correction for multiple testing, which assumes independent tests, is quite conservative. This adds support for the results that surpassed this strict threshold, particularly the 5 CpGs with corrected statistical significance in both the discovery and replication populations. Cotinine, the biomarker of smoking, was not measured among pregnant women in the replication (NEST) study so we used maternal self report of smoking on questionnaires that was consistent with medical records of smoking. Among our MoBa study participants, 8 of the 136 women (5.9%) with cotinine values consistent with active smoking (≥56.8 nmol/L) reported that they did not smoke during pregnancy. A study of U.S. reproductive age women using NHANES data indicates that U.S. pregnant women also underreport smoking (Dietz et al. 2011). Thus, we expect that some of the NEST participants classified as non smokers might actually have been smokers. However, this type of misclassification should lead to a bias of estimates towards the null, making our replication estimates more conservative than would be expected if smokers had reported their exposure with 100% accuracy. The 450K Beadchip offers greatly improved genomic coverage over the earlier 27K platform. The 450K content includes 99% of RefSeq genes with multiple probes per gene, 96% of CpG islands from the UCSC database, CpG island shores, and additional content selected from whole genome bisulfite sequencing data and input from DNA methylation experts (Bibikova et al. 2011). None of the 26 CpGs with epigenome wide significance in our study were present on the 27K platform. A study using the 27K platform observed differences in DNA methylation associated with smoking status in adults at CpG cg03636183 in F2RL3 , a gene associated with cardiovascular disease (Breitling et al. 2011). Employing a single CpG look up approach (one CpG evaluated, uncorrected for multiple testing), our data provide support for an 15 Page 16 of 41 association between maternal smoking during pregnancy and cord blood DNA methylation at this CpG (coef= 0.020, se=0.009, p=0.016). Another recent study of 27K methylation and adult smoking identified cg19859270 in GPR15 (Wan et al. 2012) . Our single look up approach for this CpG did not provide supporting results (coef = 0.010, se=0.007, p=0.181). Maternal smoking during pregnancy has been associated with CpG specific differential DNA methylation in placental tissue using the 27K Beadchip (Suter et al. 2011), but we found no overlap between the top hits from that study and the top hits in our study using the 450K Beadchip to measure methylation in newborn cord blood samples. In addition to the limitations of comparing the 27K and 450K platforms (the top 26 CpGs from our study were not covered on the 27K platform), it is likely that altered methylation in response to tobacco smoke exposure is different in placental tissue and newborn cord blood. We measured DNA methylation in whole cord blood samples. Because hematopoietic differentiation and methylation status of differentially methylated regions (DMRs) are highly coordinated (Schmidl et al. 2009), significant shifts in cell type pools in the blood should be accompanied by shifts in methylation at dozens, if not hundreds, of cell type specific DMRs. If our findings were simply a reflection of maternal smoking influencing shifts in cell types, we would expect to find many differentially methylated genes, which we did not. Instead, only 10 genes had differences in methylation levels related to cotinine in our data. Notably, the recent paper of Monick et al., 2012 (Monick et al. 2012) supports the notion that unmeasured confounding by cell type does not explain the altered methylation status that we observed in relation to smoking. In that paper, smoking induced alteration of AHRR CpGs (including our top CpG) was seen in both B lymphoblastoid cells and in an independently collected sample of alveolar macrophage cells collected from bronchial lavage. Thus, Monick et 16 Page 17 of 41 al. identified smoking induced signals across two distinct cell types, which strengthens our replicated whole blood findings. While the above evidence suggests that our results are not confounded by cell type, we directly addressed the potential impact of differential cell counts by measuring epigenome wide DNA methylation, using the 450K assay, in 21 cord blood samples that had been separated, while fresh, into the two major cell pools, polymorphonuclear cells (PM) and mononuclear cells (MN). In these 21 paired samples differences in methylation by cell type were very small. These small differences were statistically significant (p<0.0019) for 3 of the top 26 CpGs associated with maternal plasma cotinine in MoBa, but these CpGs were not significantly associated with maternal smoking in the NEST population. Furthermore, the magnitude of the difference in median methylation between PM and MN cell types was much smaller than the difference in median methylation between smokers and non smokers in both the MoBa and NEST study populations. For the percent difference in median methylation by cell type, the maximum was 3.1% and the mean was 1.0%. In contrast, for the percent difference in median methylation between smokers and non smokers measured in whole blood, which is a mixture of these two major cell types (PM and MN), the maximum was 13.7% (MoBa) and 15.1% (NEST) and the mean was 5.3% (MoBa) and 5.0% (NEST). For our top CpG AHRR cg05575921, the percent difference in median methylation was 0.31% by cell type compared to 7.52% for smokers compared to non smokers in MoBa (7.67% in NEST). Thus, the differences in methylation between these two major cell pools are much smaller than the differences in methylation by smoking that we observed in whole blood, suggesting that confounding by cell type is unlikely to explain our findings of differential methylation related to smoking. 17 Page 18 of 41 It is possible that differences in methylation may exist in more refined subclassifications of cell types that we did not specifically examine. But again, our top finding for AHRR cg05575921 due to maternal smoking was also reported in adult smokers in two different cell types, alveolar lung macrophage DNA and lymphoblast DNA (Monick et al. 2012). These various lines of evidence give strong support for the conclusion that our replicated findings are not explained by effects of maternal smoking on the relative prevalence of white blood cell subtypes that differ with regard to CpG methylation. Conclusions In utero exposure to maternal smoking is an important risk factor for numerous adverse outcomes in children and adults. With a hypothesis free epigenome wide screen with replication in a second population, we observed strong evidence that maternal smoking during pregnancy is associated with cord blood methylation of genes in the AhR signaling pathway, important for the detoxification of xenobiotics in tobacco smoke, and methylation of a novel gene not previously implicated in response to tobacco smoke that is involved in fundamental developmental processes. These results suggest that epigenetic mechanisms reflected by DNA methylation may underlie some of the well documented impacts of maternal smoking on offspring. Our identification of differential methylation in genes known to be involved in the response to tobacco related compounds, in addition to a novel gene, demonstrates the value of using this approach to elucidate the epigenetic effects of in utero exposures. 18 Page 19 of 41 References Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. 2011. High density DNA methylation array with single CpG site resolution. Genomics 98(4):288 295. Boitano AE, Wang J, Romeo R, Bouchez LC, Parker AE, Sutton SE, et al. 2010. Aryl hydrocarbon receptor antagonists promote the expansion of human hematopoietic stem cells. Science 329(5997):1345 1348. Breitling LP, Yang R, Korn B, Burwinkel B, Brenner H. 2011. Tobacco smoking related differential DNA methylation: 27K discovery and replication. Am J Hum Genet 88(4):450 457. Breton CV, Byun HM, Wenten M, Pan F, Yang A, Gilliland FD. 2009. Prenatal tobacco smoke exposure affects global and gene specific DNA methylation. Am J Respir Crit Care Med 180(5):462 467. Brion MJ, Leary SD, Lawlor DA, Smith GD, Ness AR. 2008. Modifiable maternal exposures and offspring blood pressure: a review of epidemiological studies of maternal age, diet, and smoking. Pediatr Res 63(6):593 598. Cupul Uicab LA, Skjaerven R, Haug K, Melve KK, Engel SM, Longnecker MP. 2012. In Utero Exposure to Maternal Tobacco Smoke and Subsequent Obesity, Hypertension, and Gestational Diabetes Among Women in the MoBa Cohort. Environ Health Perspect 120(3):355 360. Dietz PM, Homa D, England LJ, Burley K, Tong VT, Dube SR, et al. 2011. Estimates of nondisclosure of cigarette smoking among pregnant and nonpregnant women of reproductive age in the United States. Am J Epidemiol 173(3):355 359. Dolinoy DC, Huang D, Jirtle RL. 2007. Maternal nutrient supplementation counteracts bisphenol A induced DNA hypomethylation in early development. Proc Natl Acad Sci U S A 104(32):13056 13061. Duan Z, Zarebski A, Montoya Durango D, Grimes HL, Horwitz M. 2005. Gfi1 coordinates epigenetic repression of p21Cip/WAF1 by recruitment of histone lysine methyltransferase G9a and histone deacetylase 1. Mol Cell Biol 25(23):10338 10351. Feinberg AP. 2010. Genome scale approaches to the epigenetics of common human disease. Virchows Arch 456(1):13 21. 19 Page 20 of 41 Fox J, Weisberg S. 2011. Robust Regression in R. In: An R Companion to Applied Regression, Part Second. Thousand Oaks, CA:Sage. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5(10):R80. Haberg SE, London SJ, Nafstad P, Nilsen RM, Ueland PM, Vollset SE, et al. 2011. Maternal folate levels in pregnancy and asthma in children at age 3 years. J Allergy Clin Immunol 127(1):262 264, 264 e261. Harper PA, Riddick DS, Okey AB. 2006. Regulating the regulator: factors that control levels and activity of the aryl hydrocarbon receptor. Biochem Pharmacol 72(3):267 279. Hollingsworth JW, Maruoka S, Boon K, Garantziotis S, Li Z, Tomfohr J, et al. 2008. In utero supplementation with methyl donors enhances allergic airway disease in mice. J Clin Invest 118(10):3462 3469. Hoyo C, Murtha AP, Schildkraut JM, Forman MR, Calingaert B, Demark Wahnefried W, et al. 2011. Folic acid supplementation before and during pregnancy in the Newborn Epigenetics STudy (NEST). BMC Public Health 11(1):46. Jafar Nejad H, Bellen HJ. 2004. Gfi/Pag 3/senseless zinc finger proteins: a unifying theme? Mol Cell Biol 24(20):8803 8812. Kawajiri K, Fujii Kuriyama Y. 2007. Cytochrome P450 gene regulation and physiological functions mediated by the aryl hydrocarbon receptor. Arch Biochem Biophys 464(2):207 212. Khandanpour C, Kosan C, Gaudreau MC, Duhrsen U, Hebert J, Zeng H, et al. 2011. Growth factor independence 1 protects hematopoietic stem cells against apoptosis but also prevents the development of a myeloproliferative like disease. Stem Cells 29(2):376 385. Kumano K, Kurokawa M. 2010. The role of Runx1/AML1 and Evi 1 in the regulation of hematopoietic stem cells. J Cell Physiol 222(2):282 285. Lawrence BP, Sherr DH. 2012. You AhR what you eat? Nat Immunol 13(2):117 119. 20 Page 21 of 41 Magnus P, Irgens LM, Haug K, Nystad W, Skjaerven R, Stoltenberg C. 2006. Cohort profile: the Norwegian Mother and Child Cohort Study (MoBa). Int J Epidemiol 35(5):1146 1150. Midttun O, Hustad S, Ueland PM. 2009. Quantitative profiling of biomarkers related to B vitamin status, tryptophan metabolism and inflammation in human plasma by liquid chromatography/tandem mass spectrometry. Rapid Commun Mass Spectrom 23(9):1371 1379. Monick MM, Beach SR, Plume J, Sears R, Gerrard M, Brody GH, et al. 2012. Coordinated changes in AHRR methylation in lymphoblasts and pulmonary macrophages from smokers. Am J Med Genet B Neuropsychiatr Genet 159B(2):141 151. Moroy T, Khandanpour C. 2011. Growth factor independence 1 (Gfi1) as a regulator of lymphocyte development and activation. Semin Immunol 23(5):368 378. Murphy SK, Adigun A, Huang Z, Overcash F, Wang F, Jirtle RL, et al. 2012. Gender specific methylation differences in relation to prenatal exposure to cigarette smoke. Gene 494(1):36 43. Nguyen LP, Bradfield CA. 2008. The search for endogenous activators of the aryl hydrocarbon receptor. Chem Res Toxicol 21(1):102 116. Office of the Surgeon General. 2006. The health consequences of involuntary exposure to tobacco smoke : a report of the Surgeon General. Rockville, MD: U.S. Dept. of Health and Human Services, Public Health Service, Office of the Surgeon General. Oshima M, Endoh M, Endo TA, Toyoda T, Nakajima Takagi Y, Sugiyama F, et al. 2011. Genome wide analysis of target genes regulated by HoxB4 in hematopoietic stem and progenitor cells developing from embryonic stem cells. Blood 117(15):e142 150. R Development Core Team. 2010. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Rakyan VK, Down TA, Balding DJ, Beck S. 2011. Epigenome wide association studies for common human diseases. Nat Rev Genet 12(8):529 541. Reik W. 2007. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 447(7143):425 432. 21 Page 22 of 41 Rico de Souza A, Zago M, Pollock SJ, Sime PJ, Phipps RP, Baglole CJ. 2011. Genetic ablation of the aryl hydrocarbon receptor causes cigarette smoke induced mitochondrial dysfunction and apoptosis. J Biol Chem 286(50):43214 43228. Ronningen KS, Paltiel L, Meltzer HM, Nordhagen R, Lie KK, Hovengen R, et al. 2006. The biobank of the Norwegian Mother and Child Cohort Study: a resource for the next 100 years. Eur J Epidemiol 21(8):619 625. Sandoval J, Heyn HA, Moran S, Serra Musach J, Pujana MA, Bibikova M, et al. 2011. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 6(6):692 702. Schmidl C, Klug M, Boeld TJ, Andreesen R, Hoffmann P, Edinger M, et al. 2009. Lineage specific DNA methylation in T cells correlates with histone methylation and enhancer activity. Genome Res 19(7):1165 1174. Shaw GM, Carmichael SL, Vollset SE, Yang W, Finnell RH, Blom H, et al. 2009. Mid pregnancy cotinine and risks of orofacial clefts and neural tube defects. J Pediatr 154(1):17 19. Suter M, Ma J, Harris AS, Patterson L, Brown KA, Shope C, et al. 2011. Maternal tobacco use modestly alters correlated epigenome wide placental DNA methylation and gene expression. Epigenetics 6(11). Terry MB, Ferris JS, Pilsner R, Flom JD, Tehranifar P, Santella RM, et al. 2008. Genomic DNA methylation among women in a multiethnic New York City birth cohort. Cancer Epidemiol Biomarkers Prev 17(9):2306 2310. Wan ES, Qiu W, Baccarelli A, Carey VJ, Bacherman H, Rennard SI, et al. 2012. Cigarette smoking behaviors and time since quitting are associated with differential DNA methylation across the human genome. Hum Mol Genet 21(13):3073 3082. Wilhelm Benartzi CS, Houseman EA, Maccani MA, Poage GM, Koestler DC, Langevin SM, et al. 2012. In utero exposures, infant growth, and DNA methylation of repetitive elements and developmentally related genes in human placenta. Environ Health Perspect 120(2):296 302. 22 Page 23 of 41 Table 1. Descriptive characteristics of the MoBa study population a Variable Category N (%) Maternal cotinine (nmol/L) b Undetectable (0) 736 (69.3) Low (>0 56.8) 190 (17.9) Moderate (>56.8 388) 68 (6.4) High (>388) 68 (6.4) Self reported maternal smoking Never 525 (49.4) Former stopped before pregnancy 233 (21.9) Stopped during pregnancy 180 (16.9) Sometimes or daily 124 (11.7) Child gender Male 566 (53.3) Female 496 (46.7) Maternal education Less than high school 78 (7.3) High school 343 (32.3) Some college 471 (44.4) 4 years of college or more 170 (16.0) Parity 0 447 (42.1) 1 435 (41.0) 2 137 (12.9) 3 or more 43 (4.0) a N=1,062 individuals. b Maternal cotinine measured in plasma at approximately gestational week 18. Cotinine values above 56.8 nmol/L are consistent with active smoking. 23 Page 24 of 41 Table 2. Differential methylation in cord blood DNA in relation to maternal cotinine in the MoBa study population: CpGs with Bonferroni corrected statistical significance (p < 1.06x10 7), sorted by chromosome and position Unadjusted Adjusted f Median methylation by cotinine category h Distance to Chr a Gene CpG Position c Coef d SE e p value Coef SE p value Rank g Undetectable Low Medium High Gene b 1 GFI1 3688 cg10399789 92945668 0.07 0.01 4.08E 13 0.065 0.010 1.10E 10 14 0.759 0.755 0.727 0.716 1 GFI1 3224 cg09662411 92946132 0.111 0.012 2.26E 20 0.106 0.013 2.96E 17 11 0.730 0.733 0.669 0.654 1 GFI1 3169 cg06338710 92946187 0.112 0.013 1.34E 18 0.106 0.014 5.02E 14 12 0.801 0.800 0.754 0.733 1 GFI1 2656 cg18146737 92946700 0.28 0.024 2.42E 30 0.271 0.026 3.30E 25 6 0.877 0.875 0.771 0.738 1 GFI1 2531 cg12876356 92946825 0.182 0.016 2.29E 30 0.176 0.017 1.70E 25 4 0.731 0.732 0.627 0.605 1 GFI1 2321 cg18316974 92947035 0.243 0.024 6.43E 24 0.238 0.026 3.16E 20 7 0.921 0.923 0.856 0.841 1 GFI1 1768 cg09935388 92947588 0.196 0.015 1.05E 38 0.188 0.016 2.68E 31 2 0.708 0.707 0.580 0.564 1 GFI1 1395 cg14179389 92947961 0.184 0.017 5.38E 28 0.181 0.017 2.63E 25 5 0.242 0.246 0.154 0.158 5 AHRR 19617 cg23067299 323907 0.075 0.012 4.21E 10 0.072 0.012 4.12E 09 20 0.789 0.789 0.813 0.837 5 AHRR 64157 cg03991871 368447 0.057 0.008 2.04E 11 0.054 0.009 1.99E 10 15 0.841 0.839 0.820 0.818 5 AHRR 69088 cg05575921 373378 0.202 0.015 2.85E 39 0.198 0.017 8.03E 33 1 0.883 0.874 0.829 0.784 5 AHRR 95070 cg21161138 399360 0.045 0.007 1.52E 11 0.043 0.007 8.91E 10 18 0.718 0.715 0.701 0.679 6 HLA-DPB2 11549 cg11715943 33091841 0.053 0.009 1.00E 08 0.054 0.010 3.63E 08 24 0.842 0.833 0.824 0.820 7 MYO1G 16417 cg19089201 45002287 0.083 0.013 3.22E 10 0.088 0.014 9.13E 11 13 0.925 0.926 0.932 0.944 7 MYO1G 16218 cg22132788 45002486 0.18 0.021 1.98E 18 0.184 0.021 4.82E 18 10 0.932 0.935 0.951 0.966 7 MYO1G 15968 cg04180046 45002736 0.073 0.008 8.76E 20 0.076 0.008 2.85E 19 9 0.441 0.446 0.484 0.508 7 MYO1G 15785 cg12803068 45002919 0.145 0.016 8.51E 19 0.149 0.016 1.25E 19 8 0.713 0.721 0.774 0.813 7 ENSG00000225718 198306 cg04598670 68697651 0.063 0.009 1.29E 11 0.061 0.010 1.27E 09 19 0.623 0.607 0.597 0.574 7 CNTNAP2 854 cg25949550 145814306 0.075 0.007 4.15E 30 0.073 0.007 1.02E 26 3 0.113 0.109 0.097 0.092 8 EXT1 33821 cg03346806 119157879 0.038 0.007 3.08E 08 0.039 0.007 9.34E 08 27 0.801 0.795 0.793 0.779 14 TTC7B 274756 cg18655025 91008005 0.041 0.007 2.07E 08 0.042 0.008 6.76E 08 26 0.854 0.847 0.841 0.836 15 CYP1A1 1266 cg05549655 75019143 0.064 0.01 2.96E 10 0.065 0.010 2.38E 10 16 0.189 0.188 0.221 0.226 15 CYP1A1 1374 cg22549041 75019251 0.096 0.016 4.52E 09 0.098 0.017 8.88E 09 21 0.385 0.379 0.414 0.475 15 CYP1A1 1406 cg11924019 75019283 0.044 0.008 2.62E 08 0.044 0.008 4.78E 08 25 0.434 0.430 0.457 0.475 15 CYP1A1 1425 cg18092474 75019302 0.066 0.012 1.10E 08 0.068 0.012 9.95E 09 22 0.510 0.504 0.549 0.573 21 RUNX1 1746 cg12477880 36259241 0.159 0.026 1.02E 09 0.163 0.026 7.55E 10 17 0.088 0.102 0.110 0.158 24 Page 25 of 41 a Chromosome; b Distance from CpG to transcription start site of the nearest gene; c Chromosomal position based on NCBI human reference genome assembly Build 37.3; d Regression coefficient; e Standard error for regression coefficient; f Adjusted for maternal age, maternal education, parity, and asthma; g Rank order based on the adjusted p value; h Maternal plasma cotinine (nmol/L) measured around gestational week 18 (undetectable: ≤0; low: >0 56.8; moderate: >56.8 – 388; high: >388). Values above 56.8 nmol/L indicate active smoking. 25 Page 26 of 41 Table 3. Replication results from the NEST cohort for the 26 CpGs reaching epigenome wide Bonferroni corrected statistical significance (p<1.06x10 7) in the MoBa cohort, and median methylation differences by maternal smoking for MoBa and NEST cohorts, sorted by NEST p value Percent Difference in Median Methylation (Smokers – Non Smokers) NEST p Chromosome Gene CpG value a MoBa b NEST c 5 AHRR cg05575921 0.0003* 7.5 7.7 15 CYP1A1 cg05549655 0.0006* 3.5 3.8 15 CYP1A1 cg11924019 0.0008* 3.2 5.3 1 GFI1 cg09935388 0.0012* 13.7 7.5 1 GFI1 cg12876356 0.0015* 11.9 10.5 1 GFI1 cg18316974 0.0023 7.1 9 1 GFI1 cg09662411 0.0023 6.6 6.8 7 CNTNAP2 cg25949550 0.0025 1.8 2.5 1 GFI1 cg06338710 0.0026 5.8 5.2 7 MYO1G cg04180046 0.0027 5.3 4.9 7 ENSG00000225718 cg04598670 0.0036 3 5.3 5 AHRR cg23067299 0.0036 3.2 3.7 1 GFI1 cg18146737 0.0037 12.3 15.1 7 MYO1G cg12803068 0.0041 8.3 3.8 1 GFI1 cg14179389 0.0043 8.6 8.2 15 CYP1A1 cg22549041 0.0044 7.2 8.9 15 CYP1A1 cg18092474 0.0044 5.9 5.3 7 MYO1G cg19089201 0.0092 1.4 2.2 7 MYO1G cg22132788 0.0096 2.8 2.1 1 GFI1 cg10399789 0.0154 3.7 3.4 5 AHRR cg21161138 0.0283 2.3 1.7 5 AHRR cg03991871 0.0655 2.2 2.3 21 RUNX1 cg12477880 0.0850 4.6 4 8 EXT1 cg03346806 0.1248 1.5 0.1 14 TTC7B cg18655025 0.2668 1.8 0.3 6 HLA-DPB2 cg11715943 0.7956 7.5 7.7 a P value from NEST replication linear regression model evaluating differential DNA methylation by maternal smoking during pregnancy. b MoBa maternal smoking status determined by gestational week 18 cotinine values >56.8 nmol/L (smoker) or ≤56.8 nmol/L (non smoker). c NEST maternal smoking status determined by maternal self report, verified by medical records. * Bonferroni corrected statistically significant (p < 0.0019). 26 Page 27 of 41 Figure Legends Figure 1. Epigenome wide association between maternal cotinine and methylation of 473,844 CpGs measured in cord blood from the MoBa cohort. Twenty six CpGs (10 genes) reached Bonferroni corrected statistical significance (p<1.06x10 7, represented by the horizontal line). 27 Page 28 of 41 Figure 2. Difference in median methylation intensity (beta) by smoking status among MoBa and NEST participants for top 26 CpGs, grouped by gene. 28 Page 29 of 41 Supplemental Material 450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking During Pregnancy Bonnie R. Joubert 1, Siri E. Håberg 2, Roy M. Nilsen 3, Xuting Wang 1, Stein E. Vollset 2,4, Susan 5 5 5 K. Murphy , Zhiqing Huang , Cathrine Hoyo , Øivind Midttun 6, Lea A. Cupul-Uicab 1, Per M Ueland 4, Michael C. Wu 7, Wenche Nystad 2, Douglas A. Bell 1, Shyamal D. Peddada 1, Stephanie J. London 1* 1 Division of Intramural Research, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina; 2 Norwegian Institute of Public Health, Oslo, Norway; 3 Haukeland University Hospital, Bergen, Norway; 4 University of Bergen, Bergen, Norway; 5 Duke University School of Medicine, Durham NC; 6 Bevital A/S, Laboratoriebygget, Bergen, Norway; 7 University of North Carolina, Chapel Hill, NC 1 Page 30 of 41 Table of Contents Supplemental Text, pages 3-5 Table S1, pages 6-8 Table S2, page 9 Figure S1, page 10 Figure S2, page 11 Figure S3, pages 12-13 2 Page 31 of 41 Supplemental Text Quality Control Bisulfite conversion for the MoBa samples was evaluated according to methods previously described (Bibikova et al. 2011). Additionally, we included 28 blind replicate samples (14 study subjects run in duplicate, included on each plate), 26 plate control samples provided by Illumina (DNA from two cells lines run on each of the 13, 96-well plates), 25 plate control samples provided by us [DNA from 2 control individuals on each of 12 plates (1 on each half plate), 1 on the 13th plate] and 8 samples prepared from mixing methylated and non- methylated DNA, described as follows. Human HCT116 DKO Methylated DNA (Cat# D5014-2) and human HCT116 DKO non-methylated DNA (Cat# D5014-1) were purchased from Zymo Research (Irvine, CA). The fully methylated DNA was mixed with non-methylated DNA to provide a series of methylation controls (10%, 35%, 60%, and 85% methylated) to be included on the first and twelfth plates. We received from Illumina (San Diego, CA), data for a total of 1,204 samples. Samples with an average detection p-value across all probes of less than 0.05 and/or indicated by Illumina to have failed (N=49) were omitted from further analysis along with 1 sample erroneously included in the dataset. Multidimensional scaling (MDS) plots were used to evaluate gender outliers based on chromosome X data, where males and females separated into two distinct clusters. Samples separating into erroneous clusters (males in female cluster or females in male cluster) or not belonging to a distinct cluster were omitted (N=13). Blind duplicate samples were highly correlated (Spearman rho = 0.997) and the mean difference in beta was 0.0043 (standard error = 0.00012). For the 14 blind duplicate pairs, results from one of the two samples in each 3 Page 32 of 41 pair was selected at random to retain in the dataset and the other was omitted from further analysis. CpGs with missing chromosome data (N=65, mostly control probes), missing more than 10% of data across individuals (N=20), or on chromosome X (N=11,232) or Y (N=416) were omitted, resulting in 473,844 probes for analysis. The laboratory analysis plan was designed to exclude batch effects. All samples were run with a single set of reagents on a single machine at Illumina, Inc. (San Diego, CA). Bisulfite conversion and methylation measurements including reruns were performed in March 2011. Variables representing the chip (12 samples), chip set (four contiguous chips or half of a plate), and plate (96 samples) were included as covariates in statistical models to evaluate potential confounding. In addition, the distributions of beta and logratio values were compared across chips, chip sets and plates. We found that chip, chip set and plate were not appreciable sources of variability. The NEST data quality control followed a similar protocol as described above. In addition to the 18 smokers (9 males, 9 females) and 18 non-smokers (9 males, 9 females), eight plate control samples and three samples representing 10%, 50%, and 85% methylation were included on a single plate. It is possible that SNPs at or near CpGs could influence methylation intensities and thereby the associations we observed. We searched online databases to determine the presence of an underlying SNP for the top 105 most statistically significant CpGs. Information was obtained for SNPs with minor allele frequency ≥ 5% in the CEU (Utah residents with Northern and Western European ancestry) population, curated by 1000G projects (http://www.1000genomes.org/, 06/2011 release, 87 individuals), HapMap project (http://hapmap.ncbi.nlm.nih.gov/, release 28, 8/2010, 174 individuals), and dbSNP 4 Page 33 of 41 (http://www.ncbi.nlm.nih.gov/projects/SNP/, build 134, 8/2011, 116 individuals). We found 3 probes with SNPs within CpGs and removed them from the top 105 findings (rs77990586 in WBSCR17 cg20370581, rs115039881 in cg00531338 not specific to a gene, and rs6869832 in AHRR cg23576855). References Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. 2011. High density DNA methylation array with single CpG site resolution. Genomics. 5 Page 34 of 41 Supplemental Material, Table S1. Differential methylation in cord blood DNA in relation to maternal cotinine, top 100 most significant CpGs, sorted by chromosome, position Chra Gene Distanceb Locationc CpG Positiond Coefe SEf p-value* 1 AJAP1 20974 BODY cg26435172 4736126 -0.033 0.007 7.66E-07 1 WDR78 734 BODY cg00044354 67389884 -0.045 0.009 1.65E-06 1 GNG12 -338 TSS cg25189904 68299493 -0.031 0.007 4.96E-06 1 GNG12 -356 TSS cg26764244 68299511 -0.034 0.007 4.47E-06 1 GFI1 3688 BODY cg10399789 92945668 -0.065 0.010 1.10E-10* 1 GFI1 3224 BODY cg09662411 92946132 -0.106 0.013 2.96E-17* 1 GFI1 3217 BODY cg06338710 92946187 -0.106 0.014 5.02E-14* 1 GFI1 2704 BODY cg18146737 92946700 -0.271 0.026 3.30E-25* 1 GFI1 2531 BODY cg12876356 92946825 -0.176 0.017 1.70E-25* 1 GFI1 2321 BODY cg18316974 92947035 -0.238 0.026 3.16E-20* 1 GFI1 1768 BODY cg09935388 92947588 -0.188 0.016 2.68E-31* 1 GFI1 1443 BODY cg14179389 92947961 -0.181 0.017 2.63E-25* 1 PBX1 27680 BODY cg07353489 164556276 -0.032 0.007 4.33E-06 1 ENSG00000198216 -2747 TSS cg08943293 181449699 -0.029 0.006 2.90E-06 1 MFSD4 -2879 TSS cg06827850 205535232 -0.030 0.007 4.16E-06 1 GALNT2 207374 BODY cg11527913 230410329 -0.052 0.011 5.76E-06 1 KIF26B 180355 BODY cg15137445 245498689 -0.027 0.006 2.80E-06 1 OR14I1 444 BODY cg01165362 248845209 -0.037 0.008 5.74E-06 2 ENSG00000232835 -5198 TSS cg14556323 5701409 -0.028 0.006 3.61E-06 2 XDH 78080 BODY cg14654795 31559579 -0.033 0.007 5.72E-06 2 EXOC6B 222520 BODY cg06108254 72830705 -0.030 0.007 4.96E-06 2 LOC284998 10689 BODY cg18703066 105363536 -0.076 0.015 3.42E-07 2 MYO1B 20360 BODY cg12738764 192130613 -0.033 0.007 2.76E-06 2 ENSG00000228226 -4057 TSS cg20962638 229553401 -0.031 0.007 4.93E-06 2 COPS8 6478 BODY cg05301057 238000561 -0.032 0.007 4.03E-06 2 PP14571 6909 BODY cg18455650 241389208 -0.030 0.007 5.73E-06 3 VGLL4 11 TSS cg18096987 11623873 -0.025 0.005 2.20E-06 3 ENSG00000235886 -99161 TSS cg07746241 43920635 -0.027 0.006 2.10E-06 3 ENSG00000243149 -31477 TSS cg22931622 65129155 -0.032 0.007 4.34E-06 3 SEMA5B -1510 TSS cg13428477 122748086 -0.041 0.008 4.16E-07 3 SSR3 -51017 TSS cg26405475 156324038 0.042 0.009 5.66E-06 4 ENSG00000249105 -183313 TSS cg07796335 59167549 -0.034 0.007 5.06E-06 5 AHRR 19569 BODY cg23067299 323907 0.072 0.012 4.12E-09* 5 AHRR 64109 BODY cg03991871 368447 -0.054 0.009 1.99E-10* 5 AHRR 64466 BODY cg23916896 368804 -0.046 0.010 1.46E-06 5 AHRR 69088 BODY cg05575921 373378 -0.198 0.017 8.03E-33* 5 AHRR 69088 BODY cg21161138 399360 -0.043 0.007 8.91E-10* 5 ENSG00000249201 2484 BODY cg01772854 1176225 -0.033 0.007 7.43E-07 5 FSTL4 86949 BODY cg02378360 132861322 -0.031 0.007 3.04E-06 6 RPP40 2364 BODY cg11942662 5001955 -0.030 0.007 4.26E-06 6 OR12D2 500 BODY cg08862210 29364963 -0.036 0.008 3.33E-06 6 TRIM15 1441 BODY cg23784132 30132471 -0.039 0.009 4.46E-06 6 TRIM26 21761 BODY cg21531017 30159510 -0.030 0.006 2.28E-06 6 Page 35 of 41 Supplemental Material, Table S1 (cont.). Differential methylation in cord blood DNA in relation to maternal cotinine, top 100 most significant CpGs, sorted by chromosome, position Chra Gene Distanceb Locationc CpG Positiond Coefe SEf p-value* 6 HLA-DPB2 11549 BODY cg11715943 33091841 -0.054 0.010 3.63E-08* 7 SDK1 7987 BODY cg21005410 4177351 -0.029 0.006 3.26E-06 7 MYO1G 16465 BODY cg19089201 45002287 0.088 0.014 9.13E-11* 7 MYO1G 16266 BODY cg22132788 45002486 0.184 0.021 4.82E-18* 7 MYO1G 15968 BODY cg04180046 45002736 0.076 0.008 2.85E-19* 7 MYO1G 15785 BODY cg12803068 45002919 0.149 0.016 1.25E-19* 7 ENSG00000225718 39390 TSS cg04598670 68697651 -0.061 0.010 1.27E-09* 7 PODXL 34603 BODY cg21771679 131206773 -0.036 0.008 1.89E-06 7 CNTNAP2 854 BODY cg25949550 145814306 -0.073 0.007 1.02E-26* 7 CNTNAP2 1090753 BODY cg11207515 146904205 -0.041 0.008 7.14E-07 7 PTPRN2 277695 BODY cg02356647 158102787 -0.037 0.008 3.24E-06 8 ARHGEF10 78878 BODY cg26101086 1851026 -0.035 0.007 1.14E-06 8 ERLIN2 18617 BODY cg26393977 37612713 -0.035 0.007 2.62E-06 8 EXT1 -33821 TSS cg03346806 119157879 -0.039 0.007 9.34E-08* 8 PLEC -1231 TSS cg03958308 145014989 -0.030 0.006 3.40E-06 9 GLIS3 71264 BODY cg14047387 4080919 -0.035 0.007 7.33E-07 9 MIR2964A -1100 TSS cg00321619 131153847 -0.033 0.007 7.52E-07 10 CAMK1D 39248 BODY cg16894855 12430878 -0.031 0.007 3.59E-06 10 FRMD4A -13 TSS cg11813497 14372879 0.043 0.009 3.48E-06 10 MGMT -1399 TSS cg09993459 131264102 -0.030 0.006 1.72E-06 10 FRG2B 6347 TSS cg20219790 135434000 -0.040 0.008 1.09E-06 11 OR5M10 364 BODY cg07850316 56344833 -0.031 0.007 5.43E-06 11 ARHGAP42 125042 BODY cg02008402 100683448 -0.034 0.007 1.62E-06 12 GRIN2B 23556 BODY cg17174980 14109514 -0.032 0.007 5.64E-06 12 ENSG00000212383 -44252 TSS cg11147442 57210954 -0.034 0.007 3.36E-06 12 CUX2 259376 BODY cg00029284 111731203 -0.031 0.006 1.53E-06 12 ENSG00000213144 30912 TSS cg13083057 119663566 -0.035 0.008 3.41E-06 12 LOC100507055 438 BODY cg09444108 133186599 -0.032 0.007 3.35E-06 13 ENSG00000215881 27438 BODY cg20338386 112275793 -0.026 0.005 1.27E-06 14 TTC7B 274756 BODY cg18655025 91008005 -0.042 0.008 6.76E-08* 14 CCDC88C 17815 BODY cg23304605 91866373 0.048 0.010 5.75E-06 14 MEG3 1703 BODY cg08698721 101294147 0.038 0.008 2.92E-06 15 CYP1A1 -1266 TSS cg05549655 75019143 0.065 0.010 2.38E-10* 15 CYP1A1 -1319 TSS cg13570656 75019196 0.067 0.014 1.64E-06 15 CYP1A1 -1326 TSS cg12101586 75019203 0.058 0.013 3.68E-06 15 CYP1A1 -1374 TSS cg22549041 75019251 0.098 0.017 8.88E-09* 15 CYP1A1 -1358 TSS cg11924019 75019283 0.044 0.008 4.78E-08* 15 CYP1A1 -1425 TSS cg18092474 75019302 0.068 0.012 9.95E-09* 15 LOC338963 1675 BODY cg02406469 83381070 -0.037 0.008 3.90E-06 16 CACNA1H -17013 TSS cg00720047 1186227 -0.031 0.007 3.35E-06 16 ENSG00000214696 63479 TSS cg00253658 54210496 0.077 0.016 9.64E-07 16 ZFHX3 90978 BODY cg05764102 72991344 -0.034 0.007 1.24E-06 16 VAT1L 3346 BODY cg04260557 77825828 -0.035 0.007 1.90E-06 7 Page 36 of 41 Supplemental Material, Table S1 (cont.). Differential methylation in cord blood DNA in relation to maternal cotinine, top 100 most significant CpGs, sorted by chromosome, position Chra Gene Distanceb Locationc CpG Positiond Coefe SEf p-value* 16 ANKRD11 68054 BODY cg00169122 89488963 0.028 0.006 5.20E-06 17 C17orf98 15 TSS cg19760250 36997627 -0.094 0.020 2.41E-06 17 C17orf98 -30 TSS cg23290482 36997720 -0.064 0.013 1.90E-06 17 CCR7 10231 BODY cg07479709 38711505 -0.035 0.007 1.78E-06 19 GMIP 11941 BODY cg02293766 19742514 -0.029 0.006 4.96E-06 19 MIR518B -1428 TSS cg11251554 54204562 -0.032 0.006 2.40E-07 20 TGM3 32119 BODY cg21293537 2308779 -0.034 0.007 1.90E-06 20 LOC339568 -1038 TSS cg07376374 37854477 -0.037 0.008 5.01E-06 20 ATP9A 72466 BODY cg07339236 50312490 -0.053 0.010 1.38E-07 21 RUNX1 1920 BODY cg02869559 36259067 0.079 0.017 2.75E-06 21 RUNX1 1746 BODY cg12477880 36259241 0.163 0.026 7.55E-10* 21 RUNX1 1652 BODY cg00994804 36259383 0.164 0.031 1.31E-07 21 RUNX1 1527 BODY cg06758350 36259460 0.073 0.016 3.52E-06 21 ETS2 2152 BODY cg15892280 40180000 -0.029 0.006 7.74E-07 aChromosome; b Distance from CpG to transcription start site of the nearest gene (negative=upstream; positive=downstream); c CpG is located in transcription start site (TSS) or in the gene body (BODY); d Chromosomal position based on NCBI human reference genome assembly Build 37.3; e Regression coefficient from robust linear regression adjusted for maternal age, maternal education, parity, and asthma; f Standard error for regression coefficient; * Bonferroni-corrected statistically significant (p < 1.06x10-7). 8 Page 37 of 41 Supplemental Material, Table S2. Differential methylation by cell type (polymorphonuclear cells (PM) compared to mononuclear cells (MN)) in cord blood DNA, and the magnitude of differential methylation by maternal smoking in MoBa (Smokers (S) – Non-smokers (NS)a). Median Methylation Percent difference Percent difference (beta) in median in median methylation methylation by cell type by smoking in MoBa CHR Gene CpG PM MN (PM – MN) (S – NS) 1 GFI1 cg10399789 0.844 0.840 0.41 -3.66 1 GFI1 cg09662411 0.822 0.845 -2.34 -6.65 1 GFI1 cg06338710 0.884 0.882 0.24 -5.81 1 GFI1 cg18146737 0.904 0.894 1.05 -12.33 1 GFI1 cg12876356 0.826 0.836 -1.02 -11.86 1 GFI1 cg18316974 0.928 0.942 -1.35 -7.14 1 GFI1 cg09935388 0.806 0.797 0.84 -13.68 1 GFI1 cg14179389 0.339 0.324 1.49 -8.61 5 AHRR cg23067299 0.765 0.764 0.08 3.15 5 AHRR cg03991871 0.881 0.862 1.87* -2.21 5 AHRR cg05575921 0.854 0.851 0.31 -7.52 5 AHRR cg21161138 0.818 0.813 0.46 -2.27 6 HLA-DPB2 cg11715943 0.883 0.863 1.97 -1.77 7 MYO1G cg19089201 0.922 0.911 1.04 1.44 7 MYO1G cg22132788 0.956 0.951 0.59 2.82 7 MYO1G cg04180046 0.617 0.586 3.10* 5.30 7 MYO1G cg12803068 0.862 0.844 1.81 8.31 7 ENSG00000225718 cg04598670 0.634 0.623 1.06 -3.01 7 CNTNAP2 cg25949550 0.213 0.204 0.97 -1.80 8 EXT1 cg03346806 0.829 0.818 1.02* -1.51 14 TTC7B cg18655025 0.888 0.886 0.23 -1.19 15 CYP1A1 cg05549655 0.278 0.288 -0.97 3.50 15 CYP1A1 cg22549041 0.408 0.424 -1.59 7.23 15 CYP1A1 cg11924019 0.566 0.563 0.21 3.23 15 CYP1A1 cg18092474 0.627 0.629 -0.20 5.90 21 RUNX1 cg12477880 0.098 0.100 -0.18 4.55 * Bonferroni-corrected statistically significant (Bonferroni p < 0.05, raw p < 0.0019). a Maternal smoking determined by plasma cotinine (nmol/L) measured around gestational week 18. Values above 56.8 nmol/L indicate active smoking during pregnancy. 9 Page 38 of 41 Supplemental Material, Figure S1. Histograms showing the distribution of methylation levels in our data. Bimodal distribution was observed when considering all 473,844 CpG sites whereas approximately normal distribution was observed for most individually plotted CpG sites. (a) Beta across all CpGs analyzed; (b) log(beta/1-beta) across all CpGs analyzed; (c) Beta for one representative CpG (cg11924019); (d) log(beta/1- beta) for the representative CpG. 10 Page 39 of 41 Supplemental Material, Figure S2. Plot of the cumulative distribution function for methylation intensities for the top-ranked CpG site (AHRR cg05575921) by the four cotinine categories (undetectable: ≤0 nmol/L; low: >0-56.8 nmol/L; moderate: >56.8 – 388 nmol/L; high: >388 nmol/L) demonstrates a dose-response effect of smoking in the MoBa cohort (Jonkheere-Terpstra trend test p < 2.2x10-16). The Jonkheere-Terpstra trend test was calculated using the SAGx package in R, version 2.14.0. Cotinine values above 56.8 nmol/L are consistent with active smoking. 11 Page 40 of 41 a cg probes 12 Page 41 of 41 b cg probes Supplemental Material, Figure S3. (a) CpG sites located in the intron region of the AHRR gene. (b) CpG sites located on the shore of a CpG island in a bidirection regulatory region of the CYP1A cluster. The statistical significance of the association between cotinine and methylation of each probe is color coded (blue: p > 1x10-5; grey: 1x10-5 ≥ p ≥ 1x10-7; red: p < 1x10-7). The magnitude of effect (coefficient from robust linear regression) is indicated as: <-0.01 (--), -0.01 to 0 (-), >0 to 0.01 (+), and >0.01 (++). 13