ENVIRONMENTAL HEALTH ehp PERSPECTIVES ehponline.org

450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related to Maternal Smoking During Pregnancy

Bonnie R. Joubert, Siri E. Håberg, Roy M. Nilsen, Xuting Wang, Stein E. Vollset, Susan K. Murphy, Zhiqing Huang, Cathrine Hoyo, Øivind Midttun, Lea A. Cupul-Uicab, Per M Ueland, Michael C. Wu, Wenche Nystad, Douglas A. Bell, Shyamal D. Peddada, Stephanie J. London

http://dx.doi.org/10.1289/ehp.1205412

Online 31 July 2012

National Institutes of Health U.S. Department of Health and Services Page 1 of 41

450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related

to Maternal Smoking During Pregnancy

Bonnie R. Joubert 1, Siri E. Håberg 2, Roy M. Nilsen 3, Xuting Wang 1, Stein E. Vollset 2,4 , Susan

K. Murphy 5, Zhiqing Huang 5, Cathrine Hoyo 5,

Øivind Midttun 6, Lea A. CupulUicab 1, Per M Ueland 4, Michael C. Wu 7, Wenche Nystad 2,

Douglas A. Bell 1, Shyamal D. Peddada 1, Stephanie J. London 1*

1 Division of Intramural Research, National Institute of Environmental Health Sciences, National

Institutes of Health, Department of Health and Human Services, Research Triangle Park, North

Carolina, USA; 2 Norwegian Institute of Public Health, Oslo, Norway; 3 Haukeland University

Hospital, Bergen, Norway; 4 University of Bergen, Bergen, Norway; 5 Duke University School

of Medicine, Durham NC, USA; 6 Bevital A/S, Laboratoriebygget, Bergen, Norway; 7 University

of North Carolina, Chapel Hill, NC, USA

* Correspondence should be addressed to: SJ London, 111 T.W. Alexander Drive, Rall Bldg, A306, RTP, NC 27709 Tel: (919) 5415772, Fax: (919) 5412511 [email protected]

Short title: DNA Methylation in Newborns of Smoking Mothers

Key words: Epigenetics, epigenomewide, in utero , maternal smoking, methylation

1

Page 2 of 41

Acknowledgments

We are grateful to the participating families in Norway who take part in this ongoing cohort study. We thank Rolf Erik Kolstad, Arild Sunde, and Kari Harbak from the MoBa

Biobank, and Huiling Li, Gary Pittman, Laura Wharey, and Kevin Gerrish from the NIEHS for expert technical assistance. We acknowledge Grace Chiu of Westat, Inc., Shuangshuang Dai of

SRA International, Inc., and John Grovenstein of the NIEHS, for computing support and Dmitri

Zaykin, Paul Wade, and Min Shi of the NIEHS for helpful comments on the manuscript. O.M. is employed by Bevital A/S, Loboratoriebygget, Bergen, Norway.This research was supported in part by the Intramural Research Program of the NIH, NIEHS (Z01ES49019). The Norwegian

Mother and Child Cohort Study is supported by the Norwegian Ministry of Health and the

Ministry of Education and Research, NIH/NIEHS (contract no NOES75558), NIH/NINDS

(grant no.1 UO1 NS 04753701), and the Norwegian Research Council/FUGE (grant no.

151918/S10). The Newborn Epigenetics STudy is supported by NIH grants R21ES01497,

R01ES016772, R01DK085173, P30ES011961 and by the Duke Comprehensive Cancer Center.

None of the authors has any actual or potential competing financial interests.

Abbreviations 450K: Infinium HumanMethylation450 BeadChip CpG: cytosineguanine dinucleotide DMR: differentially methylated region MN: mononuclear cells MoBa: Norwegian Mother and Child Cohort Study NEST: Newborn Epigenetics Study PAH: polycyclic aromatic hydrocarbons PM: polymorphonuclear cells XRE: xenobiotic response element

2

Page 3 of 41

Abstract Background : Epigenetic modifications, such as DNA methylation, due to in utero exposures

may play a critical role in early programming for childhood and adult illness. Maternal smoking

is a major risk factor for multiple adverse health outcomes in children but the underlying

mechanisms are unclear.

Objective : We investigated epigenomewide methylation in cord blood of newborns in relation

to maternal smoking during pregnancy.

Methods : We examined maternal plasma cotinine (an objective biomarker of smoking)

measured during pregnancy in relation to DNA methylation at 473,844 CpG sites (CpGs) in

1,062 newborn cord bloods from the Norwegian Mother and Child Cohort Study (MoBa) using

the Infinium HumanMethylation450 BeadChip (450K).

Results : We found differential DNA methylation at epigenomewide statistical significance (p

value<1.06x10 7) for 26 CpGs mapped to 10 genes. We replicated findings for CpGs in AHRR ,

CYP1A1 , and GFI1 at strict Bonferronicorrected statistical significance in a U.S. birth cohort.

AHRR and CYP1A1 play a key role in the aryl hydrocarbon signaling pathway, which mediates

the detoxification of the components of tobacco smoke. GFI1 is involved in diverse

developmental processes but has not previously been implicated in responses to tobacco smoke.

Conclusions : We identified a set of genes with methylation changes present at birth in children

whose mothers smoked during pregnancy. This is the first study of differential methylation

across the genome in relation to maternal smoking during pregnancy using the 450K platform.

Our findings implicate epigenetic mechanisms in the pathogenesis of the adverse health

outcomes associated with this important in utero exposure.

3

Page 4 of 41

Introduction

Maternal smoking during pregnancy is a major risk factor for adverse health outcomes in children including low birth weight, some childhood cancers, reduced lung function and early respiratory illnesses (Office of the Surgeon General 2006). Recent evidence suggests that maternal smoking during pregnancy leads to obesity and elevated blood pressure in children

(Brion et al. 2008; CupulUicab et al. 2012). The underlying mechanisms for the diverse effects of maternal smoking during pregnancy on offspring may involve epigenetic modifications such as DNA methylation.

DNA cytosine methylation plays a key role in modulating the transcriptional potential of the genome and may influence the development of complex human diseases (Feinberg 2010).

Changes to DNA methylation can occur throughout life but much of the epigenome is established during embryogenesis and early development of the fetus (Reik 2007).

Data from models demonstrate that maternal exposures, such as dietary methyl donors (Hollingsworth et al. 2008) and bisphenol A (Dolinoy et al. 2007) can impact offspring phenotypes via DNA methylation. A few human studies have examined epigenetic alterations in relation to maternal smoking during pregnancy and reported it to be associated with global methylation of leukocyte DNA using a [ 3H]methyl acceptance assay (Terry et al. 2008), global

(LINE1 and AluYb8) methylation in human placenta (WilhelmBenartzi et al. 2012), or differential methylation at cytosineguanine dinucleotide (CpG) sites [CpGspecific methylation using the Illumina Infinium HumanMethylation27 Beadchip (27K) measuring approximately

27,000 CpGs] in human placenta (Suter et al. 2011). Maternal tobacco use during pregnancy has also been associated with global (LINE1 and AluYb8) and CpGspecific methylation (Illumina

GoldenGate Cancer methylation panel I measuring 1,505 CpGs) in buccal cells from children

4

Page 5 of 41

(Breton et al. 2009). To date, the effect of maternal smoking on differential DNA methylation

has not been evaluated with more comprehensive epigenomic coverage than offered by the 27K

platform. Using the Infinium HumanMethylation450 Beadchip (450K), which measures CpG

methylation at over 470,000 CpGs, we evaluated the relationship between maternal smoking and

DNA methylation in 1,062 infant cord blood samples from a birth cohort in Norway. We

assessed maternal smoking objectively by measuring cotinine, a sensitive biomarker, in maternal

plasma samples, and replicated our findings in an independent birth cohort study from the U.S.

To our knowledge, this is the largest human study of any in utero exposure in relation to DNA

methylation at birth using the 450K platform with improved epigenomewide coverage.

Methods

Participants in the current analysis were selected from a substudy of the Norwegian

Mother and Child Cohort Study (MoBa) (Magnus et al. 2006; Ronningen et al. 2006) that

evaluated the association between maternal plasma folate during pregnancy and childhood

asthma status at age 3 years (Haberg et al. 2011). We analyzed 1,062 participants of this study

who had available cord blood samples and nonmissing data for maternal cotinine and covariates.

Umbilical cord blood samples were collected at birth and frozen at 80 0C. All biological

material was obtained from the biobank of the MoBa study (Ronningen et al. 2006). The MoBa

study has been approved by the Regional Committee for Ethics in Medical Research, the

Norwegian Data Inspectorate and the Institutional Review Board of the National Institute of

Environmental Health Sciences, USA, and written informed consent was provided by all

participants. Participants in the replication analysis were part of the Newborn Epigenetics Study

(NEST) and were recruited from prenatal clinics in Durham, North Carolina (Hoyo et al. 2011;

5

Page 6 of 41

Murphy et al. 2012). We selected cord blood DNA samples from 18 newborns born to mothers who reported smoking during pregnancy and 18 whose mothers reported no smoking, all

Caucasians. The NEST study has been approved by the Duke University Institutional Review

Board and written informed consent was provided by all participants.

Bisulfite conversion was performed using the EZ96 (MoBa samples) or EZ (NEST samples) DNA Methylation kit (Zymo Research Corporation, Irvine, CA) according to manufacturer instructions. We measured methylation at 485,577 CpGs in cord blood using

Illumina’s Infinium HumanMethylation450 BeadChip (Bibikova et al. 2011; Sandoval et al.

2011). Illumina’s GenomeStudio® Methylation module version 1.0 (Illumina Inc., San Diego,

CA, 2012) was used to calculate the methylation level at each CpG as the betavalue (β=intensity of the methylated allele (M) /(intensity of the unmethylated allele (U) + intensity of the methylated allele (M) + 100)) (Bibikova et al. 2011). Beta values were then transformed to obtain the logratio, defined as log[β/(1β)]. We report the detection pvalue for each beta, which represents a statistical test for the difference between the signal for a given probe and background (the average for all negative controls).

Maternal plasma cotinine concentrations in the MoBa samples were measured using liquid chromatographytandem mass spectrometry (LCMS/MS) (Midttun et al. 2009). With guidance from a previous study with data on circulating cotinine in pregnant women (Shaw et al.

2009), we created four categories of exposure (undetectable: ≤0 nmol/L; low: >056.8 nmol/L; moderate: >56.8 – 388 nmol/L; high: >388 nmol/L), where 56.8 nmol/L indicates active maternal smoking and participants with levels above 56.8 nmol/L were categorized at the median value for this group (388 nmol/L). Smoking in the NEST samples was assessed by maternal selfreport of smoking during pregnancy and verified by medical records.

6

Page 7 of 41

Quality control protocols are described in detail in Supplemental Material (see

Supplemental Material, pg 35). After quality control, 1,062 MoBa subjects and 473,844 CpGs

were analyzed. We examined the association between maternal plasma cotinine and methylation

in cord blood at each CpG using robust linear regression to account for any potential outliers or

heteroskedasticity in the data (Fox and Weisberg 2011). We evaluated maternal age (continuous

variable), maternal education (

more as the referent group), maternal prepregnancy body mass index (<18.5, 18.5 to <25, 25 to

<30, ≥30), maternal physical activity during pregnancy (none, moderate, vigorous), maternal

plasma folate (log transformed continuous variable), parity (0 as the referent group, 1, 2, and 3 or

more), and gender of the child for associations with maternal plasma cotinine. The final model

included variables that were associated with cotinine (p<0.1) and plausibly related to methylation

levels, specifically, maternal age, maternal education, and parity. Because our study was a

subset of Haberg et al. (Haberg et al. 2011), where cotinine was measured among asthmatic

children in a later analytic batch, we adjusted for childhood asthma status at age 3 years although

this made little difference in the results. Overall, the crude and adjusted results were extremely

similar. We applied Bonferroni correction, adjusting the level of significance from 0.05 to

1.06x10 7. Additional adjustment for principle components to address potential population

structure (Rakyan et al. 2011) was explored but not retained because it did not influence the

results.

All NEST samples and CpGs passed quality control. We used unadjusted linear

regression models to examine the association between maternal smoking during pregnancy and

methylation in cord blood at each of the 26 CpGs that were significantly associated with plasma

cotinine (p<1.06x10 7) in the MoBa population, and calculated a 1sided pvalue for each CpG.

7

Page 8 of 41

We applied Bonferroni correction for 26 tests, which adjusted the level of significance to 0.0019.

We used the Kolmogorov test to test the null hypothesis that the pvalues are uniformly distributed on (0,1) against the alternative that the pvalues were stochastically smaller than uniform distribution on (0,1). Because CpGs corresponding to the same gene may not be independent, and since some genes were represented by multiple CpGs, for such genes we chose the CpG that had the largest pvalue when using the Kolmogorov test, thus making the

Kolmogorov test more conservative against the alternative hypothesis.

To assess the potential impact of variation according to white blood cell subtype, we measured DNA methylation using the 450K platform in 21 cord blood samples collected at the same facilities as the NEST samples that had been separated while fresh into mononuclear cells

(MN) and polymorphonuclear cells (PM) using Lympholyte ® poly (Cedarlane Laboratories

Limited, Hornby, Ontario). We used a paired ttest to evaluate differential methylation between

PM and MN cell types for our top 26 CpGs.

To compare our results with those from other studies, we employed a single CpG lookup approach where we looked at the result in our data for the CpG of interest and did not correct for multiple comparisons. A CpG with a pvalue<0.05 was considered to be statistically significant.

All statistical analyses were performed using R (R Development Core Team 2010) and

Bioconductor (Gentleman et al. 2004) packages.

Results

Epigenome-wide analysis

We plotted the –log10(pvalues) from the robust linear regression for 473,844 CpGs across the genome in 1,062 cord blood samples of participants in the MoBa study (Figure 1).

8

Page 9 of 41

The mean age of study participants was 29.5 years (±4.3), 12.8% had plasma cotinine levels

consistent with active smoking (Shaw et al. 2009), and 11.7% reported current smoking during

pregnancy (Table 1). The methylation intensities showed bimodal distribution when displayed

across all probes (see Supplemental Material, Figure S1AB for beta and logratio values,

respectively) but approximately normal distribution for most CpGs when they were considered

individually (see Supplemental Material, Figure S1CD).

Using conservative Bonferroni correction for 473,844 tests, we observed epigenome

wide statistically significant associations (pvalue <1.06x10 7) between maternal plasma cotinine

and DNA methylation for 26 CpGs mapped to 10 genes in both unadjusted and covariate

adjusted analyses (Figure 1, Table 2). Four genes included at least four Bonferronisignificant

CpGs. Among the 26 CpGs, eight were within the coding region of growth factor independent 1

transcription repressor ( GFI1 ) on 1, four were within the coding region of aryl

hydrocarbon receptor repressor ( AHRR ) on chromosome 5, four were in a region upstream of

cytochrome P450 isoform CYP1A1 on chromosome 15, and four were within the coding region

of myosin 1G ( MYO1G ).

Methylation levels of AHRR cg05575921, the CpG with the smallest pvalue in the

analysis, decreased with cotinine in a dosedependent manner and the trend was statistically

significant (see Supplemental Material, Figure S2, JonkheereTerpstra trend test p < 2.2x10 16 ).

All other statistically significant AHRR CpGs (see Supplemental Material, Figure S3A), had

lower methylation with increasing cotinine levels with the exception of cg23067299, which is

upstream of the other significant AHRR CpGs and had higher methylation with increasing

cotinine (see Supplemental Material, Table S1). Conversely, CYP1A1 cg05549655 (p=2.38x10

10 ) and other statistically significant CYP1A1 CpGs (see Supplemental Material, Figure S3B),

9

Page 10 of 41

had higher methylation with increasing levels of cotinine. As with AHRR , cotinine was inversely related to methylation for GFI1 cg09935388 (p=2.68x10 31 ) and other statistically significant

GFI1 CpGs. All four CpGs in MYO1G had higher methylation with increasing levels of cotinine

(Table 2). Other genes in which methylation was associated with cotinine levels at epigenome wide statistical significance are HLA-DPB2 , ENSG00000225718, TTC7B, CNTNAP2, EXT1, and

RUNX1 (Table 2). Coefficients, pvalues, locations and other information for the 100 most statistically significant CpGs are provided in Supplemental Material Table S1.

We obtained similar DNA methylation differences when the metric of exposure was maternal selfreport of smoking during pregnancy rather than measured maternal cotinine levels.

Specifically, selfreported maternal smoking was related to lower methylation of AHRR CpGs cg03991871, cg23916896, cg05575921, and cg21161138, with similar regression coefficients as in the cotinine analysis, and was epigenomewide statistical significant for cg05575921

(regression coefficient (coef)=0.131, standard error (se)=0.017, p=2.40 15 ). Selfreported maternal smoking was associated with higher methylation of the CYP1A1 CpGs shown in Table

2 but did not achieve epigenomewide statistical significance (e.g. cg05549655 coef=0.039, se=0.010, p=1.72x10 4). Selfreported maternal smoking was associated with epigenomewide statistically significant lower methylation of seven of the eight GFI1 CpGs that had significant

(p<1.06x10 7) inverse associations with maternal cotinine. Higher methylation of the four

MYO1G CpGs statistically significantly associated with maternal cotininewas observed with maternal smoking but did not reach epigenomewide statistical significance.

Replication analysis

All twentysix CpGs showing epigenomewide statistical significance for the association between plasma cotinine and methylation in MoBa were followed up for replication analysis in

10

Page 11 of 41

cord blood DNA samples from offspring of 18 mothers who reported smoking during pregnancy

and 18 mothers who denied smoking during pregnancy from the NEST study (North Carolina,

USA) (Table 3). The direction of effect (differential methylation by smoking status) in the NEST

replication study was consistent with the discovery study (MoBa) for all 26 CpGs, and the

magnitudes of the differences between smokers and nonsmokers in NEST and those between

women with plasma cotinine >56.8 nmol/L versus ≤56.8 nmol/L in MoBa were very similar

(Table 3, Figure 2, Spearman’s correlation coefficient = 0.965). The most statistically significant

association in both MoBa and NEST data was observed for AHRR cg05575921 with lower

methylation for smokers compared to nonsmokers (Table 3). In the NEST study, a total of five

CpGs ( AHRR cg05575921, CYP1A1 cg05549655 and cg11924019, and GFI1 cg09935388 and

cg12876356) reached statistical significance after strict Bonferroni correction for 26 tests

(p<0.0019), despite the much smaller sample size of the replication study (18 newborns of

smoking mothers compared to 18 newborns of nonsmoking mothers), and a total of 21 out of

the 26 CpGs gave a pvalue < 0.05 (Table 3). The Kolmogorov test showed that the replication

pvalues were systematically smaller than would be expected by chance (p<0.00011). This

suggests that it is exceedingly unlikely that the replication findings are false positives and

confirms the high degree of replication that we observed.

The magnitudes of the differences in methylation between PM and MN cell types (Table

S2) were much smaller than the differences in methylation between smokers and nonsmokers in

both the MoBa and NEST study populations (Table 3) and were not statistically significantly

different between cell types (p<0.0019 after Bonferroni correction for 26 tests) for the 5

replicated CpGs.

11

Page 12 of 41

Discussion

Our study of maternal smoking in relation to epigenomewide DNA methylation in newborns in the MoBa cohort is the largest and most extensive that we know of. In addition to the large sample size, we used a highly reproducible platform that assesses methylation at over

470,000 individual CpGs, providing more comprehensive coverage of the epigenome than other studies of maternal smoking published to date. Further, we assessed maternal smoking with a sensitive assay for cotinine, a wellvalidated biomarker for tobacco smoke. We observed epigenomewide statistically significant associations between maternal smoking in pregnancy, assessed by plasma cotinine levels and methylation in cord blood at 26 CpGs mapping to 10 genes in the Norwegian Birth Cohort (MoBa). In an independent birth cohort from the US, we found a striking degree of replication for our findings. In the NEST replication population, the direction of differential methylation in relation to maternal smoking was consistent with the direction in relation to maternal plasma cotinine for all of the 26 CpGs that were significant

(p<1.06x10 7) in the discovery study. In addition, despite the more modest sample size of the replication set (18 newborns born to smoking mothers and 18 born to nonsmokers) estimates for

21 of 26 CpGs had pvalues less than 0.05. Five CpGs out of the 26 met strict Bonferroni corrected statistical significance in the replication study (p<0.0019): two in CYP1A1 and one in

AHRR, genes known to be involved in the detoxification of compounds from tobacco smoke via the AhR signaling pathway, and two CpGs in GFI1 , a gene that has not previously been implicated in responses to tobacco smoke.

Our most statistically significant finding in both the replication and discovery analyses was lower methylation with higher levels of selfreported or cotininebased evidence of maternal smoking at cg05575921 in AHRR . Remarkably, a recent study in adults, using the same 450K

12

Page 13 of 41

platform, showed lower methylation at this same CpG (cg05575921) in smokers compared with

nonsmokers at epigenomewide statistical significance (Monick et al. 2012). That study

observed lower methylation in smokers at this CpG for both lymphoblasts and pulmonary

alveolar macrophages (Monick et al. 2012). The authors also studied the functional implications

of this methylation change and found that methylation at AHRR cg05575921 was associated with

AHRR expression. Thus our data show that a methylation change found in adult smokers and

implicated as functionally important in AHRR , a gene involved in a key pathway of response to

tobacco smoke components, is already present at birth in newborns due to maternal smoking in

pregnancy.

Our findings for genes in the AhR pathway make sense biologically; the pathway is

known to mediate the effects of toxicants such as polycyclic aromatic hydrocarbons (PAH) in

tobacco smoke. PAH bind to AhR causing its translocation to the nucleus and the formation of a

heterodimer with the AhR nuclear transporter. This complex binds DNA regulatory sequences,

termed xenobiotic response elements (XRE)s, and initiates the expression of CYP1A1 and other

genes involved in detoxification of these chemicals (Nguyen and Bradfield 2008). The AhR

repressor (AHRR) acts as a negative regulator of AhR activity, suppressing CYP1A1

transcription (Harper et al. 2006). In our study, maternal smoking, assessed objectively by

cotinine, displayed a dosedependent association with lower methylation of AHRR CpGs and

higher methylation of CYP1A1 CpGs in cord blood. The contrasting effects of maternal smoking

during pregnancy on methylation at CpGs in AHRR and CYP1A1 are notable because of the

opposing function these genes have in the AhR pathway (Kawajiri and FujiiKuriyama 2007).

While the role of the AhR pathway in response to toxicants is well known, there is

increasing identification of the importance of this pathway in the regulation of other processes,

13

Page 14 of 41

such as immune function (Lawrence and Sherr 2012). In addition, AhR has also recently been found to play a crucial role in regulating cigarette smoke extractinduced apoptosis in fibroblasts

(lung and embryonic) and lung epithelial cells in culture from the mouse (Rico de Souza et al.

2011).

We also replicated our novel findings for GFI1, which has not previously been implicated in response to tobacco smoke. GFI1 plays an essential role in diverse developmental processes including hematopoiesis and the development of the inner ear and pulmonary neuroendocrine cells (Duan et al. 2005; Khandanpour et al. 2011). GFI1 influences numerous cellular events such as proliferation, apoptosis, differentiation, lineage decisions, and oncogenesis (JafarNejad and Bellen 2004). This gene is part of a complex that enables histone modifications and may also control alternative pre mRNA splicing (Moroy and Khandanpour 2011). Given the pivotal involvement of GFI1 in fundamental development processes, a role in diverse effects of maternal smoking on the offspring is biologically plausible.

Although our findings for RUNX1 did not meet strict Bonferroni statistical significance in the replication population (NEST p>0.00019), there were four RUNX1 CpGs in the top 100 results in the MoBa discovery population (see Supplemental Material, Table S1). RUNX1 (also known as AML1 ) is involved in the development of normal hematopoiesis as well as leukemia

(Kumano and Kurokawa 2010). Of note, RUNX1 , AhR, and GFI1 are all involved in the regulation of hematopoietic stem cells (Boitano et al. 2010; Khandanpour et al. 2011; Oshima et al. 2011), suggesting the possibility that crosstalk between these genes may impact smoking related health outcomes in the offspring.

Correlation of DNA methylation at CpGs within the same gene may contribute to the finding of genes with multiple significant CpGs in our analyses. However, if CpGs are not truly

14

Page 15 of 41

independent, then using strict Bonferroni correction for multiple testing, which assumes

independent tests, is quite conservative. This adds support for the results that surpassed this

strict threshold, particularly the 5 CpGs with corrected statistical significance in both the

discovery and replication populations.

Cotinine, the biomarker of smoking, was not measured among pregnant women in the

replication (NEST) study so we used maternal selfreport of smoking on questionnaires that was

consistent with medical records of smoking. Among our MoBa study participants, 8 of the 136

women (5.9%) with cotinine values consistent with active smoking (≥56.8 nmol/L) reported that

they did not smoke during pregnancy. A study of U.S. reproductive age women using NHANES

data indicates that U.S. pregnant women also underreport smoking (Dietz et al. 2011). Thus, we

expect that some of the NEST participants classified as nonsmokers might actually have been

smokers. However, this type of misclassification should lead to a bias of estimates towards the

null, making our replication estimates more conservative than would be expected if smokers had

reported their exposure with 100% accuracy.

The 450K Beadchip offers greatly improved genomic coverage over the earlier 27K

platform. The 450K content includes 99% of RefSeq genes with multiple probes per gene, 96%

of CpG islands from the UCSC database, CpG island shores, and additional content selected

from wholegenome bisulfite sequencing data and input from DNA methylation experts

(Bibikova et al. 2011). None of the 26 CpGs with epigenomewide significance in our study

were present on the 27K platform. A study using the 27K platform observed differences in DNA

methylation associated with smoking status in adults at CpG cg03636183 in F2RL3 , a gene

associated with cardiovascular disease (Breitling et al. 2011). Employing a single CpG lookup

approach (one CpG evaluated, uncorrected for multiple testing), our data provide support for an

15

Page 16 of 41

association between maternal smoking during pregnancy and cord blood DNA methylation at this CpG (coef=0.020, se=0.009, p=0.016). Another recent study of 27K methylation and adult smoking identified cg19859270 in GPR15 (Wan et al. 2012) . Our single lookup approach for this CpG did not provide supporting results (coef = 0.010, se=0.007, p=0.181).

Maternal smoking during pregnancy has been associated with CpGspecific differential

DNA methylation in placental tissue using the 27K Beadchip (Suter et al. 2011), but we found no overlap between the top hits from that study and the top hits in our study using the 450K

Beadchip to measure methylation in newborn cord blood samples. In addition to the limitations of comparing the 27K and 450K platforms (the top 26 CpGs from our study were not covered on the 27K platform), it is likely that altered methylation in response to tobacco smoke exposure is different in placental tissue and newborn cord blood.

We measured DNA methylation in whole cord blood samples. Because hematopoietic differentiation and methylation status of differentially methylated regions (DMRs) are highly coordinated (Schmidl et al. 2009), significant shifts in cell type pools in the blood should be accompanied by shifts in methylation at dozens, if not hundreds, of cell type specific DMRs. If our findings were simply a reflection of maternal smoking influencing shifts in cell types, we would expect to find many differentially methylated genes, which we did not. Instead, only 10 genes had differences in methylation levels related to cotinine in our data.

Notably, the recent paper of Monick et al., 2012 (Monick et al. 2012) supports the notion that unmeasured confounding by cell type does not explain the altered methylation status that we observed in relation to smoking. In that paper, smokinginduced alteration of AHRR CpGs

(including our top CpG) was seen in both B lymphoblastoid cells and in an independently collected sample of alveolar macrophage cells collected from bronchial lavage. Thus, Monick et

16

Page 17 of 41

al. identified smokinginduced signals across two distinct cell types, which strengthens our

replicated whole blood findings.

While the above evidence suggests that our results are not confounded by cell type, we

directly addressed the potential impact of differential cell counts by measuring epigenomewide

DNA methylation, using the 450K assay, in 21 cord blood samples that had been separated,

while fresh, into the two major cell pools, polymorphonuclear cells (PM) and mononuclear cells

(MN). In these 21 paired samples differences in methylation by cell type were very small. These

small differences were statistically significant (p<0.0019) for 3 of the top 26 CpGs associated

with maternal plasma cotinine in MoBa, but these CpGs were not significantly associated with

maternal smoking in the NEST population. Furthermore, the magnitude of the difference in

median methylation between PM and MN cell types was much smaller than the difference in

median methylation between smokers and nonsmokers in both the MoBa and NEST study

populations. For the percent difference in median methylation by cell type, the maximum was

3.1% and the mean was 1.0%. In contrast, for the percent difference in median methylation

between smokers and nonsmokers measured in whole blood, which is a mixture of these two

major cell types (PM and MN), the maximum was 13.7% (MoBa) and 15.1% (NEST) and the

mean was 5.3% (MoBa) and 5.0% (NEST). For our top CpG AHRR cg05575921, the percent

difference in median methylation was 0.31% by cell type compared to 7.52% for smokers

compared to nonsmokers in MoBa (7.67% in NEST). Thus, the differences in methylation

between these two major cell pools are much smaller than the differences in methylation by

smoking that we observed in whole blood, suggesting that confounding by cell type is unlikely to

explain our findings of differential methylation related to smoking.

17

Page 18 of 41

It is possible that differences in methylation may exist in more refined subclassifications of cell types that we did not specifically examine. But again, our top finding for AHRR cg05575921 due to maternal smoking was also reported in adult smokers in two different cell types, alveolar lung macrophage DNA and lymphoblast DNA (Monick et al. 2012). These various lines of evidence give strong support for the conclusion that our replicated findings are not explained by effects of maternal smoking on the relative prevalence of white blood cell subtypes that differ with regard to CpG methylation.

Conclusions

In utero exposure to maternal smoking is an important risk factor for numerous adverse outcomes in children and adults. With a hypothesisfree epigenomewide screen with replication in a second population, we observed strong evidence that maternal smoking during pregnancy is associated with cord blood methylation of genes in the AhR signaling pathway, important for the detoxification of xenobiotics in tobacco smoke, and methylation of a novel gene not previously implicated in response to tobacco smoke that is involved in fundamental developmental processes. These results suggest that epigenetic mechanisms reflected by DNA methylation may underlie some of the welldocumented impacts of maternal smoking on offspring. Our identification of differential methylation in genes known to be involved in the response to tobaccorelated compounds, in addition to a novel gene, demonstrates the value of using this approach to elucidate the epigenetic effects of in utero exposures.

18

Page 19 of 41

References

Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. 2011. High density DNA methylation array with single CpG site resolution. Genomics 98(4):288295.

Boitano AE, Wang J, Romeo R, Bouchez LC, Parker AE, Sutton SE, et al. 2010. Aryl hydrocarbon receptor antagonists promote the expansion of human hematopoietic stem cells. Science 329(5997):13451348.

Breitling LP, Yang R, Korn B, Burwinkel B, Brenner H. 2011. Tobaccosmokingrelated differential DNA methylation: 27K discovery and replication. Am J Hum Genet 88(4):450457.

Breton CV, Byun HM, Wenten M, Pan F, Yang A, Gilliland FD. 2009. Prenatal tobacco smoke exposure affects global and genespecific DNA methylation. Am J Respir Crit Care Med 180(5):462467.

Brion MJ, Leary SD, Lawlor DA, Smith GD, Ness AR. 2008. Modifiable maternal exposures and offspring blood pressure: a review of epidemiological studies of maternal age, diet, and smoking. Pediatr Res 63(6):593598.

CupulUicab LA, Skjaerven R, Haug K, Melve KK, Engel SM, Longnecker MP. 2012. In Utero Exposure to Maternal Tobacco Smoke and Subsequent Obesity, Hypertension, and Gestational Diabetes Among Women in the MoBa Cohort. Environ Health Perspect 120(3):355360.

Dietz PM, Homa D, England LJ, Burley K, Tong VT, Dube SR, et al. 2011. Estimates of nondisclosure of cigarette smoking among pregnant and nonpregnant women of reproductive age in the United States. Am J Epidemiol 173(3):355359.

Dolinoy DC, Huang D, Jirtle RL. 2007. Maternal nutrient supplementation counteracts bisphenol Ainduced DNA hypomethylation in early development. Proc Natl Acad Sci U S A 104(32):1305613061.

Duan Z, Zarebski A, MontoyaDurango D, Grimes HL, Horwitz M. 2005. Gfi1 coordinates epigenetic repression of p21Cip/WAF1 by recruitment of histone lysine methyltransferase G9a and histone deacetylase 1. Mol Cell Biol 25(23):1033810351.

Feinberg AP. 2010. Genomescale approaches to the epigenetics of common human disease. Virchows Arch 456(1):1321.

19

Page 20 of 41

Fox J, Weisberg S. 2011. Robust Regression in R. In: An R Companion to Applied Regression, Part Second. Thousand Oaks, CA:Sage.

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5(10):R80.

Haberg SE, London SJ, Nafstad P, Nilsen RM, Ueland PM, Vollset SE, et al. 2011. Maternal folate levels in pregnancy and asthma in children at age 3 years. J Allergy Clin Immunol 127(1):262264, 264 e261.

Harper PA, Riddick DS, Okey AB. 2006. Regulating the regulator: factors that control levels and activity of the aryl hydrocarbon receptor. Biochem Pharmacol 72(3):267279.

Hollingsworth JW, Maruoka S, Boon K, Garantziotis S, Li Z, Tomfohr J, et al. 2008. In utero supplementation with methyl donors enhances allergic airway disease in mice. J Clin Invest 118(10):34623469.

Hoyo C, Murtha AP, Schildkraut JM, Forman MR, Calingaert B, DemarkWahnefried W, et al. 2011. Folic acid supplementation before and during pregnancy in the Newborn Epigenetics STudy (NEST). BMC Public Health 11(1):46.

JafarNejad H, Bellen HJ. 2004. Gfi/Pag3/senseless zinc finger proteins: a unifying theme? Mol Cell Biol 24(20):88038812.

Kawajiri K, FujiiKuriyama Y. 2007. Cytochrome P450 gene regulation and physiological functions mediated by the aryl hydrocarbon receptor. Arch Biochem Biophys 464(2):207212.

Khandanpour C, Kosan C, Gaudreau MC, Duhrsen U, Hebert J, Zeng H, et al. 2011. Growth factor independence 1 protects hematopoietic stem cells against apoptosis but also prevents the development of a myeloproliferativelike disease. Stem Cells 29(2):376385.

Kumano K, Kurokawa M. 2010. The role of Runx1/AML1 and Evi1 in the regulation of hematopoietic stem cells. J Cell Physiol 222(2):282285.

Lawrence BP, Sherr DH. 2012. You AhR what you eat? Nat Immunol 13(2):117119.

20

Page 21 of 41

Magnus P, Irgens LM, Haug K, Nystad W, Skjaerven R, Stoltenberg C. 2006. Cohort profile: the Norwegian Mother and Child Cohort Study (MoBa). Int J Epidemiol 35(5):11461150.

Midttun O, Hustad S, Ueland PM. 2009. Quantitative profiling of biomarkers related to B vitamin status, tryptophan metabolism and inflammation in human plasma by liquid chromatography/tandem mass spectrometry. Rapid Commun Mass Spectrom 23(9):13711379.

Monick MM, Beach SR, Plume J, Sears R, Gerrard M, Brody GH, et al. 2012. Coordinated changes in AHRR methylation in lymphoblasts and pulmonary macrophages from smokers. Am J Med Genet B Neuropsychiatr Genet 159B(2):141151.

Moroy T, Khandanpour C. 2011. Growth factor independence 1 (Gfi1) as a regulator of lymphocyte development and activation. Semin Immunol 23(5):368378.

Murphy SK, Adigun A, Huang Z, Overcash F, Wang F, Jirtle RL, et al. 2012. Genderspecific methylation differences in relation to prenatal exposure to cigarette smoke. Gene 494(1):3643.

Nguyen LP, Bradfield CA. 2008. The search for endogenous activators of the aryl hydrocarbon receptor. Chem Res Toxicol 21(1):102116.

Office of the Surgeon General. 2006. The health consequences of involuntary exposure to tobacco smoke : a report of the Surgeon General. Rockville, MD: U.S. Dept. of Health and Human Services, Public Health Service, Office of the Surgeon General.

Oshima M, Endoh M, Endo TA, Toyoda T, NakajimaTakagi Y, Sugiyama F, et al. 2011. Genomewide analysis of target genes regulated by HoxB4 in hematopoietic stem and progenitor cells developing from embryonic stem cells. Blood 117(15):e142150.

R Development Core Team. 2010. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria.

Rakyan VK, Down TA, Balding DJ, Beck S. 2011. Epigenomewide association studies for common human diseases. Nat Rev Genet 12(8):529541.

Reik W. 2007. Stability and flexibility of epigenetic gene regulation in mammalian development. Nature 447(7143):425432.

21

Page 22 of 41

Rico de Souza A, Zago M, Pollock SJ, Sime PJ, Phipps RP, Baglole CJ. 2011. Genetic ablation of the aryl hydrocarbon receptor causes cigarette smokeinduced mitochondrial dysfunction and apoptosis. J Biol Chem 286(50):4321443228.

Ronningen KS, Paltiel L, Meltzer HM, Nordhagen R, Lie KK, Hovengen R, et al. 2006. The biobank of the Norwegian Mother and Child Cohort Study: a resource for the next 100 years. Eur J Epidemiol 21(8):619625.

Sandoval J, Heyn HA, Moran S, SerraMusach J, Pujana MA, Bibikova M, et al. 2011. Validation of a DNA methylation microarray for 450,000 CpG sites in the . Epigenetics 6(6):692702.

Schmidl C, Klug M, Boeld TJ, Andreesen R, Hoffmann P, Edinger M, et al. 2009. Lineage specific DNA methylation in T cells correlates with histone methylation and activity. Genome Res 19(7):11651174.

Shaw GM, Carmichael SL, Vollset SE, Yang W, Finnell RH, Blom H, et al. 2009. Mid pregnancy cotinine and risks of orofacial clefts and neural tube defects. J Pediatr 154(1):1719.

Suter M, Ma J, Harris AS, Patterson L, Brown KA, Shope C, et al. 2011. Maternal tobacco use modestly alters correlated epigenomewide placental DNA methylation and gene expression. Epigenetics 6(11).

Terry MB, Ferris JS, Pilsner R, Flom JD, Tehranifar P, Santella RM, et al. 2008. Genomic DNA methylation among women in a multiethnic New York City birth cohort. Cancer Epidemiol Biomarkers Prev 17(9):23062310.

Wan ES, Qiu W, Baccarelli A, Carey VJ, Bacherman H, Rennard SI, et al. 2012. Cigarette smoking behaviors and time since quitting are associated with differential DNA methylation across the human genome. Hum Mol Genet 21(13):30733082.

WilhelmBenartzi CS, Houseman EA, Maccani MA, Poage GM, Koestler DC, Langevin SM, et al. 2012. In utero exposures, infant growth, and DNA methylation of repetitive elements and developmentally related genes in human placenta. Environ Health Perspect 120(2):296302.

22

Page 23 of 41

Table 1. Descriptive characteristics of the MoBa study population a Variable Category N (%)

Maternal cotinine (nmol/L) b Undetectable (0) 736 (69.3) Low (>056.8) 190 (17.9) Moderate (>56.8388) 68 (6.4) High (>388) 68 (6.4)

Selfreported maternal smoking Never 525 (49.4) Formerstopped before pregnancy 233 (21.9) Stopped during pregnancy 180 (16.9) Sometimes or daily 124 (11.7)

Child gender Male 566 (53.3) Female 496 (46.7)

Maternal education Less than high school 78 (7.3) High school 343 (32.3) Some college 471 (44.4) 4 years of college or more 170 (16.0)

Parity 0 447 (42.1) 1 435 (41.0) 2 137 (12.9) 3 or more 43 (4.0) a N=1,062 individuals. b Maternal cotinine measured in plasma at approximately gestational week 18. Cotinine values above 56.8 nmol/L are consistent with active smoking.

23

Page 24 of 41

Table 2. Differential methylation in cord blood DNA in relation to maternal cotinine in the MoBa study population: CpGs with

Bonferronicorrected statistical significance (p < 1.06x10 7), sorted by chromosome and position

Unadjusted Adjusted f Median methylation by cotinine category h Distance to Chr a Gene CpG Position c Coef d SE e pvalue Coef SE pvalue Rank g Undetectable Low Medium High Gene b 1 GFI1 3688 cg10399789 92945668 0.07 0.01 4.08E13 0.065 0.010 1.10E10 14 0.759 0.755 0.727 0.716 1 GFI1 3224 cg09662411 92946132 0.111 0.012 2.26E20 0.106 0.013 2.96E17 11 0.730 0.733 0.669 0.654 1 GFI1 3169 cg06338710 92946187 0.112 0.013 1.34E18 0.106 0.014 5.02E14 12 0.801 0.800 0.754 0.733 1 GFI1 2656 cg18146737 92946700 0.28 0.024 2.42E30 0.271 0.026 3.30E25 6 0.877 0.875 0.771 0.738 1 GFI1 2531 cg12876356 92946825 0.182 0.016 2.29E30 0.176 0.017 1.70E25 4 0.731 0.732 0.627 0.605 1 GFI1 2321 cg18316974 92947035 0.243 0.024 6.43E24 0.238 0.026 3.16E20 7 0.921 0.923 0.856 0.841 1 GFI1 1768 cg09935388 92947588 0.196 0.015 1.05E38 0.188 0.016 2.68E31 2 0.708 0.707 0.580 0.564 1 GFI1 1395 cg14179389 92947961 0.184 0.017 5.38E28 0.181 0.017 2.63E25 5 0.242 0.246 0.154 0.158 5 AHRR 19617 cg23067299 323907 0.075 0.012 4.21E10 0.072 0.012 4.12E09 20 0.789 0.789 0.813 0.837 5 AHRR 64157 cg03991871 368447 0.057 0.008 2.04E11 0.054 0.009 1.99E10 15 0.841 0.839 0.820 0.818 5 AHRR 69088 cg05575921 373378 0.202 0.015 2.85E39 0.198 0.017 8.03E33 1 0.883 0.874 0.829 0.784 5 AHRR 95070 cg21161138 399360 0.045 0.007 1.52E11 0.043 0.007 8.91E10 18 0.718 0.715 0.701 0.679 6 HLA-DPB2 11549 cg11715943 33091841 0.053 0.009 1.00E08 0.054 0.010 3.63E08 24 0.842 0.833 0.824 0.820 7 MYO1G 16417 cg19089201 45002287 0.083 0.013 3.22E10 0.088 0.014 9.13E11 13 0.925 0.926 0.932 0.944 7 MYO1G 16218 cg22132788 45002486 0.18 0.021 1.98E18 0.184 0.021 4.82E18 10 0.932 0.935 0.951 0.966 7 MYO1G 15968 cg04180046 45002736 0.073 0.008 8.76E20 0.076 0.008 2.85E19 9 0.441 0.446 0.484 0.508 7 MYO1G 15785 cg12803068 45002919 0.145 0.016 8.51E19 0.149 0.016 1.25E19 8 0.713 0.721 0.774 0.813 7 ENSG00000225718 198306 cg04598670 68697651 0.063 0.009 1.29E11 0.061 0.010 1.27E09 19 0.623 0.607 0.597 0.574 7 CNTNAP2 854 cg25949550 145814306 0.075 0.007 4.15E30 0.073 0.007 1.02E26 3 0.113 0.109 0.097 0.092 8 EXT1 33821 cg03346806 119157879 0.038 0.007 3.08E08 0.039 0.007 9.34E08 27 0.801 0.795 0.793 0.779 14 TTC7B 274756 cg18655025 91008005 0.041 0.007 2.07E08 0.042 0.008 6.76E08 26 0.854 0.847 0.841 0.836 15 CYP1A1 1266 cg05549655 75019143 0.064 0.01 2.96E10 0.065 0.010 2.38E10 16 0.189 0.188 0.221 0.226 15 CYP1A1 1374 cg22549041 75019251 0.096 0.016 4.52E09 0.098 0.017 8.88E09 21 0.385 0.379 0.414 0.475 15 CYP1A1 1406 cg11924019 75019283 0.044 0.008 2.62E08 0.044 0.008 4.78E08 25 0.434 0.430 0.457 0.475 15 CYP1A1 1425 cg18092474 75019302 0.066 0.012 1.10E08 0.068 0.012 9.95E09 22 0.510 0.504 0.549 0.573 21 RUNX1 1746 cg12477880 36259241 0.159 0.026 1.02E09 0.163 0.026 7.55E10 17 0.088 0.102 0.110 0.158

24

Page 25 of 41

a Chromosome; b Distance from CpG to transcription start site of the nearest gene; c Chromosomal position based on NCBI human

reference genome assembly Build 37.3; d Regression coefficient; e Standard error for regression coefficient; f Adjusted for

maternal age, maternal education, parity, and asthma; g Rank order based on the adjusted pvalue; h Maternal plasma cotinine

(nmol/L) measured around gestational week 18 (undetectable: ≤0; low: >056.8; moderate: >56.8 – 388; high: >388). Values

above 56.8 nmol/L indicate active smoking.

25

Page 26 of 41

Table 3. Replication results from the NEST cohort for the 26 CpGs reaching epigenomewide Bonferronicorrected statistical significance (p<1.06x107) in the MoBa cohort, and median methylation differences by maternal smoking for MoBa and NEST cohorts, sorted by NEST p value Percent Difference in Median Methylation (Smokers – Non Smokers) NEST p Chromosome Gene CpG value a MoBa b NEST c 5 AHRR cg05575921 0.0003* 7.5 7.7 15 CYP1A1 cg05549655 0.0006* 3.5 3.8 15 CYP1A1 cg11924019 0.0008* 3.2 5.3 1 GFI1 cg09935388 0.0012* 13.7 7.5 1 GFI1 cg12876356 0.0015* 11.9 10.5 1 GFI1 cg18316974 0.0023 7.1 9 1 GFI1 cg09662411 0.0023 6.6 6.8 7 CNTNAP2 cg25949550 0.0025 1.8 2.5 1 GFI1 cg06338710 0.0026 5.8 5.2 7 MYO1G cg04180046 0.0027 5.3 4.9 7 ENSG00000225718 cg04598670 0.0036 3 5.3 5 AHRR cg23067299 0.0036 3.2 3.7 1 GFI1 cg18146737 0.0037 12.3 15.1 7 MYO1G cg12803068 0.0041 8.3 3.8 1 GFI1 cg14179389 0.0043 8.6 8.2 15 CYP1A1 cg22549041 0.0044 7.2 8.9 15 CYP1A1 cg18092474 0.0044 5.9 5.3 7 MYO1G cg19089201 0.0092 1.4 2.2 7 MYO1G cg22132788 0.0096 2.8 2.1 1 GFI1 cg10399789 0.0154 3.7 3.4 5 AHRR cg21161138 0.0283 2.3 1.7 5 AHRR cg03991871 0.0655 2.2 2.3 21 RUNX1 cg12477880 0.0850 4.6 4 8 EXT1 cg03346806 0.1248 1.5 0.1 14 TTC7B cg18655025 0.2668 1.8 0.3 6 HLA-DPB2 cg11715943 0.7956 7.5 7.7 a Pvalue from NEST replication linear regression model evaluating differential DNA methylation by maternal smoking during pregnancy. b MoBa maternal smoking status determined by gestational week 18 cotinine values >56.8 nmol/L (smoker) or ≤56.8 nmol/L (nonsmoker). c NEST maternal smoking status determined by maternal selfreport, verified by medical records. * Bonferronicorrected statistically significant (p < 0.0019).

26

Page 27 of 41

Figure Legends

Figure 1. Epigenomewide association between maternal cotinine and methylation of 473,844

CpGs measured in cord blood from the MoBa cohort. Twenty six CpGs (10 genes) reached

Bonferronicorrected statistical significance (p<1.06x10 7, represented by the horizontal line).

27

Page 28 of 41

Figure 2. Difference in median methylation intensity (beta) by smoking status among MoBa and NEST participants for top 26 CpGs, grouped by gene.

28

Page 29 of 41

Supplemental Material

450K Epigenome-Wide Scan Identifies Differential DNA Methylation in Newborns Related

to Maternal Smoking During Pregnancy

Bonnie R. Joubert 1, Siri E. Håberg 2, Roy M. Nilsen 3, Xuting Wang 1, Stein E. Vollset 2,4, Susan

5 5 5 K. Murphy , Zhiqing Huang , Cathrine Hoyo ,

Øivind Midttun 6, Lea A. Cupul-Uicab 1, Per M Ueland 4, Michael C. Wu 7, Wenche Nystad 2,

Douglas A. Bell 1, Shyamal D. Peddada 1, Stephanie J. London 1*

1 Division of Intramural Research, National Institute of Environmental Health Sciences, National

Institutes of Health, Department of Health and Human Services, Research Triangle Park, North

Carolina; 2 Norwegian Institute of Public Health, Oslo, Norway; 3 Haukeland University

Hospital, Bergen, Norway; 4 University of Bergen, Bergen, Norway; 5 Duke University School

of Medicine, Durham NC; 6 Bevital A/S, Laboratoriebygget, Bergen, Norway; 7 University of

North Carolina, Chapel Hill, NC

1

Page 30 of 41

Table of Contents

Supplemental Text, pages 3-5

Table S1, pages 6-8

Table S2, page 9

Figure S1, page 10

Figure S2, page 11

Figure S3, pages 12-13

2

Page 31 of 41

Supplemental Text

Quality Control

Bisulfite conversion for the MoBa samples was evaluated according to methods

previously described (Bibikova et al. 2011). Additionally, we included 28 blind replicate

samples (14 study subjects run in duplicate, included on each plate), 26 plate control samples

provided by Illumina (DNA from two cells lines run on each of the 13, 96-well plates), 25 plate

control samples provided by us [DNA from 2 control individuals on each of 12 plates (1 on each

half plate), 1 on the 13th plate] and 8 samples prepared from mixing methylated and non-

methylated DNA, described as follows. Human HCT116 DKO Methylated DNA (Cat# D5014-2)

and human HCT116 DKO non-methylated DNA (Cat# D5014-1) were purchased from Zymo

Research (Irvine, CA). The fully methylated DNA was mixed with non-methylated DNA to

provide a series of methylation controls (10%, 35%, 60%, and 85% methylated) to be included

on the first and twelfth plates.

We received from Illumina (San Diego, CA), data for a total of 1,204 samples. Samples

with an average detection p-value across all probes of less than 0.05 and/or indicated by Illumina

to have failed (N=49) were omitted from further analysis along with 1 sample erroneously

included in the dataset. Multidimensional scaling (MDS) plots were used to evaluate gender

outliers based on chromosome X data, where males and females separated into two distinct

clusters. Samples separating into erroneous clusters (males in female cluster or females in male

cluster) or not belonging to a distinct cluster were omitted (N=13). Blind duplicate samples were

highly correlated (Spearman rho = 0.997) and the mean difference in beta was 0.0043 (standard

error = 0.00012). For the 14 blind duplicate pairs, results from one of the two samples in each

3

Page 32 of 41

pair was selected at random to retain in the dataset and the other was omitted from further analysis. CpGs with missing chromosome data (N=65, mostly control probes), missing more than 10% of data across individuals (N=20), or on chromosome X (N=11,232) or Y (N=416) were omitted, resulting in 473,844 probes for analysis.

The laboratory analysis plan was designed to exclude batch effects. All samples were run with a single set of reagents on a single machine at Illumina, Inc. (San Diego, CA). Bisulfite conversion and methylation measurements including reruns were performed in March 2011.

Variables representing the chip (12 samples), chip set (four contiguous chips or half of a plate), and plate (96 samples) were included as covariates in statistical models to evaluate potential confounding. In addition, the distributions of beta and logratio values were compared across chips, chip sets and plates. We found that chip, chip set and plate were not appreciable sources of variability.

The NEST data quality control followed a similar protocol as described above. In addition to the 18 smokers (9 males, 9 females) and 18 non-smokers (9 males, 9 females), eight plate control samples and three samples representing 10%, 50%, and 85% methylation were included on a single plate.

It is possible that SNPs at or near CpGs could influence methylation intensities and thereby the associations we observed. We searched online databases to determine the presence of an underlying SNP for the top 105 most statistically significant CpGs. Information was obtained for SNPs with minor allele frequency ≥ 5% in the CEU (Utah residents with Northern and Western European ancestry) population, curated by 1000G projects

(http://www.1000genomes.org/, 06/2011 release, 87 individuals), HapMap project

(http://hapmap.ncbi.nlm.nih.gov/, release 28, 8/2010, 174 individuals), and dbSNP

4

Page 33 of 41

(http://www.ncbi.nlm.nih.gov/projects/SNP/, build 134, 8/2011, 116 individuals). We found 3

probes with SNPs within CpGs and removed them from the top 105 findings (rs77990586 in

WBSCR17 cg20370581, rs115039881 in cg00531338 not specific to a gene, and rs6869832 in

AHRR cg23576855).

References

Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. 2011. High density DNA methylation array with single CpG site resolution. Genomics.

5

Page 34 of 41

Supplemental Material, Table S1. Differential methylation in cord blood DNA in relation to maternal cotinine, top 100 most significant CpGs, sorted by chromosome, position

Chra Gene Distanceb Locationc CpG Positiond Coefe SEf p-value* 1 AJAP1 20974 BODY cg26435172 4736126 -0.033 0.007 7.66E-07 1 WDR78 734 BODY cg00044354 67389884 -0.045 0.009 1.65E-06 1 GNG12 -338 TSS cg25189904 68299493 -0.031 0.007 4.96E-06 1 GNG12 -356 TSS cg26764244 68299511 -0.034 0.007 4.47E-06 1 GFI1 3688 BODY cg10399789 92945668 -0.065 0.010 1.10E-10* 1 GFI1 3224 BODY cg09662411 92946132 -0.106 0.013 2.96E-17* 1 GFI1 3217 BODY cg06338710 92946187 -0.106 0.014 5.02E-14* 1 GFI1 2704 BODY cg18146737 92946700 -0.271 0.026 3.30E-25* 1 GFI1 2531 BODY cg12876356 92946825 -0.176 0.017 1.70E-25* 1 GFI1 2321 BODY cg18316974 92947035 -0.238 0.026 3.16E-20* 1 GFI1 1768 BODY cg09935388 92947588 -0.188 0.016 2.68E-31* 1 GFI1 1443 BODY cg14179389 92947961 -0.181 0.017 2.63E-25* 1 PBX1 27680 BODY cg07353489 164556276 -0.032 0.007 4.33E-06 1 ENSG00000198216 -2747 TSS cg08943293 181449699 -0.029 0.006 2.90E-06 1 MFSD4 -2879 TSS cg06827850 205535232 -0.030 0.007 4.16E-06 1 GALNT2 207374 BODY cg11527913 230410329 -0.052 0.011 5.76E-06 1 KIF26B 180355 BODY cg15137445 245498689 -0.027 0.006 2.80E-06 1 OR14I1 444 BODY cg01165362 248845209 -0.037 0.008 5.74E-06 2 ENSG00000232835 -5198 TSS cg14556323 5701409 -0.028 0.006 3.61E-06 2 XDH 78080 BODY cg14654795 31559579 -0.033 0.007 5.72E-06 2 EXOC6B 222520 BODY cg06108254 72830705 -0.030 0.007 4.96E-06 2 LOC284998 10689 BODY cg18703066 105363536 -0.076 0.015 3.42E-07 2 MYO1B 20360 BODY cg12738764 192130613 -0.033 0.007 2.76E-06 2 ENSG00000228226 -4057 TSS cg20962638 229553401 -0.031 0.007 4.93E-06 2 COPS8 6478 BODY cg05301057 238000561 -0.032 0.007 4.03E-06 2 PP14571 6909 BODY cg18455650 241389208 -0.030 0.007 5.73E-06 3 VGLL4 11 TSS cg18096987 11623873 -0.025 0.005 2.20E-06 3 ENSG00000235886 -99161 TSS cg07746241 43920635 -0.027 0.006 2.10E-06 3 ENSG00000243149 -31477 TSS cg22931622 65129155 -0.032 0.007 4.34E-06 3 SEMA5B -1510 TSS cg13428477 122748086 -0.041 0.008 4.16E-07 3 SSR3 -51017 TSS cg26405475 156324038 0.042 0.009 5.66E-06 4 ENSG00000249105 -183313 TSS cg07796335 59167549 -0.034 0.007 5.06E-06 5 AHRR 19569 BODY cg23067299 323907 0.072 0.012 4.12E-09* 5 AHRR 64109 BODY cg03991871 368447 -0.054 0.009 1.99E-10* 5 AHRR 64466 BODY cg23916896 368804 -0.046 0.010 1.46E-06 5 AHRR 69088 BODY cg05575921 373378 -0.198 0.017 8.03E-33* 5 AHRR 69088 BODY cg21161138 399360 -0.043 0.007 8.91E-10* 5 ENSG00000249201 2484 BODY cg01772854 1176225 -0.033 0.007 7.43E-07 5 FSTL4 86949 BODY cg02378360 132861322 -0.031 0.007 3.04E-06 6 RPP40 2364 BODY cg11942662 5001955 -0.030 0.007 4.26E-06 6 OR12D2 500 BODY cg08862210 29364963 -0.036 0.008 3.33E-06 6 TRIM15 1441 BODY cg23784132 30132471 -0.039 0.009 4.46E-06 6 TRIM26 21761 BODY cg21531017 30159510 -0.030 0.006 2.28E-06

6

Page 35 of 41

Supplemental Material, Table S1 (cont.). Differential methylation in cord blood DNA in relation

to maternal cotinine, top 100 most significant CpGs, sorted by chromosome, position

Chra Gene Distanceb Locationc CpG Positiond Coefe SEf p-value* 6 HLA-DPB2 11549 BODY cg11715943 33091841 -0.054 0.010 3.63E-08* 7 SDK1 7987 BODY cg21005410 4177351 -0.029 0.006 3.26E-06 7 MYO1G 16465 BODY cg19089201 45002287 0.088 0.014 9.13E-11* 7 MYO1G 16266 BODY cg22132788 45002486 0.184 0.021 4.82E-18* 7 MYO1G 15968 BODY cg04180046 45002736 0.076 0.008 2.85E-19* 7 MYO1G 15785 BODY cg12803068 45002919 0.149 0.016 1.25E-19* 7 ENSG00000225718 39390 TSS cg04598670 68697651 -0.061 0.010 1.27E-09* 7 PODXL 34603 BODY cg21771679 131206773 -0.036 0.008 1.89E-06 7 CNTNAP2 854 BODY cg25949550 145814306 -0.073 0.007 1.02E-26* 7 CNTNAP2 1090753 BODY cg11207515 146904205 -0.041 0.008 7.14E-07 7 PTPRN2 277695 BODY cg02356647 158102787 -0.037 0.008 3.24E-06 8 ARHGEF10 78878 BODY cg26101086 1851026 -0.035 0.007 1.14E-06 8 ERLIN2 18617 BODY cg26393977 37612713 -0.035 0.007 2.62E-06 8 EXT1 -33821 TSS cg03346806 119157879 -0.039 0.007 9.34E-08* 8 PLEC -1231 TSS cg03958308 145014989 -0.030 0.006 3.40E-06 9 GLIS3 71264 BODY cg14047387 4080919 -0.035 0.007 7.33E-07 9 MIR2964A -1100 TSS cg00321619 131153847 -0.033 0.007 7.52E-07 10 CAMK1D 39248 BODY cg16894855 12430878 -0.031 0.007 3.59E-06 10 FRMD4A -13 TSS cg11813497 14372879 0.043 0.009 3.48E-06 10 MGMT -1399 TSS cg09993459 131264102 -0.030 0.006 1.72E-06 10 FRG2B 6347 TSS cg20219790 135434000 -0.040 0.008 1.09E-06 11 OR5M10 364 BODY cg07850316 56344833 -0.031 0.007 5.43E-06 11 ARHGAP42 125042 BODY cg02008402 100683448 -0.034 0.007 1.62E-06 12 GRIN2B 23556 BODY cg17174980 14109514 -0.032 0.007 5.64E-06 12 ENSG00000212383 -44252 TSS cg11147442 57210954 -0.034 0.007 3.36E-06 12 CUX2 259376 BODY cg00029284 111731203 -0.031 0.006 1.53E-06 12 ENSG00000213144 30912 TSS cg13083057 119663566 -0.035 0.008 3.41E-06 12 LOC100507055 438 BODY cg09444108 133186599 -0.032 0.007 3.35E-06 13 ENSG00000215881 27438 BODY cg20338386 112275793 -0.026 0.005 1.27E-06 14 TTC7B 274756 BODY cg18655025 91008005 -0.042 0.008 6.76E-08* 14 CCDC88C 17815 BODY cg23304605 91866373 0.048 0.010 5.75E-06 14 MEG3 1703 BODY cg08698721 101294147 0.038 0.008 2.92E-06 15 CYP1A1 -1266 TSS cg05549655 75019143 0.065 0.010 2.38E-10* 15 CYP1A1 -1319 TSS cg13570656 75019196 0.067 0.014 1.64E-06 15 CYP1A1 -1326 TSS cg12101586 75019203 0.058 0.013 3.68E-06 15 CYP1A1 -1374 TSS cg22549041 75019251 0.098 0.017 8.88E-09* 15 CYP1A1 -1358 TSS cg11924019 75019283 0.044 0.008 4.78E-08* 15 CYP1A1 -1425 TSS cg18092474 75019302 0.068 0.012 9.95E-09* 15 LOC338963 1675 BODY cg02406469 83381070 -0.037 0.008 3.90E-06 16 CACNA1H -17013 TSS cg00720047 1186227 -0.031 0.007 3.35E-06 16 ENSG00000214696 63479 TSS cg00253658 54210496 0.077 0.016 9.64E-07 16 ZFHX3 90978 BODY cg05764102 72991344 -0.034 0.007 1.24E-06 16 VAT1L 3346 BODY cg04260557 77825828 -0.035 0.007 1.90E-06

7

Page 36 of 41

Supplemental Material, Table S1 (cont.). Differential methylation in cord blood DNA in relation to maternal cotinine, top 100 most significant CpGs, sorted by chromosome, position

Chra Gene Distanceb Locationc CpG Positiond Coefe SEf p-value* 16 ANKRD11 68054 BODY cg00169122 89488963 0.028 0.006 5.20E-06 17 C17orf98 15 TSS cg19760250 36997627 -0.094 0.020 2.41E-06 17 C17orf98 -30 TSS cg23290482 36997720 -0.064 0.013 1.90E-06 17 CCR7 10231 BODY cg07479709 38711505 -0.035 0.007 1.78E-06 19 GMIP 11941 BODY cg02293766 19742514 -0.029 0.006 4.96E-06 19 MIR518B -1428 TSS cg11251554 54204562 -0.032 0.006 2.40E-07 20 TGM3 32119 BODY cg21293537 2308779 -0.034 0.007 1.90E-06 20 LOC339568 -1038 TSS cg07376374 37854477 -0.037 0.008 5.01E-06 20 ATP9A 72466 BODY cg07339236 50312490 -0.053 0.010 1.38E-07 21 RUNX1 1920 BODY cg02869559 36259067 0.079 0.017 2.75E-06 21 RUNX1 1746 BODY cg12477880 36259241 0.163 0.026 7.55E-10* 21 RUNX1 1652 BODY cg00994804 36259383 0.164 0.031 1.31E-07 21 RUNX1 1527 BODY cg06758350 36259460 0.073 0.016 3.52E-06 21 ETS2 2152 BODY cg15892280 40180000 -0.029 0.006 7.74E-07 aChromosome; b Distance from CpG to transcription start site of the nearest gene

(negative=upstream; positive=downstream); c CpG is located in transcription start site (TSS)

or in the gene body (BODY); d Chromosomal position based on NCBI human reference

genome assembly Build 37.3; e Regression coefficient from robust linear regression adjusted

for maternal age, maternal education, parity, and asthma; f Standard error for regression

coefficient; * Bonferroni-corrected statistically significant (p < 1.06x10-7).

8

Page 37 of 41

Supplemental Material, Table S2. Differential methylation by cell type (polymorphonuclear

cells (PM) compared to mononuclear cells (MN)) in cord blood DNA, and the magnitude of

differential methylation by maternal smoking in MoBa (Smokers (S) – Non-smokers (NS)a).

Median Methylation Percent difference Percent difference (beta) in median in median methylation methylation by cell type by smoking in MoBa CHR Gene CpG PM MN (PM – MN) (S – NS) 1 GFI1 cg10399789 0.844 0.840 0.41 -3.66 1 GFI1 cg09662411 0.822 0.845 -2.34 -6.65 1 GFI1 cg06338710 0.884 0.882 0.24 -5.81 1 GFI1 cg18146737 0.904 0.894 1.05 -12.33 1 GFI1 cg12876356 0.826 0.836 -1.02 -11.86 1 GFI1 cg18316974 0.928 0.942 -1.35 -7.14 1 GFI1 cg09935388 0.806 0.797 0.84 -13.68 1 GFI1 cg14179389 0.339 0.324 1.49 -8.61 5 AHRR cg23067299 0.765 0.764 0.08 3.15 5 AHRR cg03991871 0.881 0.862 1.87* -2.21 5 AHRR cg05575921 0.854 0.851 0.31 -7.52 5 AHRR cg21161138 0.818 0.813 0.46 -2.27 6 HLA-DPB2 cg11715943 0.883 0.863 1.97 -1.77 7 MYO1G cg19089201 0.922 0.911 1.04 1.44 7 MYO1G cg22132788 0.956 0.951 0.59 2.82 7 MYO1G cg04180046 0.617 0.586 3.10* 5.30 7 MYO1G cg12803068 0.862 0.844 1.81 8.31 7 ENSG00000225718 cg04598670 0.634 0.623 1.06 -3.01 7 CNTNAP2 cg25949550 0.213 0.204 0.97 -1.80 8 EXT1 cg03346806 0.829 0.818 1.02* -1.51 14 TTC7B cg18655025 0.888 0.886 0.23 -1.19 15 CYP1A1 cg05549655 0.278 0.288 -0.97 3.50 15 CYP1A1 cg22549041 0.408 0.424 -1.59 7.23 15 CYP1A1 cg11924019 0.566 0.563 0.21 3.23 15 CYP1A1 cg18092474 0.627 0.629 -0.20 5.90 21 RUNX1 cg12477880 0.098 0.100 -0.18 4.55 * Bonferroni-corrected statistically significant (Bonferroni p < 0.05, raw p < 0.0019).

a Maternal smoking determined by plasma cotinine (nmol/L) measured around gestational

week 18. Values above 56.8 nmol/L indicate active smoking during pregnancy.

9

Page 38 of 41

Supplemental Material, Figure S1. Histograms showing the distribution of methylation levels in our data. Bimodal distribution was observed when considering all 473,844 CpG sites whereas approximately normal distribution was observed for most individually plotted CpG sites. (a) Beta across all CpGs analyzed; (b) log(beta/1-beta) across all CpGs analyzed; (c) Beta for one representative CpG (cg11924019); (d) log(beta/1- beta) for the representative CpG. 10 Page 39 of 41

Supplemental Material, Figure S2. Plot of the cumulative distribution function for methylation intensities for the top-ranked CpG site (AHRR cg05575921) by the four cotinine categories (undetectable: ≤0 nmol/L; low: >0-56.8 nmol/L; moderate: >56.8 – 388 nmol/L; high: >388 nmol/L) demonstrates a dose-response effect of smoking in the MoBa cohort (Jonkheere-Terpstra trend test p < 2.2x10-16). The Jonkheere-Terpstra trend test was calculated using the SAGx package in R, version 2.14.0. Cotinine values above 56.8 nmol/L are consistent with active smoking. 11 Page 40 of 41 a

cg probes

12 Page 41 of 41 b

cg probes

Supplemental Material, Figure S3. (a) CpG sites located in the intron region of the AHRR gene. (b) CpG sites located on the shore of a CpG island in a bidirection regulatory region of the CYP1A cluster. The statistical significance of the association between cotinine and methylation of each probe is color coded (blue: p > 1x10-5; grey: 1x10-5 ≥ p ≥ 1x10-7; red: p < 1x10-7). The magnitude of effect (coefficient from robust linear regression) is indicated as: <-0.01 (--), -0.01 to 0 (-), >0 to 0.01 (+), and >0.01 (++).

13