
JASA jasa v.2004/12/09 Prn:26/10/2006; 10:42 F:jasaap05194r.tex; (Diana) p. 1

Structured Measurement Error in Nutritional Epidemiology: Applications in the Pregnancy, Infection, and Nutrition (PIN) Study

Brent A. JOHNSON, Amy H. HERRING, Joseph G. IBRAHIM, and Anna Maria SIEGA-RIZ

Preterm birth, defined as delivery before 37 completed weeks' gestation, is a leading cause of infant morbidity and mortality. Identifying factors related to preterm delivery is an important goal of public health professionals who wish to identify etiologic pathways to target for prevention. Validation studies are often conducted in nutritional epidemiology in order to study measurement error in instruments that are generally less invasive or less expensive than "gold standard" instruments. Data from such studies are then used in adjusting estimates based on the full study sample. However, measurement error in nutritional epidemiology has recently been shown to be complicated by correlated error structures in the study-wide and validation instruments. Investigators of a study of preterm birth and dietary intake designed a validation study to assess measurement error in a food frequency questionnaire (FFQ) administered during pregnancy, with the secondary goal of assessing whether a single administration of the FFQ could be used to describe intake over the relatively short pregnancy period, in which energy intake typically increases. Here, we describe a likelihood-based method via Markov chain Monte Carlo to estimate the regression coefficients in a generalized linear model relating preterm birth to covariates, where one of the covariates is measured with error and the multivariate measurement error model has correlated errors among contemporaneous instruments (i.e., FFQs, 24-hour recalls, and biomarkers). Because of constraints on the covariance parameters in our likelihood, identifiability for all the variance and covariance parameters is not guaranteed, and, therefore, we derive the necessary and sufficient conditions to identify the variance and covariance parameters under our measurement error model and assumptions. We investigate the sensitivity of our likelihood-based model to distributional assumptions placed on the true intake by employing semiparametric Bayesian methods through the mixture of Dirichlet process priors framework. We exemplify our methods in a recent prospective cohort study of risk factors for preterm birth. We use long-term folate as our error-prone predictor of interest, the FFQ and 24-hour recall as two biased instruments, and the serum folate biomarker as the unbiased instrument. We found that folate intake, as measured by the FFQ, led to a conservative estimate of the odds ratio of preterm birth (.76) when compared to the odds ratio estimate from our likelihood-based approach, which adjusts for the measurement error (.63). We found that our parametric model led to similar conclusions to the semiparametric Bayesian model.

KEY WORDS: Adaptive rejection sampling; Dirichlet process prior; MCMC; Semiparametric Bayes.

Brent A. Johnson is Postdoctoral Fellow (E-mail: [email protected]), Amy H. Herring is Associate Professor (E-mail: [email protected]), and Joseph G. Ibrahim is Alumni Distinguished Professor (E-mail: [email protected]), Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599. Anna Maria Siega-Riz is Associate Professor, Departments of Nutrition and Epidemiology, University of North Carolina, Chapel Hill, NC 27599 (E-mail: [email protected]). We thank the associate editor and two anonymous referees for helpful comments and suggestions that led to a much improved manuscript. The research of the first author was supported in part by grants from the National Institute for the Environmental Health Sciences (P30ES10126, T32ES007018). Dr. Herring's research was supported in part by grants from the National Institutes of Child Health and Human Development (1R03HD045780, HD37584, HD39373) and NIEHS (P30ES10126). The PIN study was funded by grants from NICHD (HD28684, HD05798, DK55865, AG09525), UNC Clinical Nutrition Research Center (DK56350), UNC General Clinical Research Resources (RR00046), and funds from the Wake Area Health Education Center in Raleigh, NC. Dr. Ibrahim's research was supported in part by NIH grants #CA 70101, #CA 69222, #GM 070335, and #AI 060373.

© 0 American Statistical Association. Journal of the American Statistical Association, ???? 0, Vol. 0, No. 00, Applications and Case Studies. DOI 10.1198/00

1. INTRODUCTION

Measurement error is a common and well-known challenge in nutritional epidemiology. One only has to glance at a recent issue of any one of the leading epidemiological journals to see this and to verify that there still are many unresolved questions. One of the more intriguing recent developments in nutritional epidemiology concerns the fitness and applicability of traditional error models used to assess the [...] and generalizability of estimated risks obtained from studies using the food frequency questionnaire (FFQ).

Despite many documented pitfalls (Block 2001; Byers 2001; Willett 2001), including systematic bias and within- and between-subject variability, the FFQ is a common dietary instrument because of its ease of administration and economy in large nutritional studies. Naive regression methods that use the error-prone FFQ in place of the true long-term dietary intake often attenuate the regression coefficients toward 0 [although the result is not true in general nonlinear models (Fuller 1987; Carroll, Ruppert, and Stefanski 1995)]. Although several statistical methods have been proposed for the analysis of data where covariates are measured with error, regression calibration (Stefanski and Carroll 1985) seems to be the default method in nutrition (Willett 1998). The method is popular because it may be implemented using standard software, assuming one has a reliable calibration model (Spiegelman, Carroll, and Kipnis 2001; Spiegelman, Zhao, and Kim 2004). In addition, much money and energy have been spent on validation studies over the past several decades; therefore, bias and variance parameters relating the FFQ to the true, long-term dietary intake can be estimated with some degree of precision. A problem related to the one considered here is error due to covariate misclassification (cf. Holcroft and Spiegelman 1999; Morrissey and Spiegelman 1999; Spiegelman et al. 2001; Zucker and Spiegelman 2004).

The traditional statistical analysis and inference proceed by first regressing the outcome on the FFQ to obtain a naive estimate of the regression coefficient. Then we regress a reference instrument (that is, an unbiased measure of the true dietary intake) on the FFQ to estimate the attenuation factor. It can be shown that dividing the naive estimated regression coefficient by the estimated attenuation factor leads to a corrected estimate of the desired regression coefficient, that is, the one we would obtain if we could regress the outcome on the true long-term dietary intake (Carroll et al. 1995; Kipnis et al. 2001). If the systematic bias or the correlated errors in the FFQ or 24-hour recall is


ignored, then the attenuation factor will be biased, and subsequently, the "corrected" regression coefficient estimate will no longer be reliable. Although primary interest often lies in estimating this true regression coefficient, epidemiologists are also quite interested in the estimated attenuation factor. Because the power of the study to detect a significant effect is a function of the attenuation factor, epidemiologists use this fact to make post hoc calculations to determine whether a null finding appears, in fact, to be the case or whether it seems to be a result of low power.

Our method uses models that allow for correlation in the errors for contemporaneous instruments as suggested in the literature (Kaaks, Riboli, Esteve, Van Kappel, and Van Staveren 1994; Kipnis et al. 2001, 2003). Our point and interval estimation method is different from that considered in Kipnis et al. (2001, 2003) in that we use a likelihood-based approach (also called a structural measurement error model), whereas Kipnis et al. (2001) estimated the attenuation coefficient first and then appropriately scaled the naive regression coefficient estimate to obtain the corrected coefficient estimate. Recently, Spiegelman et al. (2004) considered a joint model for all the parameters in the disease (or outcome) model and the measurement error model (as we do in Sec. 3) by "stacking" the estimating equations for all the unknown parameters from both the disease model and the calibration model and forming an M estimator (cf. Stefanski and Boos 2002). Again, this regression calibration approach is different from our likelihood-based approach. We subsequently extend our likelihood-based model through the mixture of Dirichlet processes (MDP) to avoid placing strict parametric assumptions on the latent true dietary intake variables. The remainder of this article is organized as follows: Section 2 describes the Pregnancy, Infection, and Nutrition (PIN) study, from which the data are acquired, and scientific questions of interest; Section 3 describes our statistical model and notation; Section 4 gives an overview of the joint full conditional distribution; Section 5 summarizes a small simulation study; and Section 6 summarizes the results of our analysis; we end with a short discussion on the implications of our findings in Section 7.

2. THE PIN STUDY DATA

The PIN study was a prospective cohort study of risk factors for preterm birth (Savitz et al. 1999). Recruitment occurred between 24 and 29 weeks' gestation, and several questionnaires, including an FFQ to assess dietary intake in the second trimester, were administered at this time as described in Savitz et al. (1999) and Siega-Riz et al. (2004). The outcome of interest, preterm birth, was defined as delivery before 37 completed weeks of gestation. Siega-Riz et al. (2004) examined the relationship between maternal folate status and preterm birth, reporting increased risks of preterm birth among women with mean daily folate intake less than 500 µg and among women with serum folate levels less than 16.3 ng/mL. A variety of folate exposure variables, including mean daily dietary intake from the FFQ and two biomarkers, serum and red blood cell folate, were used in separate analyses, with all results reported.

To address FFQ measurement issues, the investigators conducted a validation substudy to determine whether dietary intake changed over the course of pregnancy and to quantify measurement error in the FFQ. Women in the validation study were enrolled in the first trimester and were asked to complete three FFQs over the course of pregnancy, with each FFQ reflecting intake over the past trimester. The purpose of the longitudinal component of the validation substudy was to determine whether one FFQ measurement during the second trimester of pregnancy would be sufficient to characterize intake throughout pregnancy. In addition, three daily in-depth interviews (also called "24-hour recalls") were collected proximal to each FFQ, providing a maximum of 12 measurements over three time points. The replicate dietary records were collected in order to help quantify measurement errors in each FFQ.

Finally, we make two additional points regarding the PIN study data. First, one serum folate biomarker was collected on every woman in the study, that is, both in the main study and in the substudy. This feature of the PIN study is not common among dietary studies, where a "typical" study collects biomarkers only on women in the validation substudy. However, we found that the additional biomarker information compensated for a lack of information in the validation substudy (i.e., missing FFQs, 24-hour recalls, or biomarkers). Second, the PIN study collected serum and red blood cell folate biomarkers, which we use as our reference instruments in our analyses. As pointed out by a referee, these biomarkers are measures of folate concentration and not folate intake. Better measures of the latter are replicate urinary nitrogen or doubly labeled water measurements, neither of which were collected in the PIN study. This important point does not change the validity of the methods or analyses but does have a significant impact on the interpretation of the analysis results and their generalizability to other studies.

3. MODEL AND NOTATION

In this section we describe the proposed model and inference used in many nutritional studies. The outcome is often modeled in two stages, where the first stage models the response as a function of predictors, both latent and observed, and the second stage specifies a measurement error model for the error-prone covariables. Let $Y_i$, $i = 1, \ldots, m$, be an outcome of interest belonging to the exponential family of distributions (McCullagh and Nelder 1983, p. 28). In the PIN study, $Y_i$ will be the binary outcome preterm birth, where $Y_i = 1$ if a woman delivered preterm and 0 otherwise. Define $T_i$ as a $p_T \times 1$ vector of error-prone covariates assumed to be related to the outcome of interest (e.g., $T_i$ may refer to the true long-term dietary intake of several nutrients of interest, or it may refer to a vector of true dietary intakes for a single nutrient over different trimesters), and $Z_i$ is a $p_Z \times 1$ vector of other covariates assumed to be "error free." The outcome is related to the covariates through the following model:

$$g\{\theta_i(\eta)\} = \eta_0 + \eta_T^\top T_i + \eta_Z^\top Z_i, \tag{1}$$

where $g(\cdot)$ is a known link function, $E(Y_i) = \theta_i(\eta)$, and $\eta = (\eta_0, \eta_T^\top, \eta_Z^\top)^\top$. The two primary instruments used in nutrition studies are the FFQ and 24-hour recall, which we denote by $Q_{ijl_1}$, $l_1 = 1, \ldots, k^Q_{ij}$, and $F_{ijl_2}$, $l_2 = 1, \ldots, k^F_{ij}$, respectively. In general, it will be convenient to let $Q_i$ denote the $k_{i1} \times 1$ vector of all the FFQs for the $i$th subject, where $k_{i1} = \sum_j k^Q_{ij}$, and, similarly, let $F_i$ be the $k_{i2} \times 1$ vector of the 24-hour recalls,

$k_{i2} = \sum_j k^F_{ij}$. As discussed previously, evidence suggests the following measurement error model (Kipnis et al. 2001, 2003; Spiegelman et al. 2004) relating the observed instruments to the true dietary intake, $T_i$:

$$Q_{ijl_1} = \mu^Q + \alpha^Q_j + \beta^Q T_{ij} + b_{i1} + U^Q_{ijl_1}, \tag{2}$$

$$F_{ijl_2} = \mu^F + \alpha^F_j + \beta^F T_{ij} + b_{i2} + U^F_{ijl_2}, \tag{3}$$

where $\mu^Q, \mu^F$ are means for the FFQ and 24-hour recalls, respectively, $(\alpha^Q_1, \ldots, \alpha^Q_3, \alpha^F_1, \ldots, \alpha^F_3)$ are trimester-level fixed effects, $(b_{i1}, b_{i2})$ are mean-zero random effects describing subject-specific biases, $(\beta^Q, \beta^F)$ describe the systematic bias of the instruments, and $(U^Q_{ijl_1}, U^F_{ij^\dagger l_2})$ is a bivariate perturbation vector assumed to have mean 0, variance $(\sigma^2_Q, \sigma^2_F)$, respectively, and covariance $\rho_j \sigma_Q \sigma_F$ when $j = j^\dagger$ and 0 otherwise. To identify trimester-level effects and systematic bias in the FFQ and 24-hour recall, it is necessary to have one instrument that is unbiased for true dietary intake $T_i$. Let the serum folate biomarker $M_{ijl_3}$, $l_3 = 1, \ldots, k^M_{ij}$, which is obtained from a blood draw taken close in time to the FFQ administration, be such an instrument, which is assumed to follow the model (Kipnis et al. 2001, 2003; Spiegelman et al. 2004)

$$M_{ijl_3} = T_{ij} + b_{i3} + U^M_{ijl_3}, \tag{4}$$

where, again, $b_{i3}$ is a mean-zero random effect and $U^M_{ijl_3}$ is an independent, instrument-specific measurement error with variance $\sigma^2_M$. Again, recent research in nutritional epidemiology (Kipnis et al. 2001) suggests that it may be prudent to consider models where $\mathrm{corr}(U^F_{ijl_2}, U^M_{ij^\dagger l_3}) \neq 0$, $j = j^\dagger$, and similarly for the FFQ. The resulting model is heavily parameterized, and the identifiability of all parameters will only be satisfied with sufficiently rich data, for example, replicate FFQs, 24-hour recalls, and biomarkers in a validation substudy. Because such data may not be observed in any one dataset, one must reduce the complexity of the measurement error model (2)–(4) through simplification or a priori knowledge of some parameters to identify the remaining unknown parameters. In the following paragraph, we discuss details of the PIN study data and its consequences on our measurement error model; we compare our model to one used in a recent analysis of the Medical Research Council (MRC) study data (Kipnis et al. 2001).

In the PIN study, women had at most one FFQ per trimester $j$ ($k^Q_{ij} \le 1$) and at most three 24-hour recalls ($k^F_{ij} \le 3$) per trimester. Only one biomarker was collected throughout the study period. In contrast, the MRC study collected one FFQ throughout their study period, but collected eight biomarkers (two per season) and four 24-hour recalls (one per season). For our analysis of the PIN data, we set $p_T = 1$ and model a single error-prone random variable $T_i$ (that is, $T_{ij} = T_i$, $j = 1, 2, 3$, in (1)–(3)) and use the classical measurement error model for the biomarker in (4):

$$M_i = T_i + U_{i3}, \tag{5}$$

with $\sigma^2_M$ assumed to be known. Because replicate biomarkers are collected for every season of the MRC study, Kipnis et al. (2001) did not need to simplify the error model in the biomarker (4) as we have done for the PIN study. However, because only one FFQ is observed in the MRC study, identifiability for all the parameters in model (2) becomes problematic. For example, it is not possible to identify $\mathrm{var}(b_{i1})$ and $\sigma^2_Q$ separately from one FFQ per subject without additional assumptions. Despite our model simplifications, we use general notation following models (2)–(4) as our methods and subsequent analyses are germane to other measurement error problems with similar data.

To write the likelihood for the observed data, it is convenient to introduce some new notation and assumptions. Let $W_i = (Q_i^\top, F_i^\top, M_i^\top)^\top$ be the $k_i \times 1$ vector of all the instruments, where $M_i$ is a $k_{i3} \times 1$ vector ($k_{i3} = \sum_j k^M_{ij}$) of unbiased reference instruments for the $i$th subject and $k_i = k_{i1} + k_{i2} + k_{i3}$. Here, we also assume that the random-effect vector $b_i = (b_{i1}, b_{i2}, b_{i3})^\top$ is normally distributed with mean 0 and covariance matrix $D$ and the measurement error vector $U_i = (U_i^{Q\top}, U_i^{F\top}, U_i^{M\top})^\top$ is normally distributed with mean 0 and covariance matrix $\Sigma_i$. The likelihood function of the observed data conditional on $Z_i$ is $\prod_i L_i(Y_i, W_i \mid Z_i)$, where

$$L_i(Y_i, W_i \mid Z_i) = \int L_i(Y_i \mid T_i, Z_i)\, L_i(W_i \mid T_i, Z_i)\, L_i(T_i \mid Z_i)\, dT_i, \tag{6}$$

$L_i(T_i \mid Z_i)$ is the likelihood of the true dietary intake vector $T_i$ (e.g., Gaussian), $L_i(W_i \mid T_i, Z_i)$ is the error distribution conditional on $Z_i$, and $L_i(Y_i \mid T_i, Z_i)$ is the probability density function from the exponential family with the systematic and random components and link function given in (1). We will assume that $U_i$ is independent of $Z_i$ and, therefore, replace $L_i(W_i \mid T_i, Z_i)$ with $L_i(W_i \mid T_i)$, which is a multivariate normal distribution defined by the models in (2), (3), and (4). This assumption seems tenable in many applications but would not be reasonable if, for example, the mother's height or weight were somehow related to the error in the instrument. If such an assumption were unjustified, a more complicated error model could be included without any additional difficulty. A detailed description of $L_i(W_i \mid T_i)$ is given in the next section.

3.1 Measurement Error Model

We first consider a simplified version of the model in (6), motivated by data from the PIN study described in Section 2. For simplicity, let $j = 1, \ldots, 3$ and $T_{ij} = T_i$ for all $j$. Conditional on the random effects $b_i$ and true dietary folate intake $T_i$, we have

$$\begin{pmatrix} Q_i \\ F_i \\ M_i \end{pmatrix} \sim N_{k_i}\!\left( X_i \begin{pmatrix} \mu \\ \alpha \end{pmatrix} + A_i \begin{pmatrix} \beta \\ 1 \end{pmatrix} T_i + R_i b_i,\; \Sigma_i \right),$$

where $\mu = (\mu^Q, \mu^F)^\top$, $\alpha = (\alpha^Q_1, \ldots, \alpha^Q_3, \alpha^F_1, \ldots, \alpha^F_3)^\top$, $\beta = (\beta^Q, \beta^F)^\top$, and $X_i$, $A_i$, and $R_i$ are fixed design matrices linking the instruments/biomarkers to the calibration parameters and random effects, respectively. To continue this illustration, we make another common assumption and subsequent simplification in the measurement error model. In particular, one typically assumes that the measurement errors in the biomarkers for the $i$th subject are independent of the measurement errors in the FFQs and 24-hour recalls. This assumption seems tenable in the PIN data as the FFQs and 24-hour recalls are both self-reported, whereas the biomarkers are laboratory measured with no a priori knowledge of FFQ or 24-hour recall. If we


partition $\Sigma_i$ into $\Sigma_{11i}$, $\Sigma_{12i}$, $\Sigma_{21i}$, and $\Sigma_{22i}$, where $\Sigma_{11i}$ corresponds to the covariance matrix for the FFQs and 24-hour recalls, $\Sigma_{12i} = \Sigma_{21i}^\top$ corresponds to the covariance between instruments and biomarkers, and $\Sigma_{22i}$ is the covariance matrix of the biomarkers, then the conditional independence assumption implies $\Sigma_{12i} = \Sigma_{21i}^\top = 0$. From here, it is useful to treat the biomarkers separately in the model as well as in the likelihood (6). Now, we focus on the error calibration model for the FFQ and 24-hour recall only. Hence, we rewrite this portion of the model as

$$\begin{pmatrix} Q_i \\ F_i \end{pmatrix} \sim N(H_i \gamma + R_i b_i,\; \Sigma_{11i}), \tag{7}$$

where $\gamma = (\gamma_Q^\top, \gamma_F^\top)^\top$, $\gamma_r = (\mu^r, \alpha^r_1, \alpha^r_2, \alpha^r_3, \beta^r)^\top$ for $r = Q, F$, where $Q$ is shorthand for FFQ and $F$ denotes 24-hour recall. Because $H_i$ is not full rank, it is necessary to constrain some of the parameters to achieve estimability of $\gamma$. We constrain the first trimester-level effect $\alpha^r_1 = 0$, $r = Q, F$, which implies $\gamma_r = (\gamma^r_1, \gamma^r_2, \gamma^r_3, \gamma^r_T)^\top$ has the following interpretations: $\gamma^r_1 = \mu^r + \alpha^r_1$, $\gamma^r_2 = \alpha^r_2 - \alpha^r_1$, and $\gamma^r_3 = \alpha^r_3 - \alpha^r_1$ for $r = Q, F$. For consistency, we label $\gamma^r_T = \beta^r$. In (7), we also have that $H_i$ is block diagonal, that is, $H_i = \mathrm{diag}\{H^Q_i, H^F_i\}$, where

$$H^Q_i = \begin{pmatrix} 1_{k_{i1}} & B^Q_i & T_i 1_{k_{i1}} \end{pmatrix}$$

and $B^Q_i = \mathrm{diag}\{1_{k^Q_{i1}}, 1_{k^Q_{i2}}, 1_{k^Q_{i3}}\}$. $H^F_i$ is defined similarly. With the $Q_i$ and $F_i$ organized as in (7), $\Sigma_{11i}$ (as a function of its parameters) may be written as

$$\Sigma_{11i}(\sigma, \rho) = G_i^{1/2}(\sigma)\, \Gamma_i(\rho)\, G_i^{1/2}(\sigma), \tag{8}$$

where $\sigma = (\sigma_Q, \sigma_F)^\top$, $\rho = (\rho_1, \rho_2, \rho_3)^\top$ for the three trimester correlation parameters, $G_i(\sigma) = \mathrm{diag}\{\sigma^2_Q I_{k_{i1}}, \sigma^2_F I_{k_{i2}}\}$, and $\Gamma_i(\rho)$ is a symmetric $k_i \times k_i$ correlation matrix. Assuming a single FFQ in each of three trimesters (i.e., $k^Q_{i1} = k^Q_{i2} = k^Q_{i3} = 1$) and three 24-hour recalls at each of three trimesters (i.e., $k^F_{i1} = k^F_{i2} = k^F_{i3} = 3$), $\Gamma_i(\rho)$ is a correlation matrix with the following structure:

$$\Gamma_i(\rho) = \begin{pmatrix} I_3 & \begin{matrix} \rho_1 1_3^\top & 0 & 0 \\ 0 & \rho_2 1_3^\top & 0 \\ 0 & 0 & \rho_3 1_3^\top \end{matrix} \\ \begin{matrix} \rho_1 1_3 & 0 & 0 \\ 0 & \rho_2 1_3 & 0 \\ 0 & 0 & \rho_3 1_3 \end{matrix} & I_9 \end{pmatrix}, \tag{9}$$

where $1_r$ is a column vector of 1's of length $r$ and the 0's are vectors with the appropriate implied dimensions. Note that if we have two or more replicate FFQs and 24-hour recalls at each time, then $1$ (and analogously the 0's) will no longer refer to vectors, but to matrices of 1's (or 0's). So far, we have placed no restrictions on $\rho$. We discuss three correlation models of interest and subsequent restrictions on $\Sigma_{11i}(\sigma, \rho)$ in Section 3.2.

3.2 Correlation Models and Their Implied Constraints

In this section we focus on three correlation models of interest and derive the conditions on $\rho$ that lead to the positive definiteness of $\Sigma_{11i}(\sigma, \rho)$ and, therefore, ultimately lead to model identifiability.

The three correlation models (CMs) of interest can be summarized as follows: For every $l_1, l_2$,

$$\text{CM1:}\quad \mathrm{corr}\big(U^Q_{ijl_1}, U^F_{ij^\dagger l_2}\big) = 0 \quad \text{for every } j, j^\dagger, \tag{10}$$

$$\text{CM2:}\quad \mathrm{corr}\big(U^Q_{ijl_1}, U^F_{ij^\dagger l_2}\big) = \begin{cases} \rho & \text{if } j = j^\dagger \\ 0 & \text{otherwise,} \end{cases} \tag{11}$$

$$\text{CM3:}\quad \mathrm{corr}\big(U^Q_{ijl_1}, U^F_{ij^\dagger l_2}\big) = \begin{cases} \rho_j & \text{if } j = j^\dagger \\ 0 & \text{otherwise,} \end{cases} \tag{12}$$

for subject $i$ at time $j$. In words, correlation model 1 (CM1) in (10) assumes that measurement errors between FFQs and 24-hour recalls are mutually independent, whereas CM2 and CM3 assume correlated errors. CM3 assumes measurement errors for different instruments are correlated differently for each measurement time, whereas CM2 assumes the correlation remains the same over time. Both CM2 and CM3 are expected to reflect better the errors in contemporaneous instruments observed in nutritional epidemiological studies (Kipnis et al. 2001; Subar et al. 2003; Carroll 2003; Carroll, Ruppert, Crainiceanu, Tosteson, and Karagas 2004).

Now, we turn our attention to the positive definiteness of $\Sigma_{11i}$. By definition, $\Sigma_{11i}$ will be positive definite when the quadratic form $\lambda^\top \Sigma_{11i} \lambda = 0$ if and only if $\lambda = 0$. We use a corollary that allows us to check the positivity of the determinants of all the leading minors or, analogously, to check that the eigenvalues are all positive (Searle 1971).

Assuming that there are $J$ measurement times and a constant number of replicate FFQs and 24-hour recalls across trimesters, $n_Q$ and $n_F$, respectively, the general form of the determinant of $\Sigma_{11i}$ is

$$|\Sigma_{11i}| = \sigma_F^{2Jn_F}\, \sigma_Q^{2Jn_Q} \prod_{j=1}^{J} (1 - n_Q n_F \rho_j^2), \tag{13}$$

and the unique eigenvalues of $\Sigma_{11i}$ are $\sigma^2_Q$, $\sigma^2_F$, and

$$\tfrac{1}{2}\Big\{ \sigma^2_Q + \sigma^2_F \pm \big(\sigma^4_Q + \sigma^4_F - 2\sigma^2_Q \sigma^2_F + 4 n_F n_Q \rho_j^2 \sigma^2_Q \sigma^2_F\big)^{1/2} \Big\}$$

for $j = 1, \ldots, J$. It is straightforward to verify that the product of the eigenvalues is indeed the determinant by including the missing replicate eigenvalues, that is, $J - 1$ repeats of $\sigma^2_Q$ and $\sigma^2_F$. Now, through some straightforward algebra, it is easy to see that the condition that will ensure the positivity of the eigenvalues is

$$|\rho_j| < (n_F n_Q)^{-1/2}, \quad j = 1, \ldots, J. \tag{14}$$

The condition in (14) is necessary and sufficient for the positive definiteness of $\Sigma_{11i}$. Furthermore, any prior distribution placed on $\rho$ must have support (14). Note that neither model (11) nor model (12) will be able to detect or estimate correlation parameters that are extreme in either direction.

4. PRIOR AND POSTERIOR DISTRIBUTIONS

In this section we discuss the prior specification for all the parameters in the preceding models and the resulting posterior distributions to be used in a Gibbs (Geman and Geman 1984) or Metropolis–Hastings (Metropolis and Ulam 1949; Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller 1953; Hastings 1970) sampling algorithm. For now, assume that $T_1, \ldots, T_m$ are


independent and identically distributed random vectors from the distribution $F_T$ with mean $\mu_T$ and variance $\Sigma_T$. Define $L_i(Y_i \mid T_i, Z_i; \eta)$ in (6) as the $i$th contribution to the conditional likelihood given $T_i$ arising from (1); for example, for a Bernoulli response and a logit link function

$$\log L_i(Y_i \mid T_i, Z_i; \eta) = Y_i(\eta_0 + \eta_T^\top T_i + \eta_Z^\top Z_i) + \log\{1 - \theta_i(\eta)\},$$

where $\theta_i(\eta)$ was defined in (1). For simplicity, we assume normal prior distributions on the mean parameters $\eta$, the systematic bias parameters $\gamma$ in (7), and the mean of the latent dietary intake random variables $\mu_T$ from $L_i(T_i \mid Z_i)$ in (6), that is, $\eta \sim N(\eta_0, V_{0,\eta})$ in (1), $\gamma \sim N(\gamma_0, V_{0,\gamma})$ in (7), and $\mu_T \sim N(\mu_{T,0}, V_{0,\mu_T})$ in $L_i(T_i \mid Z_i)$, and conjugate Wishart priors on $D^{-1}$ in (7) and $\Sigma_T^{-1}$ in $L_i(T_i \mid Z_i)$ in (6), $D^{-1} \sim W_q(\nu_D, C_D)$ and $\Sigma_T^{-1} \sim W(\nu_T, C_T)$, respectively. Note that although it is common to assume inverse Gamma priors for $\sigma$, this will not necessarily imply a conjugate prior distribution because of the correlation parameters $\rho$ in $\Sigma_{11i}$. Because our constraints on the correlation parameters do not depend on $\sigma$, we may factor our joint prior $\pi(\sigma, \rho)$ into the product $\pi(\sigma)\pi(\rho)$. Because there are typically more replicate FFQs and 24-hour recalls than biomarkers, we assume flat priors for $\sigma^2_Q$ and $\sigma^2_F$ but an inverse Gamma (IG) prior for $\sigma^2_M$. We define our prior on $\sigma$ as

$$\pi(\sigma) = \sigma_Q^{-2}\, \sigma_F^{-2}\, e^{-1/(b_M \sigma_M)}\, \sigma_M^{-(a_M + 1)},$$

where $a_M, b_M$ are specified hyperparameters. For $\rho$, we specify a uniform prior with support given by the parameter constraints given in Section 3.2, that is, $\pi(\rho) \propto 1$ with $|\rho_j| < (n_F n_Q)^{-1/2}$, $j = 1, \ldots, J$. Finally, we also assume $T_i$ is normally distributed with mean $\mu_T$ and covariance matrix $\Sigma_T$. Given $\pi(\sigma)$ and $\pi(\rho)$, and prior variances $V_{0,\eta}$, $V_{0,\gamma}$, and $V_{0,\mu_T}$, the joint posterior of the parameters is given by

$$\begin{aligned}
p(\eta, \gamma, b, T, \sigma, \rho, D, \mu_T, \Sigma_T \mid Y, W)
&\propto |\Sigma_W|^{-1/2}\, |D^{-1}|^{(\nu_D + m - q - 1)/2}\, |\Sigma_T^{-1}|^{(\nu_T + m - p_T - 1)/2} \\
&\quad \times \exp\Bigg[ \sum_{i=1}^{m} \Big\{ \log L_i(\eta) - \tfrac{1}{2}(W_i - H_i\gamma)^\top \Sigma_i^{-1} (W_i - H_i\gamma) \\
&\qquad\qquad - \tfrac{1}{2} b_i^\top D^{-1} b_i - \tfrac{1}{2}(T_i - \mu_T)^\top \Sigma_T^{-1} (T_i - \mu_T) \Big\} \\
&\qquad - \tfrac{1}{2}(\eta - \eta_0)^\top V_{0,\eta}^{-1} (\eta - \eta_0) - \tfrac{1}{2}(\gamma - \gamma_0)^\top V_{0,\gamma}^{-1} (\gamma - \gamma_0) \\
&\qquad - \tfrac{1}{2}(\mu_T - \mu_{T,0})^\top V_{0,\mu_T}^{-1} (\mu_T - \mu_{T,0}) \\
&\qquad - \tfrac{1}{2}\,\mathrm{tr}(C_D^{-1} D^{-1}) - \tfrac{1}{2}\,\mathrm{tr}(C_T^{-1} \Sigma_T^{-1}) \Bigg]\, \pi(\sigma)\pi(\rho),
\end{aligned} \tag{15}$$

where $\Sigma_W = \mathrm{diag}\{\Sigma_1, \ldots, \Sigma_m\}$. Additional details for the full conditional distributions are given in the Appendix.

4.1 Relaxing Distributional Assumptions on $T_i$

In measurement error problems, $T_i$ is a latent random vector with distribution $F_T$. The Bayesian paradigm offers a convenient method for handling latent variables and other incomplete data problems by sampling the latent variable from its full conditional distribution. When $F_T$ is parametric (e.g., Gaussian), the full posterior is given by (15). However, this distributional assumption is difficult to check and a more flexible model is often desirable. One method is to use a scale mixture of normals for $T_i$. Toward this goal, suppose that we start with a univariate Gaussian distribution with mean $\mu_T$ and variance $\sigma^2_T$. Then, we may write

$$T_i = \mu_T + \epsilon_i,$$

where $\epsilon_i \sim N(0, \sigma^2_T)$. A straightforward extension of this model is to assume $\epsilon_i \sim N(0, \lambda_i \sigma^2_T)$, where the $\lambda_i$ are subject-specific latent variables assumed to have Gamma distributions. A second method makes even fewer assumptions about the distribution function $F_T$, requiring only that $F_T$ be a proper distribution function. We employ the mixture of Dirichlet process (MDP) methodology based on a Pólya urn scheme (Antoniak 1974; Escobar 1994; MacEachern 1994). In addition to using the Dirichlet process prior for parameters, the MDP methodology has been successfully applied to other missing-data problems, such as random effects in mixed models (Kleinman and Ibrahim 1998; Brown and Ibrahim 2003). Less work has been done using the MDP prior in measurement error models. Two exceptions are Mallick, Hoffman, and Carroll (2002) and Müller and Roeder (1997), the latter of which describes an application of the MDP prior methodology to case-control studies. There are at least two differences worth noting between our application here and the one presented in Müller and Roeder (1997). First, there is the fundamental difference in design between the retrospective and prospective study design, where the case-control design has the additional complexity derived from conditioning on the [...] of cases in the sample, that is, conditioning on $\sum_i Y_i = 1$ (cf. Breslow and Day 1980). Second, our two applications are different in that our model incorporates multiple validation instruments with correlated errors. We expect that in a case-control study with multivariate instruments, as in our application presented here, a combined model using ideas presented here and in Müller and Roeder (1997) could be applied. In the following discussion, we describe how to apply the mixture of Dirichlet process methodology to our measurement error problem.

Assume the random vectors $T_i$ are drawn from an arbitrary distribution $F_T$, where $F_T$ has a Dirichlet process prior, denoted by $F_T \sim DP(\xi F_0)$, $F_0 \sim N(\mu_T, \Sigma_T)$, and $\xi$ is an unknown scalar confidence parameter. Suppressing parameters other than the error-prone covariate $T_i$, the full conditional distributions for $\{T_i, i = 1, \ldots, m\}$ are given by (see Kleinman and Ibrahim 1998)

$$[T_i \mid \{T_k, k \neq i\}, Y_i, W_i] \sim q_0\, L_i(Y_i, W_i \mid T_i, Z_i)\, f_0(T_i \mid Z_i) + \sum_{k \neq i} q_k\, \delta(dT_i \mid T_k), \tag{16}$$

where $f_0(T_i \mid Z_i) \equiv L_i(T_i \mid Z_i)$, and $L_i(Y_i, W_i \mid T_i, Z_i)$ was defined in (6). Recall that $L_i(Y_i, W_i \mid T_i, Z_i)$ factors into the product $L_i(Y_i \mid T_i, Z_i) L_i(W_i \mid T_i, Z_i)$ by the nondifferential measurement error assumption. Also, $\{q_0, q_k, k = 1, \ldots, m\}$ are unnormalized selection probabilities where

$$q_0 \propto \xi \int \cdots \int L_i(Y_i, W_i \mid T_i, Z_i)\, f_0(T_i \mid Z_i)\, dT_i \tag{17}$$


and $q_k \propto L_i(Y_i, W_i \mid T_k^*, Z_i)$ and $T_k^*$ are the unique atoms of $f_0(T \mid Z)$. Because (17) does not, in general, have a closed-form solution, numerical integration is typically needed. However, it would be possible to find a closed-form solution if, for example, $L_i(Y_i, W_i \mid T_i, Z_i)$ and $F_0$ were both multivariate normal. At the next stage, we sample the unique vector $T_j^*$ from its full conditional distribution $p(T_j^* \mid D_{\mathrm{obs}}, \mathrm{rest})$, where $D_{\mathrm{obs}}$ denotes the observed data, and rest is shorthand for all remaining parameters. For a fixed confidence parameter $\xi$, the full conditional distribution of $T_j^*$ is defined as

$$
p(T_j^* \mid D_{\mathrm{obs}}, \mathrm{rest}) \propto \exp\Biggl[\sum_{i \in S_j}\Bigl\{Y_i\,g(\theta_i) + \log(1 - \theta_i) - \tfrac{1}{2}(W_i - H_i\gamma - R_i b_i)'\Sigma_i^{-1}(W_i - H_i\gamma - R_i b_i)\Bigr\} - \tfrac{1}{2}(T_j^* - \mu_T)'\Sigma_T^{-1}(T_j^* - \mu_T)\Biggr]
\tag{18}
$$

for $S_j = \{i \mid T_i = T_j^*\}$.

Define $I^*$ as the number of unique clusters of $T^*$, $I^* \le m$. Then, the confidence parameter $\xi$ influences the tendency of the Markov chain Monte Carlo (MCMC) algorithm to favor large or small $I^*$, with $\xi \to \infty$ implying large $I^*$. In this article, we initially use a two-stage data augmentation algorithm to sample $\xi$ (Tanner and Wong 1987) and then conduct sensitivity studies where $\xi$ is fixed. Assume $\xi$ has a Gamma prior with shape $r$ and rate $\lambda$, that is, $\xi \sim \mathrm{Gamma}(r, \lambda)$ with $E\xi = r/\lambda$. At the first stage, the augmentation algorithm samples a latent variable $c$ conditional on the current value of $\xi$ and $I^*$, that is, $[c \mid \xi, I^*] \sim \mathrm{Beta}(\xi + 1, I^*)$. Next, we sample the confidence parameter $\xi$ from the mixture of two Gamma distributions given the latent variable $c$ and $I^*$, that is,

$$
[\xi \mid c, I^*] \sim \pi_c\,\mathrm{Gamma}\{r + I^*, \lambda - \log(c)\} + (1 - \pi_c)\,\mathrm{Gamma}\{r + I^* - 1, \lambda - \log(c)\},
$$

where $\pi_c = z/(z + 1)$ and $z = (r + I^* - 1)/[I^*\{\lambda - \log(c)\}]$. Some care is needed in choosing the prior parameters $(r, \lambda)$ as this strongly influences the tendency of the algorithm to favor the base measure $F_0$ or collapse on relatively few clusters. We use two priors, Gamma(1, 1) and Gamma(.01, .01), to check the sensitivity of parameter estimates to the choice of prior on $\xi$. Both priors have mean 1, but the latter prior has variance 100 and, therefore, puts mass on both large and small values of $\xi$.

5. SIMULATION STUDIES

Here, we present a small simulation study to provide some empirical evidence that the parameters in the complex measurement error model (2)–(3) are estimable. The structure of our simulation study mimics the PIN study data, and, hence, we use the simplified biomarker model (5) with $\sigma_M^2$ known. The details of our simulation study follow.

We begin by simulating $T_i$ as iid standard Gaussian random variates, $i = 1, \ldots, 75$, and independently generating subject-specific biases $b_i$ from a bivariate Gaussian distribution with mean 0 and covariance matrix $D$. Then, for each subject $i$ and visit $j = 1, 2, 3$, we generate a vector of instruments that satisfies the models:

$$
\begin{aligned}
Q_{ij} &= \gamma_1^Q + \gamma_2^Q + \gamma_3^Q + \gamma_T^Q T_i + b_i^Q + U_{ij}^Q, \\
F_{ijl} &= \gamma_1^F + \gamma_2^F + \gamma_3^F + \gamma_T^F T_i + b_i^F + U_{ijl}^F, \qquad l = 1, 2, 3,
\end{aligned}
$$

where $\operatorname{corr}(U_{ij}^Q, U_{ijl}^F) = \rho_j$, $l = 1, 2, 3$, and $\rho_2 = .25$ but $\rho_1 = \rho_3 = 0$. Finally, we independently simulate one unbiased biomarker $M_i$ as Gaussian with mean $T_i$ and variance $\sigma_M^2 = .3$. The specific values for the remaining parameters are given in Table 1.

Table 1. Summary of Posterior Means and Credible Sets Over 500 Monte Carlo Datasets

Parameter       Truth    Mean    SD     Coverage
$\gamma_1^Q$     3.00    2.99    .18    .93
$\gamma_2^Q$     −.75    −.75    .17    .95
$\gamma_3^Q$      .75     .75    .17    .93
$\gamma_T^Q$      .50     .52    .19    .95
$\gamma_1^F$     1.00    1.01    .21    .93
$\gamma_2^F$     −.25    −.25    .09    .95
$\gamma_3^F$      .25     .24    .10    .94
$\gamma_T^F$      .90     .94    .24    .91
$D_{11}$         1.25    1.26    .29    .95
$D_{12}$          .25     .24    .27    .94
$D_{22}$         2.25    2.31    .48    .95
$\sigma_Q^2$     1.00    1.04    .12    .95
$\sigma_F^2$     1.00    1.01    .06    .94
$\rho_1$          .00     .00    .07    .97
$\rho_2$          .25     .24    .08    .94
$\rho_3$          .00     .00    .06    .95

NOTE: Mean represents the Monte Carlo average posterior mean, SD represents the Monte Carlo standard deviation of posterior means, and Coverage indicates the proportion of datasets in which a 95% credible set includes the true value. $\gamma$ are systematic bias parameters in the measurement error model (MEM) and the remaining parameters are covariance parameters in the MEM. For each dataset, we drew 2,000 samples from our joint posterior and treated the first 1,000 as burn-in.

In conclusion, we have not proven formally that the parameters in our measurement error model (2)–(3) are identified. At the same time, our simulation studies suggest that one can estimate all parameters in our measurement error model and draw correct inference from the posterior distribution using the correct likelihood specification and noninformative prior distributions.

6. ANALYSIS OF THE PIN STUDY DATA

For purposes of discussion, we split the data into two groups: women who were included in a substudy and women not in the substudy. In addition to the single FFQ, main study participants also provided serum folate measures, which were incorporated into the measurement error model in the analysis. Women in the substudy provided additional dietary information that other women were not requested to give, ideally providing three FFQs and nine 24-hour recalls (one FFQ and three 24-hour recalls per trimester for all three trimesters) during the pregnancy. For convenience, we split the $i$th contribution to the likelihood (6) into two pieces through the use of indicator functions, $I(\cdot)$. Suppressing the parameters arising from $T_i$, we have

$$
L_i(Y_i, W_i) = \{L_{i,\mathrm{sub}}(W_i; \gamma, D, \sigma, \rho)\}^{I(S_i = 1)}\,\{L_{i,\mathrm{nsub}}(Y_i, W_i; \eta, \gamma_1^Q, \sigma_F)\}^{I(S_i = 0)},
$$


where $S_i$ equals 1 if the $i$th woman belongs to the substudy and 0 otherwise. Therefore, the posteriors for $\gamma$ and $\sigma$ will have different contributions from women in the substudy versus those not in the substudy. Of course, the posterior for $T_i$ depends on substudy status as well as each step in the MDP implementation.

Our analysis uses 172 women from the substudy who had at least one of the nine 24-hour recalls and 1,679 women in the main study. Due to the rigorous protocol of the substudy, women did not provide all 12 dietary measures. The 1,679 women in the main study were chosen to have complete data for preterm birth, the three “error-free” covariables in the outcome model (height, body mass index (BMI), and dietary caloric intake, also called “energy” in our analyses below, as measured in the FFQ), and serum folate. The overall preterm birth rate in the combined data was 12.7% (236/1,851). Two covariables, BMI and dietary caloric intake, were transformed using the natural logarithm. All three covariables were standardized by their sample means and standard deviations (2.6, .24, .47, respectively) and all are assumed to be error free. With additional information on the variability in the measurements of these variables, it would be possible to relax this assumption as well. This investigation is, however, beyond the scope of this article and beyond the data available to the authors. The sample variance of the unbiased serum folate biomarker is .40.

Although nonsubstudy women were chosen to have complete data, the same criterion was not used to select women in the substudy because of frequent nonresponse. As shown in Table 2, although many women provided one 24-hour recall at each trimester (82%, 73%, and 67% at visits 1, 2, and 3, respectively), fewer provided all three 24-hour recalls for any given trimester because of the rigorous protocol. Rather than remove these missing observations, we assumed the missing values were missing at random, then used our model and MCMC methods to sample the missing values (cf. Little and Rubin 2002 for a review of Gibbs sampling for missing-data problems). A similar strategy was employed for missing biomarkers (only 25 biomarkers were observed from the 172 substudy women).

Table 2. Sample Mean and Standard Deviation for Two Biased Measures of Folate Intake, Dietary Folate (FFQ) and 24-Hour Recall, From 172 Women in the PIN Substudy

Instrument        Trimester   Rep    N     Mean    SD
FFQ               1           1      97    5.92    .46
                  2           1      134   6.00    .35
                  3           1      72    6.00    .42
24-hour recall    1           1      141   5.62    .62
                  1           2      104   5.61    .68
                  1           3      16    5.18    .43
                  2           1      125   5.72    .55
                  2           2      95    5.84    .48
                  2           3      5     5.55    .58
                  3           1      116   5.87    .56
                  3           2      87    5.91    .51
                  3           3      2     6.14    .22

NOTE: Both FFQ and 24-hour recall measurements are reported on the log-scale with the 24-hour recall attempted three times per trimester and FFQ attempted once per trimester.

Table 3. Analysis Results From the PIN Study Making Parametric Assumptions About True Folate and a Common Correlation Parameter Among Contemporaneous Instruments

Parameter                 Naive           Normal          MDP ($\xi = .83$)
Intercept ($\eta_0$)      −1.95 (.08)     −1.99 (.08)     −1.99 (.08)
Folate ($\eta_T$)         −.27 (.11)      −.46 (.15)      −.48 (.14)
Height ($\eta_{Z_1}$)     −.07 (.03)      −.08 (.03)      −.08 (.03)
BMI ($\eta_{Z_2}$)        .68 (.30)       .65 (.31)       .63 (.31)
Energy ($\eta_{Z_3}$)     −.01 (.15)      −.01 (.16)      −.01 (.15)
$\gamma_1^Q$              5.90 (.16)      5.97 (.02)      5.97 (.02)
$\gamma_2^Q$              .18 (.19)       .11 (.07)       .10 (.07)
$\gamma_3^Q$              .20 (.23)       −.34 (.08)      −.35 (.08)
$\gamma_T^Q$              −.13 (.13)      .13 (.03)       .13 (.03)
$\gamma_1^F$              5.27 (.09)      5.31 (.07)      5.34 (.06)
$\gamma_2^F$              .25 (.12)       .31 (.09)       .31 (.09)
$\gamma_3^F$              .59 (.13)       .56 (.09)       .55 (.09)
$\gamma_T^F$              −.01 (.08)      −.02 (.05)      −.02 (.04)

NOTE: The “naive” analysis refers to two independent, complete-case analyses that replace the true folate random variable with the serum folate biomarker. $\gamma$ refers to systematic parameters in the measurement error model. Posterior means from 6,000 Gibbs samples with the first 4,500 treated as burn-in are reported with standard deviations reported in parentheses.

Table 4. Summary of Variance Component Estimates (with posterior standard deviations in parentheses) From MCMC Analysis Results From the PIN Study Using Normal Prior Distribution on True Folate Concentration

Parameter       CM1*           CM2             CM3
$D_{11}$        .54 (.09)      .51 (.09)       .46 (.08)
$D_{12}$        −.21 (.03)     −.23 (.04)      −.24 (.05)
$D_{22}$        .08 (.01)      .10 (.02)       .13 (.03)
$\sigma_Q^2$    .43 (.02)      .43 (.02)       .43 (.02)
$\sigma_F^2$    1.19 (.05)     1.19 (.05)      1.18 (.05)
$\rho_1$        0*             −.02 (.02)      −.11 (.04)
$\rho_2$        0*             −.02 (.02)      .03 (.05)
$\rho_3$        0*             −.02 (.02)      .02 (.04)

NOTE: Correlation models (CM1–CM3) refer to different assumptions about the correlation among errors of contemporaneous instruments and are described in Section 3. *Model 1 sets $\rho_j = 0$, $j = 1, 2, 3$.

We summarize the mean parameters from the outcome model ($\eta$) and the systematic bias parameters ($\gamma$) in Table 3 and variance parameters ($\sigma_Q^2$, $\sigma_F^2$, $D$, $\rho_j$) in Table 4. In Table 3 we include one column of “naive” parameter estimates, which are calculated by fitting two independent regression models with complete data: first, the logistic regression model in (1) with the true folate intake replaced by the serum folate biomarker to obtain $\hat{\eta}_{\mathrm{naive}}$, and second, the linear regression of substudy FFQs and 24-hour recalls on serum folate biomarkers assuming model (2)–(3) under CM1 ($\rho_j = 0$) and no subject-specific biases ($D = 0$). In Table 3 we summarize the parameter estimates under CM1 for folate intake following a normal distribution and also our MDP model with $\xi = .83$, reflecting little confidence in the normality assumption. Interestingly, the protective folate effect from the naive analysis appears even stronger after adjusting for measurement error. Also, there appears to be an inverse intra-individual relationship between the FFQ and 24-hour recall ($D_{12} = -.22$), which suggests that women who respond conservatively on the FFQ tend to respond liberally on the 24-hour recall and vice versa. In the validation study, there is some evidence that folate consumption, as measured by the 24-hour recalls, increases throughout pregnancy, though there appears to be no monotone increase when evaluating folate consumption as measured by the FFQ. Though the cost may be prohibitive, future validation studies in pregnancy might consider including serial biomarkers to help determine whether there are substantial pregnancy-related dietary changes throughout the 9-month


period that would necessitate serial dietary assessments in studies of nutrition during pregnancy. In addition to the parameters in Table 3, we also estimated the odds ratio for an “IQR increase” in folate, that is, an increase from the 25th percentile to the 75th percentile of the folate sample distribution. Hence, we estimated a 27% reduction in the odds [odds ratio (OR) = .73, (.59–.91)] of preterm birth for an IQR increase in latent folate given BMI, mother’s height, and energy level. Mother’s height and BMI are important predictors of preterm birth both before and after adjusting for measurement error in the folate variable.

The proposed measurement error model (2)–(4) is parameterized richly, and our analyses did not find substantial differences among parameter estimates in models of increasing complexity. Hence, it may be preferable to select the most parsimonious model and eliminate unnecessary complexity in the measurement error model. To facilitate model comparisons, we use the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, and van der Linde 2002) and compare the correlation models (CM1–CM3) in addition to one simpler model, “CM1 + {D = 0},” which allows for no subject-specific biases in the FFQ or 24-hour recall. Our results are displayed in Table 5 using the following additional notation: $\bar{\Delta}$ is the posterior mean of minus twice the log-likelihood, $p_\delta$ is the effective number of parameters, $\Delta^*$ is minus twice the log-likelihood evaluated at the posterior means of all parameters, and $\mathrm{DIC} = \bar{\Delta} + p_\delta$. We immediately notice that $p_\delta$ is strikingly large, again emphasizing the large number of unknown variables in our model. Recall that each latent folate variable $T_i$ is regarded as an unknown variable in addition to all missing FFQs, 24-hour recalls, and biomarkers in the validation substudy. Our model comparison suggests that a model with no correlation among contemporaneous measurements and no subject-specific biases is the best model. The effective number of parameters in this simple model is approximately 160 fewer than in model CM1 due to the latent subject-specific biases $b_{1i}$, $b_{2i}$, which are absent when $D = 0$. However, when we believe that $D \ne 0$ and only focus on CM1–CM3, we find that CM2 is the best model among the three, which suggests that a model that considers nonzero correlations among contemporaneous instruments is useful.

Table 5. Model Comparison Using Deviance Information Criterion

Model            $\bar{\Delta}$   $p_\delta$   $\Delta^*$   DIC
CM1 + {D = 0}    8,841.5          2,893.3      5,948.2      11,734.8
CM1              8,836.9          3,053.2      5,783.7      11,890.1
CM2              8,834.0          3,052.2      5,781.9      11,886.3
CM3              8,836.6          3,055.1      5,781.5      11,891.7

NOTE: {D = 0} implies $D_{11} = D_{12} = D_{22} = 0$, which implies no subject-specific biases (no heterogeneity) in the FFQ or 24-hour recall. $\bar{\Delta}$ is the posterior mean of minus twice the log-likelihood, $p_\delta$ is the effective number of parameters, $\Delta^*$ is minus twice the log-likelihood evaluated at the posterior means of the parameters, and $\mathrm{DIC} = \bar{\Delta} + p_\delta$.

In Tables 3 and 4 we presented parameter estimates that we claim are relatively insensitive to the confidence parameter $\xi$. To investigate this claim further, we ran more than 60 MDP analyses of the PIN study data with confidence parameters ranging from .01 to 10,000. We found that posterior means and standard deviations from an MDP analysis using confidence parameters greater than 50 did not change significantly. In Figure 1 we plot the number of unique clusters of $T$, that is, $I^*$, and the posterior means of five folate-related parameters as a function of the confidence parameter $\xi$ and then fit a cubic spline to the points to illustrate the average trend. Our empirical findings suggest that the parameter estimates do not change significantly once the number of unique clusters of $T$ gets beyond 120 or so, on average, as we see in Figure 1(a). In Figures 1(b)–1(f), we graph the posterior means of the five parameters most significantly impacted by choosing $\xi$ sufficiently small. We note that all five parameters tend to decrease as $\xi$ approaches 0. For example, the posterior mean of $\eta_T$ is approximately −.48 when $\xi$ is small but −.46 for large $\xi$, the latter of which corresponds to the normality assumption in Table 3. At the same time, we emphasize a word of caution when drawing conclusions from these figures as the variability in posterior means cannot be ignored, particularly for small $\xi$. Moreover, the average change in posterior means from $\xi \approx .05$ to $\xi \approx 50$ may be extremely small, for example, less than .01 for $\gamma_T^Q$ and less than .005 for $\gamma_T^F$.

7. DISCUSSION

We have presented a Bayesian semiparametric method to estimate parameters from a generalized linear model with a structured measurement error model and applied the method to an analysis of the PIN data. Our first method assumes that true long-term folate is normally distributed, whereas the second method, using the mixture of Dirichlet process priors framework, does not. We found that results based on a naive model that replaces true long-term folate by the observed serum folate were somewhat conservative when compared to results based on our calibrated analysis. Furthermore, the results presented in Tables 3 and 4 appeared to be insensitive to the normality assumption on folate intake when compared to those from the MDP analysis for modest values of $\xi$.

In general, there has been mixed evidence in the literature about whether the instruments under- or overestimate intake. In the past, FFQs have been shown both to underestimate intake with respect to food records (Brown et al. 1996) and to overestimate intake relative to food records (Suitor, Gardner, and Willett 1989; Greeley, Storbakken, and Magel 1992; Forsythe and Gage 1987; Robinson, Godfrey, Osmond, Cox, and Barker 1996; Erkkola et al. 2001). The PIN raw data show some evidence of underestimation of dietary intake in FFQ versus 24-hour recall in the second and third trimesters, but this was not significant using tests of means. Food records themselves tend to underestimate dietary intake compared to the true gold standard, doubly labeled water (Goldberg et al. 1993), under certain weight-stable conditions. As one anonymous referee pointed out, even doubly labeled water may have additional measurement error associated with it, although we expect that error to be much smaller than the error associated with either FFQ or 24-hour recall.

The PIN study data are unique among nutritional epidemiology studies of dietary intake for many reasons, one of which is the collection of an FFQ and a biomarker in the main study. Typically, a study will collect the FFQ in the main study and then conduct a validation substudy to determine the relationship between the FFQ and the biomarker. As suggested by an anonymous referee, it would be interesting to see how our analytic results changed once we removed the biomarker in the main study. We conducted these analyses, including the model


comparison in Table 5, and found that our results are sensitive to the removal of these data. First, the measurement error model is too complex for the observed validation data in the PIN substudy. In addition to removing the correlation parameters ($\rho_j$) and subject-specific biases (i.e., $D = 0$), a substantial simplification of the trimester-level means ($\alpha_j^Q$, $\alpha_j^F$) in (2)–(3) would be necessary. Second, the estimated FFQ–biomarker association using only the PIN substudy is too weak, and, hence, after removing the biomarker in the main study, the posterior means of $\eta$ in the outcome model (1) look more like the “naive” estimates than calibrated or corrected estimates. Thus, the parameter estimates presented earlier do require the biomarker in the main study in an analysis of the PIN study data. In general, however, we conjecture that all parameters in the measurement error model (2)–(4) are estimable given sufficient data in the validation substudy. Therefore, our models and methods are not limited to studies that collect biomarkers in the main study.

Figure 1. Effect of Confidence Parameter $\xi$ on Latent Folate Concentration Parameters in MDP Analysis of PIN Study Data. $I^*$ is the number of unique values of $T$; $\mu_T$ and $\sigma_T$ are the mean and standard deviation of $T$, respectively; $\eta_T$ is the folate effect on preterm birth; $\gamma_T^Q$ and $\gamma_T^F$ are the systematic biases in the FFQ and 24-hour recall, respectively.

Our analysis used serum folate biomarkers as unbiased measures of folate concentration. For the PIN study data, serum biomarkers were analyzed in one of four batches with over 80% of the sample analyzed in the first batch (specifically, the sample proportions were approximately .87, .05, .06, and .02, for batches 1–4, respectively). Siega-Riz et al. (2004) found that batch differences were nonnegligible and should be included in analyses using the serum biomarkers. Our analyses used the first batch as the reference group and placed vague, normal prior distributions on the remaining three batch effects. This additional caveat adds nothing new to the overall measurement error model (2)–(4) and was easily incorporated into our Bayesian framework in Section 4. Finally, while the serum folate biomarker is believed to be free from systematic biases, it is not without drawbacks, which involve individual-specific factors such as personal rates of metabolism. Ideally, one would use an objective biomarker, such as doubly labeled


water, rather than serum folate. Doubly labeled water is a measure of energy expenditure and intake (under certain weight-stable conditions) and is often regarded among the “best” biomarkers; however, it is not a true biomarker for any particular nutrient.

APPENDIX: FULL CONDITIONAL DISTRIBUTIONS

Let $D_{\mathrm{obs}}$ denote the observed data and let rest be shorthand for all remaining parameters. Recall that $Y_i$ is a Bernoulli outcome with canonical link function so that

$$
\log L_i(Y_i \mid T_i, Z_i; \eta) = Y_i(\eta_0 + \eta_T T_i + \eta_Z Z_i) + \log\{1 - \theta_i(\eta)\},
$$

where $\theta(u) = 1/(1 + e^{-u})$.

1. Sample $[\eta \mid \mathrm{rest}, D_{\mathrm{obs}}]$ from $p(\eta \mid \mathrm{rest}, D_{\mathrm{obs}})$ using adaptive rejection sampling (Gilks and Wild 1992), where

$$
p(\eta \mid \mathrm{rest}, D_{\mathrm{obs}}) \propto \exp\Biggl\{\sum_{i=1}^{m}\log L_i(\eta) - \tfrac{1}{2}(\eta - \eta_0)' V_{0,\eta}^{-1}(\eta - \eta_0)\Biggr\}.
$$

2. Sample $[\gamma \mid \mathrm{rest}, D_{\mathrm{obs}}]$ from $N\{\Lambda_\gamma\hat{\gamma} + (I - \Lambda_\gamma)\gamma_0,\ \Lambda_\gamma(H'H)^{-1}\}$, where $\Lambda_\gamma = (H'H + V_{0,\gamma}^{-1})^{-1}H'H$ and $\hat{\gamma} = (H'H)^{-1}H'(W - Rb)$.

3. Let $b = (b_1, \ldots, b_m)'$. Sample $[b \mid \mathrm{rest}, D_{\mathrm{obs}}]$ from $N(\Lambda_b\hat{b},\ \Sigma_W\Lambda_b(R'R)^{-1})$, where $\Lambda_b = (R'R + I_m \otimes D^{-1})^{-1}R'R$, $\otimes$ is the Kronecker product, and $\hat{b} = (R'R)^{-1}R'(W - H\gamma)$.

4. Sample $[D^{-1} \mid \mathrm{rest}, D_{\mathrm{obs}}] \sim W_q\bigl(m + \nu_D,\ (C_D^{-1} + \sum_{i=1}^{m} b_i b_i')^{-1}\bigr)$.

5. Sample $(\sigma, \rho)$ from $p(\sigma, \rho \mid \mathrm{rest}, D_{\mathrm{obs}})$, where

$$
p(\sigma, \rho \mid \mathrm{rest}, D_{\mathrm{obs}}) \propto |\Sigma_W|^{-1/2}\exp\Bigl\{-\tfrac{1}{2}(W - H\gamma - Rb)'\Sigma_W^{-1}(W - H\gamma - Rb)\Bigr\}\,\pi(\sigma)\pi(\rho).
$$

6. Sample the error-prone covariate $[T_i \mid \mathrm{rest}, D_{\mathrm{obs}}]$ from $p(T_i \mid \mathrm{rest}, D_{\mathrm{obs}})$ for $i = 1, \ldots, m$, where

$$
p(T_i \mid \mathrm{rest}, D_{\mathrm{obs}}) \propto \exp\Bigl\{\log L_i(\eta) - \tfrac{1}{2}(W_i - H_i\gamma - R_i b_i)'\Sigma_{W_i}^{-1}(W_i - H_i\gamma - R_i b_i) - \tfrac{1}{2}(T_i - \mu_T)'\Sigma_T^{-1}(T_i - \mu_T)\Bigr\}.
$$

7. Sample the missing FFQs and 24-hour recalls in the substudy assuming the observations are missing at random, leading to $[W_i^{\mathrm{miss}} \mid \mathrm{rest}, D_{\mathrm{obs}}] \sim N(H_i\gamma + R_i b_i,\ \Sigma_{W_i})$.

8. Sample the missing biomarkers from $[M_i^{\mathrm{miss}} \mid \mathrm{rest}, D_{\mathrm{obs}}] \sim N(T_i + b_{i3},\ \sigma_M^2)$, assuming missing observations are missing at random.

9. Sample $[\mu_T \mid \mathrm{rest}, D_{\mathrm{obs}}] \sim N(\Lambda_T\bar{T} + (I - \Lambda_T)\mu_{0,T},\ m^{-1}\Lambda_T\Sigma_T)$, where $\Lambda_T = V_{0,T}(m^{-1}\Sigma_T + V_{0,T})^{-1}$ and $\bar{T} = m^{-1}\sum_{i=1}^{m} T_i$.

10. Sample $[\Sigma_T^{-1} \mid \mathrm{rest}, D_{\mathrm{obs}}] \sim W_{p_T}\bigl(m + \nu_T,\ (C_T^{-1} + \sum_{i=1}^{m} T_i T_i')^{-1}\bigr)$.

For the MDP implementation, substitute all of Section 4.1 for step 6.

[Received April 2005. Revised January 2006.]

REFERENCES

Antoniak, C. E. (1974), “Mixtures of Dirichlet Processes With Applications to Non-Parametric Problems,” The Annals of Statistics, 2, 1152–1174.
Block, G. (2001), “Invited Commentary: Another Perspective on Food Frequency Questionnaires,” American Journal of Epidemiology, 154, 1103–1104.
Breslow, N. E., and Day, N. E. (1980), Statistical Methods in Cancer Research, Lyon: International Agency for Research on Cancer.
Brown, E. R., and Ibrahim, J. G. (2003), “A Bayesian Semiparametric Joint Hierarchical Model for Longitudinal and Survival Data,” Biometrics, 59, 221–228.
Brown, J. E., Buzzard, I. M., Jacobs, D. R., Hannan, P. J., Kushi, L. H., Barosso, G. M., and Schmid, L. A. (1996), “A Food Frequency Questionnaire Can Detect Pregnancy-Related Changes in Diet,” Journal of the American Dietetic Association, 96, 262–266.
Byers, T. (2001), “Food Frequency Dietary Assessment: How Bad Is Good Enough?” American Journal of Epidemiology, 154, 1087–1088.
Carlin, B. P., and Louis, T. A. (1996), Bayes and Empirical Bayes Methods for Data Analysis, London: Chapman & Hall/CRC Press.
Carroll, R. J. (2003), “Variances Are Not Always Nuisance Parameters,” Biometrics, 59, 211–220.
Carroll, R. J., Ruppert, D., Crainiceanu, C., Tosteson, T., and Karagas, M. (2004), “Nonlinear and Nonparametric Regression and Instrumental Variables,” Journal of the American Statistical Association, 99, 736–750.
Carroll, R. J., Ruppert, D., and Stefanski, L. A. (1995), Measurement Error in Nonlinear Models, Boca Raton, FL: Chapman & Hall/CRC Press.
Erkkola, M., Karppinen, M., Javanainen, J., Rasanen, L., Knip, M., and Virtanen, S. M. (2001), “Validity and Reproducibility of a Food Frequency Questionnaire for Pregnant Finnish Women,” American Journal of Epidemiology, 154, 466–476.
Escobar, M. D. (1994), “Estimating Normal Means With a Dirichlet Process Prior,” Journal of the American Statistical Association, 89, 268–277.
Forsythe and Gage (1994), ???.
Fuller (1987), ???.
Geman, S., and Geman, D. (1984), “Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.
Gilks and Wild (1992), ???.
Goldberg, G. R., Prentice, A. M., Coward, W. A., Davies, H. L., Murgatroyd, P. R., Wensing, C., Black, A. E., Harding, M., and Sawyer, M. (1993), “Longitudinal Assessment of Energy Expenditure in Pregnancy by the Doubly Labeled Water Method,” American Journal of Clinical Nutrition, 57, 494–505.
Greeley, S., Storbakken, L., and Magel, R. (1992), “Use of a Modified Food Frequency Questionnaire During Pregnancy,” Journal of the American College of Nutrition, 11, 728–734.
Hastings, W. K. (1970), “Monte Carlo Sampling Methods Using Markov Chains and Their Applications,” Biometrika, 57, 97–109.
Holcroft, C., and Spiegelman, D. (1999), “Design of Validation Studies for Estimating the Odds Ratio of Exposure–Disease Relationships When Exposure Is Misclassified,” Biometrics, 55, 1193–1201.
Kaaks, R., Riboli, E., Esteve, J., Van Kappel, A., and Van Staveren, W. (1994), “Estimating the Accuracy of Dietary Questionnaire Assessments: Validation in Terms of Structural Equation Models,” Statistics in Medicine, 13, 127–142.
Kipnis, V., Midthune, D., Freedman, L. S., Bingham, S., Schatzkin, A., Subar, A., and Carroll, R. J. (2001), “Empirical Evidence of Correlated Biases in Dietary Assessment Instruments and Its Implications,” American Journal of Epidemiology, 153, 394–403.
Kipnis, V., Subar, A., Midthune, D., Freedman, L. S., Ballard-Barbash, R., Troiano, R., Bingham, S., Schoeller, D. A., Schatzkin, A., and Carroll, R. J. (2003), “The Structure of Dietary Measurement Error: Results of the OPEN Biomarker Study,” American Journal of Epidemiology, 158, 14–21.
Kleinman, K. P., and Ibrahim, J. G. (1998), “A Semi-Parametric Bayesian Approach to Generalized Linear Mixed Models,” Statistics in Medicine, 17, 2579–2596.
Little, R. J. A., and Rubin, D. B. (1987), Statistical Analysis With Missing Data, New York: Wiley.
Little, R. J. A., and Rubin, D. B. (2002), ???.
MacEachern, S. N. (1994), “Estimating Normal Means With a Conjugate Style Dirichlet Process Prior,” Communications in Statistics: Simulation and Computation, 23, 727–741.
Mallick, B., Hoffman, F. O., and Carroll, R. J. (2002), “Semiparametric Regression Modeling With Mixtures of Berkson and Classical Error, With Application to Fallout From the Nevada Test Site,” Biometrics, 58, 13–20.
McCullagh, P., and Nelder, J. A. (1983), Generalized Linear Models, London: Chapman & Hall/CRC Press.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), “Equation of State Calculations by Fast Computing Machines,” Journal of Chemical Physics, 21, 1087–1092.


Metropolis, N., and Ulam, S. (1949), “The Monte Carlo Method,” Journal of the American Statistical Association, 44, 335–341.
Morrissey, M. J., and Spiegelman, D. (1999), “Matrix Methods for Estimating Odds Ratios With Misclassified Exposure Data: Extensions and Comparisons,” Biometrics, 55, 338–344.
Müller and Roeder (1997), ???.
Robinson, S., Godfrey, K., Osmond, C., Cox, V., and Barker, D. (1996), “Evaluation of a Food Frequency Questionnaire Used to Assess Nutrient Intakes in Pregnant Women,” European Journal of Clinical Nutrition, 50, 302–308.
Savitz, D. A., Dole, N., Terry, J. W., Zhou, H., and Thorp, J. M. (2001), “Smoking and Pregnancy Outcome Among African-American and White Women in Central North Carolina,” Epidemiology, 12, 636–642.
Savitz, D. A., Dole, N., Williams, J., Thorp, J. M., McDonald, T., Carter, C. A., and Eucker, B. (1999), “Study Design and Determinants of Participation in an Epidemiologic Study of Preterm Delivery,” Paediatric and Perinatal Epidemiology, 13, 114–125.
Savitz, D. A., Henderson, L., Dole, N., Herring, A., Wilkins, D. G., Rollins, D., and Thorp, J. M. (2002), “Indicators of Cocaine Exposure and Preterm Birth,” Obstetrics and Gynecology, 99, 458–465.
Searle, S. R. (1971), Linear Models, New York: Wiley.
Siega-Riz, A. M., Savitz, D. A., Zeisel, S. H., Thorp, J. M., and Herring, A. H. (2004), “Second Trimester Folate Status and Preterm Birth,” American Journal of Obstetrics and Gynecology, 191, 1851–1857.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002), “Bayesian Measures of Model Complexity and Fit” (with discussion), Journal of the Royal Statistical Society, Ser. B, 64, 583–639.
Spiegelman, D., Carroll, R. J., and Kipnis, V. (2001), “Efficient Regression Calibration for Logistic Regression in Main Study/Internal Validation Study Designs,” Statistics in Medicine, 29, 139–161.
Spiegelman, D., Rosner, B., and Logan, R. (2000), “Estimation and Inference for Binary Data With Covariate Measurement Error and Misclassification for Main Study/Validation Study Designs,” Journal of the American Statistical Association, 95, 51–61.
Spiegelman, D., Zhao, B., and Kim, J. (2004), “Correlated Errors in Biased Surrogates: Study Designs and Methods for Measurement Error Correction,” Statistics in Medicine, ??, ??–??.
Stefanski, L. A., and Boos, D. D. (2002), “The Calculus of M-Estimation,” The American Statistician, 56, 29–38.
Stefanski, L. A., and Carroll, R. J. (1985), “Covariate Measurement Error in Logistic Regression,” The Annals of Statistics, 13, 1335–1351.
Subar, A. F., Kipnis, V., Troiano, R. P., Midthune, D., Schoeller, D. A., Bingham, S., Sharbaugh, C. O., Trabulsi, J., Runswick, S., Ballard-Barbash, R., Sunshine, J., and Schatzkin, A. (2003), “Using Intake Biomarkers to Evaluate the Extent of Dietary Misreporting in a Large Sample of Adults: The OPEN Study,” American Journal of Epidemiology, 158, 1–13.
Suitor, C. J., Gardner, J., and Willett, W. C. (1989), “A Comparison of Food Frequency and Diet Recall Methods in Studies of Nutrient Intake of Low-Income Pregnant Women,” Journal of the American Dietetic Association, 89, 1786–1794.
Tanner, M. A., and Wong, W. H. (1987), “The Calculation of Posterior Distributions by Data Augmentation” (with discussion), Journal of the American Statistical Association, 82, 528–550.
U.S. Department of Agriculture Food and Nutrition Service (1994), WIC Dietary Assessment Validation Study Executive Summary: Freeman, San Francisco: Sullivan and Company.
Willett, W. (1998), Nutritional Epidemiology, New York: Oxford University Press.
Willett, W. (2001), “Invited Commentary: A Further Look at Dietary Questionnaire Validation,” American Journal of Epidemiology, 154, 1100–1102.
Zucker and Spiegelman (2004), ???.