Statistical Analysis of Molecular Epidemiology Studies Employing Case-Series¹

Vol. 3, 173-175, March 1994, Cancer Epidemiology, Biomarkers & Prevention

Colin B. Begg² and Zuo-feng Zhang

Memorial Sloan-Kettering Cancer Center, Department of Epidemiology and Biostatistics, New York, New York 10021

Received 8/10/93; revised 11/22/93; accepted 11/23/93.
¹ This research was supported by National Institute of Environmental Health Services Grant ES-06718, and by National Cancer Institute Grant CA-47538 from the NIH, Department of Health and Human Services.
² To whom requests for reprints should be addressed, at Memorial Sloan-Kettering Cancer Center, Department of Epidemiology and Biostatistics, 1275 York Ave., New York, NY 10021.
© 1994 American Association for Cancer Research.

Abstract

The case-series design is being used increasingly to explore associations between environmental risk factors and genetic markers. It is demonstrated that the odds ratio derived from a case-series study is the ratio of the relative risk for developing marker-positive disease to the relative risk for developing marker-negative disease. This parameter is an empirical manifestation of etiological heterogeneity with respect to the risk factor under study, and it can be used to construct a statistical significance test. Presence of etiological heterogeneity, as reflected in departures of this parameter from unity, could be a result of either the presence of distinct causal mechanisms for the two categories of cases, or a different strength of effect via the same mechanism. The case-series approach represents an efficient and valid approach for evaluating gene-environment associations, especially in referral centers where it is difficult to identify a valid control group.

Introduction

Recent advances in technology for identifying genetic mutations and their products have led to an interest in correlating these biological markers with clinical and epidemiological factors with a view to better understanding the natural history and the etiology of diseases. There is a concomitant need for research into the most appropriate study designs for investigating these issues, and to determine the relevant statistical methods for data analysis and interpretation. Our purpose in this article is to examine the relevant methodology for studying the relationship between cancer biomarkers and environmental risk factors for cancer.

Our motivating example concerns the possible role of smoking in causing bladder cancers that are characterized by p53 mutations. Several recent studies have identified correlations between smoking and p53 mutations in tumor samples from a variety of tumor types, including lung cancer (1, 2), head and neck cancer (3), esophageal cancer (4), and bladder cancer (5). The results of these studies imply a distinct etiology for p53+ cancers in which smoking plays a stronger role than in p53- cancers.

Each of these studies has involved a case-series design in which tumor samples from a series of cancer patients are evaluated. That is, the studies have not involved control groups of patients without cancer, as in conventional studies of etiological factors. Most commonly, the relationship between the risk factor (smoking) and the marker (p53) has been characterized by the odds ratio. In contrast, Taylor et al. (6) conducted a study of the relationship between occupational exposures and the activation of the ras oncogene in the etiology of acute myeloid leukemia, employing a conventional control group identified by random digit dialing. In this study, in addition to calculating unadjusted and adjusted odds ratios characterizing the correlation between the occupational factors and ras mutations, the investigators also assessed the relative risks of the occupational risk factors for incidence of ras+ and ras- tumors separately.

In this article we will clarify the yield from the case-series design in relation to the information that can be obtained from the conventional case-control approach. Our particular focus will be on the interpretation of the parameters from the statistical models commonly employed, that is, Mantel-Haenszel techniques and logistic regression.

Methods

We are interested primarily in the hypothesis that the two categories of cases, distinguished by the presence or absence of the tumor marker, are characterized by etiological heterogeneity. That is, we are testing the hypothesis that the strength of effect of one or more risk factors differs for the two case groups. Such an effect could be because the causal pathway differs, or it could merely reflect a different magnitude of effect via the same mechanism. Empirical evidence of such etiological heterogeneity with respect to one or more risk factors would provide strong justification for more detailed investigations of the specific mechanisms of action.

Case-Series Design. This study design consists of a series of incident cases. Ideally, this would be a consecutive series of population-based incident cases. If the ascertainment is not complete, or if the study is, say, hospital based, we must assume that case selection for the two disease categories is not influenced differentially by the risk factors.

Suppose that Y is the risk factor of primary interest, assumed for simplicity to be binary, and that W denotes the set of remaining risk factors, where Y+ indicates presence of the risk factor and Y- indicates its absence. Let X+ (X-) denote the presence (absence) of the tumor marker. Furthermore, let $\psi(W)$ be the odds ratio relating Y and X, conditional on W. In the context of our bladder cancer example, Y represents smoking status, X represents the presence or absence of p53 mutations in the tumor samples, and W represents the remaining risk factors.

We can evaluate $\psi(W)$ using standard statistical methods such as the Mantel-Haenszel procedure or logistic regression. A test of the hypothesis that $\psi(W) = 1$ is a test of the hypothesis that the strength of Y as a risk factor is different for the two case groups (e.g., p53+ and p53-).

Case-Control Design. A more conventional approach is to use a case-control design, in which a control series is assembled in addition to the preceding case-series. We will assume that the control group is sampled randomly from the source population of the cases, as opposed to using a matched design. In this setting the conventional analytic strategy is to use polychotomous logistic regression (7). In this model the relationships between marker-positive cases and controls, and between marker-negative cases and controls, are both modeled concurrently using two separate (logistic) regression functions. Let $\beta_1$ be the coefficient of the primary risk factor in the logistic regression relating marker-positive cases and controls, and let $\beta_2$ be the corresponding parameter relating marker-negative cases and controls. If there are no interactions between Y and W, then $\beta_1$ is the conditional log odds ratio of the risk factor on marker-positive disease, and $\beta_2$ the conditional log odds ratio of the risk factor on marker-negative disease. To test the hypothesis that the two diseases possess etiological heterogeneity with respect to the risk factor, one can test the hypothesis that $\beta_1 = \beta_2$, that is, that the two odds ratios are equal. Such a comparison can be accomplished by using, for example, a likelihood ratio test. Quantitative evidence of the degree of departure from the hypothesis can be characterized by the difference in these coefficients, $\beta_1 - \beta_2$. This is the logarithm of the ratio of the two adjusted relative risks of the risk factor, that is, the relative risk with respect to marker-positive and marker-negative cases, respectively. Specifically, if $\theta_1(W)$ and $\theta_2(W)$ are the respective relative risks, then

$$\log\bigl(\theta_1(W)/\theta_2(W)\bigr) = \beta_1 - \beta_2.$$

Relationship of Case-Series and Case-Control Approaches. The odds ratio derived from the case-series study, $\psi(W)$, is the same parameter as the ratio of relative risks obtained from the polychotomous model, that is,

$$\psi(W) = \theta_1(W)/\theta_2(W).$$

However, the fact that different statistical models are used in the two approaches means that different (consistent) esti[...]

[...] (relative risk) for the marker-positive cases is

$$\theta_1(W) = \frac{P(Y{+} \mid X{+}, Z{+}, W)\,/\,P(Y{-} \mid X{+}, Z{+}, W)}{P(Y{+} \mid Z{-}, W)\,/\,P(Y{-} \mid Z{-}, W)}.$$

Correspondingly,

$$\theta_2(W) = \frac{P(Y{+} \mid X{-}, Z{+}, W)\,/\,P(Y{-} \mid X{-}, Z{+}, W)}{P(Y{+} \mid Z{-}, W)\,/\,P(Y{-} \mid Z{-}, W)}.$$

It is clear by inspection that $\psi(W) = \theta_1(W)/\theta_2(W)$. The denominators of $\theta_1(W)$ and $\theta_2(W)$ are the same because the same control population is relevant for each case series. This is true in principle, although in practice, if one were to use separate control groups, as would be the case for a pair-matched design, the estimates of $\psi(W)$ from the two methods would not be equivalent even when we do not condition on the remaining risk factor, W.

Example

We illustrate the method using data from our own case-series study of the relationship between smoking and p53 mutations in patients with bladder cancer, treated at Memorial Sloan-Kettering Cancer Center (8). The raw frequencies are contained in Table 1. For illustrative purposes we have employed a control group consisting of patients with other cancers believed to be unrelated to smoking, although this would not be an ideal control group for a case-control study in general.

Table 1. Frequencies by p53 mutations and smoking status

                      Cases
Smoking status    p53+    p53-    Controls
Smoker              34      43          81
Nonsmoker           10      21          64

The odds ratios and confidence intervals are presented in Table 2. The unadjusted odds ratios are calculated directly from the cross-products, as usual. That is, $\psi = (34 \times 21)/(10 \times 43)$, $\theta_1 = (34 \times 64)/(10 \times 81)$, $\theta_2 = (43 \times 64)/(21 \times 81)$. The equivalence of $\psi$ and $\theta_1/\theta_2$ is evident by inspection. Calculation of adjusted odds ratios involves the use of simple logistic regression for the case-series study, and polychotomous logistic regression for the case-control study.
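
The cross-product arithmetic of the Example can be checked with a short script. The following is a minimal Python sketch under stated assumptions, not the authors' code: the function names `odds_ratio` and `woolf_interval` are hypothetical, and the Woolf (log-scale) interval is a standard large-sample choice used here only for illustration; it is not necessarily the method behind the intervals reported in Table 2.

```python
from math import exp, log, sqrt

# Counts from Table 1: (smokers, nonsmokers) in each group.
P53_POS = (34, 10)
P53_NEG = (43, 21)
CONTROLS = (81, 64)


def odds_ratio(group_a, group_b):
    """Cross-product odds ratio of exposure in group_a relative to group_b."""
    exposed_a, unexposed_a = group_a
    exposed_b, unexposed_b = group_b
    return (exposed_a * unexposed_b) / (unexposed_a * exposed_b)


def woolf_interval(group_a, group_b, z=1.96):
    """Approximate 95% interval on the log scale (Woolf method).

    Assumption: chosen for illustration; the source does not state
    how the intervals in its Table 2 were computed.
    """
    log_or = log(odds_ratio(group_a, group_b))
    se = sqrt(sum(1.0 / count for count in (*group_a, *group_b)))
    return exp(log_or - z * se), exp(log_or + z * se)


# Case-series odds ratio: smoking versus p53 status among cases only.
psi = odds_ratio(P53_POS, P53_NEG)

# Case-control odds ratios: each case group against the shared controls.
theta1 = odds_ratio(P53_POS, CONTROLS)
theta2 = odds_ratio(P53_NEG, CONTROLS)

# The control counts cancel in the ratio, so psi equals theta1 / theta2.
ratio = theta1 / theta2

psi_low, psi_high = woolf_interval(P53_POS, P53_NEG)

print(f"psi = {psi:.3f}, theta1 = {theta1:.3f}, theta2 = {theta2:.3f}")
print(f"theta1 / theta2 = {ratio:.3f}")
print(f"approximate 95% interval for psi: ({psi_low:.3f}, {psi_high:.3f})")
```

Because the control counts appear in both theta1 and theta2, they cancel in the ratio, which is why the unadjusted case-series estimate needs no control group at all. The adjusted analyses described in the text would require fitting logistic models and are not attempted in this sketch.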
