
M1 - PopMed

Bias and Confounding

Saba Masho, MD, MPH, DrPH
Department of and Community Health

OBJECTIVES

• Describe the key types of bias
• Identify sources of bias in study design, data collection and analysis
• Identify confounders and methods for controlling confounders

Bias

• Bias is any systematic error in an epidemiologic study that results in an incorrect estimate of the association between a risk factor and an outcome.
• It can be introduced at any stage of a study: design, data collection, analysis, interpretation, publication, or review of data.
• Bias is difficult, and often impossible, to correct during the analysis and may also be hard to evaluate. Thus, studies need to be carefully designed.

Types:

1. Selection
2. Information/measurement
3. Confounding

SELECTION BIAS

• A systematic error that arises in the process of selecting the study populations
• May occur when different criteria are used to select cases/controls or exposed/non-exposed
  – E.g. a study of uterine cancer excluding patients with hysterectomy from cases but not from controls
• Differences in hospital admission rates between cases and controls may conceal an association in a study; this is called "Berkson's bias" or "admission rate bias"
• May also occur when membership in one group differs systematically from the general population or control group, called membership bias
  – E.g. selecting cases from inner-city hospitals and controls from suburban hospitals
• Detection bias: occurs when the risk factor under investigation requires thorough diagnostic assessment.
  – E.g. benign breast tumors usually undergo detailed diagnostic assessment. Women with a benign lesion are more likely to be diagnosed with the illness … due to the assessment
• Migration bias: study subjects in one subgroup of a cohort leave their original group, dropping out of the study or moving to one of the other groups
  – Loss to follow-up bias: most common in cohort and experimental studies; patients are lost from follow-up
  – Crossover: patients move from one group to another in the cohort during their follow-up
• Susceptibility bias or assembly bias: groups being compared are not equally susceptible to the outcome of interest
  – E.g. cases are significantly older than controls in cancer research; older people are more likely to have cancer
• Sampling bias: when the subjects selected for the study are systematically different from the population they are designed to represent
• Lead-time bias: lead time is the period between the detection of a medical condition by screening and the time the condition would have been diagnosed because of symptoms. Because screening detects the problem before it is symptomatic, screened patients appear to survive longer than those who enter when diagnosed with symptoms.

SELECTION BIAS

• Length-time bias: occurs when the disease of interest has slow progression. Conditions of this type that are found through screening have a different prognosis from those found when symptomatic, which affects observed mortality.
• Compliance bias: common in experimental studies; one group is less compliant than the other, systematically affecting the result of the study.

INFORMATION BIAS

• Also called measurement or observation bias
• Systematic differences in the way exposure or outcome data are obtained.
• Measurement bias: methods of measurement are dissimilar in different groups of patients.
• Recall bias: difference in remembering and reporting
• Interviewer bias: systematic difference in soliciting, recording, or interpreting information

INFORMATION BIAS

• Misclassification: occurs when subjects are mistakenly categorized by exposure or disease status.
• E.g. smoking in pregnancy: because patients deny smoking, they may be misclassified as non-smokers.
  – Random (non-differential): the errors are equally distributed in the study groups; this tends to pull the estimate toward no effect
  – Non-random (differential): the errors are unequally distributed between the study groups; the estimate may be under- or overestimated

CONTROL OF BIAS

1) Study design
• Choice of the study population: e.g. hospital controls for hospital-based studies
• Choosing a well-defined and stable population.
• Choosing a population that might be interested in the study.
• Clear and uniform criteria for selection

CONTROL OF BIAS

2) Methods of data collection
• Data collection instruments: leave less room for interpretation by the subjects by providing clear options. Select the best method for data collection, e.g. for high cholesterol use lab data instead of an interview.
• Blinding of chart abstractors, interviewers or subjects
• Provide standardized training and written instructions for data collection.
• Include several items directed to the same question on the questionnaire.
• Record interview time to find out whether interviewers are spending more time on some subjects than on others.
• Reliability scores: subjects with unreliable information may be excluded.

3) Sources of exposure and disease information
• Use of pre-existing records is the least biased option because the data were recorded before the onset of the study. However, records may not have complete information on all factors.
• Having multiple sources of data to verify accuracy is beneficial.
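Why accurate data collection matters can be seen by working through misclassification numerically. The following is a minimal sketch, not part of the original slides: the true counts and the sensitivity and specificity values are hypothetical, chosen only to show that equal measurement error in both groups pulls the odds ratio toward 1, while unequal error can push it in either direction.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed (exposed, unexposed) counts under imperfect exposure measurement."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


# Hypothetical true exposure counts (illustrative only, not from the slides).
true_cases = (100, 100)     # exposed, unexposed
true_controls = (50, 150)   # exposed, unexposed
true_or = odds_ratio(*true_cases, *true_controls)

# Non-differential: the same error rates apply to cases and controls.
nd_cases = misclassify(*true_cases, sensitivity=0.8, specificity=0.9)
nd_controls = misclassify(*true_controls, sensitivity=0.8, specificity=0.9)
nondiff_or = odds_ratio(*nd_cases, *nd_controls)

# Differential: cases over-report exposure, controls are measured as before.
d_cases = misclassify(*true_cases, sensitivity=0.95, specificity=0.7)
diff_or = odds_ratio(*d_cases, *nd_controls)

print(f"true OR             = {true_or:.2f}")
print(f"non-differential OR = {nondiff_or:.2f}")
print(f"differential OR     = {diff_or:.2f}")
```

With these assumed values the non-differential estimate falls between 1 and the true odds ratio, and the differential estimate exceeds it. Different error rates would give a different direction for the differential case.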

Remarks in Evaluation of the Role of Bias

• To the maximum extent possible, potential sources of bias should be eliminated through rigorous design and conduct of the study
• Existing and potential effects of the biases have to be discussed in the report
• Investigators may want to build strategies into the methodology to assess the extent of the effect of potential bias
• In evaluating the role of bias, the type of study, its design and the nature of the results have to be examined

Confounding

• Confounders are factors that are associated with both the risk factor and the outcome of interest. These factors are independently risk factors for the outcome, not an intermediate link between the risk factor and the outcome.

Confounding

• Can lead to an over- or underestimate of the association
• The existence of confounding can be determined by comparing the crude estimate of association with the controlled (adjusted) estimate
• Confounding has to be considered at the design stage of the study, by collecting the necessary information based on existing knowledge and previous studies

Example: sex is a confounder in the association between smoking and breast cancer.

• Crude OR = (54*148)/(62*156) = 0.826
• OR = 1.56 for males
• OR = 1.33 for females

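Example: alcohol consumption and myocardial infarction (MI).

• Crude OR = 2.26
• OR = 1.00 for nonsmokers
• OR = 1.00 for smokers

Mantel-Haenszel overall OR (to adjust for confounding)

• MH-OR (overall) = [(4*140)/210 + (50*8)/210] / [(60*6)/210 + (2*150)/210] = 1.45
• The cell counts are those of the smoking and breast cancer example: the two strata give (4*140)/(60*6) = 1.56 and (50*8)/(2*150) = 1.33, and pooling the strata gives the crude (54*148)/(62*156) = 0.826.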
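The crude, stratum-specific and Mantel-Haenszel odds ratios can be reproduced in a few lines. This is a minimal sketch: the cell counts are read off the Mantel-Haenszel formula on the slide, and the assignment of the off-diagonal cells to b and c is an assumption (swapping them does not change any of the odds ratios). The function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table with cells a, b, c, d (OR = ad / bc)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Cell counts (a, b, c, d) taken from the slide's MH formula; each stratum has n = 210.
males = (4, 60, 6, 140)
females = (50, 2, 150, 8)

# The crude table ignores sex: add the two strata cell by cell.
crude = tuple(m + f for m, f in zip(males, females))

crude_or = odds_ratio(*crude)
male_or = odds_ratio(*males)
female_or = odds_ratio(*females)
mh_or = mantel_haenszel_or([males, females])

print(f"crude OR   = {crude_or:.3f}")
print(f"male OR    = {male_or:.3f}")
print(f"female OR  = {female_or:.3f}")
print(f"MH OR      = {mh_or:.3f}")
```

The crude estimate (below 1) and the adjusted estimate (about 1.45) point in opposite directions, which is the comparison of crude and controlled estimates used to detect confounding.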

Method of Controlling Confounders

Can be done in the:
1. Study design
   1. Randomization
   2. Restriction
   3. Matching
2. Analysis
   1. Stratification
   2. Multivariate analysis

Randomization

• With a large enough sample size, all potential confounders, known and unknown, tend to be evenly distributed between the study groups.
• Is unique because of its ability to control for the effect of unknown confounders
• Imbalance because of small sample size or chance can be controlled during analysis
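The claim that randomization balances confounders in large samples, but may leave chance imbalance in small ones, can be checked with a small simulation. This is a sketch under assumed values: the 30% confounder prevalence, the sample sizes and the seed are hypothetical and not from the slides.

```python
import random


def randomize_and_compare(n, confounder_prevalence, seed=0):
    """Randomly assign n subjects to arm A or B; return the confounder proportion in each arm."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    arms = {"A": [], "B": []}
    for _ in range(n):
        has_confounder = rng.random() < confounder_prevalence
        arm = "A" if rng.random() < 0.5 else "B"  # assignment ignores the confounder
        arms[arm].append(has_confounder)
    return {arm: sum(flags) / len(flags) for arm, flags in arms.items()}


small = randomize_and_compare(n=40, confounder_prevalence=0.3)
large = randomize_and_compare(n=20000, confounder_prevalence=0.3)

print("n = 40:    ", {arm: round(p, 3) for arm, p in small.items()})
print("n = 20000: ", {arm: round(p, 3) for arm, p in large.items()})
```

In the large sample the two proportions should be close to each other and to the assumed prevalence. In the small sample any gap is due to chance, which is the imbalance that would have to be handled during analysis.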

Restriction

• Restrict the study to a specified category or categories.
  – E.g. if gender is a confounding variable, the study can be restricted to one gender.
• Creates a homogeneous study population
• Is an easy, convenient and inexpensive method

Limitations of restriction include:
  – Reduction in the number of subjects
  – Residual confounding, if the criteria are not strict
  – Limits the evaluation to the restricted group, affecting generalizability

Matching

• Done in the design and analysis phases
  – E.g. to control for the effect of age and sex in lung cancer and smoking, for each case of lung cancer with a particular age and sex, one or more controls of the same age and gender will be selected
• Matching is mostly used in case-control studies
• Maximizes the sample size with respect to the potential confounder

Matching as a Method of Controlling for Confounding

• Matching variables should meet the following criteria:
  – Related both to exposure and disease, so that they are potential confounders
  – Fairly strongly related to disease, so that matching will have the potential for increasing the precision of estimation
  – Readily available at little or no cost to the investigator
• Common matching variables:
  – gender
  – age group
  – race

Odds Ratio = b/c

• In a matched-pair table, b is the number of pairs in which only the case was exposed and c is the number of pairs in which only the control was exposed.

Matching

Limitations:
  – Difficult, expensive, and time-consuming to find a comparison subject with the required set of characteristics
  – The factors used for matching cannot themselves be evaluated
  – Does not control for factors other than the matching factors
  – Reduces the effective sample size, because only discordant pairs contribute to the matched analysis

Matching

• Matching should only be considered after careful analysis of its limitations and benefits
• Matched data will lead to similarity between cases and controls with respect to the exposure. If this is not appropriately analyzed, it will underestimate the true association. Thus, matching in the design by itself will not control for the effect.
• Therefore, matching should be accompanied by stratification or multivariate analysis.

Example:

• In a case-control study of menopausal estrogen use and endometrial cancer, 317 cases of endometrial cancer and controls (with other forms of cancer) were matched on the basis of age at diagnosis (within 2 years) and year of diagnosis (within 2 years).
• There were 39 pairs in which both the case and her matching control used estrogens, and 150 pairs in which neither the case nor her matching control used estrogens. There were 113 pairs in which the case was exposed but not the control, and 15 in which the control reported estrogen use while the case did not. What is the estimate of the odds ratio for endometrial cancer among those who used postmenopausal estrogens?

                        Controls
Cases            Estrogen   No Estrogen   Total
Estrogen            39          113        152
No Estrogen         15          150        165
Total               54          263        317

OR = 113/15 = 7.53
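The matched-pair calculation can be written out as a short sketch. The odds ratio uses only the discordant pairs from the example. The 95% confidence interval is an addition that is not on the slides: it assumes the usual large-sample approximation on the log scale, with standard error sqrt(1/b + 1/c). The function name is illustrative.

```python
import math


def matched_pair_or(b, c, z=1.96):
    """Matched-pair OR = b / c with an approximate 95% CI on the log scale.

    b: pairs where only the case was exposed
    c: pairs where only the control was exposed
    """
    or_hat = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # large-sample SE of ln(OR); an assumption of this sketch
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Discordant pairs from the estrogen and endometrial cancer example.
or_hat, lower, upper = matched_pair_or(b=113, c=15)
print(f"matched OR = {or_hat:.2f}, approx. 95% CI {lower:.2f} to {upper:.2f}")
```

The 39 and 150 concordant pairs do not enter the estimate, which is why matching reduces the effective sample size.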

Stratified Analysis

• Data will be stratified by the confounding variable and estimates will be calculated for each stratum.
  – For example, if age (young and old) is a confounder in the association of hypertension and exercise, separate estimates will be calculated for young, old and pooled (everybody), and the estimates will be compared.
• There is a statistical method for combining the stratum estimates, known as the Mantel-Haenszel (MH) estimate, which produces a single unconfounded estimate.

Multivariate Analysis

• Permits the evaluation, and allows clear understanding, of confounders
• Used for explanatory and predictive purposes
• Allows control for many factors simultaneously.
• Allows efficient estimation of measures of association while controlling for a number of confounding factors.

Additional Resources

• http://stroke.ahajournals.org/cgi/reprint/22/7/938.pdf
• http://www.epidemiology.ch/history/papers/SPM%2047(3)%20156-161%20Vineis-1.pdf
• http://www2.sph.unc.edu/courses/eric/notebooks/dec99-Selection_Bias.pdf
• http://www.fpnotebook.com/PRE12.htm

Exercise

Please identify and discuss the problems of the following studies:

1. A case-control study was conducted to examine the association between reserpine (an antihypertensive drug) and breast cancer. Women with newly diagnosed breast cancer who underwent mastectomy were identified from hospital records as cases. The control group included women who had undergone elective surgery. In selecting the controls, the investigators excluded cases of cholecystectomy, thyroidectomy, surgery for renal disease, and cardiac operations. At the time the study was done (1970's), reserpine was used to treat these conditions. They excluded these patients thinking that reserpine use would otherwise be artificially higher in the control group.

Exercise

2. In the 1960's there was some concern that oral contraceptives (OC) may cause thromboembolism. Because it is a deadly illness, physicians often admitted women if they were OC users and were suspected of having the disease. At around that time investigators conducted a hospital-based case-control study investigating the association between OC use and thromboembolism. In this study, cases were women with thromboembolism and controls were women admitted for other reasons. The researchers found that thromboembolism patients were more likely to be OC users.

3. The association between a rare form of congenital abnormality and exposure to chemicals was investigated using a case-control study. Cases were mothers of newborn babies with the congenital abnormality and controls included mothers with normal babies. The interviewer asked about exposure to selected household cleaning chemicals. The findings show that use of household chemicals was associated with the abnormality.

Exercise

4. A five-year study comparing drug A and drug B was conducted among HIV patients. At the beginning of the study patients were randomized to take either drug A or B. However, due to side effects with drug B, about 20% of the patients decided to stop drug B and they were given drug A.

5. The majority of the study population was homeless and IV drug users. Due to social issues, 30% of the study participants from drug A and about 5% of patients did not complete the necessary appointments.