
A systematic error (caused by the investigator or the subjects) that causes an incorrect (over- or under-) estimate of an association.

Here, random error is small, but systematic errors have led to an inaccurate estimate: the result is precise, but biased.

Suppose we conducted a study multiple times in an identical way.

[Figure: distributions of the repeated estimates on a risk ratio, rate ratio, or odds ratio scale (0, 1.0, 10), marked with the true effect and the null value. Panels illustrate: random error alone; random error and bias (also biased); little random or systematic error; and little random error, but biased (precise, but biased).]

True Relationship (entire population):
              Diseased   Not Diseased
Exposed         20,000        200,000
Not Exposed     10,000        500,000

Unbiased Sample:
              Diseased   Not Diseased
Exposed            200          2,000
Not Exposed        100          5,000

Biased Estimates:
Sample 1:
              Diseased   Not Diseased
Exposed            175          2,000
Not Exposed        125          5,000

Sample 2:
              Diseased   Not Diseased
Exposed            100          2,100
Not Exposed        100          5,000

These samples are misleading about the true relationship in the population.
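To make the distortion concrete, here is a minimal Python sketch (the helper name and layout are ours, not from the lecture) that computes the risk ratio for each of these 2x2 tables:

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table:
    a = exposed & diseased, b = exposed & not diseased,
    c = unexposed & diseased, d = unexposed & not diseased."""
    return (a / (a + b)) / (c / (c + d))

print(risk_ratio(20_000, 200_000, 10_000, 500_000))  # population:      ~4.6
print(risk_ratio(200, 2_000, 100, 5_000))            # unbiased sample: ~4.6
print(risk_ratio(175, 2_000, 125, 5_000))            # biased sample 1: ~3.3
print(risk_ratio(100, 2_100, 100, 5_000))            # biased sample 2: ~2.3
```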


There are 4 mechanisms that can produce this distortion when samples are taken:

Selection Bias: when there is over- or under-sampling that distorts the picture.
Information Bias: when there are errors in classification of exposure or outcome.
Confounding: when other factors affecting the outcome are unevenly distributed between the comparison groups.
Random Error.

Selection Bias

Occurs when selection, enrollment, or continued participation in a study is somehow dependent on the likelihood of having the exposure of interest or the outcome of interest.


Selection bias can cause an overestimate or an underestimate of the association. Selection bias can occur in several ways:

1. Selection of a comparison group ("controls") that is not representative of the population that produced the cases in a case-control study (control selection bias).
2. Differential loss to follow-up in a cohort study, such that the likelihood of being lost to follow-up is related to outcome status and exposure status (loss to follow-up bias).
3. Refusal, non-response, or agreement to participate that is related to the exposure and disease (self-selection bias).

Selection Bias in a Case-Control Study

Selection bias can occur in a case-control study if controls are more (or less) likely to be selected if they have the exposure.

Do women of low SES have higher risk of cervical cancer?

Example: 100 cases of cervical cancer from MGH (hospital cases), and 200 controls selected by a door-to-door survey of the neighborhood around the hospital during the work day.

Problems:
1. The SE status of people living around the hospital may generally be different from that of the population that produced the cases.
2. The door-to-door method of selecting controls may tend to select people of lower SE status.

Selection Bias in a Case-Control Study (a control with low SES is more likely to be selected):

True Relationship (OR = 3.0):
            Cases   Controls
Low SES        75        100
High SES       25        100

Control Selection Bias (OR = 2.0):
            Cases   Controls
Low SES        75        120
High SES       25         80

Selection bias can occur in a case-control study when controls are more (or less) likely to be selected if they have the exposure.
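As a quick check of these numbers, a minimal Python sketch (the `odds_ratio` helper is ours, not from the lecture) computes the exposure odds ratio for both tables:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a case-control 2x2 table:
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

print(odds_ratio(75, 100, 25, 100))  # true relationship:           3.0
print(odds_ratio(75, 120, 25, 80))   # with control selection bias: 2.0
```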

Selection bias is not caused by differences in other potential risk factors (confounding). It is caused by selecting controls who are more (or less) likely to have the exposure of interest.

Control Selection Bias - The “Would” Criterion

Are the controls a representative sample of the population that produced the cases?

If a control had developed cervical cancer, would she have been included in the case group? (“Would” criterion)

If the answer is “not necessarily,” then there is likely to be a problem with selection bias.

Example: There are 2,000,000 women over age 20 in MA, and about 200 cases of cervical cancer per year. If low SES (below the median) were associated with cervical cancer with OR = 3.0, MA would look like this:

Entire Population:
            Cancer Cases   Normal
Low SES              150   1,000,000
High SES              50   1,000,000

Cases are referred to MGH from all over the state, so their SES distribution is the same as the state’s, i.e. 3 to 1. But controls selected from the area around MGH may have lower SES than MA as a whole.

Sample:
            Cancer Cases   Normal
Low SES               75        120
High SES              25         80

OR = (75/25) / (120/80) = 2.0 (biased)

Are mothers of children with hemi-facial microsomia more often diabetic? Cases are referred, but what if controls are selected from the general pediatrics ward at MGH?

The referral mechanism of controls might be very different from that of the cases with microsomia. Could mothers of controls be more or less likely to be diabetic than mothers of the cases (regardless of any association between diabetes and microsomia)?

How would you select controls for this study?

Self-Selection Bias in a Case-Control Study

Selection bias can be introduced into case-control studies with low response or participation rates if the likelihood of responding or participating is related to both the exposure and outcome.

Example: A case-control study explored an association between family history of heart disease (exposure) and the presence of heart disease in subjects. Volunteers are recruited from an HMO. Subjects with heart disease may be more likely to participate if they have a family history of disease.

True Relationship (OR = 2.25):
            Cases   Controls
Fam. Hx+      300        200
Fam. Hx-      200        300

Self-Selection Bias (OR = 3.0), with participation rates in parentheses:
            Cases        Controls
Fam. Hx+    240 (80%)    120 (60%)
Fam. Hx-    120 (60%)    180 (60%)
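The biased table can be reproduced by applying those participation rates to the true table, as in this short Python sketch (an illustration; the variable names are ours):

```python
def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
    return (a * d) / (b * c)

true_table = (300, 200, 200, 300)
print(odds_ratio(*true_table))      # true OR = 2.25

# Cases with a family history participate at 80%; all other groups at 60%.
participation = (0.80, 0.60, 0.60, 0.60)
observed = tuple(round(n * p) for n, p in zip(true_table, participation))
print(observed)                     # (240, 120, 120, 180)
print(odds_ratio(*observed))        # biased OR = 3.0
```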

The best solution is to work toward high participation (>80%) in all groups.

Selection Bias in a Retrospective Cohort Study

In a retrospective cohort study, selection bias occurs if the selection of exposed & non-exposed subjects is somehow related to the outcome.

What will be the result if the investigators are more likely to select an exposed person if they have the outcome of interest?

Selection Bias in a Retrospective Cohort Study

Example: Investigating occupational exposure (an organic solvent) occurring 15-20 yrs. ago in a factory.

Exposed & unexposed subjects are enrolled based on employment records, but some records were lost.

Suppose there was a greater likelihood of retaining records of those who were exposed & got the disease.

Selection Bias in a Retrospective Cohort Study (differential “referral” or diagnosis of subjects):

True (RR = 2.0):
              Diseased   Not Diseased
Exposed            100            900
Not Exposed         50            950

Biased (RR = 2.42):
              Diseased   Not Diseased
Exposed             99            720
Not Exposed         40            760

20% of employee health records were lost or discarded, except among “solvent” workers who reported illness (1% loss).
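The biased table can be reproduced by applying those retention probabilities to the true cohort; here is a minimal Python sketch (the retention fractions come from the slide, but the code and cell labels are ours):

```python
# True cohort: E = exposed to the solvent, D = developed the disease.
true = {"E+D+": 100, "E+D-": 900, "E-D+": 50, "E-D-": 950}

# 80% of records are retained (20% lost), except for exposed workers who
# reported illness, whose records are retained 99% of the time (1% loss).
retention = {"E+D+": 0.99, "E+D-": 0.80, "E-D+": 0.80, "E-D-": 0.80}

observed = {cell: round(n * retention[cell]) for cell, n in true.items()}
print(observed)  # {'E+D+': 99, 'E+D-': 720, 'E-D+': 40, 'E-D-': 760}

def risk_ratio(t):
    return (t["E+D+"] / (t["E+D+"] + t["E+D-"])) / (t["E-D+"] / (t["E-D+"] + t["E-D-"]))

print(risk_ratio(true))      # true RR   = 2.0
print(risk_ratio(observed))  # biased RR ~ 2.42
```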

Workers in the exposed group were more likely to be included if they had the outcome of interest.

The “Healthy Worker” Effect

Can be considered a form of selection bias, because the general-population comparison group has a higher probability of getting the outcome (death).

The general population is often used as the comparison group in occupational studies of mortality, since the data are readily available and the general population is mostly unexposed (e.g., comparing mortality rates in rubber workers vs. the general population).

The main disadvantage is bias from the “healthy worker effect.” The employed work force (mostly healthy) generally has lower rates of mortality and disease than the general population (which includes both healthy & ill people), so the general population is more likely to have the outcome than the employed work force that produced the cases.

[Figure: comparing mortality in the exposed work force, the unexposed work force, and the overall population; the additional risk in the unemployed inflates the overall population's rate, giving an apparent RR = 1.1 versus a true RR = 1.3.]

Differential Retention (Loss to Follow-Up) in Prospective Cohort Studies

Enrollment into a prospective cohort study will not be biased by the outcome, because the outcome has not occurred at enrollment.

However, prospective cohort studies can have selection bias if the exposure groups have differential retention of subjects with the outcomes of interest. This can cause either an over- or under-estimate of the association.

Selection Bias in a Prospective Cohort Study (more ‘events’ lost to follow-up in one exposure group)

Example: differential loss to follow-up in a prospective cohort study of oral contraceptives (OC) & thromboembolism (TE). If OC were associated with TE with RR = 2.0 (the truth), the 2x2 for all subjects would look like this:

Without Losses (True, RR = 2.0):
         TE    Normal
OC+      20     9,980
OC-      10     9,990

There is 40% loss to follow-up overall, but a greater tendency to lose OC users with TE. If OC users with TE are more likely to be lost than non-OC users with TE, this results in a de facto biased selection.

Final Sample (Loss to Follow-Up Bias, RR = 1.0):
         TE    Normal
OC+       8     5,980
OC-       8     5,990

RR = (8/5,988) / (8/5,998) = 1.0 (biased)
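As a check on these numbers, a short Python sketch (our own illustration; the `risk_ratio` helper is not from the lecture) computes the risk ratio before and after the differential losses:

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a cohort 2x2 table:
    a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without."""
    return (a / (a + b)) / (c / (c + d))

# Full cohort, no losses: OC use roughly doubles the risk of TE.
print(risk_ratio(20, 9_980, 10, 9_990))  # 2.0

# After differential loss to follow-up (OC users with TE lost most often),
# the observed association collapses toward the null.
print(risk_ratio(8, 5_980, 8, 5_990))    # ~1.0
```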

Observation Bias (Information Bias)

A biased measure of association due to incorrect classification of exposure (exposed / not exposed) or outcome (diseased / not diseased) relative to the correct classification.

Misclassification Bias

Subjects are misclassified with respect to their risk factor status or their outcome, i.e., errors in classification.

Non-differential Misclassification (random, Errors = Errors): if errors are about the same in both groups, misclassification tends to minimize any true difference between the groups (bias toward the null).

Differential Misclassification (non-random, Errors ≠ Errors): if information is better in one group than another, the association may be over- or underestimated.

Non-Differential Misclassification (Errors = Errors)

When errors in exposure or outcome status occur with approximately equal frequency in the groups being compared:

1. Equally inaccurate memory of exposures in both groups. Example: a case-control study of heart disease and past activity; it is difficult to remember your specific exercise frequency, duration, and intensity over many years.
2. Recording and coding errors in records and databases. Example: ICD-9 codes in hospital discharge summaries.
3. Using surrogate measures of exposure. Example: using prescriptions for anti-hypertensive medications as an indication of treatment.
4. Non-specific or broad definitions of exposure or outcome. Example: “Do you smoke?” to define exposure to tobacco smoke (vs. how much, how often, how long).

Non-Differential Misclassification (Errors = Errors)

Recording or Coding Errors:

Example: When patients are discharged, the MD dictates a summary, which is transcribed. Diagnoses and procedures noted on the summary are encoded (ICD-9 codes) and sent to the MA Health Data Consortium.
1. MDs don’t list all relevant diagnoses.
2. Coders assign incorrect codes (they aren’t MDs).

Errors occur in 25-30% of records.

Non-Differential Misclassification (Errors = Errors)

Example: A case-control study comparing CAD cases & controls for history of diabetes. Only half of the diabetics are correctly recorded as such in cases and controls.

True Relationship:
              CAD   Controls
Diabetes       40         10
No diabetes    60         90

OR = (40 × 90) / (10 × 60) = 6.0

With Non-differential Misclassification:
              CAD   Controls
Diabetes       20          5
No diabetes    80         95

OR = (20 × 95) / (5 × 80) = 4.75
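The shrinkage toward the null can be reproduced with a short Python sketch (our own illustration) that applies 50% sensitivity for recording diabetes equally to cases and controls:

```python
def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
    return (a * d) / (b * c)

cases_diab, cases_no = 40, 60
ctrls_diab, ctrls_no = 10, 90
print(odds_ratio(cases_diab, ctrls_diab, cases_no, ctrls_no))  # true OR = 6.0

# Only half of true diabetics are recorded as diabetic, in cases and controls alike
# (50% sensitivity, 100% specificity); the missed diabetics are counted as non-diabetic.
sens = 0.5
obs_cases_diab = cases_diab * sens                    # 20
obs_cases_no = cases_no + cases_diab * (1 - sens)     # 80
obs_ctrls_diab = ctrls_diab * sens                    # 5
obs_ctrls_no = ctrls_no + ctrls_diab * (1 - sens)     # 95
print(odds_ratio(obs_cases_diab, obs_ctrls_diab, obs_cases_no, obs_ctrls_no))  # 4.75, biased toward the null
```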

Non-Differential Misclassification (Errors = Errors)

Effect: With a dichotomous exposure (e.g., smoking vs. non-smoking), non-differential misclassification minimizes differences & causes an underestimate of effect, i.e., “bias toward the null.”

“Null” means no difference

[Figure: relative risk scale (0.3, 0.5, 1.0, 2, 3) showing estimates pulled toward the null value of 1.0.]

Validation

When data from the Nurses’ Health Study were used to examine the association between obesity and heart disease, information on the exposure (BMI) was obtained from self-reported weights on a questionnaire.

Could they have under-reported?

“Self-reported weights were validated in a subsample of 184 NHS participants living in the Boston, MA area and were highly correlated with actual measured weights (r = 0.96).”

Cho E, Manson JE, et al.: A Prospective Study of Obesity and Risk of Coronary Heart Disease Among Diabetic Women. Diabetes Care 25:1142–1148, 2002.

Differential Misclassification (Errors ≠ Errors)

When errors in classification of exposure or outcome are more frequent in one group.

• Differences in accurately remembering exposures (unequal). Example: mothers of children with birth defects will remember the drugs they took during pregnancy better than mothers of normal children (maternal recall bias).
• Interviewer or recorder bias. Example: the interviewer has a subconscious belief about the hypothesis.
• More accurate information in one of the groups. Example: a case-control study with cases from one facility and controls from another, with differences in record keeping.

Recall Bias (Differential, Errors ≠ Errors)

People with disease may remember exposures differently (more or less accurately) than those without disease. (Note: if the groups have the same percentage of errors based on faulty memory, that is non-differential misclassification.) A small simulation of this mechanism follows the list below.

To minimize:
• Use a control group that has a different disease (unrelated to the disease under study).
• Use questionnaires that are constructed to maximize accuracy and completeness. Ask specific questions: more accuracy means fewer differences.
• For socially sensitive questions, such as alcohol and drug use or sexual behaviors, use a self-administered questionnaire instead of an interviewer.
• If possible, assess past exposures from biomarkers or from pre-existing records.
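Purely hypothetical illustration (none of these numbers are from the lecture): suppose 30% of mothers in both groups truly took a drug during pregnancy, so the true OR is 1.0, but case mothers recall 95% of true exposures while control mothers recall only 70%. Differential recall alone then manufactures an association:

```python
def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
    return (a * d) / (b * c)

n_cases = n_controls = 200
truly_exposed = 60   # 30% in both groups, so the true OR is exactly 1.0

# Hypothetical recall: case mothers report 95% of true exposures,
# control mothers report only 70% (and no one falsely reports exposure).
reported_exposed_cases = truly_exposed * 0.95     # 57
reported_exposed_controls = truly_exposed * 0.70  # 42

observed_or = odds_ratio(
    reported_exposed_cases, reported_exposed_controls,
    n_cases - reported_exposed_cases, n_controls - reported_exposed_controls,
)
print(round(observed_or, 2))  # ~1.5 -- a spurious association produced by recall alone
```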

(Differential) Interviewer Bias (& recorder bias in record reviews): “Are you sure? Think harder!”

Systematic differences in soliciting, recording, or interpreting information.

Minimized by:
• Blinding the interviewers if possible.
• Using standardized questionnaires consisting of closed-ended, easy-to-understand questions with appropriate response options.
• Training all interviewers to adhere to the question-and-answer format strictly, with the same degree of questioning for both cases and controls.
• Obtaining or verifying data by examining pre-existing records (e.g., medical records or employment records) or assessing biomarkers.

Effects of Bias

Non-Differential Misclassification (Errors = Errors): biases the estimate toward the null.

Differential Misclassification: interviewer bias, recall bias.

These are differential and can bias toward or away from the null.

Misclassification of Outcome Can Also Introduce Bias

… but it usually has much less of an impact than misclassification of exposure, because:

1. Most of the problems with misclassification occur with respect to exposure status, not outcome.
2. There are a number of mechanisms by which misclassification of exposure can be introduced, but most outcomes are more definitive, and there are few mechanisms that introduce errors in outcome.
3. Most outcomes are relatively uncommon.
4. Misclassification of outcome will generally bias toward the null, so if an association is demonstrated, the true effect might, if anything, be slightly greater.

A study is conducted to see if serum cholesterol screening reduces the rate of heart attacks. 1,500 members of an HMO are offered the opportunity to participate in the screening program, & 600 volunteer to be screened. Their rates of MI are compared to those of randomly selected members who were not invited to be screened. After 3 years of follow-up, rates of MI are found to be significantly lower in the screened group.

Any concerns?
1. Nope
2. Differential misclassification
3. Interviewer bias
4. Recall bias
5. Selection bias

Background Information on Abdominal Aortic Aneurysms

Abdominal Aortic Aneurysm (AAA)

Diagnosis of Abdominal Aortic Aneurysms

Usually asymptomatic (surgery if > 5 cm).
• Discovered during a routine abdominal exam by palpation, or
• Seen on x-ray or ultrasound of the abdomen (done for other reasons).

Known risk factors:
• Age
• Male gender
• Smoking
• Hypertension

Costa & Robbs: Br. J. Surg. 1986

A vascular surgery (referral) service in So. Africa reviewed records of elective peripheral vascular surgery.

‘Other’: a variety of readily apparent conditions.

         AAA    Other    Total
Black     60    1,242    1,302
White    260      620      880
Total    320    1,862    2,182

Conclusion: OR = 0.12 (0.09 – 0.15). AAA is uncommon in Blacks and is more often due to infections.
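As a rough check (our own sketch; Woolf's log-odds approximation for the confidence interval is an assumption, not necessarily the authors' method), the reported figures can be approximately reproduced from the table above:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf's method) from a 2x2 table:
    a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Exposure = Black race, outcome = AAA (vs. other readily apparent conditions)
print(odds_ratio_ci(60, 1_242, 260, 620))  # ~0.12, with a CI of roughly 0.09 to 0.16
```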

Could there have been selection bias?
1. Yes
2. No

“All black patients were screened for TB … and for syphilis.”

                              Blacks   Whites
Atherosclerotic                  34%      99%
Inflammatory or Infectious       47%     0.5%
Uncertain etiology               19%     0.0%

“AAA in blacks are more often due to infectious causes.”

A possibility of misclassification?

1. No
2. Yes, random.
3. Yes, differential.

More Details About the Study

                               White   Black
Male : Female                    2:1     1:1
Mean age                        49.4    67.1
Admitted for uncontrolled HBP     0%     17%
Smoking                          76%     48%

(Known risk factors: age, male gender, smoking, hypertension)

Environmental tobacco smoke and tobacco related mortality in a prospective study of Californians, 1960-98. James E. Enstrom, Geoffrey C. Kabat. BMJ 2003;326:1057

118,094 adults enrolled in an ACS cancer study in 1959 were followed until 1998. For “never smokers married to ever smokers” compared with “never smokers married to never smokers”:

                   RR in Males          RR in Females
Heart disease      0.94 (0.85 - 1.05)   1.01 (0.94 - 1.08)
Lung cancer        0.75 (0.42 - 1.35)   0.99 (0.72 - 1.37)
Chr. Pulm. Dis.    1.27 (0.78 - 2.08)   1.13 (0.80 - 1.58)

Conclusions: The results do not support a causal relation between environmental tobacco smoke and tobacco related mortality, although they do not rule out a small effect.

Any potential selection bias in the ETS study?

1. I don’t think so.
2. Yes, there was a potential for it.

Any potential information bias in the ETS study?

1. I don’t think so.
2. Non-differential misclassification.
3. Differential misclassification.
4. Interviewer bias.
5. Recall bias.

Are Analgesic Drugs Associated with Increased Risk of Renal Failure?

Case-Control study by Perneger et al.

• Cases found in the renal dialysis registry for all of Maryland, Virginia, West Virginia, & D.C.

• Controls: random digit dial from the same geographic area.

Data: Non-blinded phone interview asking about lifetime analgesic use.

Case-Control Study: Analgesic Use & Renal Failure

                OR    95% CI
Acetaminophen
  0-999         1.0   -
  1000-4999     2.0   1.3-3.2
  >5000         2.4   1.2-4.8
Aspirin
  0-999         1.0   -
  1000-4999     0.5   0.4-0.7
  >5000         1.0   0.6-1.8
NSAIDs
  0-999         1.0   -
  1000-4999     0.6   0.3-1.1
  >5000         8.8   1.1-71.8

Conclusion: Acetaminophen & NSAIDs increase risk of renal failure, but not aspirin.

Could any have influenced the conclusion? This was not a terrible paper. However, is it possible that there were any potentially important sources of bias?

Think in a structured way. Think about how they collected the data. Consider each of the following:
• Could there have been selection bias?
• Was loss to follow-up a problem?
• Could interviewer bias have affected results?
• Could recall bias have affected results?
• Is it possible that the conclusions were influenced by non-differential misclassification?

Avoiding Bias

Once it’s in the study, you can’t fix it.

• Select subjects by a similar mechanism.
• Maintain follow-up in prospective studies.
• Blind interviewers.
• For case-control studies, get subjects with an equal tendency to remember.
• Use clear, homogeneous definitions of disease & exposure.
• Get accurate data collected in a similar way.
• Confirm data; use error trapping during data entry.