Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
What kind of tests are these? Evaluating Most generally, a procedure that provides Diagnostic tests information that places a probability on a subject having:
a disease (diagnostic test)
Steven Goodman, MD, PhD pre-clinical condition (screening) [email protected] high risk state (screening) clinical event (prediction)
What kind of tests are these? What kinds of tests are these?
Devices For us to care, this information must be
CT scans, ultrasound, thermography, FOBT, actionable. colonoscopy, oral rinse
Biomarkers
PSA, BMI, GTT, d-Dimer, pregnancy test
Genes
Huntington’s, BrCA, Gene expression
Clinical hx, signs and symptoms!
1 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Differences from explanatory or etiologic epi We will care about….
Who is being tested Subject Test setting Where they are being tested Test Result Why they are being tested How they are being tested Action Consequences The consequences of testing….
to the one tested
to society
We will care about… We will care about…
Individual classification, not group differences! Individual classification, not group differences!
N=1000 N=70 t= 15 t= 4 P<0.0001 P=0.001 OR = 2.7 OR = 2.5 AUC = 0.76 N=40 P=0.15 AUC = 0.76
2 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
We will care about… Diagnostic study Absolute, not relative risks.! design
Courtesy of! Milo A. Puhan, MD, PhD JHU Dept. of Epidemiology
Key messages Learning objectives
Diagnostic research goes far beyond sensitivity and To describe the use of diagnostic specificity tests in practice To frame a diagnostic research Diagnostic test evaluation ranges from determination of test accuracy to impact on health outcomes and question costs To know about the phases of diagnostic The diagnostic research question must take the test evaluation context of the diagnostic work-up into consideration To know the different diagnostic study designs Diagnostic study design varies greatly across the phases of diagnostic test evaluation
3 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Brain natriuretic peptide Use of BNP in practice
(BNP) for dx’ing CHF Stage of clinical Setting Available Purpose of management BNP information test Patient history Early diagnostic Primary Triage work-up care Physical exam ER
Diagnosis not yet + ER Add-on established Chest X-ray ECG Replacement
ER + Prognostic Cut-offs Diagnosis established Specialized echocardiography information between 15 care and 100 pg/ml Diagnostic Monitor disease Under treatment Primary or used specialized work-up + process care treatments
Toma et al Cardiovascular Medicine 2007;10:27–33
BNP for triage to avoid unneeded work-up BNP as add-on test Heart failure among differential diagnoses 100% Heart failure suggested after patient history, 100% after taking patient history and physical physical exam, ECG and Chest x-ray exam 50% 50%
0% 0%
100% Positive test result Negative test result 100% 100% Positive test result Negative test result 100%
50% 50% 50% 50% Heart failure work-up Consider other Heart failure work-up Consider other ± treatment diagnoses ± treatment diagnoses 0% 0% 0% 0%
Health outcomes Health outcomes
4 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
BNP used as replacement for echocardiography BNP for monitoring Heart failure diagnosis and treatment established Heart failure suggested after patient history, 100% physical exam, ECG and Chest x-ray 50%
0%
Not in therapeutic range In therapeutic range
100% Positive test result Negative test 100% result Adapt treatment Treatment unchanged
50% 50% Treatment Consider other diagnoses 0% 0%
Health outcomes
Diagnostic research questions Framing a diagnostic 100% Information before testing What is the prior research question 50% probability?
0% PICO
How accurate is the test at Population (setting, patient characteristics, stage of diagnostic work- what stage of diagnostic up) work-up? Positive test result Negative test result Index test (test of interest)
Control (Reference) test (determines disease status) Heart failure work-up ± treatment Outcome (test accuracy measures, management decisions, health Is decision Is the health outcome outcomes) making influenced? influenced? Health outcomes At acceptable costs?
5 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Test phases for diagnostic tests Phases in the development Phase I Investigates whether test results are different for patients ± disease of a diagnostic test
Phase II Investigates whether patients with disease are more likely to have positive test results compared to patients without disease Phase III Investigates how well the a distinguishes between patients ± disease in patients suspected of having the disease Phase IV Investigates how informative a test is considering additional information available at the moment of testing.
Phase V Investigates whether using the test leads to better health outcomes
Phase VI Investigates whether using the test leads to better health outcomes at acceptable costs
Phases of diagnostic Phases of diagnostic test evaluation test evaluation
Phase I Healthy Phase III Patient with Patient without subjects heart failure heart failure
Patient with Test positive 85 250 PPV: 25% heart failure Test negative 15 650 NPV: 96% +LR: 3.0 Sens: 85% Spec: 72% -LR: 0.21
Patient with Healthy Phase II heart failure subjects Phase IV Test positive 90 20 PPV: 82% DOR: 36 Test negative 10 80 NPV: 89% +LR: 4.5 Patients suspected of ± Sens: 90% Spec: 80% -LR: 0.13 having disease Probability of heart failure? Age, gender, smoking and coronary heart disease status known
6 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Phases of diagnostic Study designs for the evaluation of Phase V test evaluation + Treat and follow-up diagnostic tests Randomized trial Phase I Cross-sectional case-control study (pro- or retrospective) Patients suspected of - R Health outcome having heart failure Phase II Cross-sectional case-control study ± Treat and follow- (pro- or retrospective) or up Phase III Cross-sectional study of patients suspected of having disease Before-after study Systematic (pro- or retrospective) until to 2000 from 2001 review Patients suspected of Patients suspected of Phase Cross-sectional study of patients having heart failure having heart failure IV suspected of having disease
± Treat and follow- Phase Randomized trial or before-after study up IV Health Phase V Cost effectiveness outcome - + Treat and follow-up study
Diagnostic test framework Biases in dx testing
Measurement Ref. Test GOOD Spectrum bias Information bias Dx test reader Reader bias Imperfect gold- Verification Test-review standard Imperfect test Workup Diagnostic review Population Test Reference reading standard Context Selection Incorporation BAD (prevalence) Reading order Outlier/no result
7 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Is Sensitivity a Property Diagnostic test framework of the Test Alone?
Ref. Test GOOD Dx test reader Reader
Population Test Reference standard BAD
Is Specificity a Property Spectrum Bias - Sensitivity of the Test Alone?
Sensitivity of a test changes as the composition of the case population changes, with different proportions of mild, moderate and severe cases.
Normal
Mild Mod. Severe
Cutoff Test Value
8 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Spectrum Bias - Specificity Diagnostic test biases
Specificity of a test usually goes down when the non- Spectrum bias
diseased population includes more people with Spectrum of disease severity/subtype in cases varies from target symptoms that mimic the disease. population Spectrum of comorbidities in controls varies from target population Normal Information bias Interpretation of dx test is affected by gold standard (or patient correlates), or vice versa. With Sx that Mild Dz mimic dz [Imperfect] Measurement bias Bias introduced by imperfect dx or referent test reading.
Cutoff
Mathematics of Topics
Diagnostic testing Basic goals of diagnostic testing Threshold model
Link to decision analysis
Mathematics of diagnostic testing (review)
Bayes theorem
Sensitivity, Specificity, Odds, Likelihood ratio, PV
Tests with multiple cutpoints
ROC curves
Choosing optimal cutpoints
Distinction between binary vs. continuous LRs.
9 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Goal of a diagnostic test? Threshold model of testing
Not!!! ...... just to distinguish between diseased and non-diseased patients. Go Home More Reference Treat To improve the overall health outcomes (or 0 100% reduce cost or suffering) in a group in whom it testing standard is applied. Probability
To do more good than harm. of disease
Basic quantitative concepts Sensitivity and Specificity
Sensitivity: Proportion of persons with disorder who will test positive
Specificity: Proportion of persons without disorder who will test negative
Likelihood Ratio: Change in disease odds from before to after test.
Predictive Value: Probability of disease (positive test) Specificity Sensitivity or non-disease (negative test) after test
Posterior probability: Probability of disease after test Test Value
10 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
The 2x2 Nightmare Diagnostic calculations
Test Test Test Positive! Negative! Test Positive Negative
Sensitivity= Disease True False Disease Sensitivity= present Positive Negative True Pos / Tot. Disease present 95 5 100 95/100=95%
Disease False True Specificity= Disease Specificity= Negative 80 absent Positive True Neg / Tot. Well absent 320 400 320/400=80%
PV(+) = 95 / 175 PV(-)= 320/325 Pos. Pred. Value Neg. Pred. Value = 54% = 98.5% PV(+) = PV(-) = Pr(Dis | - ) = 1.5% True Pos / Tot. Pos True Neg / Tot Neg
Bayes Theorem Semantic confusion
Positive predictive value = Pr(Dis | T+) = The posterior (post-test) probability of disease after a positive test, i.e. the probability of disease after a positive test. It is the probability that you are right if you classify someone as diseased after a positive test.
Negative predictive value a.k.a. Bayes factor = Pr(No Dis | T-) = 1- Pr(Dis | T-) is the complement of the posterior probability of disease after a negative test. It is the probability that you are right if you classify someone as non-diseased after a negative test. = 1- posterior probability after a negative test
11 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Conceptual/Computational Model for From probability to odds, Analyzing Diagnostic Test Information and back
Odds of disease after Odds of disease after negative test positive test
0 Infinite Odds
Well LR(- ) LR(+ ) Has Disease
Odds of disease before testing, (prevalence, clinical suspicion
Bayes theorem Odds Practice
Probability world p=.1 Pre-test probability Post-test probability Odds = .1/(1-.1) = .1/.9 = 0.11 p=.5 Odds = .5/(1-.5) = 1 p=.9 Odds world Odds = .9/.1 = 9 p=.95 Odds = .95/.05 = 19 Pre-test odds Prior Odds x LR Post-test odds Range of odds is 0--> infinity Bayes theorem!
12 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Higher Odds Likelihood Ratio Defn.
Likelihood Ratio (LR) is the chance of Odds =0.1 p= 0.1/(1+0.1) = 0.09 observing a given result when the patient has Odds = 0.5 disease, divided by the probability of observing p = 0.5/(1.5) = 0.33 that result if they are well. Odds = 5 p = 5/6 = 0.83 Odds = 24 p = 24/25 = 0.96
Range of p is 0--> 1
Likelihood Ratio Eqns. Diagnostic Tests acronyms
LR(+)=Sens/(1-Spec) LR(-) = (1-Sens)/Spec
SnNout
A highly Sensitive test, that when negative rules out the disorder. ( i.e. LR(-) small))!
SpPin
A highly specific test, that when positive rules in the disorder. (i.e. LR (+) large)!
13 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Calculation of LRs Value of LRs
The degree to which dz probability is affected LR(+)=Sens/(1-Spec) by a test result. LR(-) = (1-Sens)/Spec Tell us exactly how to combine sensitivity and spec to inform us about dx. Sens 95% ---> LR (+) = 9.5 Sens 80% ---> LR (+) = 16 Spec 90% ---> LR (-) = 1/18 Are a measure of the “strength of the Spec 95% ---> LR (-) = 1/4.8 evidence” of a test result for the hypothesis that
Sens 65% ---> LR (+) = 1 the patient has dz vs. patient does not. Sens 90% ---> LR (+) = 180 Spec 35% ---> LR (-) = 1 Spec 99.5% ---> LR (-) = 1/10 Does not require a dichotomous test outcome; can be calculated for any test result
What is a “good” LR? Diagnostic calculations
3-5 “Fair” Prevalence = 33%, Sensitivity = 80%, Specificity = 90%
8-12 “Moderate” What is the probability of disease after a positive test?
12-20 “Strong” Prevalence odds = 33/66 = 0.5 LR(+) = 80/(100-90) = 8 > 20 V. Strong Odds of disease (+) = LR(+) x Prev. Odds = 8 x 0.5 = 4 Probability of disease (+) = 4/(1+4) = 80%
14 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Computational Model for Analyzing Diagnostic Test Information Diagnostic calculations
Prevalence = 33%, Sensitivity = 80%, Specificity = Odds of disease after 90% positive test = 8 x 0.5 = 4 What is the probability of disease after a negative test? 0 Infinite Odds LR(-) = 20/90 = 0.22 Well LR(+ )=8 Has Prevalence odds = 33/67 = 0.5 Disease Odds of disease (-) = LR(-) x Prev. Odds = 0.22 x 0.5 = 0.11 Odds of disease before Probability of disease (-) = 0.11/1.11 = 10% testing = 0.5
Conceptual/Computational Model for Analyzing Diagnostic Test Information
Disease odds = 0.11 Disease prob = 0.10
0 Infinite Odds
Well LR= .22 Has Disease
Odds of disease = 0.5
15 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
LR(+) = Sens/(1-Spec) = 35.8/18.1 = 1.98
LR(-) = (1-Sens)/Spec = 64.2/81.9 = 0.78
16 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Predictive value calculation
Pre-test probability = 405/(405+914) = 30.7%
Prior odds = 30.7/69.3 = 0.44
Post-test odds = 0.44 x LR(+) = 0.44 * 1.98 = 0.87
Post-test probability = 0.87/1.87 = 47%!
145/(145+165) = 47%
Bayes theorem
Prior prob. Probability world 0.47 0.307
Negative predictive value calculation Odds world Pre-test probability = 405/(405+914) = 30.7% Prior odds = 30.7/69.3 = 0.44
Post-test odds = 0.44 x LR(-) = 0.44 * 0.78 = 0.34 0.44 Prior odds x LR = 0.44 x 1.98 0.87 Post-test probability = 0.34/1.34 = 26% Bayes theorem! Predictive value (-) = 100-26 = 74%!
17 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Should we be testing? Decisions about testing
Diagnostic setting Sensitivity = 95%, Specificity = 95%, Prevalence = 50%
Pre-test odds = 50/50 = 1 LR(+) = 95/5 = 19 Disease Odds(+) = 1x19 = 19 PV(+) = 19/20 = 95%
For every 19 true cases worked-up, one patient will have an unnecessary work-up. ! For this test to be worth using in this setting: ! 19 x benefit (true positive) > harm (false positive)!
Should we test? (Cont.) Effect of lowering prevalence
Screening setting: Positive test Sensitivity = 95%, Specificity = 95%, Prevalence = 1/500
Pre-test odds ≈ 1/500 LR(+) = 95/5 = 19 Disease Odds(+) = (1/500)x19 = 0.04 PV(+) = 0.04/1.04 = 4%
For every single true case worked-up, 24 patients will have an unnecessary work-up. ! For this test to be worth using in this setting: ! Benefit (true positive) > 24 x Harm (false positive)! Test Value
18 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Decisions about screening Key Dx Test Questions
“Phase” of study? Retrospective vs. prospective? There must be a clear benefit that results Valid reference standard, done in all patients, or in a from screening, and this must outweigh the cost defined random sample of cases and controls. of testing (infrastructure, and harm to false Description of disease spectrum, comorbidities and positives and false negatives). other clinical characteristics in all patients.
Alternatives to screening/testing include: Clear test procedures, inc. training, interpretations Doing nothing - waiting for disease to become manifest.(e.g. and reproducibility. Done blind to reference standard? PSA controversy) Comparison with alternatives. Incremental value. Doing “next step” on everyone. (e.g. Strep throat) Screen only subsets at higher risk. Appropriate indices, w/variability.
How is it proposed to be used, and has this use been shown to impact clinical outomes?
Puzzler #1 Puzzler #2
PET scanning is being assessed as a Dx test for New radioisotope is being evaluated for use in schizophrenia. diagnosing MI’s. 20 schizophrenics have already had the test as part Test is applied to large ER population presenting of a pilot feasibility study. with angina. Gold standard is combination of ECG changes and 20 normal controls are chosen among general cardiac enzyme elevation. medicine clinic patients for comparison. Sensitivity = 96%, Specificity = 93% Results: PET scan positive in 17/20 schizophrenics, 9/20 controls. Conclusion: Useful screening test. Recommendation: Implement in all ERs. Conclusion: Useful test.
19 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
Liquid Crystal Thermography Liquid Crystal Thermography as a Screening as a Screening Test
Test For Deep-Vein Thrombosis, Sandler DA and Martin JF, Lancet, 1985; 1:665 - 7 Sensitivity (34/35) =97.1% Background: Poor correlation between clinical signs and Specificity (28/45) =62.2% objectively proven DVT. Venograms have significant expense and morbidity, making a universal screening test for DVT’s desirable. LR(+) = 2.5 Patients: “Patients with sx or signs suggesting unilateral, lower LR(-) = 1/21 limb, DVT were studied with a standardized clinical exam and LCT before X-ray venography.” PV(+) = 34/51 = 66.7% Diagnostic test: “Criteria have been described...” “If there is a PV(-) = 28/29 = 96.5% homogenous area of increased temperature in the symptomatic [n.b., 95% CI: 82% to 99.9%] limb...This was done w/o knowledge of either the side of the suspected thrombis or the result of the venogram.”
Liquid Crystal Thermography PET in Breast Cancer Staging as a Screening Test Smith, et al. Annals of Surgery, 1998
Conclusions: “We would propose that any patients in whom Goal: Find non-invasive means to dx spread of DVT is clinically suspected should undergo LCT. breast ca to local nodes Anticoagulants may be withheld from those with a negative Patients: 50 women w/breast cancer in Aberdeen thermogram. Since a positive thermogram still has a 1/3 Royal Infirmary Breast Clinic. chance of being a false-positive, patients with a positive LCT could undergo [technetium] venoscanning, which would give a Referred? Consecutive? correct diagnosis in 2/3rds of them. The remainder would then Test: PET scan, read by two observers, blinded to undergo the definitive test, X-ray venography. This diagnostic clinical information and surgical staging. approach...would reduce morbidity, time and cost; the chance of missing a DVT because of a false-negative thermogram on Reference test: Surgical exploration of axilla with initial screening would be 1.25%.” nodal pathology or FNA.
20 Mathematics of Diagnostic Testing Steven Goodman, MD, PhD
PET in Breast Cancer Staging PET in Breast Cancer Staging Smith, et al. Annals of Surgery, 1998 Smith, et al. Annals of Surgery, 1998
Results of pathology; TNM Stage Sensitivity Specificity Total nodes examined: 425 T1 100 (1/1) 100 (7/7) [CI 2.5% -100%] [82% -99%] Total # of patients with involved nodes: 21 T2 78 (7/9) 92 (11/12) Results of PET [40% -97%] [66% -99+%] Sensitivity: 90% (19/21) [95% CI 70% to 99%] T3 100 (5/5) 100 (4/4) Specificity: 97% (28/29) [95% CI 82% to 99.9%] [48% -100%] [40% - 100%]
PPV: 95% T4 100 (6/6) 100 (4/4)
NPV: 96% N1 and N2 91 (10/11) 100 (3/3)
Claims… Key Dx Test Questions
“The results from this study show that PET can Retrospective vs. prospective study. accurately (94%) and reliably (PV>94%) stage the Valid reference standard, done in all patients, i.e. not affected axilla…” by dx. test result.
“…a preponderance of patients with large or locally Description of disease spectrum, comorbidities and other advanced cancer.” clinical characteristics in all patients. Clear test procedures, inc. interpretations and reproducibility. “PET…had a sensitivity and specificity of 100% in Done blind to reference standard? seven patients with T1N0 disease. Comparison with alternatives.
“On the evidence obtained…PET could help Appropriate indices, w/variability. determine treatment options in postmenopausal How is it proposed to be used, and has this use been shown to women….PET may have the advantage of obviating impact clinical outomes? the need for additional staging investigations.”
21