DIAGNOSTICS TESTS: STATISTICAL EVALUATION TO DETERMINE THE SUITABILITY FOR DIAGNOSIS

Michael J. Campbell and Jenny V. Freeman demonstrate how statistical methods should be used to evaluate the suitability of a diagnostic or screening test

SCOPE | TUTORIAL | DECEMBER 08

In this tutorial we will examine how to evaluate a diagnostic test. Initially we will consider the case when there is a binary measure (two categories: disease present / disease absent). We will then look at how to define a suitable cut-off for an ordinal or continuous measurement scale and we will finish with a short discussion contrasting diagnostic tests with screening tests.

When evaluating any diagnostic test one should have a definitive method for deciding whether the disease is present in order to see how well the test performs. For example, to diagnose a cancer one could take a biopsy, to diagnose depression one could ask a psychiatrist to interview a patient, and to diagnose a walking problem one could video a patient and have it viewed by an expert. This is sometimes called the ‘gold standard’. Often the gold standard test is expensive and difficult to administer and thus a test is required that is cheaper and easier to use.

BINARY SITUATION

Let us consider first the simple binary situation in which both the gold standard and the diagnostic test have either a positive or negative outcome (disease is present or absent). The situation is best summarised by a 2 × 2 table (table 1). In writing this table always put the gold standard on the top and the results of the test on the side. The numbers ‘a’ and ‘d’ are the numbers of true positives and true negatives, respectively. The number ‘b’ is the number of false positives, because although the test is positive the patients don’t have the disease, and similarly ‘c’ is the number of false negatives. The prevalence of the disease is the proportion of people diagnosed by the gold standard and is given by (a + c)/n, although this is often expressed as a percentage.

In order to assess how good the test is we can calculate the sensitivity and specificity, and the positive and negative predictive values. The sensitivity of the test is the proportion of people with the disease who are correctly identified as having the disease. This is given by a/(a + c) and is usually presented as a percentage. Suppose a test is 100 per cent sensitive. Then the number of false negatives is zero and we would expect table 2.

From table 2 we can see that if a patient has a negative test result we can be certain that the patient does not have the disease. Sackett et al.¹ refer to this as SnNout, i.e. for a test with a high sensitivity (Sn), a Negative result rules out the disease.

The specificity of a test is the proportion of people without the disease who are correctly identified as not having the disease. This is given by d/(b + d) and as with sensitivity is usually presented as a percentage. Now suppose a test is 100 per cent specific. Then the number of false positives is zero and we would expect table 3.

From table 3 we can see that if a patient has a positive test we can be certain the patient has the disease. Sackett et al. refer to this as SpPin, i.e. for a test with a high specificity (Sp), a Positive test rules in the disease.

USEFUL MNEMONIC

SeNsitivity = 1 − proportion false Negatives (n in each side)
SPecificity = 1 − proportion false Positives (p in each side)

What patients really want to know, however, is ‘if I have a positive test, what are the chances I have the disease?’ This is given by the positive predictive value (PPV) which is a/(a + b). One way of looking at the test is that before the test the chances of having the disease was (a + c)/n. After the test they are either a/(a + b) or c/(c + d) depending on whether the result was positive or negative.

The negative predictive value is the proportion of those whose test result is negative who do not have the disease and is given by d/(c + d).

It should be noted that whilst sensitivity and specificity are independent of prevalence, positive and negative predictive values are not. Sensitivity and specificity are characteristics of the test and will be valid for different populations with different prevalences. Thus we could use them in populations with high prevalence such as elderly people as well as for low prevalence such as for young people. However, the PPV is a characteristic of the population and so will vary depending on the prevalence.

To show this, suppose that in a different population the prevalence of the disease is double that of the current population (assume the prevalence is low, so that a and c are much smaller than b and d and thus the results for those without the disease are much the same as the earlier table). The situation is given in table 4.

The sensitivity is now 2a/(2a + 2c) = a/(a + c) as before. The specificity is unchanged. However the positive predictive value is given by 2a/(2a + b) which is greater than the earlier value of a/(a + b).

LIKELIHOOD RATIO

It is common to prefer a single summary measure, and for a diagnostic test this is given by the likelihood ratio for a positive test (LR(+)) as defined below:

LR(+) = Probability of positive test given the disease / Probability of positive test without disease
      = Sensitivity / (1 − Specificity)
      = a(b + d) / {b(a + c)}

One reason why this is useful is that it can be used to calculate the odds of having the disease given a positive result. The odds of an event are defined as the ratio of the probability of the event occurring to the probability of the event not occurring, i.e. p/(1 − p) where p is the probability of the event. Before the test is conducted the probability of having the disease is just the prevalence, and the odds are simply {(a + c)/n}/{(b + d)/n} = (a + c)/(b + d). The odds of having the disease after a positive test are given by

Odds of disease after positive test = odds of disease before test × LR(+) = a/b

We can also get the odds of disease after a positive test directly from the PPV since the odds of disease after a positive test is PPV/(1 − PPV).

EXAMPLE

A recent study by Kroenke et al.² surveyed 965 people attending primary care centres in the US. They were interested in whether a family practitioner could diagnose Generalised Anxiety Disorder (GAD) by asking two simple questions (the GAD2 questionnaire): ‘Over the last two weeks, how often have you been bothered by the following problems? (1) Feeling nervous, anxious or on edge; (2) not able to stop or control worrying’. The patients answered each question from ‘not at all’, ‘several days’, ‘more than half’ and ‘nearly every day’, scoring 0, 1, 2 or 3, respectively. The scores for the two questions were summed and a score of over 3 was considered positive. Two mental health professionals then held structured psychiatric interviews with the subject over the telephone to diagnose GAD. The professionals were ignorant of the result of the GAD2 questionnaire.

[...] on the result. Note that even with a positive test the chances of having GAD are still less than 1/3.

For the GAD example we find that the LR(+) = 0.86/(1 − 0.83) = 5.06 and the odds = 0.29/(1 − 0.29) = 0.41.

ROC CURVES

For a diagnostic test that produces results on a continuous or ordinal measurement scale, a convenient cut-off level needs to be selected to calculate the sensitivity and specificity. For example the GAD2 questionnaire has possible values from 0 to 6. Why should one choose the value of 3 as the cut-off? For a cut-off of 2 the sensitivity is 0.95, the specificity is 0.64 and the LR(+) is 2.6.² One might argue that since a cut-off of 3 has a better LR(+) then one should use it. However, a cut-off of 2 gives a higher sensitivity, which might be important. It should be noted that a sensitivity of 100 per cent is always achievable by stating that everyone has the disease, but this is at the expense of a poor specificity (similarly a 100 per cent specificity can be achieved by stating no-one has the disease. If the prevalence is low, this tactic will have a high accuracy, i.e. it will be right most of the time, but sadly wrong for the important cases). A discussion of the different scenarios for preferring a high specificity or sensitivity is given in the next section.

A simple graphical device for displaying the trade-offs between sensitivity and specificity for tests on a continuous or ordinal scale is a receiver operating characteristics (ROC) curve (the unusual name originates from electrical

TABLE 1. Standard table for diagnostic tests.

                    Gold standard
Diagnostic test     Positive    Negative    Total
Positive            a           b           a + b
Negative            c           d           c + d
Total               a + c       b + d       n

TABLE 2. Results of a diagnostic test with 100 per cent sensitivity.

                    Gold standard
Diagnostic test     Positive    Negative    Total
Positive            a           b           a + b
Negative            0           d           d
Total               a           b + d       n

TABLE 3

                    Gold standard
Diagnostic test     Positive    Negative    Total
Positive            a           0           a
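The formulas given in the text for the 2 × 2 table can be checked numerically. The following is a minimal Python sketch, not part of the original tutorial. The class name TwoByTwo, the helper functions lr_positive and odds, and the counts a = 90, b = 50, c = 10, d = 850 are all hypothetical and chosen only for illustration. The only figures taken from the text are the GAD2 values (sensitivity 0.86 and specificity 0.83 at a cut-off of 3, sensitivity 0.95 and specificity 0.64 at a cut-off of 2, and a PPV of 0.29).

```python
from dataclasses import dataclass


def lr_positive(sensitivity: float, specificity: float) -> float:
    """LR(+) = sensitivity / (1 - specificity); undefined when specificity is 1."""
    return sensitivity / (1 - specificity)


def odds(p: float) -> float:
    """Odds of an event with probability p, i.e. p / (1 - p)."""
    return p / (1 - p)


@dataclass(frozen=True)
class TwoByTwo:
    """Counts in the layout of table 1: gold standard on top, test on the side."""

    a: float  # true positives
    b: float  # false positives
    c: float  # false negatives
    d: float  # true negatives

    @property
    def n(self) -> float:
        return self.a + self.b + self.c + self.d

    @property
    def prevalence(self) -> float:
        return (self.a + self.c) / self.n

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    @property
    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    @property
    def ppv(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def npv(self) -> float:
        return self.d / (self.c + self.d)

    @property
    def pre_test_odds(self) -> float:
        return (self.a + self.c) / (self.b + self.d)

    @property
    def post_test_odds(self) -> float:
        # Odds before the test multiplied by LR(+); algebraically equal to a/b.
        return self.pre_test_odds * lr_positive(self.sensitivity, self.specificity)


# Hypothetical counts, for illustration only (not data from the article).
base = TwoByTwo(a=90, b=50, c=10, d=850)

# Doubling the number of diseased people (a and c) while leaving b and d
# unchanged, as the text describes for a population with twice the prevalence.
doubled = TwoByTwo(a=2 * base.a, b=base.b, c=2 * base.c, d=base.d)

if __name__ == "__main__":
    for label, t in (("base", base), ("doubled prevalence", doubled)):
        print(label, "sensitivity:", t.sensitivity,
              "specificity:", t.specificity, "PPV:", t.ppv)
    print("post-test odds:", base.post_test_odds, "a/b:", base.a / base.b)

    # GAD2 figures quoted in the text.
    print("LR(+) at cut-off 3:", round(lr_positive(0.86, 0.83), 2))
    print("LR(+) at cut-off 2:", round(lr_positive(0.95, 0.64), 1))
    print("odds from PPV 0.29:", round(odds(0.29), 2))
```

With these hypothetical counts, doubling a and c leaves the sensitivity and specificity unchanged while the PPV rises, and the post-test odds agree with both a/b and PPV/(1 − PPV), as the text states. A table with b = 0 (100 per cent specificity) would make LR(+) undefined in this sketch, so that case would need separate handling.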
