Evaluation of Binary Diagnostic Tests Accuracy for Medical Researches
Total Page:16
File Type:pdf, Size:1020Kb
Turk J Biochem 2021; 46(2): 103–113 Review Article Jale Karakaya* Evaluation of binary diagnostic tests accuracy for medical researches [Tıbbi araştırmalar için iki sonuçlu tanı testlerinin doğruluğunun değerlendirilmesi] https://doi.org/10.1515/tjb-2020-0337 In this review, some basic definitions, performance mea- Received July 8, 2020; accepted November 6, 2020; sures that can be used only in evaluating the diagnostic published online November 30, 2020 tests with binary results and their confidence intervals are mentioned. Having many different measures provides Abstract different interpretations in the evaluation of test perfor- mance. Accurately predicting the performance of a diag- Objectives: The aim of this study is to introduce the fea- nostic test depends on many factors. These factors can be tures of diagnostic tests. In addition, it will be demon- study design, criteria of participant selection, sample size strated which performance measures can be used for calculation, test methods etc. There are guidelines that diagnostic tests with binary results, the properties of these ensure that all information regarding the conditions under measures and how to interpret them. which the study was conducted is in report, in terms of Materials and Methods: The evaluation of the diagnostic such factors. Therefore, these guidelines are recommended test performance measures may differ depending on for use of the checklist by many publishers. whether the test result is numerical or binary. When the diagnostic test result is continuous numerical data, ROC Keywords: binary diagnostic test; diagnostic odds ratio; analysis is often utilized. The performance of a diagnostic likelihood ratios; overall accuracy; predictive values; test with binary results are usually evaluated using the sensitivity; specificity. measures of sensitivity and specificity. However, there are some important measures other than these two measures for binary test results. These measures are predictive Öz values, overall accuracy, diagnostic odds ratio, Youden Amaç: Bu çalışmada amaç, tanı testlerinin özelliklerini index, and likelihood ratios. tanıtmak. Buna ek olarak, İki sonuçlu tanı testleri için Results: A hypothetical data has been produced based on hangi performans ölçülerinin kullanabileceği, bu ölçülerin the studies conducted on the performance of rapid tests özelikleri ve nasıl yorumlanacağı gösterilecektir. (Specific IgM/IgG) according to the RT-PCR test for Covid 19 Gereç ve Yöntem: Tanı testi sonucunun sayısal veya iki in the literature. An example of a diagnostic test (Specific sonuçlu olmasına göre test performans ölçülerinin değer- IgM/IgG) with a binary result is given and all measure- lendirilmesi farklılık gösterebilir. Tanı testi sonucu sürekli ments and their confidence interval are obtained for this sayısal olduğunda sıklıkla ROC analizinden yararlanılır. data. The performance of rapid test was examined and İki sonuçlu bir tanı testinin performansı genellikle duyar- interpreted. lılık ve seçicilik ölçüleri ile değerlendirilir. Ancak, iki Conclusion: It is important to design and evaluate the durumlu testler için bu iki ölçü dışında başka bazı önemli performance of diagnostic/screening tests for health care. ölçüler de vardır. Bu ölçüler; kestirim değerleri, genel doğruluk oranı, tanı odds oranı, youden indeksi, olabilirlik oranlarıdır. *Corresponding author: Jale Karakaya, Ph.D, Department of Bulgular: Literatürde bulunan Covid 19 hastalığı için Biostatistics, School of Medicine, Hacettepe University, Sihhiye, ı ı fi Ankara, Turkey, E-mail: [email protected]. https://orcid.org/ RT-PCR testine göre h zl testin (Speci c IgM/IgG) perfor- 0000-0002-7222-7875 mansı üzerine yapılan çalışmalar temel alınarak hipotetik Open Access. © 2021 Jale Karakaya, published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 International License. 104 Karakaya: Evaluation of binary diagnostic tests bir veri üretilmiştir. İki durumlu bir tanı testine (Specific accurately it can distinguish patients and healthy people. It IgM/IgG) örnek verilmiş ve bu veri için tüm ölçüler ve is quite important in medical practice to know in advance onların güven aralıkları elde edilmiştir. Hızlı testin per- the possible accuracy or inaccuracy of the results of these formansı incelenmiş ve yorumlanmıştır. indefinite tests [1, 2]. Some measures have been developed Sonuç: Sağlık hizmetleri için tanı ve tarama testlerinin to assess the performance of these tests in terms of their performanslarının tasarlanması ve değerlendirilmesi discrimination accuracy. The measures that can be used in önemlidir. Bu derlemede, bazı basit tanımlar, sadece iki statistical evaluation processes vary with respect to the sonuçlu tanı testlerinin değerlendirilmesinde kullanılan purpose and whether the outcome of diagnostic test is performans ölçüleri ve bunlara ait güven aralıklarından numerical or categorical (i.e. positive-negative). The ac- söz edilmiştir. Birçok farklı ölçü olması performansın curacy of a diagnostic test when the outcome is quantita- değerlendirilmesinde farklı yorumlamalar sağlar. Bir tanı tive (numerical) or ordinal is usually evaluated by a testinin performansınındoğru olarak kestirilmesi birçok receiver operating characteristic (ROC) analysis [3, 4]. faktöre bağlıdır. Bu faktörler, çalışma tasarımı, katılımcı- Although the results of some tests are numerical, they can ların seçim kriterleri, örneklem büyüklüğü hesaplaması, still be assessed as negative or positive by applying a cut- test yöntemleri vb. olabilir. Böyle faktörler açısından, off value. çalışmanın yapıldığı koşullara ilişkin tüm bilginin çalışma This article deals only with performance measures that raporunda olmasını sağlayan kılavuzlar bulunmaktadır. can be used to assess the classification performance of Bu nedenle, birçok yayınevi tarafından bu kontrol listele- diagnostic tests whose outcome is binary or numerical re- rinin kullanılması için bu kılavuzlar önerilmektedir. sults have been transformed into binary. For assessing the success of a diagnostic test in classification, it is necessary Anahtar Sözcükler: duyarlılık; genel doğruluk oranı; İki to have both reference and index tests applied indepen- sonuçlu tanı testi; kestirim değerleri; olabilirlik oranı; dently to each individual and to evaluate outcomes [5, 6]. seçicilik; tanı odds oranı. Statistical measures for diagnostic accuracy Introduction assessment Various diagnostic and laboratory tests are used in a Many measures have been developed to assess the accu- medical process of deciding whether a person has a spe- racy of an index test. Each of these measures has different cific disease. The results of diagnostic tests about a person interpretation and domain of use. It is appropriate to use may not always be accurate. In other words, they cannot the measures listed below in cases where the test outcome distinguish patients and healthy peoples 100% accurately. is binary (positive–negative). While some tests are perfect, such as reference tests, – Sensitivity completely distinguishing the diseased from the non- – Specificity diseased subjects, others may lead to mis-classifications – Positive predictive value (PPV) (wrong diagnosis) due to indefinite outcomes [1, 2]. Diag- – Negative predictive value (NPV) nostic tests which has 100% accurate results are known as – Overall test accuracy “gold standard” tests. Although outcomes of some tests – Diagnostic odds ratio (DOR) cannot accurately discriminate the non-diseased and – Youden Index (YI) diseased, they are still used as a reference test for being the – Positive Likelihood ratio (LR+) best available preference. It may not always be possible to – Negative Likelihood ratio (LR−) use reference tests since they are costly, risky, difficult, and so on. Consequently, imperfect tests with low cost, fast, If true disease status and the outcome of imperfect diag- and low risk are frequently used to make a diagnosis. These nostic test results are binary, basic diagnostic measures tests are also known as index test. Index test is a diagnostic used in assessing the performance of diagnostic tests are test that is being evaluated against a reference standard sensitivity, specificity, false positive rate and false negative test in a study of test accuracy. Then, how reliable are the rate [7]. To obtain these measures there is need for a 2 × 2 outcomes of such tests? The answer of this question is contingency table relating to individuals to whom both related to what extent the test applied yields accurate re- tests are applied as given in Table 1 below. sults. The fact that index tests other than reference tests can In this table, the true disease status (D) obtained be used in the diagnostic process depends on knowing how through gold standard test is denoted as D+ in the presence Karakaya: Evaluation of binary diagnostic tests 105 Table : × contingency table for the diagnostic test with binary result. Diagnostic (index) test result True disease status gold standard test (or reference test) Total Disease present (D+) Disease absent (D−) Test positive (T+) True positive (TP) False positive (FP) TP + FP Test negative (T−) False negative (FN) True negative (TN) FN + TN Total TP + FN FP + TN N (TP + FN + FP + TN) of disease and D− in the absence of it. Similarly, while the Sensitivity(TPR) = TP/(TP + FN) result obtained by the diagnostic test is (T), the presence of Specificity(TNR) = TN/(FP + TN) disease (positive outcome) is denoted by T+ and its absence (negative) by T−. False