Inferences for Likelihood Ratios Absence of a “Gold Standard” In

Inferences for Likelihood Ratios in the Absence of a “Gold Standard” LAWRENCE JOSEPH, PhD, THERESA W. GYORKOS, PhD Likelihood ratios are extensively used to evaluate the performances of diagnostic tests and to update prior odds of disease to posttest odds. Since few tests are truly 100% accurate, including many used as “gold standards,” it is important to be able to estimate likelihood ratios in cases where no such standard is available. In this paper, methods to calculate point and interval estimates for likelihood ratios are described. The results numerically coincide with those reviewed by Centor when a “gold standard” is assumed available, but typically provide wider interval estimates when such a standard is not available, reflecting the increased uncertainty inherent in such situations. Unlike previous techniques, the methods do not require normal approximations or log- arithmic transformations, and hence provide accurate estimates even when parameter distributions are highly skewed. The methods are illustrated using the results of two different diagnostic tests for the presence of an intestinal parasitic infection. Key words: Bayesian analysis; diagnostic tests; epidemiologic methods: likelihood ratios; Monte Carlo methods; statistical models. (Med Decis Making 1990;16:412-417) Consider two tests, where Test 1 is a “gold stan- Table 1 l Observed Data from Two Diagnostic Tests dard” for detection of a disease or condition, and Test 1 Test 2 is an imperfect, but more convenient, cheaper, or easier test for that disease or condition. Estimating likelihood ratios for Test 2 can help as- Test 2 sess its diagnostic@. Interval as well as point estimates for likelihood ratios can be useful. Interval estimates such as 95% confidence intervals (CIs) de- scribe the accuracy of point estimates, and in pro- viding a range of plausible values for likelihood ratios aid the clinician in interpreting diagnostic tests bl, cannot be determined, as the number of dis- (e.g., tests with narrower CIs provide more precise eased subjects is unknown in the absence of a “gold posttest odds of disease than do those with wider standard.” Similarly, the usual point estimates of the CIs). As a result, use of interval rather than point specificity and hence of the positive and negative estimates has been increasing rapidly in recent likelihood ratios (denoted here by LR+ and LR-, years.l Gart and Nam2 provide a review of statistical respectively) do not hold. Walter and Irwig3 sum- methods for point and interval estimation of the marize the standard frequentist or classic literature likelihood ratio. for statistical inferences for diagnostic test parame- The standard estimation techniques for diagnostic ters in the absence of a “gold standard.” Under this test parameters from 2 X 2 table data such as those framework, unless there are data from three or given in table 1 cannot be used when Test is not 1 more tests (for example, three or more different a “gold standard.” The sensitivity of Test 2, a/(a + tests are applied to each subject, or the same test is . applied on three or more independent instances), Received April 18, 1995, from the Department of Epidemiology simultaneous estimation of all unknown parameters and Biostatistics, McGill University, and the Division of Clinical of the tests is not possible. Typically, some of the Epidemiology, Department of Medicine, Montreal General Hos- unknown parameters must be assumed exactly pital, Montreal, P.Q., Canada. Revision accepted for publication known in order to draw inferences about the re- December 13, 1995. Supported in part by the Natural Sciences and Engineering Research Council. Dr. Joseph is a research maining parameters. For example, the sensitivity scholar of the Fonds de la Recherche en Sante du Quebec, and and specificity of Test 1 must be assumed exactly Dr. Gyorkos is a research scholar supported by the National known in order to estimate the disease prevalence Health Research and Development Program, Health Canada. and the sensitivity and specificity of Test 2. When Address correspondence and reprint requests to Dr. Joseph: Test 1 is a “gold standard,” its sensitivity and spec- Division of Clinical Epidemiology, Department of Medicine, Mon- treal General Hospital, 1650 Cedar Avenue, Montreal, Quebec, ificity are 100%. Otherwise, it is rare that the sensi- H3G lA4, Canada. e-mail: ([email protected]). tivity and specificity of Test 1 are exactly known. Fail- 412 VOL 16/NO 4. OCT-DEC 1996 Likelihood Ratios without a Standard l 413 ure to account for this uncertainty can lead to mis- is considered to be available for P, Sn2, and SpZ, then leading point and interval estimates for likelihood uniform distributions covering the interval EO,ll (the ratios. This is illustrated in the examples below. range of possible values) could be used. Standard However, by taking a Bayesian approach, Joseph methods as reviewed by Centor’ are approximately et al4 demonstrated that it is possible to estimate all numerically equivalent to a Bayesian approach that unknown parameters without imposing often-unjus- assumes these five prior distributions. However, tified constraints on a subset of the unknown pa- since “gold standards” are rarely available, and di- rameters. The results from the Bayesian approach agnostic tests would not be used unless at least reduce to those of the classical approach when sim- some information was known about their properties ilar constraints are imposed, and thus the Bayesian a priori, most prior distributions should fall in be- approach can be considered a generalization of the tween these extremes. classical methods. Like previous solutions that re- When no “gold standard” is available, the calcu- quired a “gold standard,“‘,’ this numerical approach lations required by Bayes’ theorem are analytically . proceeds iteratively. The calculations are performed intractable, and therefore Monte Carlo algorithms using a Markov chain Monte Carlo simulation tech- such as the Gibbs sampler are employed. The output nique called the Gibbs sampler,” which has recently of the Gibbs sampler consists of random samples been applied to a wide variety of estimation prob- from the joint posterior density of all parameters of lems in medicine.7-10 interest. These random samples can then be used In this paper, the approach of Joseph et a1.4 is to reconstruct the marginal densities of each param- used to obtain interval estimates for LR+ and LR- eter, or more simply, to provide summaries of these from serologic testing and stool examination data, densities, such as the means, standard deviations, used as diagnostic tests for Strongyloides infection. and interval estimates.’ Since any density shape can Neither of these tests can be considered a “gold be reconstructed in this way, assumptions concern- standard” for the detection of Strongyloides. Exam- ing normality or log-normality of the distributions ple 1 of Centor,5 which examines the use of creatine are not necessary. Joseph et a1.4 provided details of kinase (CK) as a diagnostic test for myocardial in- the Gibbs-sampler algorithm that is required to pro- farction (MI), is used to illustrate that the interval vide random samples from the sensitivity, specificity, estimates given by the Bayesian approach match and positive and negative predictive values of diag- those given by the standard approach when a “gold nostic tests using a Bayesian approach when no standard” test is available. “gold standard” is assumed. This approach is summarized in the appendix. Since likelihood ratios are functions of sensitivity (Sn) and specificity (Sp), a ran- Methods dom sample from the marginal posterior density of a likelihood ratio can be constructed directly from In the Bayesian approach to statistical inference, a random sample of (Sn, Sp) pairs. Therefore, the the information available before the experiment results in the next two sections were obtained by about the parameters of interest is summarized in using the Gibbs-sampler algorithm to obtain ran- a prior distribution. The data, through the likelihood dom (Sn, SpJ pairs, i = 1, . , M, where M is the function, are then combined with the prior distri- size of the Monte Carlo sample. A sample from the bution to derive posterior distributions using Bayes’ posterior density of each likelihood ratio can then theorem. The posterior distributions contain up- be constructed by calculating the quantities dated beliefs about the test parameters after consid- ering the infermation provided by the data. Gelman et al.10 provide an introduction to, and many exam- Sni ples of, Bayesian data analysis. LR+i = i = 1, . ..A4 When two diagnostic tests are applied to each 1 - Sp( subject in a given population, there are typically five unknown parameters-the prevalence 03 and the and sensitivities @I,, Sn,) and specificities (Spl, Sp& of each diagnostic test. Prior distributions that sum- marize the available information about these param- 1 - Sni eters must be formulated as inputs to the analysis. LR-i= - i = 1, . ..M @i ’ For example, if Test 1 is considered to be a “gold standard,” Sn, = Spl =1, so that the prior proba- bility distributions for these parameters consist of point masses on the single number 1, and zero else- These samples are then used to obtain mean, me- where. At the other extreme, if no prior information dian, and interval estimates for the likelihood ratios. 414 l Joseph, Gyorkos MEDICAL DECISION MAKING Table 2 l Results of Serologic and Stool Testing for not known with high precision. Nevertheless, the Strongyloides infection of 162 Cambodian likelihood ratios for both tests can be estimated Refugees Arriving in Montreal, Canada, between July 1962 and February 1983* from the data presented in table 2 and the available prior information about the sensitivities and speci- Stool Examination ficities of the two tests. + - In consultation with a panel of experienced par- + 36 87 125 asitologists from the McGill University Centre for Serology Tropical Diseases, 95% prior credible intervals 2 35 37 (Bayesian analogs of CIs) were elicited by consensus.

Inferences for Likelihood Ratios Absence of a “Gold Standard” In

Should the Randomistas (Continue To) Rule?

Getting Off the Gold Standard for Causal Inference

Assessing the Accuracy of a New Diagnostic Test When a Gold Standard Does Not Exist Todd A

Memorandum Explaining Basis for Declining Request for Emergency Use Authorization for Emergency Use of Hydroxychloroquine Sulfate

Randomized Controlled Trials, Development Economics and Policy Making in Developing Countries

Evaluating Diagnostic Tests in the Absence of a Gold Standard

Screening for Alcohol Problems, What Makes a Test Effective?

Adaptive Platform Trials

Common Types of Questions: Types of Study Designs

Efficient Adaptive Designs for Clinical Trials of Interventions for COVID-19

Assessing the Accuracy of Diagnostic Tests

Evaluation of Diagnostic Tests When There Is No Gold Standard Vol