Binary Data Consistency Test 1 Debit

BINARY DATA CONSISTENCY TEST

DEBIT: A Simple Consistency Test For Binary Data

James A. J. Heathers¹, Nicholas J. L. Brown²

1. Northeastern University, Boston MA
2. University of Groningen, The Netherlands

Correspondence to: [email protected], Bouvé College of Health Sciences, Northeastern University, 360 Huntington Avenue, Boston MA 02115.

Abstract

Scientific papers occasionally report group membership coded as a binary variable (i.e., 0 or 1), and present the mean and standard deviation calculated from it in a table of descriptive statistics. This is redundant, as the mean and standard deviation of a binary variable are not independent. This manuscript demonstrates this redundancy and uses it as the basis for a simple error detection test, termed here DEBIT (DEscriptive BInary Test), to investigate the accuracy of published descriptive statistics. Salient features of deploying the test are discussed, as are some anonymized examples where presented tables of descriptives appear to fail DEBIT.

Keywords: error detection, metascience

DEBIT: A Simple Consistency Test For Binary Data

Within science, there is a pressing concern that methods and results may be underreported or underspecified. The following all contribute to the present crisis in reproducibility and replicability (Camerer et al., 2016, 2018):

- the inaccessibility of raw data during review (Mayernik, Callaghan, Leigh, Tedds, & Worley, 2015) or in perpetuity (Vines et al., 2014);
- the underreporting of methodological details and/or the irreproducibility of methods (Prinz, Schlange, & Asadullah, 2011).

In this environment, it is unusual to consider information that is overreported: details that are included in scientific work even though they contribute no additional information.
However, overreported metrics have the potential to form error detection tests (i.e., they can be used to investigate inconsistencies or errors in scientific publications). This is because they contain redundant elements that must necessarily agree with one another.

Scientific papers commonly report measures of central tendency and dispersion to define sample characteristics. Because this practice is ubiquitous, papers will occasionally (and unnecessarily) report both the mean and standard deviation of a binary variable. Most typically, this is membership of a yes/no group coded numerically as 0 and 1. Examples include:

- dichotomous age delineations (e.g., participants aged 25 or below versus participants aged 26 or above);
- participants meeting a group membership criterion (e.g., depressed patients with BDI scores of 14 or above vs. non-depressed or recovering patients with BDI scores of 13 or below; Beck, Steer, & Brown, 1996);
- sex (male vs. female);
- geographical location (urban vs. rural);

and so on. In text, these variables are often reported in one of two ways:

- as a single percentage (e.g., “A sample of 70 participants (47.1% female) was collected for analysis”);
- as one or more raw numbers (e.g., “n = 85 urban and n = 35 rural participants returned the survey”).

Group memberships like these may be used in statistical analyses, such as testing whether proportions differ between groups (e.g., with the chi-square test of independence), or entered as variables in a regression. For such uses, they are often assigned a binary value (0 vs. 1), from which a mean and standard deviation are then calculated.

It is unclear why or how this reporting standard has arisen. The standard deviation of a binary variable is entirely determined by its mean and the sample size (equivalently, by the two cell sizes). It is trivial to demonstrate this redundancy, and thus to show that the standard deviation contains no additional information about the sample of interest.
Consider a sample of overall size N with two groups (cell sizes a and b), which are assigned the binary values 0 and 1, respectively. Hence:

$$N = a + b, \qquad \bar{x} = \frac{b}{N}$$

If this is the case, the (population) standard deviation is a simple function of a and b:

$$\sigma = \sqrt{\frac{a(0 - \bar{x})^2 + b(1 - \bar{x})^2}{N}}$$

As $\bar{x} = b/N$ and $1 - \bar{x} = a/N$,

$$\sigma = \sqrt{\frac{ab^2 + a^2 b}{N^3}} = \frac{\sqrt{ab}}{N}$$

As the sample standard deviation is invariably used, this can be modified to:

$$s = \sqrt{\frac{ab}{N(N-1)}}$$

This relationship can also be expressed in terms of the overall mean using the previous identity:

$$s = \sqrt{\frac{N}{N-1}\,\bar{x}\,(1 - \bar{x})}$$

As a consequence, the mean and standard deviation of any binary variable are precisely determined by the two cell sizes. These mean/standard deviation pairs are commonly reported, so we can treat them as a window through which to check the consistency of the presented figures.

This identity is absolute, as with other methods of checking internal consistency. Examples include recalculating reported test statistics (e.g., with statcheck; Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2016) and the GRIM test, a method that uses granularity to determine whether reported means are possible (Brown & Heathers, 2016). SPRITE (Heathers, Anaya, van der Zee, & Brown, 2018) reconstructs potential datasets by iteratively changing individual constituent values. Unlike SPRITE, checking the SD of a binary variable requires no estimation and is deterministic rather than probabilistic.

For instance, suppose we report a sample (N = 280) with sex coded as a binary variable, with 127 male participants and 153 female participants. The standard deviation is then exactly 0.4987..., as:

$$s = \sqrt{\frac{127 \times 153}{280 \times 279}} = \sqrt{\frac{19431}{78120}} \approx 0.4987$$

For simplicity, we have termed this observation the DEscriptive BInary Test (DEBIT¹). The test is absolute if the numbers are perfectly specified. Besides basic typographical or statistical errors, however, there are exceptions:

(1) Rounding. In a sufficiently large sample, a number may be reported to a small number of decimal places (usually 2) or as a percentage. The reported value may then be consistent with a range of possible cell sizes.
Assume a sample of N = 2,500, described as 17% patients and 83% healthy participants, with a reported SD of 0.37. The numbers of participants in each condition are designated P and H, respectively.

Allowing for values close to the rounding boundaries, the consistency check now asks whether any possible solution produces an SD of ≥0.365 and <0.375. Such a solution must satisfy N = P + H = 2,500, with P from ~16.5% to ~17.5% and H from ~82.5% to ~83.5%.

Hence, a range of possible solutions may fit the reported data. In this case, every value 396 ≤ P ≤ 422 produces an SD that rounds to 0.37, and the values 413 ≤ P ≤ 422 are also consistent with the reported 17%.

(2) Unreported exclusions. Real-world datasets frequently have incomplete items. These are presumably more common:

- when data are collected unsupervised, under duress, or under time pressure;
- when resistance is provoked by the content of the questions (e.g., if they concern deeply personal matters);
- with methods that do not compel a response (e.g., a mailed paper survey, or an automated survey that does not require every question to be answered before completion).

These deficiencies are, in general, more likely when research is conducted on a large scale or at multiple testing sites. In such cases, the cell sizes for individual items may not sum to the overall sample size reported.

(3) Altered data. A dataset may have been altered in certain ways, such as the accidental omission of part of the data due to an incorrect operation in a text editor or incorrect filtering in a statistical package. Alternatively, the descriptive statistics may simply have been fabricated. In either case, sample sizes, means, and standard deviations may not align as they should. (Note, however, that if the raw data for one or more variables have been fabricated, the descriptive statistics for these variables will be internally consistent if correctly reported.)

¹ The name BInary DEscriptive Test (BIDET) was rejected after consulting a small focus group of colleagues.
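The identity and the rounding check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own tooling. The function names (`binary_sd`, `rounding_interval`, `debit_candidates`) are hypothetical. The sketch assumes that reported values were rounded half-up, so that a reported 0.37 covers the half-open interval [0.365, 0.375). It also assumes that the patient group is the one coded 1. Exact rational arithmetic (`fractions.Fraction`) is used for the comparisons, so values near a rounding boundary are not misclassified by floating-point error.

```python
import math
import statistics
from fractions import Fraction
from typing import List, Optional, Tuple


def binary_sd(a: int, b: int) -> float:
    """Sample SD of a 0/1 variable with `a` zeros and `b` ones: sqrt(ab / (N(N - 1)))."""
    n = a + b
    if n < 2:
        raise ValueError("need at least two observations")
    return math.sqrt(a * b / (n * (n - 1)))


def rounding_interval(reported: str, decimals: int) -> Tuple[Fraction, Fraction]:
    """Half-open interval [lo, hi) of true values that round (half-up, assumed) to `reported`."""
    value = Fraction(reported)
    half = Fraction(1, 2 * 10**decimals)
    return max(value - half, Fraction(0)), value + half


def debit_candidates(n: int, sd: str, decimals: int = 2,
                     mean: Optional[str] = None) -> List[int]:
    """All counts of ones, b, whose exact sample SD (and mean, if given) round to the reported values."""
    if n < 2:
        raise ValueError("need at least two observations")
    sd_lo, sd_hi = rounding_interval(sd, decimals)
    mean_bounds = rounding_interval(mean, decimals) if mean is not None else None
    hits = []
    for b in range(n + 1):
        variance = Fraction(b * (n - b), n * (n - 1))  # exact s^2 = ab / (N(N - 1))
        if not sd_lo**2 <= variance < sd_hi**2:
            continue
        if mean_bounds is not None and not mean_bounds[0] <= Fraction(b, n) < mean_bounds[1]:
            continue
        hits.append(b)
    return hits


# Worked example from the text: 127 men coded 0 and 153 women coded 1 (N = 280).
raw = [0] * 127 + [1] * 153
assert math.isclose(statistics.stdev(raw), binary_sd(127, 153))
print(f"{binary_sd(127, 153):.4f}")  # 0.4987

# Rounding example: N = 2,500, reported SD = 0.37, 17% patients (assumed coded 1).
sd_only = debit_candidates(2500, "0.37")
both = debit_candidates(2500, "0.37", mean="0.17")
print(min(sd_only), max(b for b in sd_only if b <= 1250))  # 396 422
print(min(both), max(both))  # 413 422
```

The SD of a binary variable is symmetric in a and b. An SD-only search therefore also returns the mirror-image solutions (here, 2,078 to 2,104 ones). Supplying the reported mean removes them.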
Visualising DEBIT

These errors differ in magnitude: a rounding error might produce an n1/n2 pair that is only slightly incorrect, whereas other errors may be larger. The relevant numbers are also presented en masse in tables of descriptives. A simple visualization can therefore make inspecting the data easier.

For any sample size of interest, the overall N can be split into every possible n1/n2 pair (x-axis), expressed either in absolute terms or as a proportion. These points can be plotted against the corresponding standard deviations (y-axis). If a reported point does not lie on the resulting curve, it is necessarily incorrect.

An example: consider a hypothetical medium-sized sample of participants (women over the age of 75; N = 100). A small number answer a binary question positively (“are you still in the workforce?”; yes, n1 = 15). A larger subsample answer the same question negatively (no, n2 = 85). Figure 1 displays this visualization with SDs of 0.28, 0.36, 0.44, and 0.52.

Figure 1. A simple DEBIT graph, with potential values seen at x = 0.15

The four points in Figure 1 represent different classes of solutions:

- SD = 0.36 is correct for these sample and cell sizes. As such, it lies on the curve at n1 = 15 (a proportion of 0.15).
- SD = 0.52 is trivially incorrect. No test is required to make this determination, as it is impossible at any point. The maximum possible sample SD of a binary variable is 0.50 + ε, where ε approaches 0 as N increases (for N = 100, the maximum is approximately 0.5025).
- SD = 0.28 and SD = 0.44 are both incorrect, and potentially provide useful additional information.

These last solutions may themselves fall into one of two categories:

- They may correspond to a different mean. For example, SD = 0.44 is produced by both n1 = 25 and n1 = 26.
- They may not correspond to any possible cell size, analogous to the GRIM test (Brown & Heathers, 2016). The SDs attainable for a given N are granular, and may skip over certain values at the reported level of rounding. In this case, if all figures are reported to 2 d.p., SD = 0.28 is impossible, as it lies between n1 = 8 (SD = 0.27) and n1 = 9 (SD = 0.29).
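The classification of the four points in this example can also be reproduced numerically. The sketch below is illustrative only. `debit_curve` and `matching_n1` are hypothetical helper names. SDs are assumed to be sample SDs reported to 2 d.p., with a reported value covering the half-open interval [value − 0.005, value + 0.005). Because the curve is symmetric, the search is restricted to n1 ≤ N/2. The `(n1, SD)` pairs returned by `debit_curve` are the points one would pass to a plotting library to draw a graph like Figure 1.

```python
import math
from typing import List, Tuple


def debit_curve(n: int) -> List[Tuple[int, float]]:
    """(n1, sample SD) for every split of n observations into n1 ones and n - n1 zeros."""
    return [(n1, math.sqrt(n1 * (n - n1) / (n * (n - 1)))) for n1 in range(n + 1)]


def matching_n1(n: int, reported_sd: float, decimals: int = 2) -> List[int]:
    """Values n1 <= n/2 whose SD lies in [reported - half, reported + half), half = 0.5 * 10**-decimals."""
    half = 0.5 * 10 ** -decimals
    return [n1 for n1, sd in debit_curve(n)
            if n1 <= n // 2 and reported_sd - half <= sd < reported_sd + half]


N = 100
OBSERVED_N1 = 15  # "yes" answers in the hypothetical sample from the text
for reported in (0.28, 0.36, 0.44, 0.52):
    hits = matching_n1(N, reported)
    if OBSERVED_N1 in hits:
        verdict = "consistent with n1 = 15"
    elif hits:
        verdict = f"inconsistent; implies n1 in {hits}"
    else:
        verdict = "impossible for any n1"
    print(f"SD = {reported:.2f}: {verdict}")
```

For N = 100, the loop reports the following:

- SD = 0.36 is consistent with n1 = 15.
- SD = 0.44 instead implies n1 = 25 or 26.
- SD = 0.28 and SD = 0.52 cannot be produced by any split.

These match the classes of solution described above.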
