
A Method and R Package for Calculating the Conditional Distribution of Identified Students Under One- and Two-Stage Identification Systems

Matthew T. McBee1, Scott J. Peters2, & Natasha L. Godkin1

1 East Tennessee State University 2 University of Wisconsin-Whitewater

This paper develops methods for calculating the conditional probabilities of identification in one- and two-stage gifted identification systems and derives the conditional distribution functions for the true and observed ability scores of identified students, including the probability density function, the cumulative density function, and the quantile function. The first and second moments of these distributions are presented, and functions for simulating random values from the distribution of identified student scores are implemented via rejection sampling. An accompanying R package and Shiny web application allow these computations to be easily carried out. These methods allow researchers to quantify the efficacy of identification systems for specific students and to better understand the impact of design characteristics on performance.

Keywords: gifted, identification

Two previous papers have provided methods for calculating the sensitivity, incorrect identification rate, and expected program size for multiple-criteria gifted identification systems with no nomination stage (McBee, Peters, & Waterman, 2014) and for single-criteria systems with a nomination stage (McBee, Peters, & Miller, 2016). These metrics are properties at the system level and do not represent the probability of identification for any specific individual; sensitivity and the other metrics are marginal statistics that are averaged over all of the students. Individual probabilities of identification vary as a function of the system characteristics as well as the individual's true scores on the qualities or characteristics that are being assessed. (If the identification system is biased, then identification probabilities vary across other characteristics, such as race or income, as well.)

A specific student with exceptionally high ability may have a high probability of identification even in a system with low sensitivity. A borderline student may have a low chance of entry even under a reasonably sensitive identification system, and a practically zero chance of entry under an insensitive one. These conditional probabilities of identification allow specific outcomes to be examined, lowering the level of analysis from systems to individual students. They reveal specifically who is and is not likely to make the cut. Given the field's continuing efforts to address severe race and income disproportionalities (e.g., Yoon & Gentry, 2009), the insights generated by considering this information could prove to be exceptionally useful.

This paper presents a method for calculating conditional probabilities of identification and derives the conditional probability distribution of true and observed ability scores for identified students. The method applies to single-stage identification systems as well as two-stage systems in which an initial nomination phase is followed by a confirmatory testing stage. The identification probability for an individual can be calculated given knowledge of the system characteristics and the individual's true score(s). When the true score is above the giftedness threshold, the two possible outcomes are a true positive or a false negative; when below it, the possible outcomes are a false positive or a true negative. The identification probability function makes possible the derivation of the conditional probability density function for both the true and observed scores of identified students, as well as the conditional cumulative density and quantile functions. Simulation of random draws from these distributions is implemented via rejection sampling.

An R package and Shiny app developed by the lead author accompanies this paper. The package provides a convenient means for implementing the calculations described in this document. When the package is available on CRAN^1, it can be installed using the following code:

install.packages("giftedCalcs")
library(giftedCalcs)

^1 It is not on CRAN at the time of this writing; we would like feedback from reviewers before submitting it there.

Author note: Matthew T. McBee, Department of Psychology, East Tennessee State University; Scott J. Peters, Department of Educational Foundations, University of Wisconsin-Whitewater. Correspondence concerning this article should be addressed to Matthew T. McBee. E-mail: [email protected]

Before the package is on CRAN, or to access the development version, the package can be installed from GitHub via the install_github() function from the devtools package.

install.packages("devtools")
devtools::install_github("mcbeem/giftedCalcs")
library(giftedCalcs)

Package help can be viewed by running ?giftedCalcs at the console.

Background

In diagnostic testing, one or more cutoffs are placed on a continuous observed score to classify subjects. Cutoffs for gifted identification vary widely (McClain & Pfeiffer, 2012) but are always found on the upper tail of the distribution. We will initially consider the simplest possible diagnostic system, in which all members of a population of interest (i.e., students) are tested via a single confirmatory assessment and classified into two categories (e.g., gifted and non-gifted). We will arbitrarily select the 90th percentile as the location for the cutoff.

Classical test theory (CTT) can be used to analyze the performance of such a system from first principles under its assumptions that observed scores are composed of true score plus error, that the expectation of the errors is zero, and that the errors are independent of the true scores. Following typical CTT notation, we will denote the true score T and the observed score X. We will further assume that the true scores and observed scores follow a multivariate normal distribution. If the scores are on a standardized (z-score) metric, the mean vector (\mu) will consist of a vector of zeros. The covariance matrix (\Sigma) is characterized by the reliability coefficient \rho_{tt}. The square root of this term is the covariance (and correlation, since both variances are one) between the true and observed scores.

\Sigma(X, T) =
\begin{bmatrix}
1 & \sqrt{\rho_{tt}} \\
\sqrt{\rho_{tt}} & 1
\end{bmatrix}    (1)

As indicated, the order of variables in the covariance matrix (equation 1) is the observed score (first row/column) followed by the true score (second row/column).

Conditional Probabilities for Single-Stage Identification Systems

We use Equation A.4 and Theorem A.3 from Lindgren, Rootzén, and Sandsten (2013) to compute the conditional distribution of observed scores given a specific true score. The bivariate case is trivial and could have been derived using the more familiar equations for simple linear regression or from classical test theory, but we present the general equations for the conditional multivariate normal distribution to provide a simple example before delving into the more complex two-stage scenario. The conditional expectation and conditional covariance matrix of the distribution can be computed using the following equations. Note that bold terms indicate vectors or matrices rather than scalar values.

E(X | Y = y) = E(X) + \Sigma_{XY} \Sigma_{YY}^{-1} [y - E(Y)]    (2)

\Sigma(X | Y = y) = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}    (3)

According to equations 2 and 3, the covariance matrix is broken into components. The set of variables to be conditioned on is labeled Y, and the portion of the covariance matrix that involves them is labeled \Sigma_{YY}. The remaining variables, for which we wish to derive the conditional multivariate distribution, are denoted X; the portion of \Sigma containing the covariance matrix of these variables is denoted \Sigma_{XX}. The elements of \Sigma relating X and Y are denoted \Sigma_{XY}.

Applying these equations to the single-stage assessment case, \Sigma is the covariance matrix of the multivariate normal distribution (see equation 1), X is the observed score (because we wish to obtain its conditional distribution), and Y the true score. Therefore, the covariance matrix is structured as:

\Sigma =
\begin{bmatrix}
\Sigma_{XX} & \Sigma_{YX} \\
\Sigma_{XY} & \Sigma_{YY}
\end{bmatrix}    (4)

and the components of equation 4, taken from equation 1, are as follows:

\Sigma_{XX} = 1

\Sigma_{XY} = \Sigma_{YX} = \sqrt{\rho_{tt}}

\Sigma_{YY} = 1

Substituting these components into equations 2 and 3 yields the following:

E(X | T = t) = E(X) + \sqrt{\rho_{tt}} \, [t - E(T)]    (5)

\Sigma(X | T = t) = 1 - \rho_{tt}    (6)

Since the expectations of the observed score (X) and the true score (T) are both zero (due to standardization), the equation for the expectation of the observed score in this case reduces to E(X | T = t) = \sqrt{\rho_{tt}} \, t.
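As a quick numerical cross-check, the scalar single-stage case of equations 2 and 3 can be sketched outside of R. The snippet below is an illustrative Python translation under the paper's CTT assumptions; the function name conditional_moments_single_stage is ours and is not part of the giftedCalcs package.

```python
import math

def conditional_moments_single_stage(t, relyt):
    """Equations 2-3 specialized to the bivariate case of equation 1:
    X = observed score, Y = T = true score, Sigma_XY = sqrt(relyt),
    Sigma_XX = Sigma_YY = 1, and all marginal means are zero."""
    s_xx, s_xy, s_yy = 1.0, math.sqrt(relyt), 1.0
    mean = 0.0 + s_xy * (1.0 / s_yy) * (t - 0.0)   # equation 2
    var = s_xx - s_xy * (1.0 / s_yy) * s_xy        # equation 3
    return mean, var

mean, var = conditional_moments_single_stage(t=1.0, relyt=0.9)
# mean ≈ 0.9486833 (= sqrt(.9)), var ≈ 0.1 (= 1 - .9)
```

These values match the conditional_moments(t.true=1, relyt=.9) output shown in the next section.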

The conditional_moments() function implements the calculations from equations 5 and 6, which are the mean (\mu) and variance (\sigma^2) of the distribution of observed scores given a specific true score. The observed score distribution for a student with a true score of t = 1 (noting that this is a z-score) in a single-stage identification system with a test reliability of .9 has the following moments.

conditional_moments(t.true=1, relyt=.9)

## $conditional.mean
##            [,1]
## t.obs 0.9486833
##
## $conditional.cov
##       [,1]
## t.obs  0.1

In the single-stage case, this distribution is univariate normal. The distribution is given by

\pi(X | T = t) \sim N(\sqrt{\rho_{tt}} \, t, \; 1 - \rho_{tt})    (7)

where N represents the normal distribution, parameterized here by its mean (\mu = \sqrt{\rho_{tt}} \, t) and variance (\sigma^2 = 1 - \rho_{tt}).

The interpretation of this distribution is simple. Given a student with true score t, what distribution of observed scores do we expect? Intuitively, under CTT, we expect the variance of this distribution to be the error variance -- and it is (via equation 6). The student's expected (most likely) observed score is the true score shrunken toward the population average due to regression to the mean, a consequence of measurement error, which implies an imperfect correlation between the true scores and observed scores.

Figure 1. Conditional distribution of the observed score given that a student has a specific true score (t = 124), which is indicated by the vertical reference line. Reliability is set to 0.90 in this example. Units have been converted from z-scores to IQ scores.

The shrinkage of the expectation of the observed scores appears to violate the CTT dictum that the measurement errors have an expectation of zero. The contradiction is resolved when one realizes that zero is only the marginal expectation of measurement errors. The conditional expectation of measurement errors is symmetrically shrunken toward zero on both sides of the distribution of true scores. In other words, the shrinkage on both sides cancels out.

The conditional_moments() function can also be used to obtain the distribution of the true score conditional on the observed score. The user simply provides the value to condition on. The function accepts the true test score (t.true=), the observed test score (t.obs=) or, in the case of two-stage systems, the observed nomination score (n.obs=).

conditional_moments(t.obs=1, relyt=.9)

## $conditional.mean
##             [,1]
## t.true 0.9486833
##
## $conditional.cov
##        [,1]
## t.true  0.1

The probability of identification for a student with true score t can be calculated by integrating the distribution from the identification cutoff (\tau) to positive infinity.

p(\text{identification} | T = t) = \int_{\tau}^{\infty} N(\sqrt{\rho_{tt}} \, t, \; 1 - \rho_{tt}) \, dX    (8)

This quantity can be calculated in R via the pnorm() function using the moments obtained from equations 5 and 6. Alternatively, the giftedCalcs package provides the function conditional_p_id() for convenience. For example, the probability that a student with true ability t = 1.6 will be identified in a single-stage system with a test reliability of .90 and a cutoff at the 90th percentile is as follows:

conditional_p_id(x=1.6, relyt=.9, test.cutoff=.9)

## [1] 0.7725823

Figure 2. The probability of identification for a student with a particular true score is found by calculating the area under that student's conditional observed score distribution that is above the program entry cutoff. The red diamond denotes this student's true score of 124. The test reliability is 0.90 and the identification cutoff is set to 119 (90th percentile).
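Equation 8 reduces to a single normal CDF evaluation. As an illustrative cross-check of the value reported above, the calculation can be sketched with nothing but the error function; the Python snippet below is a translation under the paper's assumptions, not the giftedCalcs implementation, and the helper names are ours.

```python
import math

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    # inverse standard normal CDF by bisection (ample precision here)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if norm_cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def p_id_single_stage(t, relyt, test_cutoff):
    # equation 8: area above tau under N(sqrt(relyt)*t, 1 - relyt)
    tau = norm_quantile(test_cutoff)      # cutoff percentile as a z-score
    mean = math.sqrt(relyt) * t           # equation 5
    sd = math.sqrt(1.0 - relyt)           # square root of equation 6
    return 1.0 - norm_cdf((tau - mean) / sd)

p = p_id_single_stage(t=1.6, relyt=0.9, test_cutoff=0.9)
# ≈ 0.7726, agreeing with conditional_p_id(x=1.6, relyt=.9, test.cutoff=.9)
```

This is exactly the pnorm()-based calculation described in the text, with the cutoff converted from a percentile to a z-score first.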

When t is greater than the cutoff \tau, this equation yields the probability of a true positive; when t is less than \tau, it yields the probability of a false positive. The complement of equation 8 is the probability of not being identified. Depending on whether t exceeds \tau, the complement gives either the probability of a false negative or of a true negative. The shaded area of figure 2 is the quantity calculated in equation 8. In this case, since the student's true score exceeds the cutoff, the probability of identification (0.77) is the true positive probability. The complement (1 - 0.77 = 0.23) is the false negative probability.

Identification curves for single-stage systems

Figure 3 displays these conditional probabilities of identification for students with a range of true scores under different test reliability conditions. An identification curve depicts p(identification | T), as opposed to the probability for a specific choice of T, which has been denoted T = t throughout this paper. These curves (hereafter known as "identification curves") bear more than a striking resemblance to the item characteristic curves that are a hallmark of item response theory (IRT) models. They are, in fact, mathematically equivalent to the normal ogive IRT model, as both are based on the cumulative normal distribution. The "discrimination," or maximal slope of the curve, is a function of the test reliability. The "difficulty," or location of the 50% identification threshold, is a function of both the identification cutoff and the test reliability.

Figure 3. Conditional probability of identification by true score and test reliability (0.5 to 0.9) for a single-stage identification system.

Figure 4 illustrates how classification decisions are represented on identification curves. The area below the curve represents those who are identified; the area above it represents those who are not. The area to the right of the vertical reference line represents those whose true scores exceed the identification cutoff; the area to the left, those whose scores are below it.

Figure 4. Representation of the possible diagnostic test classifications (true positives, false negatives, false positives, and true negatives) on identification curves. As test reliability increases, the slope of the curve becomes steeper, reducing the rate of classification errors.

Calculating identification curve parameters for single-stage systems

An alternative method for calculating the conditional probability of identification exploits the equivalence between the normal ogive IRT model and the previous method. The identification cutoff \tau and the test reliability \rho_{XX} are reparameterized as difficulty and discrimination coefficients. This method will prove to be more convenient for performing these calculations in the two-stage system case.

The difficulty (b) and discrimination (a) coefficients can be calculated as follows:

b = \tau / \sqrt{\rho_{XX}}    (9)

a = \sqrt{ \rho_{XX} / (1 - \rho_{XX}) }    (10)

The identification probability for an individual with confirmatory test true score t and identification system parameters a and b is given by:

p(\text{identification} | T = t) = \int_{a(b - t)}^{\infty} N(0, 1) \, dZ    (11)

Equation 11 is simply a reparameterization of equation 8 as a standard cumulative normal distribution; the integral equals \Phi(a(t - b)). This is the approach implemented in the conditional_p_id() function. An identification curve can be created for a specific identification system by calculating the values b and a via equations 9 and 10 and then plotting a range of values of the true score t against probabilities calculated using equation 11. The following code chunk illustrates how an identification curve can be plotted in R. In this case, the curve is displayed for a single-stage identification system with a test reliability of 0.92 and a cutoff at the 97.5th percentile.

t <- seq(from=0, to=4, by=.05)
p <- conditional_p_id(x=t, relyt=.92,
  test.cutoff=.975)
plot(t, p, type="l")

Conditional Probabilities for Two-Stage Identification Systems

In a two-stage identification system, students must first qualify on a screening test before being able to take the confirmatory test. The screening test can be a formal test or an informal recommendation. Therefore, we refer to this initial step as a nomination stage. Students must achieve qualifying scores on both the nomination and confirmatory tests in order to be identified. A negative classification at either step will prevent a student from qualifying. Students who are actually qualified can experience a false negative at either stage, and the risk of this is particularly high when the nomination process has low validity. Conversely, unqualified students must experience two consecutive false positive errors in order to be identified. Thus, nomination stages reduce both sensitivity and the false positive rate. Whether this is desirable or not depends on the context, but the impact on sensitivity can be severe (McBee et al., 2016).

Under classical test theory, both the nomination and confirmatory assessments have true- and observed-score instantiations. Let the subscripts X and T refer to the observed score and true score, respectively, while the variable N denotes the score on the nomination instrument and C the score on the confirmatory test. For example, N_T would be the nomination true score; C_X would be the confirmatory test observed score.

Via CTT, we assume that the joint distribution of these four components is multivariate normal. Students are actually^2 gifted if and only if their confirmatory test true score exceeds the identification cutoff, but they are positively identified if both their nomination and confirmatory test observed scores are above their respective cutoffs.

^2 We hesitate to use the more natural phrase "truly gifted" due to the unfortunate baggage this phrase carries within the field.

The covariance matrix for the nomination and confirmatory test true and observed scores was given by McBee et al. (2016, p. 264). It is reproduced here with the terms presented in an alternate order.

\Sigma(N_X, N_T, C_X, C_T) =
\begin{bmatrix}
1 & \sqrt{\rho_{NN}} & r_{C_X N_X} & \frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} \\
\sqrt{\rho_{NN}} & 1 & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}}} & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}} \sqrt{\rho_{CC}}} \\
r_{C_X N_X} & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}}} & 1 & \sqrt{\rho_{CC}} \\
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}} \sqrt{\rho_{CC}}} & \sqrt{\rho_{CC}} & 1
\end{bmatrix}

The nomination true score plays no role in determining either giftedness or identification. It can therefore be eliminated from further consideration. The second row/column of the matrix is dropped, leaving the following:

\Sigma(N_X, C_X, C_T) =
\begin{bmatrix}
1 & r_{C_X N_X} & \frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} \\
r_{C_X N_X} & 1 & \sqrt{\rho_{CC}} \\
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} & \sqrt{\rho_{CC}} & 1
\end{bmatrix}

The component that we wish to condition on is C_T, the confirmatory test true score. It is located in the last row/column. Its variance becomes the term \Sigma_{YY} in the decomposition.

\Sigma_{YY} = 1

We wish to obtain the multivariate distribution of N_X and C_X given C_T = t, because this determines the probability of identification in a two-stage system. Therefore, the covariance matrix is decomposed into the following components.

\Sigma_{XX} =
\begin{bmatrix}
1 & r_{C_X N_X} \\
r_{C_X N_X} & 1
\end{bmatrix}

The term \Sigma_{XY} consists of the covariances between the elements of X and Y.

\Sigma_{XY} =
\begin{bmatrix}
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} \\
\sqrt{\rho_{CC}}
\end{bmatrix}

And \Sigma_{YX} is the transpose of \Sigma_{XY}.

\Sigma_{YX} = \Sigma_{XY}^{\top} =
\begin{bmatrix}
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} & \sqrt{\rho_{CC}}
\end{bmatrix}

Solving equations 2 and 3 with these components yields the conditional mean vector (\mu_{T=t}) and the conditional covariance matrix (\Sigma_{T=t}) of the joint multivariate normal distribution of the nomination observed score and confirmatory test observed score. These calculations can be performed with the conditional_moments() function. A two-stage system is implemented when a nomination validity value is supplied to the function via the valid= argument. The conditional moments for a student with a true ability score of t = 1.4 in an identification system with a test reliability of .90 and a nomination validity of .50 are as follows.

conditional_moments(t.true=1.4, relyt=.9, valid=.5)

## $conditional.mean
##            [,1]
## n.obs 0.7378648
## t.obs 1.3281566
##
## $conditional.cov
##           n.obs t.obs
## n.obs 0.7222222   0.0
## t.obs 0.0000000   0.1

The returned $conditional.mean value is the (2 x 1) mean vector, where the first element is the student's mean observed score on the nomination and the second is the student's mean observed score on the confirmatory test. The $conditional.cov value is the (2 x 2) variance-covariance matrix, with the variances on the diagonal and the covariance on the off-diagonal. The (1, 1) element is the variance of the observed nomination score, the (2, 2) element is the variance of the observed confirmatory test score, and the covariance is zero in this case.^3

^3 The covariance is zero because the confirmatory test and nomination are independent after conditioning on the true score.

A conditional multivariate normal distribution with these moments is visualized in Figure 5. Though the conditional mean vector contains the most likely values of the nomination and test observed scores, measurement error and imperfect nomination validity imply that many plausible observed scores could be obtained for this student. The coloration in the figure describes the relative likelihood of any nomination-test observed score pairing. The output from conditional_moments() showed that the nomination variance was much higher than the test variance. This is displayed by the elliptical shape of the distribution, which has much more "spread" with respect to the nomination than with respect to the test.

Figure 5. Heat map of the conditional multivariate normal distribution of the confirmatory test observed score (y-axis) and the nomination observed score (x-axis), conditional on a true test score of z = 1.4, under a two-stage identification system with a nomination validity of 0.5 and a test reliability of 0.9. The reference lines display test and nomination cutoffs at the 90th and 85th percentiles, respectively.

The student is identified if both of these observed scores exceed their respective cutoffs. The reference lines on the figure display a test cutoff at the 90th percentile and a nomination cutoff at the 85th percentile. The proportion of the area under this multivariate distribution that lies above both cutoffs is this student's conditional probability of identification, which is quite low in this case even though the student's true ability score (t = 1.4) is substantially above the cutoff (\tau = 1.28). This student is likely to experience a false negative identification outcome.

The identification probability for a student with confirmatory test true score t is computed by calculating the area under this distribution that is simultaneously above the nomination cutoff (\nu) and the confirmatory test cutoff (\tau), both of which are expressed as z-scores.

p(\text{identified} | C_T = t) = \int_{\tau}^{\infty} \int_{\nu}^{\infty} N_2(\mu_{T=t}, \Sigma_{T=t}) \, dN_X \, dC_X    (12)

Figure 6 attempts to illustrate the meaning of equation 12. The joint distribution of the nomination observed score, confirmatory test true score, and confirmatory test observed score is a three-dimensional multivariate normal distribution, represented by the elliptical shape in the figure. Choosing a particular value of the confirmatory test true score is akin to taking a two-dimensional slice through that distribution, and the slice is itself a two-dimensional multivariate normal distribution. The area of that distribution which is simultaneously above both the nomination and test cutoffs is the probability of identification.
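Because the conditional covariance between the nomination and test observed scores is zero, the double integral in equation 12 factorizes into two univariate tail probabilities, and the same answer can be reached either from the conditional moments or from the normal-ogive reparameterization developed in the next subsection (equations 13-16). The Python sketch below is an illustrative translation under the paper's assumptions, not the giftedCalcs code (all function names are ours); it reproduces the conditional moments reported above and confirms that the two routes agree for the Figure 5 student.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    for _ in range(200):                      # bisection on the CDF
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if norm_cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def two_stage_moments(t, relyt, valid):
    """Equations 2-3 with Sigma_XY = [valid/sqrt(relyt), sqrt(relyt)]:
    conditional moments of (N_X, C_X) given C_T = t, Sigma_YY = 1."""
    s = [valid / math.sqrt(relyt), math.sqrt(relyt)]
    means = [s[0] * t, s[1] * t]
    variances = [1.0 - s[0] ** 2, 1.0 - s[1] ** 2]
    covariance = valid - s[0] * s[1]          # works out to zero
    return means, variances, covariance

def p_id_from_moments(t, relyt, valid, test_cutoff, nom_cutoff):
    # equation 12; the zero covariance lets the double integral factor
    tau, nu = norm_quantile(test_cutoff), norm_quantile(nom_cutoff)
    means, variances, _ = two_stage_moments(t, relyt, valid)
    p_nom = 1.0 - norm_cdf((nu - means[0]) / math.sqrt(variances[0]))
    p_test = 1.0 - norm_cdf((tau - means[1]) / math.sqrt(variances[1]))
    return p_nom * p_test

def p_id_from_curves(t, relyt, valid, test_cutoff, nom_cutoff):
    # product-of-ogives reparameterization (equations 13-17)
    tau, nu = norm_quantile(test_cutoff), norm_quantile(nom_cutoff)
    b_c, a_c = tau / math.sqrt(relyt), math.sqrt(relyt / (1.0 - relyt))
    b_n = nu * math.sqrt(relyt) / valid
    q = valid ** 2 / relyt
    a_n = math.sqrt(q / (1.0 - q))
    return norm_cdf(a_n * (t - b_n)) * norm_cdf(a_c * (t - b_c))

means, variances, cov = two_stage_moments(t=1.4, relyt=0.9, valid=0.5)
# means ≈ [0.7378648, 1.3281566]; variances ≈ [0.7222222, 0.1]; cov = 0
p1 = p_id_from_moments(1.4, 0.9, 0.5, 0.9, 0.85)
p2 = p_id_from_curves(1.4, 0.9, 0.5, 0.9, 0.85)
# the two methods agree, and the probability is low (roughly 0.2)
# even though the student's true score exceeds the test cutoff
```
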

Figure 6. The elliptical shape represents the multivariate normal distribution of the confirmatory test true score, confirmatory test observed score, and screening test observed score. A bivariate normal distribution of the nomination observed score and the confirmatory test observed score is implied by every choice of the confirmatory test true score. One can imagine the plane at t sliding along the distribution. For each choice of t, the parameters of the resulting bivariate normal distribution, indicated by the intersection of the plane and the ellipsoid, can be computed using equations (2) and (3). The probability of identification, shaded here in red, can then be computed using equation (12) or (17).

The conditional_p_id() function calculates these conditional probabilities for two-stage systems when values are passed to the nomination validity (valid=) and nomination cutoff (nom.cutoff=) arguments. For example, the identification probability of a student with a true score of t = 1.5 in a two-stage system with a nomination cutoff at the 80th percentile, a nomination validity of 0.5, a test reliability of 0.9, and a test cutoff at the 90th percentile is calculated as follows:

conditional_p_id(x=1.5, relyt=.9,
  test.cutoff=.9, valid=.5, nom.cutoff=.8)

## [1] 0.3202376

Identification curves for two-stage systems

Figure 7 shows a set of identification curves for a two-stage identification process under various levels of nomination validity and nomination cutoff. Each panel features a curve for universal screening, in which the nomination phase is eliminated and all students take the confirmatory test. These curves are identical across all the panels because the nomination validity is irrelevant if every student is (effectively) nominated.

Figure 7. Probability of identification in a two-stage identification system by confirmatory test true score, nomination cutoff, and nomination validity. All plots assume a test reliability of 0.95 and a test cutoff at the 90th percentile, as indicated by the vertical reference line. Panels: nomination validity (0.4 to 0.9). Color and shape: nomination cutoff (universal screening, 70th, 80th, and 90th percentiles).

The combination of low validity with a high nomination cutoff results in a broad, relatively shallow identification curve, in which even students of very high ability have little chance of identification. The IRT comparison again becomes salient: many of these curves have the character of very poor items that would likely be removed or revised during test development -- and yet they typify many existing gifted identification systems, which feature low nomination validity paired with a high nomination cutoff.

As the nomination validity increases, higher nomination cutoffs can be tolerated while preserving relatively steep curves. The identification curves converge toward the universal screening curve as this happens. An animation of these curves changing with nomination validity can be found at the following URL: http://goo.gl/3DWDWj (see file curves.gif).

Calculating identification curve parameters for two-stage systems

There is a simpler method for calculating identification probabilities that is, once again, based on a correspondence with the normal ogive IRT model. The two-stage system identification curves are a composite of two curves: one related to the confirmatory test, and the other related to the nomination assessment. Each of these curves is defined by "difficulty" (b) and "discrimination" (a) coefficients that can be calculated from four pieces of information: the nomination cutoff (\nu), the test cutoff (\tau), the nomination validity coefficient (r_{C_X N_X}), and the test reliability (\rho_{CC}). (The nomination reliability plays no role other than imposing a ceiling on the validity coefficient.)

Figure 8. The two-stage system identification curve is the pointwise product of the confirmatory test curve and the nomination curve. In this example, the nomination validity is 0.6, the test reliability is 0.9, the nomination cutoff is at the 90th percentile, and the test cutoff is at the 90th percentile, as indicated by the vertical reference line.

The difficulty and discrimination coefficients for the confirmatory test are given by the following equations. Note that these are identical to equations 9 and 10 except for the notation, which has been altered to be consistent with the two-stage context.

b_C = \tau / \sqrt{\rho_{CC}}    (13)

r ρ a = CC (14) students. Bayes’ theorem for probability distributions pro- C 1 − ρ CC vides a means of transforming the probability of identification given the true score, p(identified|T) into the distribution of The equations for the nomination process are given below. true scores given identification, p(T|identified). √ ν ρCC Bayes’ theorem for probability distributions is written in its bN = (15) generic form as rCX NX

v π(θ) f (x|θ) t 2 r /ρCC π(θ|x) = CX NX f (x) aN = (16) 1 − (r2 /ρ ) CX NX CC where π(θ|x) is the conditional of θ The probability of identification is the product of the proba- given x, π(θ) is the marginal distribution of θ, f (x|θ) is the bility of being nominated and the probability of passing the conditional distribution of x given θ, and f (x) is the marginal confirmatory test, as shown in equation 17. distribution of x, which is itself given by

(p |T = t) = Z ∞ identification | Z ∞ Z ∞ f (x) = π(θ) f (x θ) dθ −∞ N(0, 1) dCT N(0, 1) dCT (17) a (t−b ) a (t−b ) | N N {z } | C C {z } The denominator term f (x) is a normalizing constant which nomination confirmatory test pass probability pass probability is necessary to ensure that π(θ|x) is a true probability distribu- tion with a total area of one. In many applications of Bayes’ This is the method that is implemented in the theorem, f (x) is intractable or inconvenient to obtain. The conditional_p_id() function. Evaluating the product of numerator, despite not being itself a true probability distri- the two one-dimensional integrals in equation 17 is faster and bution, is still exceptionally useful. It differs from the true more numerically stable than evaluating the double integral distribution only by a constant scaling factor and thus has in equation 12. the same shape as the true distribution. As a result it can answer most questions that one would like to answer from this distribution. The true probability distribution is therefore Conditional Ability True Score Distribution for not equal to the numerator, but is proportional to it. Identified Students

One of the many applications of an identification curve is to π(θ|x) ∝ π(θ) × f (x|θ) |{z} |{z} |{z} use it to understand the ability distribution of the identified posterior prior likelihood CONDITIONAL DISTRIBUTION 9

In Bayesian terminology, π(θ) is called the prior distribution of θ and typically represents a state of belief about the relative likelihood of different values of θ prior to encountering the data. The term f(x|θ) is the likelihood of the data x given θ. All of the relevant information in the data is contained in this term. These two components are multiplied together, and their product is proportional to the posterior distribution, π(θ|x), the updated belief about the distribution of θ that combines information from the prior with information from the data.

Regarding gifted identification, the "prior" is the marginal distribution of ability (or whatever construct is assessed via the confirmatory test). Let us regard it as a standard normal distribution, representing the distribution of ability or achievement in the general population. It is the "prior" in the sense that it is the distribution that we expect for ability before we know whether the students are identified. The "likelihood" is the identification curve. The denominator is simply the marginal probability of identification.

π(T|identification) = f(T) × p(identification|T) / p(identification)    (18)

π(T|identification) ∝ f(T) × p(identification|T)    (19)

We refer to the quantity in equation 18 as the normalized conditional probability distribution and the quantity in equation 19 as the unnormalized conditional probability density.

Since p(identification|T) is not a probability distribution, the unnormalized conditional distribution π(T|identification) from equation 19 can be approximated by the pointwise product of a normal probability density function and the identification curve. Figure 9 illustrates this process. Select a set of points (true scores) for evaluation, represented in the figure as tick marks along the x-axis. Evaluate the standard normal pdf at each of these values (perhaps using R's dnorm() function) and the identification curve at these values via equations 8 or 11 for single-stage systems, or equations 12 or 17 for two-stage systems. (One can use the two-stage equations to calculate identification probabilities for single-stage systems by setting the nomination cutoff percentile to a value very close to zero.) The resulting values to be multiplied together are indicated by open circles on the normal distribution and the identification curve in the figure.

Figure 9. The conditional distribution of true scores for identified students can be approximated by evaluating the normal probability density function and the identification curve at a set of true score values. These values are multiplied together pointwise to approximate the distribution.

Another way to think about equation 18 that may be more intuitive to readers who are relatively unfamiliar with Bayesian logic is to view p(identification|T), the identification curve, as a weighting function applied to the normal distribution f(T). The resulting probability density is the conditional distribution of ability true scores for identified students.

The function d_identified() calculates the normalized (normalize=TRUE) or unnormalized (normalize=FALSE) density given the true score and the four relevant system parameters: the test reliability, the test cutoff, the nomination validity, and the nomination cutoff. The density of the true score (rather than the observed score) is returned so long as the relyt= argument is specified. For example, the normalized probability density of a true score of t = 2 in a two-stage identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination validity of 0.6, and a nomination cutoff at the 90th percentile is calculated as follows:

d_identified(x=2, relyt=.95,
             test.cutoff=.9, valid=.6, nom.cutoff=.9,
             normalize=TRUE)

## [1] 0.6556902

This value itself has no absolute interpretation, but the probability density can be visualized by plotting the density against a set of possible true scores; the approximation becomes increasingly accurate as the number of evaluation points is increased. The following code illustrates how such a plot can be made. Plotting the pointwise products against the chosen set of values of the true score provides a visual depiction of the conditional distribution of true scores for identified students. An example is shown in Figure 10.

# create a vector of true scores
tscore <- seq(-2, 4, by=.01)
# density at each true score
p <- d_identified(x=tscore, relyt=.9,
                  test.cutoff=.9, valid=.6, nom.cutoff=.9,
                  normalize=TRUE)
# plot the curve
plot(x=tscore, y=p, type="l")
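The pointwise-product approximation of Figure 9 is easy to reproduce from scratch without giftedCalcs. The sketch below assumes a classical true-score model in which the observed score given the true score is distributed X|T ~ N(√ρ·T, 1 − ρ), where ρ is the test reliability, so that the single-stage identification curve is 1 − Φ((τ − √ρ·T)/√(1 − ρ)) with τ the test cutoff on the z-score metric; the exact functional form used by the package may differ, but the approximation logic is the point.

```r
# Assumed single-stage identification curve under a classical true-score
# model: observed score X | T ~ Normal(sqrt(relyt) * T, 1 - relyt)
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)  # cutoff expressed as a z-score
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# Evaluate the normal pdf and the identification curve on a grid of true
# scores, then multiply pointwise (the process illustrated in Figure 9)
tscore <- seq(-4, 4, by = .01)
unnorm <- dnorm(tscore) * p_id_curve(tscore, relyt = .9, test.cutoff = .9)

# Normalizing by the marginal identification rate (a Riemann sum here)
# yields the conditional density of true scores for identified students
id_rate <- sum(unnorm) * .01
dens <- unnorm / id_rate
```

As expected, dens integrates to one and its mass sits well above the general-population mean.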

Figure 10. Conditional probability density of true scores for identified students under universal screening. The identification cutoff is represented by the vertical reference line.

Figure 10 displays an example of the density of true scores for identified students. The shape of the curve is interesting. In an idealized perfect identification system, the identification curve would be a step function, and the ability distribution of identified students would follow a truncated normal distribution. In an idealized completely nondiagnostic system, the identification curve would be flat, and the ability distribution of identified students would follow a normal distribution. Actual identification systems lie between these extremes, and the resulting conditional distribution is likewise a compromise between two known distributions.

Figure 11 compares the unnormalized densities of identified student true scores for a one-stage versus a two-stage system. Because the densities are not normalized, the areas they enclose are equal to the expected identification rates under each system. The combination of a high nomination cutoff and low nomination validity in the two-stage system dramatically reduces its sensitivity, resulting in a much lower identification rate. Figure 12 shows a relative comparison of the identification curves for two two-stage systems versus universal screening. In both two-stage systems, the impact of adding the nomination stage falls most heavily on students just above the cutoff, but the overall harm is far more severe for the ineffective identification system, whose nomination cutoff is much higher than its validity justifies.

Figure 12. Comparison of ineffective and effective two-stage identification systems versus a single-stage system (universal screening). Universal screening is treated as the baseline and is represented by the horizontal line at p = 1. The plotted lines display how the identification curves differ from baseline. The deleterious impact of the nomination stage is most severe for students whose true scores are just above the cutoff. Nomination parameters: validity = .50 with a nomination cutoff at the 90th percentile (ineffective); validity = .85 with a nomination cutoff at the 70th percentile (effective).
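The two idealized extremes described above can be checked numerically. In this base-R sketch (no giftedCalcs functions are used), a step-function identification curve produces a truncated normal conditional density, while a flat curve returns the standard normal unchanged; note that the flat curve's height (arbitrarily 0.2 here) cancels in the normalization.

```r
tau <- qnorm(.9)                    # identification cutoff as a z-score
t   <- seq(-4, 4, by = .001)        # grid of true scores

step_curve <- as.numeric(t >= tau)  # idealized perfect system
flat_curve <- rep(.2, length(t))    # idealized nondiagnostic system

# Normalize a pointwise product to a proper density via a Riemann sum
normalize <- function(w) w / (sum(w) * .001)
d_step <- normalize(dnorm(t) * step_curve)  # ~ truncated normal above tau
d_flat <- normalize(dnorm(t) * flat_curve)  # ~ standard normal
```

Comparing d_step against dnorm(t)/(1 − pnorm(tau)) above the cutoff, and d_flat against dnorm(t), confirms the two limiting cases.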

Figure 11. Comparison of unnormalized probability densities for a single-stage identification system (universal screening) versus a two-stage system with a nomination validity of 0.5 and a nomination cutoff at the 90th percentile. In both cases, the test reliability is 0.9 and the test cutoff is at the 90th percentile. The area under each curve is equal to the identification rate, which is much higher under universal screening.

Other distribution functions for the true scores of identified students

The giftedCalcs package contains three other conditional distribution functions: the cumulative density function, p_identified(); the quantile function, q_identified(); and random sampling, r_identified().

The cumulative density function, Π(T|identification), is calculated by the p_identified() function. It is the probability that an identified student has a true ability score smaller than the specified score. The conditional cumulative density function (cdf) is given by:

Π(T|identification) = ∫_{−∞}^{T} π(T|identification) dT    (20)

Like the cumulative normal density on which it is based, this equation has no closed-form expression. In substantive terms, the cdf describes what proportion of identified students have true ability scores below the specified value. In other words, given a true score, the cdf returns that score's percentile given the identification system parameters. For example, the percentile for a student with true ability t = 2 in a two-stage system with a test reliability of 0.9, a test cutoff at the 90th percentile, a nomination validity of 0.5, and a nomination cutoff at the 80th percentile is computed as follows:

p_identified(x=2, relyt=.9, test.cutoff=.9,
             valid=.5, nom.cutoff=.8)

## [1] 0.7009637

This student is at the 98th percentile of general ability but is only at the 70th percentile among identified students in this program.

Figure 14. Quantile function of true scores for identified students in a two-stage system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination validity of 0.70, and a nomination cutoff at the 90th percentile. The identification cutoff is represented by the horizontal reference line.
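The conditional cdf, and its inverse, can also be sketched from scratch with integrate() and uniroot(). The sketch below assumes the classical single-stage true-score model (observed X|T ~ N(√ρ·T, 1 − ρ)) rather than calling the package, whose p_identified() and q_identified() additionally handle two-stage systems.

```r
# Assumed identification curve for a single-stage system
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# Conditional cdf (equation 20): integrate the unnormalized density up to q,
# then divide by the marginal identification rate
cdf_identified <- function(q, relyt, test.cutoff) {
  dens <- function(t) dnorm(t) * p_id_curve(t, relyt, test.cutoff)
  integrate(dens, -Inf, q)$value / integrate(dens, -Inf, Inf)$value
}

# Quantile function: invert the cdf by one-dimensional root finding,
# mirroring how q_identified() wraps uniroot() around p_identified()
quantile_identified <- function(p, relyt, test.cutoff) {
  uniroot(function(q) cdf_identified(q, relyt, test.cutoff) - p,
          lower = -6, upper = 6, tol = 1e-8)$root
}
```

Round-tripping a percentile through the two functions recovers it, which is a useful sanity check on any implementation of this pair.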

Figure 13. Conditional cumulative density of true scores for identified students under universal screening with a test reliability of 0.95. The identification cutoff is represented by the vertical reference line.

The quantile function is the inverse of the cumulative density function. Given a percentile, it returns the corresponding ability true score. The q_identified() function employs the uniroot() one-dimensional root-finding function from base R to find the input to the p_identified() function that returns the desired percentile. For example, the true score representing the 70th percentile among identified students in a two-stage system with a test reliability of 0.9, a test cutoff at the 90th percentile, a nomination validity of 0.5, and a nomination cutoff at the 80th percentile is computed as follows:

q_identified(p=.7, relyt=.9, test.cutoff=.9,
             valid=.5, nom.cutoff=.8)

## [1] 1.998423

Converted to an IQ metric, this is a score of 130.

Random generation

The r_identified() function can sample random values from the distribution of identified students. This could be useful for simulation or for illustrative purposes. The function is based on rejection sampling as implemented by the SimDesign package. Like the other functions, it simulates data from a two-stage identification system if values are provided for arguments valid= and nom.cutoff=. True scores are simulated if a value is provided for argument relyt=; otherwise observed scores are simulated. The following code samples n = 5 values for identified students from a two-stage system with a test reliability of 0.95, a test cutoff at the 97.5th percentile, a nomination validity of 0.7, and a nomination cutoff at the 88th percentile.

r_identified(n=5, test.cutoff=.975, relyt=.95,
             valid=.7, nom.cutoff=.88)

## [1] 2.161651 2.061438 2.609705 1.826694 1.865042

Moments of the true score distribution for identified students

The moments of the conditional probability density function for identified students can be derived from basic probability theory.

Expected value. The expected value of the true scores for identified students is given by:

E(T|identified) = ∫_{−∞}^{∞} T π(T|identified) dT    (21)

It can be calculated via the mean_identified() function. For example, the mean ability true score for students identified in a single-stage system with a test reliability of 0.92 and a cutoff at the 97th percentile can be computed as follows:

mean_identified(relyt=.92, test.cutoff=.97)

## [1] 2.175452
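The integral in equation 21 can be evaluated numerically with integrate(). The sketch below again assumes the classical single-stage true-score model (observed X|T ~ N(√ρ·T, 1 − ρ)) rather than calling giftedCalcs; under that assumed model the first moment coincides with the mean_identified() output shown above, and the second moment yields the standard deviation described next.

```r
# Assumed identification curve for a single-stage system
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# k-th raw moment of the conditional true-score distribution
# (equation 21 is the case k = 1), computed as a ratio of integrals
moment_identified <- function(k, relyt, test.cutoff) {
  dens <- function(t) dnorm(t) * p_id_curve(t, relyt, test.cutoff)
  integrate(function(t) t^k * dens(t), -Inf, Inf)$value /
    integrate(dens, -Inf, Inf)$value
}

m1 <- moment_identified(1, relyt = .92, test.cutoff = .97)  # approx 2.175
m2 <- moment_identified(2, relyt = .92, test.cutoff = .97)
sd_id <- sqrt(m2 - m1^2)  # standard deviation from the first two moments
```

Here m1 reproduces the mean_identified() value of 2.175452 to numerical precision.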
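The rejection-sampling strategy behind r_identified() can likewise be sketched directly: draw candidate true scores from the marginal N(0, 1) distribution and accept each with probability equal to the identification curve, so that the accepted draws follow the conditional distribution. The curve assumed here is the single-stage classical-model curve, not the giftedCalcs implementation.

```r
# Assumed identification curve for a single-stage system
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# Rejection sampler: keep each N(0,1) draw with probability p(id | T)
r_identified_sketch <- function(n, relyt, test.cutoff) {
  draws <- numeric(0)
  while (length(draws) < n) {
    cand <- rnorm(n)                       # candidate true scores
    keep <- runif(n) < p_id_curve(cand, relyt, test.cutoff)
    draws <- c(draws, cand[keep])
  }
  draws[seq_len(n)]
}

set.seed(42)
sim <- r_identified_sketch(5000, relyt = .92, test.cutoff = .97)
```

The sample mean of sim sits close to the expected value of about 2.18 reported for these settings, corroborating the sampler against the moment calculation.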

Standard deviation. The variance of true ability scores is given by:

Var(T|identified) = ∫_{−∞}^{∞} T² π(T|identified) dT − [E(T|identified)]²    (22)

and the standard deviation is the square root of the variance:

SD(T|identified) = √Var(T|identified)    (23)

This calculation is implemented in the sd_identified() function. For example, the standard deviation of true ability scores for identified students in the single-stage system used in the previous example can be calculated as follows:

sd_identified(relyt=.92, test.cutoff=.97)

## [1] 0.4380773

Because the conditional density of true scores is skewed, the mean and standard deviation do not together provide a complete description of the distribution, as they do with the normal distribution. Still, it can be surprising to see just how much variability there is even among identified students. For example, the standard deviation of true ability scores in a single-stage system with a 90th percentile cutoff and a reliability of 0.90 is 0.50. Even this restricted range of students has half the variability of typical students.

Conditional Distribution of Observed Scores

Identification curve and probability density for observed scores

An identification curve relating the observed test score to the probability of identification can also be imagined. In this case, the quantity to be calculated is p(identification|X), which can also be computed using the conditional_p_id() function. One simply does not specify a value for the test reliability relyt, causing the function to interpret the value of x as an observed score rather than a true score. (In fact, nearly all of the functions in the giftedCalcs package work in this manner.) In the example code below, a vector of two observed test z-scores (1.2 and 1.3) is passed to the function. The first is below the 90th percentile cutoff, the second above it. The function returns a zero and a one, indicating that the first student has a zero probability of identification while the second student is certain to be identified.

conditional_p_id(x=c(1.2, 1.3),
                 test.cutoff=.9)

## [1] 0 1

In the single-stage process, all individuals with observed test scores above the identification cutoff are identified. Thus, the curve relating the observed score to the probability of identification is a step function, as shown in Figure 15. This curve does not depend on the test's reliability; reliability dictates the degree of similarity between the true scores and the observed scores.

Figure 15. In a single-stage system, the identification curve relating the observed test score to the probability of identification is a step function.

The probability density function for observed scores is given by:

π(X|identification) = f(X) × p(identification|X) / p(identification)    (24)

π(X|identification) ∝ f(X) × p(identification|X)    (25)

The resulting probability density of observed scores under a one-stage system is a truncated normal distribution, as shown in Figure 16. This is obtained by computing the pointwise product of the step function and the marginal distribution of observed ability scores, and it follows from the step-function character of p(identification|X).

However, in a two-stage system, the identification curve relating observed scores to the probability of identification is not a step function. This is because some of the students who would have had observed scores in the identifiable range do not pass through the nomination stage and therefore have no opportunity to take the confirmatory test.

Figure 18 compares the normalized and unnormalized densities of the observed scores for identified students under universal screening versus an inefficient two-stage system in which the nomination cutoff is too high for the nomination validity. The area enclosed by the unnormalized distributions is proportional to the identification rate under each system.
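For a single-stage system, the observed-score machinery is simple enough to verify by hand: the identification curve is an indicator (step) function at the cutoff, and dividing the weighted normal density by the marginal identification rate, which is 1 − test.cutoff, gives the truncated normal of equation 24. This base-R sketch mirrors, but does not call, conditional_p_id().

```r
# Observed-score identification curve for a single-stage system:
# a step function at the cutoff; reliability plays no role here
p_id_observed <- function(x, test.cutoff) as.numeric(x >= qnorm(test.cutoff))

p_id_observed(c(1.2, 1.3), test.cutoff = .9)  # 0 1, as in the example above

# Conditional density of observed scores (equation 24): the normal density
# weighted by the step function, normalized by the identification rate
d_observed_identified <- function(x, test.cutoff) {
  dnorm(x) * p_id_observed(x, test.cutoff) / (1 - test.cutoff)
}
```

Above the cutoff this is exactly dnorm(x)/(1 − test.cutoff), the truncated normal density; below the cutoff it is zero.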

Figure 16. Conditional probability density of observed scores for identified students under universal screening. The identification cutoff is represented by the vertical reference line.

Figure 17. Identification curve relating the observed test score to the probability of identification for a two-stage identification system. In this example, the nomination validity is 0.5 and both the nomination and test cutoffs are at the 90th percentile.

Shiny Web App for Computing Identification Curves and Marginal Psychometrics

We have produced a Shiny web application for calculating marginal program psychometrics and displaying identification curves. The app can be accessed at https://mmcbee.shinyapps.io/gifted_identification_explorer/.

Figure 19. Screenshot of the Shiny app.

Figure 18. Normalized and unnormalized conditional probability densities of observed scores for identified students under an inefficient two-stage system (blue) versus universal screening (red). The identification cutoff is represented by the vertical reference line.

Discussion

The equations presented in this paper have many potential uses. Identification curves, in particular, may prove helpful in understanding and anticipating the consequences of different system designs. Scholars who know IRT may find many of the identification curves presented in this paper to be almost comically terrible. We agree. We must face the reality that many (most?) gifted identification systems implemented in American schools perform as badly as test items that would immediately be flagged as problematic. The solution is the same in both contexts: revise the process until its performance meets a minimum standard of quality.

This work addresses a shortcoming of previous work's (e.g., McBee et al., 2014; McBee et al., 2016) focus on marginal performance metrics such as sensitivity: these metrics implicitly assume the existence of discrete groups of gifted and non-gifted individuals. The concept of giftedness-as-latent-trait has not, in our opinion, been a positive one for our field, and it is therefore unfortunate to reinforce such notions, even unintentionally, via the introduction of mathematical tools for analyzing identification systems. Identification curves relating the true scores to the probability of identification are a superior method of communicating the performance of such systems. With practice, one can easily and accurately estimate the marginal statistics from the curve, but the curve communicates far more information about the likely fate of particular students as they encounter the identification process. The curves reinforce the notion of "giftedness" as a continuously-varying characteristic. There is no discontinuity or "phase change" at the test cutoff, and a student one point below the threshold on the true score is not qualitatively different from a student right above it. Identification curves describe what is likely to happen to both students without pretending that they belong to meaningfully distinct categories.

We have used the word "ability" throughout this manuscript to refer to the construct being considered in deference to the prominence that ability measures maintain in determining gifted eligibility in most states and school districts. However, nothing in the mathematics is specific to ability. The construct could be any measurable quality or trait that forms the basis of a classification decision. For example, the word "ability" could be replaced with "achievement" throughout this manuscript without altering the meaning. Nor is the classification limited to gifted programs specifically. These equations (and software tools) can be used, in principle, to inform understanding of medical diagnosis, special education placement, college admissions, honors programs, tryouts, or auditions.

The derivation of the conditional probability density function for the true scores of identified students makes it possible for gifted program personnel to approach the topic of identification from a different position: to begin by considering the distribution of true scores that is desired or necessary to thrive in the program. This distribution is influenced by the product of the reservoir of available scores in the general population with the identification curve; the distribution of true scores for identified students emerges from this. The typical method for designing systems is to choose the identification and nomination cutoffs based on historical precedent, political considerations, or resource availability. The tools presented in the paper make it easy to examine the consequences of those decisions. A more profitable, defensible, and dare we say ethical approach to identification is to envision first the desired distribution of true scores and work backward to the process necessary to create it. The modal true ability score among identified students is substantially higher than the cutoff, even when the modal observed ability score is at the cutoff. And it is the true score that actually dictates what a student can do and what he or she is ready to learn next.

The derivation of the conditional density for observed scores opens the tantalizing possibility of estimating the marginal psychometrics of an identification process from the distribution of scores for the identified students. We have already begun work on this idea, and initial results are encouraging. The estimate_performance() function represents a preliminary implementation of this concept, using nonlinear curve fitting to produce point estimates and bootstrapping to recover the standard errors. Future research will extend and further evaluate the conditions under which this approach is trustworthy. It may even prove possible to recover the identification curve itself from the distribution of scores of identified students. We caution readers to apply the estimate_performance() function carefully, or not at all, to issues with real-world consequences until the validity of the approach is more formally established.

Acknowledgements

We used R (Version 3.5.1; R Core Team, 2017) and the R packages bindrcpp (Version 0.2.2; Müller, 2017), extrafont (Version 0.17; Winston Chang, 2014), ggplot2 (Version 3.1.0; Wickham, 2009), giftedCalcs (Version 0.1.1; McBee, n.d.), gridExtra (Version 2.3; Auguie, 2017), MASS (Version 7.3.51.1; Venables & Ripley, 2002), minpack.lm (Version 1.2.1; Elzhov, Mullen, Spiess, & Bolker, 2016), mnormt (Version 1.5.5; Azzalini & Genz, 2016), papaja (Version 0.1.0.9842; Aust & Barth, 2018), pbapply (Version 1.3.4; Solymos & Zawadzki, 2018), reshape2 (Version 1.4.3; Wickham, 2007), SimDesign (Version 1.11; Chalmers, 2018), viridis (Version 0.5.1; Garnier, 2017a), and viridisLite (Version 0.3.0; Garnier, 2017b) in this work.

References

Auguie, B. (2017). gridExtra: Miscellaneous functions for "grid" graphics. Retrieved from https://CRAN.R-project.org/package=gridExtra

Aust, F., & Barth, M. (2018). papaja: Create APA manuscripts with R Markdown. Retrieved from https://github.com/crsh/papaja

Azzalini, A., & Genz, A. (2016). The R package mnormt: The multivariate normal and t distributions (version 1.5-5). Retrieved from http://azzalini.stat.unipd.it/SW/Pkg-mnormt

Chalmers, P. (2018). SimDesign: Structure for organizing Monte Carlo simulation designs. Retrieved from https://CRAN.R-project.org/package=SimDesign

Elzhov, T. V., Mullen, K. M., Spiess, A.-N., & Bolker, B. (2016). minpack.lm: R interface to the Levenberg-Marquardt nonlinear least-squares algorithm found in MINPACK, plus support for bounds. Retrieved from https://CRAN.R-project.org/package=minpack.lm

Garnier, S. (2017a). viridis: Default color maps from 'matplotlib'. Retrieved from https://CRAN.R-project.org/package=viridis

Garnier, S. (2017b). viridisLite: Default color maps from 'matplotlib' (lite version). Retrieved from https://CRAN.R-project.org/package=viridisLite

Lindgren, G., Rootzén, H., & Sandsten, M. (2013). Stationary stochastic processes for scientists and engineers. Boca Raton, FL: CRC Press.

McBee, M. (n.d.). GiftedCalcs: Psychometrics for gifted program identification.

McBee, M., Peters, S. J., & Miller, E. M. (2016). The impact of the nomination stage on gifted program identification: A comprehensive psychometric analysis. Gifted Child Quarterly, 60(4), 258–278. https://doi.org/10.1177/0016986216656256

McBee, M., Peters, S. J., & Waterman, C. (2014). Combining scores in multiple-criteria assessment systems: The impact of combination rule. Gifted Child Quarterly, 58(1), 69–89. https://doi.org/10.1177/0016986213513794

McClain, M.-C., & Pfeiffer, S. (2012). Identification of gifted students in the United States today: A look at state definitions, policies, and practices. Journal of Applied School Psychology, 28(1), 59–88.

Müller, K. (2017). bindrcpp: An 'Rcpp' interface to active bindings. Retrieved from https://CRAN.R-project.org/package=bindrcpp

R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Solymos, P., & Zawadzki, Z. (2018). pbapply: Adding progress bar to '*apply' functions. Retrieved from https://CRAN.R-project.org/package=pbapply

Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York: Springer. Retrieved from http://www.stats.ox.ac.uk/pub/MASS4

Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 1–20. Retrieved from http://www.jstatsoft.org/v21/i12/

Wickham, H. (2009). ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag. Retrieved from http://ggplot2.org

Winston Chang. (2014). extrafont: Tools for using fonts. Retrieved from https://CRAN.R-project.org/package=extrafont

Yoon, S. Y., & Gentry, M. (2009). Racial and ethnic representation in gifted programs: Current status of and implications for gifted Asian American students. Gifted Child Quarterly, 53(2), 121–136.