
A Method and R Package for Calculating the Conditional Distribution of Identified Students Under One- and Two-Stage Identification Systems

Matthew T. McBee1, Scott J. Peters2, & Natasha L. Godkin1

1 East Tennessee State University 2 University of Wisconsin-Whitewater

This paper develops methods for calculating the conditional probabilities of identification in one- and two-stage gifted identification systems and derives the conditional distribution functions for the true and observed ability scores of identified students, including the probability density function, the cumulative density function, and the quantile function. The first and second moments of these distributions are presented, and functions for simulating random values from the distribution of identified student scores are implemented via rejection sampling. An accompanying R package and Shiny web application allow these computations to be easily carried out. These methods allow researchers to quantify the efficacy of identification systems for specific students and to better understand the impact of design characteristics on performance.

Keywords: gifted, identification

Two previous papers have provided methods for calculating the sensitivity, incorrect identification rate, and expected program size for multiple-criteria gifted identification systems with no nomination stage (McBee, Peters, & Waterman, 2014) and for single-criteria systems with a nomination stage (McBee, Peters, & Miller, 2016). These metrics are properties at the system level and do not represent the probability of identification for any specific individual; sensitivity and the other metrics are marginal statistics that are averaged over all of the students. Individual probabilities of identification vary as a function of the system characteristics as well as the individual's true scores on the qualities or characteristics that are being assessed. (If the identification system is biased, then identification probabilities vary across other characteristics, such as race or income, as well.)

A specific student with exceptionally high ability may have a high probability of identification even in a system with low sensitivity. A borderline student may have a low chance of entry even under a reasonably sensitive identification system, and a practically zero chance of entry under an insensitive one. These conditional probabilities of identification allow specific outcomes to be examined, lowering the level of analysis from systems to individual students. They reveal specifically who is and is not likely to make the cut. Given the field's continuing efforts to address severe race and income disproportionalities (e.g., Yoon & Gentry, 2009), the insights generated by considering this information could prove to be exceptionally useful.

This paper presents a method for calculating conditional probabilities of identification and derives the conditional probability distribution of true and observed ability scores for identified students. The method applies to single-stage identification systems as well as two-stage systems in which an initial nomination phase is followed by a confirmatory testing stage. The identification probability for an individual can be calculated given knowledge of the system characteristics and the individual's true score(s). When the true score is above the giftedness threshold, the two possible outcomes are a true positive or a false negative; when below it, the possible outcomes are a false positive or a true negative. The identification probability function makes possible the derivation of the conditional probability density function for both the true and observed scores of identified students, as well as the conditional cumulative density and quantile functions. Simulation of random draws from these distributions is implemented via rejection sampling.

An R package and Shiny app developed by the lead author accompanies this paper. The package provides a convenient means for implementing the calculations described in this document. When the package is available on CRAN^1, it can be installed using the following code:

install.packages("giftedCalcs")
library(giftedCalcs)

^1 It is not on CRAN at the time of this writing; we would like feedback from reviewers before submitting it there.

Author note: Matthew T. McBee, Department of Psychology, East Tennessee State University; Scott J. Peters, Department of Educational Foundations, University of Wisconsin-Whitewater. Correspondence concerning this article should be addressed to Matthew T. McBee. E-mail: [email protected]

Before the package is on CRAN, or to access the development version, the package can be installed from GitHub via the install_github() function from the devtools package.

install.packages("devtools")
devtools::install_github("mcbeem/giftedCalcs")
library(giftedCalcs)

Package help can be viewed by running ?giftedCalcs at the console.

Background

In diagnostic testing, one or more cutoffs are placed on a continuous observed score to classify subjects. Cutoffs for gifted identification vary widely (McClain & Pfeiffer, 2012) but are always found on the upper tail of the distribution. We will initially consider the simplest possible diagnostic system, in which all members of a population of interest (i.e., students) are tested via a single confirmatory assessment and classified into two categories (e.g., gifted and non-gifted). We will arbitrarily select the 90th percentile as the location for the cutoff.

Classical test theory (CTT) can be used to analyze the performance of such a system from first principles under its assumptions that observed scores are composed of true score plus error, that the expectation of the errors is zero, and that the errors are independent of the true scores. Following typical CTT notation, we will denote the true score T and the observed score X. We will further assume that the true scores and observed scores follow a multivariate normal distribution. If the scores are on a standardized (z-score) metric, the mean vector (\mu) will consist of a vector of zeros. The covariance matrix (\Sigma) is characterized by the reliability coefficient \rho_{tt}. The square root of this term is the covariance (and correlation, since both variances are one) between the true and observed scores.

\Sigma(X, T) =
\begin{bmatrix}
1 & \sqrt{\rho_{tt}} \\
\sqrt{\rho_{tt}} & 1
\end{bmatrix}    (1)

As indicated, the order of variables in the covariance matrix (equation 1) is the observed score (first row/column) followed by the true score (second row/column).

Conditional Probabilities for Single-Stage Identification Systems

We use Equation A.4 and Theorem A.3 from Lindgren, Rootzén, and Sandsten (2013) to compute the conditional distribution of observed scores given a specific true score. The bivariate case is trivial and could have been derived using the more familiar equations for simple linear regression or from classical test theory, but we present the general equations for the conditional multivariate normal distribution to provide a simple example before delving into the more complex two-stage scenario. The conditional expectation and conditional covariance matrix of the distribution can be computed using the following equations. Note that bold terms indicate vectors or matrices rather than scalar values.

E(X | Y = y) = E(X) + \Sigma_{XY} \Sigma_{YY}^{-1} [y - E(Y)]    (2)

\Sigma(X | Y = y) = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}    (3)

According to equations 2 and 3, the covariance matrix is broken into components. The set of variables to be conditioned on is labeled Y, and the portion of the covariance matrix that involves them is labeled \Sigma_{YY}. The remaining variables, for which we wish to derive the conditional multivariate distribution, are denoted X; the portion of \Sigma containing the covariance matrix of these variables is denoted \Sigma_{XX}. The elements of \Sigma relating X and Y are denoted \Sigma_{XY}.

Applying these equations to the single-stage assessment case, \Sigma is the covariance matrix of the multivariate normal distribution (see equation 1), X is the observed score (because we wish to obtain its conditional distribution), and Y the true score. Therefore, the covariance matrix is structured as:

\Sigma =
\begin{bmatrix}
\Sigma_{XX} & \Sigma_{YX} \\
\Sigma_{XY} & \Sigma_{YY}
\end{bmatrix}    (4)

and the components of equation 4, taken from equation 1, are as follows:

\Sigma_{XX} = 1

\Sigma_{XY} = \Sigma_{YX} = \sqrt{\rho_{tt}}

\Sigma_{YY} = 1

Substituting these components into equations 2 and 3 yields the following:

E(X | T = t) = E(X) + \sqrt{\rho_{tt}} \, [t - E(T)]    (5)

\Sigma(X | T = t) = 1 - \rho_{tt}    (6)

Since the expectations of the observed score (X) and the true score (T) are both zero (due to standardization), the equation for the expectation of the observed score in this case reduces to E(X | T = t) = \sqrt{\rho_{tt}} \, t.
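As a quick numerical cross-check, the scalar single-stage case of equations 2 and 3 can be sketched outside of R. The snippet below is an illustrative Python translation under the paper's CTT assumptions; the function name conditional_moments_single_stage is ours and is not part of the giftedCalcs package.

```python
import math

def conditional_moments_single_stage(t, relyt):
    """Equations 2-3 specialized to the bivariate case of equation 1:
    X = observed score, Y = T = true score, Sigma_XY = sqrt(relyt),
    Sigma_XX = Sigma_YY = 1, and all marginal means are zero."""
    s_xx, s_xy, s_yy = 1.0, math.sqrt(relyt), 1.0
    mean = 0.0 + s_xy * (1.0 / s_yy) * (t - 0.0)   # equation 2
    var = s_xx - s_xy * (1.0 / s_yy) * s_xy        # equation 3
    return mean, var

mean, var = conditional_moments_single_stage(t=1.0, relyt=0.9)
# mean ≈ 0.9486833 (= sqrt(.9)), var ≈ 0.1 (= 1 - .9)
```

These values match the conditional_moments(t.true=1, relyt=.9) output shown in the next section.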

The conditional_moments() function implements the calculations from equations 5 and 6, which are the mean (\mu) and variance (\sigma^2) of the distribution of observed scores given a specific true score. The observed score distribution for a student with a true score of t = 1 (noting that this is a z-score) in a single-stage identification system with a test reliability of .9 has the following moments.

conditional_moments(t.true=1, relyt=.9)

## $conditional.mean
##            [,1]
## t.obs 0.9486833
##
## $conditional.cov
##       [,1]
## t.obs  0.1

In the single-stage case, this distribution is univariate normal. The distribution is given by

\pi(X | T = t) \sim N(\sqrt{\rho_{tt}} \, t, \; 1 - \rho_{tt})    (7)

where N represents the normal distribution, parameterized here by its mean (\mu = \sqrt{\rho_{tt}} \, t) and variance (\sigma^2 = 1 - \rho_{tt}).

The interpretation of this distribution is simple. Given a student with true score t, what distribution of observed scores do we expect? Intuitively, under CTT, we expect the variance of this distribution to be the error variance -- and it is (via equation 6). The student's expected (most likely) observed score is the true score shrunken toward the population average due to regression to the mean, a consequence of measurement error, which implies an imperfect correlation between the true scores and observed scores.

Figure 1. Conditional distribution of the observed score given that a student has a specific true score (t = 124), which is indicated by the vertical reference line. Reliability is set to 0.90 in this example. Units have been converted from z-scores to IQ scores.

The shrinkage of the expectation of the observed scores appears to violate the CTT dictum that the measurement errors have an expectation of zero. The contradiction is resolved when one realizes that zero is only the marginal expectation of measurement errors. The conditional expectation of measurement errors is symmetrically shrunken toward zero on both sides of the distribution of true scores. In other words, the shrinkage on both sides cancels out.

The conditional_moments() function can also be used to obtain the distribution of the true score conditional on the observed score. The user simply provides the value to condition on. The function accepts the true test score (t.true=), the observed test score (t.obs=) or, in the case of two-stage systems, the observed nomination score (n.obs=).

conditional_moments(t.obs=1, relyt=.9)

## $conditional.mean
##             [,1]
## t.true 0.9486833
##
## $conditional.cov
##        [,1]
## t.true  0.1

The probability of identification for a student with true score t can be calculated by integrating the distribution from the identification cutoff (\tau) to positive infinity.

p(\text{identification} | T = t) = \int_{\tau}^{\infty} N(\sqrt{\rho_{tt}} \, t, \; 1 - \rho_{tt}) \, dX    (8)

This quantity can be calculated in R via the pnorm() function using the moments obtained from equations 5 and 6. Alternatively, the giftedCalcs package provides the function conditional_p_id() for convenience. For example, the probability that a student with true ability t = 1.6 will be identified in a single-stage system with a test reliability of .90 and a cutoff at the 90th percentile is as follows:

conditional_p_id(x=1.6, relyt=.9, test.cutoff=.9)

## [1] 0.7725823

Figure 2. The probability of identification for a student with a particular true score is found by calculating the area under that student's conditional observed score distribution that is above the program entry cutoff. The red diamond denotes this student's true score of 124. The test reliability is 0.90 and the identification cutoff is set to 119 (90th percentile).
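Equation 8 reduces to a single normal CDF evaluation. As an illustrative cross-check of the value reported above, the calculation can be sketched with nothing but the error function; the Python snippet below is a translation under the paper's assumptions, not the giftedCalcs implementation, and the helper names are ours.

```python
import math

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    # inverse standard normal CDF by bisection (ample precision here)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if norm_cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def p_id_single_stage(t, relyt, test_cutoff):
    # equation 8: area above tau under N(sqrt(relyt)*t, 1 - relyt)
    tau = norm_quantile(test_cutoff)      # cutoff percentile as a z-score
    mean = math.sqrt(relyt) * t           # equation 5
    sd = math.sqrt(1.0 - relyt)           # square root of equation 6
    return 1.0 - norm_cdf((tau - mean) / sd)

p = p_id_single_stage(t=1.6, relyt=0.9, test_cutoff=0.9)
# ≈ 0.7726, agreeing with conditional_p_id(x=1.6, relyt=.9, test.cutoff=.9)
```

This is exactly the pnorm()-based calculation described in the text, with the cutoff converted from a percentile to a z-score first.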

When t is greater than the cutoff \tau, this equation yields the probability of a true positive; when t is less than \tau, it yields the probability of a false positive. The complement of equation 8 is the probability of not being identified. Depending on whether t exceeds \tau, the complement gives either the probability of a false negative or of a true negative. The shaded area of figure 2 is the quantity calculated in equation 8. In this case, since the student's true score exceeds the cutoff, the probability of identification (0.77) is the true positive probability. The complement (1 - 0.77 = 0.23) is the false negative probability.

Identification curves for single-stage systems

Figure 3 displays these conditional probabilities of identification for students with a range of true scores under different test reliability conditions. An identification curve depicts p(identification | T), as opposed to the probability for a specific choice of T, which has been denoted T = t throughout this paper. These curves (hereafter known as "identification curves") bear more than a striking resemblance to the item characteristic curves that are a hallmark of item response theory (IRT) models. They are, in fact, mathematically equivalent to the normal ogive IRT model, as both are based on the cumulative normal distribution. The "discrimination," or maximal slope of the curve, is a function of the test reliability. The "difficulty," or location of the 50% identification threshold, is a function of both the identification cutoff and the test reliability.

Figure 3. Conditional probability of identification by true score and test reliability (0.5 to 0.9) for a single-stage identification system.

Figure 4 illustrates how classification decisions are represented on identification curves. The area below the curve represents those who are identified; the area above it represents those who are not. The area to the right of the vertical reference line represents those whose true scores exceed the identification cutoff; the area to the left, those whose scores are below it.

Figure 4. Representation of the possible diagnostic test classifications (true positives, false negatives, false positives, and true negatives) on identification curves. As test reliability increases, the slope of the curve becomes steeper, reducing the rate of classification errors.

Calculating identification curve parameters for single-stage systems

An alternative method for calculating the conditional probability of identification exploits the equivalence between the normal ogive IRT model and the previous method. The identification cutoff \tau and the test reliability \rho_{XX} are reparameterized as difficulty and discrimination coefficients. This method will prove to be more convenient for performing these calculations in the two-stage system case.

The difficulty (b) and discrimination (a) coefficients can be calculated as follows:

b = \tau / \sqrt{\rho_{XX}}    (9)

a = \sqrt{ \rho_{XX} / (1 - \rho_{XX}) }    (10)

The identification probability for an individual with confirmatory test true score t and identification system parameters a and b is given by:

p(\text{identification} | T = t) = \int_{a(b - t)}^{\infty} N(0, 1) \, dZ    (11)

Equation 11 is simply a reparameterization of equation 8 as a standard cumulative normal distribution; the integral equals \Phi(a(t - b)). This is the approach implemented in the conditional_p_id() function. An identification curve can be created for a specific identification system by calculating the values b and a via equations 9 and 10 and then plotting a range of values of the true score t against probabilities calculated using equation 11. The following code chunk illustrates how an identification curve can be plotted in R. In this case, the curve is displayed for a single-stage identification system with a test reliability of 0.92 and a cutoff at the 97.5th percentile.

t <- seq(from=0, to=4, by=.05)
p <- conditional_p_id(x=t, relyt=.92,
  test.cutoff=.975)
plot(t, p, type="l")

Conditional Probabilities for Two-Stage Identification Systems

In a two-stage identification system, students must first qualify on a screening test before being able to take the confirmatory test. The screening test can be a formal test or an informal recommendation. Therefore, we refer to this initial step as a nomination stage. Students must achieve qualifying scores on both the nomination and confirmatory tests in order to be identified. A negative classification at either step will prevent a student from qualifying. Students who are actually qualified can experience a false negative at either stage, and the risk of this is particularly high when the nomination process has low validity. Conversely, unqualified students must experience two consecutive false positive errors in order to be identified. Thus, nomination stages reduce both sensitivity and the false positive rate. Whether this is desirable or not depends on the context, but the impact on sensitivity can be severe (McBee et al., 2016).

Under classical test theory, both the nomination and confirmatory assessments have true- and observed-score instantiations. Let the subscripts X and T refer to the observed score and true score, respectively, while the variable N denotes the score on the nomination instrument and C the score on the confirmatory test. For example, N_T would be the nomination true score; C_X would be the confirmatory test observed score.

Via CTT, we assume that the joint distribution of these four components is multivariate normal. Students are actually^2 gifted if and only if their confirmatory test true score exceeds the identification cutoff, but they are positively identified if both their nomination and confirmatory test observed scores are above their respective cutoffs.

^2 We hesitate to use the more natural phrase "truly gifted" due to the unfortunate baggage this phrase carries within the field.

The covariance matrix for the nomination and confirmatory test true and observed scores was given by McBee et al. (2016, p. 264). It is reproduced here with the terms presented in an alternate order.

\Sigma(N_X, N_T, C_X, C_T) =
\begin{bmatrix}
1 & \sqrt{\rho_{NN}} & r_{C_X N_X} & \frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} \\
\sqrt{\rho_{NN}} & 1 & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}}} & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}} \sqrt{\rho_{CC}}} \\
r_{C_X N_X} & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}}} & 1 & \sqrt{\rho_{CC}} \\
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} & \frac{r_{C_X N_X}}{\sqrt{\rho_{NN}} \sqrt{\rho_{CC}}} & \sqrt{\rho_{CC}} & 1
\end{bmatrix}

The nomination true score plays no role in determining either giftedness or identification. It can therefore be eliminated from further consideration. The second row/column of the matrix is dropped, leaving the following:

\Sigma(N_X, C_X, C_T) =
\begin{bmatrix}
1 & r_{C_X N_X} & \frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} \\
r_{C_X N_X} & 1 & \sqrt{\rho_{CC}} \\
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} & \sqrt{\rho_{CC}} & 1
\end{bmatrix}

The component that we wish to condition on is C_T, the confirmatory test true score. It is located in the last row/column. Its variance becomes the term \Sigma_{YY} in the decomposition.

\Sigma_{YY} = 1

We wish to obtain the multivariate distribution of N_X and C_X given C_T = t, because this determines the probability of identification in a two-stage system. Therefore, the covariance matrix is decomposed into the following components.

\Sigma_{XX} =
\begin{bmatrix}
1 & r_{C_X N_X} \\
r_{C_X N_X} & 1
\end{bmatrix}

The term \Sigma_{XY} consists of the covariances between the elements of X and Y.

\Sigma_{XY} =
\begin{bmatrix}
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} \\
\sqrt{\rho_{CC}}
\end{bmatrix}

And \Sigma_{YX} is the transpose of \Sigma_{XY}.

\Sigma_{YX} = \Sigma_{XY}^{\top} =
\begin{bmatrix}
\frac{r_{C_X N_X}}{\sqrt{\rho_{CC}}} & \sqrt{\rho_{CC}}
\end{bmatrix}

Solving equations 2 and 3 with these components yields the conditional mean vector (\mu_{T=t}) and the conditional covariance matrix (\Sigma_{T=t}) of the joint multivariate normal distribution of the nomination observed score and confirmatory test observed score. These calculations can be performed with the conditional_moments() function. A two-stage system is implemented when a nomination validity value is supplied to the function via the valid= argument. The conditional moments for a student with a true ability score of t = 1.4 in an identification system with a test reliability of .90 and a nomination validity of .50 are as follows.

conditional_moments(t.true=1.4, relyt=.9, valid=.5)

## $conditional.mean
##            [,1]
## n.obs 0.7378648
## t.obs 1.3281566
##
## $conditional.cov
##           n.obs t.obs
## n.obs 0.7222222   0.0
## t.obs 0.0000000   0.1

The returned $conditional.mean value is the (2 x 1) mean vector, where the first element is the student's mean observed score on the nomination and the second is the student's mean observed score on the confirmatory test. The $conditional.cov value is the (2 x 2) variance-covariance matrix, with the variances on the diagonal and the covariance on the off-diagonal. The (1, 1) element is the variance of the observed nomination score, the (2, 2) element is the variance of the observed confirmatory test score, and the covariance is zero in this case.^3

^3 The covariance is zero because the confirmatory test and nomination are independent after conditioning on the true score.

A conditional multivariate normal distribution with these moments is visualized in Figure 5. Though the conditional mean vector contains the most likely values of the nomination and test observed scores, measurement error and imperfect nomination validity imply that many plausible observed scores could be obtained for this student. The coloration in the figure describes the relative likelihood of any nomination-test observed score pairing. The output from conditional_moments() showed that the nomination variance was much higher than the test variance. This is displayed by the elliptical shape of the distribution, which has much more "spread" with respect to the nomination than with respect to the test.

Figure 5. Heat map of the conditional multivariate normal distribution of the confirmatory test observed score (y-axis) and the nomination observed score (x-axis), conditional on a true test score of z = 1.4, under a two-stage identification system with a nomination validity of 0.5 and a test reliability of 0.9. The reference lines display test and nomination cutoffs at the 90th and 85th percentiles, respectively.

The student is identified if both of these observed scores exceed their respective cutoffs. The reference lines on the figure display a test cutoff at the 90th percentile and a nomination cutoff at the 85th percentile. The proportion of the area under this multivariate distribution that lies above both cutoffs is this student's conditional probability of identification, which is quite low in this case even though the student's true ability score (t = 1.4) is substantially above the cutoff (\tau = 1.28). This student is likely to experience a false negative identification outcome.

The identification probability for a student with confirmatory test true score t is computed by calculating the area under this distribution that is simultaneously above the nomination cutoff (\nu) and the confirmatory test cutoff (\tau), both of which are expressed as z-scores.

p(\text{identified} | C_T = t) = \int_{\tau}^{\infty} \int_{\nu}^{\infty} N_2(\mu_{T=t}, \Sigma_{T=t}) \, dN_X \, dC_X    (12)

Figure 6 attempts to illustrate the meaning of equation 12. The joint distribution of the nomination observed score, confirmatory test true score, and confirmatory test observed score is a three-dimensional multivariate normal distribution, represented by the elliptical shape in the figure. Choosing a particular value of the confirmatory test true score is akin to taking a two-dimensional slice through that distribution, and the slice is itself a two-dimensional multivariate normal distribution. The area of that distribution which is simultaneously above both the nomination and test cutoffs is the probability of identification.
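Because the conditional covariance between the nomination and test observed scores is zero, the double integral in equation 12 factorizes into two univariate tail probabilities, and the same answer can be reached either from the conditional moments or from the normal-ogive reparameterization developed in the next subsection (equations 13-16). The Python sketch below is an illustrative translation under the paper's assumptions, not the giftedCalcs code (all function names are ours); it reproduces the conditional moments reported above and confirms that the two routes agree for the Figure 5 student.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    for _ in range(200):                      # bisection on the CDF
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if norm_cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

def two_stage_moments(t, relyt, valid):
    """Equations 2-3 with Sigma_XY = [valid/sqrt(relyt), sqrt(relyt)]:
    conditional moments of (N_X, C_X) given C_T = t, Sigma_YY = 1."""
    s = [valid / math.sqrt(relyt), math.sqrt(relyt)]
    means = [s[0] * t, s[1] * t]
    variances = [1.0 - s[0] ** 2, 1.0 - s[1] ** 2]
    covariance = valid - s[0] * s[1]          # works out to zero
    return means, variances, covariance

def p_id_from_moments(t, relyt, valid, test_cutoff, nom_cutoff):
    # equation 12; the zero covariance lets the double integral factor
    tau, nu = norm_quantile(test_cutoff), norm_quantile(nom_cutoff)
    means, variances, _ = two_stage_moments(t, relyt, valid)
    p_nom = 1.0 - norm_cdf((nu - means[0]) / math.sqrt(variances[0]))
    p_test = 1.0 - norm_cdf((tau - means[1]) / math.sqrt(variances[1]))
    return p_nom * p_test

def p_id_from_curves(t, relyt, valid, test_cutoff, nom_cutoff):
    # product-of-ogives reparameterization (equations 13-17)
    tau, nu = norm_quantile(test_cutoff), norm_quantile(nom_cutoff)
    b_c, a_c = tau / math.sqrt(relyt), math.sqrt(relyt / (1.0 - relyt))
    b_n = nu * math.sqrt(relyt) / valid
    q = valid ** 2 / relyt
    a_n = math.sqrt(q / (1.0 - q))
    return norm_cdf(a_n * (t - b_n)) * norm_cdf(a_c * (t - b_c))

means, variances, cov = two_stage_moments(t=1.4, relyt=0.9, valid=0.5)
# means ≈ [0.7378648, 1.3281566]; variances ≈ [0.7222222, 0.1]; cov = 0
p1 = p_id_from_moments(1.4, 0.9, 0.5, 0.9, 0.85)
p2 = p_id_from_curves(1.4, 0.9, 0.5, 0.9, 0.85)
# the two methods agree, and the probability is low (roughly 0.2)
# even though the student's true score exceeds the test cutoff
```
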

Figure 6. The elliptical shape represents the multivariate normal distribution of the confirmatory test true score, confirmatory test observed score, and screening test observed score. A bivariate normal distribution of the nomination observed score and the confirmatory test observed score is implied by every choice of the confirmatory test true score. One can imagine the plane at t sliding along the distribution. For each choice of t, the parameters of the resulting bivariate normal distribution, indicated by the intersection of the plane and the ellipsoid, can be computed using equations (2) and (3). The probability of identification, shaded here in red, can then be computed using equation (12) or (17).

The conditional_p_id() function calculates these conditional probabilities for two-stage systems when values are passed to the nomination validity (valid=) and nomination cutoff (nom.cutoff=) arguments. For example, the identification probability of a student with a true score of t = 1.5 in a two-stage system with a nomination cutoff at the 80th percentile, a nomination validity of 0.5, a test reliability of 0.9, and a test cutoff at the 90th percentile is calculated as follows:

conditional_p_id(x=1.5, relyt=.9,
  test.cutoff=.9, valid=.5, nom.cutoff=.8)

## [1] 0.3202376

Identification curves for two-stage systems

Figure 7 shows a set of identification curves for a two-stage identification process under various levels of nomination validity and nomination cutoff. Each panel features a curve for universal screening, in which the nomination phase is eliminated and all students take the confirmatory test. These curves are identical across all the panels because the nomination validity is irrelevant if every student is (effectively) nominated.

Figure 7. Probability of identification in a two-stage identification system by confirmatory test true score, nomination cutoff, and nomination validity. All plots assume a test reliability of 0.95 and a test cutoff at the 90th percentile, as indicated by the vertical reference line. Panels: nomination validity (0.4 to 0.9). Color and shape: nomination cutoff (universal screening, 70th, 80th, and 90th percentiles).

The combination of low validity with a high nomination cutoff results in a broad, relatively shallow identification curve, in which even students of very high ability have little chance of identification. The IRT comparison again becomes salient: many of these curves have the character of very poor items that would likely be removed or revised during test development -- and yet they typify many existing gifted identification systems, which feature low nomination validity paired with a high nomination cutoff.

As the nomination validity increases, higher nomination cutoffs can be tolerated while preserving relatively steep curves. The identification curves converge toward the universal screening curve as this happens. An animation of these curves changing with nomination validity can be found at the following URL: http://goo.gl/3DWDWj (see file curves.gif).

Calculating identification curve parameters for two-stage systems

There is a simpler method for calculating identification probabilities that is, once again, based on a correspondence with the normal ogive IRT model. The two-stage system identification curves are a composite of two curves: one related to the confirmatory test, and the other related to the nomination assessment. Each of these curves is defined by "difficulty" (b) and "discrimination" (a) coefficients that can be calculated from four pieces of information: the nomination cutoff (\nu), the test cutoff (\tau), the nomination validity coefficient (r_{C_X N_X}), and the test reliability (\rho_{CC}). (The nomination reliability plays no role other than imposing a ceiling on the validity coefficient.)

Figure 8. The two-stage system identification curve is the pointwise product of the confirmatory test curve and the nomination curve. In this example, the nomination validity is 0.6, the test reliability is 0.9, the nomination cutoff is at the 90th percentile, and the test cutoff is at the 90th percentile, as indicated by the vertical reference line.

The difficulty and discrimination coefficients for the confirmatory test are given by the following equations. Note that these are identical to equations 9 and 10 except for the notation, which has been altered to be consistent with the two-stage context.

b_C = \tau / \sqrt{\rho_{CC}}    (13)

r ρ a = CC (14) students. Bayes’ theorem for probability distributions pro- C 1 − ρ CC vides a means of transforming the probability of identification given the true score, p(identified|T) into the distribution of The equations for the nomination process are given below. true scores given identification, p(T|identified). √ ν ρCC Bayes’ theorem for probability distributions is written in its bN = (15) generic form as rCX NX

v π(θ) f (x|θ) t 2 r /ρCC π(θ|x) = CX NX f (x) aN = (16) 1 − (r2 /ρ ) CX NX CC where π(θ|x) is the conditional of θ The probability of identification is the product of the proba- given x, π(θ) is the marginal distribution of θ, f (x|θ) is the bility of being nominated and the probability of passing the conditional distribution of x given θ, and f (x) is the marginal confirmatory test, as shown in equation 17. distribution of x, which is itself given by

(p |T = t) = Z ∞ identification | Z ∞ Z ∞ f (x) = π(θ) f (x θ) dθ −∞ N(0, 1) dCT N(0, 1) dCT (17) a (t−b ) a (t−b ) | N N {z } | C C {z } The denominator term f (x) is a normalizing constant which nomination confirmatory test pass probability pass probability is necessary to ensure that π(θ|x) is a true probability distribu- tion with a total area of one. In many applications of Bayes’ This is the method that is implemented in the theorem, f (x) is intractable or inconvenient to obtain. The conditional_p_id() function. Evaluating the product of numerator, despite not being itself a true probability distri- the two one-dimensional integrals in equation 17 is faster and bution, is still exceptionally useful. It differs from the true more numerically stable than evaluating the double integral distribution only by a constant scaling factor and thus has in equation 12. the same shape as the true distribution. As a result it can answer most questions that one would like to answer from this distribution. The true probability distribution is therefore Conditional Ability True Score Distribution for not equal to the numerator, but is proportional to it. Identified Students

One of the many applications of an identification curve is to π(θ|x) ∝ π(θ) × f (x|θ) |{z} |{z} |{z} use it to understand the ability distribution of the identified posterior prior likelihood CONDITIONAL DISTRIBUTION 9

In Bayesian terminology, π(θ) is called the prior distribution of θ and typically represents a state of belief about the relative likelihood of different values of θ prior to encountering the data. The term f(x|θ) is the likelihood of the data x given θ. All of the relevant information in the data is contained in this term. These two components are multiplied together, and their product is proportional to the posterior distribution, π(θ|x), the updated belief about the distribution of θ that combines information from the prior with information from the data.

Regarding gifted identification, the "prior" is the marginal distribution of ability (or whatever construct is assessed via the confirmatory test). Let us regard it as a standard normal distribution, representing the distribution of ability or achievement in the general population. It is the "prior" in the sense that it is the distribution that we expect for ability before we know whether the students are identified. The "likelihood" is the identification curve. The denominator is simply the marginal probability of identification.

π(T|identification) = f(T) × p(identification|T) / p(identification)    (18)

π(T|identification) ∝ f(T) × p(identification|T)    (19)

We refer to the quantity in equation 18 as the normalized conditional probability distribution and the quantity in equation 19 as the unnormalized conditional probability density.

Since p(identification|T) is not a probability distribution, the unnormalized conditional distribution π(T|identification) from equation 19 can be approximated by the pointwise product of a normal probability density function and the identification curve. Figure 9 illustrates this process. Select a set of points (true scores) for evaluation, represented in the figure as tick marks along the x-axis. Evaluate the standard normal pdf at each of these values (perhaps using R's dnorm() function) and the identification curve at these values via equations 8 or 11 for single-stage systems, or equations 12 or 17 for two-stage systems. (One can use the two-stage equations to calculate identification probabilities for single-stage systems by setting the nomination cutoff percentile to a value very close to zero.) The resulting values to be multiplied together are indicated by open circles on the normal distribution and the identification curve in the figure.

Figure 9. The conditional distribution of true scores for identified students can be approximated by evaluating the normal probability density function and the identification curve at a set of true score values. These values are multiplied together pointwise to approximate the distribution.

Another way to think about equation 18 that may be more intuitive to readers who are relatively unfamiliar with Bayesian logic is to view p(identification|T), the identification curve, as a weighting function applied to the normal distribution f(T). The resulting probability density is the conditional distribution of ability true scores for identified students.

The function d_identified() calculates the normalized (normalize=TRUE) or unnormalized (normalize=FALSE) density given the true score and the four relevant system parameters: the test reliability, the test cutoff, the nomination validity, and the nomination cutoff. The density of the true score (rather than the observed score) is returned so long as the relyt= argument is specified. For example, the normalized probability density of a true score of t = 2 in a two-stage identification system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination validity of 0.6, and a nomination cutoff at the 90th percentile is calculated as follows:

d_identified(x=2, relyt=.95,
             test.cutoff=.9, valid=.6, nom.cutoff=.9,
             normalize=TRUE)

## [1] 0.6556902

This value itself has no absolute interpretation, but the probability density can be visualized by plotting the density against a set of possible true scores; the approximation becomes increasingly accurate as the number of evaluation points is increased. The following code illustrates how such a plot can be made. Plotting the pointwise products against the chosen set of values of the true score provides a visual depiction of the conditional distribution of true scores for identified students. An example is shown in Figure 10.

# create a vector of true scores
tscore <- seq(-2, 4, by=.01)
# density at each true score
p <- d_identified(x=tscore, relyt=.9,
                  test.cutoff=.9, valid=.6, nom.cutoff=.9,
                  normalize=TRUE)
# plot the curve
plot(x=tscore, y=p, type="l")
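The pointwise-product approximation of Figure 9 is easy to reproduce from scratch without giftedCalcs. The sketch below assumes a classical true-score model in which the observed score given the true score is distributed X|T ~ N(√ρ·T, 1 − ρ), where ρ is the test reliability, so that the single-stage identification curve is 1 − Φ((τ − √ρ·T)/√(1 − ρ)) with τ the test cutoff on the z-score metric; the exact functional form used by the package may differ, but the approximation logic is the point.

```r
# Assumed single-stage identification curve under a classical true-score
# model: observed score X | T ~ Normal(sqrt(relyt) * T, 1 - relyt)
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)  # cutoff expressed as a z-score
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# Evaluate the normal pdf and the identification curve on a grid of true
# scores, then multiply pointwise (the process illustrated in Figure 9)
tscore <- seq(-4, 4, by = .01)
unnorm <- dnorm(tscore) * p_id_curve(tscore, relyt = .9, test.cutoff = .9)

# Normalizing by the marginal identification rate (a Riemann sum here)
# yields the conditional density of true scores for identified students
id_rate <- sum(unnorm) * .01
dens <- unnorm / id_rate
```

As expected, dens integrates to one and its mass sits well above the general-population mean.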

Figure 10. Conditional probability density of true scores for identified students under universal screening. The identification cutoff is represented by the vertical reference line.

Figure 10 displays an example of the density of true scores for identified students. The shape of the curve is interesting. In an idealized perfect identification system, the identification curve would be a step function, and the ability distribution of identified students would follow a truncated normal distribution. In an idealized completely nondiagnostic system, the identification curve would be flat, and the ability distribution of identified students would follow a normal distribution. Actual identification systems lie between these extremes, and the resulting conditional distribution is likewise a compromise between two known distributions.

Figure 11 compares the unnormalized densities of identified student true scores for a one-stage versus a two-stage system. Because the densities are not normalized, the areas they enclose are equal to the expected identification rates under each system. The combination of a high nomination cutoff and low nomination validity in the two-stage system dramatically reduces its sensitivity, resulting in a much lower identification rate. Figure 12 shows a relative comparison of the identification curves for two two-stage systems versus universal screening. In both two-stage systems, the impact of adding the nomination stage falls most heavily on students just above the cutoff, but the overall harm is far more severe for the ineffective identification system, whose nomination cutoff is much higher than its validity justifies.

Figure 12. Comparison of ineffective and effective two-stage identification systems versus a single-stage system (universal screening). Universal screening is treated as the baseline and is represented by the horizontal line at p = 1. The plotted lines display how the identification curves differ from baseline. The deleterious impact of the nomination stage is most severe for students whose true scores are just above the cutoff. Nomination parameters: validity = .50 with a nomination cutoff at the 90th percentile (ineffective); validity = .85 with a nomination cutoff at the 70th percentile (effective).
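The two idealized extremes described above can be checked numerically. In this base-R sketch (no giftedCalcs functions are used), a step-function identification curve produces a truncated normal conditional density, while a flat curve returns the standard normal unchanged; note that the flat curve's height (arbitrarily 0.2 here) cancels in the normalization.

```r
tau <- qnorm(.9)                    # identification cutoff as a z-score
t   <- seq(-4, 4, by = .001)        # grid of true scores

step_curve <- as.numeric(t >= tau)  # idealized perfect system
flat_curve <- rep(.2, length(t))    # idealized nondiagnostic system

# Normalize a pointwise product to a proper density via a Riemann sum
normalize <- function(w) w / (sum(w) * .001)
d_step <- normalize(dnorm(t) * step_curve)  # ~ truncated normal above tau
d_flat <- normalize(dnorm(t) * flat_curve)  # ~ standard normal
```

Comparing d_step against dnorm(t)/(1 − pnorm(tau)) above the cutoff, and d_flat against dnorm(t), confirms the two limiting cases.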

Figure 11. Comparison of unnormalized probability densities for a single-stage identification system (universal screening) versus a two-stage system with a nomination validity of 0.5 and a nomination cutoff at the 90th percentile. In both cases, the test reliability is 0.9 and the test cutoff is at the 90th percentile. The area under each curve is equal to the identification rate, which is much higher under universal screening.

Other distribution functions for the true scores of identified students

The giftedCalcs package contains three other conditional distribution functions: the cumulative density function, p_identified(); the quantile function, q_identified(); and random sampling, r_identified().

The cumulative density function, Π(T|identification), is calculated by the p_identified() function. It is the probability that an identified student has a true ability score smaller than the specified score. The conditional cumulative density function (cdf) is given by:

Π(T|identification) = ∫_{−∞}^{T} π(T|identification) dT    (20)

Like the cumulative normal density on which it is based, this equation has no closed-form expression. In substantive terms, the cdf describes what proportion of identified students have true ability scores below the specified value. In other words, given a true score, the cdf returns that score's percentile given the identification system parameters. For example, the percentile for a student with true ability t = 2 in a two-stage system with a test reliability of 0.9, a test cutoff at the 90th percentile, a nomination validity of 0.5, and a nomination cutoff at the 80th percentile is computed as follows:

p_identified(x=2, relyt=.9, test.cutoff=.9,
             valid=.5, nom.cutoff=.8)

## [1] 0.7009637

This student is at the 98th percentile of general ability but is only at the 70th percentile among identified students in this program.

Figure 14. Quantile function of true scores for identified students in a two-stage system with a test reliability of 0.95, a test cutoff at the 90th percentile, a nomination validity of 0.70, and a nomination cutoff at the 90th percentile. The identification cutoff is represented by the horizontal reference line.
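The conditional cdf, and its inverse, can also be sketched from scratch with integrate() and uniroot(). The sketch below assumes the classical single-stage true-score model (observed X|T ~ N(√ρ·T, 1 − ρ)) rather than calling the package, whose p_identified() and q_identified() additionally handle two-stage systems.

```r
# Assumed identification curve for a single-stage system
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# Conditional cdf (equation 20): integrate the unnormalized density up to q,
# then divide by the marginal identification rate
cdf_identified <- function(q, relyt, test.cutoff) {
  dens <- function(t) dnorm(t) * p_id_curve(t, relyt, test.cutoff)
  integrate(dens, -Inf, q)$value / integrate(dens, -Inf, Inf)$value
}

# Quantile function: invert the cdf by one-dimensional root finding,
# mirroring how q_identified() wraps uniroot() around p_identified()
quantile_identified <- function(p, relyt, test.cutoff) {
  uniroot(function(q) cdf_identified(q, relyt, test.cutoff) - p,
          lower = -6, upper = 6, tol = 1e-8)$root
}
```

Round-tripping a percentile through the two functions recovers it, which is a useful sanity check on any implementation of this pair.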

Figure 13. Conditional cumulative density of true scores for identified students under universal screening with a test reliability of 0.95. The identification cutoff is represented by the vertical reference line.

The quantile function is the inverse of the cumulative density function. Given a percentile, it returns the corresponding ability true score. The q_identified() function employs the uniroot() one-dimensional root-finding function from base R to find the input to the p_identified() function that returns the desired percentile. For example, the true score representing the 70th percentile among identified students in a two-stage system with a test reliability of 0.9, a test cutoff at the 90th percentile, a nomination validity of 0.5, and a nomination cutoff at the 80th percentile is computed as follows:

q_identified(p=.7, relyt=.9, test.cutoff=.9,
             valid=.5, nom.cutoff=.8)

## [1] 1.998423

Converted to an IQ metric, this is a score of 130.

Random generation

The r_identified() function can sample random values from the distribution of identified students. This could be useful for simulation or for illustrative purposes. The function is based on rejection sampling as implemented by the SimDesign package. Like the other functions, it simulates data from a two-stage identification system if values are provided for arguments valid= and nom.cutoff=. True scores are simulated if a value is provided for argument relyt=; otherwise observed scores are simulated. The following code samples n = 5 values for identified students from a two-stage system with a test reliability of 0.95, a test cutoff at the 97.5th percentile, a nomination validity of 0.7, and a nomination cutoff at the 88th percentile.

r_identified(n=5, test.cutoff=.975, relyt=.95,
             valid=.7, nom.cutoff=.88)

## [1] 2.161651 2.061438 2.609705 1.826694 1.865042

Moments of the true score distribution for identified students

The moments of the conditional probability density function for identified students can be derived from basic probability theory.

Expected value. The expected value of the true scores for identified students is given by:

E(T|identified) = ∫_{−∞}^{∞} T π(T|identified) dT    (21)

It can be calculated via the mean_identified() function. For example, the mean ability true score for students identified in a single-stage system with a test reliability of 0.92 and a cutoff at the 97th percentile can be computed as follows:

mean_identified(relyt=.92, test.cutoff=.97)

## [1] 2.175452
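The integral in equation 21 can be evaluated numerically with integrate(). The sketch below again assumes the classical single-stage true-score model (observed X|T ~ N(√ρ·T, 1 − ρ)) rather than calling giftedCalcs; under that assumed model the first moment coincides with the mean_identified() output shown above, and the second moment yields the standard deviation described next.

```r
# Assumed identification curve for a single-stage system
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# k-th raw moment of the conditional true-score distribution
# (equation 21 is the case k = 1), computed as a ratio of integrals
moment_identified <- function(k, relyt, test.cutoff) {
  dens <- function(t) dnorm(t) * p_id_curve(t, relyt, test.cutoff)
  integrate(function(t) t^k * dens(t), -Inf, Inf)$value /
    integrate(dens, -Inf, Inf)$value
}

m1 <- moment_identified(1, relyt = .92, test.cutoff = .97)  # approx 2.175
m2 <- moment_identified(2, relyt = .92, test.cutoff = .97)
sd_id <- sqrt(m2 - m1^2)  # standard deviation from the first two moments
```

Here m1 reproduces the mean_identified() value of 2.175452 to numerical precision.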
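The rejection-sampling strategy behind r_identified() can likewise be sketched directly: draw candidate true scores from the marginal N(0, 1) distribution and accept each with probability equal to the identification curve, so that the accepted draws follow the conditional distribution. The curve assumed here is the single-stage classical-model curve, not the giftedCalcs implementation.

```r
# Assumed identification curve for a single-stage system
p_id_curve <- function(tscore, relyt, test.cutoff) {
  tau <- qnorm(test.cutoff)
  1 - pnorm((tau - sqrt(relyt) * tscore) / sqrt(1 - relyt))
}

# Rejection sampler: keep each N(0,1) draw with probability p(id | T)
r_identified_sketch <- function(n, relyt, test.cutoff) {
  draws <- numeric(0)
  while (length(draws) < n) {
    cand <- rnorm(n)                       # candidate true scores
    keep <- runif(n) < p_id_curve(cand, relyt, test.cutoff)
    draws <- c(draws, cand[keep])
  }
  draws[seq_len(n)]
}

set.seed(42)
sim <- r_identified_sketch(5000, relyt = .92, test.cutoff = .97)
```

The sample mean of sim sits close to the expected value of about 2.18 reported for these settings, corroborating the sampler against the moment calculation.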

Standard deviation. The variance of true ability scores is given by:

Var(T|identified) = ∫_{−∞}^{∞} T² π(T|identified) dT − [E(T|identified)]²    (22)

and the standard deviation is the square root of the variance:

SD(T|identified) = √Var(T|identified)    (23)

This calculation is implemented in the sd_identified() function. For example, the standard deviation of true ability scores for identified students in the single-stage system used in the previous example can be calculated as follows:

sd_identified(relyt=.92, test.cutoff=.97)

## [1] 0.4380773

Because the conditional density of true scores is skewed, the mean and standard deviation do not together provide a complete description of the distribution, as they do with the normal distribution. Still, it can be surprising to see just how much variability there is even among identified students. For example, the standard deviation of true ability scores in a single-stage system with a 90th percentile cutoff and a reliability of 0.90 is 0.50. Even this restricted range of students has half the variability of typical students.

Conditional Distribution of Observed Scores

Identification curve and probability density for observed scores

An identification curve relating the observed test score to the probability of identification can also be imagined. In this case, the quantity to be calculated is p(identification|X), which can also be computed using the conditional_p_id() function. One simply does not specify a value for the test reliability relyt, causing the function to interpret the value of x as an observed score rather than a true score. (In fact, nearly all of the functions in the giftedCalcs package work in this manner.) In the example code below, a vector of two observed test z-scores (1.2 and 1.3) is passed to the function. The first is below the 90th percentile cutoff, the second above it. The function returns a zero and a one, indicating that the first student has a zero probability of identification while the second student is certain to be identified.

conditional_p_id(x=c(1.2, 1.3),
                 test.cutoff=.9)

## [1] 0 1

In the single-stage process, all individuals with observed test scores above the identification cutoff are identified. Thus, the curve relating the observed score to the probability of identification is a step function, as shown in Figure 15. This curve does not depend on the test's reliability; reliability dictates the degree of similarity between the true scores and the observed scores.

Figure 15. In a single-stage system, the identification curve relating the observed test score to the probability of identification is a step function.

The probability density function for observed scores is given by:

π(X|identification) = f(X) × p(identification|X) / p(identification)    (24)

π(X|identification) ∝ f(X) × p(identification|X)    (25)

The resulting probability density of observed scores under a one-stage system is a truncated normal distribution, as shown in Figure 16. This is obtained by computing the pointwise product of the step function and the marginal distribution of observed ability scores, and it follows from the step-function character of p(identification|X).

However, in a two-stage system, the identification curve relating observed scores to the probability of identification is not a step function. This is because some of the students who would have had observed scores in the identifiable range do not pass through the nomination stage and therefore have no opportunity to take the confirmatory test.

Figure 18 compares the normalized and unnormalized densities of the observed scores for identified students under universal screening versus an inefficient two-stage system in which the nomination cutoff is too high for the nomination validity. The area enclosed by the unnormalized distributions is proportional to the identification rate under each system.
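For a single-stage system, the observed-score machinery is simple enough to verify by hand: the identification curve is an indicator (step) function at the cutoff, and dividing the weighted normal density by the marginal identification rate, which is 1 − test.cutoff, gives the truncated normal of equation 24. This base-R sketch mirrors, but does not call, conditional_p_id().

```r
# Observed-score identification curve for a single-stage system:
# a step function at the cutoff; reliability plays no role here
p_id_observed <- function(x, test.cutoff) as.numeric(x >= qnorm(test.cutoff))

p_id_observed(c(1.2, 1.3), test.cutoff = .9)  # 0 1, as in the example above

# Conditional density of observed scores (equation 24): the normal density
# weighted by the step function, normalized by the identification rate
d_observed_identified <- function(x, test.cutoff) {
  dnorm(x) * p_id_observed(x, test.cutoff) / (1 - test.cutoff)
}
```

Above the cutoff this is exactly dnorm(x)/(1 − test.cutoff), the truncated normal density; below the cutoff it is zero.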

Figure 16. Conditional probability density of observed scores for identified students under universal screening. The identification cutoff is represented by the vertical reference line.

Figure 17. Identification curve relating the observed test score to the probability of identification for a two-stage identification system. In this example, the nomination validity is 0.5 and both the nomination and test cutoffs are at the 90th percentile.

Shiny Web App for Computing Identification Curves and Marginal Psychometrics

We have produced a Shiny web application for calculating marginal program psychometrics and displaying identification curves. The app can be accessed at https://mmcbee.shinyapps.io/gifted_identification_explorer/.

Figure 19. Screenshot of the Shiny app.

Figure 18. Normalized and unnormalized conditional probability densities of observed scores for identified students under an inefficient two-stage system (blue) versus universal screening (red). The identification cutoff is represented by the vertical reference line.

Discussion

The equations presented in this paper have many potential uses. Identification curves, in particular, may prove helpful in understanding and anticipating the consequences of different system designs. Scholars who know IRT may find many of the identification curves presented in this paper to be almost comically terrible. We agree. We must face the reality that many (most?) gifted identification systems implemented in American schools perform as badly as test items that would immediately be flagged as problematic. The solution is the same in both contexts: revise the process until its performance meets a minimum standard of quality.

This work addresses a shortcoming of previous work's (e.g., McBee et al., 2014; McBee et al., 2016) focus on marginal performance metrics such as sensitivity: these metrics implicitly assume the existence of discrete groups of gifted and non-gifted individuals. The concept of giftedness-as-latent-trait has not, in our opinion, been a positive one for our field, and it is therefore unfortunate to reinforce such notions, even unintentionally, via the introduction of mathematical tools for analyzing identification systems. Identification curves relating the true scores to the probability of identification are a superior method of communicating the performance of such systems. With practice, one can easily and accurately estimate the marginal statistics from the curve, but the curve communicates far more information about the likely fate of particular students as they encounter the identification process. The curves reinforce the notion of "giftedness" as a continuously-varying characteristic. There is no discontinuity or "phase change" at the test cutoff, and a student one point below the threshold on the true score is not qualitatively different from a student right above it. Identification curves describe what is likely to happen to both students without pretending that they belong to meaningfully distinct categories.

We have used the word "ability" throughout this manuscript to refer to the construct being considered in deference to the prominence that ability measures maintain in determining gifted eligibility in most states and school districts. However, nothing in the mathematics is specific to ability. The construct could be any measurable quality or trait that forms the basis of a classification decision. For example, the word "ability" could be replaced with "achievement" throughout this manuscript without altering the meaning. Nor is the classification limited to gifted programs specifically. These equations (and software tools) can be used, in principle, to inform understanding of medical diagnosis, special education placement, college admissions, honors programs, tryouts, or auditions.

The derivation of the conditional probability density function for the true scores of identified students makes it possible for gifted program personnel to approach the topic of identification from a different position: to begin by considering the distribution of true scores that is desired or necessary to thrive in the program. This distribution is influenced by the product of the reservoir of available scores in the general population with the identification curve; the distribution of true scores for identified students emerges from this. The typical method for designing systems is to choose the identification and nomination cutoffs based on historical precedent, political considerations, or resource availability. The tools presented in the paper make it easy to examine the consequences of those decisions. A more profitable, defensible, and dare we say ethical approach to identification is to envision first the desired distribution of true scores and work backward to the process necessary to create it. The modal true ability score among identified students is substantially higher than the cutoff, even when the modal observed ability score is at the cutoff. And it is the true score that actually dictates what a student can do and what he or she is ready to learn next.

The derivation of the conditional density for observed scores opens the tantalizing possibility of estimating the marginal psychometrics of an identification process from the distribution of scores for the identified students. We have already begun work on this idea, and initial results are encouraging. The estimate_performance() function represents a preliminary implementation of this concept, using nonlinear curve fitting to produce point estimates and bootstrapping to recover the standard errors. Future research will extend and further evaluate the conditions under which this approach is trustworthy. It may even prove possible to recover the identification curve itself from the distribution of scores of identified students. We caution readers to apply the estimate_performance() function carefully, or not at all, to issues with real-world consequences until the validity of the approach is more formally established.

Acknowledgements

We used R (Version 3.5.1; R Core Team, 2017) and the R packages bindrcpp (Version 0.2.2; Müller, 2017), extrafont (Version 0.17; Winston Chang, 2014), ggplot2 (Version 3.1.0; Wickham, 2009), giftedCalcs (Version 0.1.1; McBee, n.d.), gridExtra (Version 2.3; Auguie, 2017), MASS (Version 7.3.51.1; Venables & Ripley, 2002), minpack.lm (Version 1.2.1; Elzhov, Mullen, Spiess, & Bolker, 2016), mnormt (Version 1.5.5; Azzalini & Genz, 2016), papaja (Version 0.1.0.9842; Aust & Barth, 2018), pbapply (Version 1.3.4; Solymos & Zawadzki, 2018), reshape2 (Version 1.4.3; Wickham, 2007), SimDesign (Version 1.11; Chalmers, 2018), viridis (Version 0.5.1; Garnier, 2017a), and viridisLite (Version 0.3.0; Garnier, 2017b) in this work.

References

Auguie, B. (2017). gridExtra: Miscellaneous functions for "grid" graphics. Retrieved from https://CRAN.R-project.org/package=gridExtra

Aust, F., & Barth, M. (2018). papaja: Create APA manuscripts with R Markdown. Retrieved from https://github.com/crsh/papaja

Azzalini, A., & Genz, A. (2016). The R package mnormt: The multivariate normal and t distributions (version 1.5-5). Retrieved from http://azzalini.stat.unipd.it/SW/Pkg-mnormt

Chalmers, P. (2018). SimDesign: Structure for organizing Monte Carlo simulation designs. Retrieved from https://CRAN.R-project.org/package=SimDesign

Elzhov, T. V., Mullen, K. M., Spiess, A.-N., & Bolker, B. (2016). minpack.lm: R interface to the Levenberg-Marquardt nonlinear least-squares algorithm found in MINPACK, plus support for bounds. Retrieved from https://CRAN.R-project.org/package=minpack.lm

Garnier, S. (2017a). viridis: Default color maps from 'matplotlib'. Retrieved from https://CRAN.R-project.org/package=viridis

Garnier, S. (2017b). viridisLite: Default color maps from 'matplotlib' (lite version). Retrieved from https://CRAN.R-project.org/package=viridisLite

Lindgren, G., Rootzén, H., & Sandsten, M. (2013). Stationary stochastic processes for scientists and engineers. Boca Raton, FL: CRC Press.

McBee, M. (n.d.). GiftedCalcs: Psychometrics for gifted program identification.

McBee, M., Peters, S. J., & Miller, E. M. (2016). The impact of the nomination stage on gifted program identification: A comprehensive psychometric analysis. Gifted Child Quarterly, 60(4), 258–278. https://doi.org/10.1177/0016986216656256

McBee, M., Peters, S. J., & Waterman, C. (2014). Combining scores in multiple-criteria assessment systems: The impact of combination rule. Gifted Child Quarterly, 58(1), 69–89. https://doi.org/10.1177/0016986213513794

McClain, M.-C., & Pfeiffer, S. (2012). Identification of gifted students in the United States today: A look at state definitions, policies, and practices. Journal of Applied School Psychology, 28(1), 59–88.

Müller, K. (2017). bindrcpp: An 'Rcpp' interface to active bindings. Retrieved from https://CRAN.R-project.org/package=bindrcpp

R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Solymos, P., & Zawadzki, Z. (2018). pbapply: Adding progress bar to '*apply' functions. Retrieved from https://CRAN.R-project.org/package=pbapply

Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (4th ed.). New York: Springer. Retrieved from http://www.stats.ox.ac.uk/pub/MASS4

Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 1–20. Retrieved from http://www.jstatsoft.org/v21/i12/

Wickham, H. (2009). ggplot2: Elegant graphics for data analysis. New York: Springer-Verlag. Retrieved from http://ggplot2.org

Winston Chang. (2014). extrafont: Tools for using fonts. Retrieved from https://CRAN.R-project.org/package=extrafont

Yoon, S. Y., & Gentry, M. (2009). Racial and ethnic representation in gifted programs: Current status of and implications for gifted Asian American students. Gifted Child Quarterly, 53(2), 121–136.