JOURNAL OF APPLIED MEASUREMENT, 7(1), Copyright © 2006

Rasch Analysis of Rank-Ordered Data

John M. Linacre
University of Sydney, Australia

Theoretical and practical aspects of several methods for the construction of linear measures from rank-ordered data are presented. The final partial-rankings of 356 professional golfers participating in 47 stroke-play tournaments are used for illustration. The methods include decomposing the rankings into independent paired comparisons without ties, into dependent paired comparisons without ties, and into independent paired comparisons with ties. A further method, which is easier to implement, entails modeling each tournament as a partial-credit item in which the rank of each golfer is treated as the observation of a category on a partial-credit rating scale. For the golf data, the partial-credit method yields measures with greater face validity than the paired comparison methods. The methods are implemented with the computer programs Facets and Winsteps.

Rank-ordered data have characteristics that align well with the family of Rasch measurement models. Ranks are observations of elements implying qualitatively more, ordered along an implicit or explicit variable. A single set of ranks, called here a "ranking", contains only enough information to order the elements. If there are two or more rankings of the same elements, then there may be enough information to construct interval measures of the distances between the elements. The interval measures support inferences about future performance, and also investigation into the consistency of particular rankings, or of the ranks of an element across rankings.

In this paper, the rankings of golfers in the 47 USPGA-accredited stroke-play golf tournaments of 2004 will be used illustratively. Vijay Singh won the 46th Tournament, the Chrysler Classic, becoming the first-ever golfer to win $10 million in prize money in one year. A total of 680 golfers are listed as participating in those 47 tournaments. Of these, 17 golfers did not post a final score in at least one tournament. In addition, 306 players had a final score in only one tournament. These included many international and veteran players, such as Tom Weiskopf. A further 52 players participated in two tournaments, including Jack Nicklaus and Arnold Palmer. At the other extreme, one player, Esteban Toledo, participated in 36 tournaments. Vijay Singh played in 28 tournaments, and Tiger Woods in 18. On average, 133 golfers participated in each tournament.

Paired Comparisons without Ties

The simplest ranking comprises two elements. This is a paired comparison. In the ESPN World Ranking after the 45th Tournament, the top two players were Vijay Singh (VS) and Ernie Els (EE). These two golfers played in the same tournaments 13 times. In the first 12 of these tournaments, VS was better ranked six times, and EE better ranked six times. This would estimate the two players to have the same golfing ability. In the 13th tournament, VS ranked higher than EE. A ratio-scale "odds" comparison of the abilities, b_VS and b_EE, of VS and EE would be b_VS / b_EE = 7/6, or in interval "log-odds" scaling,

log(b_VS / b_EE) = B_VS - B_EE = log(7/6) = 0.15    (1)

A general form of this, similar to models proposed by Bradley and Terry (1952) and Luce (1959), is:

log(P_nm / P_mn) = B_n - B_m    (2)

where P_nm is the probability that element n ranks higher than element m.

For estimation purposes this becomes:

B_n - B_m = log(P_nm / P_mn) ≈ log(T_nm / T_mn)    (3)

where T_nm is the number of tournaments in which golfer n is ranked above golfer m, and vice-versa for T_mn. The standard error of the measure difference B_n - B_m is

SE = ((T_nm + T_mn) / (T_nm × T_mn))^½    (4)

This model can be implemented directly in the Rasch software program Facets (Linacre, 2004a). When this is done, estimates are obtained more speedily and robustly when each data point is entered twice, once as B_n vs. B_m and again as B_m vs. B_n, then weighted 0.5.

Implementing this model in other standard Rasch software is straightforward. The usual rectangular dataset format is that items are columns and persons are rows. In this golfing example, each tournament would be an item, and each player paired-comparison a row. The player with the higher ranking is scored "1", and the other player "0". Analysis of this data set will give the expected measure difference of 0.15 logits with software based on conditional maximum likelihood estimation (CMLE), but will yield a measure difference of 0.30 = 2 × 0.15 with software based on joint maximum likelihood estimation (JMLE). This estimation bias was noted by Anderson (1973), as was its correction: divide the measure difference by two.

In the example, the SE of B_VS - B_EE is ((6+7)/(6×7))^½ = 0.56 logits. Under JMLE, 0.56 is reported as the S.E. of each of B_VS and B_EE, so the joint SE of B_VS - B_EE is overestimated as 0.56 × 2^½. Accordingly, each JMLE element SE needs to be divided by 2^½. This is done routinely in Winsteps (Linacre, 2004b) when Paired = Yes is specified.
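To make equations (3) and (4) concrete, here is a minimal Python sketch. It is illustrative only and not part of the original paper; the function name is hypothetical. It computes the measure difference and its standard error from head-to-head counts, using the Singh and Els totals of 7 and 6, and notes the JMLE corrections described above.

    import math

    def paired_comparison_estimate(t_nm, t_mn):
        """Equations (3) and (4): measure difference and its standard error
        from the counts of times n out-ranked m and vice-versa."""
        diff = math.log(t_nm / t_mn)                   # B_n - B_m = log(T_nm / T_mn)
        se = math.sqrt((t_nm + t_mn) / (t_nm * t_mn))  # equation (4)
        return diff, se

    # Singh vs. Els: ranked higher 7 times and 6 times respectively.
    diff, se = paired_comparison_estimate(7, 6)
    print(round(diff, 2), round(se, 2))  # 0.15 logits, SE 0.56

    # If the estimates come from JMLE software, apply the corrections noted
    # above: halve the measure difference and divide each element SE by sqrt(2).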
Rank Orders as Multiple Independent Paired Comparisons

Ernie Els (EE) and Bill Glasson (BG) both played in 16 tournaments, but never in the same tournament. How are they to be compared? They can be compared through their play against other players. Each ranking can be decomposed into a set of apparently independent paired comparisons, and these can become the basis for constructing a measurement framework within which all the golfers can be measured. A convenient group of golfers are the 30 who took part in the first tournament of 2004, the Mercedes Championship. EE was one of these, but BG was not, so he is added to the analysis. From the participation of these 31 golfers in the 47 tournaments, 5,220 paired comparisons of the 31 golfers can be constructed. Of these, 311 compared EE with other golfers, and 149 compared BG with another golfer. There are no comparisons of EE and BG. Only 4.5% of the 5,220 comparisons are ties. Application of (3) to the network of 4,929 paired comparisons without ties produces the finding that EE is estimated to be 1.15 logits more able than BG, and so EE is likely to be ranked higher than BG 3 times out of 4 were they to meet in a golf tournament under similar conditions.

This analysis is fast and easy to conceptualize, but it has several drawbacks. First, the ranking must be decomposed into paired comparisons. This may require some computer programming. A more fundamental problem is that paired comparisons within a ranking are not actually independent, as (1) or (3) imply, but are required to be self-consistent.
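The decomposition step, and the conversion of a logit difference into an expected head-to-head outcome, can be sketched briefly. The Python below is an illustrative outline only, assuming no particular Rasch package, and the function names are hypothetical: it expands one tournament's finishing order into paired comparisons, and converts the 1.15-logit difference between EE and BG into the probability of being ranked higher.

    import math
    from itertools import combinations

    def ranking_to_pairs(order):
        """order: players listed from best rank to worst (no ties).
        Returns (higher-ranked, lower-ranked) pairs for every pairing."""
        return list(combinations(order, 2))

    def prob_ranked_higher(logit_diff):
        """Probability, under model (2), that the more able player is ranked higher."""
        return 1.0 / (1.0 + math.exp(-logit_diff))

    print(ranking_to_pairs(["VS", "EE", "BG"]))
    # [('VS', 'EE'), ('VS', 'BG'), ('EE', 'BG')]

    print(round(prob_ranked_higher(1.15), 2))
    # 0.76, i.e., about 3 times out of 4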
Rank Orders as Multiple Dependent Paired Comparisons

Consider the ranking of three golfers who played in the same three tournaments: Graeme McDowell, David Howell and Jean-Francois Remesy. They never tied. From the ranking of each tournament, we can construct three paired comparisons for these players. But they are not all independent. From two of the comparisons built from a ranking we can often deduce the third comparison.

Table 1 shows the sample space for the independent paired comparisons of A, B and C (without ties). The probability of observing any particular ranking, e.g., A>B>C, is

P(Ranking_ABC) = P_ABC / {P_ABC}    (5)

where {P_ABC}, the normalizer in Table 1, is the sum of the relative probabilities of the self-consistent rankings.

Table 1
Probabilities of Rankings of 3 Objects as Paired Comparisons

Self-consistent within ranking      Relative Probability
A>B, B>C, A>C                       P_ABC = P_AB P_BC P_AC
A>C, C>B, A>B                       P_ACB = P_AC P_CB P_AB
B>A, A>C, B>C                       P_BAC = P_BA P_AC P_BC
B>C, C>A, B>A                       P_BCA = P_BC P_CA P_BA
C>A, A>B, C>B                       P_CAB = P_CA P_AB P_CB
C>B, B>A, C>A                       P_CBA = P_CB P_BA P_CA

Self-inconsistent within ranking    Relative Probability
A>B, B>C, C>A                       P_AB P_BC P_CA
A>C, C>B, B>A                       P_AC P_CB P_BA

Only self-consistent: Sum = {P_ABC}, the normalizer

The likelihood of the set of N rankings of three objects is:

Λ = ∏_{r=1}^{N} P(Ranking_r) / {P_ABC}    (6)

with log-likelihood, canceling out the common divisor:

λ = Σ_{r=1}^{N} Σ_{y=A}^{C} X_yr B_y - N log Σ_{s=1}^{Sum} exp(Σ_{y=A}^{C} X_ys B_y)    (7)

where y is A, B or C, X_yr is 2 for the highest element in ranking r, 1 for the middle element and 0 for the lowest element, and s indexes the Sum self-consistent rankings listed in Table 1. This yields the maximum-likelihood equation:

dλ/dB_y = Σ_{r=1}^{N} X_yr - N Σ_{s=1}^{Sum} X_ys P_s    (8)

where P_s is the probability of observing ranking s, i.e.,

P_s = exp(Σ_{y=A}^{C} X_ys B_y) / Σ_{t=1}^{Sum} exp(Σ_{y=A}^{C} X_yt B_y)    (9)

Equation (8) yields the usual result that the

1994). As the rankings become longer, the proportion of self-consistent pairings to independent pairings reduces. For three objects (Table 1), the ratio is 6:8 = 1:1.3. For four objects, 24:64 = 1:2.7. For five objects, 120:1,024 = 1:8.5. For 6 objects, 720:32,768 = 1:45.5. For 7 objects, 5,040:2,097,152 = 1:416.1. The number of terms in the normalizer (needed for computing expected scores) is the number of self-consistent pairings, so with three objects this is 6, but with 13 objects the number is 6,227,020,800, which is bigger than a standard computational "long integer". Thus the burden of computing self-consistent pairings becomes overwhelming as the rankings become longer.
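The growth of this computational burden can be illustrated with a small Python sketch, which is hypothetical and not from the paper. It enumerates the self-consistent rankings as permutations, evaluates the ranking probabilities of equation (9) for three objects with assumed measures, and shows how the number of normalizer terms grows factorially with the length of the ranking.

    import math
    from itertools import permutations

    def ranking_probabilities(measures):
        """Probability of each self-consistent ranking of the given objects,
        scoring X_y = n-1, ..., 1, 0 from highest to lowest, as in equation (9)."""
        objects = list(measures)
        n = len(objects)
        weights = {}
        for perm in permutations(objects):     # the self-consistent rankings
            score = {obj: n - 1 - i for i, obj in enumerate(perm)}
            weights[perm] = math.exp(sum(score[y] * measures[y] for y in objects))
        normalizer = sum(weights.values())     # Sum = {P_ABC} in Table 1
        return {perm: w / normalizer for perm, w in weights.items()}

    # Three objects with assumed measures of 0.5, 0.0 and -0.5 logits.
    probs = ranking_probabilities({"A": 0.5, "B": 0.0, "C": -0.5})
    print(len(probs), max(probs, key=probs.get))   # 6 terms; ('A', 'B', 'C') most likely

    # The number of normalizer terms grows factorially with ranking length:
    for n in (3, 4, 5, 6, 7, 13):
        print(n, math.factorial(n))   # ..., 5040, 6227020800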