
bwsTools: An R Package for Case 1 Best-Worst Scaling

Mark H. White II

National Coalition of Independent Scholars

Author’s note:

I would like to thank Anthony Marley, Geoff Hollis, Kenneth Massey, Guy Hawkins, and Geoff Soutar for their correspondence, as well as the anonymous reviewers for their helpful comments.

All code needed to reproduce analyses in this paper, as well as the source code for the package, can be found at https://osf.io/wb4c3/. Correspondence concerning this article should be addressed to Mark White, [email protected].


Abstract

Case 1 best-worst scaling, also known as best-worst scaling or MaxDiff, is a popular method for examining the relative ratings and ranks of a series of items in various disciplines in academia and industry. The method involves a survey respondent indicating the "best" and "worst" from a sample of items across a series of trials. Many methods exist for calculating scores at the individual and aggregate levels. I introduce the bwsTools package, a free and open-source set of tools for the R statistical programming language, to aid researchers and practitioners in the construction and analysis of best-worst scaling designs. This package is designed to work seamlessly with tidy data, does not require design matrices, and employs various published individual- and aggregate-level scoring methods that have yet to be employed in free software.

Keywords: Best-worst scaling, MaxDiff, choice modeling, R


1. Introduction

Important social and psychological processes require people to choose between alternatives. A high school, for example, might need new chemistry equipment and updated books—but the budget only supports one or the other. In politics, people say they are highly supportive of equality and freedom—but what about when these values come into conflict?

Affirmative action policies, for example, have been framed as promoting racial equality in academic institutions, while others contend these policies necessarily limit the freedom of universities to accept whom they would like (Lakoff, 2014).

Likert-type scales—such as seven-point scales anchored at 1 (Strongly Disagree) and 7 (Strongly Agree)—may not be appropriate measurement tools in these common situations. On a seven-point scale, a respondent can indicate that they "strongly agree" that races should be equal on one item, then also "strongly agree" that universities should be free to accept any students they want on another. The tension between the two, such as in the case of affirmative action, is obscured. Ceiling effects (a plurality responding at the highest point of the scale) and floor effects (a plurality responding at the lowest) are common in studying important issues like prejudice, values, and political ideology. In the abstract, everyone might agree—or at least follow a social norm—that racial inequality is bad and that freedom is good.

A different method to measure attitudes in these domains is to have respondents choose between a series of alternatives. Case 1 best-worst scaling is one such method. This article introduces an R package for designing and analyzing data using this method. It is meant as a tutorial and introduction; it does not explore detailed mathematical proofs for the different analysis options, but offers suggested readings for those interested.

1.1. The Best-Worst Scaling Method

Best-worst scaling is one research method to measure ratings involving trade-offs among many items. This method is also known as "MaxDiff," "case 1 best-worst scaling," or "object case best-worst scaling." I refer to this case of best-worst scaling as BWS in the article and package.

BWS involves respondents making repeated selections of the best and worst items in a series of subsets of items (Louviere, Flynn, & Marley, 2015). As a working example, I consider the question: “Of the issues below, which is the most important to you and which is the least important to you when making political decisions?”

A collection of t items ("treatments") are displayed to respondents across b trials (or "blocks"). Each block contains a subset of k items from the total list. Respondents are asked to mark which of the k items is best and which is worst, leaving the other k − 2 items unmarked. Although the terminology "best" and "worst" is used, it can be generalized to the most or least of any construct. Figure 1 shows an example block.

BWS researchers recommend structuring these series of blocks in balanced incomplete block designs (BIBD; Louviere, Lings, Islam, Gudergan, & Flynn, 2013). These designs ensure each item is shown the same number of times, r, and that each pairwise comparison of items also appears the same number of times, λ. The bwsTools package generally assumes that data are generated using a BIBD, although some functions (described below) will analyze data from a non-BIBD. Figure 2 shows an example design with t = 13 items, b = 13 blocks, and k = 4 items per block, where each item is repeated r = 4 times and each pairwise comparison occurs λ = 1 time.

This means that each respondent will yield b × k observations: b "best" choices, b "worst" choices, and b(k − 2) observations where the item was neither selected "best" nor "worst." For the design in Figure 2, each respondent yields 13 × 4 = 52 observations: 13 best choices, 13 worst choices, and 13 × 2 = 26 unmarked items. These observations can be used to calculate both aggregate ratings (across the sample) and individual ratings (for each respondent).

The motivation for this package was to provide a free, open-source alternative to existing software. bwsTools follows the principles of tidy data (Wickham, 2014), allowing BWS analyses to be integrated more seamlessly into pipelines for importing, preparing, analyzing, and visualizing data (Wickham & Grolemund, 2017). No design matrices are needed for analysis in bwsTools—only the survey responses. Detailed instructions with annotated code on how to structure the data in the required tidy format are provided in the package vignettes. For individual-level tidying, users can run vignette("tidying_data", "bwsTools"), while vignette("aggregate", "bwsTools") covers formatting data for aggregate-level analysis. All inputs and outputs for functions in bwsTools inherit the class data.frame, allowing them to be chained easily in data pipelines.

bwsTools also provides analysis options for multiple individual- and aggregate-level methods (discussed below), published by multiple researchers, that have yet to be implemented in freely-available, open-source software. Lastly, bwsTools has a publicly-available working GitHub repository,1 documenting all development. Programming best practices, such as unit tests and continuous integration tools, are used to ensure stable, reliable releases. The current bwsTools analysis functions do not exhaust the list of published methods; a public repository allows for community collaboration and feedback in adding new methods for analyzing BWS data.

1 github.com/markhwhiteii/bwsTools

2. The bwsTools R Package

bwsTools is an R package with three main purposes: generating BIBDs, calculating aggregate ratings, and calculating individual ratings. Each is discussed in turn. The package can be installed from the Comprehensive R Archive Network (CRAN) using the following code:

install.packages("bwsTools")

2.1. Generating a BIBD

The characteristics of a BIBD (t, b, k, r, and λ) follow specific properties: First, the design contains b blocks of k items; second, each of the t items appears r times and only r times; third, each of the t(t − 1)/2 pairs of items appears λ times and only λ times. An incomplete block design is balanced when λ = r(k − 1) / (t − 1) and both λ and r are integers. It can be difficult for researchers to create designs satisfying these criteria (Morris, 2011; Wu & Hamada, 2000), so textbooks often reference lists of BIBDs from which researchers can choose. The bwsTools package contains a data.frame object, bibds, showing possible values of t, b, k, r, and λ that satisfy the criteria for a BIBD. Thirty-two designs are in this object, taken from Table 11.3 of Cochran and Cox (1957); included are all possible designs where t and b are less than or equal to 20, as more than 20 trials may put cognitive strain on a survey respondent. Cochran and Cox (1957) provide many more examples of a larger size.
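These conditions are simple to verify directly. Below is a minimal sketch (not part of bwsTools; the function name is mine) that checks whether candidate parameters satisfy them:

# Check the BIBD conditions described above for candidate parameters.
is_bibd <- function(t, k, r, b, lambda) {
  b * k == t * r &&                     # total item appearances must match
    lambda == r * (k - 1) / (t - 1) &&  # balance condition
    lambda %% 1 == 0 && r %% 1 == 0     # lambda and r must be integers
}

is_bibd(t = 13, k = 4, r = 4, b = 13, lambda = 1)  # design 27: TRUE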

While planning a study, a researcher can load the bwsTools package and examine the list of 32 designs with the following code:

library(bwsTools)

bibds

Table 1 shows the first and last six rows returned when calling bibds. For example, design six shows six items (t) displayed in groups of two (k) across 15 (b) blocks, with each item appearing five times (r) and each pair of items occurring once (λ). The working example in this article follows design 27 in bibds.

To generate a BIBD using bwsTools, one can supply the design number to the make_bibd function. This call produced the design found on the right-hand side of Figure 2:

make_bibd(27, seed = 1839)

Note that the function will generate one of many designs satisfying these criteria at random. To ensure reproducibility, there is also an argument to set the seed for the random number generator. This defaults to 1839, so each call to make_bibd without a seed explicitly set will yield the same design every time, making for reproducible designs by default.


One then assigns a number to each item and finds-and-replaces each number in the design with the text of the item. Referring back to Figure 2, the first block lists 2, 6, 7, and 13. Comparing these numbers to the text in the left panel, the items in this block would be: "taxes," "crime and violence," "race relations and racism," and "gun policy." Each respondent would then be asked to indicate the most and least important issue from that set of four items, then continue on to the second block, which contains "taxes," "abortion," "drugs and drug abuse," and "bias in the media" (2, 5, 8, and 10).
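This find-and-replace can also be done in R itself. A sketch, assuming the object returned by make_bibd holds the item numbers in its columns (the exact column layout is an assumption; inspect the returned object first):

items <- c("Healthcare", "Taxes", "National security",
           "Investigating government corruption", "Abortion",
           "Crime and violence", "Race relations and racism",
           "Drugs and drug abuse", "Education", "Bias in the media",
           "The economy", "Foreign affairs and aid", "Gun policy")

design <- make_bibd(27, seed = 1839)
# Assuming every column of the returned design holds item numbers 1-13;
# drop or skip any block-index column before doing this.
design[] <- lapply(design, function(col) items[col])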

2.2. Aggregate Ratings

A researcher might have the goal of determining how the different items rank against one another across the entire sample. Such is the case, for example, if researchers wish to know the most- and least-persuasive arguments in favor of a proposed policy (e.g., Nilsen, 2019). The two options whose theoretical properties are best known are normalized difference scores and analytical best-worst scores (Marley, Islam, & Hawkins, 2016). Both can be calculated using the total number of times each item was presented to respondents, the number of times each item was chosen as best, and the number of times each was chosen as worst. These data for the running example are shown in the first four columns of Table 2.

The normalized difference score does not require bwsTools to calculate: it is the number of times an item was selected as best minus the number of times it was selected as worst, divided by the total number of times the item appeared to respondents. It is "normalized" because it is bounded between +1 (selected as best every single time it appeared) and -1 (selected as worst every single time it appeared). These are displayed in the "NDiff" column of Table 2.
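Given count data like the first four columns of Table 2 in a data.frame d0 (column names as in the ae_mnl call below), this is one line of base R:

# Normalized difference scores straight from the count columns:
d0$ndiff <- (d0$bests - d0$worsts) / d0$totals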


2.2.1. Analytical Estimation of Multinomial Logistic Regression. Under reasonable assumptions, BWS data can be modeled using multinomial logistic regression (MNL), and the simple normalized difference scores are sufficient statistics for that model, correlating very highly with its coefficients (Finn & Louviere, 1992; Marley et al., 2016; Marley & Louviere, 2005). However, this method does not yield estimates of uncertainty around the scores. Researchers benefit from measures of uncertainty, such as standard errors, whether one uses them to roughly compare items or to get a broad idea of replicability and variability in the data (Cumming & Maillardet, 2006; Rouder & Morey, 2005). The bwsTools package provides a function to calculate the coefficients and standard errors from the MNL model.

bwsTools uses the analytical estimation (i.e., closed-form solution) presented by Lipovetsky and Conklin (2014) for the MNL model. The ae_mnl function takes data of the format in the first four columns of Table 2 and returns utility coefficients, standard errors, confidence intervals, and choice probabilities using Equations 7, 10, 12, 13, and 18 from Lipovetsky and Conklin (2014). Utility coefficients are calculated by:

p_j = (N_j^total − N_j^worst + N_j^best) / (2 N_j^total)   [1]

b_j = ln(p_j / (1 − p_j))   [2]

where b_j is the coefficient for each item j, which can be used as an aggregate score, and N_j^total, N_j^worst, and N_j^best are the number of times item j appears in total, is chosen worst, and is chosen best across the entire sample of respondents. Choice probabilities are then calculated by dividing the exponential of each b_j by the sum of the exponentials of all b_j. See Lipovetsky and Conklin (2014) for the full solution.
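A minimal sketch of these point estimates (ae_mnl itself also returns standard errors and intervals; the helper name here is mine):

# Equations 1 and 2, plus the choice probabilities, from raw counts.
b_from_counts <- function(totals, bests, worsts) {
  p <- (totals - worsts + bests) / (2 * totals)  # Equation 1
  b <- log(p / (1 - p))                          # Equation 2
  data.frame(b = b, choice = exp(b) / sum(exp(b)))
}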


In the code below, let d0 represent a data.frame containing the first four columns of Table 2. The user supplies the data as well as the names of the columns containing the number of times the item was presented, chosen as best, and chosen as worst:

ae_mnl(d0, "totals", "bests", "worsts")

A z-statistic (z) can also be supplied to determine the confidence level of the upper and lower bounds; the function defaults to z = 1.96. This code returns the sixth through tenth columns of Table 2.
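For instance, wider 99% intervals could be requested by raising z (2.576 is the standard normal quantile for 99% coverage):

ae_mnl(d0, "totals", "bests", "worsts", z = 2.576)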

The "b" column contains the utility coefficients from the MNL, while the "LB" and "UB" columns show the bounds of the 95% confidence intervals (CIs) around these coefficients. The CIs help users compare the coefficients to one another. From these results, we can see that healthcare is seen as most important, followed by the economy. The "Choice" column contains choice probabilities—how likely it is that each item is chosen from the full set.

2.2.2. Elo rating. BIBDs are recommended for data collection, but sometimes they are not used. An analyst may not have designed the study but is nonetheless tasked with analyzing data that do not come from a BIBD, or there may be too many items of interest to fit into a BIBD (e.g., Hollis, 2018a). In these situations, Elo scores can be employed as an aggregate scoring method. Physics professor Arpad Elo developed the now-famous rating system for the game of chess. These ratings took on his name, and variants are now used in many sports and competitions beyond chess (Langville & Meyer, 2012). Hollis (2018a, 2018b, 2019) extended this scoring system to BWS.

The concept of Elo ratings is that two competitors each have ratings before a game starts. After the competition, the winner's rating increases, while the loser's rating decreases. How much their scores change is a function, Δ, of the difference between their ratings before they started playing. It is also a function of a constant, K, that determines how much this Δ updates the rankings; the larger the K, the more the rankings change after one competition.

For the winner, their beginning Elo score, S_w, is updated to their new score, S'_w, after the competition by:

S'_w = S_w + Δ_w   [3]

where Δ_w = K(1 − E_w)   [4]

and E_w = 10^(S_w / 400) / (10^(S_w / 400) + 10^(S_l / 400))   [5]

while the loser's score, S_l, is updated by:

S'_l = S_l + Δ_l   [6]

where Δ_l = K(0 − E_l)   [7]

and E_l = 10^(S_l / 400) / (10^(S_w / 400) + 10^(S_l / 400))   [8]

See Langville and Meyer (2012, pp. 53-56) for additional details.
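As a sketch, one update under Equations 3 through 8 (the helper name is mine, not the package's internal code):

elo_update <- function(s_w, s_l, K = 30) {
  e_w <- 10^(s_w / 400) / (10^(s_w / 400) + 10^(s_l / 400))  # expected result
  e_l <- 10^(s_l / 400) / (10^(s_w / 400) + 10^(s_l / 400))
  c(winner = s_w + K * (1 - e_w), loser = s_l + K * (0 - e_l))
}

elo_update(1000, 1000)  # evenly matched: winner moves to 1015, loser to 985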

Hollis applies Elo to BWS by conceptualizing each block for each respondent as pairwise comparisons (i.e., "competitions") between the subset of items (i.e., "players"). The item selected as best "beats" all of the other items in that block, the item selected as worst "loses" to all of the others, and all of the items not selected tie with one another. Hollis (2018a) does not consider ties—only comparisons where there is a clear "winner" and "loser." For the entire sample, a "season" of pairwise comparisons is created from every block, each with a winner and loser. Hollis adds two "dummy" items to this list of comparisons: one that loses to every other item and one that beats every other item. This prevents items always selected as best (or worst) from having too extreme of scores. Hollis sets K to 30; this is the bwsTools default.

Elo ratings are temporal in nature: the system updates ratings as a "season" goes along. Since this has no analogue in BWS, Hollis recommends running multiple iterations, each with a different randomized order of the match-ups. The resulting Elo score is the average across iterations. The default number of iterations in bwsTools is the 100 used by Hollis. Lastly, Elo ratings require a starting value, and all items start with the same score of 1,000. This value is arbitrary and does not affect the rank-ordering of the scores—it only acts as a starting point for the first calculation.

To use Elo ratings for aggregate scoring, data are still required to be in tidy, disaggregated format. Users need to arrange their data, referred to as d1 in the code examples below, such that each column is a variable and each row is an individual observation. Columns should be present that indicate which respondent generated the observation, which block it came from, which item it refers to, and the choice made by the respondent. Choices should be coded as 1 (best), -1 (worst), or 0 (not selected). bwsTools includes an object named indiv, a data.frame which contains example data that follow this tidy format; its first eight rows are displayed in Table 3. This shows that Respondent One saw the abortion, race, drugs, and education items in the first block, selecting abortion as least important and drugs as most important. In the second block, this same respondent saw the drugs, economy, foreign affairs, and guns items, selecting the economy as most important and foreign affairs as least important.
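For concreteness, the first block of Table 3 could be built by hand as follows (a toy fragment only; a real d1 stacks every block for every respondent):

d1 <- data.frame(
  id    = 1,
  block = 1,
  issue = c("Abortion", "Race", "Drugs", "Education"),
  value = c(-1, 0, 1, 0)  # 1 = best, -1 = worst, 0 = not selected
)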

A detailed how-to on formatting data is outside the scope of this paper, but it can be found in the package's data tidying vignette, accessed by running vignette("tidying_data", "bwsTools"). The data d1 below follow this format. Aggregate Elo scores can be calculated with the elo function:

elo(d1, "id", "block", "issue", "value", K = 30, iter = 100)

This code returns the "Elo" column in Table 2. This function does not assume a BIBD. The code above keeps the values of K and the number of iterations, iter, at their defaults of 30 and 100, respectively. Setting K = 30 is currently a recommendation made by Hollis; future research could investigate how this constant might be tuned to specific contexts (e.g., number of respondents, number of blocks, number of options per block, and so on).

2.3. Individual Ratings

One might have a research goal of examining correlates of these BWS scores, using the scores as predictors of other constructs, testing for group differences in scores, or using unsupervised learning approaches to cluster individuals based on their BWS scores. Each of these goals requires a researcher to calculate BWS scores at the respondent level—that is, individual ratings. Such scores have been used in studying psychological values (Lee et al., 2019), risk perception (Erdem & Rigby, 2013), corporate social responsibility (Nakano & Tsuge, 2019), wine preferences (de-Magistris, Gracia, & Albisu, 2014), and healthcare (Cheung et al., 2016).

bwsTools contains five functions for calculating these scores, each with similar syntax and each taking data in tidy format (Wickham, 2014). As with the aggregate elo function, the data need to be specified in the disaggregated form described in Section 2.2.2 and shown in Table 3.

2.3.1. Difference scoring. This is the most common individual-level metric used (e.g., Auger, Devinney, & Louviere, 2007; Cohen, 2009; Hein, Jaeger, Carr, & Delahunty, 2008; Jaeger, Jorgensen, Aaslyng, & Bredie, 2008; Kiritchenko & Mohammad, 2017; Mielby, Edelenbos, & Thybo, 2012; Yu et al., 2015). For each respondent, a researcher takes the number of times an item was selected as best and subtracts from it the number of times it was selected as worst. Potential values therefore range from -r to +r, where r refers to how many times each item appeared. This score can also be divided by r, which returns a normalized difference score for each individual.
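Because choices are coded 1, -1, and 0, difference scores are just per-respondent sums; a sketch with base R (diffscoring below adds design checks and normalization):

# bests minus worsts for each respondent-item combination:
agg <- aggregate(value ~ id + issue, data = d1, FUN = sum)
agg$normalized <- agg$value / 4  # divide by r; r = 4 in the Figure 2 design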

Let d1 refer to tidy data following the format described above. The call to generate difference scores is:

diffscoring(d1, "id", "block", "issue", "value", std = TRUE, wide = FALSE)


The arguments after the data refer to the names of the columns containing the respondent IDs, the block numbers, the name or label of the issue, and whether the item was chosen as best (1), worst (-1), or neither (0). The argument std indicates whether one wants raw difference scores (bounded between ±r) or normalized difference scores (bounded between ±1). The first five rows returned from this call are shown in the first three columns of Table 4. While it is recommended that researchers use BIBDs when using difference scoring, bwsTools will allow the user to calculate scores for data that do not come from BIBDs; however, a warning message will notify users when characteristics of BIBDs are not met.

All of the individual scoring functions have these same first five arguments. Every function also includes an argument, wide, which is a logical value indicating whether the user wants their data returned in wide format (i.e., a column for respondent ID, and a column for every variable). The default is to return it in a tidy format (each row is a score for a combination of respondent and item).

2.3.2. Empirical Bayes. A hierarchical Bayesian version of the MNL (HB-MNL) is used in some statistical packages for calculating utility coefficients at the individual level (e.g., Orme, 2005). However, this relies on estimation using Markov chain Monte Carlo (MCMC) procedures, which can be time- and computationally-inefficient when analyzing large datasets. These BWS packages also tend to be neither free nor open-source.

Instead of HB-MNL, Lipovetsky and Conklin (2015) extend their analytical estimation for aggregate utility coefficients (Lipovetsky & Conklin, 2014) to the individual level. They show that choice frequencies can be calculated at the aggregate level, p_j, and the individual level, p_ij, from simple count data using an empirical Bayesian approach. The aggregate estimate is calculated in the same way as described in Equation 1 above for the analytical estimation of the MNL.

At the individual level, however, it is plausible that a respondent chooses an item as best (or worst) every time it is shown. This causes probabilities of one (or zero) to occur, which lead to utility coefficients of positive (or negative) infinity when transformed in Equation 2. To avoid this, a precision parameter, E, is specified by the user. When individual probabilities of zero are encountered, they are replaced with E; when values of one are encountered, they are replaced with 1 − E.

The values of p_j and p_ij can be treated as the prior and likelihood, respectively, to be used in Bayes' formula. Lipovetsky and Conklin show that, under reasonable assumptions, these two values can be used to calculate individual-level posterior utility coefficients using an analytical, closed-form solution. To do so, the user must also specify a mixing parameter, α, that indicates how much weight is put on the prior relative to the likelihood; larger values indicate more weight on the prior. The individual-level score for each item is calculated by:

(α / (1 + α)) p_j + (1 / (1 + α)) p_ij   [9]
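A sketch of this blend (Equation 9), with the zero/one replacement applied first (the helper name is mine):

eb_blend <- function(p_agg, p_ind, alpha = 1, E = .1) {
  p_ind <- pmin(pmax(p_ind, E), 1 - E)  # replace 0 with E and 1 with 1 - E
  (alpha / (1 + alpha)) * p_agg + (1 / (1 + alpha)) * p_ind
}

eb_blend(p_agg = .60, p_ind = 1)  # an extreme individual value shrinks to .75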

Lipovetsky and Conklin (2015) show that the results from this fast analytical procedure correlate highly (r > .85) with methods that are far more computationally complex (e.g., estimation via MCMC). Users can calculate individual utility coefficients using this empirical Bayes technique in bwsTools:


e_bayescoring(d1, "id", "block", "issue", "value", E = .1, alpha = 1, wide = FALSE)

Both E and alpha can be arbitrarily specified by the user. However, Lipovetsky and Conklin (2015) show that scores using this method correlate highest with proprietary HB-MNL software at .1 and 1, respectively. These are thus the recommended and default values in the bwsTools package. The first five rows returned from this call are shown in columns one, two, and four of Table 4. Since this method assumes a BIBD, the function will throw an error if the data do not follow such a design.

2.3.3. Elo rating. The method for calculating Elo scores here is the same as in the aggregate case, except the calculation is done separately for each individual rather than across the entire sample. The code to run this in bwsTools is:

eloscoring(d1, "id", "block", "issue", "value", K = 30, iter = 100, wide = FALSE)

The first five rows returned from this call are shown in columns one, two, and five of Table 4. Since this method makes no assumption about design, non-BIBD data are permitted, although the user is warned when the data do not come from a balanced design.

2.3.4. Walkscoring and PageRank. The final two methods for individual BWS scores apply centrality measures on weighted graphs to BWS. They are unique to the bwsTools package and the current article. Similar to the Elo approach, these methods liken BWS to a sport or game and do not assume a BIBD.

Imagine an abbreviated version of the board game Monopoly that involves three or more players. When one player goes bankrupt, the game ends. The person who went bankrupt loses (scored -1), the person with the most wealth (i.e., money on hand, property values, etc.) at the time the game ends wins (scored +1), and everyone else neither wins nor loses (scored 0). The winner beats the loser by two points and beats the others by one point, while those who neither win nor lose tie with one another and beat the loser by one point. Applying this to BWS, each player is an item and each game is a block: The item chosen as "best" beats the item chosen as "worst" by two points and all other items by one point, while the items chosen as neither "best" nor "worst" tie with one another and beat the item chosen as "worst" by one point.

These "point differentials" can be represented in a square matrix, projected into a weighted graph, and then random walks can be taken through this graph to yield BWS scores. Network-based rating methods are common across a variety of topics, including sports and recommender systems (e.g., Bogers, 2010; Callaghan, Mucha, & Porter, 2007; Jamali & Ester, 2009; Lazova & Basnarkov, 2015; Motegi & Masuda, 2012). For a general discussion of random walks on networks, see Masuda, Porter, and Lambiotte (2017).

The detailed procedure for BWS walkscoring is in Table 5. The conceptual idea is that each item is a node in a directed network. Imagine a person starting at a random item. The person asks this item, "Which item is the best?" This item knows it was beaten by Item J by two points and by Items K and L by one point each. It answers probabilistically, such that it says Item J is best 50% of the time (2 points out of 4 total points), Item K 25% of the time (1 out of 4 points), and Item L 25% of the time (1 out of 4 points). The person walks to the item they were told was best, and the procedure repeats itself. Two networks are made: one directing the random walker toward the best item and another directing the walker toward the worst item. The proportion of walks leading to each node is its raw best or raw worst score, respectively; the worst score is then flipped and averaged with the best score to get the best-worst walkscore. Random walks are implemented using the random_walk function from the igraph R package (Csardi & Nepusz, 2006). See Langville and Meyer (2012, pp. 67-78) for additional details.
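A toy sketch of Steps 4 through 9 for a single three-item "best" network, using igraph directly (bwsTools wraps all of this; the matrix values here are hypothetical):

library(igraph)

# Cell (i, j): how many points item i lost to item j by (Steps 4-5).
m <- matrix(c(0, 2, 1,
              0, 0, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)
m <- m / ifelse(rowSums(m) == 0, 1, rowSums(m))  # row-normalize (Step 6)
m[rowSums(m) == 0, ] <- 1 / ncol(m)              # all-zero rows become 1/t

g <- graph_from_adjacency_matrix(m, mode = "directed", weighted = TRUE)

# One long walk stands in for many short ones here; visit proportions
# approximate the raw "best" scores (Steps 8-9).
visits <- random_walk(g, start = 1, steps = 10000)
prop.table(table(factor(as.integer(visits), levels = 1:3)))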

Random walks are performed for each individual separately, leading to individual best-worst scores. The bwsTools function to calculate walkscores is:

walkscoring(d1, "id", "block", "issue", "value", walks = 10000, wide = FALSE)

The walks argument indicates how many random walks to perform for a given respondent; it defaults to 10,000. This makes the method less computationally-efficient than methods like empirical Bayes scoring; users can lower the number of walks when analyzing large datasets. The first five rows returned from this call are shown in columns one, two, and six of Table 4. The function does not require BIBD data but throws a warning when such data are not present.


A benefit of this approach is that ties rarely occur. With difference scoring, consider two items: Item A is never selected best or worst, leading to a difference score of zero; Item B is selected as best twice and as worst twice, which also leads to a difference score of zero. Walkscoring takes into consideration the full pattern of block results, which may score items A and B differently.

Steps 1-7 and 10 of Table 5 are the same for the PageRank method. The PageRank algorithm was designed by Google and originally used to rank search results (Brin & Page, 1998; Gleich, 2015). The primary factor differentiating it from walkscoring is a "teleportation" parameter. A random walk can sometimes get stuck traversing between a few dominant nodes, which can lead to items having extreme scores. The "teleportation" parameter is the probability that a random walker jumps to a random node instead of following one of the edges determined by a point differential; this keeps the walker from getting stuck between a few dominant nodes. bwsTools uses the page_rank function from the igraph package to calculate these scores. The code to run this method is:

prscoring(d1, "id", "block", "issue", "value", wide = FALSE)

Additional arguments can be specified, which are passed on to the page_rank function. The most pertinent is the damping argument, which controls the "teleportation" parameter: the closer the damping value is to one, the less likely the walker is to jump to a random node. It follows that setting this argument to .9999 will yield essentially the same results as walkscoring. The default is .85. The first five rows returned from this call are shown in columns one, two, and seven of Table 4. This function also does not require BIBD data but warns the user when the data are not balanced.
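For example, stronger teleportation could be requested by lowering damping (the value .70 here is purely illustrative):

prscoring(d1, "id", "block", "issue", "value", wide = FALSE, damping = .70)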

Future work can continue to extend these methods beyond BIBDs. The Elo, walkscore, and PageRank methods do not assume a specific design. This makes them useful for very large designs, where each pairwise comparison between items is not feasible; for example, Park and Newman (2005) apply a network-based ranking system to college football, where each team plays fewer than a tenth of the total teams being rated. The logic of using scoring metrics usually reserved for sports and games can also apply to rating systems beyond Elo ratings and random walks (see Barrow, Drayer, Elliott, Gaut, & Osting, 2013; Stefani, 1997; Vaziri, Dabadghao, Yih, & Morin, 2018).

3. Discussion and Conclusion

Case 1 best-worst scaling, also known as object-case BWS or MaxDiff, is a useful tool employed by researchers in academia and industry across a variety of fields and topics. It provides a way to rate and rank many items that is easier and less cognitively-taxing for respondents (Hein et al., 2008; Jaeger et al., 2008) than asking them to place all items in a rank order. It is also a subtle way to measure sensitive topics, as an item of interest might only be present in, for example, a fourth of the trials. It is a method that requires respondents to make trade-offs between items, as humans frequently must do in social life.


This article introduced the bwsTools R package to facilitate a user-friendly way to design and analyze BWS studies in ways not found in other free, open-source packages. The make_bibd function aids in finding appropriate balanced incomplete block designs for BWS studies, as these designs satisfy the assumptions of the multinomial logistic model on which many scoring methods are based. The ae_mnl function calculates utility coefficients, confidence intervals, and choice probabilities across the entire sample; it uses a closed-form solution, which provides speed for large datasets. Five different functions calculate individual best-worst scores, ranging from simple difference scoring to more complex methods involving tournament-style scoring.

Multiple functions are provided for calculating scores so that researchers can use methods aligning with their best judgment, needs, and assumptions. The bwsTools package also allows for comparison of various methods in the published literature. In the one empirical example used throughout this paper, all of the methods yield essentially the same results (all rs > .84; Figure 3). This suggests that methods which are more computationally-efficient and follow a simpler procedure should be preferred, when possible.
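Such a comparison can be run directly on the built-in indiv data. A sketch, assuming indiv uses the column names from the examples above; the names of the returned score columns are assumptions to be checked against the documentation:

d_diff <- diffscoring(indiv, "id", "block", "issue", "value", std = TRUE)
d_eb   <- e_bayescoring(indiv, "id", "block", "issue", "value")
scores <- merge(d_diff, d_eb, by = c("id", "issue"))
cor(scores[[3]], scores[[4]])  # correlate the two methods' scores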

I recommend using a survey instrument that follows a BIBD and then the ae_mnl and e_bayescoring functions for scoring at the aggregate and individual levels, respectively. These functions rely on counts and employ closed-form solutions, making them faster than the package's other functions. If a BIBD is not employed, however, the assumptions on which these methods rely may not be met; I recommend the tournament-scoring approaches of Elo, walk, and PageRank scores in these situations.


References

Aizaki, H., Nakatani, T., & Sato, K. (2014). Stated preference methods using R. Boca Raton, FL: CRC Press.

Auger, P., Devinney, T. M., & Louviere, J. J. (2007). Using best-worst scaling methodology to investigate consumer ethical beliefs across countries. Journal of Business Ethics, 70(3), 299-326. doi: 10.1007/s10551-006-9112-7

Barrow, D., Drayer, I., Elliott, P., Gaut, G., & Osting, B. (2013). Ranking rankings: An empirical comparison of the predictive power of sports ranking methods. Journal of Quantitative Analysis in Sports, 9(2), 187-202. doi: 10.1515/jqas-2013-0013

Bogers, T. (2010). Movie recommendation using random walks over the contextual graph. In CARS 2010: 2nd Workshop on Context-Aware Recommender Systems, Barcelona, Spain.

Brin, S. & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117. doi: 10.1016/S0169-7552(98)00110-X

Callaghan, T., Mucha, P. J., & Porter, M. A. (2007). Random walker ranking for NCAA Division I-A football. The American Mathematical Monthly, 114(9), 761-777. http://www.jstor.org/stable/27642330

Cheung, K. L., Wijnen, B. F. M., Hollin, I. L., Janssen, E. M., Bridges, J. F., Evers, S. M. A. A., & Hiligsmann, M. (2016). Using best-worst scaling to investigate preferences in health care. PharmacoEconomics, 34, 1195-1209. doi: 10.1007/s40273-016-0429-5

Cochran, W. G., & Cox, G. M. (1957). Experimental designs (2nd ed.). Oxford, England: John Wiley & Sons.

Cohen, E. (2009). Applying best-worst scaling to wine marketing. International Journal of Wine Business Research, 21(1), 8-23. doi: 10.1108/17511060910948008

Csardi, G. & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal: Complex Systems, 1695, 1-9. http://igraph.org

Cumming, G. & Maillardet, R. (2006). Confidence intervals and replication: Where will the next mean fall? Psychological Methods, 11(3), 217-227. doi: 10.1037/1082-989X.11.3.217

de-Magistris, T., Gracia, A., & Albisu, L. (2014). Wine consumers' preferences in Spain: An analysis using the best-worst scaling approach. Spanish Journal of Agricultural Research, 12(3), 529-541. doi: 10.5424/sjar/2014123-4499

Erdem, S. & Rigby, D. (2013). Investigating heterogeneity in the characterization of risks using best worst scaling. Risk Analysis, 33(9), 1728-1748. doi: 10.1111/risa.12012

Finn, A. & Louviere, J. J. (1992). Determining the appropriate response to evidence of public concern: The case of food safety. Journal of Public Policy & Marketing, 11(1), 12-25. doi: 10.1177/074391569201100202

Gleich, D. F. (2015). PageRank beyond the web. SIAM Review, 57(3), 321-363. doi: 10.1137/140976649

Hein, K. A., Jaeger, S. R., Carr, B. T., & Delahunty, C. M. (2008). Comparison of five common acceptance and preference methods. Food Quality and Preference, 19(7), 651-661. doi: 10.1016/j.foodqual.2008.06.001

Hess, S. & Palma, D. (2019). Apollo: A flexible, powerful and customisable freeware package for choice model estimation and application. Journal of Choice Modelling, 32. doi: 10.1016/j.jocm.2019.100170

Hollis, G. (2018a). Scoring best-worst data in unbalanced many-item designs, with applications to crowdsourcing semantic judgments. Behavior Research Methods, 50(2), 711-729. doi: 10.3758/s13428-017-0898-2

Hollis, G. (2018b). When is best-worst best? A comparison of best-worst scaling, numeric estimation, and rating scales for collection of semantic norms. Behavior Research Methods, 50(1), 115-133. doi: 10.3758/s13428-017-1009-0

Hollis, G. (2019). The role of number of items per trial in best-worst scaling experiments. Behavior Research Methods. Published online before print. doi: 10.3758/s13428-019-01270-w

Jaeger, S. R., Jorgensen, A. S., Aaslyng, M. D., & Bredie, W. L. P. (2008). Best-worst scaling: An introduction and initial comparison with monadic rating for preference elicitation with food products. Food Quality and Preference, 19(6), 579-588. doi: 10.1016/j.foodqual.2008.03.002

Jamali, M. & Ester, M. (2009). TrustWalker: A random walk model for combining trust-based and item-based recommendation. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, (pp. 397-406). doi: 10.1145/1557019.1557067

Kiritchenko, S. & Mohammad, S. M. (2017). Best-worst scaling more reliable than rating scales: A case study on sentiment intensity annotation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, Canada.

Lakoff, G. (2014). The All New Don't Think of an Elephant!: Know Your Values and Frame the Debate. White River Junction, VT: Chelsea Green Publishing.

Langville, A. N. & Meyer, C. D. (2012). Who's #1?: The Science of Rating and Ranking. Princeton, NJ: Princeton University Press.

Lazova, V. & Basnarkov, L. (2015). PageRank approach to ranking national football teams. In Proceedings of the 12th International Conference for Informatics and Information Technology, Bitola, Macedonia, (pp. 310-313).

Lee, J. A., Sneddon, J. N., Daly, T. M., Schwartz, S. H., Soutar, G. N., & Louviere, J. J. (2019). Testing and extending Schwartz Refined Value Theory using a best-worst scaling approach. Assessment, 26(2), 166-180. doi: 10.1177/1073191116683799

Lipovetsky, S. & Conklin, M. (2014). Best-worst scaling in analytical closed-form solution. Journal of Choice Modelling, 10, 60-68. doi: 10.1016/j.jocm.2014.02.001

Lipovetsky, S. & Conklin, M. (2015). MaxDiff priority estimations with and without HB-MNL. Advances in Adaptive Data Analysis, 7(1-2), 1-10. doi: 10.1142/S1793536915500028

Louviere, J. J., Flynn, T. N., & Marley, A. A. J. (2015). Best-Worst Scaling: Theory, Methods and Applications. Cambridge, UK: Cambridge University Press.

Louviere, J., Lings, I., Islam, T., Gudergan, S., & Flynn, T. (2013). An introduction to the application of (case 1) best-worst scaling in marketing research. International Journal of Research in Marketing, 30(3), 292-303. doi: 10.1016/j.ijresmar.2012.10.002

Marley, A. A. J., Islam, T., & Hawkins, G. E. (2016). A formal and empirical comparison of two score measures for best-worst scaling. Journal of Choice Modelling, 21, 15-24. doi: 10.1016/j.jocm.2016.03.002

Marley, A. A. J. & Louviere, J. J. (2005). Some probabilistic models of best, worst, and best-worst choices. Journal of Mathematical Psychology, 49(6), 464-480. doi: 10.1016/j.jmp.2005.05.003

Masuda, N., Porter, M. A., & Lambiotte, R. (2017). Random walks and diffusion on networks. Physics Reports, 716-717, 1-58. doi: 10.1016/j.physrep.2017.07.007

Mielby, L. H., Edelenbos, M., & Thybo, A. K. (2012). Comparison of rating, best-worst scaling, and adolescents' real choice of snacks. Food Quality and Preference, 25(2), 140-147. doi: 10.1016/j.foodqual.2012.02.007

Morris, M. D. (2011). Design of Experiments: An Introduction Based on Linear Models. Boca Raton, FL: Chapman & Hall/CRC.

Motegi, S. & Masuda, N. (2012). A network-based dynamical ranking system for competitive sports. Scientific Reports, 2(904), 1-7. doi: 10.1038/srep00904

Nakano, M. & Tsuge, T. (2019). Assessing the heterogeneity of consumers' preferences for corporate social responsibility using the best-worst scaling approach. Sustainability, 11(10), 1-12. doi: 10.3390/su11102995

Nilsen, E. (2019). Poll: The Green New Deal is popular in swing House districts. Vox.com. Retrieved from https://www.vox.com/policy-and-politics/2019/9/26/20883384/green-new-deal-poll-swing-districts

Park, J. & Newman, M. E. J. (2005). A network-based ranking system for US college football. Journal of Statistical Mechanics: Theory and Experiment, 10. doi: 10.1088/1742-5468/2005/10/P10014

Rouder, J. N. & Morey, R. D. (2005). Relational and arelational confidence intervals. Psychological Science, 16(1), 77-79. doi: 10.1111/j.0956-7976.2005.00783.x

Stefani, R. T. (1997). Survey of the major world sports rating systems. Journal of Applied Statistics, 24(6), 635-646. doi: 10.1080/02664769723387

Therneau, T. M. & Grambsch, P. M. (2000). Modeling survival data: Extending the Cox model. New York, NY: Springer.

Vaziri, B., Dabadghao, S., Yih, Y., & Morin, T. L. (2018). Properties of sports ranking methods. Journal of the Operational Research Society, 69(5), 776-787. doi: 10.1057/s41274-017-0266-8

Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23. doi: 10.18637/jss.v059.i10

Wickham, H. & Grolemund, G. (2017). R for data science. Sebastopol, CA: O'Reilly Media, Inc.

Wu, C. F. J. & Hamada, M. S. (2000). Experiments: Planning, Analysis, and Optimization (2nd ed.). Hoboken, NJ: Wiley.

Yu, T., Holbrook, J. T., Thorne, J. E., Flynn, T. N., Van Natta, M. L., & Puhan, M. A. (2015). Outcome preference in patients with noninfectious uveitis: Results of a best-worst scaling study. Investigative Ophthalmology & Visual Science, 56(11), 6864-6872. doi: 10.1167/iovs.15-16705


Table 1

Snippet of Data Returned From Calling bibds

Design    t    k    r    b    λ
1         4    2    3    6    1
2         4    3    3    4    2
3         5    2    4   10    1
4         5    3    6   10    3
5         5    4    4    5    3
6         6    2    5   15    1
...      ...  ...  ...  ...  ...
27       13    4    4   13    1
28       13    9    9   13    6
29       15    7    7   15    3
30       15    8    8   15    4
31       19    9    9   19    4
32       19   10   10   19    5


Table 2

Example Data and Results for Aggregate Ratings

Item                                  Bests  Worsts  Totals  NDiff      b   SE     LB     UB  Choice   Elo
Healthcare                              731     125    1400    .43    .93  .04    .84   1.01     .17  1194
The economy                             634     119    1400    .37    .77  .04    .69    .85     .15  1167
Education                               467     252    1400    .15    .31  .04    .23    .38     .09  1065
National security                       364     221    1400    .10    .21  .04    .13    .28     .08  1044
Gun policy                              354     290    1400    .05    .09  .04    .02    .17     .07  1023
Taxes                                   386     350    1400    .03    .05  .04   -.02    .13     .07  1014
Crime and violence                      286     282    1400    .00    .01  .04   -.07    .08     .07  1006
Investigating government corruption     318     351    1400   -.02   -.05  .04   -.12    .03     .06   987
Abortion                                302     352    1400   -.04   -.07  .04   -.15    .00     .06   984
Race relations and racism               300     392    1400   -.07   -.13  .04   -.21   -.06     .06   975
Drugs and drug abuse                    163     463    1400   -.21   -.44  .04   -.51   -.36     .04   908
Foreign affairs and aid                 121     545    1400   -.30   -.63  .04   -.70   -.55     .04   858
Bias in the media                       124     808    1400   -.49  -1.07  .04  -1.15   -.98     .02   775


Table 3

Snippet of Data Format Required for Individual Rating Functions in bwsTools

ID  Block  Label            Value
1   1      Abortion         -1
1   1      Race              0
1   1      Drugs             1
1   1      Education         0
1   2      Drugs             0
1   2      Economy           1
1   2      Foreign Affairs  -1
1   2      Guns              0


Table 4

Example Individual Ratings from Various Functions in bwsTools

ID  Issue          Difference  Empirical Bayes   Elo   Walk  PageRank
1   Abortion            -0.75            -0.83   895  -1.03     -0.98
1   Bias in Media        0.25            -0.24  1036   0.16      0.14
1   Corruption          -0.50            -0.54   931  -0.80     -0.79
1   Crime                0.50             0.51  1070   0.28      0.31
1   Drugs                0.25             0.04  1034   0.38      0.33


Table 5

Process for Generating Best-Worst Walkscores

Step  Description

1     Score each response to each item in each block as +1 (best), -1 (worst), or 0 (not selected).

2     Treat each block of k items as a series of k(k − 1)/2 pairwise competitions between items.

3     Score each of these "pairwise competitions" by calculating the "point differentials," such that an item chosen as worst lost by two points to the item chosen as best, the items not chosen tied, the items not chosen both lost to the best item by one, and so on.

4     Create a t × t square "best" matrix where each (i, j) cell corresponds to how many points item i lost to item j by. Ties and wins are scored as zero; diagonal cells (i = j) are scored as zero.

5     Create a t × t square "worst" matrix where each (i, j) cell corresponds to how many points item i beat item j by. Ties and losses are scored as zero; diagonal cells (i = j) are scored as zero.

6     Normalize each of these matrices such that each row sums to one. If a row contains only zeros, each cell is replaced with 1/t.

7     Convert each of these adjacency matrices to weighted directed graphs. bwsTools does so using the igraph package.

8     Run an arbitrarily large number of walks around each of these graphs.

9     Treat the proportion of walks leading to each of the t items as raw best and worst walkscores.

10    Standardize (z-score) each vector of raw best and worst walkscores; multiply the standardized worst scores by -1; average these vectors together for a best-worst walkscore.


Least Important    Issue                 Most Important
                   The economy           X
                   Taxes
X                  Bias in the media
                   National security

Figure 1. Example block of a best-worst scaling design, where "bias in the media" is selected as the least important (i.e., worst) and "the economy" as the most important (i.e., best).


Item  Text                                     Block  Option 1  Option 2  Option 3  Option 4
1     Healthcare                                   1         2         6         7        13
2     Taxes                                        2         2         5         8        10
3     National security                            3         2         3         4         9
4     Investigating government corruption          4         3         7         8        11
5     Abortion                                     5         1         4         5         7
6     Crime and violence                           6         1         2        11        12
7     Race relations and racism                    7         4        10        11        13
8     Drugs and drug abuse                         8         4         6         8        12
9     Education                                    9         1         8         9        13
10    Bias in the media                           10         1         3         6        10
11    The economy                                 11         3         5        12        13
12    Foreign affairs and aid                     12         5         6         9        11
13    Gun policy                                  13         7         9        10        12

Figure 2. Example BIBD. The left panel shows the key mapping each item number to the text shown to participants; the right panel shows a b × k matrix indicating which items appear in each block (row).


Figure 3. A scatterplot depicting the relationships between all individual-level scoring methods on the dataset used throughout this article.