Chapter 3 Randomization, Probability and Sampling Distributions

3.1 Assignments and Randomization

Recall that Dawn’s study of her cat Bob was presented in Chapter 1. Table 3.1 presents her data. Reading from this table, we see that in Dawn’s study, the chicken-flavored treats were presented to Bob on days (trials): 1, 5, 7, 8, 9, 11, 13, 15, 16 and 18. Why did she choose these days? How did she choose these days? It will be easier to begin with the ‘How’ question.

Table 3.1: Dawn’s data on Bob’s consumption of cat treats. ‘C’ [‘T’] is for chicken [tuna] flavored.

Day:               1   2   3   4   5   6   7   8   9  10
Flavor:            C   T   T   T   C   T   C   C   C   T
Number Consumed:   4   3   5   0   5   4   5   6   1   7

Day:              11  12  13  14  15  16  17  18  19  20
Flavor:            C   T   C   T   C   C   T   C   T   T
Number Consumed:   6   3   7   1   3   6   3   8   1   2

We have been looking at the data collected by Dawn. We have listed the observations; separated them by treatment; sorted them within treatment; and, within treatments, drawn dot plots and computed means, medians, variances and standard deviations. But now we need to get into our time machine and travel back in time to before Dawn collected her data. We go back to when Dawn had her study largely planned: treatments selected; trials defined; response specified; and the decision to have a balanced study with 20 trials. We are at the point where Dawn pondered, “Which 10 trials should have chicken-flavored treats assigned to them? How should I decide?”

The answer is that Dawn did this by using a process called randomization. I will explain what randomization is by showing you three equivalent ways to randomize.

First, some terminology. We call the list of 10 trials above an assignment of treatments to trials. It tells us which trials were assigned to the first treatment (chicken). It also implies which trials were assigned to the second treatment (tuna); namely, all of the trials not listed above. If we are going to study assignments (and we are), it is easier if we make our assignments as simple to display as possible.
Thus, an assignment will be presented by listing the trials that it assigns to treatment 1.

A natural question is: How many different assignments were possible for Dawn’s study? The answer is 184,756. I will give a brief digression into how I obtained this number.

You might recall from math the expression m!, which is read em-factorial. If m is a positive integer, then this expression is defined as:

$$m! = m(m-1)(m-2) \cdots 1 \tag{3.1}$$

Thus, for example, 1! = 1; 2! = 2(1) = 2; 3! = 3(2)(1) = 6; and so on. By special definition (which will allow us to write more easily certain formulas that will arise later in these notes), 0! = 1. Finally, for any other value of m (negatives, non-integers), the expression m! is not defined.

We have the following result. You don’t need to worry about proving it; it is a given in these notes.

Result 3.1 (The number of possible assignments.) For a total of $n = n_1 + n_2$ units, the number of possible assignments of two treatments to the units, with $n_1$ units assigned to treatment 1 and the remaining $n_2$ units assigned to treatment 2, is

$$\frac{n!}{n_1!\,n_2!} \tag{3.2}$$

I will evaluate Equation 3.2 for three of the studies presented in Chapters 1 and 2.

• For Cathy’s study, $n = 6$ and $n_1 = n_2 = 3$. Thus, the number of possible assignments is

$$\frac{6!}{3!\,3!} = \frac{6(5)(4)}{3(2)(1)} = 20.$$

Notice that it is always possible to reduce the amount of arithmetic we do by canceling some terms in the numerator and denominator. In particular, the 6! in the numerator can be written as 6(5)(4)3! and its 3! cancels a 3! in the denominator.

• For Kymn’s study, $n = 10$ and $n_1 = n_2 = 5$. Thus, the number of possible assignments is

$$\frac{10!}{5!\,5!} = \frac{10(9)(8)(7)(6)}{5(4)(3)(2)(1)} = 252.$$

• For Dawn’s study, $n = 20$ and $n_1 = n_2 = 10$. Thus, the number of possible assignments is

$$\frac{20!}{10!\,10!} = 184{,}756.$$

Notice that for Cathy’s and Kymn’s studies, I determined the answer by hand because the numbers are small enough to handle easily. Dawn’s study is trickier.
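
For readers who would rather let a computer do the arithmetic, here is a minimal sketch that evaluates Equation 3.2 for the three studies above. It is not part of the original notes: the function name count_assignments and the studies dictionary are made up for illustration. The sketch assumes Python 3.8 or later, where math.comb is available as a cross-check.

```python
from math import comb, factorial


def count_assignments(n1: int, n2: int) -> int:
    """Number of ways to assign n1 of the n1 + n2 units to treatment 1 (Equation 3.2)."""
    n = n1 + n2
    # Integer division is exact here because n1! * n2! always divides n!.
    return factorial(n) // (factorial(n1) * factorial(n2))


# (n1, n2) for the three studies evaluated in the text; the dictionary is illustrative.
studies = {"Cathy": (3, 3), "Kymn": (5, 5), "Dawn": (10, 10)}

for name, (n1, n2) in studies.items():
    total = count_assignments(n1, n2)
    # math.comb returns the same binomial coefficient directly.
    assert total == comb(n1 + n2, n1)
    print(f"{name}: {total:,} possible assignments")
```

The printed counts should agree with the hand calculations above: 20, 252 and 184,756.
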
Many of you, perhaps most, perhaps all, will consider it easy to determine the answer: 184,756. But I will not require you to do so. As a guide, I will never have you evaluate m! for any m > 10.

Sara’s study is a real challenge. The number of possible assignments is

$$\frac{80!}{40!\,40!}.$$

This answer, to four significant digits, is $1.075 \times 10^{23}$. Don’t worry about how I obtained this answer. If this issue, however, keeps you awake at night, then send me an email and I will tell you. If enough people email me, then I will put a brief explanation in the next version of these Course Notes.

I now will describe three ways (two physical and one electronic) that Dawn could have performed her randomization.

1. A box with 20 cards. Take 20 cards of the same size, shape, texture, etc. and number them 1, 2, ..., 20, with one number to each card. Place the cards in a box; mix the cards thoroughly and select 10 cards at random without replacement. The numbers on the selected cards denote the trials that will be assigned treatment 1.

2. A deck of 20 ordinary playing cards. (This method is especially suited for units that are trials.) We need to have 10 black cards (spades or clubs) and 10 red cards (diamonds or hearts). We don’t care about the rank (ace, king, 3, etc.) of the cards. The cards are thoroughly shuffled and placed in a pile, face down. Before each trial, select the top card from the pile; if it is a black card, then treatment 1 is assigned to the trial; if it is a red card, then treatment 2 is assigned to the trial. The selected card is set aside and the above process is repeated for the remaining trials.

3. Using a website. This method will be explained in Section 3.5 later in this chapter.

Are you familiar with the term black box?
I like the definition in Wikipedia

http://en.wikipedia.org/wiki/Black_box

which is:

    In science and engineering, a black box is a device, system or object which can be viewed solely in terms of its input, output and transfer characteristics without any knowledge of its internal workings, that is, its implementation is “opaque” (black). Almost anything might be referred to as a black box: a transistor, an algorithm, or the human mind.

Our website for randomization is a black box. It executes a computer program that supposedly is mathematically equivalent to my two methods of randomization that involve using cards. I say it’s a black box because we aren’t really interested in the computer code of the program. Thus, if you want to think about how randomization works, I recommend you think of the cards. For my purposes of instruction, the most convenient method is to use the website to obtain examples for you. If I were to replicate Dawn’s study on my cat Buddy, I would use the second card method above. But that’s me. If you perform a project for this class, you may randomize however you please, as long as you use one of the three methods above.

Before I get back to Dawn’s study, I want to deal with an extremely common misconception about randomization. Randomization is a process or a method that is fair in the sense that every possible assignment has the same chance of being selected. Randomization does not guarantee that the assignment it yields will look random. (Among the many issues involved is how one decides what it means for an assignment to look random.) Later in these Course Notes, we will discuss designs that involve restricted randomization; in particular, we will learn about the Randomized Pairs Design.

I used the website and obtained the following assignment for Dawn’s study:

1, 2, 4, 7, 9, 10, 11, 14, 15, 18.

Note that this assignment is different from the one that Dawn used.
Given that there are 184,756 possible assignments, I would have been very surprised if I had obtained the same assignment as Dawn!

Here is the commonality of our three methods of randomization: Before we select an assignment, all 184,756 possible assignments for Dawn’s study are equally likely to be selected. For the first method of randomizing, this fact is conveyed by saying that the cards are indistinguishable; they are thoroughly mixed; and 10 cards are selected at random. For the second method of randomizing, this fact is conveyed by saying that the cards are shuffled thoroughly. Finally, whereas the electronic method’s operations are a total mystery to us, the programmer claims that it makes all assignments equally likely. In this class we will use the website randomizer for a variety of purposes (not just randomization) and we will accept without question its claim of making every assignment equally likely.

The most important thing to remember about probability is that it is always about a future uncertainty. Now, back to Dawn’s study of Bob. Remember, we are at a place in time before Dawn selected an assignment.
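
If you would like to experiment with the two card methods on a computer, the following sketch imitates them with Python’s standard random module. This is an assumption made purely for illustration: it is not the program behind the course website, and the function names box_of_cards and shuffled_deck are invented here. The sketch draws one assignment for a study the size of Dawn’s, then repeats each method many times on a study the size of Cathy’s (6 units, 20 possible assignments) to see whether every assignment turns up about equally often.

```python
import random
from collections import Counter
from itertools import combinations


def box_of_cards(n_trials, n_treatment1, rng):
    """Method 1: draw n_treatment1 numbered cards from the box without replacement."""
    return tuple(sorted(rng.sample(range(1, n_trials + 1), n_treatment1)))


def shuffled_deck(n_trials, n_treatment1, rng):
    """Method 2: shuffle black and red cards; a black card on trial i means treatment 1."""
    deck = ["black"] * n_treatment1 + ["red"] * (n_trials - n_treatment1)
    rng.shuffle(deck)
    return tuple(i for i, card in enumerate(deck, start=1) if card == "black")


rng = random.Random(2024)  # arbitrary seed, used only so the sketch is repeatable

# One assignment for a study the size of Dawn's: 20 trials, 10 per treatment.
print("Trials assigned to treatment 1:", box_of_cards(20, 10, rng))

# A study the size of Cathy's: 6 units, 3 per treatment, 20 possible assignments.
all_assignments = list(combinations(range(1, 7), 3))
draws = 20_000
for method in (box_of_cards, shuffled_deck):
    counts = Counter(method(6, 3, rng) for _ in range(draws))
    # If all assignments are equally likely, each should appear near draws / 20 times.
    print(method.__name__, "distinct assignments seen:", len(counts),
          "fewest:", min(counts.values()), "most:", max(counts.values()))
```

With 20,000 draws, each of the 20 assignments should appear roughly 1,000 times under either method. The counts will not be exactly equal, which is itself a reminder that a fair process need not produce results that look perfectly even.
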
