Guided2 200 2016
STAT 200 Guided Exercise 2 Answers

For On-Line Students, be sure to:
• Submit your answers in a Word file to Sakai at the same place you downloaded the file
• Remember you can paste any Excel or JMP output into a Word file (use Paste Special for best results).
• Put your name and the Assignment # on the file name: e.g. Ilvento Guided2.doc

Key Topics
• Measures of Central Tendency
• Stem & Leaf Plot and describing distributions
• Measures of Variability

Answer as completely as you can and show your work. Then upload the file via Sakai to get credit.

1. Let's finish up the Academy Award winners for best actor and actress since 1996 that were given in Assignment 1, now that we have command of both central tendency and variability. Each year the Academy of Motion Picture Arts and Sciences gives an award for the best actor and actress in a motion picture. We have recorded the name and age of each winner since 1996. The data for males and females are given below (the sample size is n = 20 for each group). The sum of the ages and the sum of the squared ages are also given.

YEAR  ACTOR                   AGE   ACTRESS            AGE
1996  Geoffrey Rush            45   Frances McDormand   39
1997  Jack Nicholson           60   Helen Hunt          34
1998  Roberto Benigni          46   Gwyneth Paltrow     26
1999  Kevin Spacey             40   Hilary Swank        25
2000  Russell Crowe            36   Julia Roberts       33
2001  Denzel Washington        47   Halle Berry         35
2002  Adrien Brody             29   Nicole Kidman       35
2003  Sean Penn                43   Charlize Theron     28
2004  Jamie Foxx               37   Hilary Swank        30
2005  Philip Seymour Hoffman   38   Reese Witherspoon   29
2006  Forest Whitaker          45   Helen Mirren        61
2007  Daniel Day-Lewis         50   Marion Cotillard    32
2008  Sean Penn                48   Kate Winslet        33
2009  Jeff Bridges             60   Sandra Bullock      45
2010  Colin Firth              50   Natalie Portman     29
2011  Jean Dujardin            39   Meryl Streep        62
2012  Daniel Day-Lewis         55   Jennifer Lawrence   22
2013  Matthew McConaughey      44   Cate Blanchett      44
2014  Eddie Redmayne           32   Julianne Moore      54
2015  Leonardo DiCaprio        41   Brie Larson         26

Sum X                         885   Sum X              722
Sum X-squared              40,465   Sum X-squared   28,598

a.
Here is the Stem and Leaf plot for each group to compare the distributions.

Stem and Leaf Plot of Actors Winning Academy Award Since 1996

Males                        Females
Stem  Leaf                   Stem  Leaf
2     9                      2     2 5 6 6 8 9 9
3     2 6 7 8 9              3     0 2 3 3 4 5 5 9
4     0 1 3 4 5 5 6 7 8      4     4 5
5     0 0 5                  5     4
6     0 0                    6     1 2
6|0 represents 60            6|0 represents 60

b. Calculate the measures of central tendency and variability for each group. The sum of X and the sum of X-squared for each group are given above.

Measure                   Males                           Females
Mean                      885/20 = 44.25                  722/20 = 36.10
Median                    10th ordered value = 44,        10th ordered value = 33,
                          11th ordered value = 45;        11th ordered value = 33;
                          average of the two = 44.5       average of the two = 33
Mode                      Not a unique mode               Not a unique mode
Range                     60 - 29 = 31                    62 - 22 = 40
Variance                  [40,465 - (885)²/20]/(20 - 1)   [28,598 - (722)²/20]/(20 - 1)
                          = [40,465 - 39,161.25]/19       = [28,598 - 26,064.20]/19
                          = 1303.75/19 = 68.62            = 2533.80/19 = 133.36
Standard Deviation        SQRT(68.62) = 8.28              SQRT(133.36) = 11.55
Coefficient of Variation  8.28/44.25 × 100 = 18.72%       11.55/36.10 × 100 = 31.99%

c. Briefly compare the two distributions with an emphasis on the measures of Central Tendency and Variability.

For males, the distribution is roughly symmetric and centered around the mean of 44.25. There are no obvious outliers. The median is very close to the mean at 44.50. The values vary from 29 to 60 for a range of 31 years. The standard deviation is 8.28 years, which is relatively small compared with the mean (CV = 18.72%).

For females, the mean is lower at 36.10, which is higher than the median of 33. The distribution for females is influenced by two large outliers at 61 and 62, which pulled the mean up. Otherwise the ages for females are concentrated in the mid 20s to mid 30s. The range is larger for females than for males (62 - 22 = 40), as is the standard deviation (11.55 for females). The higher standard deviation is also a reflection of the outliers.
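The figures quoted in this comparison can be checked directly from the sums listed with the data. The following is a minimal Python sketch, assuming the same shortcut variance formula used in part b; the function name summary_from_sums is hypothetical and not part of the assignment.

```python
import math

def summary_from_sums(n, sum_x, sum_x2):
    """Return (mean, sample variance, SD, CV in %) from n, sum of x, sum of x squared."""
    mean = sum_x / n
    # Shortcut formula: [sum(x^2) - (sum x)^2 / n] / (n - 1)
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    cv = sd / mean * 100
    return mean, variance, sd, cv

# Sums taken from the data table for the 20 actors and 20 actresses
males = summary_from_sums(20, 885, 40465)
females = summary_from_sums(20, 722, 28598)

for label, (mean, variance, sd, cv) in (("Males", males), ("Females", females)):
    print(f"{label}: mean={mean:.2f} variance={variance:.2f} sd={sd:.2f} cv={cv:.2f}%")
```

Working from the sums rather than the raw ages mirrors the hand calculation, so each intermediate value can be compared with the table in part b.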
The CV for females is much higher than that of males at 31.99%.

d. For both men and women there are a few outliers. For men there are two individuals with a value of 60. For women there is one winner aged 61 and another aged 62. Calculate z-scores for these values and interpret their meaning.

Zm = (60 - 44.25)/8.28 = 1.90
Zf1 = (62 - 36.10)/11.55 = 2.24
Zf2 = (61 - 36.10)/11.55 = 2.16

Each z-score is the number of standard deviations the age lies above its group's mean, so the two female values are more extreme relative to their group than the male value of 60 is relative to its group.

e. Suppose we wanted to remove the two female outliers from the data. Calculate the new mean for women winners for the remaining 18 winners. Hint: subtract the values from the old sum and divide by 18. Did the outliers influence the mean age much?

(722 - 62 - 61) = 599
599/18 = 33.28

The mean for females decreased from 36.10 to 33.28 by removing the two outliers. This is a 7.8% decrease.

2. The following is some data from The Daily Beast on the 50 Most Stressful Universities in 2010. We are looking at the acceptance rate for these 50 universities. The acceptance rate is the percentage of applicants who were admitted. The Histogram and the Stem and Leaf Plot for this data are given below (note the Stem and Leaf Plot rounds the numbers to a whole number). Use the stem and leaf values for some calculations, such as the min and max. For other calculations, the Sum of (x) is 1574.70 and the Sum of (x²) is 62204.53. The Median for this data is 26.85.

a. Calculate the:

Mean = 31.49
Median = 26.85
Mode = 22
Maximum = 73
Minimum = 8
Range = 65
Variance = 257.37
Standard Deviation = 16.04
Coefficient of Variation = 50.94%

b. What is the position of the median value for this data?

Since n = 50, the position is between the 25th and 26th positions. We would take the average of these two values.

c. Does the mode make sense as a measure of Central Tendency for this data?

Based on the Stem and Leaf Plot, the mode is 22%. This is a measure of center for one bunching of the data, but there is much more spread and there are other groupings of the data.

d.
Calculate a z-score for an acceptance rate of 61%.

z = (61 - 31.49)/16.04 = 1.84. This value is 1.84 standard deviations above the mean.

e. Based on what you know about the different criteria used by different universities to judge students for admittance, why do you think this distribution looks the way it does? Think about the spread of the data and the measures of spread for the data, such as the range and standard deviation. Does the spread seem large? Hint: Harvard has the lowest acceptance rate at 7.9%. The Pennsylvania State University has an acceptance rate of 51.2%.

The spread is very large. The CV is 50.94%. It might reflect differences between public and private institutions. Private institutions generally have lower acceptance rates. Public schools may have as part of their mission to have higher rates of acceptance to provide educational opportunities to citizens in the state. Even among the most stressful universities, generally thought to be the most rigorous, the acceptance rate for public institutions should be higher. We could think of this data as being two populations. The Box Plots show a difference between public and private universities. There still is a lot of spread for each type of university: some private universities have high acceptance rates and some public universities have low acceptance rates. But we can see two distinct groups.

3. Answer the following questions about variability of data sets:

a. How would you describe the variance and standard deviation in words, rather than a formula? Think of what you are calculating and how it might be useful in describing a variable.

The variance is the average squared deviation around the center (in this case the center is the mean). The standard deviation is the square root of the variance; it can be read as the typical deviation around the center, expressed in the original units of the variable.

b. What is the primary advantage of using the inter-quartile range compared with the range when describing the variability of a variable?
The range uses only two values, the maximum and the minimum, so it can be very sensitive to outliers. The inter-quartile range shows the range of the middle 50% of the values, so it is much less affected by extreme values.

c. Can the standard deviation ever be larger than the variance? Explain.

In most cases the standard deviation is less than the variance since it is the square root of the variance. However, in the special case where the variance is between 0 and 1, the standard deviation will be more than the variance. For example, if s² = 0.5, then s = 0.71.

d.
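To illustrate the answer to part c above, the following minimal Python sketch compares s with s² for a few variances. The function name compare_sd_to_variance is hypothetical, and the test values other than 0.5 and 68.62 (which appear in this answer key) are chosen only for illustration.

```python
import math

def compare_sd_to_variance(variance):
    """Return the standard deviation and how it compares with the variance."""
    sd = math.sqrt(variance)
    if sd > variance:
        relation = "larger than"
    elif sd < variance:
        relation = "smaller than"
    else:
        relation = "equal to"
    return sd, relation

# Variances below 1, exactly 1, and above 1
for variance in (0.25, 0.5, 1.0, 4.0, 68.62):
    sd, relation = compare_sd_to_variance(variance)
    print(f"s^2 = {variance:>5}: s = {sd:.2f}, which is {relation} s^2")
```

The square root shrinks numbers above 1 and enlarges numbers between 0 and 1, which is why the comparison flips at a variance of 1.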