STAT 200 Guided Exercise 2 Answers

Key Topics
• Measures of Central Tendency
• Stem & Leaf Plot and describing distributions
• Using Excel to graph data

Be sure to:
• Submit your answers in a Word file to Sakai, at the same place you downloaded the file.
• You can paste Excel/JMP output into a Word file. Please submit only one file for the assignment.
• It is OK to do problems by hand. However, you will need to scan or take a picture of your work.
• Put your name and the Assignment # in the file name that you submit, e.g. Ilvento Guided1.doc
• Answer completely and show your work.
• Guided Assignments are not graded, but we check for completed work. Answers are posted on Sakai.

1. Let's finish up the Academy Award winners for Best Actor (and Actress) since 1996 that were given in Assignment 1, now that we have command of both central tendency and variability. Each year the Academy of Motion Picture Arts and Sciences gives an award for the best actor and actress in a motion picture. We have recorded the name and age of each winner since 1996. The data for males and females is given below (the sample size, n = 21). The sum of their ages as well as the sum of their ages squared are also given.
YEAR  ACTOR                  MALE AGE  ACTRESS             FEMALE AGE
1996  Geoffrey Rush                45  Frances McDormand           39
1997  Jack Nicholson               60  Helen Hunt                  34
1998  Roberto Benigni              46  Gwyneth Paltrow             26
1999  Kevin Spacey                 40  Hilary Swank                25
2000  Russell Crowe                36  Julia Roberts               33
2001  Denzel Washington            47  Halle Berry                 35
2002  Adrien Brody                 29  Nicole Kidman               35
2003  Sean Penn                    43  Charlize Theron             28
2004  Jamie Foxx                   37  Hilary Swank                30
2005  Philip Seymour Hoffman       38  Reese Witherspoon           29
2006  Forest Whitaker              45  Helen Mirren                61
2007  Daniel Day-Lewis             50  Marion Cotillard            32
2008  Sean Penn                    48  Kate Winslet                33
2009  Jeff Bridges                 60  Sandra Bullock              45
2010  Colin Firth                  50  Natalie Portman             29
2011  Jean Dujardin                39  Meryl Streep                62
2012  Daniel Day-Lewis             55  Jennifer Lawrence           22
2013  Matthew McConaughey          44  Cate Blanchett              44
2014  Eddie Redmayne               32  Julianne Moore              54
2015  Leonardo DiCaprio            41  Brie Larson                 26
2016  Casey Affleck                41  Emma Stone                  28
      Sum X                       926  Sum X                      750
      Sum X-squared            42,146  Sum X-squared           29,382

Here is the Stem and Leaf plot for each group to compare the distributions.

Stem and Leaf Plot of Actors Winning Academy Award Since 1996

Males                        Females
Stem  Leaf                   Stem  Leaf
2     9                      2     2 5 6 6 8 8 9 9
3     2 6 7 8 9              3     0 2 3 3 4 5 5 9
4     0 1 1 3 4 5 5 6 7 8    4     4 5
5     0 0 5                  5     4
6     0 0                    6     1 2
6|0 represents 60            6|0 represents 60

a. Calculate the measures of central tendency and variability for each group.

Measure                   Males                               Females
Mean                      926/21 = 44.09                      750/21 = 35.71
Median                    11th observation in ordered data    11th observation in ordered data
                          = 44                                = 33
Mode                      Not a unique mode                   Not a unique mode
Range                     60 – 29 = 31                        62 – 22 = 40
Variance                  [42,146 – (926)²/21]/(21 – 1)       [29,382 – (750)²/21]/(21 – 1)
                          = [42,146 – 40,832.19]/20           = [29,382 – 26,785.71]/20
                          = 1,313.81/20 = 65.69               = 2,596.29/20 = 129.814
Standard Deviation        SQRT(65.69) = 8.10                  SQRT(129.814) = 11.39
Coefficient of Variation  CV = 8.10/44.09 × 100 = 18.38%      CV = 11.39/35.71 × 100 = 31.90%

b. Briefly compare the two distributions with an emphasis on the measures of central tendency and variability.
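Before answering part b, the part a arithmetic (and the z-scores and trimmed mean requested in parts c and d) can be checked with a short script. This is a minimal Python sketch, not part of the original assignment; the helper names `describe` and `z_score` are hypothetical, and the variance uses the same shortcut formula as the table above, [Sum X² – (Sum X)²/n]/(n – 1).

```python
import math
import statistics

# Ages of Best Actor and Best Actress winners, 1996-2016 (from the table above)
male = [45, 60, 46, 40, 36, 47, 29, 43, 37, 38, 45,
        50, 48, 60, 50, 39, 55, 44, 32, 41, 41]
female = [39, 34, 26, 25, 33, 35, 35, 28, 30, 29, 61,
          32, 33, 45, 29, 62, 22, 44, 54, 26, 28]


def describe(ages):
    """Summary statistics using the same shortcut formula as the answer key."""
    n = len(ages)
    sum_x = sum(ages)
    sum_x2 = sum(x * x for x in ages)
    mean = sum_x / n
    # Sample variance: [sum(x^2) - (sum x)^2 / n] / (n - 1)
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    sd = math.sqrt(variance)
    return {
        "n": n,
        "sum_x": sum_x,
        "sum_x2": sum_x2,
        "mean": mean,
        "median": statistics.median(ages),
        "range": max(ages) - min(ages),
        "variance": variance,
        "sd": sd,
        "cv_pct": sd / mean * 100,  # coefficient of variation, in percent
    }


def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviations."""
    return (x - mean) / sd


men = describe(male)
women = describe(female)
for label, s in (("Males", men), ("Females", women)):
    print(f"{label}: mean={s['mean']:.2f} median={s['median']} range={s['range']} "
          f"variance={s['variance']:.2f} sd={s['sd']:.2f} CV={s['cv_pct']:.2f}%")

# Part c: z-scores for the oldest winners in each group
print(f"z(male, 60)   = {z_score(60, men['mean'], men['sd']):.2f}")
print(f"z(female, 62) = {z_score(62, women['mean'], women['sd']):.2f}")
print(f"z(female, 61) = {z_score(61, women['mean'], women['sd']):.2f}")

# Part d: female mean after removing the two outliers (62 and 61)
trimmed_mean = (women["sum_x"] - 62 - 61) / (women["n"] - 2)
print(f"Female mean without 61 and 62 = {trimmed_mean:.2f}")
```

Run as is, the script reproduces the sums (926 and 42,146; 750 and 29,382), the medians of 44 and 33, and z-scores of about 1.96, 2.31 and 2.22. It prints the male mean as 44.10 because 926/21 = 44.095, which the table above truncates to 44.09.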
For males, the distribution is roughly symmetric and centered around the mean of 44.09, with no obvious outliers. The median, 44.00, is very close to the mean. The values vary from 29 to 60, a range of 31 years. The standard deviation is 8.10 years, which is relatively small compared with the mean (CV = 18.38%).

For females, the mean is lower at 35.71, and it is higher than the female median of 33. Two large outliers, at 61 and 62, pull the female mean up. Otherwise, the female ages are concentrated between the mid 20s and the mid 30s. The range is larger for females than for males (62 – 22 = 40), as is the standard deviation (11.39 for females). The higher standard deviation also reflects the outliers. The CV for females, 31.90%, is much higher than that for males.

c. For both men and women there are a few outliers. For men there are two winners aged 60. For women there is one winner aged 61 and another aged 62. Calculate z-scores for these values and interpret their meaning.

Zm = (60 – 44.09)/8.10 = 1.96
Zf1 = (62 – 35.71)/11.39 = 2.31
Zf2 = (61 – 35.71)/11.39 = 2.22

A z-score is the distance between an observation Xi and the mean X-bar, expressed in standard deviations. For Zm = 1.96, an age of 60 lies 1.96 standard deviations above the male mean of 44.09. Likewise, the female winners aged 62 and 61 lie 2.31 and 2.22 standard deviations above the female mean of 35.71.

d. Suppose we wanted to remove the two female outliers from the data. Calculate the new mean for the remaining 19 female winners. Hint: subtract the values from the old sum and divide by 19. Did the outliers influence the mean age much?

750 – 62 – 61 = 627
627/19 = 33.00

Removing the two outliers lowers the female mean from 35.71 to 33.00, a 7.6% decrease.

2. Below is the data for infant mortality for 34 OECD countries.
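The summary values for this problem can be checked with a short script. This Python sketch (an illustration, not part of the assignment) rebuilds the 34 rates from the stem-and-leaf plot given below, confirms the stated sums of 128.2 and 664.42, and recomputes the answers to parts a through d. Encoding the plot as a dictionary from stems to leaf strings is an assumption of this sketch; empty stems are simply left out.

```python
import math
import statistics

# Stem-and-leaf plot of OECD infant mortality (deaths per 1,000 live births).
# Stem = whole units, leaf = tenths (1|3 represents 1.3); empty stems are omitted.
stem_leaf = {
    1: "377",
    2: "0034455568999",
    3: "13555667",
    4: "0458",
    5: "001",
    7: "0",
    10: "2",
    13: "0",
}

# Rebuild the values in ascending order, in integer tenths so the sums stay exact
tenths = [stem * 10 + int(leaf)
          for stem, leaves in sorted(stem_leaf.items())
          for leaf in leaves]
rates = [t / 10 for t in tenths]

n = len(rates)
sum_x = sum(tenths) / 10                   # should equal 128.2
sum_x2 = sum(t * t for t in tenths) / 100  # should equal 664.42
mean = sum_x / n
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sd = math.sqrt(variance)

print(f"n = {n}, sum(x) = {sum_x:.1f}, sum(x^2) = {sum_x2:.2f}")
print(f"mean = {mean:.2f}, median = {statistics.median(rates):.2f}")
print(f"min = {min(rates)}, max = {max(rates)}, range = {max(rates) - min(rates):.1f}")
print(f"variance = {variance:.3f}, sd = {sd:.2f}, CV = {sd / mean * 100:.2f}%")
print("most frequent values:", statistics.multimode(rates))
print(f"z(13.0) = {(13.0 - mean) / sd:.2f}")
```

Working in tenths keeps the sums exact, so the script can confirm that the plot really holds 34 values adding up to 128.2. `statistics.multimode` (Python 3.8 or later) returns all three most frequent values, 2.5, 2.9 and 3.5, which is why part c finds no unique mode.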
The Organization for Economic Co-operation and Development (OECD) is an international economic organization of 34 countries, founded in 1961 to stimulate economic progress and world trade. It is a forum of countries describing themselves as committed to democracy and the market economy; it provides a platform to compare policy experiences, seek answers to common problems, identify good practices, and coordinate the domestic and international policies of its members. The OECD's web site provided data on infant mortality for 34 countries. Infant mortality (the rate of death of children under 1 year of age per 1,000 live births) is a measure of development. The histogram and the stem-and-leaf plot for this data are given below. Use the stem-and-leaf values for some calculations, such as the minimum and maximum. For other calculations, the sum of (x) is 128.2 and the sum of (x²) is 664.42. The two outliers are Turkey (10.2) and Mexico (13.0).

[Histogram of infant mortality rates, horizontal axis from 0 to 14]

Stem and Leaf
Stem  Leaf             Count
13    0                    1
12
11
10    2                    1
9
8
7     0                    1
6
5     001                  3
4     0458                 4
3     13555667             8
2     0034455568999       13
1     377                  3
1|3 represents 1.3

a. Calculate the:
Mean = 3.77
Median = (3.1 + 3.3)/2 = 3.2
Mode = no unique mode
Maximum = 13.0
Minimum = 1.3
Range = 11.7
Variance = 5.486
Standard Deviation = 2.34
Coefficient of Variation = 62.12%

b. What is the position of the median value for this data? Since n = 34, the median lies between the 17th and 18th positions, so we take the average of these two values.

c. Does the mode make sense as a measure of central tendency for this data? Based on the stem-and-leaf plot, there is not a unique mode: three values (2.5, 2.9, and 3.5) each have 3 observations. They are close to the center and to each other, but they are not very useful.

d. Calculate a z-score for an infant mortality rate of 13. Z = (13 – 3.77)/2.34 = 3.94. This value is 3.94 standard deviations above the mean.

3. Answer the following questions about variability of data sets:
a.
How would you describe the variance and standard deviation in words, rather than a formula? Think of what you are calculating and how it might be useful in describing a variable. The variance is (roughly) the average squared deviation around the center, which in this case is the mean; for a sample, the sum of squared deviations is divided by n – 1 rather than n. The standard deviation is the square root of the variance, so it measures a typical deviation around the mean in the original units of the variable.

b. What is the primary advantage of using the inter-quartile range compared with the range when describing the variability of a variable? The range uses only two values, the maximum and the minimum, so it can be very sensitive to outliers. The inter-quartile range is the range of the middle 50% of the values, so it is largely unaffected by extreme values at either end.

c. Can the standard deviation ever be larger than the variance? Explain. When the variance is greater than 1, the standard deviation is smaller than the variance, since it is the square root of the variance. However, in the special case where the variance is between 0 and 1, the standard deviation is larger than the variance (and when the variance is exactly 0 or 1, the two are equal). For example, if s² = 0.5, then s ≈ 0.71.

d. Can the variance ever be negative? Why or why not? No. The variance is based on squared deviations, and squared values cannot be negative.

e. Show the formula for the Coefficient of Variation and explain what it is and how it can be useful in comparing the variability of different variables.