B) Let X Be the Number of Girls Jane Has. What Is the Probability That X=2? (3 Pts)
4. Jane has four children. There are 16 possible arrangements of girls and boys. For example, GGBB means the first two children are girls and the last two are boys. Assume that all 16 arrangements are equally likely.
a) Write down all 16 arrangements of the sexes of four children. What is the probability of any one of these arrangements? (4 pts)
b) Let X be the number of girls Jane has. What is the probability that X=2? (3 pts)
c) Starting from your work in part a, find the distribution of X. That is, what values can X take, and what are the probabilities for each value? (4 pts)
d) Construct the probability histogram for X. (4 pts)
e) Given the fact that Jane has three girls and one boy, what is the probability that the boy is the youngest child? (5 pts)
f) Let Y be the number of boys Jane has. Explain why Y has a Binomial distribution. (4 pts)
g) Use the Binomial formula to find the probability that Y=2. (4 pts)
5. The following is an excerpt from a New York Times Magazine article entitled “Do We Really Know What Makes Us Healthy?” by Gary Taubes, published September 16, 2007.
The Coronary Drug Project (1970s) was designed to test whether any of five different drugs might prevent heart attacks. The subjects were some 8,500 middle-aged men with established heart problems. They were randomly assigned to take one of the five drugs or a placebo.
a) Is this an experiment or observational study? (3 pts)
b) Identify the following: (6 pts)
experimental units -
response -
treatments -
Because one of the drugs, clofibrate, lowered cholesterol levels, the researchers had high hopes that it would ward off heart disease. But when the results were tabulated after five years, clofibrate showed no beneficial effect. The researchers then considered the possibility that clofibrate appeared to fail only because the subjects failed to faithfully take their prescriptions. As it turned out, those men who said they took more than 80 percent of the pills prescribed fared substantially better than those who didn’t. Only 15 percent of these faithful “adherers” died, compared with almost 25 percent of what the project researchers called “poor adherers.” This might have been taken as reason to believe that clofibrate actually did cut heart-disease deaths almost by half, but then the researchers looked at those men who faithfully took their placebos. And those men, too, seemed to benefit from adhering closely to their prescription: only 15 percent of them died compared with 28 percent who were less conscientious. “So faithfully taking the placebo cuts the death rate by a factor of two,” says David Freedman, a professor of statistics at the University of California, Berkeley. “How can this be? Well, people who take their placebo regularly are just different than the others. The rest is a little speculative. Maybe they take better care of themselves in general. But this compliance effect is quite a big effect.”
c) What is the lurking variable indicated in these paragraphs? (3 pts)
d) Does clofibrate reduce heart attack deaths? Explain briefly. (3 pts)
2. Is there any relationship between a country’s fertility rate and its Gross Domestic Product (GDP)? Fertility rate is defined as the mean number of children per adult woman in that country. GDP is a measure of the country’s economy, and is measured in thousands of US dollars per person. The plot of the data and some summary statistics appear below.
[Figure: Scatterplot of GDP vs FERTILITY. Vertical axis: GDP (0 to 35); horizontal axis: FERTILITY (0 to 7).]
            Mean     Stdev
GDP         15.99    10.6
Fertility   2.379    1.245
R² = 37.8%
a) Find the correlation, and interpret it.
b) Find the Least Squares Regression Equation to predict y=GDP from x=Fertility Rate.
c) Plot the regression line on the scatterplot above.
d) Interpret the slope and intercept, if appropriate. If not appropriate, explain why not.
e) Interpret R².
f) One of the countries in the study was Saudi Arabia, with a GDP of 13.33 and a Fertility Rate of 4.5. Find the residual for this point and interpret it.
4. In a jury trial, suppose the probability that the defendant is convicted, given guilt, is 0.95, and the probability that the defendant is acquitted, given innocence, is also 0.95. Suppose that 90% of all defendants truly are guilty.
a) Given the defendant is convicted, find the probability that he or she was actually innocent. Use the table below to find this probability.
             Innocent    Guilty
Convicted
Acquitted
b) If there are a thousand trials in a year in one city (and the conditions described above apply), approximately how many innocent defendants will be convicted?
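For the jury problem above, each cell of the two-way table is a prior probability multiplied by the matching conditional probability. The Python sketch below is one way to check that arithmetic with exact fractions. The variable names are illustrative and not part of the original exam, and it uses only the three probabilities given in the problem.

```python
from fractions import Fraction

# Probabilities stated in the problem.
p_guilty = Fraction(90, 100)
p_convict_given_guilty = Fraction(95, 100)
p_acquit_given_innocent = Fraction(95, 100)

p_innocent = 1 - p_guilty

# Joint probabilities: one entry per cell of the two-way table.
joint = {
    ("convicted", "innocent"): p_innocent * (1 - p_acquit_given_innocent),
    ("convicted", "guilty"): p_guilty * p_convict_given_guilty,
    ("acquitted", "innocent"): p_innocent * p_acquit_given_innocent,
    ("acquitted", "guilty"): p_guilty * (1 - p_convict_given_guilty),
}

# Part a: P(innocent | convicted) = P(innocent and convicted) / P(convicted).
p_convicted = joint[("convicted", "innocent")] + joint[("convicted", "guilty")]
p_innocent_given_convicted = joint[("convicted", "innocent")] / p_convicted

# Part b: expected number of innocent defendants convicted in 1000 trials.
expected_innocent_convicted = 1000 * joint[("convicted", "innocent")]

print(p_innocent_given_convicted, float(p_innocent_given_convicted))
print(expected_innocent_convicted)
```

Under these inputs the conditional probability comes out to 1/172 (a little under 0.6%), and about 5 of the 1000 defendants would be innocent and convicted.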
5. According to a recent study, 80% of brides in the US take the surname of their new husband. Suppose we randomly sample 10 recently recorded marriages in City Hall and count X = the number of those brides who changed their surname.
a) What is the probability distribution of X?
b) Find the mean and standard deviation of the distribution of X.
c) Find the probability that at least one of those 10 brides kept her own surname.
d) Find the probability that exactly 6 of those 10 brides changed their surname.
e) How would things change in the problem above if instead of 10 random marriages from City Hall we used the 10 marriages announced in the alumni newsletter for the UF Law School?
6. Use the Empirical Rule to explain why the standard deviation of a bell-shaped distribution for a large data set is often roughly estimated by s = range/6.
How do breakfast cereals compare in terms of nutritional value and flavor? The following plots and data summaries are based on a data set containing information on several variables for different brands of breakfast cereal. The variables measured include, for each serving of cereal, the number of calories, the grams of sugar, sodium, fat, and fiber, amounts of vitamins, etc. It also includes a rating of the cereal’s healthfulness, as computed by Consumer Reports based on the other variables.
Stem-and-leaf of calories  N = 71
Leaf Unit = 1.0
   3    5  000
   3    6
   5    7  00
   6    8  0
  13    9  0000000
  27   10  00000000000000
 (29)  11  0000000000000000000000000000
  15   12  000000000
   6   13  00
   4   14  000
   1   15
   1   16  0
1. Briefly describe the distribution of the stemplot of calories. (5)
2. Use the stemplot of calories to find the Five Number Summary. (5)
3. What is the mode of the distribution? ______(2)
4. Do you expect the mean and the median to be much higher, much lower, or very close to the mode? Explain. (5)
The dataset also includes information on the display shelf where these cereals were found in the supermarket. The shelves are coded as 1, 2, or 3, counting from the floor – that is, shelf 1 is the bottom shelf, shelf 2 is in the middle, and shelf 3 is the top one.
13. Is this an experiment, observational study or survey? Explain. (2)
[Figure: Boxplot of rating vs shelf. Vertical axis: rating (10 to 100); horizontal axis: shelf (1, 2, 3).]
14. Identify the following: (5)
experimental units -
response variable -
factors -
levels -
treatments -
15. Correct or incorrect? Explain.
a) There is a negative correlation between the ratings of cereal and their shelf position. (3)
b) To improve a cereal’s rating, we should have the supermarket place it on the bottom or the top shelves. (3)
c) We should conduct a regression analysis to predict the average ratings of cereals positioned on the 4th, 5th and 6th shelves. (3)
16. The following are the numerical summaries of the potassium content in these cereals. Based on them, what can you say about the shape of the distribution? (5)
Variable   N    N*   Mean    SE Mean   StDev   Min     Q1      Median   Q3       Max
potass     70   1    95.07   8.46      70.75   15.00   40.00   90.00    120.00   330.00
20. A particular manufacturing line for filling up cereal boxes has a machine with a standard deviation of 0.5 ounces. Boxes being filled today have a label weight of 24 ounces, but the bags can actually hold up to 26 ounces before the contents spill and create a mess. What should the mean setting of the machine be if we want less than 1 in a thousand boxes to spill? (7)
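One possible way to set up the box-filling question is sketched below in Python. It assumes the fill weights are normally distributed, which the question does not state outright, and the variable names are illustrative. The idea is to find the z-value that leaves 0.001 in the upper tail and then place the mean that many standard deviations below the 26-ounce spill limit.

```python
from statistics import NormalDist

sigma = 0.5             # machine standard deviation, in ounces (from the question)
spill_limit = 26.0      # weight at which the contents spill (from the question)
max_spill_prob = 0.001  # "less than 1 in a thousand"

# ASSUMPTION: fill weights follow a normal distribution.
# z-value with an upper-tail area of 0.001 under the standard normal curve.
z = NormalDist().inv_cdf(1 - max_spill_prob)

# Largest mean setting for which P(weight > spill_limit) equals max_spill_prob;
# any smaller mean setting gives a spill probability below 0.001.
max_mean = spill_limit - z * sigma

# Check: spill probability when the machine is set exactly at max_mean.
spill_prob_at_max = 1 - NormalDist(mu=max_mean, sigma=sigma).cdf(spill_limit)

print(round(z, 2), round(max_mean, 2))
```

Under the normality assumption the mean setting has to sit a little more than 3 standard deviations below 26 ounces, which still leaves it above the 24-ounce label weight.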
5. A study by the New York Times suggests that improving one’s golf game is good for business. Graef Crystal, an executive compensation expert, compared the golf handicap of a company’s CEO to an index of the company’s stock performance during a period of 3 years. Golf handicap is a measure of how good a golfer is – the lower the handicap, the better the golfer.
The study found information on 51 executives whose golf handicap appeared in Golf Digest magazine, and whose companies’ stock performance was included in the Crystal Report’s database (Mr. Crystal’s publication). Golf Digest sent surveys to the CEOs of the 300 largest companies in the Fortune 500, and only 74 revealed their golf handicap.
Out of the 51 companies for which they had data on both variables, 7 were excluded from the study. The article states: “Following accepted statistical techniques, Mr. Crystal removed seven chiefs from the analysis because they distorted the trend lines.” Those seven points are plotted below as open circles, while the data that was kept appears as dark circles.
The researcher called this “one of the oddest – but also the strongest” ways of predicting which CEO will perform well in business. Comment on the validity of the results.
[Figure: Scatterplot of stock performance vs golf handicap. Vertical axis: stock performance (0 to 100); horizontal axis: golf handicap (0 to 35).]