AP Statistics Audit Syllabus

AP Statistics Topics by Chapter

Chapter 1: Exploring Data
• Graphs for categorical and quantitative variables
• Pie charts, bar graphs, stemplots, histograms, ogives
• Patterns in distributions
• Shape of a distribution, including roughly symmetric, skewed, or neither
• Discussing measures of center, unusual points, shape, and spread
• Outliers
• Relative and cumulative frequency plots
• Time plots
• Measures of center
• Mean vs. median
• Clusters and gaps
• Quartiles, Q1, Q3
• Boxplots
• Range
• 5-number summary
• IQR
• Outliers
• Standard deviation
• Choosing which measures of center and spread to use
• Properties of standard deviation
• The effect of changing units on summary measures, including adding a constant and multiplying by a constant
• Comparing distributions using side-by-side bar graphs, dotplots, back-to-back stemplots, parallel boxplots, or numerical summaries from computer output or calculators
• Comparing center and spread within a group or between groups
• Comparing clusters, gaps, outliers, or any other unusual features

1. Which of the following are true statements? I. Pie charts are useful for both categorical and quantitative data. II. Histograms are useful for both small and large data sets. III. Histograms show the overall shape, center, and spread of the distribution of data. (A) I only (B) II only (C) III only (D) I and III only (E) II and III only

2. Which of the following is inappropriate for displaying quantitative data? (A) Stem and leaf plot (B) Dot plot (C) Bar graph (D) Box plot (E) Histogram

3. The height of Mrs. Clark’s tomato plants is what type of data? (A) Categorical (B) Quantitative and continuous (C) Quantitative and discrete (D) Categorical and quantitative (E) Categorical and continuous

4. The mean assessed value of homes in Southern County is $158,000, with a standard deviation of $32,000. If the county supervisors decided to increase every assessment by $5,000, the new mean and standard deviation would be (A) $158,000 and $32,000 (B) $163,000 and $37,000 (C) $163,000 and $32,000 (D) $158,000 and $37,000 (E) Cannot be determined
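Question 4 rests on the rule that adding the same constant to every observation shifts the mean by that constant but leaves the standard deviation unchanged. The sketch below checks the rule on a short list of made-up assessments (hypothetical values, not Southern County data).

```python
from statistics import mean, pstdev

# Hypothetical assessed home values, for illustration only
values = [120_000, 150_000, 158_000, 175_000, 187_000]
shifted = [v + 5_000 for v in values]  # every assessment raised by $5,000

print(mean(values), mean(shifted))      # the mean rises by exactly 5,000
print(pstdev(values), pstdev(shifted))  # the spread does not change
```

Every deviation from the mean is the same before and after the shift, which is why any measure of spread built from those deviations stays put.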

5. The mean exam score for the second-period physics class, which had 25 students, was 87.3. The mean exam score for the third-period physics class, which had 19 students, was 92.4. What was the mean exam score for the two classes combined? (A) 89.85 (B) 89.50 (C) 90.18 (D) 91.91 (E) Cannot be determined
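The combined mean in Question 5 is a weighted mean: total points divided by total students. A minimal sketch using only the numbers in the question:

```python
# Class sizes and class mean scores from Question 5
sizes = [25, 19]
means = [87.3, 92.4]

# Combined mean = (sum of all individual scores) / (total number of students)
total_points = sum(n * m for n, m in zip(sizes, means))
combined_mean = total_points / sum(sizes)
print(round(combined_mean, 2))
```

Simply averaging the two class means, (87.3 + 92.4)/2 = 89.85, ignores that the classes have different sizes.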

6. A distribution is skewed right if (A) mean = median (B) mean < median (C) mean > median (D) IQR > difference between the mean and median (E) Cannot be determined

7. Consider the following back-to-back stemplots comparing weight gain, in kilograms, for male and female horses.

Male  Female
1 7 7 2 2 3 4 8 8 3 2 3 1 4 5 6 7 8 9 9 8 7 5 4 4 3 4 6 6 8 9 6 5 3 3 0 5 6 6 5 4 3 3 3 1 6

Which of the following are true statements?

I. The distributions have the same number of observations II. The ranges for the two distributions are the same III. The means for the two distributions are the same IV. The medians of the two distributions are the same V. The variances for the two distributions are the same

(A) I and II (B) I and IV (C) II and V (D) III and V (E) I, II, and III

8. The following stemplot displays the ages of the presidents of the United States at the time of their inaugurations.

4 | 2 3 4 6 7 7 8 9 9
5 | 0 1 1 1 1 1 2 2 4 4 4 4 4 5 5 5 5 5 6 6 6 7 7 7 7 8
6 | 0 1 1 1 2 4 4 6 5 8 9

(a) Determine whether there are any outliers.

(b) Make a boxplot for the data.

(c) Describe the shape of the distribution.
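Part (a) of Question 8 uses the 1.5 × IQR rule. The sketch below reads the leaves exactly as printed in the stemplot (including the out-of-order "6 5" in the 60s row) and takes Q1 and Q3 as the medians of the lower and upper halves, the convention most AP Statistics courses use. Other quartile conventions (for example, NumPy's default percentile interpolation) can give slightly different fences.

```python
from statistics import median

# Leaves exactly as printed in the stemplot (stem = tens digit of age in years)
stemplot = {
    4: [2, 3, 4, 6, 7, 7, 8, 9, 9],
    5: [0, 1, 1, 1, 1, 1, 2, 2, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    6: [0, 1, 1, 1, 2, 4, 4, 6, 5, 8, 9],
}
ages = sorted(10 * stem + leaf for stem, leaves in stemplot.items() for leaf in leaves)

def quartiles(data):
    """Return (Q1, median, Q3) for sorted data: Q1 and Q3 are the medians of the
    lower and upper halves; when n is odd the overall median is left out of both."""
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles(ages)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in ages if a < low_fence or a > high_fence]

print("n =", len(ages))
print("Five-number summary:", min(ages), q1, med, q3, max(ages))
print("Fences:", low_fence, high_fence, "Outliers:", outliers)
```

With the leaves as printed, Q1 = 51, the median is 55, and Q3 = 58, so the fences are 40.5 and 68.5 and only 69 falls outside. Because 69 barely clears the upper fence, the conclusion is sensitive to the quartile convention, which is worth mentioning in a written answer. The same five-number summary also gives the values needed for the boxplot in part (b).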

9. The following data represent the hours of continuous use for two brands of batteries.

Brand A: 65, 67, 69, 71, 63, 62, 70, 72, 66
Brand B: 65, 67, 67, 68, 70, 64, 64, 65, 65
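As a starting point for the comparison requested in Question 9, here is a minimal sketch of the summary statistics for the two brands listed above. It uses the sample standard deviation (n − 1 in the denominator, the calculator's Sx) and the same "median of each half" quartile convention; both are assumptions about which conventions the course uses.

```python
from statistics import mean, median, stdev

brand_a = [65, 67, 69, 71, 63, 62, 70, 72, 66]
brand_b = [65, 67, 67, 68, 70, 64, 64, 65, 65]

def five_number_summary(data):
    """Min, Q1, median, Q3, max; Q1 and Q3 are medians of the lower and upper
    halves, leaving out the overall median when n is odd."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

for name, data in [("Brand A", brand_a), ("Brand B", brand_b)]:
    print(name, five_number_summary(data),
          "mean = %.2f" % mean(data), "s = %.2f" % stdev(data))
```

Brand A has the higher center (mean about 67.2 vs. 66.1 hours) but also more variability (s about 3.53 vs. 2.03 hours); a parallel boxplot or back-to-back stemplot would show the same trade-off.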

Using plots and summary statistics, investigate and report on how these two brands of batteries compare. Include a complete analysis of the distributions.

Chapter 2: Describing Location in a Distribution
• Percentiles
• Z-scores and the z table
• Properties of the normal distribution
• Model for measurements
• Chebyshev’s inequality
• Density curves
• Normal distributions
• 68-95-99.7 rule
• Standard normal curve
• Nonstandard normal curves and calculations
• Assessing normality
• Normal probability plots

1. The grading at Central High gives a B for grades between 86 and 93. On the English final for seniors, what proportion of the class would get a B if the grades were normally distributed with a mean grade of 86.34 and standard deviation of 14.23? (A) 0.07 (B) 0.4905 (C) 0.6801 (D) 0.1896 (E) 0.0280

2. The mean GPA at Central High is 2.9, with a standard deviation of 0.5. Assuming the GPAs are normally distributed, what GPA will place a student in the top 5% of the class? (A) 3.72 (B) 3.43 (C) 2.08 (D) 2.90 (E) 3.38
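Questions 1 and 2 are a forward normal calculation (area between two values) and a backward one (value at a percentile). A minimal sketch using Python's `statistics.NormalDist` (available in Python 3.8 and later); table-based answers may differ in the last digit.

```python
from statistics import NormalDist

# Question 1: English final scores ~ N(86.34, 14.23); a B is a score between 86 and 93
final = NormalDist(mu=86.34, sigma=14.23)
p_b = final.cdf(93) - final.cdf(86)
print(round(p_b, 4))

# Question 2: GPA ~ N(2.9, 0.5); the top 5% begins at the 95th percentile
gpa = NormalDist(mu=2.9, sigma=0.5)
cutoff = gpa.inv_cdf(0.95)
print(round(cutoff, 2))
```

The distractors 0.6801 and 0.4905 in Question 1 are the two cumulative areas on their own; the answer is their difference.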

3. On each of her last two biology exams, Sarah scored 87. The class mean on the first exam was 75, with a standard deviation of 8.9. The class mean on the second exam was 73, with a standard deviation of 9.7. Assuming the scores on each exam were approximately normally distributed, on which exam did Sarah score better relative to the rest of her class? (A) She scored better on the first exam (B) She scored better on the second exam (C) She scored equally on both exams (D) It is impossible to determine because the class sizes are unknown (E) It is impossible to determine because the correlation between the two sets of exam scores is not provided.

4. A researcher notes that two populations of lab mice, one consisting of mice with white fur and one of mice with grey fur, have the same mean weight, and both have approximately normal distributions. However, the population of white mice has a larger standard deviation than the population of grey mice. If the weights for both of these populations were plotted, how would the curves compare to each other? (A) The curves would be identical (B) The curve for the grey mice would be taller because it has a smaller standard deviation (C) The curve for the white mice would be taller because it has a larger standard deviation (D) The curve for the white mice would be taller because the population size of the white mice is larger (E) The curve for the grey mice would be taller because its variance is larger
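Question 3 compares standing relative to each class by converting both scores to z-scores; the larger z-score is the better relative performance. A short sketch:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mean) / sd

z_first = z_score(87, 75, 8.9)   # first exam: about 1.35
z_second = z_score(87, 73, 9.7)  # second exam: about 1.44
print(round(z_first, 2), round(z_second, 2))
```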

5. Which of the following statements is NOT true for normally distributed data? (A) The mean and median are equal (B) The area under the curve is dependent upon the mean and standard deviation (C) Almost all of the data lie within three standard deviations of the mean (D) Approximately 68% of the data lie within one standard deviation of the median (E) When the data are standardized, the distribution has mean $\mu = 0$ and standard deviation $\sigma = 1$.

6. Webb is a baseball fanatic. He keeps his own statistics on the major league teams and individual players. For the 350 regular starters, Webb has found that the mean batting average is 0.229, with a standard deviation of 0.024. His sister is appalled that baseball players are paid the salaries they are, yet get a hit in fewer than 25% of their at-bats. To further her argument, she asks for the following information:

(a) What proportion of players get a hit in more than 25% of their at-bats?

(b) Since the players with the top ten batting averages get cash bonuses, what is the lowest batting average that will receive a bonus?

Chapter 3: Analyzing Bivariate Data
• Scatterplots – construct and interpret, analyze patterns
• Direction, shape, strength, outliers, influential points
• Correlation – calculation and properties
• Linear regression – calculation, principles, and properties
• Least-squares regression line
• Interpret slope and y-intercept in context
• Prediction vs. extrapolation
• Residuals
• Coefficient of determination – r²
• Residual plots – constructing and interpreting
• Cautions about correlation and regression
• Lurking variables

1. A perfect positive correlation means (A) The points in the scatter diagram lie on an upward sloping line (B) The points in a scatter diagram lie on a downward sloping line (C) r is equal to –1 (D) r is equal to zero (E) there is a direct cause and effect relationship between the variables

2. In a regression model, the slope represents (A) The point where the y-axis intersects the x-axis (B) The point where the regression line intersects the y-axis (C) The point where the regression line intersects the x-axis (D) The change in the response variable due to a one-unit change in the independent variable (E) The change in the independent variable due to a one-unit change in the response variable

3. Are jet skis dangerous? Propelled by a stream of pressurized water, jet skis and other so-called wet bikes carry from one to three people, retail for an average price of $5700, and have become one of the most popular types of recreational vehicle sold today. But critics say that they’re noisy, dangerous, and damaging to the environment. An article in the August 1997 issue of the Journal of the American Medical Association reported on a survey that tracked emergency room visits at randomly selected hospitals nationwide. The study recorded data on the number of jet skis in use and the number of accidents for the years 1987–1996. Computer output and a residual plot from a linear regression analysis of the data are shown below.

Predictor   Coef        SE Coef     T       P
Constant    -0.8        109.9       -0.01   0.994
Jetskis     0.0048308   0.0002292   21.08   0.000

S = 188.3   R-Sq = 98.2%   R-Sq(adj) = 98.0%

[Residual plot: Residuals Versus Jetskis, response is the number of accidents. Vertical axis: Residual, from -400 to 400. Horizontal axis: Jetskis, from 0 to 1,000,000.]

a. What is the equation of the least-squares line? Be sure to define any variables you use.

b. Interpret the value of r² in the context of this problem.

c. Is a line an appropriate model for these data? Justify your answer.

d. Interpret the value of s in the context of this problem.

Chapter 4: More about Relationships between Two Variables
• Transforming to achieve linearity
• Powers and logs
• Exponential models
• Power models
• Relationships between categorical variables
• Marginal distributions
• Marginal and joint frequencies for two-way tables
• Frequency tables and bar charts
• Conditional relative frequencies and association
• Comparing distributions using bar charts
• Conditional distributions
• Simpson’s paradox
• Establishing causation
• Lurking variables
• Causation
• Common response
• Confounding

1. One thousand adults were asked whether Republicans or Democrats have better domestic economic policies. The answers are summarized in the table below.

          Republican   Democrat   No Opinion   Totals
Male         220          340         40         600
Female       170          200         30         400
Totals       390          540         70        1000

What is the probability of choosing a male, given that the person is a Republican? (A) 22% (B) 37% (C) 56% (D) 77% (E) 58%
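A conditional probability from a two-way table divides a joint count by the total of the row or column being conditioned on, while a marginal probability divides a row or column total by the grand total. A minimal sketch for Question 1, with the marginal probability for Question 2 computed from the same table:

```python
# Counts from the two-way table (rows: gender; columns: party preference)
table = {
    "Male":   {"Republican": 220, "Democrat": 340, "No Opinion": 40},
    "Female": {"Republican": 170, "Democrat": 200, "No Opinion": 30},
}

grand_total = sum(sum(row.values()) for row in table.values())
republican_total = sum(row["Republican"] for row in table.values())
democrat_total = sum(row["Democrat"] for row in table.values())

# Conditional: P(Male | Republican) = joint count / Republican column total
p_male_given_rep = table["Male"]["Republican"] / republican_total
# Marginal: P(Democrat) = Democrat column total / grand total
p_democrat = democrat_total / grand_total

print(round(p_male_given_rep, 3), p_democrat)
```

Note that 220/1000 = 22% (choice A) is the joint probability P(Male and Republican), not the conditional one.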

2. Using the information above, what is the probability of choosing a Democrat? (A) 34% (B) 37% (C) 50% (D) 54% (E) 57%

3. Which of the following scatterplots would indicate that Y is growing exponentially over time?

(a) (b) (c)

(d) (e) none of these

4. According to the 1990 census, those states with an above-average number X of people who fail to complete high school tend to have an above-average number Y of infant deaths. In other words, there is a positive association between X and Y. The most plausible explanation for this is (A) X causes Y. Programs to keep teens in school will help reduce the number of infant deaths. (B) Y causes X. Programs that reduce infant deaths will ultimately reduce high school dropouts. (C) Lurking variables are probably present. For example, states with large populations will have both larger numbers of people who don’t complete high school and more infant deaths. (D) Both of these variables are directly affected by the higher incidence of cancer in certain states. (E) The association between X and Y is purely coincidental.

5. An experiment was conducted to determine the effect of practice time (in seconds) on the percent of unfamiliar words recalled. Here is a Fathom scatterplot of the results with a least-squares regression line superimposed.

(a) Sketch a residual plot below.

(b) Does a linear model fit the data well? Justify your answer.

We used Fathom to transform the original data in hopes of achieving linearity. The screenshots below show the results of two different transformations.

(c) Would an exponential model or a power model fit the original data better? Justify your answer.

(d) Use the model you chose in (c) to predict word recall for 25 seconds of practice. Show your method.

Chapter 5: Producing Data
• Sampling: good and bad methods
• Census, sample survey
• Experiment, observational study
• Voluntary response
• Convenience samples
• Simple random sample (SRS)
• Stratified random sample
• Cluster sampling, systematic sampling, multistage sampling
• Designing polls and surveys
• Undercoverage, nonresponse, question wording
• Potential bias
• Random number table
• Basics of experimental design – well designed and well conducted
• Subjects
• Experimental units
• Factors
• Treatments
• Explanatory and response variables
• Completely randomized design
• Control groups
• Random assignment
• Replication
• Placebo effect
• Blinding and double blinding
• Confounding
• Multifactor experiments
• Block designs
• Matched pairs
• Population vs. sample
• Generalizability of results and the types of conclusions that can be drawn from observational studies, experiments, and surveys

1. Ben conducts a study in which 100 subjects, randomly chosen from the population of all students at a school, guess when 60 seconds have elapsed. He records the actual number of seconds that have elapsed when the subjects think it has been 60 seconds. The subjects make their guesses while listening to music. Fifty-five of Ben’s subjects choose fast music and 45 choose slow music. What kind of study is this? (A) Experiment, because the subjects are responding to treatments (B) Experiment, because there is a response variable (C) Experiment, because the subjects are randomly chosen from the population (D) Observational study, because the participants select their own treatments (E) Observational study, because the treatment groups are different sizes

2. Which of the following statements about observational studies are true? I. A census is always preferable to a sample survey, since it includes the entire population. II. A neutral survey designer who has no predisposition toward any particular conclusion can still produce biased data. III. Statistical inference is not necessary when a census is conducted properly.

(A) I only (B) II only (C) I and II (D) I and III (E) I, II, and III

3. A U.S. government researcher wants to select a sample of tax returns that will include returns from a variety of different income levels. He divides the set of all the different incomes shown on the forms into 10 nonoverlapping ranges, then he randomly selects 100 tax returns from each. Which of the following best describes the sampling scheme used in this example? (A) Stratified random sampling (B) Simple random sampling (C) Convenience sampling (D) Two-stage sampling (E) Cluster sampling

4. Which of the following is NOT a property of a large table of random digits? (A) The table will contain, somewhere, the sequence of digits 1 2 3 4. (B) Consecutive rows do not start with the same digit (C) Each digit 0 through 9 occurs with equal frequency (D) Each three-digit number 000 through 999 occurs with equal frequency (E) The contents of one section of the table are independent of other sections of the table

5. The owner of a factory that employs half the citizens in a small town is trying to decide whether to take a public stand on a controversial issue. He realizes that he would benefit from knowing how the townspeople feel. He randomly selects 50 of the townspeople from a list of the town’s entire population. He personally contacts all 50 and asks them their opinion on the issue. Most give him an answer, but 12 townspeople decline to participate. He decides to summarize his results on the 38 responses. Which of the following lists the most significant sources of bias in this survey? (A) Voluntary response bias and undercoverage (B) Response bias and undercoverage (C) Nonresponse bias and undercoverage (D) Response bias and nonresponse (E) Voluntary response bias and nonresponse

6. Which of the following is the least important way in which the designer of an experiment can guard against confounding? (A) Matching (B) Randomization (C) Replication (D) Control (E) Blocking

7. David knows that dancers are trained to spin many times without losing their ability to move in a straight line after spinning. He wonders whether this ability is dependent on the number of spins. He wants to design an experiment that will compare the ability of experienced female dancers to walk a fixed distance in a straight line after 5 spins with their ability after 10 spins. Which of the following is the most appropriate design for this experiment? (A) Completely randomized design (B) Stratified design (C) Randomized block design (D) Cluster design (E) Matched pairs design

8. Aspirin may enhance impairment by alcohol

Aspirin, a longtime antidote for the side effects of drinking, may actually enhance alcohol’s effect, researchers at the Bronx Veteran’s Affairs Medical Center say. In a report on a study published in the Journal of the American Medical Association, the researchers said they found that aspirin significantly lowered the body’s ability to break down alcohol in the stomach. As a result, five volunteers who had a standard breakfast and two extra-strength aspirin tablets an hour before drinking had blood alcohol levels 30 percent higher than when they drank alcohol alone. Each volunteer consumed the equivalent of a glass and a half of wine. That 30 percent could make the difference between sobriety and impairment, said Dr. Charles S. Lieber, medical director of the Alcohol Research and Treatment Center at the Bronx center, who was co-author of the report with Dr. Risto Roine.

a. Does this article describe an experiment? Explain.

b. Did this study involve a simple random sample (SRS)? Explain.

c. Did this study use a particular design that we have studied? If so, identify the design. Then comment on the validity of the study.

9. You are participating in the design of a medical experiment to investigate whether or not a calcium supplement in the diet will reduce the blood pressure of middle-aged men. Preliminary research suggests that the supplement may have a greater effect on black men than on white men.

a. What sort of experimental design would you choose, and why?

b. Assume that the experimental population consists of 600 white men and 500 black men. Outline in a diagram the design of the experiment. (Be sure to indicate how many subjects are assigned to the various treatment groups.)
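One common answer to Question 9 is a randomized block design, blocking on race because the supplement's effect may differ between black and white men. The sketch below shows only the random-assignment step, assuming two treatments (calcium supplement and placebo) split evenly within each block; the subject labels and the seed are hypothetical.

```python
import random

def assign_within_block(subject_ids, treatments, rng):
    """Shuffle one block's subjects and deal them evenly into the treatment groups.
    Assumes the block size divides evenly by the number of treatments."""
    ids = list(subject_ids)
    rng.shuffle(ids)
    group_size = len(ids) // len(treatments)
    return {t: ids[i * group_size:(i + 1) * group_size] for i, t in enumerate(treatments)}

rng = random.Random(2024)  # fixed seed so the assignment can be reproduced
treatments = ["calcium", "placebo"]

# Hypothetical subject labels: 600 white men and 500 black men
blocks = {
    "white": [f"W{i:03d}" for i in range(1, 601)],
    "black": [f"B{i:03d}" for i in range(1, 501)],
}

design = {name: assign_within_block(ids, treatments, rng) for name, ids in blocks.items()}
for block, groups in design.items():
    print(block, {t: len(members) for t, members in groups.items()})
```

Under these assumptions the diagram has 300 white men in each treatment group and 250 black men in each, with the change in blood pressure compared between treatments separately within each block.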

Chapter 6: Probability and Simulation
• Simulations – basic process and examples
• Interpreting probability
• Probability as long-run relative frequency
• Law of large numbers
• Randomness
• Legitimate probability models
• Sample spaces
• Outcomes
• Events
• Addition rule for disjoint events
• Complement rule
• Venn diagrams – union and intersection
• Independence
• Multiplication rule
• General addition rule
• Conditional probability
• Tree diagrams
• Proving independence

1. According to a recent national survey of college students, 55% admitted to having cheated at some time during the last year. What is the probability that, of two randomly selected college students, at least one would have cheated during the past year? (A) 0.5500 (B) 0.7975 (C) 0.3025 (D) 0.2475 (E) 0.2025

2. Given two events, A and B, if P(A) = 0.37, P(B) = 0.41, and P(A or B) = 0.75, then the two events are (A) independent but not mutually exclusive (B) mutually exclusive but not independent (C) mutually exclusive and independent (D) neither mutually exclusive nor independent (E) Cannot be determined
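Question 1 uses the complement rule, assuming the two randomly selected students answer independently; Question 2 recovers P(A and B) from the general addition rule and then checks it against 0 (mutually exclusive) and against P(A)P(B) (independent). A short sketch:

```python
# Question 1: P(at least one of two independent students cheated) = 1 - P(neither cheated)
p_cheat = 0.55
p_at_least_one = 1 - (1 - p_cheat) ** 2
print(round(p_at_least_one, 4))

# Question 2: general addition rule, P(A and B) = P(A) + P(B) - P(A or B)
p_a, p_b, p_a_or_b = 0.37, 0.41, 0.75
p_a_and_b = p_a + p_b - p_a_or_b
mutually_exclusive = abs(p_a_and_b) < 1e-9               # would need P(A and B) = 0
independent = abs(p_a_and_b - p_a * p_b) < 1e-9          # would need P(A and B) = P(A)P(B)
print(round(p_a_and_b, 2), mutually_exclusive, independent)
```

Here P(A and B) = 0.03, which is neither 0 nor 0.37 × 0.41 = 0.1517.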

3. Security procedures at the U.S. Capitol require that all bags – meaning briefcases, backpacks, shopping bags, any carrying bags, and purses – must be screened. Currently, it is reported that 95% of all bags that contain illegal items trigger the alarm. 12% of the bags that do not contain illegal items trigger the alarm. If 3 out of every 1,000 bags entering the Capitol contain an illegal item, what is the probability that a bag that triggers the alarm will contain an illegal item? (A) 0.0233 (B) 0.0029 (C) 0.9500 (D) 0.1140 (E) 0.1225
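Question 3 is a Bayes' rule problem: first find the overall probability of an alarm from both kinds of bags, then take the share of alarms that come from bags with illegal items. A minimal sketch:

```python
p_illegal = 3 / 1000          # 3 of every 1,000 bags contain an illegal item
p_alarm_given_illegal = 0.95
p_alarm_given_clean = 0.12

# Total probability of an alarm, then Bayes' rule
p_alarm = p_illegal * p_alarm_given_illegal + (1 - p_illegal) * p_alarm_given_clean
p_illegal_given_alarm = p_illegal * p_alarm_given_illegal / p_alarm
print(round(p_illegal_given_alarm, 4))
```

Because clean bags vastly outnumber illegal ones, most alarms are false alarms even though the screening catches 95% of illegal items.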

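4. Suppose your teacher’s stash of calculators contains 3 defective calculators and 17 good calculators. You select two calculators from the box for you and your friend to use on the AP Statistics exam. Which calculation would you use to determine the probability that exactly one of the calculators drawn will be defective? (A) $\frac{17}{20} + \frac{3}{19}$ (B) $\left(\frac{17}{20}\right)\left(\frac{3}{20}\right) + \left(\frac{3}{20}\right)\left(\frac{2}{20}\right)$ (C) $\left(\frac{17}{20}\right)\left(\frac{3}{19}\right)$ (D) $\left(\frac{17}{20}\right)\left(\frac{3}{19}\right) + \left(\frac{3}{20}\right)\left(\frac{17}{19}\right)$ (E) $\left(\frac{17}{20}\right)\left(\frac{11}{19}\right)\left(\frac{3}{18}\right)\left(\frac{2}{17}\right)$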
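Drawing without replacement, "exactly one defective" can happen in two orders (good then defective, or defective then good), and the denominator drops from 20 to 19 on the second draw. The sketch below computes this exactly with fractions and cross-checks it with a combinations count.

```python
from fractions import Fraction
from math import comb

good, defective = 17, 3
total = good + defective

# Two disjoint orders, each a product of conditional probabilities (no replacement)
p_exactly_one = (Fraction(good, total) * Fraction(defective, total - 1)
                 + Fraction(defective, total) * Fraction(good, total - 1))

# Cross-check: choose 1 good and 1 defective out of all ways to choose 2 calculators
p_by_counting = Fraction(comb(good, 1) * comb(defective, 1), comb(total, 2))

print(p_exactly_one, float(p_exactly_one), p_exactly_one == p_by_counting)
```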

5. Heart disease is the #1 killer today. Suppose that 8% of the patients in a small town are known to have heart disease, and suppose that a test is available that is positive in 96% of the patients with heart disease but is also positive in 7% of patients who do not have heart disease. If a person is selected at random and given the test, and the test comes out positive, what is the probability that the person actually has heart disease?

Chapter 7: Random Variables
• Discrete vs. continuous
• Probability distributions
• Notation
• Mean, standard deviation, and variance of a random variable
• Law of large numbers
• Rules for mean and variance
• Linear transformations
• Linear combinations of random variables
• Mean and standard deviation for sums and differences of independent random variables
• Independence
• Combining normal random variables

1. Robin owns a bookstore. She is working on a presentation to convince her partner to spend $500 on a catchy window display. Robin has data to support the fact that if people come in to browse, 62% will make a purchase. Given that the average purchase is $12.38, what is the expected amount of sales from the next 20 customers who enter the store? (A) $7.68 (B) $153.51 (C) $247.60 (D) $94.60 (E) $58.34

2. A radio station is running a lottery to raise money for a local charity. The prizes are $10, $50, $100, and a grand prize of $1000. The chances of winning these amounts are 0.25, 0.15, 0.09, and 0.01, respectively. What are your expected net winnings (prize minus cost) if you pay $1 for a ticket? (A) $29 (B) $10 (C) $90 (D) $290 (E) $28
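Both questions are expected-value calculations: multiply each outcome by its probability and add. A minimal sketch; in Question 2 the remaining probability, 1 − (0.25 + 0.15 + 0.09 + 0.01) = 0.5, corresponds to winning nothing and so adds nothing to the sum.

```python
# Question 1: each of 20 customers buys with probability 0.62, spending $12.38 on average
expected_sales = 20 * 0.62 * 12.38
print(round(expected_sales, 2))

# Question 2: expected prize, then subtract the $1 ticket price
prizes = [10, 50, 100, 1000]
probs = [0.25, 0.15, 0.09, 0.01]  # the other 0.5 of tickets win $0
expected_prize = sum(x * p for x, p in zip(prizes, probs))
expected_net = expected_prize - 1
print(round(expected_prize, 2), round(expected_net, 2))
```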

3. The scores for the top three golfers on a high school golf team are used to determine which high schools advance to the regional level. The Central High team’s top three players have the following mean scores and standard deviations:

                      Player 1   Player 2   Player 3
Mean score              89.5       94.4       97.2
Standard deviation       2.3        4.5        3.9

What are the mean score and standard deviation for the Central High team? (A) $\mu_X = 281.1$, $\sigma_X = 6.38$ (B) $\mu_X = 93.7$, $\sigma_X = 6.38$ (C) $\mu_X = 93.7$, $\sigma_X = 3.57$ (D) $\mu_X = 281.1$, $\sigma_X = 3.57$ (E) $\mu_X = 281.1$, $\sigma_X = 10.7$

Chapter 8: The Binomial and Geometric Distributions
• Binomial settings
• BINS
• Binomial distribution
• Mean and variance
• Normal approximation to the binomial distribution
• Geometric distributions

1. Based on his past performance, the probability that Ben will make a free throw is 0.6. What is the probability that he will make 3 out of his next 5 free throws? (A) 0.6630 (B) 0.0960 (C) 0.3456 (D) 0.9360 (E) 0.01536

2. Based on his past performance, the probability that Ben will make a free throw is 0.6. What is the probability that he will miss his first three free throws, and then make his fourth one? (A) 0.9744 (B) 0.1536 (C) 0.8704 (D) 0.096 (E) 0.0384
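Question 1 is a binomial probability (a fixed 5 shots, count the makes) and Question 2 is a geometric probability (shoot until the first make). Both assume the shots are independent with the same 0.6 success probability. A minimal sketch (`math.comb` requires Python 3.8 or later):

```python
from math import comb

p = 0.6  # probability Ben makes any single free throw (shots assumed independent)

# Question 1: binomial, exactly 3 makes in 5 attempts
p_three_of_five = comb(5, 3) * p**3 * (1 - p)**2

# Question 2: geometric, first make occurs on the 4th attempt (3 misses, then a make)
p_first_make_on_fourth = (1 - p)**3 * p

print(round(p_three_of_five, 4), round(p_first_make_on_fourth, 4))
```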

3. A manufacturer of batteries for hearing aids claims that only 4% of its batteries are defective. A consumer watch group doubts the claim and wants to check it. The group has a shipment of 500 batteries. (A) Assuming the manufacturer’s claim is true, what are the mean and standard deviation of the distribution of the number of defective batteries in the shipment?

(B) The consumer group has reason to believe that the rate of defective batteries is at least 5%. Based on your findings in (A), what is the probability that more than 5% of this shipment would be defective?

Chapter 9: Sampling Distributions
• What is a sampling distribution?
• Simulation of a sampling distribution
• Inference
• Bias
• Variability
• Sampling distributions of proportions and means
• Mean and standard deviation of sampling distributions
• Normal approximations
• Sampling distributions of a difference between two independent sample proportions or sample means
• Rules of thumb
• Sampling distributions of proportions, calculations and conditions
• Central limit theorem (CLT)
• Calculations using x-bar
• Normal population distribution vs. CLT

1. Two samples of corn were taken from a field to test the percent of corn plants infested with worms. The USDA states that approximately 28% of all corn plants are infested. One sample contains 100 ears of corn and the second sample 500 ears. Which sample has the larger standard deviation? (A) The sample of 500 will have the larger standard deviation (B) Both samples will have the same standard deviation (C) The sample of 100 ears will have a smaller standard deviation (D) The sample of 100 ears will have the larger standard deviation (E) It is impossible to determine
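Reading Question 1 as asking about the standard deviation of the sample proportion of infested plants, that spread is sqrt(p(1 − p)/n), so it shrinks as the sample size grows. A short sketch with the USDA rate p = 0.28:

```python
from math import sqrt

p = 0.28  # USDA infestation rate

# Standard deviation of the sample proportion for each sample size
sds = {n: sqrt(p * (1 - p) / n) for n in (100, 500)}
for n, sd in sds.items():
    print(n, round(sd, 4))
```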

2. Which of the following statements best describes the sampling distribution of a sample mean? (A) It is $\bar{x}$ (B) It is the distribution of all possible values of a population parameter (C) It is the distribution of all possible values of a statistic taken from all possible samples of a specific size (D) It is an unbiased estimator of $\mu$ (E) It is the normal distribution with mean 0 and standard deviation 1

3. The conditions that np > 10 and n(1 – p) > 10 are necessary to guard against (A) a skewed distribution (B) a small population size (C) a small sample size (D) a large standard deviation (E) a non-randomly selected sample

4. A sample of 5,000 female adults was randomly drawn from the United States. It is known that the diastolic blood pressure for adult women in the United States is N(80, 12). What are the mean and standard deviation of the sampling distribution of the sample mean? (A) $\mu_{\bar{x}} = 0.16$, $\sigma_{\bar{x}} = 0.1697$ (B) $\mu_{\bar{x}} = 80$, $\sigma_{\bar{x}} = 12$ (C) $\mu_{\bar{x}} = 80$, $\sigma_{\bar{x}} = 0.1697$ (D) $\mu_{\bar{x}} = 3.58$, $\sigma_{\bar{x}} = 0.024$ (E) Cannot be determined
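For Question 4, the sampling distribution of the sample mean is centered at the population mean μ and has standard deviation σ/√n. A minimal sketch:

```python
from math import sqrt

mu, sigma, n = 80, 12, 5000  # population N(80, 12) and sample size
mean_xbar = mu               # the sample mean is an unbiased estimator of mu
sd_xbar = sigma / sqrt(n)    # spread shrinks by a factor of sqrt(n)
print(mean_xbar, round(sd_xbar, 4))
```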

Chapter 10: Confidence Intervals
• Connect sampling distributions with confidence intervals
• Estimating population parameters
• Properties of point estimators
• Confidence interval for mu with sigma known
• Assumptions that need to be met
• Changing the confidence level
• Interpreting a CI vs. interpreting a confidence level
• Determining sample size
• CI for mu when sigma is unknown
• T-distributions
• One-sample t-interval
• Paired t procedures
• Robustness
• CIs for proportions
• Determining sample size for proportions
• Standard error
• Margin of error
• Properties of CIs

1. Which of the following statements about the t-distribution is true? (A) The t-distribution has a mean of 0 and a standard deviation of 1 (B) The t-distribution has a larger variance than the standard normal distribution (C) The smaller the degrees of freedom, the smaller the variance for the t-distribution (D) The t-distribution is a skewed distribution (E) The normal distribution is flatter and more spread out than the t-distribution

2. Campaign managers conduct regular polls to estimate the proportion of people who will vote for their candidate in an upcoming election. Shortly before the actual election, the campaign manager doubles the sample size of the poll. What effect does this have on the estimate? (A) It increases the reliability of the estimate (B) It decreases the standard deviation of the sampling distribution of the sample proportion (C) It decreases the variability in the population (D) It will reduce the effect of confounding variables (E) It reduces the bias that comes from interviewer effect

3. An ecologist would like to estimate the mean carbon monoxide level of the air in a particular city. The carbon monoxide levels are measured on 14 days during a month and recorded. A histogram of the 14 readings is roughly symmetrical, with no outlying values. The mean and standard deviation of these values are 5.4 and 2.2, respectively. Assume the 14 days can be considered a simple random sample of all days. Which of the following is a correct statement? (A) A 95% confidence interval for $\mu$ is $5.4 \pm 2.145\left(\frac{2.2}{\sqrt{14}}\right)$ (B) A 95% confidence interval for $\mu$ is $5.4 \pm 2.145\left(\frac{2.2}{\sqrt{13}}\right)$ (C) A 95% confidence interval for $\mu$ is $5.4 \pm 2.160\left(\frac{2.2}{\sqrt{14}}\right)$ (D) A 95% confidence interval for $\mu$ is $5.4 \pm 2.160\left(\frac{2.2}{\sqrt{13}}\right)$ (E) The sample is too small to trust the results

4. To estimate the proportion of TV viewers watching a certain special, how large a random sample is required so that the margin of error is 0.04 with 99.6% confidence? (A) 18 (B) 36 (C) 96 (D) 1296 (E) 1492
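Sample-size questions for a proportion solve z* · sqrt(p(1 − p)/n) ≤ m for n and round up; with no prior estimate, p = 0.5 is the conservative choice. A minimal sketch (the function name is hypothetical), applied to Question 4 and to Question 12 later in this chapter:

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence, p_guess=0.5):
    """Return (n, z*): the smallest n with z* * sqrt(p(1-p)/n) <= margin,
    using p_guess = 0.5 (the conservative choice) unless told otherwise."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(p_guess * (1 - p_guess) * (z_star / margin) ** 2), z_star

n_q4, z_q4 = sample_size_for_proportion(0.04, 0.996)   # Question 4
n_q12, z_q12 = sample_size_for_proportion(0.03, 0.95)  # Question 12
print(n_q4, round(z_q4, 3))
print(n_q12, round(z_q12, 3))
```

With the unrounded z* of about 2.878 this gives 1295 for Question 4; rounding z* to 2.88 first gives 0.25(2.88/0.04)² = 0.25 × 72² = 1296, the listed choice. Question 12 gives 1068 either way.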

5. A quality control engineer at a steel mill must estimate the mean tensile strength of steel beams produced by a new machine, using a random sample of 12 beams. The actual population distribution for this machine is unknown, but graphical displays of the sample indicate that the assumption of normality is reasonable. Since there are no historical data for this prototype machine, the variability of the process is completely unknown. The engineer uses a t-distribution rather than a z-distribution because (A) He has a small sample, making the z-distribution inappropriate (B) He is using data rather than theoretical methods to determine the mean (C) The data come from only one machine (D) The variability of the machine is unknown (E) The t-distribution results in a narrower confidence interval

6. A company wants to estimate the mean net weight of all 32-ounce packages of its Yummy Taste cookies at 95% confidence. It is known that the standard deviation of net weights is 0.1 ounce. The sample size that will yield a margin of error within 0.02 ounce of the population mean is (A) 9 (B) 10 (C) 96 (D) 97 (E) More information is needed

7. A random sample of 25 tourists who visited Hawaii this summer spent an average of $1420 on their trips, with a standard deviation of $285. The 95% confidence interval for the mean amount of money spent by all tourists who visit Hawaii is (A) ($1302, $1538) (B) ($1308, $1531) (C) ($1397, $1443) (D) ($1363, $1477) (E) ($1385, $1465)
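The sketch below works Questions 6 and 7, plus the carbon monoxide interval from Question 3. Python's standard library has no t-distribution, so the t* values are hard-coded from a t table (2.064 for 24 df and 2.160 for 13 df at 95% confidence); treat them as table lookups, not computed values. The function name `t_interval` is hypothetical.

```python
from math import ceil, sqrt

# Question 6: smallest n with z* * sigma / sqrt(n) <= margin (sigma known, 95% -> z* = 1.96)
z_star, sigma, margin = 1.96, 0.1, 0.02
n_cookies = ceil((z_star * sigma / margin) ** 2)

def t_interval(xbar, s, n, t_star):
    """One-sample t-interval xbar +/- t* * s / sqrt(n); t* comes from a table with n - 1 df."""
    half_width = t_star * s / sqrt(n)
    return xbar - half_width, xbar + half_width

# Question 7: n = 25, so df = 24 and t* = 2.064
hawaii = t_interval(1420, 285, 25, 2.064)
# Question 3: n = 14, so df = 13 and t* = 2.160
co = t_interval(5.4, 2.2, 14, 2.160)

print(n_cookies)
print(tuple(round(x, 2) for x in hawaii))
print(tuple(round(x, 2) for x in co))
```

Note that n itself (not n − 1) goes under the square root; the degrees of freedom affect only the choice of t*.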

8. A sample of 1000 adults showed that 31% of them are smokers. To estimate the proportion of people in the entire population who smoke, what additional information would you need? (A) The size of the population (B) The amount of confidence you desire in your estimate (C) The standard deviation for the number of smokers (D) The length of time the people smoked (E) All the information you need is contained in the problem

9. When comparing a 95% confidence interval with a 99% confidence interval created from the same data, how will the intervals differ? (A) The sample size must be known to determine the difference (B) The mean of the sample must be known to determine the difference (C) The use of the t-distribution or the z-distribution will determine how the two intervals differ (D) The 95% interval will be wider than the 99% interval (E) The 95% interval will be narrower than the 99% interval

10. Increasing the sample size by a factor of 4 will have what effect on the margin of error? (A) It will increase the margin of error by a factor of 4 (B) It will decrease the margin of error by a factor of 4 (C) It will increase the margin of error by a factor of 2 (D) It will decrease the margin of error by a factor of 2 (E) It will decrease the margin of error by a factor of 16

11. The principal of Southside High School, a large urban school of 4,252 students, took a simple random sample of 250 Southside students and found that 43% of them were involved in extracurricular activities. The 90% confidence interval for the estimate of students involved in extracurricular activities at Southside High School is (A) (0.3899, 0.4701) (B) (0.3780, 0.4820) (C) (0.3778, 0.4822) (D) (0.3785, 0.4815) (E) (0.1327, 0.2112)
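Question 11 is a one-proportion z-interval, p̂ ± z* · sqrt(p̂(1 − p̂)/n). A minimal sketch; using the SRS formula is reasonable here because 250 is less than 10% of the 4,252 students.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, confidence = 0.43, 250, 0.90
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645
se = sqrt(p_hat * (1 - p_hat) / n)                        # standard error of p-hat
low, high = p_hat - z_star * se, p_hat + z_star * se
print(round(low, 4), round(high, 4))
```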

12. A local politician wants to estimate the percentage of voters who plan to support a referendum to curb development in the county. How large a sample will be needed to ensure a margin of error of no more than 3% with 95% confidence? (A) 896 (B) 752 (C) 632 (D) 1068 (E) More information is needed

13. A national news magazine surveyed 1,500 adults in the United States and found that 37% disapproved of the Administration’s handling of domestic issues. The magazine reported the results as 37% ± 3%. What degree of confidence is reported in these results? (A) 97% (B) 56% (C) 94% (D) 99% (E) There is not enough information
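For question 12, the sample-size formula n = (z*/m)² p*(1 − p*) can be sketched as follows. Because no prior estimate of the proportion is given, the sketch assumes the conservative planning value p* = 0.5, which maximizes p*(1 − p*).

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin: float, confidence: float, p_star: float = 0.5) -> int:
    """Smallest n with z* * sqrt(p*(1 - p*) / n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star / margin) ** 2 * p_star * (1 - p_star))


n_needed = sample_size_for_proportion(0.03, 0.95)
print(n_needed)  # 1068
```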

14. Professor Graham wants to reduce the width of the confidence interval around his estimate of the proportion of adults who are carriers of a certain bacterium. What can he do to accomplish this? (A) Decrease his sample size (B) Increase the confidence level (C) Change his estimate of p̂ (D) Increase his sample size (E) None of these will result in a smaller confidence interval

15. A random sample of adult male physicians at Memorial Hospital was taken, and the mean cholesterol level was found to be 183 mg/dL. A 95% confidence interval for the corresponding population mean is 183 ± 17 mg/dL. Which of the following statements must be true? (A) 95% of the population measurements fall between 166 and 200 (B) 95% of the sample measurements fall between 166 and 200 (C) If 100 samples were taken, 95% of the sample means would fall between 166 and 200 (D) P(166 ≤ x̄ ≤ 200) = 0.95 (E) If μ = 160, this x̄ of 183 mg/dL would be unlikely to happen

16. A recent survey of 500 people reported that 67% of American adults believe that high gasoline prices are caused by the greed of oil companies. The margin of error was reported as 3%. What does the margin of error mean? (A) No more than 70% of the population believes that high gasoline prices are caused by the greed of oil companies (B) The actual parameter is between 64% and 70% (C) It is unlikely that the reported statistic would be 67%, unless the true value was between 64% and 70% (D) Three percent of the people were not surveyed (E) Three percent of the time, the value obtained would be different from 67%

17. A random sample of five snack foods available in the vending machines in the school cafeteria contained the following amounts of sodium (in mg): 310, 350, 320, 280, and 340. What is the 90% confidence interval for the mean amount of sodium, in mg, per snack food? (A) 320 ± 26.1 (B) 320 ± 20.2 (C) 320 ± 18.78 (D) 320 ± 24.7 (E) 320 ± 21.78

18. An economist for the government needs to estimate the mean income for households not covered by health insurance in the city of Albany. He collects a random sample of 1,500 households and finds that the mean income for the sampled households is $18,870 with a standard deviation of $7,240. Calculate a reasonable confidence interval for the true mean income for households not covered by health insurance.
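Questions 17 and 18 both use a one-sample t interval, x̄ ± t*·s/√n. In this sketch the sodium values are 310, 350, 320, 280, and 340 mg (a mean of 320, matching the answer choices), t* = 2.132 is the table value for df = 4 at 90% confidence, and question 18 assumes 95% confidence because the question leaves the level open. The helper name is hypothetical.

```python
import math
from statistics import mean, stdev


def t_interval(x_bar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """One-sample t interval: x_bar +/- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Question 17: t* = 2.132 for df = 4 at 90% confidence (from a t table).
sodium = [310, 350, 320, 280, 340]
x_bar = mean(sodium)
s = stdev(sodium)  # sample standard deviation (divides by n - 1)
low17, high17 = t_interval(x_bar, s, len(sodium), t_star=2.132)
margin17 = (high17 - low17) / 2
print(f"{x_bar} ± {margin17:.1f}")  # 320 ± 26.1

# Question 18: assumes 95% confidence; with df = 1499, t* is about 1.96.
low18, high18 = t_interval(18870, 7240, 1500, t_star=1.96)
print(f"(${low18:,.0f}, ${high18:,.0f})")
```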

19. A random sample of 150 seniors at SDSU was asked whether they had cheated on an exam or major paper at any time during their college career. A total of 93 seniors reported that they had cheated. (A) Calculate a 95% confidence interval for the proportion of all seniors who had cheated during their college careers.

(B) How many students should be surveyed to obtain a 95% confidence interval with a margin of error of no more than 1 percentage point for the proportion of seniors who cheated?
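Parts (A) and (B) of question 19 can be worked with the same formulas as questions 11 and 12. The sketch uses p̂ = 93/150 = 0.62 as the planning value for part (B) and also shows the conservative choice p* = 0.5; which planning value a grader expects is an assumption.

```python
import math
from statistics import NormalDist

z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

# Part (A): one-proportion z interval.
n = 150
p_hat = 93 / n  # 0.62
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - z_star * se, p_hat + z_star * se
print(f"({low:.4f}, {high:.4f})")  # (0.5423, 0.6977)


# Part (B): smallest n whose 95% margin of error is at most `margin`.
def n_for_margin(margin: float, p_star: float) -> int:
    return math.ceil((z_star / margin) ** 2 * p_star * (1 - p_star))


print(n_for_margin(0.01, p_hat))  # 9051, using p-hat from part (A)
print(n_for_margin(0.01, 0.5))    # 9604, conservative planning value
```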

(C) How would the length of the confidence interval be affected if the confidence level were changed to 80%? Justify your answer.

Chapter 11: Testing a Claim
• Intro to significance testing
• Stating the null and alternative hypotheses
• Components of a significance test
• Conditions
• Calculations
• Interpretation
• One-sided vs. two-sided tests
• Statistical significance and P-value
• α (significance level)
• Duality with confidence intervals
• Uses and abuses of tests
• Statistical significance vs. practical importance
• Type I and II errors in context
• Connections between power and Type II error

1. A bottling company claims there are 2 liters of soda in a large bottle. The Bureau of Weights and Measures believes that the company is cheating the consumer by putting less than 2 liters in a bottle. The bureau decides to conduct an experiment to determine whether the consumer is being cheated. Which of the following hypotheses would be appropriate? (A) H0: μ = 2, Ha: μ ≠ 2 (B) H0: μ = 2, Ha: μ < 2 (C) H0: μ = 2, Ha: μ > 2 (D) H0: μ � 2, Ha: μ ≠ 2 (E) H0: μ < 2, Ha: μ = 2

2. The z-test may not be used when I. the sample is too small II. the standard deviation of the population is unknown III. the population is not normally distributed IV. the sample is not normally distributed

(A) I only (B) II only (C) III only (D) II and IV (E) I and IV

3. The probability of finding a true difference in a hypothesis test can be increased when which of the following is true? (A) n is increased and α is increased (B) n is increased and α is decreased (C) n is decreased and α is increased (D) n is decreased and α is decreased (E) None of the above

4. The analysis of a sample of 250 shoppers at a mall in a large metropolitan area produced a 99% confidence interval of ($124, $154) for the mean amount spent that day. Suppose you wish to test the null hypothesis H0: μ = $160 at the α = 0.01 level of significance. Can you use the data provided to draw a conclusion? (A) Yes; it can be concluded that the mean amount spent is significantly different from $160, since this value is not in the 99% confidence interval (B) Yes; it can be concluded that the mean amount spent is not significantly different from $160, since this value is not in the 99% confidence interval (C) No; the distribution of the population must be known before a conclusion can be drawn (D) No; the data are needed to properly conduct a hypothesis test (E) No; hypotheses cannot be tested based on a confidence interval

5. A researcher conducted an experiment regarding the effectiveness of a new drug. Following the statistical analysis, the results were reported with a p-value of 0.12. Based on this p-value, which of the following conclusions should the researcher reach?

(A) Reject the null hypothesis, since the p-value of 0.12 is greater than the significance level of 0.05. (B) Reject the null hypothesis, since 1 – p-value is 0.88, which is greater than the significance level of 0.05. (C) Fail to reject the null hypothesis, since there is a 12% chance that you could obtain these results when H0 is true, which is higher than the significance level of 0.05. (D) Fail to reject the null hypothesis, since there is an 88% chance that you could obtain these results when H0 is true, which is higher than the significance level of 0.05. (E) Accept the null hypothesis, since the p-value is too large

6. If a null hypothesis is rejected when it is actually true, then (A) A Type I error occurs (B) A Type II error occurs (C) A β error occurs (D) A random error occurs (E) A power occurs

7. The power of a significance test against a particular alternative is 91%. Which of the following is true? (A) The probability of a Type I error is 91% (B) The probability of a Type II error is 91% (C) The probability of a Type I error is 9% (D) The probability of a Type II error is 9% (E) The probability of an alpha error is 9%
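Questions 3 and 7 both turn on power. The sketch below computes the power of a one-sided z-test of H0: μ = μ0 against Ha: μ < μ0, in the spirit of the bottling example in question 1. The population standard deviation of 0.05 liter and the true mean of 1.98 liters are hypothetical values, not from the source. The output shows power rising with both n and α, and P(Type II error) = 1 − power.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_lower_tailed_z(mu0: float, mu_alt: float, sigma: float, n: int, alpha: float) -> float:
    """Power of a z-test of H0: mu = mu0 vs Ha: mu < mu0 when the true mean is mu_alt."""
    se = sigma / math.sqrt(n)
    z_star = STD_NORMAL.inv_cdf(1 - alpha)
    # H0 is rejected when x-bar falls below this cutoff.
    cutoff = mu0 - z_star * se
    # Power = P(x-bar < cutoff) when x-bar is centered at mu_alt.
    return STD_NORMAL.cdf((cutoff - mu_alt) / se)


# Hypothetical values: mu0 = 2 L, true mean 1.98 L, sigma = 0.05 L.
for n in (10, 40):
    for alpha in (0.01, 0.05):
        power = power_lower_tailed_z(2.0, 1.98, 0.05, n, alpha)
        print(f"n={n:3d} alpha={alpha:.2f} power={power:.3f} P(Type II)={1 - power:.3f}")
```

With a power of 91%, as in question 7, the same relationship gives P(Type II error) = 1 − 0.91 = 0.09.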

Chapter 12: Significance Tests in Practice
• Testing a claim about μ
• One-sample t-test
• Paired t-test
• Testing a claim about p
• Significance tests
• What if the conditions aren’t met?

Chapter 13: Comparing Two Population Parameters
• Matched pairs data vs. independent samples
• CI for difference between two means (paired and unpaired)
• Estimating μ1 − μ2
• Two-sample t-tests and assorted df possibilities
• CI for difference between two proportions
• Estimating p1 − p2
• Significance test for comparing two population proportions (paired and unpaired)



Chapter 14: Inference about Distributions of Population Proportions
• Chi-square goodness-of-fit test
• Chi-square test of homogeneity
• Chi-square test of association/independence
• Expected counts
• Chi-square distribution

Chapter 15: Inference about Linear Regression
• The linear regression model
• Population vs. sample regression lines
• Significance tests about the slope of a least-squares regression line
• Computer output
• CI for slope of a least-squares regression line