Elementary Statistics I D Term 2003

Instructor: Carlos J. Morales

EXAM 1

Name: ______

Instructions: Answer as completely as possible, including middle steps to reach final answers. Solutions without any shown work will not receive credit. Explanations should be given in complete sentences. Good luck!

1.1 Compute the mean, median and standard deviation of the following data (20 Pts):

Data = 1, 5, 10, 43.

Mean = 14.75 Median = 7.5 SD = 19.190 (sample standard deviation, dividing by n - 1)
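These summary statistics, together with the 1-term Winsorized mean asked for in 1.2, can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `winsorized_mean` and its parameter `k` are illustrative choices, not part of the exam.

```python
import statistics

data = [1, 5, 10, 43]

mean = statistics.mean(data)      # sum of the values divided by n
median = statistics.median(data)  # average of the two middle values, 5 and 10
sd = statistics.stdev(data)       # sample SD, divides by n - 1


def winsorized_mean(values, k=1):
    """Replace the k smallest and k largest values by their nearest kept
    neighbors, then average the result."""
    s = sorted(values)
    if 2 * k >= len(s):
        raise ValueError("k is too large for this data set")
    s[:k] = [s[k]] * k          # pull the low tail up
    s[-k:] = [s[-k - 1]] * k    # pull the high tail down
    return statistics.mean(s)


print(mean, median, round(sd, 3), winsorized_mean(data))
```

With k = 1 the data become 5, 5, 10, 10, which is why the Winsorized mean is far less affected by the value 43 than the ordinary mean.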

1.2 Compute the 1-term Winsorized mean of the data above (5 Pts.).

Replacing the smallest and the largest value by their nearest neighbors gives 5, 5, 10, 10, so w-mean = 7.5

2. An engineer studying the reliability of a machine collected data on the time that the machine was down during a period of six months. She observed 1000 events when the machine was down, and the following graph is a histogram for the down-time in minutes.

The engineer’s intern computed the mean and median for the down time and obtained 31.95 and 29.83, but he forgot which one was which.

2.1 Which of the two numbers is the mean and which the median? Explain. (10 Pts.)

Because there are extreme values to the right (the data are right-skewed), the mean is pulled above the median. So the mean is 31.95 and the median is 29.83.

2.2 The intern also computed all quartiles for the data and obtained

Q1 = 20.68, Q2 = 29.83, Q3 = 40.45, Q4 = 97.53

Compute the inter-quartile range (IQR), and find the largest data point. (10 Pts.)

IQR = Q3 - Q1 = 40.45 - 20.68 = 19.77. The largest data point is Q4 = 97.53.
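The arithmetic can be sketched in a few lines, along with the upper cutoff that the 1.5*IQR rule in 2.3 builds on the same quartiles. The variable names are illustrative; the numbers are the ones reported by the intern.

```python
# Quartiles reported by the intern (minutes of down time).
q1, q2, q3, q4 = 20.68, 29.83, 40.45, 97.53

iqr = q3 - q1                        # spread of the middle 50% of the data
upper_fence = q3 + 1.5 * iqr         # box-plot cutoff for high outliers
has_high_outlier = q4 > upper_fence  # q4 is the largest observation

print(round(iqr, 2), round(upper_fence, 3), has_high_outlier)
```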

2.3 If you were to draw a box plot, would you find an outlier? (5 Pts.)

A+ = Q3 + 1.5*IQR = 70.105, so any data point above this value is considered an outlier. Since the largest data point, 97.53, exceeds 70.105, the box plot will show at least one outlier.

3. Juan, a WPI graduate, started his own hi-tech company, and last year his company went public. A time series graph for the price of the company stock for the last 100 trading days is shown below. A 10-day moving average is also displayed.

An investor interested in buying this company's stock looks at a boxplot of the stock price for the last 100 days (shown below). From the boxplot:

3.1 Is the distribution of the stock price symmetric? Explain. (10 Pts.)

Since the whiskers have different lengths, the distribution does not appear to be symmetric.

3.2 Is the boxplot an appropriate way of summarizing the pattern of variation of the stock-price process? Use evidence from the time series plot to argue your point. (10 Pts.)

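The boxplot is not a good way to summarize this data because the process does not appear to be stationary (the time series plot shows a possible trend). A boxplot pools all 100 days and ignores the order in which the prices occurred.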
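To see why pooling can mislead, the sketch below builds a hypothetical trending price series (made-up numbers, not the exam's data) and compares the quartiles of the first and last 50 days with the pooled quartiles. The helper names `moving_average` and `quartiles` are illustrative, and `statistics.quantiles` uses its default method, which may differ slightly from the textbook's quartile rule.

```python
import statistics

# Hypothetical prices with a steady upward trend; NOT the exam's data.
prices = [5.0 + 0.05 * day + (0.3 if day % 2 else -0.3) for day in range(100)]


def moving_average(values, window=10):
    """Trailing moving average: one value per full window of observations."""
    return [
        statistics.mean(values[i - window:i])
        for i in range(window, len(values) + 1)
    ]


def quartiles(values):
    """Return (Q1, median, Q3) using the statistics module's default method."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3


smoothed = moving_average(prices)     # rises steadily, exposing the trend
first_half = quartiles(prices[:50])
second_half = quartiles(prices[50:])
pooled = quartiles(prices)

print("first 50 days:", first_half)
print("last 50 days: ", second_half)
print("all 100 days: ", pooled)
```

In this made-up series the two halves have clearly different quartiles, so a single boxplot of all 100 days describes neither period well.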

3.3 A stock broker tries to convince the investor to buy the stock, and he argues that since historically (last 100 trading days) 50% of the time the price has fluctuated from 6.16 to 9.19, he does not expect the price to drop below $6. Find a possible flaw with this argument. (5 Pts.)

The time series plot shows that trends in the process may exist, so it seems more appropriate to consider the behavior of the price of the stock in recent history, rather than lumping together all the data. Considering Q1 and Q3 (6.16 and 9.19) might not be appropriate for a non-stationary process.

4. Imagine a local journalist studying the public reaction to a new law enacted by Congress on maternity leave. The journalist has limited time to write the article, so she tries to collect some data via the Internet. She figures that the cheapest and fastest way to collect opinions from people is to send email to a few hundred people with a survey. She decides to send email to everyone at WPI and College of the Holy Cross. She gets hundreds of responses, and promptly writes her article about the opinion of the Worcester population on the issue.

4.1 Briefly explain what’s wrong with the data collection mechanism of the journalist. (15 Pts.)

Selection bias: this study samples students with access to email, who might not represent the whole Worcester population. (Similar to the “Dewey Defeats Truman” problem.) Also, the sampled units will probably include mainly young people, who might not represent the position of most people in the city.

Absence of blocking: no mention of collecting variables such as age, race, level of education, etc.

No randomization: respondents self-select into the study, and the ones who respond may be the ones with strong feelings about the matter, while most people might choose not to respond (there might be a difference between people who choose to respond and the whole population).

4.2 Briefly explain how you would change the design of the data collection. (10 Pts.)

Design a study where units are randomly selected across all sectors of the population in Worcester, making sure that age groups, education-level groups, and occupational groups are included in proportions similar to those found in the Worcester population. Collect information about variables that may affect people’s opinion, such as marital status, number of kids, age, gender, etc.
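One way to carry out such a design is stratified random sampling with proportional allocation. The sketch below assumes a complete list of residents with a known age group is available; the frame, the group labels, and the function `stratified_sample` are all hypothetical and serve only to illustrate the idea.

```python
import random

# Hypothetical sampling frame: (resident_id, age_group). NOT real Worcester data.
frame = (
    [(f"r{i}", "18-34") for i in range(300)]
    + [(f"r{i}", "35-64") for i in range(300, 800)]
    + [(f"r{i}", "65+") for i in range(800, 1000)]
)


def stratified_sample(frame, sample_size, seed=0):
    """Draw a simple random sample inside each stratum, sized in proportion
    to that stratum's share of the frame."""
    rng = random.Random(seed)
    strata = {}
    for unit, group in frame:
        strata.setdefault(group, []).append(unit)
    sample = {}
    for group, units in strata.items():
        n = round(sample_size * len(units) / len(frame))
        sample[group] = rng.sample(units, n)  # sampling without replacement
    return sample


sample = stratified_sample(frame, sample_size=100)
print({group: len(units) for group, units in sample.items()})
```

Because each stratum's size is rounded separately, the total can differ from `sample_size` by a unit or two when the proportions do not divide evenly. The random draw inside each stratum is what removes the self-selection present in the email survey.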