MTH u481, Summer 1 2005, Computer Lab 1

Probability Estimation and Confidence Intervals.

Introduction

This lab uses the SPSS package, available on NUNET. If you strongly prefer another program (e.g., Excel), you may use it instead. SPSS for Windows is a versatile package that can perform a wide variety of statistical procedures. Several universities offer SPSS tutorials online, and they are easy to find by searching Google for “SPSS Tutorial”. For instance, the www.boun.edu.tr/support/bucc/spss/spss.htm website provides a good overview.

To open SPSS:

Open the NUNET menu, choose Applications, then Statistical and Computational packages, and double-click SPSS.

SPSS has several different windows inside the application window. The Data Editor window shows the contents of the current data file. A blank Data Editor window opens automatically when you start SPSS and can be used to enter data. The Output window displays the results of any statistical procedure you run, such as descriptive statistics or frequency distributions. The Chart Carousel window is similar to the Output window, but it contains all the graphs that you direct SPSS to create.

SPSS can read several types of data files, most importantly ASCII data files and SPSS data files (extension *.sav).

Probability estimation and confidence intervals (any spreadsheet with a random number generator can be used instead of SPSS)

Open a new data file, create a new variable, and fill it with 100 ‘1’s. This gives the file 100 cases, which the random transformation needs, because SPSS computes one new value for each existing case. The easiest way to fill a variable with 1’s is to type 1 in the first row, select “Copy”, highlight the next 99 rows, and select “Paste”.

In the Transform menu, choose Compute. In the “Target Variable” box, enter a name for the new variable, for example randbern (random Bernoulli). From the list of functions, choose RV.BERNOULLI(p). This function produces a random Bernoulli variable, which equals 1 with probability p and 0 with probability 1 - p. The parameter p appears as “?” in the expression; replace it with a value, for example 0.3, and press OK. This creates a new variable, randbern, containing one hundred values, each 0 or 1. (There are 100 values because the file has 100 cases, one for each 1 you entered.)
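If you are working outside SPSS, the Compute step can be mimicked in a few lines of Python. This is a minimal sketch, assuming Python’s standard random module stands in for RV.BERNOULLI; the function name rv_bernoulli is hypothetical and not part of SPSS.

```python
import random

def rv_bernoulli(p, n, rng=random):
    """Return n independent Bernoulli(p) draws: 1 with probability p, else 0.

    Hypothetical stand-in for SPSS's RV.BERNOULLI(p) applied to n cases.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so rng.random() < p has probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]

# One "variable" of 100 cases, analogous to randbern with p = 0.3.
randbern = rv_bernoulli(0.3, 100)
print(randbern[:10])
```

Passing a seeded random.Random object as rng makes a run reproducible; leaving the default gives a fresh sample each time, just as recomputing randbern in SPSS does.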

Now pretend that you don’t know where these 100 binary values came from, but assume that they were generated by a Bernoulli process. You then need to estimate p, the probability of a 1. The natural estimate is the number of 1’s divided by n (here n = 100), which is simply the sample mean. To compute the mean, select Analyze, Descriptive Statistics, Descriptives, move randbern into the list of variables to analyze, and press OK. You’ll get the mean along with several other statistics.
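The estimate described above is just the proportion of 1’s in the sample. A minimal Python sketch, assuming the sample is held in a plain list of 0’s and 1’s (the name estimate_p is hypothetical):

```python
def estimate_p(sample):
    """Estimate the Bernoulli parameter p by the sample mean.

    For 0/1 data the sample mean equals (number of 1's) / n.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(sample) / len(sample)

# A tiny hand-made sample: three 1's out of ten values.
print(estimate_p([1, 0, 0, 1, 0, 0, 0, 1, 0, 0]))  # 0.3
```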

Most likely the mean will not equal 0.3 exactly, but it should be close. In most real applications, the true value of p is not known. Now compute a 90% confidence interval for the population mean p. (Its meaning: if you repeated the experiment many times and computed an interval from each sample, about 90% of those intervals would contain the true value of p.)

To compute the interval, select Analyze, Descriptive Statistics, Explore. In the Explore window, press the Statistics button and select the confidence interval for the mean, setting it to 90%. Press Continue to return to the Explore window. Choose “Statistics” in the Display box, select the randbern variable, and press OK again. You will get output like the following from the Explore procedure (your numbers will differ):

Case Processing Summary

                              Cases
              Valid            Missing          Total
              N    Percent     N    Percent     N    Percent
RANDBERN      100  100.0%      0    .0%         100  100.0%

Descriptives

                                                      Statistic   Std. Error
RANDBERN   Mean                                       .4878       .07903
           90% Confidence Interval    Lower Bound     .3547
           for Mean                   Upper Bound     .6209
           5% Trimmed Mean                            .4864
           Median                                     .0000
           Variance                                   .256
           Std. Deviation                             .50606
           Minimum                                    .00
           Maximum                                    1.00
           Range                                      1.00
           Interquartile Range                        1.0000
           Skewness                                   .051        .369
           Kurtosis                                   -2.103      .724
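The interval in this output has the form mean ± (critical value) × (standard error), where the standard error is the sample standard deviation divided by the square root of n. SPSS Explore uses a Student t critical value with n - 1 degrees of freedom. The Python sketch below instead uses the standard normal critical value from statistics.NormalDist, because the standard library has no t quantile function, so its interval comes out slightly narrower than SPSS’s. The function name mean_ci is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(sample, level=0.90):
    """Approximate confidence interval for the population mean.

    Uses mean +/- z * s / sqrt(n), where s is the sample standard deviation
    (n - 1 divisor) and z is a standard normal quantile. SPSS uses a
    Student t quantile instead, so its interval is a little wider.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%
    return m - z * se, m + z * se

# Hypothetical sample: 30 ones and 70 zeros.
lower, upper = mean_ci([1] * 30 + [0] * 70)
print(f"90% CI: ({lower:.4f}, {upper:.4f})")
```

For a sample of 100 values the normal and t critical values are close, so the two intervals differ only in the third decimal place or so.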

Record your confidence interval, then recompute the randbern variable. You will get a new sample of 100 values. Calculate its mean and its 90% confidence interval, and check whether or not 0.3 lies in that interval. Repeat this last step 9 more times, to get 10 different sample means and intervals. Count how many of the 10 intervals contain 0.3, and compare that proportion with 90%.
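The repeated-sampling experiment can also be simulated. The Python sketch below makes the same assumptions as a normal-approximation interval built with the standard random and statistics modules (not SPSS’s t-based interval), and the name coverage is hypothetical. It draws samples of 100 values with p = 0.3, builds a 90% interval from each, and reports the fraction of intervals that contain 0.3.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_ci(sample, level=0.90):
    """Normal-approximation CI for the mean: mean +/- z * s / sqrt(n)."""
    se = stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(sample)
    return m - z * se, m + z * se

def coverage(p=0.3, n=100, trials=10, level=0.90, seed=None):
    """Fraction of simulated samples whose CI contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One new sample, like recomputing randbern in SPSS.
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        lower, upper = mean_ci(sample, level)
        if lower <= p <= upper:
            hits += 1
    return hits / trials

print(coverage(trials=10))            # the lab's 10 repetitions
print(coverage(trials=2000, seed=1))  # many repetitions
```

With only 10 repetitions the count can easily differ from 9. With thousands of repetitions the fraction should settle close to, though not necessarily exactly at, 0.90, since the normal approximation for 0/1 data is not exact.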