Exploratory Data Analysis and JMP

September 29, 2009 Lecture #2 240A-1 L. Phillips Exploratory Data Analysis and JMP

I. Open the JMP program by going to Start, Programs, Statistics, JMP 5.0.1 (select).

II. Open the data file students by clicking on the open data table button in the JMP

starter window and scrolling over to the file students.jmp in the folder Sample

Data.

The five columns contain the five variables:

Age: an ordinal variable

Sex: a nominal or categorical variable

Height: a cardinal or numeric variable

Weight: a cardinal or numeric variable

Idnum: id number, a nominal variable

Note: there are 233 observations or rows

III. To display ordinal and nominal variables, from the menu bar choose

analyze/distributions.

In the distribution dialog box, select the variables age and sex and drag to the y,

columns window. Hit the OK button.

You can see there are more boys than girls and more twelve-year-olds than

other ages. The graph on the left for the variable age is a histogram, plotting the

frequency or number of observations for each age category. The graph on its right

is a mosaic bar chart, showing the fraction of observations in each category. By

hitting the red triangle button to the left of the word age, and choosing histogram

options, you can add a count axis to the histogram.
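The counts behind the histogram and the fractions behind the mosaic bar chart can also be computed by hand. The following is a minimal Python sketch, not part of JMP; the eight sex values and the codes "F" and "M" are made up for illustration and are not the students.jmp data.

```python
from collections import Counter

# Hypothetical sex values for eight students (NOT the real students.jmp data).
sex = ["M", "F", "M", "M", "F", "M", "F", "M"]

def counts_and_fractions(values):
    """Return {category: (count, fraction of all observations)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total) for cat, n in counts.items()}

summary = counts_and_fractions(sex)
for cat, (n, frac) in sorted(summary.items()):
    # The count is what the histogram plots; the fraction is what the mosaic shows.
    print(f"{cat}: count={n}, fraction={frac:.3f}")
```

The count is the height of a histogram bar and the fraction is the share of the mosaic bar taken up by that category.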

IV. To display a numerical variable, click on the data window to make it active and,

from the menu bar choose analyze/distributions.

In the distribution dialog box, select the variables height and weight and drag to

the y, columns window. Hit the OK button.

Use the hand icon and drag to the right on the histogram columns to

obtain finer categories of height. You can see that the mode is 62 inches. The

maximum height is 72 inches and the minimum height is 51 inches. The graph on

the left for the variable height is a histogram, plotting the frequency or number

of observations for each height. The graph on its right is an outlier box plot.

The ends of the box are the 25th and 75th percentiles (the first and third quartiles),

58 and 64, respectively. The difference between these quartiles, 6, is the

inter-quartile range, a measure of dispersion. Put another way, 25% of the

observations lie above 64 inches and 25% lie below 58 inches. The median height

is 61 inches, with 50% of the observations above this height. The median is

illustrated in the box by a line. The lines on either end of the box are whiskers.

Each extends to the outermost data point within 1.5 inter-quartile ranges of the

box. For the upper whisker the limit is the 75th percentile + 1.5 * inter-quartile

range, i.e. 64 + 1.5 * 6, or 73. Since the maximum height is 72 inches, the whisker

ends there, and there are no outlier heights to plot beyond it. The 25th

percentile is 58, so the lower whisker could extend down to 58 - 1.5 * 6, or 49, but

the minimum height is 51 inches, so the whisker ends at 51, and there are no

outlier heights below it.
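The whisker arithmetic for height can be checked with a few lines of Python. This is only a sketch of the 1.5 * inter-quartile range rule described above; the function name fences is an illustrative choice, and the four numbers are the ones quoted in the text.

```python
def fences(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) = (q1 - k*IQR, q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Values quoted in the text for height, in inches.
q1, q3 = 58, 64
height_min, height_max = 51, 72

lower_limit, upper_limit = fences(q1, q3)
print(lower_limit, upper_limit)            # 49.0 73.0

# A point is plotted as an outlier only if it lies beyond one of the limits.
has_low_outlier = height_min < lower_limit
has_high_outlier = height_max > upper_limit
print(has_low_outlier, has_high_outlier)   # False False
```

Because both the minimum and the maximum fall inside the limits, each whisker stops at the data extreme rather than at the limit.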

The diamond is called the means diamond. Note the mean or average

height is 61.33 inches, above the median of 61. The extent of the diamond is a

95% confidence interval around the mean, i.e. the probability of the mean height

lying above or below the diamond is only 5%. We will study the calculation of

these confidence intervals in the weeks ahead.
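As a preview, here is a rough Python sketch of one common way to form such an interval: mean plus or minus z * s / sqrt(n), with z taken from the normal distribution. This is an assumption for illustration and not necessarily the exact formula JMP uses for the diamond; for a small sample a t quantile would give a somewhat wider interval. The ten heights are made up and are not the students.jmp data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """Approximate confidence interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return m - z * se, m + z * se

# Hypothetical heights in inches (NOT the real students.jmp data).
heights = [58, 60, 61, 61, 62, 62, 63, 64, 66, 59]

low, high = mean_ci(heights)
print(f"mean={mean(heights):.2f}, 95% interval=({low:.2f}, {high:.2f})")
```

The interval is centered on the sample mean, just as the diamond is centered on the mean in the box plot.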

Note there is an outlier observation for the weight variable, so this may be

an individual that requires medical diagnosis. The red bracket in the box plot

designates the range of the shortest half of the data, i.e. the 50% of the

observations that are most dense, i.e. clustered around the central tendency.
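One way to find a shortest half is to sort the data and slide a window covering half of the points, keeping the narrowest window. The Python sketch below does this; the rounding of "half" to (n + 1) // 2 points is an assumption and may differ from JMP's rule, and the weights are made up for illustration.

```python
def shortest_half(values):
    """Return (low, high) of the narrowest window holding half the sorted data."""
    xs = sorted(values)
    n = len(xs)
    h = (n + 1) // 2  # points in "half" the data (assumed rounding rule)
    # Slide a window of h consecutive sorted points and keep the narrowest one.
    best = min(range(n - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
    return xs[best], xs[best + h - 1]

# Hypothetical weights in pounds, with one unusually large value.
weights = [84, 91, 95, 98, 99, 101, 104, 112, 128, 172]
print(shortest_half(weights))  # (95, 104)
```

Note that the large value of 172 has no effect on the result, which is why the shortest half is a useful picture of where the data are densest.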

The moments list reports the mean and standard deviation of the observed

values, for example for height.

V. The Spinning Plot

Select the data window and from the graph menu choose spinning plot. In

the dialog box, select the height, weight, and age variables (hold the control key

to select several) and drag to the y, columns box. Click OK.

Note the positive relationship or correlation between weight and height as

age increases. Use the hand icon to rotate the three-dimensional data plot. Try

using the white background (red triangle to the right of the rotation icons). You

can use the lasso icon to select the outlier point and from the data table identify

the idnum of this individual.
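The strength of the relationship seen in the spinning plot can be summarized by a correlation coefficient. Below is a minimal Python sketch of the Pearson correlation between height and weight; the function name and the six data points are made up for illustration and are not taken from students.jmp.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (inches) and weights (pounds), NOT the real data.
heights = [55, 58, 60, 62, 65, 68]
weights = [75, 85, 95, 100, 115, 130]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

A value of r near +1 corresponds to points lying close to an upward-sloping line, which is the pattern to look for when rotating the plot.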

VI. Help Menu

The manuals are available online and provide instructions for using the

JMP program. Select help from the menu bar and select contents.

VII. Analysis of a Subset of Female Students

Use the students window and repeat the instructions at the beginning of

section III above, i.e. from the menu bar choose analyze/distributions and select

age and sex and drag to the y, columns window. Highlight females in the

histogram. Note that all of the observations for females are now selected in the

data window. From the Tables menu in the menu bar, select subset. In the dialog box,

choose a name such as female subset of students. This data file can then be used

to conduct analysis on the height, weight, and age variables, as before, including

producing histograms and box plots, as well as a rotating plot, but restricted to

females.
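Outside JMP, the same subsetting step amounts to keeping only the rows whose sex is female. The Python sketch below illustrates the idea; the four rows and the codes "F" and "M" are assumptions made for illustration and are not the actual contents of students.jmp.

```python
from statistics import mean

# Hypothetical rows with the same five columns as the students table.
students = [
    {"idnum": 1, "age": 12, "sex": "F", "height": 59, "weight": 95},
    {"idnum": 2, "age": 12, "sex": "M", "height": 57, "weight": 83},
    {"idnum": 3, "age": 13, "sex": "F", "height": 62, "weight": 102},
    {"idnum": 4, "age": 14, "sex": "M", "height": 65, "weight": 112},
]

# Keep only the female rows; this plays the role of the subset table.
female_subset = [row for row in students if row["sex"] == "F"]

# Any earlier summary can now be repeated on the subset alone.
mean_female_height = mean(row["height"] for row in female_subset)
print(len(female_subset), mean_female_height)  # 2 60.5
```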
