Exploratory Data Analysis and JMP
September 29, 2009 Lecture #2 240A-1 L. Phillips Exploratory Data Analysis and JMP
I. Open the JMP program by going to Start, Programs, Statistics, JMP 5.0.1 (select).
II. Open the data file students by clicking on the open data table button in the JMP
starter window and scrolling over to the file students.jmp in the folder Sample
Data.
The five columns contain the five variables:
Age: an ordinal variable
Sex: a nominal or categorical variable
Height: a cardinal or numeric variable
Weight: a cardinal or numeric variable
Idnum: id number, a nominal variable
Note: there are 233 observations or rows
III. To display ordinal and nominal variables, from the menu bar choose
analyze/distributions
In the distribution dialog box, select the variables age and sex and drag to the y,
columns window. Hit the OK button.
You can see there are more boys than girls and more twelve-year-olds than
any other age. The graph on the left for the variable age is a histogram, plotting the
frequency or number of observations for each age category. The graph on its right
is a mosaic bar chart, showing the fraction of observations in each category. By
clicking the red triangle button to the left of the word age and choosing histogram
options, you can add a count axis to the histogram.
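The counts shown by the histogram and the fractions shown by the mosaic bar chart amount to a frequency table. The following is a minimal Python sketch of that table, outside JMP. The function name frequency_table and the toy list are illustrative assumptions; the values are made up and are not the students.jmp data.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, fraction)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


# Hypothetical toy sample, NOT the students.jmp data.
sex = ["M", "F", "M", "M", "F"]
table = frequency_table(sex)

for category, (count, fraction) in sorted(table.items()):
    # count is what the histogram plots; fraction is what the mosaic bar chart shows
    print(f"{category}: count={count}, fraction={fraction:.2f}")
```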
IV. To display a numerical variable, click on the data window to make it active and,
from the menu bar choose analyze/distributions
In the distribution dialog box, select the variables height and weight and drag to
the y, columns window. Hit the OK button.
Use the hand icon and drag to the right on the histogram columns to
obtain finer categories of height. You can see that the mode is 62 inches. The
maximum height is 72 inches and the minimum height is 51 inches. The graph on
the left for the variable height is a histogram, plotting the frequency or number
of observations for each height. The graph on its right is an outlier box plot.
The ends of the box are the 25th and 75th percentiles (the first and third quartiles), 58 and 64,
respectively. The difference between these quartiles, 6, is the inter-quartile range,
a measure of dispersion: 25% of the observations lie above 64 and 25% lie below 58.
The median height is 61 inches, with 50% of the observations above this height. The
median is illustrated in the box by a line. The lines on either end of the box are
whiskers. Each whisker extends to the outermost data point that lies within 1.5 times
the inter-quartile range of its end of the box. For the upper whisker the limit is the
third quartile + 1.5*inter-quartile range, i.e. 64 + 1.5*6, or 73. Since the
maximum height is 72 inches, the whisker ends there. Thus there are no outliers,
or heights to plot beyond this whisker. The first quartile is 58, so the lower whisker
could potentially extend down to 58 - 1.5*6, or 49, but the minimum height is 51 inches,
so the whisker ends at 51, and there are no outlier heights below it.
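The whisker arithmetic above can be checked with a short Python sketch. The quartiles (58 and 64), the minimum (51) and the maximum (72) are the values quoted in this handout. The function names tukey_fences and whisker_ends are illustrative assumptions and are not part of JMP.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) = (q1 - k*IQR, q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whisker_ends(values, q1, q3, k=1.5):
    """Return (low_end, high_end, outliers) for an outlier box plot."""
    lower, upper = tukey_fences(q1, q3, k)
    inside = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    # Each whisker stops at the outermost observation inside the limits.
    return min(inside), max(inside), outliers


lower, upper = tukey_fences(58, 64)
print(lower, upper)  # 49.0 73.0

# Only the minimum and maximum heights are quoted in the handout, which is
# enough to locate the whisker ends when both lie inside the limits.
low_end, high_end, outliers = whisker_ends([51, 72], 58, 64)
print(low_end, high_end, outliers)  # 51 72 []
```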
The diamond is called the means diamond. Note the mean or average
height is 61.33 inches, above the median of 61. The extent of the diamond is a
95% confidence interval around the mean, i.e. the probability of the mean height
lying above or below the diamond is only 5%. We will study the calculation of
these confidence intervals in the weeks ahead.
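As a preview only, here is a rough Python sketch of how such an interval can be computed. It assumes a normal approximation (mean plus or minus about 1.96 standard errors), which may differ slightly from the method JMP uses for the diamond. The function name mean_confidence_interval and the five toy heights are illustrative assumptions, not the students.jmp data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / sqrt(n)   # sample std. dev. / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    return center - z * standard_error, center + z * standard_error


# Hypothetical toy heights in inches, NOT the students.jmp data.
heights = [58, 60, 61, 62, 64]
low, high = mean_confidence_interval(heights)
print(f"mean = {mean(heights)}, 95% interval = ({low:.2f}, {high:.2f})")
```

With 233 observations, as in the students file, the normal approximation should be close to an interval based on the t distribution; with a sample as small as this toy one it is noticeably too narrow.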
Note there is an outlier observation for the weight variable, so this may be
an individual who requires medical diagnosis. The red bracket in the box plot
designates the range of the shortest half of the data, i.e. the 50% of the
observations that are most dense, i.e. clustered most tightly around the central tendency.
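One way to picture the shortest half is as the narrowest window that still covers half of the sorted observations. The Python sketch below follows that idea. It assumes the window holds ceil(n/2) points, which may not match JMP's exact rule. The function name shortest_half and the toy values are illustrative assumptions.

```python
from math import ceil


def shortest_half(values):
    """Return (low, high) of the narrowest window covering half the data."""
    ordered = sorted(values)
    n = len(ordered)
    h = ceil(n / 2)  # number of points in the window (an assumption)
    # Slide a window of h consecutive sorted points and keep the narrowest.
    best = min(range(n - h + 1), key=lambda i: ordered[i + h - 1] - ordered[i])
    return ordered[best], ordered[best + h - 1]


# Hypothetical toy values, NOT the students.jmp data.
sample = [51, 58, 60, 61, 62, 65, 72, 90]
print(shortest_half(sample))  # (58, 62)
```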
The moments list reports the mean and standard deviation of the observed
values, for example for height.
V. The Spinning Plot
Select the data window and from the graph menu choose spinning plot. In
the dialog box, select the height, weight, and age variables (hold down the control
key to select more than one) and drag to the y, columns box. Click OK.
Note the positive relationship or correlation between weight and height as
age increases. Use the hand icon to rotate the three-dimensional data plot. Try
using the white background (red triangle to the right of the rotation icons). You
can use the lasso icon to select the outlier point and from the data table identify
the idnum of this individual.
VI. Help Menu
The manuals are available online and provide instructions for using the
JMP program. Select help from the menu bar and select contents.
VII. Analysis of a Subset of Female Students
Use the students window and repeat the instructions at the beginning of
section III above, i.e. from the menu bar choose analyze/distributions and select
age and sex and drag to the y, columns window. Highlight females in the
histogram. Note that all of the observations for females are now selected in the
data window. From the Tables menu in the bar, select subset. In the dialog box,
choose a name such as female subset of students. This data file can then be used
to conduct analysis on the height, weight, and age variables, as before, including
producing histograms and box plots, as well as a rotating plot, but restricted to
females.
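The subset step keeps only the rows whose sex value is female. A minimal Python sketch of the same filtering, outside JMP, is given below. The column names follow the variable list in section II. The rows, the "F"/"M" coding, and the function name subset are illustrative assumptions, not the contents of students.jmp.

```python
# Hypothetical rows; column names follow the handout, values are made up.
students = [
    {"idnum": 1, "age": 12, "sex": "F", "height": 59, "weight": 95},
    {"idnum": 2, "age": 13, "sex": "M", "height": 62, "weight": 110},
    {"idnum": 3, "age": 12, "sex": "F", "height": 61, "weight": 102},
    {"idnum": 4, "age": 14, "sex": "M", "height": 66, "weight": 125},
]


def subset(rows, column, value):
    """Return the rows whose entry in the given column equals value."""
    return [row for row in rows if row[column] == value]


females = subset(students, "sex", "F")
print(len(females), "of", len(students), "rows selected")

# The subset can then be analyzed on its own, e.g. the female heights.
female_heights = [row["height"] for row in females]
print(female_heights)
```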