<p>September 29, 2009 Lecture #2 240A-1 L. Phillips Exploratory Data Analysis and JMP</p>
<p>I. Open the JMP program by going to Start, Programs, Statistics, JMP 5.0.1 (select).</p>
<p>II. Open the data file students by clicking on the open data table button in the JMP starter window and scrolling over to the file students.jmp in the folder Sample Data.</p>
<p>The five columns contain the five variables:</p>
<p>Age: an ordinal variable</p>
<p>Sex: a nominal or categorical variable</p>
<p>Height: a cardinal or numeric variable</p>
<p>Weight: a cardinal or numeric variable</p>
<p>Idnum: id number, a nominal variable</p>
<p>Note: there are 233 observations or rows.</p>
<p>III. To display ordinal and nominal variables, from the menu bar choose analyze/distributions.</p>
<p>In the distribution dialog box, select the variables age and sex and drag to the y, columns window. Hit the OK button.</p>
<p>You can see there are more boys than girls and more twelve-year-olds than students of any other age. The graph on the left for the variable age is a histogram, plotting the frequency or number of observations for each age category. The graph on its right is a mosaic bar chart, showing the fraction of observations in each category. By hitting the red triangle button to the left of the word age, and choosing histogram options, you can add a count axis to the histogram.</p>
<p>IV. To display a numerical variable, click on the data window to make it active and, from the menu bar, choose analyze/distributions.</p>
<p>In the distribution dialog box, select the variables height and weight and drag to the y, columns window. Hit the OK button.</p>
<p>Use the hand icon and drag to the right on the histogram columns to obtain finer categories of height. You can see that the mode is 62 inches.
The maximum height is 72 inches and the minimum height is 51 inches. The graph on the left for the variable height is a histogram, plotting the frequency or number of observations for each height. The graph on its right is an outlier box plot.</p>
<p>The ends of the box are the 25th and 75th percentiles (the first and third quartiles), 58 and 64, respectively. The difference between these quartiles, 6, is the inter-quartile range, a measure of dispersion. In other words, 25% of the observations lie above the third quartile of 64, and 25% of the observations lie below the first quartile of 58. The median height is 61 inches, with 50% of the observations above this height. The median is illustrated in the box by a line. The lines on either end of the box are whiskers, and extend to the outermost data points that lie within 1.5 times the inter-quartile range of the box. For example, the upper limit is the third quartile + 1.5 * inter-quartile range, i.e. 64 + 1.5*6, or 73. Since the maximum height is 72 inches, the whisker ends there. Thus there are no outliers, or heights to plot beyond this whisker. The first quartile is 58, so the lower whisker could extend down to 58 - 1.5*6, or 49, but the minimum height is 51 inches, so the whisker ends at 51, and there are no outlier heights below this.</p>
<p>The diamond is called the means diamond. Note the mean or average height is 61.33 inches, above the median of 61. The extent of the diamond is a 95% confidence interval around the mean, i.e. the probability of the mean height lying above or below the diamond is only 5%.
We will study the calculation of these confidence intervals in the weeks ahead.</p>
<p>Note there is an outlier observation for the weight variable, so this may be an individual who requires medical diagnosis. The red bracket in the box plot designates the range of the shortest half of the data, i.e. the 50% of the observations that are most densely packed, typically clustered around the central tendency.</p>
<p>The moments list reports the mean and standard deviation of the observed values, for example for height.</p>
<p>V. The Spinning Plot</p>
<p>Select the data window and from the graph menu choose spinning plot. In the dialog box, (use the control key to) select the height, weight, and age variables and drag to the y, column box. Click OK.</p>
<p>Note the positive relationship or correlation between weight and height as age increases. Use the hand icon to rotate the three-dimensional data plot. Try using the white background (red triangle to the right of the rotation icons). You can use the lasso icon to select the outlier point and from the data table identify the idnum of this individual.</p>
<p>VI. Help Menu</p>
<p>The manuals are available online and provide instructions for using the JMP program. Select help from the menu bar and select contents.</p>
<p>VII. Analysis of a Subset of Female Students</p>
<p>Use the students window and repeat the instructions at the beginning of section III above, i.e. from the menu bar choose analyze/distributions and select age and sex and drag to the y, columns window. Highlight females in the histogram. Note that all of the observations for females are now selected in the data window. From the Tables menu in the menu bar, select subset. In the dialog box, choose a name such as female subset of students.
This data file can then be used to analyze the height, weight, and age variables as before, including producing histograms and box plots as well as a spinning plot, but restricted to females.</p>
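<p>As a check on the box plot arithmetic in section IV, the whisker limits for height can be reproduced outside JMP. The following is a minimal Python sketch, not part of JMP; the function name box_plot_fences is hypothetical, and the inputs are only the summary values quoted in these notes (quartiles 58 and 64, minimum 51, maximum 72), not the students.jmp data itself.</p>

```python
def box_plot_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence, iqr) for an outlier box plot.

    Hypothetical helper: the fences sit k inter-quartile ranges
    below the first quartile and above the third quartile.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr, iqr


# Height summary values quoted in section IV of these notes.
q1, q3 = 58, 64
minimum, maximum = 51, 72

lower_fence, upper_fence, iqr = box_plot_fences(q1, q3)

# If both the minimum and the maximum fall inside the fences,
# the whiskers end at those values and no point is an outlier.
no_outliers = lower_fence <= minimum and maximum <= upper_fence

print(iqr, lower_fence, upper_fence, no_outliers)
# prints: 6 49.0 73.0 True
```

<p>With the height values, the limits are 49 and 73, so the whiskers stop at the observed minimum of 51 and maximum of 72. Applying the same check to weight would require the weight quartiles from the JMP output, which these notes do not list.</p>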