Laboratory 1 for ST7002

We are going to work with the diamonds data discussed in class. The aim of this laboratory is to introduce the regression facilities in Minitab and to refresh memories on how to use the package. The diamonds dataset is located at https://www.scss.tcd.ie/postgraduate/pgcertstats/current/Lecture%20Notes/. Here is a brief description of the data.

Carat - Weight of diamond stones in carat units
Colour - D, E, F, G, H or I
Clarity - IF, VVS1, VVS2, VS1 or VS2
Certification Body - GIA, IGI or HRD
Price - Price of the stone (Singapore $)

The aim is to model the price of diamonds using the above variables. The website www.adiamondisforever.com educates the layperson on the factors that influence the price of a diamond stone. These are the 4 C's: Carat, Clarity, Colour and Cut.

How to use the regression routine in Minitab?

First of all, remember that there is a Help menu, which is very useful, and also a Help button on each screen. I am going to go through each window for the Regression command and explain most of it. We do not need to know what everything is. I would strongly encourage everybody to explore the package.

• Choose Stat-Regression-Regression to open the Regression window.

Only quantitative variables will appear on this menu. Choosing the Graphs button opens the Graphs window.

I suggest that you choose Standardized residuals for the plots. The Four in one option is useful. You can plot the residuals versus any of the variables using the box at the end.
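To make the idea of a standardized residual concrete, here is a minimal sketch in Python for a regression with one predictor. It assumes the usual definition (the residual divided by its estimated standard deviation, which uses the leverage of each point). The function name and the toy numbers are invented for illustration and are not the diamonds data.

```python
import math

def standardized_residuals(x, y):
    """Fit y = b0 + b1*x by least squares and return the standardized residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation, with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    # Leverage of each point in a one-predictor regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]

# Toy values invented for illustration; NOT the diamonds data.
carat = [0.3, 0.4, 0.5, 0.6, 0.8, 1.0]
price = [1300, 1700, 2500, 3300, 5600, 9100]
std_resid = standardized_residuals(carat, price)
print([round(r, 2) for r in std_resid])
```

Standardized residuals are on a common scale, so values much beyond about 2 in absolute size stand out in the plots more readily than raw residuals would.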

Choosing the Options button opens the Options window.

Durbin-Watson statistic – a test for autocorrelation of the residuals, as discussed very briefly in class. We will look at variance inflation factors later. Forget the rest for the moment. To calculate prediction intervals for new observations, it is easiest to type the values into a new column, or into several columns if you have more than one independent variable. You need a column for each independent variable.

Choosing the Results button opens the Results window.

This is self-explanatory.
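Going back to the Options window for a moment, the Durbin-Watson statistic is simple enough to compute by hand. The sketch below assumes the standard formula: the sum of squared differences between successive residuals, divided by the sum of squared residuals. The residuals are invented for illustration.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, invented purely for illustration.
alternating = [1.0, -1.0, 1.0, -1.0]  # signs flip each time: negative autocorrelation
trending = [1.0, 1.0, -1.0, -1.0]     # neighbours look alike: positive autocorrelation
print(durbin_watson(alternating))  # 3.0
print(durbin_watson(trending))     # 1.0
```

The statistic lies between 0 and 4. Values near 2 suggest little first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.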

Choosing the Storage button opens the Storage window.

This will store the checked variables in your Minitab Worksheet.

Other useful commands

• Use Calc-Calculator to create new variables, e.g. interactions or logged variables.

• Use Calc-Make Indicator Variables to create indicator variables.

Steps to complete

The first step is to import the data into Minitab. I use File-Open Worksheet and change Files of Type to Text. (Choose either the *.csv option or the *.txt option.)

Look at each variable separately (either graphically or just getting frequencies). In case you have forgotten Stat-Tables-Tally generates frequencies.
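For anyone who wants to see what a tally amounts to, the sketch below counts how often each level of a categorical variable occurs. It is written in Python rather than Minitab, and the handful of Colour grades is invented for illustration.

```python
from collections import Counter

# Hypothetical handful of Colour grades, invented for illustration.
colour = ["D", "E", "E", "G", "F", "E", "G", "I"]

tally = Counter(colour)
for grade in sorted(tally):
    count = tally[grade]
    percent = 100 * count / len(colour)
    print(f"{grade}  count={count}  percent={percent:.1f}")
```

Levels with very few stones are worth noting at this stage, since their effects will be estimated imprecisely in any later model.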

The aim of the exercise is to predict the price of diamonds from the variables Carats, Colour, Clarity and Certification. Now look at each of the variables vs Price using appropriate graphs. Summarise the results.

Build a regression model of Price vs Carats and interpret coefficients. (I know we should look at diagnostics first).

Interpret coefficients.

Is it a good model – why or why not?
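As a cross-check on what Minitab reports for a one-predictor model, the least-squares coefficients and R-squared can be computed directly. The sketch below is a minimal illustration in Python. The helper name `fit_line` and the toy numbers are assumptions made for the example and are not the diamonds data.

```python
def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, 1 - ss_resid / ss_total

# Toy values invented for illustration; NOT the diamonds data.
carat = [0.3, 0.4, 0.5, 0.6, 0.8, 1.0]
price = [1300, 1700, 2500, 3300, 5600, 9100]
b0, b1, r2 = fit_line(carat, price)
print(f"Price = {b0:.0f} + {b1:.0f} * Carat, R-sq = {100 * r2:.1f}%")
```

The slope is the estimated change in Price for one extra carat. The intercept is the predicted Price at zero carats, which lies outside the range of the data and so usually has no sensible interpretation. A high R-squared on its own does not make the model good; the residual plots still have to look reasonable.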

What do you think of the diagnostics?

Use log(Price) as the response and see what happens. Interpret the coefficients.

Again look at diagnostics. Do they suggest anything?
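Interpreting coefficients on the log scale takes one extra step, which the sketch below tries to show. It assumes natural logs were used, so a slope of b1 means that each extra 0.1 carat multiplies the predicted price by exp(0.1 * b1). The helper name and the toy numbers are invented for illustration and are not the diamonds data.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# Toy values invented for illustration; NOT the diamonds data.
carat = [0.3, 0.4, 0.5, 0.6, 0.8, 1.0]
price = [1300, 1700, 2500, 3300, 5600, 9100]

# Natural log of price is the response (an assumption; base 10 would
# change the back-transformation below).
log_price = [math.log(p) for p in price]
b0, b1 = fit_line(carat, log_price)

# Back-transform: effect of a 0.1 carat increase as a multiplier on price.
multiplier = math.exp(0.1 * b1)
print(f"log(Price) = {b0:.3f} + {b1:.3f} * Carat")
print(f"Each extra 0.1 carat changes predicted price by {100 * (multiplier - 1):.1f}%")
```

The coefficient now describes a percentage change rather than a fixed number of dollars, which is why the interpretation differs from the model for Price itself.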

Try adding Colour to the model.

Does this variable make a difference?
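Colour is categorical, so it has to enter the model as indicator variables, with one level left out as the baseline. The sketch below shows the coding in Python. The function name `make_indicators`, the choice of D as baseline and the sample values are all assumptions made for illustration.

```python
def make_indicators(values, baseline):
    """Return one 0/1 column per level of `values`, omitting the baseline level."""
    levels = sorted(set(values))
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != baseline
    }

# Hypothetical Colour grades for four stones, invented for illustration.
colour = ["D", "E", "D", "F"]
indicators = make_indicators(colour, baseline="D")
for level, column in indicators.items():
    print(level, column)
# E [0, 1, 0, 0]
# F [0, 0, 0, 1]
```

Each indicator coefficient is then the estimated difference from the baseline colour with the other variables held fixed. Including a column for every level together with the constant would make the columns perfectly collinear, which is why one level is dropped.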