
ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Introduction to Regression Analysis

Modeling a Response

A regression model describes how a dependent (or response) variable Y is affected, on average, by one or more independent variables (or factors, or covariates) x1, x2, ..., xk.

Example: bleaching cotton

Y = measured whiteness of a cotton swatch

x1 = temperature of the bleaching bath

x2 = time spent in the bath


The average value of Y, E(Y), depends on x1, x2, ..., xk, so it is a function of them:

E(Y) = f(x1, x2, ..., xk) = f(x).

We may know the general form of f(x), but it may contain constants β0, β1, ..., βp whose values are unknown.

So more completely,

E(Y) = f(x1, x2, ..., xk; β0, β1, ..., βp) = f(x, β).

This equation is a regression model.


In any given observation, Y will differ from E(Y).

The difference ε = Y − E(Y) is called the random error, and clearly

E(ε) = E(Y) − E(Y) = 0.

We can then write the regression model as

Y = E(Y) + ε = f(x, β) + ε.
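The decomposition of Y into a mean plus a random error can be checked by simulation. The sketch below is only an illustration: the linear form of the mean function, the coefficient values, the settings of x1 and x2, and the normal error with standard deviation 2 are all assumptions chosen for the example, not values from the course.

```python
import random

def mean_response(x1, x2, beta):
    """Assumed mean function E(Y) = b0 + b1*x1 + b2*x2 (illustrative form)."""
    b0, b1, b2 = beta
    return b0 + b1 * x1 + b2 * x2

def simulate_response(x1, x2, beta, sigma, rng):
    """Draw one Y = E(Y) + error, with the error drawn from Normal(0, sigma)."""
    return mean_response(x1, x2, beta) + rng.gauss(0.0, sigma)

rng = random.Random(0)          # fixed seed so the run is reproducible
beta = (50.0, 0.2, 0.5)         # hypothetical coefficient values
x1, x2 = 60.0, 10.0             # hypothetical temperature and time
expected = mean_response(x1, x2, beta)

# Each simulated error is the difference between a drawn Y and E(Y).
errors = [simulate_response(x1, x2, beta, 2.0, rng) - expected
          for _ in range(20000)]
mean_error = sum(errors) / len(errors)
print(f"E(Y) = {expected:.1f}, average simulated error = {mean_error:.3f}")
```

Individual errors are rarely zero, but their average over many draws should be close to zero, which is what E(ε) = 0 asserts.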


Example: bleaching cotton

Bleaching is a chemical reaction in which colored impurities are oxidized either to colorless products, or to soluble products that are washed out.

If we knew all the reactions, their rates at various temperatures, and the solubility of the products, we could use a process-based model to predict whiteness, E(Y).

In practice, we don’t have all the details, so instead we use an empirical model.


The simplest empirical model is a linear function:

E(Y) = β0 + β1x1 + β2x2.

A quadratic model gives a better approximation:

E(Y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x1^2 + β5x2^2.

If β4 < 0, β5 < 0, and β3^2 < 4β4β5, this function has a maximum, which gives the optimum combination of temperature and time.
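The location of that maximum follows from setting both partial derivatives of the quadratic to zero and solving the resulting pair of linear equations. The sketch below does this with Cramer's rule; the function name `quadratic_optimum` and the coefficient values are hypothetical, chosen only to illustrate the condition.

```python
def quadratic_optimum(beta):
    """Return (x1, x2) at the maximum of the quadratic surface, or None.

    beta = (b0, b1, b2, b3, b4, b5) for
    E(Y) = b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1^2 + b5*x2^2.
    """
    b0, b1, b2, b3, b4, b5 = beta
    det = 4.0 * b4 * b5 - b3 ** 2
    # A maximum requires b4 < 0, b5 < 0 and b3^2 < 4*b4*b5 (det > 0).
    if not (b4 < 0 and b5 < 0 and det > 0):
        return None
    # Zero partial derivatives give the linear system
    #   2*b4*x1 +   b3*x2 = -b1
    #     b3*x1 + 2*b5*x2 = -b2
    # which is solved here by Cramer's rule.
    x1 = (b3 * b2 - 2.0 * b5 * b1) / det
    x2 = (b3 * b1 - 2.0 * b4 * b2) / det
    return x1, x2

# Hypothetical coefficients satisfying the maximum condition.
optimum = quadratic_optimum((10.0, 3.0, 3.0, 1.0, -1.0, -1.0))
# Same coefficients but with b3 = 3, so b3^2 > 4*b4*b5: a saddle, no maximum.
no_optimum = quadratic_optimum((10.0, 3.0, 3.0, 3.0, -1.0, -1.0))
print(optimum, no_optimum)
```

In practice the coefficients would be estimates, so the computed optimum is itself an estimate subject to sampling error.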

Origin of “Regression”

Francis Galton studied inheritability of physical characteristics such as height.

Consider the deviation of an individual’s height from the gender average.

Suppose that the height deviation Y of a son is, on average, linearly related to the average height deviation x of his parents:

E(Y) = β0 + β1x


The intercept β0 measures the overall increase in height between generations, which is interesting but not related to inheritability.

If β1 = 1, the son inherits the full characteristic of his parents.

If β1 = 0, there is no inheritability.

Galton observed β1 ≈ 2/3, and described this as a regression to the mean. (OED: from Latin regressus, from regredi ‘go back, return’, from re- ‘back’ + gradi ‘to walk’.)
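A small numerical illustration of what a slope of about 2/3 implies: the sketch below evaluates E(Y) = β0 + β1x for a few parental deviations. The intercept β0 = 0 and the deviation values (in inches) are assumptions made for the example; only β1 ≈ 2/3 comes from the text above.

```python
def expected_son_deviation(parent_deviation, beta0=0.0, beta1=2.0 / 3.0):
    """E(Y) = beta0 + beta1 * x, with beta0 = 0 assumed for illustration."""
    return beta0 + beta1 * parent_deviation

# Hypothetical parental deviations from the average, in inches.
for x in (-3.0, 0.0, 3.0):
    print(f"parents {x:+.1f} -> son expected {expected_son_deviation(x):+.2f}")
```

With 0 < β1 < 1, the expected deviation of the son has the same sign as that of his parents but is smaller in size, so it is closer to the average.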


See Galton (1886), “Regression towards mediocrity in hereditary stature”, The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15, pages 246–263 (or Wikipedia!).

The term “regression” has since been used for any such analysis, involving one or more variables, and involving linear and nonlinear relationships, mostly having no connection with inheritability.

Estimation

In a regression context, we sample from many populations.

For example, in bleaching cotton, for each combination of temperature and time, we could bleach many cotton swatches. Each time, the measured whiteness is drawn from some population.

The constants β0, β1, ..., βp are parameters of that collection of populations.


We need to make inferences about them, in the form of: point estimates; interval estimates; hypothesis tests.

We shall get point estimates using the method of least squares.

For other inferences, we need to know the distribution of the random errors, and we shall assume that they are normally distributed.
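As a minimal sketch of how point estimates can be obtained, assuming the method meant here is ordinary least squares and restricting to a single predictor, the closed-form estimates for E(Y) = β0 + β1x are b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. The function name and the data values below are hypothetical, made up purely for illustration.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical parent and son height deviations (inches), not real data.
parent_dev = [-3.0, -1.5, 0.0, 1.5, 3.0]
son_dev = [-1.8, -1.2, 0.3, 0.9, 2.3]
b0_hat, b1_hat = least_squares_line(parent_dev, son_dev)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
```

Interval estimates and hypothesis tests additionally rely on the distributional assumption about the errors, which this sketch does not use.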

Observational and Experimental Data

In some investigations, the independent variables x1, x2,..., xk can be controlled; that is, held at desired values. For example, time and temperature in the bleaching problem. The resulting data are called experimental.


In other cases, the independent variables cannot be controlled, and their values are simply observed. For example, Galton’s heights of parents and sons. The resulting data are called observational. Observational data show how the value of the response is associated with values of the independent variables, but generally cannot reveal cause and effect.

George Box: “To find out what happens to a system when you interfere with it, you have to interfere with it (not just passively observe it).”
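The difference between the two kinds of data can be seen in a small simulation. In the sketch below, which rests entirely on made-up assumptions, a hidden factor z drives the response, and x has no effect on y at all. In the "observational" data x merely tracks z; in the "experimental" data x is set independently of z. All names, coefficients, and distributions are hypothetical.

```python
import random

def slope(xs, ys):
    """Least squares slope of y on x for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

rng = random.Random(1)   # fixed seed so the run is reproducible
n = 5000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # hidden factor

# Observational: x is only observed, and it happens to follow z.
x_obs = [zi + rng.gauss(0.0, 0.5) for zi in z]
y_obs = [2.0 * zi + rng.gauss(0.0, 0.5) for zi in z]   # y does not depend on x

# Experimental: x is set by the experimenter, independently of z.
x_exp = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_exp = [2.0 * zi + rng.gauss(0.0, 0.5) for zi in z]   # y still ignores x

slope_obs = slope(x_obs, y_obs)
slope_exp = slope(x_exp, y_exp)
print(f"observational slope = {slope_obs:.2f}, experimental slope = {slope_exp:.2f}")
```

The observational slope is clearly nonzero even though changing x would do nothing to y, while the experimental slope is near zero. The association in the observational data is real, but it does not describe what happens when x is controlled.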

Random Thoughts About Statistical Models

A model is a simplified representation of reality.

George Box: “All models are wrong, but some are useful.”

John Tukey: “An approximate answer to the right question is worth a good deal more than the exact answer to an approximate problem.”

Albert Einstein: “For every complex question there is a simple and wrong solution.”
