Non-Linear & Logistic Regression

“If the statistics are boring, then you've got the wrong numbers.” — Edward R. Tufte (Statistics Professor, Yale University)

Regression Analyses

When do we use these?

PART 1: find a relationship between a response variable (Y) and a predictor variable (X) (e.g. Y~X)
PART 2: use that relationship to predict Y from X

Simple linear regression: y = b + m*x

y = β0 + β1 * x1

Multiple linear regression: y = β0 + β1*x1 + β2*x2 … + βn*xn

Non-linear regression: when a line just doesn’t fit our data

Logistic regression: when our data is binary (data is represented as 0 or 1)

Non-linear Regression

Curvilinear relationship between response and predictor variables

• The right type of non-linear model is usually conceptually determined based on biological considerations

• For a starting point we can plot the relationship between the 2 variables and “visually check” which model might be a good option

• There are obviously MANY curves you can generate to try and fit your data

Exponential Curve (non-linear regression option #1)

Exponential: y = a + b * c^x

• Rapid increasing/decreasing change in Y or X for a change in the other

Ex: bacteria growth/decay, human population growth, infection rates (humans, trees, etc.)

[Figure: exponential curves, response (y) vs. predictor (x); four panels combine +b or −b with 0 < c < 1 or c > 1, each levelling off at the asymptote y = a.]
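As a rough visual check, here is a minimal R sketch (parameter values are made up purely for illustration) of how the sign of b and the size of c change the exponential curve's shape:

# a = 2 (asymptote), b = 1; only c differs between the two curves
curve(2 + 1 * 1.5^x, from = 0, to = 5, ylim = c(0, 12),
      xlab = "predictor (x)", ylab = "response (y)")        # c > 1: accelerating increase
curve(2 + 1 * 0.5^x, from = 0, to = 5, add = TRUE, lty = 2) # 0 < c < 1: decay toward a
abline(h = 2, col = "grey")                                 # the asymptote y = a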

Logarithmic Curve (non-linear regression option #2)

Logarithmic: y = a + b * x^c

• Rapid increasing/decreasing change in Y or X for a change in the other

Ex: survival thresholds, resource optimization

[Figure: logarithmic curves, response (y) vs. predictor (x); four panels combine +b or −b with +c or −c, each anchored at y = a.]

Hyperbolic Curve (non-linear regression option #3)

Hyperbolic: y = a + b/(x + c)

• Rapid increasing/decreasing change in Y or X for a change in the other
Ex: survival as a function of population

• Similar to the exponential and logarithmic curves, but now we have 2 asymptotes

[Figure: hyperbolic curves, response (y) vs. predictor (x); panels show +b and −b, with a horizontal asymptote at y = a and a vertical asymptote at the x-value labelled c.]

Parabolic Curve (non-linear regression option #4)

Parabolic: y = a + b * (x − c)^2

• Rapid increasing/decreasing change in Y or X for a change in the other, followed by the reverse trend
Ex: survival as a function of an environmental variable

[Figure: upward (+b) and downward (−b) parabolas, response (y) vs. predictor (x), with vertex at (c, a).]

Gaussian Curve (non-linear regression option #5)

Gaussian: y = a * b^((x − c)^2)

• Resembles a normal distribution
Ex: survival as a function of an environmental variable

• Where 0 < b < 1

[Figure: Gaussian curve, response (y) vs. predictor (x); peak height a at x = c, with b controlling the spread.]

Sigmoidal Curve (non-linear regression option #6)

Sigmoidal: y = a / (1 + b^(x − c)) + d

• Stability in Y followed by rapid increase, then stability again
Ex: restricted growth, learning response, a threshold has to occur for a response effect

• Where b > 1 and c > 1

[Figure: sigmoidal curve, response (y) vs. predictor (x); plateaus set by a and d, transition centred near x = c, steepness set by b.]

Michaelis-Menten Curve (non-linear regression option #7)

Michaelis-Menten: y = (a * x) / (b + x)

• Rapid increasing/decreasing change in Y or X for a change in the other
Ex: biological process as a function of resource availability

• Similar to the exponential and logarithmic curves, but now we have 2 parameters – this model comes from kinetics/physiology

[Figure: Michaelis-Menten curve, response (y) vs. predictor (x); the curve saturates toward the asymptote y = a, and b is the predictor value at which the response reaches ½a.]

Non-Linear Regression Procedure:

1. Plot your variables to visualize the relationship
   a. What curve does the pattern resemble?
   b. What might alternative options be?

2. Decide on the curves you want to compare and run a non-linear regression curve fitting
   a. You will have to estimate your parameters from your curve to have starting values for your curve fitting function

3. Once you have parameters for your curves, compare the models with AIC

4. Plot the model with the lowest AIC on your point data to visualize fit

Non-linear regression curve fitting in R:
install.packages("minpack.lm")
library(minpack.lm)
nlsLM(responseY~MODEL, start=list(starting values for model parameters))
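As a concrete sketch using R's built-in Puromycin dataset (the Michaelis-Menten form and the starting values, eyeballed from a plot of the data, are illustrative; your variables and model will differ):

library(minpack.lm)
# Michaelis-Menten: rate = a*conc / (b + conc)
# Starting values: a ≈ the maximum observed rate, b ≈ the conc at half that rate
fit <- nlsLM(rate ~ a * conc / (b + conc), data = Puromycin,
             start = list(a = 200, b = 0.1))
summary(fit)   # parameter estimates, residual sum-of-squares, iterations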

Non-Linear Regression Output from R

The summary output from nlsLM() reports:
• The non-linear model that we fit (here, a simplified logarithmic with slope = 0)
• Estimates of the model parameters
• The residual sum-of-squares for your non-linear model
• The number of iterations needed to estimate the parameters


Akaike’s Information Criterion (AIC)

How do we decide which model is best? In the 1970s, Hirotugu Akaike used information theory to build a numerical equivalent of Occam's razor.

Occam’s razor: All else being equal, the simplest explanation is the best one
• For model selection, this means the simplest model is preferred to a more complex one
• Of course, this needs to be weighed against the ability of the model to actually predict anything

• AIC considers both the fit of the model and the model complexity
• Complexity is measured as the number of parameters or the use of higher-order polynomials
• Allows us to balance over- and under-fitting in our modelled relationships
  – We want a model that is as simple as possible, but no simpler
  – A reasonable amount of explanatory power is traded off against model complexity
  – AIC measures the balance of this for us
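For reference, the criterion itself is AIC = 2k − 2 * ln(L), where k is the number of estimated parameters and L is the maximized likelihood of the model: the −2 * ln(L) term rewards fit, while the 2k term penalizes complexity.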

Akaike’s Information Criterion (AIC) in R

• AIC is useful because it can be calculated for any kind of model, allowing comparisons across different modelling approaches and model fitting techniques

• The model with the lowest AIC value is the model that fits your data best (i.e. minimizes your model residuals)
  – The output from R is a single AIC value

Akaike’s Information Criterion in R to determine the best model:
AIC(nlsLM(responseY~MODEL1, start=list(starting values)))
AIC(nlsLM(responseY~MODEL2, start=list(starting values)))
AIC(nlsLM(responseY~MODEL3, start=list(starting values)))
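Continuing the illustrative Puromycin sketch from above (the curve choices and starting values are examples, not recommendations), comparing a Michaelis-Menten fit against a logarithmic fit and overlaying the better-supported curve:

library(minpack.lm)
fit.mm  <- nlsLM(rate ~ a * conc / (b + conc), data = Puromycin,
                 start = list(a = 200, b = 0.1))        # Michaelis-Menten
fit.log <- nlsLM(rate ~ a + b * conc^c, data = Puromycin,
                 start = list(a = 0, b = 200, c = 0.2)) # logarithmic
AIC(fit.mm)   # the lower AIC = better balance of fit and complexity
AIC(fit.log)
# Overlay the winning curve on the raw points using its parameter estimates
plot(rate ~ conc, data = Puromycin)
xs <- seq(min(Puromycin$conc), max(Puromycin$conc), length.out = 100)
cf <- coef(fit.mm)
lines(xs, cf["a"] * xs / (cf["b"] + xs))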

Non-Linear Regression Curve Fitting

• Use the parameter estimates output by nlsLM() to generate the curve for plotting, as sketched above

Non-Linear Regression Assumptions

• Non-linear regression makes no assumptions for normality, equal variances, or outliers

• However, the assumptions of independence (spatial & temporal) and design considerations (randomization, sufficient replicates, no pseudoreplication) still apply

• We don’t have to worry about statistical power here because we are fitting relationships
  – All we care about is if or how well we can model the relationship between our response and predictor variables

Non-Linear Regression R² for “Goodness of Fit”

• Calculating an R² is NOT APPROPRIATE for non-linear regression

• Why?
  – For linear models, the sums of squared errors always add up in a specific manner: SS_Regression + SS_Error = SS_Total
  – Therefore R² = SS_Regression / SS_Total, which mathematically must produce a value between 0 and 100%

  – But in nonlinear regression, SS_Regression + SS_Error ≠ SS_Total
  – Therefore the ratio used to construct R² is not valid in nonlinear regression

• Best to use the AIC value and the residual sum-of-squares to pick the best model, then plot the curve to visualize the fit

Logistic Regression (a.k.a. logit regression)

Relationship between a binary response variable and predictor variables

Logistic (Logit) Model: y = e^(β0 + β1*x1 + β2*x2 + … + βn*xn) / (1 + e^(β0 + β1*x1 + β2*x2 + … + βn*xn))

• Binary response variable can be considered a class (1 or 0)
  • Yes or No
  • Present or Absent

• The linear part of the logistic regression equation is used to find the probability of being in a category based on the combination of predictors

• Predictor variables are usually (but not necessarily) continuous
  • But it is harder to make inferences from regression outputs that use discrete or categorical variables
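A quick numeric illustration of how the linear part maps to a probability (the coefficients here are made up):

# Made-up coefficients: β0 = -1, β1 = 2, and a predictor value x1 = 1
# Linear part: -1 + 2*1 = 1
exp(1) / (1 + exp(1))   # probability of being in class 1: ~0.73
plogis(1)               # R's built-in logistic function gives the same value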

Binomial Distribution vs Normal Distribution

• Key difference: values are continuous (Normal) vs discrete (Binomial)

• As sample size increases the binomial distribution appears to resemble the normal distribution

• Binomial distribution is a family of distributions because the shape references both the number of observations and the probability of “getting a success” - a value of 1

“What is the probability of x successes in n independent and identically distributed Bernoulli trials?”

• Bernoulli trial (or binomial trial) - a random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted
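That question is exactly what R's dbinom() computes; for example:

# P(exactly 3 successes in 10 independent trials, each with success probability 0.5)
dbinom(3, size = 10, prob = 0.5)   # = choose(10,3) * 0.5^3 * 0.5^7 ≈ 0.117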

Logistic Regression vs Linear Regression

• Linear Regression
  - references the Gaussian (normal) distribution
  - uses ordinary least squares to find a best-fitting line that estimates parameters predicting the change in the dependent variable for a change in the independent variable

• Logistic regression
  - references the Binomial distribution
  - estimates the probability (p) of an event occurring (y=1) rather than not occurring (y=0) from a knowledge of relevant independent variables (our data)
  - regression coefficients are estimated using maximum likelihood estimation (an iterative process)

Maximum Likelihood Estimation

How coefficients are estimated for logistic regression

• A complex iterative process to find the coefficient values that maximize the likelihood function

Likelihood function - the probability of the occurrence of an observed set of values X and Y given a function with defined parameters

Process:

1. Begin with a tentative solution for each coefficient
2. Revise it slightly to see if the likelihood function can be improved
3. Repeat this revision until the improvement is minute, at which point the process is said to have converged
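A minimal sketch of this idea on simulated data, using R's general-purpose optimizer (glm() actually converges via iteratively reweighted least squares, but the tentative-solution-then-revise logic is the same):

set.seed(1)
x <- rnorm(100)                               # simulated predictor
y <- rbinom(100, 1, plogis(-0.5 + 1.2 * x))   # simulated binary response
negLL <- function(beta) {                     # negative log-likelihood to minimize
  p <- plogis(beta[1] + beta[2] * x)          # P(y = 1) under candidate coefficients
  -sum(dbinom(y, 1, p, log = TRUE))
}
optim(c(0, 0), negLL)$par                     # starts tentative, revises until converged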


Simple Logistic Regression in R:
glm(response~predictor, family="binomial")
summary(glm(response~predictor, family="binomial"))

Multiple Logistic Regression in R:
glm(response~predictor1+predictor2+…+predictorN, family="binomial")
summary(glm(response~predictor1+predictor2+…+predictorN, family="binomial"))
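A runnable sketch with R's built-in mtcars data, predicting the binary transmission type (am, already coded 0/1) from car weight:

mod <- glm(am ~ wt, data = mtcars, family = "binomial")
summary(mod)   # coefficients, standard errors, p-values, and the AIC described below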

Logistic Regression (a.k.a. logit regression) Output from R

• Estimates of the model parameters (intercept and slope) and the standard errors of those estimates

• AIC value for the model

• p-values test the null hypothesis that a coefficient is equal to zero (no effect)
  – A predictor that has a low p-value is likely to be a meaningful addition to your model because changes in the predictor's value are related to changes in the response variable
  – A large p-value suggests that changes in the predictor are not associated with changes in the response

Logistic Regression (a.k.a. logit regression) Pseudo R² for “Goodness of Fit”

• In linear regression, the relationship between the dependent and the independent variables is linear

• However, this assumption is not made in logistic regression, so we cannot use the calculation R² = SS_Regression / SS_Total
  – REMEMBER: we are not using sums-of-squares to estimate our parameters – we are using maximum likelihood estimation

• We can, however, calculate a pseudo R²
  – There are lots of options for how to do this, but the best for logistic regression appears to be McFadden's calculation

Estimating McFadden’s pseudo R² in R:

R² = 1 − ln L(M_full) / ln L(M_intercept)

mod=glm(response~predictor, family="binomial")
mcF.r2=1-mod$deviance/mod$null.deviance

L = estimated likelihood

NOTE: Pseudo R² values will be MUCH lower than R² values!

Logistic Regression (a.k.a. logit regression) Assumptions

• Logistic regression makes no assumptions for normality, equal variances, or outliers

• However the assumptions of independence (spatial & temporal) and design considerations (randomization, sufficient replicates, no pseudoreplication) still apply

• Logistic regression assumes the response variable is binary (0 & 1)

• We don’t have to worry about statistical power here because we are fitting relationships
  – All we care about is if or how well we can model the relationship between our response and predictor variables

Important to Remember

A non-linear or logistic relationship DOES NOT imply causation!

A good AIC or pseudo R² implies a relationship, not that one or more factors cause changes in another factor's value

Be careful of your interpretations!