4/20/2018
Predicting the Probability of Attaining Water Quality Classes
Tom Danielson, Ph.D. Jeanne DiFranco Beth Connors Leon Tsomides
MAINE DEPARTMENT OF ENVIRONMENTAL PROTECTION Protecting Maine’s Air, Land and Water
Overview
• Conceptual Introduction to Discriminant Analysis
  – Example with iris data
• Case study related to water quality
  – Wetland bioassessments
Discriminant Analysis
• Predicts the probability of a sample belonging to a pre‐defined group
Ronald Fisher, 1936
“The greatest biologist since Darwin” Richard Dawkins
Edgar Anderson’s Iris Data (1935)
• Iris setosa (Beachhead Iris)
• Iris versicolor (Northern Blue Flag)
• Iris virginica (Southern Blue Flag)
Photos: L. Bernler, Bob Gutowski, C. Pierce
Iris Measurements
50 flowers of each species
• Petal length (cm)
• Petal width (cm)
• Sepal length (cm)
• Sepal width (cm)
[Figure: iris flower with the petal and sepal labeled]
Comparing One Variable with Two Species
T‐test (p < 0.001, means are different)
[Figure: sepal length for versicolor and virginica; axis from 5.0 to 8.0]
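A two-sample t-test compares the difference in group means with the variability inside each group. The sketch below computes only the test statistic, using Welch's form of the test. Both that choice and the sepal lengths are assumptions for illustration: the slide does not say which variant was run, and the values are not the Anderson measurements.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical sepal lengths (cm), for illustration only.
versicolor = [5.5, 5.9, 6.0, 6.1, 5.6, 6.3]
virginica = [6.3, 6.5, 7.1, 6.4, 6.9, 6.7]

t = welch_t(versicolor, virginica)
print(f"t = {t:.2f}")
```

A statistic far from zero indicates that the two means differ by more than within-group scatter would explain. Turning it into a p-value requires the t distribution, which the Python standard library does not provide.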
Comparing One Variable with Three Species
ANOVA (p < 0.001, at least one mean is different)
[Figure: sepal length for setosa, versicolor and virginica; axis from 4.5 to 8.0]
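With three or more groups, one-way ANOVA replaces the t-test by comparing the spread between group means with the spread within groups. The function below is a minimal sketch of the F statistic. The name `anova_f` and the example values are hypothetical and are not taken from the iris data.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements for three groups, for illustration only.
example_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(anova_f(example_groups))
```

A large F means the group means are spread further apart than the scatter inside the groups would suggest, which is the pattern behind "at least one mean is different."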
Same For All Four Variables
[Figure: four panels showing petal width, petal length, sepal width and sepal length for setosa, versicolor and virginica]
Discriminant Analysis
• Classifies samples based on patterns in the variables
  – 2 or more groups
  – Groups must be already defined
  – Multiple variables that show some differences among groups (metrics)
  – Transform to approximate normal distribution
Data Format
n = 150; 100 randomly selected for model building and 50 for model validation
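The 100/50 split can be sketched as a random draw of row indices. The function name `split_indices` and the fixed seed are assumptions for illustration. The slides do not say how the random selection was made.

```python
import random


def split_indices(n_total=150, n_build=100, seed=0):
    """Randomly pick n_build row indices for model building; the rest are for validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    build = sorted(rng.sample(range(n_total), n_build))
    build_set = set(build)
    validate = [i for i in range(n_total) if i not in build_set]
    return build, validate


build_idx, validate_idx = split_indices()
print(len(build_idx), len(validate_idx))
```

Keeping the validation rows out of model building is what lets the validation table show how the model performs on samples it has not seen.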
Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
5.1           3.5          1.4           0.2          setosa
4.9           3            1.4           0.2          setosa
4.7           3.2          1.3           0.2          setosa
4.6           3.1          1.5           0.2          setosa
5             3.6          1.4           0.2          setosa
5.4           3.9          1.7           0.4          setosa
……………
2-Variable Model
• Sepal Length, Sepal Width
• Model Building (n=100): 80% correct

                        Model
Actual        setosa  versicolor  virginica
setosa            33           1          0
versicolor         1          21          9
virginica          0           9         26
• Validation (n=50): 88% correct as reported (the diagonal counts below sum to 39 of 50, or 78%)

                        Model
Actual        setosa  versicolor  virginica
setosa            16           0          0
versicolor         0          14          5
virginica          0           6          9
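Percent correct is the sum of the diagonal of the classification table (actual species in rows, model-assigned species in columns) divided by the total number of samples. The sketch below uses the 2-variable counts as transcribed from the slide. The function name `percent_correct` is hypothetical.

```python
def percent_correct(matrix):
    """Share of samples on the diagonal, i.e. assigned to their actual group."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Rows: actual setosa, versicolor, virginica. Columns: model-assigned, same order.
building = [[33, 1, 0], [1, 21, 9], [0, 9, 26]]
validation = [[16, 0, 0], [0, 14, 5], [0, 6, 9]]

print(percent_correct(building))
print(percent_correct(validation))
```

With these transcribed counts, the model-building table gives 80 of 100 on the diagonal and the validation table gives 39 of 50. Setosa is separated almost perfectly, and nearly all errors are confusions between versicolor and virginica.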
[Figure repeated: four panels showing petal width, petal length, sepal width and sepal length for setosa, versicolor and virginica]
3-Variable Model
• Petal Length, Sepal Length, Sepal Width
• Model Building: 98% correct as reported (the diagonal counts below sum to 96 of 100)

                        Model
Actual        setosa  versicolor  virginica
setosa            34           0          0
versicolor         0          29          2
virginica          0           2         33

• 100% correct with validation data
Variable Coefficients
Species      Constant  Sepal Length  Sepal Width  Petal Length
setosa          -77.7          24.6         17.3         -20.3
versicolor      -73.4          14.8          5.3           9.6
virginica      -101.3          10.5          5.0          20.9
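Coefficients like these are typically used as linear classification functions: each species gets a score equal to its constant plus each coefficient times the matching measurement, and the sample is assigned to the species with the highest score. The sketch below rests on two assumptions not stated on the slide. First, the column order is read as constant, sepal length, sepal width, petal length. Second, the scores are converted to probabilities by exponentiating and normalizing, which is appropriate only if the constants already include the prior terms.

```python
from math import exp

# (constant, sepal length, sepal width, petal length); column order is assumed.
COEFFS = {
    "setosa": (-77.7, 24.6, 17.3, -20.3),
    "versicolor": (-73.4, 14.8, 5.3, 9.6),
    "virginica": (-101.3, 10.5, 5.0, 20.9),
}


def scores(sepal_length, sepal_width, petal_length):
    """Linear classification score for each species."""
    return {
        species: c0 + a * sepal_length + b * sepal_width + c * petal_length
        for species, (c0, a, b, c) in COEFFS.items()
    }


def probabilities(sepal_length, sepal_width, petal_length):
    """Normalize exponentiated scores so they sum to 1 (assumed conversion)."""
    s = scores(sepal_length, sepal_width, petal_length)
    top = max(s.values())  # subtract the maximum for numerical stability
    weights = {species: exp(v - top) for species, v in s.items()}
    total = sum(weights.values())
    return {species: w / total for species, w in weights.items()}


# First row of the data table: sepal length 5.1, sepal width 3.5, petal length 1.4.
sample_scores = scores(5.1, 3.5, 1.4)
sample_probs = probabilities(5.1, 3.5, 1.4)
print(max(sample_scores, key=sample_scores.get))
```

For that first data row, the setosa score is the largest of the three, which matches the species recorded in the table.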