Input Variable Selection and Calibration Data Selection for Storm Water Quality Regression Models
Siao Sun and Jean-Luc Bertrand-Krajewski
1 Outline
• Problem statement • Case and data • Methods • Results and discussion • Conclusions
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 2 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Problem statement
• Storm water quality models are a useful tool in storm water management. • Interests grow in analyzing existing data to develop models. • Regression for storm water quality modeling is a common method.
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 3 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Problem statement
Context: Regression model for modeling storm water quality.
With numerous measured events, questions are:
1) Inputs selection 2) Calibration data selection
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 4 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Case and data
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 5 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Case and data
Saône River
Genay
St Germain Neuville Montanay
Curis Cailloux Albigny Fleurieu
Poleymieux
Couzon Fontaines St Martin Rochetaillée Sathonay V. Rhône Limonest Fontaines/ St Romain Saône
Sathonay C. Collonges River St Didier St Cyr Rillieux la Pape La tour de Dardilly Salvagny
Champagne Caluire
Vaulx en Velin Jonage
Marcy L'étoile
Charbonnières 4 9 Ecully Meyzieu 6 1 Décines St Genis Ecully catchment les Ollières Tassin la demi lune LYON Villeurbanne 5 3 2
Craponne
Francheville St Foy les Residential Lyon 7 8 Bron La Chassieu Mulatière
Grézieu la varenne Oullins
St Fons Vénissieux Combined sewer Pierre Bénite St Priest
St Genis Laval Irigny 245 ha Feyzin Corbas
Mions
Vernaison Charly Solaize 60 active ha Rhône 42 % impervious River Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 6 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Case and data
• 239 storm events between 2004-2008 • Event TSS load (kg) as storm water quality index = N + + • Regression model y ∑i=1bi xi b0 e
Output TSS Inputs
56 potential explanatory variables
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 7 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Methodology - Input selection
Shall we use all 56 variables as model inputs?
• Calibration (simulation ability) and verification (prediction ability) were performed. • When splitting data, uncertainty due to calibration data selection exists. • Cross validation was used.
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 8 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Input selection
0.85
0.8
0.75 NS values 0.7
Calibration Verification 0.65
0 5 10 15 20 25 30 35 40 Number of model inputs 7 variables are selected
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 9 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Input selection
7 variables are selected
• Antecedent dry period from the last rainfall event exceeding 5 mm • Antecedent dry period from the last rainfall event exceeding 30 mm • Maximum rain intensity in 10 minutes during 12 hours before the event • Maximum rain intensity in 10 minutes during 72 hours before the event • Maximum rain intensity in 30 minutes during 4 hours before the event • Maximum flow rate • Total flow volume
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 10 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Methodology - Calibration data selection
• Random selection
The number of calibration events: 8-239
For each number, calibration events are randomly selected for many times to study the uncertainty due to calibration data selection
A calibrated model is evaluated by calibration data sets, verification data sets and all data sets
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 11 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Calibration data selection
• Random selection (a) Calibration 1
0.8
0.6
NS values 0.4
0.2 20 40 60 80 100 120 140 160 180 200 220 240 (b) Verification 1
0.5 NS values
0 20 40 60 80 100 120 140 160 180 200 220 240 (c) All data 0.8
0.6
0.4 NS values
0.2 20 40 60 80 100 120 140 160 180 200 220 240 Number of Events used for calibration
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 12 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Calibration data selection
• Random selection
(c) All data 0.8
0.6
0.4 NS values
0.2 20 40 60 80 100 120 140 160 180 200 220 240 Number of Events used for calibration
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 13 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Methodology - Calibration data selection
• Select representative data for calibration using cluster method
Divide all events into n clusters if n events is wanted for calibration
A cluster contains data sets of similarity according to standardized Euclidean distance between data sets
One data set is selected from a cluster to represent the cluster
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 14 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Calibration data selection • Cluster selection (a) Calibration 1 0.8 0.6
NS values 0.4 Cluster method
0.2 20 40 60 80 100 120 140 160 180 200 220 240
(b) Verification 1
0.5
NS values
0 20 40 60 80 100 120 140 160 180 200 220 240
(c) All data 0.8
0.6
0.4 NS values
0.2 20 40 60 80 100 120 140 160 180 200 220 240 Number of Events used for calibration
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 15 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Calibration data selection • Cluster selection (a) Calibration 1 0.8 0.6
NS values 0.4 Cluster method
0.2 20 40 60 80 100 120 140 160 180 200 220 240
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 16 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Calibration data selection • Cluster selection
(b) Verification 1
0.5 NS values
0 20 40 60 80 100 120 140 160 180 200 220 240
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 17 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Results - Calibration data selection • Cluster selection
(c) All data 0.8
0.6
0.4 NS values
0.2 20 40 60 80 100 120 140 160 180 200 220 240 Number of Events used for calibration
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 18 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Conclusions
1. Overfitting occurs when too many inputs are considered in a model 2. Data used for calibration can affect model behaviors 3. A cluster method can effectively aid choosing representative calibration data
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 19 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012 Thank you for your attention!
Input Variable Selection and Calibration Data Selection for Storm Water Quality Regression Models
Siao Sun and Jean-Luc Bertrand-Krajewski
Siao Sun, Jean-Luc Bertrand-Krajewski 9th International Conference on Urban 20 INSA Lyon, LGCIE, France Drainage Modelling Belgrade 2012