STATISTICAL PLANNING OF EXPERIMENTS
Seppo Tenitz
FOOD-601 Extrusion 2019
1. FUNDAMENTALS AND RECAPITULATION

• Experimental research finds out how different treatments affect the experimental units under study. The treatments differ from one another by different combinations of levels of the independent (also predicting, or explanatory) variables.
• An experimental unit is that something, independent of other experimental units, on which a certain treatment is focused. A certain treatment leads to a certain observation of the dependent (also predicted, or explained) variables.
• A plan of experiments determines how the treatments decided in advance (a priori) are assigned to the experimental units, and in which order the treatments are put into practice.
• The aim is to produce information that is as trustworthy and convincing as possible about the effects of the treatments on the predicted variables.

The alternative to experimental research is observational research:
• no controlled experiments; instead, direct observations are collected about the experimental units
• the state of the experimental units is not actively affected, but it is followed how the units react to the variables predicting their state
• the experimental units are chosen randomly from a larger population, or universe (perusjoukko), of those units
• a trustworthy final result usually requires more observations than what is needed in experimental studies
• a natural approach e.g.
in economic, social, environmental or behavioural research

• In all experimentation there is usually, in the background, some implicit or accompanying preconception about the pattern by which the values of the predicted variables are determined by the predicting variables and their levels.
• If that preconception can be shaped into the form of an informative set of experiments, the result of the test series is often a mathematical equation, or a set of such equations, that precisely describes that pattern and by which the results of future experiments can be predicted by calculation.
• Such a mathematical description of a real system is called an experimental mathematical model of the real system if it can be seen to have some general validity, and not to be just a mere simplification of a bunch of results.

• One frequently applied experimental model type is a polynomial of one or several (predicting) variables, obtained by statistical means.
• The question in applying such a model structure to systems based on physics and chemistry is whether the polynomials describe, in a scientifically meaningful and correct way, the relationship between the predicting variables and the response variables.
• The answer is very often no, but the polynomial is an adequate choice nevertheless, because any continuous, differentiable and scientifically valid model function y = f(x) can be approximated using a Taylor series, if the scrutiny is restricted to a narrow region of x. The new function has the form

  y = b0 + b1x + b2x² + ... + bp−1x^(p−1) + ε,

i.e. a polynomial.
• The relation extends also to functions of several variables.

For a researcher who performs experiments to obtain such a metamodel (a "model of a model") as a polynomial, there are three main types of variables to consider: controlled, independent and dependent:

• Controlled variables: their values are kept constant (e.g. sample temperature in a measurement).
• Predicting variables: their values can be chosen freely and are in that way independent, but controlled; they are the inputs of the system.
• Response variables: their values are dependent on the values of the predicting variables; they are the outputs of the system.
• Other variables: these must be avoided or their influence must be minimized!

Different types of errors are inevitably connected with experimental studies:
• Usual sources of coarse or systematic errors are, for instance,
  • an incorrect or unsuitable experimental design
  • an error in the experimental work made by a researcher
  • an error in the observation of the measured quantity
  • an error in writing down the measured quantity
  • an error in calculation, or
  • limitations of the measuring equipment used: accuracy (ulkoinen tarkkuus), precision (sisäinen tarkkuus), measuring range (mittausalue).
• The random fluctuation always inherent in the measured value of a variable becomes apparent in studies as a statistical experimental error, the magnitude of which can be stated as the variance of a set of repeated measurements or trials.

The bricks of statistical planning of experiments, by which the influences of errors are tackled, are
• replication of an experiment in similar conditions
  • the experimental error can be evaluated
  • an exact evaluation of the experimental effect is enabled
• randomization of the experimental units or of the order of experiments
  • the effects of unanticipated sources of variation on the observations are made smaller
• blocking of the experimental units
  • the effects of unavoidable but predictable sources of variation on the observations are made smaller
  • subgrouping of experiments e.g. by differences in batches of ingredients, type of soil, human beings, points in time, sex, ...

The nature of the variables connected with the research problem largely determines the statistical analysis method of the results,
e.g.:

                                Predicted / dependent variable
                                Categorical                      Continuous
  Predicting / independent
    Categorical                 Logistic analysis of variance    Analysis of variance
    Continuous                  Logistic regression analysis     Regression analysis

• From the point of view of regression analysis, the design of experiments (DoE) tries to minimize either the uncertainty of the coefficients bi of a polynomial model or the uncertainty of the predictions obtained for y by that model (both of which depend on the error variance of the measured values of y and on the experimental design).
• Other objectives:
  • it is ensured that enough informative events arise as a result of the experiments
  • it is ensured that the effects of the predicting variables can be studied apart from one another (to avoid aliasing)
  • it is ensured that the interactions between the predicting variables can be found out reliably
  • the number of experiments is minimized, if needed.

A good experimental design for e.g. preliminary experiments: the 2² full factorial design + center points →

  ŷ = b0 + b1x1 + b2x2 + b12x1x2

[Figure: the four corner points of the design in the coded variables x1 (temperature, 100 °C ... 120 °C) and x2 (time, 20 min ... 40 min), plus 3 replicated center points at (0, 0) from which the error variance is estimated.]

Terms and concepts:

  ŷ = b0 + b1x1 + b2x2
  ŷ = b0 + b1x1 + b2x2 + b12x1x2 + b11x1² + b22x2²

where b0 is the constant (intercept), b1 and b2 are the regression coefficients (slopes), and x1 and x2 are the predicting variables. The constant and the regression coefficients together are often called the regression parameters. Predictors are made up of predicting variables or, alternatively, are such ones already:

• MAIN EFFECTS, i.e. LINEAR PREDICTORS: x1, x2
• NONLINEAR EFFECTS: x1², x2², x1x2
• INTERACTIONS: x1x2; the effect of x1 on the value of y is dependent on the value of x2, and vice versa.

Graphically, e.g.
in the case of two predicting variables:

[Figure: Y plotted against X1 for two fixed values of X2 (X2 = u1 and X2 = u2); the value of X2 is fixed and the values of Y at different values of X1 are measured.]

Applications of the results of regression analysis:
• Identification of statistically significant predictors.
• Comparison of alternatives: is there e.g. any difference between the influences of two optional ingredients? If not, the cheaper one is chosen.
• Obtaining an optimal result: which levels of which predicting variables should be chosen? Fitting several simultaneous objectives together. Taking the bounds associated with the predicting variables into consideration.
• Reducing variability: is it e.g. possible to reduce the variation in some quality attribute by changing the recipe?
• Improving robustness: with which recipe would e.g. replacing the baking oven in use with a new one affect some quality attribute as little as possible?

Hands-on questions:
• what kind of model and what kind of experimental design?
  • an experimental design of a certain type → a regression model of a certain structure!
• which of all the possible predicting variables are included in the model?
  • screening experiments, the Pareto principle, scientific literature, causal analysis + practical experience, ...
• are qualitative variables taken along, and what would their values be?
• should transformed variables be used instead of the actual ones (log xi, y, ...)?
  • yes, if the presuppositions associated with the analysis methods to be used are not otherwise fulfilled.

• Two important properties characterizing an experimental design are its orthogonality and rotatability (kiertosymmetrisyys tai kierrettävyys):
  • a design matrix X is orthogonal if its condition number (kuntoisuusluku) = 1
  • every rotatable experimental design is also orthogonal, but not necessarily vice versa.
• E.g.
a 2³ factorial design, in Matlab:
  • X = twon(3); cond(X) = 1 → orthogonal
  • X = [X; [2.5 0.13 6]]; cond(X) = 2.5067 → this one is not orthogonal; instead, there is multicollinearity in the data.

Similar partial multicollinearity appears in a D-optimal design whose condition number = 1.583, i.e. quite small (each row is one experiment, each column one variable; the values are coded, i.e. normalized):

  -1     -1      1
   0.5    1     -1
  -1      1     -1
   0.5   -1      1
   0.5    1      1
  -1      1      1
   0.25   0      0

Yet the Pearson correlation coefficient between the last two variables is remarkably high (-0.474) → the rule of thumb is 0.30, so multicollinear designs like this should not be applied without a careful preconsideration.

Consequences of multicollinearity:
• The basis: perfect multicollinearity
  • let x2 = x1 in the regression model ŷ = b0 + b1x1 + b2x2
  • by substituting, ŷ = b0 + (b1 + b2)x1 = b0 + b̂x1, so the coefficients b1 and b2 stick together into b̂ = b1 + b2 and the effects of the predictors x1 and x2 on the response variable ŷ cannot be separated from one another
  → the OLS calculation of the model also breaks down ...
• The practice: partial multicollinearity
  • the variance inflation factor VIFj of the j:th coefficient bj of the model = 1/(1 − Rj²), in which Rj² is the coefficient of determination in a linear regression of a predictor on all the other predictors of the model (here e.g.
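The orthogonality check made above with Matlab's cond can be sketched in plain Python as well. The sketch below is a hypothetical stand-in for the twon/cond calls (it is not from the course material): it relies on the fact that, for a design whose columns have equal norms, the condition number equals 1 exactly when the Gram matrix XᵀX is a multiple of the identity, i.e. when the columns are mutually orthogonal. In that case every Rj² = 0, so every VIFj = 1/(1 − 0) = 1.

```python
from itertools import product

def gram(X):
    """Gram matrix X'X: zero off-diagonal entries mean the columns
    of the design are mutually orthogonal."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]

# Coded (-1/+1) 2^3 full factorial design: 8 runs, 3 factors.
X = [list(run) for run in product([-1, 1], repeat=3)]
print(gram(X))   # diagonal 8s, zeros elsewhere -> orthogonal, cond(X) = 1

# Appending an arbitrary extra run (as on the slide) destroys
# orthogonality: off-diagonal entries of X'X become non-zero,
# so the condition number rises above 1.
X2 = X + [[2.5, 0.13, 6]]
print(gram(X2))
```

Note that this only tests orthogonality; computing the condition number itself, as Matlab's cond does, requires the singular values of X (available e.g. via numpy.linalg.cond).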