STATISTICAL PLANNING OF EXPERIMENTS
Seppo Tenitz
FOOD-601 Extrusion 2019


1. FUNDAMENTALS AND RECAPITULATION

• Experimental research finds out how different treatments affect the experimental units under study. The treatments differ from one another by different combinations of the levels of the independent (also called predicting, or explanatory) variables.
• An experimental unit is the entity, independent of other experimental units, that a certain treatment is applied to. A certain treatment leads to a certain observation of the dependent (also called predicted, or explained) variables.
• A plan of experiments determines how the treatments decided in advance (a priori) are applied to the experimental units, and in which order the treatments will be put into practice.
• The aim is to produce information that is as trustworthy and convincing as possible about the effects of the treatments on the predicted variables.


The alternative to experimental research is observational research:
• no controlled experiments; instead, direct observations are collected about the experimental units
• the state of the experimental units is not affected actively, but it is followed how the units react to the variables predicting their state
• the experimental units are chosen randomly from a larger population, or the universe (perusjoukko), of those units
• a trustworthy final result usually requires more observations than what is needed in experimental studies
• a natural approach e.g. in economical, social, environmental or behavioural research


• In all experimentation there is usually in the background some implicit or explicit preconception about the pattern by which the values of the predicted variables are determined by the predicting variables and their levels
• If that preconception can be shaped into the form of an informative set of experiments, the result of the test series is often a mathematical equation, or a set of such equations, that precisely describes that pattern and by which it is possible to predict the results of future experiments by calculation
• Such a mathematical description of a real system is called an experimental mathematical model (system) of the real system, if it can be seen to have some generality and not just to be a mere simplification of a bunch of results


• One frequently applied experimental model type is a polynomial of one or several (predicting) variables, obtained by statistical means
• The question in the application of such a model structure to systems based on physics and chemistry is whether the polynomials describe in a scientifically meaningful and correct way the relationship between the predicting variables and the response variables
• The answer is very often no, but the polynomial is an adequate choice because any continuous, differentiable and scientifically valid model function y = f(x) can be approximated using a Taylor series, if the scrutiny is restricted to a narrow region of x. The new function is of the form y = bo + b1x + b2x² + ... + bp-1x^(p-1) + ε, i.e. a polynomial.
• The relation extends also to functions of several variables
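As a minimal Matlab sketch of this idea (an invented illustration, not part of the course material): a low-order polynomial fitted over a narrow region approximates a "true" nonlinear function well.

  x  = linspace(0.8, 1.2, 21)';   % a narrow region of x
  y  = exp(x);                    % a stand-in for a scientifically valid f(x)
  b  = polyfit(x, y, 3);          % coefficients of a cubic polynomial
  yh = polyval(b, x);             % polynomial approximation of y
  max(abs(y - yh))                % the approximation error is tiny on this range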


For a researcher who performs experiments to obtain such a metamodel (a "model of a model") as a polynomial, there are three main types of variables to consider:

• Controlled variables: their values are kept constant (e.g. sample temperature in a measurement)
• Predicting variables: their values can be chosen freely and are in that way independent, but controlled
• Response variables: their values are dependent on the values of the predicting variables in the system
• Other variables: these must be avoided or their influence must be minimized!


Different types of errors are inevitably connected with experimental studies:

• Usual sources of coarse or systematic errors are, for instance:
  • an incorrect or unsuitable experimental design
  • an error in experimental working made by a researcher
  • an error in the observation of the measured quantity
  • an error in writing down the measured quantity
  • an error in calculation, or
  • limitations of the measuring equipment used: accuracy (ulkoinen tarkkuus), precision (sisäinen tarkkuus), measuring range (mittausalue)
• The always inherent random fluctuation in the measured value of a variable becomes apparent in studies as a statistical experimental error, the magnitude of which can be stated as the variance of a set of repeated measurements or trials.


Bricks of the statistical planning of experiments, by which the influences of errors are tackled:

• replication of an experiment in similar conditions
  • the experimental error can be evaluated
  • an exact evaluation of the experimental effect is enabled
• randomization of the experimental units or of the order of experiments
  • the effects of unanticipated sources of variation on the observations are made smaller
• blocking of the experimental units
  • the effects of unavoidable but predictable sources of variation on the observations are made smaller
  • subgrouping of experiments e.g. by differences in batches of ingredients, type of soil, human beings, points in time, sex, ...


The nature of the variables connected with the research problem largely determines the statistical analysis method of the results, e.g.:

                                 Predicted / dependent variable
                                 Categorical                       Continuous
Predicting /    Categorical      Logistic analysis of variance     Analysis of variance
independent     Continuous       Logistic regression analysis      Regression analysis
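A minimal Matlab sketch of the table (assuming the Statistics and Machine Learning Toolbox and its carsmall example data): a continuous response calls for regression analysis, a categorical (here binary, invented) response for logistic regression.

  load carsmall                             % example data shipped with Matlab
  tbl = table(Weight, Horsepower, MPG);
  fitlm(tbl, 'MPG ~ Weight + Horsepower')   % continuous response -> regression
  tbl.Heavy = Weight > 3000;                % an invented categorical response
  fitglm(tbl, 'Heavy ~ Horsepower', 'Distribution', 'binomial')  % logistic regression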


• From the statistical point of view, in the design of experiments (DoE) the aim is to minimize either the uncertainty of the coefficients bi of a polynomial model or that of the predictions obtained for y by that model (both of which depend on the error variance of the measured values of y and on the experimental design).
• Other objectives:
  • it is ensured that enough informative events arise as a result of the experiments
  • it is ensured that the effects of the predicting variables can be studied apart from one another (to avoid aliasing)
  • it is ensured that the interactions between the predicting variables can be found out reliably
  • the number of experiments is minimized, if needed


A good experimental design e.g. for preliminary experiments: the 2² full factorial design + center points → ŷ = bo + b1x1 + b2x2 + b12x1x2

[Figure: the coded design region; x1 = temperature (100 °C ... 120 °C at coded levels -1 ... 1), x2 = time (20 min ... 40 min at coded levels -1 ... 1); the four corner points plus 3 replicates at the center point (0, 0), from which the error variance is estimated. A fitting sketch follows below.]
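A minimal Matlab sketch of this design (the response values below are invented for illustration): the 2² factorial runs plus three center-point replicates, and the interaction model fitted by ordinary least squares.

  X  = [-1 -1; 1 -1; -1 1; 1 1; 0 0; 0 0; 0 0];  % coded design, 3 center runs
  y  = [62; 68; 70; 79; 71; 70; 72];             % invented responses
  Xm = x2fx(X, 'interaction');                   % model matrix [1, x1, x2, x1x2]
  b  = Xm \ y                                    % estimates bo, b1, b2, b12
  s2 = var(y(5:7))                               % error variance from the center runs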


Terms and concepts:

ŷ = bo + b1x1 + b2x2
ŷ = bo + b1x1 + b2x2 + b12x1x2 + b11x1² + b22x2²

• bo is the constant (intercept); b1, b2, ... are the regression coefficients (slopes); x1, x2 are the predicting variables
• MAIN EFFECTS, i.e. LINEAR effects: x1, x2
• NONLINEAR EFFECTS: x1², x2²
• INTERACTIONS: x1x2; the effect of x1 on the value of y is dependent on the value of x2, and vice versa
• PREDICTORS: predictors are made up of predicting variables or, alternatively, are such ones already
• the constant and the regression coefficients together are often called the regression parameters


Graphically, e.g. in the case of two predicting variables:

[Figure: curves of Y versus X1, one curve for each fixed value of X2 (X2 = u1, X2 = u2); the value of X2 is fixed and the values of Y at different values of X1 are measured]


Applications of the results of regression analysis:

• Identification of statistically significant predictors
• Comparison of alternatives: is there e.g. any difference between the influences of two optional ingredients? If not, the cheaper one is chosen.
• Obtaining an optimal result: which levels of which predicting variables should be chosen? Fitting several simultaneous objectives together. Taking the bounds associated with the predicting variables into consideration.
• Reducing variability: is it e.g. possible to reduce the variation in some quality attribute by changing the recipe?
• Improving robustness: with which recipe would e.g. replacing the baking oven in use with a new one affect some quality attribute as little as possible?


Hands-on questions:
• what kind of model and what kind of experimental design?
  • an experimental design of a certain type → a regression model of a certain structure!
• which of all the possible predicting variables are included in the model?
  • screening experiments, the Pareto principle, scientific literature, causal analysis + practical experience, ...
• are qualitative variables taken along, and which would their values be?
• should transformed variables be used instead of the actual ones (log xi, √y, …)?
  • yes, if the presuppositions associated with the analysis methods to be used are not otherwise fulfilled


• Two important properties characterizing an experimental design are its orthogonality and rotatability (kiertosymmetrisyys tai kierrettävyys)
• a design matrix X is orthogonal if its condition number (kuntoisuusluku) = 1
• Every rotatable experimental design is also orthogonal, but not necessarily vice versa
• E.g. a 2³ factorial design, Matlab:
  • X = twon(3); cond(X) = 1 → orthogonal
  • X = [X; [2.5 0.13 6]]; cond(X) = 2.5067 → this one is not; instead there is multicollinearity in the data
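The same check can be repeated with standard toolbox functions (twon appears to be a course-specific function; 2*ff2n(3)-1 produces the same coded 2³ design):

  X = 2*ff2n(3) - 1;       % eight runs at the coded levels -1/+1
  cond(X)                  % = 1 -> the design is orthogonal
  X2 = [X; 2.5 0.13 6];    % append an arbitrary extra run
  cond(X2)                 % > 1 -> no longer orthogonal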


Similar partial multicollinearity in a D-optimal design whose condition number = 1.583, i.e. quite small:

  -1    -1     1
   0.5   1    -1
  -1     1    -1
   0.5  -1     1
   0.5   1     1
  -1     1     1
   0.25  0     0

(each row is one experiment, each column one variable; coded i.e. normalized values)

Yet the Pearson's correlation coefficient between the last two variables is remarkably high (-0.474) → the rule of thumb is |r| ≤ 0.30, so multicollinear designs like this should not be applied without a careful preconsideration.


Consequences of multicollinearity:

• The basis: perfect multicollinearity
  • let x2 = x1 in the regression model ŷ = bo + b1x1 + b2x2
  • by substituting, ŷ = bo + (b1 + b2)x1 = bo + b̂x1
  → the coefficients b1, b2 and b̂ stick together, so the effects of the predictors x1 and x2 on the response variable ŷ cannot be separated from one another
  → also the OLS calculation of the model breaks down ...
• The practice: partial multicollinearity
  • the variance inflation factor VIFj of the j:th coefficient bj of the model = 1/(1 − Rj²), in which Rj² is the coefficient of determination in the regression of a predictor on all the other predictors of the model (here e.g. R1² in x̂1 = bo' + b2'x2)
  • VIF > 1 if the predictor is collinear; considerable with small samples if it is > 2.5, and problematic at least if it is > (5-10)


• The square roots of the coefficients' VIFs, calculated e.g. in Matlab by diag(corrcoef(X)^-1)', tell how much larger the standard errors of the coefficients are, compared with the case in which each predictor were uncorrelated with the other predictors in the equation
• Thus, because the t statistic for a coefficient is the ratio of the coefficient's value and its standard error, the coefficient would have to be, e.g. if the VIF = 4.0, two times larger to be statistically significant compared to an orthogonal case
→ multicollinearity tends to increase the standard errors of the regression coefficients, as well as the probabilities of their t or F statistics (reducing the power of the analysis)
→ and, if substantial, makes the calculated model unstable (small changes in the actualized predictor values tend to cause large changes in the coefficient values, or even lead to changes in their signs)
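A minimal sketch of the calculation with an invented, deliberately collinear design: the diagonal of the inverse correlation matrix gives the VIFs, and their square roots the inflation of the standard errors.

  X   = [-1 -1; 1 -1; -1 1; 1 1; 0.8 1];  % the last run makes x1 and x2 correlate
  R   = corrcoef(X);                      % correlation matrix of the predictors
  vif = diag(inv(R))'                     % variance inflation factors (> 1 here)
  sqrt(vif)                               % inflation of the standard errors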


A computer printout for a second-order polynomial:

Parameter      Estimate     Standard error   t statistic   Probability
intercept       9.1128      0.15234          59.82         0.0000 ***
ltila          -0.018291    0.10772          -0.1698       0.8708
kpit            1.3325      0.10772          12.37         0.0000 ***
pnop           -0.26369     0.10772          -2.448        0.0499 *
ltila*ltila     0.15455     0.15234           1.014        0.3495
ltila*kpit      0.16127     0.15234           1.059        0.3305
ltila*pnop      0.18763     0.15234           1.232        0.2642
kpit*kpit       0.24503     0.15234           1.608        0.1589
kpit*pnop       0.15868     0.15234           1.042        0.3377
pnop*pnop       0.11506     0.15234           0.7553       0.4787

(the rows below the intercept are the predictors; a predictor is statistically significant if prob < 0.05)


So, in an orthogonal experimental design X = [X1, X2, ..., Xk] the pairwise correlation coefficients of the predicting variables x1, x2, ..., xk are zeros. Their effects on the response variable y can therefore be studied fully apart from one another.

Above all, orthogonality minimizes the variances of the regression parameters in the vector diag[σ²(XTX)⁻¹], where σ² is the error variance of y and X is the model matrix, given by [ones(N,1), Xm] in Matlab. N is the number of experiments, correspondingly, and Xm may contain, in addition to the columns X1, X2, ..., Xk, also columns of the second-order and/or interaction terms of the predicting variables (a small sketch follows below).

In a rotatable experimental design the variance of the predicted variable depends, at any combination of the coded predicting variables, only on the distance of the inspection point from the coded centre of the design region (0, 0, ..., 0).
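A minimal Matlab sketch of this variance calculation for the 2² interaction model (σ² is given an arbitrary assumed value):

  X     = 2*ff2n(2) - 1;                    % coded 2^2 design
  Xm    = [ones(4,1), X, X(:,1).*X(:,2)];   % model matrix: 1, x1, x2, x1x2
  s2    = 0.5;                              % assumed error variance of y
  var_b = diag(s2 * inv(Xm'*Xm))'           % variances of bo, b1, b2, b12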


Prediction variances of a rotatable second-order design: [figure]


Optimality and efficiency of an experimental design:
• The basis of optimality is some adequate calculated criterion, the value of which is minimized or maximized → different designs by the criterion chosen
• Many types of criteria exist: A, C, D, E, G, I, T, ...
• E.g. in a D-optimal experimental design the determinant of the information matrix XTX is maximized; in an A-optimal design the trace of the inverse of XTX, tr[(XTX)⁻¹], is minimized, etc.
• D-optimality minimizes the generalized variance det[var(β)] and A-optimality the average variance of the regression parameters β at a certain number of experiments
• G- or I- (also IV- or V-) optimality minimizes the prediction variance at a certain number of experiments
• Optimality is dependent on the model matrix!


• The efficiency (tehokkuus) of an experimental design manifests the optimality of the design as a numerical value in terms of some optimality criterion
• E.g. the D-efficiency DE = 100·[det(XTX)]^(1/p)/N, in which N is the number of experiments and p is the number of parameters in the model (including the constant); see the sketch below
  • values in the range [0, 100]; an orthogonal and balanced design obtains the value 100
• Conventional application of the designs:
  • identification of the most important predicting variables → interaction and/or main effects models are sufficient → D-optimal
  • optimization or simulation → second-order models or mixture models → G- or I-optimal
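A minimal Matlab sketch of the D-efficiency formula, applied to the orthogonal and balanced 2² design with an interaction model (so DE should equal 100):

  X      = 2*ff2n(2) - 1;
  Xm     = x2fx(X, 'interaction');       % model matrix, N = 4 runs, p = 4 parameters
  [N, p] = size(Xm);
  DE     = 100 * det(Xm'*Xm)^(1/p) / N   % = 100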


The ratio of the D-efficiencies of two designs, e.g. a D-optimal design (1) versus a CCD (2):

[Figure: the D-optimal design 1 is De1/De2 = 2.1 times more efficient than the design 2 → about a double amount of experiments is needed in case 2 for an equal variance of the regression parameters]


How to choose the values of the predicting variables?

• General difficulties
  • too large a region → the model is inadequate to describe the possibly steep nonlinearity resulting
  • max(y) − min(y) ≈ experimental error → the experimental effect is not distinguished from the noise
• Rules of thumb
  • max(y)/min(y) ≥ 10
  • max(y) − min(y) >> exp. error (at least 4-6×)
• If the objective is to find the optimum of the system
  • first designs of a few experiments far from the optimum point, first-order models
  • then, acting systematically, seeking one's way to the vicinity of the optimum point, where a larger design and more experiments → a second-order model


”Climbing a hill”: [contour plot of the response surface (contour levels 60 ... 90) illustrating the sequential search toward the optimum]


2. MOST GENERIC EXPERIMENTAL DESIGNS

• Experiments for regression modeling can be divided into one-factor-at-a-time (OFAT) experiments, screening (seulonta, haravointi) experiments and optimization experiments.
• The OFAT experiments are based on the ancient doctrine: change the value of one factor at a time and keep the values of the other factors constant. Experiments of this type are untrustworthy and in practice a waste of time and resources. They must be strictly avoided!
• In the screening experiments the objective is to find out the most important factors that affect the response variable(s), and the directions of their influences.
• In the optimization experiments the aim is usually a second-order regression model and greater exactness than in the screening experiments.


A customary result of OFAT experimentation, presented using the ”true” contour plot (for the yield of a chemical reaction):

[Figure: the maximum found by OFAT lies far from the real maximum of the yield surface]

Interactions between the predicting variables cannot be detected by OFAT experiments!


DoE methods synoptic table: [table; includes e.g. the randomized complete block design]



• Designs are divided into three classes according to which types of effects on the variable y one is able to find out by a certain experimental design:
  • In a resolution III design, main effects are aliased with two-factor interactions. Unless the two-factor interactions are negligible, estimates of the main effects will be biased (harhainen).
  • In a resolution IV design, two-factor interactions will be aliased in pairs, but it is possible to estimate the main effects clear of any other main effects or two-factor interactions.
  • In a resolution V design, all main effects and two-factor interactions will be clear of other main effects or two-factor interactions. Unless the higher-order interactions are sizeable, a resolution V design will often be almost as good as a full factorial design.


2.1. Orthogonal designs of resolution V

• Identification of a second-order polynomial presupposes experiments at no fewer than three levels
• The 2^k design + center point experiments is not enough, because the quadratic terms are aliased with one another
• A 3^k design is possible but not recommended:
  • the number of experiments increases fast when the number of predicting variables k increases
  • more experiments are needed than what is necessary
  • the design type is not rotatable, which is considered a severe deficiency


Number of experiments (N) in terms of the number of predicting variables (k) in different types of experimental designs: [table]


Aliasing (samaistuminen), e.g. in the 2³ design:

  -1  -1  -1
   1  -1  -1
  -1   1  -1
   1   1  -1
  -1  -1   1
   1  -1   1
  -1   1   1
   1   1   1

(each row is one experiment, each column one variable; coded i.e. normalized values)

Only the main effects and the interaction terms can be included in the model!


>> [Xm, predictors] = intera(X)

Xm =
    -1  -1  -1   1   1   1   1   1   1
     1  -1  -1   1  -1  -1   1   1   1
    -1   1  -1   1  -1   1   1  -1   1
     1   1  -1   1   1  -1   1  -1   1
    -1  -1   1   1   1  -1   1  -1   1
     1  -1   1   1  -1   1   1  -1   1
    -1   1   1   1  -1  -1   1   1   1
     1   1   1   1   1   1   1   1   1

predictors =
     1   2   3   1   1   1   2   2   3
     0   0   0   1   2   3   2   3   3

X is the ”design matrix”; [ones(8,1), Xm] is the ”model matrix”. The columns of x1², x2² and x3² are similar, i.e. aliases, in the Xm matrix. The pairwise Pearson's correlation coefficients between them = 1.
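The aliasing can also be shown with the toolbox function x2fx (intera appears to be a course-specific function): in the quadratic model matrix of a 2³ design the squared columns are constant, so the matrix loses rank and the squared terms cannot be estimated.

  X  = 2*ff2n(3) - 1;          % the 2^3 design at coded levels
  Xm = x2fx(X, 'quadratic');   % constant, linear, interaction and squared columns
  size(Xm)                     % 8 runs x 10 columns
  rank(Xm)                     % = 7 < 10 -> the squared terms are aliased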


Central composite designs (CCDs):

• Consist of a 2^k experimental design to which center point experiments and axial point (star point, α point) experiments have been added
• Axial experiments are experiments in which all variables but one have the center point value zero
• The nonzero values in those points are mostly either ±1, ±2^(k/4) or ±√k (see the small sketch below)
• Three main types: CCC (circumscribed, the elementary case), CCI (inscribed) and CCF (face centered)
• CCC and CCI are spherical and rotatable, CCF is cuboidal and not rotatable
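As a small sketch, the usual axial distances for k = 3 (using the formulas stated above):

  k = 3;
  alpha_face      = 1          % CCF
  alpha_rotatable = 2^(k/4)    % rotatable CCC/CCI, here about 1.68
  alpha_spherical = sqrt(k)    % spherical design region, here about 1.73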


[Figure: design point layouts of the CCC, CCI and CCF types]


Scaled prediction variances (SPVs) for the CCF design (k = 2, α = 1, number of center point replications n = 1): [figure]


As an example the CCC design, k = 2, n = 2:

  X = ccdesign(2, 'center', 2)

[Printout: runs 1-4 are the factorial points (block 1), runs 5-8 the axial points (block 2), and of the two center points one goes to each block]

Also e.g.:

  X = ccdesign(2, 'type', 'inscribed', 'center', 2)
  [X, blocks] = ccdesign(3, 'type', 'faced', 'blocksize', 7)


A CCC or CCI design, three predicting variables: [figure]

> 3 variables → ”hypersphere”



In a uniform precision experimental design the variance of the model prediction ŷ at the coded center (0, 0, ..., 0) of the design is equal to its variance at the coded radius of one. The variance of ŷ is thus close to equal throughout the design region, and the design is then said to be stable with regard to the prediction variance. This property protects the regression coefficients of the model against third-order and higher interactions better than orthogonality alone does. The uniform precision designs are rotatable and hence also orthogonal.

The uniform precision of the response variable is obtained by a proper number of center point experiments, e.g. in Matlab by the command X = ccdesign(3, 'center', 'uniform').


D-, A-, and G-efficiencies and IV criterion values for CCDs with cuboidal and spherical regions, with one and two center points, for k = 3 to 10 predicting variables: [table]


Efficiencies of optimal CCDs for 3 predicting variables: [table]
RCCD = rotatable CCD, OCCD = orthogonal CCD, FCC = face-centered CCD
n0 = number of replications in the center point
rs = number of replications in the α points


Box-Behnken designs (BBDs):

• ”incomplete central composite designs”
• levels are -1, 0 and 1, so no corner points (these correspond to extreme conditions which are often difficult to realise)
• almost rotatable or rotatable
• here for three variables (15 experiments, 3 replicates in the center point):
  X = bbdesign(3, 'center', 3)
• blocking e.g. by: [X, blocks] = bbdesign(4)


2.2. Second-order designs of small run size:

• Doehlert design:
  • for k = 2, 4, 6 and 8 requires fewer experimental runs than the central composite or Box-Behnken designs
  • neither an orthogonal nor a rotatable design
  • a spherical design region
  • in Matlab e.g. X = doehlert(3) for k = 3 predicting variables
  • popular especially in analytical chemistry
• Hybrid design:
  • saturated or near-saturated second-order designs → pure error and lack-of-fit cannot be estimated
  • very small in size, excellent prediction variance properties


• Small composite design (SCD):
  • a fractional factorial design of resolution III* in the cube + the axial and center runs as in the CCD
  • main effects are aliased with two-factor interactions, and two-factor interactions may be aliased with each other
  • variance properties not comparable to those of the CCD
  • available in many softwares and therefore popular, but altogether not advisable
• Hoke design:
  • three levels (-1, 0, 1) of the predicting variables, similarly to the BB design; irregular fractions of the 3^k factorial
  • several versions, called D1-D7
  • saturated (D1-D3) or near-saturated (D4-D7) designs
  • a cuboidal design region


[Figures: the D2 Hoke design, the D310 hybrid design and the SCD for k = 3]


2.3. Optimal designs

• Two types exist: continuous (or approximate) and discrete (or exact), of which the latter are important in practice
• A computer program generates the design for a certain number of predicting variables and experiments and for a certain model structure, e.g. in Matlab for a D-optimal design:

  minmax = [-1,-1,-1; 0.5,1,1];                            % bounds of the design region
  [X, Xq] = rowexch(3, 10, 'quadratic', 'bounds', minmax);

• Usually not orthogonal
• Cost efficient: fewer experiments for an equal uncertainty of the model parameters or the model predictions than when using non-optimal designs


• Used when ordinary designs are not suitable for a certain situation or do not work in it, e.g.
  • the design region is irregular in shape
  • results of previous experiments should be included in the observation material
  • qualitative predicting variables have more than two levels
  • the structure of the regression model is atypical
  • a mixture design includes both mixture variables and process variables
    • in mixture experiments the sum of the mass fractions of the mixture variables in one experiment is always 1
    • therefore, also the structures of the polynomials to be fitted to the observations differ from those in more conventional experiments


Pros of optimal designs:
• suitable for any number of observations and predicting variables and for any degree of the model
• can cope with constraints on the design region
• can cope with quantitative and qualitative predicting variables at the same time
• flexibility: allow the researcher to create tailor-made designs
• can be used with heterogeneous variance, correlated observations, blocked experiments, split-plot experiments and nonlinear models

Cons of optimal designs:
• depend on the assumed model
• no replication
• not always nice and symmetric; multicollinearity is usual
• sometimes strange, exotic levels of the predicting variables
• several criteria can be used for computing optimal designs → which one should be chosen?
• require computational effort


3. CHOICE OF THE EXPERIMENTAL DESIGN

• Depends on the objective and on which properties of the design one would like to emphasize.
• Things that must be taken into consideration:
  • structure of the model to be fitted, number of qualitative predicting variables (preferably not > 1)
  • way of performing the experiments (replication, randomization, blocking and number of blocks)
  • orthogonality/rotatability of the design, stability of the model predictions, shape of the design region
  • situational limitations of the design
    • number of experiments (in terms of time, funding, ...)
    • infeasible experiments in the design range
    • unfit levels or number of the predicting variables


  • efficiency of the design
  • possibility to proceed sequentially in the experimentation
    • e.g. the CCD: first the 2^k design and the center point experiments, from which, by the t-test, information about the necessity of the second-order terms in the model; after that the rest of the experiments, if needed
  • occasionally the power (voimakkuus, herkkyys) of the tests aimed at: is the experimental effect detected when the causation exists (the probability of which = 1 − β)

[Figure: power curve; the probability of a missed detection = β]

  • e.g. with optimal designs: how many experiments are needed to obtain a certain statistical reliability


Relative efficiencies of some G-, D- and I-optimal second-order designs at different numbers of experiments (n) when there are from 2 to 5 predicting variables: [figure]


D- and G-efficiencies of the most usual second-order designs: [figure; includes the ”hybrid design 310”]


According to Montgomery (2013), a desirable design: [the list of properties is shown as a figure]


4. EXPERIMENTATION

• Randomize the experiments in the manner stipulated by the type of your experimental design. Reset the value of each predicting variable before each experiment; use a split-plot design if you cannot do so.
• Replicates must be genuine repetitions of the experiments: repeated measurements from the same sample are not replicates. Yet, take several measurements from each sample to calculate representative values.
• As a novice, use established experimental designs and ways of blocking. Pool (i.e. combine) the replicates of the blocks only if their variances are equal.
• If you deviate from the experimental design, first check whether remarkable aliasing appears between the predictors.
• Always inspect your results critically in the light of the theory, if such is possible.


Randomization:

• Two fundamental ways exist:
  • complete randomization
    • the replicates are inserted evenly among the other experiments, the order of which is randomized; the experimentation is commenced and finished with a replicate
  • randomization within a block structure
    • the treatment environment inside each block is homogeneous, but it varies between the blocks
    • the block effect is eliminated from the experimental error to make that error as small as possible
    • the order of experiments is randomized only within the blocks


Orthogonal blocking:

• An experimental design blocks orthogonally if it is divided into blocks so that the block effects do not affect the parameter estimates of the regression model.
• Two conditions must be satisfied to block a second-order design orthogonally:
  • each block must be a first-order orthogonal design
  • the fraction of the total sum of squares for the levels of each variable contributed by every block must be equal to the fraction of the total observations that occur in the block
• As an example, let's investigate the CCC design (k = 2, n = 2) shown in the last lecture


• Now, block 1 = X([1:4,9],:) and block 2 = X([5:8,10],:)
  • for both blocks cond([x1, x2]) = 1, so they are orthogonal
  • for block 1 the sum of squares for each variable x1 and x2 = 4, the total sum of squares per variable = 8, the number of observations in the block = 5 and the total number of observations = 10
    • so, 4/8 = 5/10 and condition 2 is fulfilled
  • similarly, in block 2 the sum of squares for each variable x1 and x2 = 4, the total sum of squares per variable = 8, the number of observations in the block = 5 and the total number of observations = 10
    • so, 4/8 = 5/10 and condition 2 is fulfilled
• The design is thus orthogonally blocked (see the sketch below)
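A minimal Matlab sketch verifying both conditions numerically (assuming the row order of ccdesign used above: runs 1-4 factorial, 5-8 axial, 9-10 center):

  X  = ccdesign(2, 'center', 2);
  b1 = X([1:4, 9], :);  b2 = X([5:8, 10], :);
  cond(b1), cond(b2)          % both = 1 -> first-order orthogonal blocks
  sum(b1.^2) ./ sum(X.^2)     % = 0.5 for both x1 and x2 ...
  size(b1,1) / size(X,1)      % ... = the fraction of observations, 5/10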


Split-plot experiments:

• Changing the values of some predicting variables may be difficult (for natural reasons) or uneconomic (relevant in industry) in the experimentation
• NB: mere inconvenience of the correct conduct is not a decent reason to cut corners in the experiments!
• For instance, in extrusion it is laborious to change the temperature of the barrel in a randomized order. But restricting the randomization by performing the experiments in the order of ascending temperatures creates correlations between the results obtained at the same temperatures and corrupts the results of the statistical tests.
• The statistically correct solution is a split-plot design, a different regression model fitted to the data and usually another way (REML) to compute the model


E.g. a split-plot Box-Behnken design (1 + 2 variables):

  z1  x1  x2
  -1  -1   0   Block 1
  -1   1   0
  -1   0  -1
  -1   0   1
   1  -1   0   Block 2
   1   1   0
   1   0  -1
   1   0   1
   0  -1  -1   Block 3
   0   1  -1
   0  -1   1
   0   1   1
   0   0   0   Block 4
   0   0   0
   0   0   0
   0   0   0

• the order of the blocks is randomized first
• after that, the orders of the experiments within the blocks are randomized
• the value of the z1 variable is constant in each block and will not be reset inside the blocks (a hard-to-change, HTC, variable)
• the values of the variables x1 and x2 must be reset before each experiment (easy-to-change, ETC, variables)


• The regression models obtained when using split-plot designs contain both fixed and random variables.
• The values of the fixed variables are assumed to be measured without error. Their values in one study can be assumed to be equal to those in another study performed similarly.
• The values of the random variables are drawn from a larger population of values, which they represent as a random sample. They thus have a measurement error.
• The ability to recognize the two types of variables for the model computation is critical when applying the split-plot designs. Making the difference is not always easy or even unambiguous, and the results of the computation are greatly dependent on the choices made.


So, split-plot regression models are mixed effects models. Such models contain two error terms, δ and ε, instead of the one error term ε in a common fixed effects model:

  y = Xβ + ε          fixed effects model
  y = Xβ + δ + ε      mixed effects model

The error term δ in the second model is an estimate of the whole plot (WP) error and the error term ε is an estimate of the subplot (SP) error in the data. Their values must be estimated separately for the validity of the regression analysis (a sketch with fitlme follows below).

The split-plot experimental design consists of the WP and SP parts, correspondingly. The HTC variables belong to the WP part and the ETC variables belong to the SP part of the design.
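A minimal Matlab sketch of computing such a model by REML with fitlme (the data below are invented; wp identifies the whole plots, z1 is the HTC variable, x1 and x2 are ETC variables; the random intercept (1|wp) estimates the WP error δ and the residual the SP error ε):

  rng(1)
  wp = categorical(repelem((1:4)', 4));   % 4 whole plots, 4 runs in each
  z1 = repelem([-1; 1; 0; 0], 4);         % HTC variable, constant within a plot
  x1 = repmat([-1; 1; 0; 0], 4, 1);       % ETC variables, reset for every run
  x2 = repmat([0; 0; -1; 1], 4, 1);
  y  = 10 + 2*z1 + 1.5*x1 - x2 ...
       + repelem(randn(4,1), 4) + 0.3*randn(16,1);   % WP noise + SP noise
  tbl = table(wp, z1, x1, x2, y);
  lme = fitlme(tbl, 'y ~ z1 + x1 + x2 + (1|wp)', 'FitMethod', 'REML')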


5. MNEMONICS

1. All models are incorrect, but some models are useful. That is especially true in the case of data-driven models.
2. No analysis or computation method is able to compensate for problems arising from poor upfront planning of the experiments (”rubbish in, rubbish out”)!
3. The importance of careful work in the experimentation cannot be overemphasized. Small statistical experimental designs are sensitive to errors in experiments, because every experiment estimates the effect of more than one predicting variable on the response variable(s).


4. Pimping up a rationalized experimental design by adding new design points to it, e.g. in the hope of better mapping some range of the predicting variables, is quite questionable. The performance of the design probably just worsens because of the loss of such properties as orthogonality, rotatability or optimality.
5. Statistical or chemometrical methods never prove that some certain predicting variable has a certain effect in some application.
   • But they provide reliable ways to process information and analyze results
   • and produce facts about the levels of uncertainty related to decision making.

