Algebraic Statistics in Design of Experiments

Maria Piera Rogantin
DIMA, Università di Genova
Torino, September 2004

Table of contents

1. General aspects
   (a) What is the Design of experiments?
   (b) Design & Ideal of the design
   (c) The full factorial design
2. Fractions and confounding
   (a) Fraction of a full factorial design
   (b) Space of the responses on the fraction
   (c) Identifiability of a model
   (d) Fractions and identifiable linear models
   (e) Confounding of subspaces
3. Indicator function and orthogonality
   (a) Orthogonality
   (b) Complex coding for full factorial designs
   (c) Indicator function of a fraction
   (d) Results about orthogonality
   (e) Generation of fractions with a given orthogonal structure
   (f) Regular fractions
4. Models on a fraction and term-orders
   (a) Identifiable sub-models and initial orders
   (b) Model curvature
   (c) Block term-order and factor screening

PART I: General aspects

Pistone G., Wynn H. P. (1996). Generalized confounding with Gröbner bases, Biometrika, 83(1): 653–666.

Robbiano L. (1998). Gröbner Bases and Statistics, in Gröbner Bases and Applications (Proc. of the Conf. 33 Years of Gröbner Bases), Buchberger B. & Winkler F. eds., Cambridge University Press, 179–204.

Pistone G., Riccomagno E. and Wynn H. P. (2001). Algebraic Statistics: Computational Commutative Algebra in Statistics, Chapman & Hall.

1.1 What is the Design of experiments?∗

An example: the factorial design

Push-off force of a spark control valve (Wu & Hamada, 2000, Experiments, Wiley & Sons, p. 247).

Influence of three factors on the response:
- weld time (0.3 - 0.5 - 0.7 seconds)
- pressure (15 - 20 - 25 psi)
- moisture (0.8 - 1.2 - 1.8 percent)

Each factor has three levels. We consider them as ordinal levels: low - medium - high. The levels are coded by integer numbers.

Full factorial design: each treatment corresponds to a point in $\mathbb{Z}^3$.
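As a small illustration, the full factorial design of the valve example can be enumerated directly. The sketch below is in Python and assumes that the three ordinal levels of every factor are coded as -1, 0, 1; the names `LEVELS`, `FACTORS` and `full_design` are illustrative and do not come from the source.

```python
from itertools import product

# Coded levels (low, medium, high); the coding -1, 0, 1 is an assumption.
LEVELS = (-1, 0, 1)
FACTORS = ("time", "pressure", "moisture")

# Each treatment is one point of Z^3: a choice of one level per factor.
full_design = list(product(LEVELS, repeat=len(FACTORS)))

print(len(full_design))    # number of treatments in the full design
print(full_design[:3])     # first few treatments
```

With three factors at three levels each, the full design has 3 × 3 × 3 = 27 treatments.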
∗ Full design

In practice, the experiment is run on fewer treatments∗ (because of cost, time and practical constraints in setting the factors and measuring the responses, ...).

How should the treatments be chosen? Which fraction gives "the best" study of the responses?

  time   pressure   moisture
   -1       -1         -1
   -1        0          0
   -1        1          1
    0       -1          0
    0        0          1
    0        1         -1
    1       -1          1
    1        0         -1
    1        1          0

Such a subset of treatments is a fractional factorial design, or fraction.

∗ Fraction

Design and functions on the design∗

  Response               Factors
  Push-off force (lb)    Time   Pressure   Moisture
        111.1             -1       -1         -1
        131.0             -1        0          0
         65.4             -1        1          1
        125.5              0       -1          0
         46.9              0        0          1
        113.7              0        1         -1
         72.5              1       -1          1
        141.1              1        0         -1
        134.2              1        1          0

Linear influence of the factors and their interactions:

\[ \text{Force} = \theta_0 + \theta_1 T + \theta_2 P + \theta_3 M + \theta_{12}\, T \cdot P + \theta_{13}\, T \cdot M + \dots \]

The $\theta$'s measure the "importance" of the terms with respect to the response.

∗ General design

Full factorial designs and fractions: notations

• $A_i = \{a_{ij} : j = 1, \dots, n_i\}$: factors; the $a_{ij}$ are the levels, coded by rational numbers ($\mathbb{Q}$) or complex numbers ($\mathbb{C}$).
• $D = A_1 \times \dots \times A_m \subset \mathbb{Q}^m$ (or $D \subset \mathbb{C}^m$), with $N = \prod_{j=1}^{m} n_j$ points: full factorial design.
• A fraction is a subset $F \subset D$.

Responses on a design∗

$f : D \to \mathbb{R}$ (functions defined on $D$)

• $X_i : D \ni (d_1, \dots, d_m) \mapsto d_i$: projection, frequently called factor.
• $X^\alpha = X_1^{\alpha_1} \cdots X_m^{\alpha_m}$, with $\alpha_i < n_i$, $i = 1, \dots, m$, and $\alpha = (\alpha_1, \dots, \alpha_m)$: monomial responses, or terms, or interactions.

The term $X^\alpha$ has order $k$ if $\alpha$ has $k$ non-zero entries; $X^\alpha$ is then an interaction of order $k$ (binary case: order = degree).

Definitions:

• Mean value of $f$ on $D$: $E_D(f) = \frac{1}{\#D} \sum_{d \in D} f(d)$.
• A contrast is a response $f$ such that $E_D(f) = 0$.
• Two responses $f$ and $g$ are orthogonal on $D$ if $E_D(f g) = 0$.

∗ "Design" indicates either "Full factorial design" or "Fractional factorial design" or ...
∗ General design

The polynomial complete regression model∗

• $L = \{(\alpha_1, \dots, \alpha_m) : \alpha_i < n_i,\ i = 1, \dots, m\}$: exponents (or logarithms) of all the interactions.
• Complete regression model: for each point $d_i \in D$ and the corresponding observed value $y_i$,

\[ y_i = \sum_{\alpha \in L} \theta_\alpha X^\alpha(d_i) \]

with $\theta_\alpha \in \mathbb{R}$ or $\theta_\alpha \in \mathbb{C}$.
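The definitions of mean value, contrast and orthogonality given above can be checked numerically. The following sketch, in Python with exact rational arithmetic, evaluates $E_D$ on the nine-run fraction of the valve example. The helper names (`mean_on`, `is_contrast`, `are_orthogonal`) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# Nine-run fraction of the valve example: (time, pressure, moisture).
FRACTION = [
    (-1, -1, -1), (-1, 0, 0), (-1, 1, 1),
    (0, -1, 0), (0, 0, 1), (0, 1, -1),
    (1, -1, 1), (1, 0, -1), (1, 1, 0),
]

def mean_on(design, f):
    """Mean value E_D(f) = (1/#D) * sum of f(d) over the points d of D."""
    return Fraction(sum(f(d) for d in design), len(design))

def is_contrast(design, f):
    """A response is a contrast when its mean value on the design is 0."""
    return mean_on(design, f) == 0

def are_orthogonal(design, f, g):
    """Two responses are orthogonal when E_D(f * g) = 0."""
    return mean_on(design, lambda d: f(d) * g(d)) == 0

# The projections X_1, X_2, X_3, i.e. the factors themselves.
T = lambda d: d[0]
P = lambda d: d[1]
M = lambda d: d[2]

print([is_contrast(FRACTION, x) for x in (T, P, M)])
print(are_orthogonal(FRACTION, T, P),
      are_orthogonal(FRACTION, T, M),
      are_orthogonal(FRACTION, P, M))
```

On this fraction the three factors are contrasts and are pairwise orthogonal, whereas a squared factor such as $T^2$ is not a contrast, since its mean value is 2/3.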
In vector notation, with $Y = (y_1, \dots, y_n)$ the response measured on the points of $D$:

\[ Y = \sum_{\alpha \in L} \theta_\alpha X^\alpha \]

• $Z = [X^\alpha(d)]_{d \in D,\, \alpha \in L}$: matrix of the complete regression model.
• $\theta = (\theta_\alpha)_{\alpha \in L}$: vector of the coefficients.

Matrix notation of the complete regression model:

\[ Y = Z\theta \]

∗ General design

On the full factorial design, the complete regression model is identifiable, i.e. there is a unique solution with respect to $\theta$:

\[ \hat\theta = Z^{-1} Y \]

(the matrix $Z$ is square and of full rank).

On a fraction, there is no unique solution with respect to $\theta$, because $Z = [X^\alpha(d)]_{d \in F,\, \alpha \in L}$ has fewer rows than columns:

\[ \hat\theta = (Z'Z)^{-} Z' Y \]

where $(Z'Z)^{-}$ denotes a generalized inverse.

In general, consider linear regression models of the form

\[ Y = W\theta + R \]

where $W$ is a matrix of explanatory variables and $R$ is a vector of residuals. Even if $\hat\theta = (W'W)^{-} W' Y$ is not unique, the approximation of the response $Y$ through the explanatory variables is always unique:

\[ \hat Y = W\hat\theta = W (W'W)^{-} W' Y = P\,Y \]

$\hat Y$ is the orthogonal projection of $Y$ onto the linear space generated by the columns of $W$.

[Figure: the vector $Y$, its orthogonal projection $\hat Y = P\,Y$ onto the space of the vectors $W\theta$, and the residual $R$.]

Moreover, if $Y$ is a multivariate random variable with normal distribution $Y \sim N(W\theta, \sigma^2 I)$, then $\hat\theta = (W'W)^{-} W' Y$ is a maximum likelihood solution.

Example: 2 × 3 full factorial design∗

$A_1 = \{-1, 1\}$, $n_1 = 2$
$A_2 = \{-1, 0, 1\}$, $n_2 = 3$

Points of $D$:

   X1   X2
   -1   -1
   -1    0
   -1    1
    1   -1
    1    0
    1    1

Monomial responses: $1,\ X_1,\ X_2,\ X_2^2,\ X_1 X_2,\ X_1 X_2^2$

$L = \{(0,0),\ (1,0),\ (0,1),\ (0,2),\ (1,1),\ (1,2)\}$

Complete regression model:

\[ Y = \theta_{00} + \theta_{10} X_1 + \theta_{01} X_2 + \theta_{02} X_2^2 + \theta_{11} X_1 X_2 + \theta_{12} X_1 X_2^2 \]

Matrix of the complete regression model, with columns $1,\ X_1,\ X_2,\ X_2^2,\ X_1 X_2,\ X_1 X_2^2$:

\[ Z = \begin{pmatrix}
1 & -1 & -1 & 1 & 1 & -1 \\
1 & -1 & 0 & 0 & 0 & 0 \\
1 & -1 & 1 & 1 & -1 & -1 \\
1 & 1 & -1 & 1 & -1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 1 & 1 & 1
\end{pmatrix} \]

∗ Full design

1.2 Design and design ideal∗

The application of computational commutative algebra to the study of estimability and confounding on fractions of factorial designs was proposed by Pistone & Wynn (Biometrika 1996).

1st idea: each set of points $D \subseteq \mathbb{Q}^m$ is the set of the solutions of a system of polynomial equations; we assume that each solution is simple.
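2nd idea: each real function defined on $D$ is a polynomial function with coefficients in the field of real numbers $\mathbb{R}$; it is the restriction to $D$ of a real polynomial.

∗ General design

A design $D$ is a finite set of distinct points in $k^m$, where $k$ is a field.∗

The defining ideal of the design (or design ideal) $I(D)$ is the set of all polynomials in $k[x_1, \dots, x_m]$ that vanish on $D$.

• The design $D$ is a variety.
• The design ideal $I(D)$ is radical.
• $I(D)$ is the intersection of the design ideals of the single points of $D$.

∗ General design

Operations with designs & operations with ideals∗

• Product of designs

$D_1 \subset k^{m_1}$, $D_2 \subset k^{m_2}$, $D_1 \times D_2 \subset k^{m_1 + m_2}$

\[ I(D_1 \times D_2) = \langle I(D_1), I(D_2) \rangle \]

Let $\tau$ be a term ordering on $k[x_1, \dots, x_{m_1 + m_2}]$, and let $\tau_1$, $\tau_2$ be its restrictions to $k[x_1, \dots, x_{m_1}]$ and $k[x_{m_1 + 1}, \dots, x_{m_1 + m_2}]$. If $G_{1,\tau_1}$ and $G_{2,\tau_2}$ are G-bases of $I(D_1)$ and $I(D_2)$, then

\[ G = \{ g_1, g_2 \mid g_1 \in G_{1,\tau_1},\ g_2 \in G_{2,\tau_2} \} \]

is a G-basis of $I(D_1 \times D_2)$.

∗ General design

• Restriction of a design∗

$D \subset k^m$, $I = I(D)$, $J$ an ideal in $k[x_1, \dots, x_m]$

$I + J = \{ f + g \mid f \in I,\ g \in J \}$ is an ideal in $k[x_1, \dots, x_m]$, and

\[ \mathrm{Variety}(I + J) = D \cap \mathrm{Variety}(J) \]

∗ General design

• Union of designs∗

$D_1, D_2 \subset k^m$, $D_1 \cup D_2 \subset k^m$

Let $\tau$ be a term ordering on $k[x_1, \dots, x_m]$ and let $G_1$, $G_2$ be G-bases of $I(D_1)$, $I(D_2)$. The products

\[ G = \{ g_1 g_2 \mid g_1 \in G_1,\ g_2 \in G_2 \} \]

generate an ideal whose variety is $D_1 \cup D_2$. When $D_1$ and $D_2$ are disjoint this ideal is $I(D_1 \cup D_2)$, and a G-basis of $I(D_1 \cup D_2)$ with respect to $\tau$ can be computed from $G$.

∗ General design

1.3 The full factorial design∗

Polynomial representation

The full factorial design $D$ corresponds to the set of solutions of the system

\[
\begin{cases}
(X_1 - a_{11}) \cdots (X_1 - a_{1 n_1}) = 0 \\
(X_2 - a_{21}) \cdots (X_2 - a_{2 n_2}) = 0 \\
\quad \vdots \\
(X_m - a_{m1}) \cdots (X_m - a_{m n_m}) = 0
\end{cases}
\qquad \text{or} \qquad
\begin{cases}
X_1^{n_1} = \sum_{k=0}^{n_1 - 1} \psi_{1k} X_1^k \\
\quad \vdots \\
X_m^{n_m} = \sum_{k=0}^{n_m - 1} \psi_{mk} X_m^k
\end{cases}
\]

In the previous examples:

2 × 3 design:

\[
\begin{cases}
(X_1 - 1)(X_1 + 1) = 0 \\
X_2 (X_2 - 1)(X_2 + 1) = 0
\end{cases}
\qquad \text{or the rewriting rules} \qquad
\begin{cases}
X_1^2 = 1 \\
X_2^3 = X_2
\end{cases}
\]

$3^3$ design:

\[
\begin{cases}
X_1 (X_1 - 1)(X_1 + 1) = 0 \\
X_2 (X_2 - 1)(X_2 + 1) = 0 \\
X_3 (X_3 - 1)(X_3 + 1) = 0
\end{cases}
\qquad \text{or the rewriting rules} \qquad
\begin{cases}
X_1^3 = X_1 \\
X_2^3 = X_2 \\
X_3^3 = X_3
\end{cases}
\]

∗ Full design

Space of the real functions on the full factorial design, $R(D)$∗

For a full factorial design, each function is represented in a unique way by an identified complete regression model (i.e. as a linear combination of the constant, the simple terms and the interactions):

\[ R(D) = \Big\{ \sum_{\alpha \in L} \theta_\alpha X^\alpha,\ \theta_\alpha \in \mathbb{R} \Big\} \]

In general, for full factorial designs or fractions:

• $R(D)$ is a Hilbert vector space (classical results derive from this structure). The scalar product is

\[ \langle f, g \rangle = E_D(f g) = \frac{1}{N} \sum_{d \in D} f(d)\, g(d) \]

• $R(D)$ is a ring (algebraic statistical approach). The products are reduced with the rules derived from the polynomial representation of the full factorial design:

\[ X_i^{n_i} = \sum_{k=0}^{n_i - 1} \psi_{ik} X_i^k, \qquad \psi_{ik} \in \mathbb{R}, \quad \text{for } i = 1, \dots, m. \]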
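The reduction of products in the ring $R(D)$ can be sketched for the 2 × 3 example. The Python code below assumes the rewriting rules $X_1^2 = 1$ and $X_2^3 = X_2$, which follow from the levels {-1, 1} and {-1, 0, 1}. Polynomials are stored as dictionaries from exponent pairs to coefficients; this representation and the function names are illustrative choices, not part of the source.

```python
from itertools import product

def reduce_exponents(a1, a2):
    """Apply X1^2 = 1 and X2^3 = X2 until the exponents lie in L."""
    a1 %= 2                # X1^2 = 1
    while a2 >= 3:         # X2^3 = X2, hence X2^a = X2^(a-2) for a >= 3
        a2 -= 2
    return a1, a2

def multiply(f, g):
    """Product of two responses, reduced monomial by monomial."""
    result = {}
    for (a, ca), (b, cb) in product(f.items(), g.items()):
        key = reduce_exponents(a[0] + b[0], a[1] + b[1])
        result[key] = result.get(key, 0) + ca * cb
    # Drop the monomials whose coefficients cancelled.
    return {k: c for k, c in result.items() if c != 0}

def evaluate(f, d):
    """Value of the polynomial f at the design point d = (d1, d2)."""
    return sum(c * d[0] ** a1 * d[1] ** a2 for (a1, a2), c in f.items())

# The 2 x 3 full factorial design.
DESIGN = list(product((-1, 1), (-1, 0, 1)))

f = {(1, 2): 1, (0, 1): 2}     # X1*X2^2 + 2*X2
g = {(1, 1): 1, (0, 0): -1}    # X1*X2 - 1
h = multiply(f, g)
print(h)
```

Here the reduced product is $X_1 X_2^2 - X_2$: it involves only monomials with exponents in $L$, and it takes the same value as $f \cdot g$ at every point of the design.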
