
1 Useful Background Information

In this section of the notes, various definitions and results from calculus, linear algebra, and least-squares regression will be summarized. I will refer to these items at various times during the semester.

1.1 Taylor Series

1. Let η^(k)(x) denote the kth derivative of function η(x). For function η and x0 in some interval I, define
$$P_n(x, x_0) = \eta(x_0) + \eta^{(1)}(x_0)(x - x_0) + \eta^{(2)}(x_0)\frac{(x - x_0)^2}{2!} + \cdots + \eta^{(n)}(x_0)\frac{(x - x_0)^n}{n!}$$
$$R_n(x, c) = \eta^{(n+1)}(c)\,\frac{(x - c)^{n+1}}{(n+1)!}.$$

Then, there exists some number z between x and x0 such that

η(x) = Pn(x, x0) + Rn(x, z)

2. Taylor Series for functions of one variable: If η is a function that has derivatives of all orders throughout an interval I containing x0 and if lim_{n→∞} Rn(x, x0) = 0 for every x0 in I, then η(x) can be represented by the Taylor series about x0 for any x0 in I. That is,

$$\eta(x) = \eta(x_0) + \sum_{k=1}^{\infty} \eta^{(k)}(x_0)\,\frac{(x - x_0)^k}{k!}$$

3. Note that Pn(x, x0) is a polynomial of degree n. Thus, Pn(x, x0) is an nth-order Taylor series approximation of η(x) because Rn(x, x0) vanishes as n increases.

4. Practically, this means that even if the true form of η(x) is unknown, we can use a polynomial f(x) = Pn(x, x0) to approximate it, with the approximation improving as n increases.

5. In statistics, we may fit a linear model

$$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n.$$

What we are actually doing is fitting

$$f(x) = P_n(x, 0) = \eta(0) + \eta^{(1)}(0)\,x + \frac{\eta^{(2)}(0)}{2!}\,x^2 + \cdots + \frac{\eta^{(n)}(0)}{n!}\,x^n$$

where β0 = η(0) and βi = η^(i)(0)/i! for i = 1, 2, . . . , n, and we assume the remainder Rn(x, 0) is negligible.
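A quick numerical sketch of this idea (my own example, not part of the original notes): approximate η(x) = e^x by its Taylor polynomial about x0 = 0, whose coefficients are η^(i)(0)/i! = 1/i!, and watch the remainder shrink as the order n grows.

```python
# Sketch: Taylor polynomial approximation of eta(x) = exp(x) about x0 = 0.
# Coefficients are eta^(i)(0)/i! = 1/i!, so P_n(x, 0) = sum_{i=0}^{n} x^i / i!.
import math

def taylor_poly_exp(x, n):
    """n-th order Taylor polynomial of exp(x) about 0."""
    return sum(x**i / math.factorial(i) for i in range(n + 1))

x = 1.5
for n in (1, 2, 4, 8):
    approx = taylor_poly_exp(x, n)
    print(f"n = {n}: P_n = {approx:.6f}, remainder = {math.exp(x) - approx:.2e}")
# The remainder R_n(x, 0) shrinks toward 0 as n increases, as item 3 claims.
```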

6. Taylor series can be generalized to higher dimensions. I will only review the 2-dimensional case.

7. For function η(x, y), let ∂^n η/(∂x^k ∂y^{n−k}) be the nth-order partial derivative with differentiation taken k times with respect to x and (n − k) times with respect to y.

8. If η is a function of (x, y) that has partial derivatives of all orders inside a ball B containing p0 and if lim_{n→∞} Rn(p, p0) = 0 for every p0 in B, then η(p) can be represented by the 2-variable Taylor series about p0 for any p0 in B.

9. For function η(x, y) and p0 = (x0, y0) in some open ball B containing p0, define p = (x, y) and

$$\begin{aligned}
P_n(p, p_0) = \eta(p_0) &+ \frac{(x - x_0)}{1!}\left.\frac{\partial \eta}{\partial x}\right|_{p_0} + \frac{(y - y_0)}{1!}\left.\frac{\partial \eta}{\partial y}\right|_{p_0} \\
&+ \frac{(x - x_0)^2}{2!}\left.\frac{\partial^2 \eta}{\partial x^2}\right|_{p_0} + \frac{(x - x_0)(y - y_0)}{1!\,1!}\left.\frac{\partial^2 \eta}{\partial x\,\partial y}\right|_{p_0} + \frac{(y - y_0)^2}{2!}\left.\frac{\partial^2 \eta}{\partial y^2}\right|_{p_0} \\
&+ \cdots \\
&+ \sum_{k=0}^{n-1} \frac{(x - x_0)^k (y - y_0)^{n-1-k}}{k!\,(n-1-k)!}\left.\frac{\partial^{n-1} \eta}{\partial x^k\,\partial y^{n-1-k}}\right|_{p_0}
+ \sum_{k=0}^{n} \frac{(x - x_0)^k (y - y_0)^{n-k}}{k!\,(n-k)!}\left.\frac{\partial^{n} \eta}{\partial x^k\,\partial y^{n-k}}\right|_{p_0}
\end{aligned}$$

$$R_n(p, p^*) = \sum_{k=0}^{n+1} \frac{(x - x_0)^k (y - y_0)^{n+1-k}}{k!\,(n+1-k)!}\left.\frac{\partial^{n+1} \eta}{\partial x^k\,\partial y^{n+1-k}}\right|_{p^*}$$

where p* is a point on the line segment joining p and p0.

10. Taylor Series for functions of two variables: There exists some point pz on the line segment joining p and p0 such that

η(p) = Pn(p, p0) + Rn(p, pz)

11. Note that Pn(p, p0) is a polynomial of degree n in variables x and y. Thus, Pn(p, p0) is an nth-order Taylor series approximation of η(p) because Rn(p, p0) vanishes as n increases.

12. Practically, this means that even if the true form of η(p) is unknown, we can use a polynomial f(p) = Pn(p, p0) to approximate it, with the approximation improving as n increases.

13. In statistics, we may fit a linear model

$$f(x, y) = \sum_{i=0}^{n}\sum_{j=0}^{n-i} \beta_{i,j}\, x^i y^j$$

What we are actually doing is fitting f(x, y) = Pn(p, (0, 0)) where β0,0 = η(0, 0) and
$$\beta_{i,j} = \frac{1}{i!\,j!}\left.\frac{\partial^{i+j} \eta}{\partial x^i\,\partial y^j}\right|_{(0,0)} \quad \text{for } i + j = 1, 2, \ldots, n,$$
and we assume the remainder Rn(p, (0, 0)) is negligible.

14. On the following page: f12 = ∂²f/∂x∂y, f11 = ∂²f/∂x², f22 = ∂²f/∂y².

Thus, ∆ = (∂²f/∂x∂y)² − (∂²f/∂x²)(∂²f/∂y²).

1.2 Matrix Theory Terminology and Useful Results

15. If
$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1k} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2k} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3k} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & x_{n3} & \cdots & x_{nk} \end{pmatrix}$$
then the matrix X′X can be written as

$$X'X = \begin{pmatrix} \sum_{p=1}^{n} x_{p1}^2 & \sum_{p=1}^{n} x_{p1}x_{p2} & \sum_{p=1}^{n} x_{p1}x_{p3} & \cdots & \sum_{p=1}^{n} x_{p1}x_{pk} \\ & \sum_{p=1}^{n} x_{p2}^2 & \sum_{p=1}^{n} x_{p2}x_{p3} & \cdots & \sum_{p=1}^{n} x_{p2}x_{pk} \\ & & \sum_{p=1}^{n} x_{p3}^2 & \cdots & \sum_{p=1}^{n} x_{p3}x_{pk} \\ & \text{symmetric} & & \ddots & \vdots \\ & & & & \sum_{p=1}^{n} x_{pk}^2 \end{pmatrix}$$

16. Transpose of a product of two matrices: (AB)′ = B′A′.

17. Transpose of a product of k matrices: If B = A1A2···Ak−1Ak, then B′ = Ak′Ak−1′···A2′A1′.

18. The trace of a square matrix A, denoted tr(A), is the sum of the diagonal elements of A.

19. For two k-square matrices A and B, tr(A ± B) = tr(A) ± tr(B).

20. Given an m × n matrix A and an n × m matrix B, then tr(AB) = tr(BA).
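These two trace facts are easy to verify numerically. The following is a minimal sketch (it assumes numpy is available; the random matrices and dimensions are my own choices) checking tr(A ± B) = tr(A) ± tr(B) and tr(AB) = tr(BA).

```python
# Sketch: numerical check of the trace identities in items 19 and 20.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))          # two k-square matrices (k = 4)
C = rng.normal(size=(4, 6))          # an m x n matrix
D = rng.normal(size=(6, 4))          # an n x m matrix

print(np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)))  # True
print(np.isclose(np.trace(A - B), np.trace(A) - np.trace(B)))  # True
print(np.isclose(np.trace(C @ D), np.trace(D @ C)))            # True
```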

21. The rank of a matrix A, denoted rank(A), is the number of linearly independent rows (or columns) of A.

22. If the determinant is nonzero for at least one matrix formed from r rows and r columns of matrix A but no matrix formed from r + 1 rows and r + 1 columns of A has nonzero determinant, then the rank of A is r.

23. Consider a k-square matrix A with rank(A) = k. The k-square matrix A⁻¹ where AA⁻¹ = A⁻¹A = Ik is called the inverse matrix of A.

24. A k-square matrix A is singular if A is not invertible. This is equivalent to saying |A| = 0 or rank(A) < k.

25. Any nonsingular square matrix (i.e., its determinant ≠ 0) will have a unique inverse.

26. In the use of least squares as an estimation procedure, it is often required to invert matrices which are symmetric. The inverse matrix is also important as a means of solving sets of simultaneous independent linear equations. If the set of equations is not independent, there is no unique solution.

27. The set of k linearly independent equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1k}x_k &= g_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2k}x_k &= g_2 \\
&\;\;\vdots \\
a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kk}x_k &= g_k
\end{aligned}$$

can be written in matrix form as Ax = g. Thus, the solution is x = A⁻¹g.
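In practice one rarely forms A⁻¹ explicitly; a linear solver is used instead. A minimal sketch (numpy, with example values of my own choosing):

```python
# Sketch: solving a set of k linearly independent equations Ax = g.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])      # nonsingular k-square matrix (k = 3)
g = np.array([3.0, 5.0, 3.0])

x = np.linalg.solve(A, g)            # preferred: avoids forming A^{-1}
x_via_inverse = np.linalg.inv(A) @ g # the textbook formula x = A^{-1} g

print(x, np.allclose(x, x_via_inverse))  # same solution either way
```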

28. If A = diag(a1, a2, . . . , ak) is a diagonal matrix with nonzero diagonal elements a1, a2, . . . , ak, then A⁻¹ = diag(1/a1, 1/a2, . . . , 1/ak) is a diagonal matrix with diagonal elements 1/a1, 1/a2, . . . , 1/ak.

29. If S is a nonsingular symmetric matrix, then (S⁻¹)′ = S⁻¹. Thus, the inverse of a nonsingular symmetric matrix is itself symmetric.

30. A square matrix A is idempotent if A² = A.

31. A nonsingular k-square matrix P is orthogonal if P′ = P⁻¹, or equivalently, PP′ = Ik.

32. Suppose P is a k-square orthogonal matrix, x is a k × 1 vector, and y = Px is a k × 1 vector. The transformation y = Px is called an orthogonal transformation.

33. If y = Px is an orthogonal transformation then y′y = x′P′Px = x′x.
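A short numerical sketch of item 33 (my own construction): build an orthogonal P from a QR factorization and check that y = Px leaves x′x unchanged.

```python
# Sketch: an orthogonal transformation preserves x'x (item 33).
import numpy as np

rng = np.random.default_rng(1)
P, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # Q factor of a QR factorization is orthogonal
x = rng.normal(size=4)
y = P @ x

print(np.allclose(P.T @ P, np.eye(4)))  # P'P = I_k
print(np.isclose(y @ y, x @ x))         # y'y = x'x
```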

1.3 Eigenvalues, Eigenvectors, and Quadratic Forms

34. If A is a k-square matrix and λ is a scalar variable, then A − λIk is called the characteristic matrix of A.

35. The determinant |A − λIk| = h(λ) is called the characteristic function of A.

36. The roots of the equation h(λ) = 0 are called the characteristic roots or eigenvalues of A.

37. Suppose λ* is an eigenvalue of a k-square matrix A; then an eigenvector associated with λ* is defined as a column vector x which is a solution to Ax = λ*x or (A − λ*Ik)x = 0.

38. An important use of eigenvalues and eigenvectors in response surface methodology is in the application to problems of finding optimum experimental conditions.

39. The quadratic form in k variables x1, x2, . . . , xk is
$$Q = \sum_{i=1}^{k} b_{ii}x_i^2 + 2\mathop{\sum\sum}_{i<j} b_{ij}x_i x_j \qquad (1)$$

where we assume the elements bij (i = 1, . . . , k; j = 1, . . . , k) are real-valued.

40. In matrix notation: Q = x′Bx where
$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} \qquad B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ & b_{22} & \cdots & b_{2k} \\ & \text{symmetric} & \ddots & \vdots \\ & & & b_{kk} \end{pmatrix}$$

41. B and |B| are, respectively, called the matrix and determinant of the quadratic form Q.

42. If λ1, λ2, . . . , λk are the eigenvalues of the symmetric matrix B, then there exists an orthogonal transformation x = Pw with w = (w1, w2, . . . , wk)′ such that the quadratic form Q = x′Bx is transformed to the canonical form
$$Q = \lambda_1 w_1^2 + \lambda_2 w_2^2 + \cdots + \lambda_k w_k^2 = w'\Lambda w \qquad (2)$$

where Λ = diag(λ1, λ2, . . . , λk). That is, the quadratic form Q can be transformed to one whose matrix Λ is diagonal with diagonal elements equal to the eigenvalues of B. A manipulation of this type is extremely useful in describing the nature of a response surface and locating regions of optimum conditions.
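The canonical form in (2) can be obtained numerically from an eigendecomposition. A minimal sketch (the symmetric B and the point w are my own choices): the columns of P are eigenvectors of B, and x = Pw turns x′Bx into w′Λw.

```python
# Sketch: reducing a quadratic form Q = x'Bx to its canonical form w' Lambda w.
import numpy as np

B = np.array([[4.0, 1.0],
              [1.0, 3.0]])                  # symmetric matrix of the quadratic form
lam, P = np.linalg.eigh(B)                  # eigenvalues and orthogonal eigenvector matrix

w = np.array([0.7, -1.2])                   # any point in the transformed coordinates
x = P @ w                                   # orthogonal transformation x = P w

Q_original  = x @ B @ x                     # x'Bx
Q_canonical = np.sum(lam * w**2)            # lambda_1 w_1^2 + lambda_2 w_2^2

print(np.isclose(Q_original, Q_canonical))  # True
```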

43. The rank of a quadratic form Q = x′Bx is defined to be rank(B) = the number of nonzero eigenvalues of B.

44. An indefinite quadratic form Q = x′Bx is one whose canonical form given in (2) contains both positive and negative coefficients, or equivalently, B has both positive and negative eigenvalues.

45. Suppose B is full rank (rank(B) = k).

(a) If all eigenvalues are positive, then the quadratic form Q is positive definite.

(b) If all eigenvalues are negative, then the quadratic form Q is negative definite.

46. Suppose B is less than full rank (rank(B) < k). That is, suppose at least one eigenvalue is zero.

(a) If all nonzero eigenvalues are positive, then the quadratic form Q is positive semidefinite.

(b) If all nonzero eigenvalues are negative, then the quadratic form Q is negative semidefinite.

47. The sign of a quadratic form Q = x′Bx and the quadratic form type (i.e., positive definite, negative definite, etc.) are linked in the following way:

(a) An indefinite quadratic form is positive for some (x1, x2, . . . , xk), and negative for others.

(b) A positive definite quadratic form is positive for all (x1, x2, . . . , xk) ≠ (0, . . . , 0).

(c) A negative definite quadratic form is negative for all (x1, x2, . . . , xk) ≠ (0, . . . , 0).

(d) A positive semidefinite quadratic form is nonnegative (≥ 0) for all real values of x1, x2, . . . , xk.

(e) A negative semidefinite quadratic form is nonpositive (≤ 0) for all real values of x1, x2, . . . , xk.

48. All of these definitions also apply to the symmetric matrix B in the quadratic form Q.

49. Theorem 1: If X is an n × p matrix (p < n) with rank(X) = p (i.e., full column rank), then the p × p matrix X′X is positive definite and the n × n matrix XX′ is positive semidefinite.
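Theorem 1 is easy to check numerically: for a random full-column-rank X, all eigenvalues of X′X are positive, while XX′ has exactly p nonzero eigenvalues. A sketch (random X with dimensions of my own choosing):

```python
# Sketch: numerical check of Theorem 1 with a random full-column-rank X.
import numpy as np

rng = np.random.default_rng(2)
n, p = 8, 3
X = rng.normal(size=(n, p))                 # rank(X) = p with probability 1

eig_XtX = np.linalg.eigvalsh(X.T @ X)       # p eigenvalues, all positive
eig_XXt = np.linalg.eigvalsh(X @ X.T)       # n eigenvalues, n - p of them (numerically) zero

print(np.all(eig_XtX > 0))                  # True: X'X is positive definite
print(np.sum(eig_XXt > 1e-10))              # p nonzero eigenvalues: XX' is positive semidefinite
```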

1.4 Matrix Differentiation

50. The column vector of partial derivatives of f(z) with respect to z is
$$\partial f/\partial z = \begin{pmatrix} \partial f/\partial z_1 \\ \partial f/\partial z_2 \\ \vdots \\ \partial f/\partial z_k \end{pmatrix} \quad \text{where} \quad z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_k \end{pmatrix}.$$

51. ∂f/∂z′ is the row vector of partial derivatives.

52. Rule 1: If a = (a1, a2, . . . , ak)′ is a column vector of k constants, and if f(z) = a′z = Σ aizi, then

∂(a′z)/∂z = a

53. Rule 2: If f(z) = z′z = Σ zi², then

∂(z′z)/∂z = 2z

54. Rule 3: If f(z) = z′Bz for a k-square matrix B, then

∂(z′Bz)/∂z = (B + B′)z

55. Rule 4: If B is a symmetric k-square matrix, then by Rule 3:

∂(z′Bz)/∂z = 2Bz
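The four differentiation rules can be checked against numerical gradients. A small sketch (the finite-difference helper and the random a, B, z are my own, not from the notes):

```python
# Sketch: finite-difference check of Rules 1-4 for matrix differentiation.
import numpy as np

def numerical_gradient(f, z, h=1e-6):
    """Central-difference approximation to the column vector df/dz."""
    g = np.zeros_like(z)
    for i in range(len(z)):
        e = np.zeros_like(z)
        e[i] = h
        g[i] = (f(z + e) - f(z - e)) / (2 * h)
    return g

rng = np.random.default_rng(3)
k = 4
a = rng.normal(size=k)
B = rng.normal(size=(k, k))                 # not necessarily symmetric
S = B + B.T                                 # a symmetric matrix for Rule 4
z = rng.normal(size=k)

print(np.allclose(numerical_gradient(lambda z: a @ z, z), a))                 # Rule 1
print(np.allclose(numerical_gradient(lambda z: z @ z, z), 2 * z))             # Rule 2
print(np.allclose(numerical_gradient(lambda z: z @ B @ z, z), (B + B.T) @ z)) # Rule 3
print(np.allclose(numerical_gradient(lambda z: z @ S @ z, z), 2 * S @ z))     # Rule 4
```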

1.5 Means, Variances, and Covariances

56. Let y1, y2, . . . , yk be k random variables whose means are given by E(yi) = µi for i = 1, 2, . . . , k. In vector form we write
$$E(y) = E\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix} = \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{pmatrix}$$

That is, the expectation of a vector is the vector of expectations.

57. The same applies more generally to matrices. The expectation of a matrix of random variables is a matrix containing the expected values of the individual random variables.

58. We can use matrix notation to describe the variances and covariances of the elements of a vector of random variables. Suppose the variances of the yi are given by

var(yi) = E[(yi − µi)²] = σi²   for i = 1, 2, . . . , k

and the covariances are given by

cov(yi, yj) = E[(yi − µi)(yj − µj)] = σij   for i, j = 1, 2, . . . , k and i ≠ j.

59. The variance-covariance matrix Σ is the symmetric matrix which contains the variances (σi²) on the main diagonal and the covariances (σij) as the off-diagonal elements. That is,

 2  σ1 σ12 ··· σ1k 2 0  σ ··· σ2k  Σ = E[(y − µ)(y − µ) ] =  2   sym− · · · · · ·  2 metric σk Σ is also referred to as cov(y) or var(y).

60. If the vector of random variables y = (y1, y2, . . . , yk)′ is jointly normally distributed with mean vector µ and variance-covariance matrix Σ, we write y ∼ N(µ, Σ).

61. For the special case where the random variables are uncorrelated (σij ≡ 0) and have equal variances (σi² ≡ σ²), we write y ∼ N(µ, σ²Ik).

62. Rule E1: If y is a vector of k random variables with E(y) = µ, then E(Ay) = Aµ where A is any n × k matrix of constants.

63. Rule E2: Rule E1 can be generalized to the case of a k × p matrix X of random variables xij. That is, if E(X) = M, then E(AX) = AE(X) = AM where A is any n × k matrix of constants.

64. Rule E3: Let y be a vector of random variables with E(y) = 0 and variance-covariance matrix Σ = σ²Ik. Then, for a real symmetric matrix B,
$$E(y'By) = \sigma^2\,\mathrm{trace}(B).$$

65. Rule E4: Let y be a vector of random variables with E(y) = µ and cov(y) = Σ. If A is an n × k matrix of constants and z = Ay, then
$$\mathrm{cov}(z) = \mathrm{cov}(Ay) = A\Sigma A'$$

66. Rule E5: A special case of Rule E4 is the situation of finding var(a′y) = var(Σ aiyi), the variance of a linear combination of random variables. The variance is given by the quadratic form
$$\mathrm{var}(a'y) = a'\Sigma a = \sum_{i=1}^{k} a_i^2\sigma_i^2 + 2\mathop{\sum\sum}_{i<j} a_i a_j \sigma_{ij}.$$
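Rule E5 can be illustrated by simulation. A minimal sketch (numpy multivariate normal draws; the Σ, µ, and a below are my own example values):

```python
# Sketch: Monte Carlo check of Rule E5, var(a'y) = a' Sigma a.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])                     # variance-covariance matrix
a = np.array([1.0, -2.0, 0.5])

y = rng.multivariate_normal(mu, Sigma, size=200_000)    # 200,000 draws of the vector y
sample_var = np.var(y @ a)                              # empirical var(a'y)
theory_var = a @ Sigma @ a                              # quadratic form a' Sigma a

print(sample_var, theory_var)   # the two agree up to Monte Carlo error
```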

1.6 Least Squares

67. Assume the response of interest y can be approximated by a low-order polynomial f(x1, x2, . . . , xk) where x1, x2, . . . , xk are k independent variables. Suppose that n experimental runs are taken for various combinations of the x's which were determined by the experimenter. The data are written in the form
$$\begin{array}{cccccc}
y_1 & x_{11} & x_{21} & x_{31} & \cdots & x_{k1} \\
y_2 & x_{12} & x_{22} & x_{32} & \cdots & x_{k2} \\
y_3 & x_{13} & x_{23} & x_{33} & \cdots & x_{k3} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
y_n & x_{1n} & x_{2n} & x_{3n} & \cdots & x_{kn}
\end{array}$$
where n > k. The plan of experimental levels of the x's is called the experimental design.

68. The approximating model assumed by the experimenter can be written as

yi = β0 + β1x1i + β2x2i + ··· + βkxki + εi   (i = 1, 2, . . . , n)

where εi is a random variable. It is assumed that the εi are independent from run to run and εi ∼ (0, σ²).

69. In matrix form we can write y = Xβ + ε where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} \qquad X = \begin{pmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ 1 & x_{13} & x_{23} & \cdots & x_{k3} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{pmatrix} \qquad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

and E(ε) = 0 and cov(ε) = σ²In. This model is referred to as the general linear model.

70. The general linear model can be applied to polynomial models of degree higher than one. For example, suppose the assumed model is quadratic in two variables x1 and x2. That is, the response for the ith run involving x1i and x2i is given by

yi = β0 + β1x1i + β2x2i + β11x1i² + β22x2i² + β12x1ix2i + εi

where i = 1, 2, . . . , n with n ≥ 6. For this example
$$\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_{11} \\ \beta_{22} \\ \beta_{12} \end{pmatrix} \quad \text{and} \quad X = \begin{pmatrix} 1 & x_{11} & x_{21} & x_{11}^2 & x_{21}^2 & x_{11}x_{21} \\ 1 & x_{12} & x_{22} & x_{12}^2 & x_{22}^2 & x_{12}x_{22} \\ 1 & x_{13} & x_{23} & x_{13}^2 & x_{23}^2 & x_{13}x_{23} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1n} & x_{2n} & x_{1n}^2 & x_{2n}^2 & x_{1n}x_{2n} \end{pmatrix}$$

71. Given the design matrix X and a vector y of responses, the method of least squares yields an estimate b of β which minimizes L, the sum of squares of the errors (or deviations) of the observed responses from the estimated values:

$$L = \sum_{i=1}^{n} e_i^2 = e'e \qquad \text{where } e_i = y_i - x_i'b$$
with xi′ the ith row of X, or, equivalently,

L = (y − Xb)′(y − Xb) = y′y − 2b′X′y + b′X′Xb.

72. To find the b which minimizes L, we first note that X′X is symmetric and use the differentiation rules (Rule 1 and Rule 4 in Section 1.4):

∂L/∂b = −2X′y + 2(X′X)b.

Setting the partial derivatives to 0 and solving for b yields (X′X)b = X′y. These equations are called the normal equations.

73. Assuming X′X is nonsingular, we have the least squares estimator

b = (X′X)⁻¹X′y.

74. E(b) = β. That is, the least squares estimator b = (X′X)⁻¹X′y is unbiased, or equivalently, each element in b is unbiased for the parameter it is estimating.

75. In the development of experimental designs for response surface methodology, it is important to investigate the effect of the design on the variance-covariance matrix of b:

cov(b) = E[(b − β)(b − β)′] = σ²(X′X)⁻¹.

This implies that the variances of the estimators in b are given by the main diagonal elements of (X′X)⁻¹ multiplied by σ², and the covariances between elements of b are the off-diagonal elements of (X′X)⁻¹ multiplied by σ².
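A small simulation sketch of items 73 through 75 (the model, sample size, and σ are my own choices): fit b by least squares repeatedly and compare its empirical mean and covariance with β and σ²(X′X)⁻¹.

```python
# Sketch: least squares estimation, unbiasedness, and cov(b) = sigma^2 (X'X)^{-1} by simulation.
import numpy as np

rng = np.random.default_rng(5)
n = 30
x1 = rng.uniform(-1, 1, size=n)
x2 = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x1, x2])   # design matrix with an intercept column
beta = np.array([2.0, 1.0, -0.5])
sigma = 0.4

XtX_inv = np.linalg.inv(X.T @ X)
bs = []
for _ in range(5000):                       # repeat the experiment with new errors
    y = X @ beta + rng.normal(0, sigma, size=n)
    b = XtX_inv @ X.T @ y                   # b = (X'X)^{-1} X'y
    bs.append(b)
bs = np.array(bs)

print(bs.mean(axis=0))                      # close to beta (unbiasedness, item 74)
print(np.cov(bs, rowvar=False))             # close to sigma^2 (X'X)^{-1} (item 75)
print(sigma**2 * XtX_inv)
```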

1.7 Hypothesis Testing

76. If the additional assumption is made that εi is normally distributed, that is, ε ∼ N(0, σ²In), then the yi's are also normally distributed as y ∼ N(Xβ, σ²In).

77. This also implies that b ∼ N(β, σ²(X′X)⁻¹).

78. Sums of squares in the regression: Let Jn be an n × 1 vector of ones.

(a) Total Sum of Squares:

$$S_{yy} = SS_T = y'y - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^{\!2} = y'\left[I_n - \frac{1}{n}J_nJ_n'\right]y$$

(b) Regression Sum of Squares:

$$SS_R = b'X'y - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^{\!2} = y'\left[X(X'X)^{-1}X' - \frac{1}{n}J_nJ_n'\right]y$$

(c) Error Sum of Squares:

$$SS_E = y'y - b'X'y = y'\left[I_n - X(X'X)^{-1}X'\right]y$$

Note: SST = SSR + SSE.

79. ANOVA Table for significance of the regression:

Source of        Sum of     Degrees of    Mean
Variation        Squares    Freedom       Square    F0
Regression       SSR        k             MSR       MSR/MSE
Error            SSE        n − k − 1     MSE
Total            Syy        n − 1

where k = the number of parameters in the model excluding the intercept β0.
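The quantities in the table can be computed directly from the matrix formulas in item 78. A sketch with simulated data (scipy is used only for the F p-value; the data-generating setup is my own):

```python
# Sketch: building the ANOVA table quantities from the matrix formulas in item 78.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 25, 2
x1 = rng.uniform(-1, 1, size=n)
x2 = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x1, x2])            # k = 2 regressors plus an intercept
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 0.5, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
SS_T = y @ y - y.sum()**2 / n
SS_R = b @ X.T @ y - y.sum()**2 / n
SS_E = y @ y - b @ X.T @ y

MS_R = SS_R / k
MS_E = SS_E / (n - k - 1)
F0 = MS_R / MS_E
p_value = stats.f.sf(F0, k, n - k - 1)               # reference distribution F(k, n - k - 1)

print(np.isclose(SS_T, SS_R + SS_E))                 # SS_T = SS_R + SS_E
print(F0, p_value)
```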

80. To find the expectation E(SSE) note that

$$SS_E = y'\left[I_n - X(X'X)^{-1}X'\right]y = y'Py$$

where P = In − X(X′X)⁻¹X′. Thus, SSE is a quadratic form in the y's. Then

$$E(SS_E) = E(y'Py) = E(\epsilon'P\epsilon) = \sigma^2\,\mathrm{trace}(P) = \sigma^2(n - k - 1)$$

where the second equality uses PX = 0 (so that y′Py = ε′Pε) and the third uses Rule E3. Thus, MSE = SSE/(n − k − 1) is an unbiased estimator of σ². Notationally, we write σ̂² = MSE.

81. Test for significance of the regression: We can write β′ = [β0 | β*′] where β* is the parameter vector excluding β0. To test for the significance of the regression, that is, to test

H0: β* = 0   against   Ha: at least one parameter in β* does not equal 0,

determine the p-value by comparing F0 to the F(k, n − k − 1) distribution, its reference distribution under H0.

82. Let βj (j ≠ 0) be a parameter in β. From item 77 we know that b ∼ N(β, σ²(X′X)⁻¹). Thus, if βj = 0 then bj ∼ N(0, σ²Cjj) where Cjj is the diagonal element of (X′X)⁻¹ corresponding to βj.

83. Tests on individual regression coefficients: To test for the significance of βj, that is, to test

H0: βj = 0   against   Ha: βj ≠ 0,

first note that
$$t_0 = \frac{b_j}{se(b_j)} = \frac{b_j}{\sqrt{\hat{\sigma}^2\, C_{jj}}}$$
follows a t(n − k − 1) distribution if H0: βj = 0 is true. Therefore, the p-value of this test is determined by comparing t0 to the t(n − k − 1) distribution, its reference distribution under H0.
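The same kind of simulated fit can be used to test individual coefficients. A sketch (scipy supplies the t p-value; the data setup mirrors the earlier sketch and β2 is set to 0 on purpose):

```python
# Sketch: t tests on individual regression coefficients (item 83).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=(n, k))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 0.5, size=n)   # beta_2 is truly 0

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
sigma2_hat = resid @ resid / (n - k - 1)             # MSE, the unbiased estimator of sigma^2

for j in range(1, k + 1):                            # skip the intercept beta_0
    se = np.sqrt(sigma2_hat * XtX_inv[j, j])         # se(b_j) uses C_jj
    t0 = b[j] / se
    p_value = 2 * stats.t.sf(abs(t0), n - k - 1)     # two-sided p-value
    print(f"b_{j} = {b[j]:.3f}, t0 = {t0:.2f}, p = {p_value:.3f}")
```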
