P1: KpB MABK012-BOOK MABK012/Bayer Trim Size: 7.5in× 10.5in July 15, 2010 16:24
12 Affine Transformations
Chaotic features of the World erase And you will see its Beauty. — Alexander A. Block (1880–1921)1
12.1 Introduction Suppose we are struggling with a geometric problem concerning an arbitrary triangle or an arbitrary parallelogram. How often we would wish for the triangle to be an equilateral or 45◦ − 90◦ − 45◦ triangle, or for the parallelogram to be a square! The solution is so easy in these cases. But we know that these would be just very particular instances of the problem. Solving them will make us feel better, but not much better. Well, the good news is that for some problems, solving just a particular instance turns out to be sufficient to claim that the problem is solved in complete generality! In this chapter we learn how to recognize some of these problems, and we justify such an approach. We start by reviewing some familiar concepts. Let A and B be sets. A function or mapping f from A to B, denoted f : A → B, is a set of ordered pairs (a,b), where a ∈ A and b ∈ B, with the following property: for every a ∈ A there exists a unique b ∈ B such that (a,b) ∈ f . The fact that (a,b) ∈ f is usually denoted by f (a) = b, and we say that f maps a to b. Another way to denote that f maps a to b is f : a → b; if it is clear which function is being discussed, we will often just write a → b. We also say that b is the image of a (in f ), and that a is a preimage of b (in f ). The set A is called the domain of f and the set B is the codomain of f .Thesetf (A) ={f (a):a ∈ A} is a subset of B, called the range of f . A function f : A → B is surjective (or onto)iff (A) = B; that is, f is surjective if every element of B is the image of at least one element of A. A function f : A → B is injective (or one-to-one) if each element in the range of f is the image of exactly one element of A; that is, f is injective if f (x) = f (y) implies x = y. A function f : A → B is bijective if it is both surjective and injective.
1 Translated from the Russian by Vera Zubareva.
251 P1: KpB MABK012-BOOK MABK012/Bayer Trim Size: 7.5in× 10.5in July 15, 2010 16:24
252 12 Affine Transformations
A B A B A B f g h
(i) f is injective (ii) g is surjective (iii) h is bijective
FIGURE 12.1.
If f : A → B and g : B → C are functions, then the composition of f and g, denoted g ◦ f ,is a function from A to C such that (g ◦ f )(a) = g(f (a)) for any a ∈ A. The proof of Theorem 12.1 is left to the reader and can be found in many texts.
Theorem 12.1. A composition of two bijections is a bijection.
If f : A → B, then f −1 : B → A is the inverse of f if (f −1 ◦ f )(a) = a for any a ∈ A and (f ◦ f −1)(b) = b for any b ∈ B. A function f has an inverse if and only if f is a bijection. Let E2 denote the Euclidean plane. Introducing a coordinate system2 OXY on E2, we can identify every point P with the ordered pair of its coordinates (x ,y ); alternatively, P can be identified −→ P P = 3 with its position vector, OP xP ,yP . The collection of all such vectors form a vector space, namely R2.Ifx represents the vector with initial point at the origin and terminal point at (x ,y ), −→ P P then OP, xP ,yP , and x can also be used to denote x. A transformation of a set is a bijection of the set to itself. It is easy to see that any transformation E2 → E2 ˜ R2 → R2 ˜ = f : corresponds to a bijection f : , in that f ( xP ,yP ) xP ,yP whenever f (P ) = P . Since f and f˜ uniquely define one another within a fixed coordinate system, we will also refer to f˜ as a transformation of the plane, and we will write f to denote either a mapping of E2 to E2 or a mapping of R2 to R2. It will be clear from the context which of the two mappings f represents. −→ E2 Just as any point P in OXY corresponds−→ to a unique vector OP, each figure ϕ in uniquely corresponds to a set of vectors OP of R2, where P ∈ ϕ. We say that this set of vectors is a R2 { ∈ ⊆ E2} figure−→ in −→, and we denote it again by ϕ.Thesetf (ϕ) is defined as f (P ):P ϕ ,or {f (OP):OP ∈ ϕ ⊆ R2}. It is not hard to make the relationship between point spaces and vector spaces more precise, but we will not do it here.4 In fact, we freely interchange the representations of point and vector, (x,y) and x,y, when they are domain elements of a function f . Transformations of the plane and their application to solving geometry problems form the focus of this chapter. The transformations we study will be of two types, illustrated by the following examples:
f (x,y) =2x − 3y,x + y and g(x,y) =2x − 3y + 1,x+ y − 4.
←→ ←→ 2 Recall that OXY denotes a coordinate system (not necessarily Cartesian) with axes OX and OY. 3 Students who have studied some linear algebra may recall that a vector space is a collection of objects on which an “addition” operation may be performed in such a way that nice properties like commutativity and the existence of additive inverses hold, but a precise definition of vector space is not necessary in order to continue reading. 4 See, for example, [34], [50], or [65] for rigorous expositions. P1: KpB MABK012-BOOK MABK012/Bayer Trim Size: 7.5in× 10.5in July 15, 2010 16:24
12.2 Matrices 253
At this point it is not obvious that f and g are bijections, but this will be verified later in the chapter. To get a more concrete sense of what f and g do, consider how they “transform” the vectors 0, 0, 0, 1, 1, 0, and 1, 1.
x f (x) g(x) 0, 0 0, 0 1, −4 0, 1 −3, 1 −2, −3 1, 0 2, 1 3, −3 1, 1 −1, 2 0, −2
Notice that the origin, 0, is fixed under f , while g(0, 0) =1, −4. Notice also that f (0, 1+ 1, 0) = f (0, 1) + f (1, 0); again, this is not true of g. These properties of f are indicative of the linearity of that mapping. A function T : R2 → R2 is called linear if T (x +y) = T (x) + T (y) for any vectors x and y, and T (kx) = kT (x) for any vector x and scalar k. The reader can verify that these properties hold for f but not for g. As will be shown later in this chapter, both f and g map a line segment to a line segment. Therefore, knowing where f and g map the points corresponding to the vectors 0, 0, 0, 1, 1, 1, and 1, 0 is sufficient for determining the image of the unit square, S, having vertices at these four points. Figure 12.2 shows S together with f (S) and g(S). Notice that both f (S) and g(S)are parallelograms; Theorem 12.7 will prove that this is not a coincidence.
12.2 Matrices Transformations of E2 or R2 are often studied via another type of mathematical object, the matrix. Though the benefits of using the language of matrices are not striking when we study E2, matrices
f ((1,1))
f ((1,0)) f f ((0,1))
f (O) (0,1) (1,1)
O (1,0)
g ((1,1)) g
g ((0,1)) g ((1,0))
g (O)
FIGURE 12.2. P1: KpB MABK012-BOOK MABK012/Bayer Trim Size: 7.5in× 10.5in July 15, 2010 16:24
254 12 Affine Transformations
turn out to be very convenient when generalizing geometric notions of the plane to spaces of higher dimensions.5 An m × n matrix A is a rectangular array of real numbers, ⎡ ⎤ a a ... a ⎢ 11 12 1n ⎥ ⎢ a a ... a ⎥ ⎢ 21 22 2n ⎥ ⎢ ⎥ = ⎢ .. .⎥ A ⎢ ⎥ . ⎢ .. .⎥ ⎣ .. .⎦ am1 am2 ... amn
th th The entry in the i row and the j column is denoted aij , and we often write A = [aij ]. Two = = matrices A [aij ] and A [aij ] are called equal if they have an equal number of rows, an equal = × number of columns, and aij aij for all i and j. When the matrix is n n, so that there are an equal number of rows and columns, the matrix is called a square matrix. Notice that a vector v =v1,v2 × can be thought of as the 1 2matrix[v1 v2], called a “row vector.” It can also be thought of as a v “column vector” by writing v as the 2 × 1matrix 1 . v2 If A = [aij ] and B = [bij ] are both m × n matrices, then the sum A + B is the m × n matrix C = [cij ] in which cij = aij + bij .IfA = [aij ]isanm × n matrix and c ∈ R, then the scalar multiple of A by c is the m × n matrix cA = [caij ]. (That is, cA is obtained by multiplying each entry of A by c.) = × = The product AB of two matrices is defined when A [aij ]isanm n matrix and B [bij ]is × = = n × an n p matrix. Then AB [cij ], where cij k=1 aikbkj . For example, if A is a 2 2 matrix, and B is a 2 × 1 matrix, then a a b a b + a b AB = 11 12 11 = 11 11 12 21 . a21 a22 b21 a21b11 + a22b21
We say that here we multiply A by a (column) vector. Notice that BA is not defined in this case. If A and B are both 2 × 2 matrices, a a b b a b + a b a b + a b AB = 11 12 11 12 = 11 11 12 21 11 12 12 22 . a21 a22 b21 b22 a21b11 + a22b21 a21b12 + a22b22
Although BA is defined in this case, in general BA is not equal to AB. So matrix multiplication is not commutative. These two instances of matrix multiplication (when A is a 2 × 2 matrix and B is a 2 × 1ora2× 2 matrix) are the only ones we will need in this book. In what follows, no matter whether x is a 1 × 2 vector or 2 × 1 vector, when it is used in the expression Ax, it is always understood as a column vector, i.e., as a 2 × 1 matrix. Theorem 12.2 summarizes some of the most useful properties of matrix operations. Its proof can easily be produced by the reader (part (4) is the most difficult) or may be found in a standard linear algebra text.
5 Here, when we say “language,” we mean the objects, their notation, operations on the objects, and properties of those operations – similar to the “languages” of trigonometry, algebra, logic, and calculus. P1: KpB MABK012-BOOK MABK012/Bayer Trim Size: 7.5in× 10.5in July 15, 2010 16:24
12.2 Matrices 255
Theorem 12.2. (1) If A and B are m × n matrices, then A + B = B + A. (2) If A, B, and C are m × n matrices, then A + (B + C) = (A + B) + C. (3) Given an m × n matrix A, there exists a unique m × n matrix B such that A + B = B + A is the zero matrix (that is, the matrix with 0 in every entry). (4) If A is an m × n matrix, B is an n × p matrix, and C is a p × q matrix, then A(BC) = (AB)C. (5) If A and B are m × n matrices, C is an n × p matrix, and D is a q × m matrix, then (A + B)C = AC + BC and D(A + B) = DA + DB. (6) If r, s ∈ R, A is an m × n matrix, and B is an n × p matrix, then (a) r(sA) = (rs)A = s(rA), and (b) A(rB) = r(AB). (7) If r, s ∈ R, and A and B are m × n matrices, then (a) (r + s)A = rA + sA, and (b) r(A + B) = rA + rB.
Using the notation of matrices, we can represent the functions f (x,y) =2x − 3y,x + y and g(x,y) =2x − 3y + 1,x+ y − 4 x using matrix multiplication as follows. First, let x = , and let y 2 −3 A = . 11
Then 2 −3 x f (x) = Ax = . 11 y One way to think about the matrix A corresponding to the transformation f is that the columns of A specify the images of the vectors i =1, 0 and j =0 , 1. Using matrix multiplication, 2 −3 1 2 2 −3 0 −3 we see that Ai = = , and Aj = = , as illustrated in 11 0 1 11 1 1 Figure 12.3. 1 If we let b = , then the same 2 × 2matrixA gives −4 2 −3 x 1 g(x) = Ax + b = + 11 y −4 2x − 3y 1 2x − 3y + 1 = + = . x + y −4 x + y − 4 P1: KpB MABK012-BOOK MABK012/Bayer Trim Size: 7.5in× 10.5in July 15, 2010 16:24
256 12 Affine Transformations
Aj A i f
j
i
g
A i + b
A j + b
FIGURE 12.3.