Part I
Linear Algebra
Chapter 1
Vectors
1.1 Vector Basics
1.1.1 Definition
Vectors
• A vector is an ordered collection of n real numbers, x1, x2, ..., xn, arranged in a column or a row. It can be thought of as a point in n-dimensional space, or as providing a direction.
• Each number xi is called a component, or element, of the vector.
• The inner product defines the length of a vector, and generalizes the notion of angle between two vectors.
• Via the inner product, we can view a vector as a linear function. We can also compute the projection of a vector onto a line defined by another.
• We usually write vectors in column format:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
• Geometry. A vector represents both a direction from the origin and a point in the multi-dimensional space R^n, where each component corresponds to a coordinate of the point.
• Transpose. If x is a column vector, x^T (transpose) denotes the corresponding row vector, and vice versa: x^T = [x_1, ..., x_n].
1.1.2 Independence
• A set of m vectors x_1, ..., x_m in R^n is said to be independent if no vector in the set can be expressed as a linear combination of the others. This means that, for λ ∈ R^m, the condition
$$\sum_{i=1}^{m} \lambda_i x_i = 0$$
implies λ = 0.
• If two vectors are linearly independent, then neither can be a scaled version of the other.
• Example. The vectors x_1 = [1, 2, 3] and x_2 = [3, 6, 9] are not independent, since 3x_1 − x_2 = 0: x_2 is a scaled version of x_1.
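The independence test above can be checked numerically: stack the vectors as rows of a matrix and compare its rank to the number of vectors. This is a small sketch using numpy (not part of the original notes); the example vectors are the ones from the text.

```python
import numpy as np

# Vectors are independent iff the rank of the stacked matrix
# equals the number of vectors.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([3.0, 6.0, 9.0])   # 3 * x1, so the pair is dependent
A = np.vstack([x1, x2])
independent = np.linalg.matrix_rank(A) == A.shape[0]
print(independent)  # False: x2 is a scaled copy of x1
```

Replacing x2 with any vector not proportional to x1 (e.g. [0, 1, 0]) makes the rank 2 and the test returns True.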
1.1.3 Subspace, Span, Affine Sets
Subspace and Span
• A nonempty subspace V of R^n is a subset that is closed under addition and scalar multiplication. That is, for any scalars α, β and any x, y ∈ V,
$$\alpha x + \beta y \in V$$
• A subspace always contains the zero element.
• Geometrically, subspaces are "flat" (like a line or plane in 3D) and pass through the origin.
• A subspace S can always be represented as the span of a set of vectors x_i in R^n, that is, a set of the form
$$S = \operatorname{span}(x_1, \ldots, x_m) := \left\{ \sum_{i=1}^{m} \lambda_i x_i : \lambda \in \mathbb{R}^m \right\}$$
• The set of all possible linear combinations of the vectors in S = {x(1), ··· , x(m)} forms a subspace, which is called the subspace generated by S, or the span of S, denoted by span(S).
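Span membership can be tested numerically: a vector b is in the span of given vectors exactly when the least-squares residual of expressing b in terms of them is zero. A sketch with numpy (the helper `in_span` and the example vectors are my own, for illustration):

```python
import numpy as np

# b is in span(columns of X) iff min over lam of ||X @ lam - b|| is zero.
def in_span(X, b, tol=1e-10):
    lam, *_ = np.linalg.lstsq(X, b, rcond=None)
    return np.linalg.norm(X @ lam - b) < tol

x1 = np.array([1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0])
X = np.column_stack([x1, x2])        # spanning vectors as columns

print(in_span(X, 2 * x1 - 3 * x2))            # True: a linear combination
print(in_span(X, np.array([0.0, 0.0, 1.0])))  # False: outside the plane
```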
Direct Sum
• Given two subspaces X, Y ⊆ R^n, the direct sum of X and Y, denoted X ⊕ Y, is the set of vectors of the form x + y, with x ∈ X, y ∈ Y.
• X ⊕ Y is itself a subspace.
Affine Sets
• An affine set is a translation of a subspace. It is "flat" but does not necessarily pass through 0, as a subspace would. (Like a line or a plane that does not go through the origin.)
• An affine set A can always be represented as the translation (by a constant term) of the subspace spanned by some vectors:
$$A = \left\{ x_0 + \sum_{i=1}^{m} \lambda_i x_i : \lambda \in \mathbb{R}^m \right\} = x_0 + S$$
where x_0 is a given point and S is a given subspace. Affine is linear plus a constant term.
• Subspaces (sometimes called linear subspaces) are just affine sets containing the origin.
• Line. When S is the span of a single non-zero vector u (dimension 1), the set A is called a line passing through the point x_0:
$$A = \{ x_0 + tu : t \in \mathbb{R} \}$$
Here u is the direction of the line, t is the magnitude, and x_0 is a point through which it passes.
1.1.4 Basis and Dimension
Basis
• A basis of Rn is a set of n independent (irreducible) vectors.
• If the vectors u_1, ..., u_n form a basis, we can express any vector as a linear combination of the u_i: $x = \sum_{i=1}^{n} \lambda_i u_i$ for appropriate numbers λ_1, ..., λ_n.
• Standard basis. The standard basis (natural basis) in R^n consists of the vectors e_i, where the i-th element is 1 and the rest are 0:
$$e_1 = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad e_2 = \begin{bmatrix}0\\1\\0\end{bmatrix}, \quad e_3 = \begin{bmatrix}0\\0\\1\end{bmatrix} \in \mathbb{R}^3$$
Basis of a Subspace
• A basis of a given subspace S ⊆ R^n is any independent set of vectors whose span is S.
• If vectors (u_1, ..., u_r) form a basis of S, we can express any vector in the subspace S as a linear combination of (u_1, ..., u_r): $x = \sum_{i=1}^{r} \lambda_i u_i$.
• Dimension. The number of vectors in a basis is independent of the choice of basis. We will always find a fixed minimum number of independent (irreducible) vectors for the subspace S. This minimum number is called the dimension of S.
• Example. In R^3, you need 2 independent vectors to describe a plane containing the origin (dimension 2). The dimension of a line is 1, since a line is x_0 + span(x_1) for non-zero x_1.
Dimension of an Affine Subspace
• The set L in R3,
x1 − 13x2 + 4x3 = 2
3x2 − x3 = 9
is an affine subspace of dimension 1. The linear subspace can be obtained by setting the constant term to 0,
x1 − 13x2 + 4x3 = 0
3x2 − x3 = 0
• Solving for x_3 and substituting, we get x_1 = x_2 and x_3 = 3x_2. The representation of the linear subspace, x ∈ R^3, is
$$x = \begin{bmatrix}1\\1\\3\end{bmatrix} t, \quad \text{for scalar } t$$
• The linear subspace is the span of u = (1, 1, 3) of dimension 1. We can find a particular solution x0 = (38, 0, −9) and the affine subspace L is thus the line x0 + span(u).
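As a quick numerical check of this worked example (numpy, not part of the original notes): every point x_0 + t·u on the line should satisfy both defining equations of L.

```python
import numpy as np

# Points on the affine line x0 + t*u, with x0 = (38, 0, -9) and
# u = (1, 1, 3), should satisfy x1 - 13*x2 + 4*x3 = 2 and 3*x2 - x3 = 9.
x0 = np.array([38.0, 0.0, -9.0])
u = np.array([1.0, 1.0, 3.0])

for t in (-2.0, 0.0, 1.5):
    x = x0 + t * u
    print(np.isclose(x[0] - 13 * x[1] + 4 * x[2], 2.0),
          np.isclose(3 * x[1] - x[2], 9.0))   # True True for every t
```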
1.2 Orthogonality and Orthogonal Complements
1.2.1 Orthogonal Vectors
• Orthogonal. Two vectors x, y in an inner product space X are orthogonal, denoted x ⊥ y, if ⟨x, y⟩ = 0.
• Mutually orthogonal. Nonzero vectors x^(1), x^(2), ..., x^(d) are said to be mutually orthogonal if ⟨x^(i), x^(j)⟩ = 0 whenever i ≠ j. In other words, each vector is orthogonal to all other vectors in the collection.
• Mutually orthogonal vectors are linearly independent, but linearly independent vectors are not necessarily mutually orthogonal.
1.2.2 Orthogonal Complement
• Orthogonal complement. A vector x ∈ X is orthogonal to a subset S of an inner product space X if x ⊥ s, ∀s ∈ S. The set of vectors in X that are orthogonal to S is called the orthogonal complement of S, denoted as S⊥.
• Direct sum and orthogonal decomposition. If S is a subspace of an inner product space X, then any vector x ∈ X can be written in a unique way as the sum of one element in S and one in the orthogonal complement S⊥.
$$X = S \oplus S^\perp, \quad \text{for any subspace } S \subseteq X$$
$$x = y + z, \quad x \in X, \; y \in S, \; z \in S^\perp$$
• Fundamental properties of inner product spaces. Let x, z be any two elements of an inner product space X, let ||x|| = √⟨x, x⟩, and let α be a scalar. Then:
– |⟨x, z⟩| ≤ ||x|| ||z||, and equality holds iff x = αz or z = 0 (Cauchy-Schwarz)
– ||x + z||² + ||x − z||² = 2||x||² + 2||z||² (parallelogram law)
– if x ⊥ z, then ||x + z||² = ||x||² + ||z||² (Pythagoras theorem)
– for any subspace S ⊆ X it holds that X = S ⊕ S⊥
– for any subspace S ⊆ X it holds that dim X = dim S + dim S⊥
Figure 1.1: Left: a two-dimensional subspace S in R^3 and its orthogonal complement S⊥. Right: any vector can be written as the sum of an element y in a subspace S and an element z in its orthogonal complement S⊥.
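The decomposition x = y + z and the Pythagoras property can be verified numerically. A minimal sketch with numpy; the choice of subspace (the xy-plane in R^3) and the test vector are my own illustrative assumptions:

```python
import numpy as np

# Orthogonal decomposition x = y + z, y in S, z in S_perp, using the
# projector Q @ Q.T for an orthonormal basis Q of S (columns).
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])           # orthonormal basis of the xy-plane
x = np.array([2.0, -1.0, 5.0])

y = Q @ (Q.T @ x)                    # component in S: (2, -1, 0)
z = x - y                            # component in S_perp: (0, 0, 5)
print(np.isclose(y @ z, 0.0))        # the two parts are orthogonal
print(np.isclose(x @ x, y @ y + z @ z))  # Pythagoras: ||x||^2 = ||y||^2 + ||z||^2
```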
1.3 Inner Product, Norms and Angles
1.3.1 Inner Product
• Inner product. The inner product (scalar product, dot product) on a (real) vector space X is a real-valued function that maps any pair of elements x, y ∈ X to a scalar denoted ⟨x, y⟩.
• Axioms. The inner product satisfies the following axioms: for any x, y, z ∈ X and scalar α,
– ⟨x, x⟩ ≥ 0
– ⟨x, x⟩ = 0 if and only if x = 0
– ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
– ⟨αx, y⟩ = α⟨x, y⟩
– ⟨x, y⟩ = ⟨y, x⟩
• The standard inner product defined on R^n is the "row-column" product of two vectors:
$$\langle x, y \rangle = x^T y = \sum_{i=1}^{n} x_i y_i$$
• Orthogonality. Two vectors x, y ∈ R^n are orthogonal if x^T y = 0.
1.3.2 Norms
• When we try to define the notion of size, or length, of a vector in high dimensions (not just a scalar), we are faced with many choices. These choices are called norms.
• The norm of a vector x, denoted ||x||, is a real-valued function that maps any element x ∈ X to a real number ||x|| satisfying a set of rules that the notion of size should obey.
• Definition of norm. A function from X to R is a norm if
1. ||x|| ≥ 0 for all x ∈ X, and ||x|| = 0 if and only if x = 0
2. ||x + y|| ≤ ||x|| + ||y||, for any x, y ∈ X (triangle inequality)
3. ||αx|| = |α| ||x||, for any scalar α and any x ∈ X
• The Euclidean norm (l2-norm). The Euclidean norm corresponds to the usual notion of distance in two or three dimensions. The set of points with equal l2-norm is a circle (in 2D), a sphere (in 3D), or a hyper-sphere in higher dimensions.
$$||x||_2 = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x} = \sqrt{\langle x, x \rangle}$$
• The l1-norm. The l1-norm corresponds to the distance travelled on a rectangular grid to go from one point to another.
• The l∞-norm. The l∞-norm takes the largest component in the vector, ||x||_∞ = max_{1≤i≤n} |x_i|. It is useful in measuring peak values.
• Cardinality (l0-norm). The cardinality of a vector x is defined as the number of nonzero elements in x:
$$\operatorname{card}(x) = \sum_{k=1}^{n} I(x_k \neq 0), \quad \text{where } I(x_k \neq 0) = \begin{cases} 1 & \text{if } x_k \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
It is often called the l0-norm, ||x||_0, although it is not a norm in the proper sense, since it doesn't satisfy the third property.
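The norms above are all available through numpy; this sketch (the example vector is my own) computes each one and shows why cardinality fails the homogeneity property:

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])
l2 = np.linalg.norm(x)            # sqrt(9 + 16) = 5
l1 = np.linalg.norm(x, 1)         # |3| + |0| + |-4| = 7
linf = np.linalg.norm(x, np.inf)  # max |x_i| = 4
card = np.count_nonzero(x)        # "l0": number of nonzeros = 2
print(l2, l1, linf, card)
# card is not a true norm: scaling x by 2 leaves card unchanged,
# violating ||a*x|| = |a| ||x||.
print(np.count_nonzero(2 * x) == card)  # True
```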
1.3.3 Angles Between Vectors
• The angle θ between vectors x, y is given by
$$\cos\theta = \frac{x^T y}{||x||_2 \, ||y||_2}$$
• The notion above generalizes the usual notion of angle in two dimensions to higher dimensions.
• It is useful in measuring the similarity (closeness) between two vectors.
• When the two vectors are orthogonal, x^T y = 0, the angle between them is θ = 90°.
• When the angle is 0° or 180°, then x is aligned (parallel) with y: y = αx. In this situation, |x^T y| achieves its maximum value |α| ||x||_2².
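The angle formula translates directly into code. A small numpy sketch (the helper name `angle_deg` is mine); the clip guards against tiny floating-point overshoot outside [-1, 1]:

```python
import numpy as np

# Angle between two vectors via cos(theta) = x.y / (||x|| ||y||).
def angle_deg(x, y):
    c = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

x = np.array([1.0, 0.0])
print(angle_deg(x, np.array([0.0, 2.0])))   # orthogonal: ~90
print(angle_deg(x, np.array([3.0, 0.0])))   # aligned: ~0
print(angle_deg(x, np.array([-1.0, 0.0])))  # opposite: ~180
```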
1.3.4 Cauchy-Schwarz Inequality
• Since |cos θ| ≤ 1, it follows that for any two vectors x, y ∈ R^n, we have
$$|\cos\theta| = \frac{|x^T y|}{||x||_2 \, ||y||_2} \leq 1 \quad\Longrightarrow\quad |x^T y| \leq ||x||_2 \cdot ||y||_2$$
• When x, y are collinear (lie on a single straight line through the origin), the above inequality holds with equality.
1.4 Projection on a Line
1.4.1 Definition
• A line in R^n passing through x_0 ∈ R^n with direction u ∈ R^n is the set
$$\{ x_0 + tu : t \in \mathbb{R} \}$$
• The projection of a given point x on the line is a vector z located on the line that is closest to x (in Euclidean norm).
• This corresponds to the optimization problem (least-squares):
$$\min_t \; ||x - (x_0 + tu)||_2$$
• Example. Consider the projection of the vector x = (1.6, 2.28) on a line passing through the origin x_0 = 0 with normalized direction u = (0.89, 0.45). At optimality the residual vector x − z is orthogonal to the line, hence z = tu, with magnitude t = x^T u = 2.0035 and direction u. The scalar t = x^T u (equivalently u^T x, the scalar product between x and u) is the component of x along the normalized direction u. Any other point on the line is farther away from the point x than its projection z = tu = (x^T u)u = (u^T x)u.
Figure 1.2
1.4.2 Closed-form Expression
• Assuming that u is normalized (||u||_2 = 1), the objective function of the projection problem, after squaring, is
$$||(x - x_0) - tu||_2^2 = ||x - x_0||_2^2 - 2t\, u^T(x - x_0) + t^2 ||u||_2^2 \quad (||u||_2 = 1)$$
$$= ||x - x_0||_2^2 - 2t\, u^T(x - x_0) + t^2$$
$$= \left(t - u^T(x - x_0)\right)^2 + \text{constant}$$
Thus, the optimal solution to the projection problem is
$$t^* = u^T(x - x_0)$$
and the projected vector is
$$z^* = x_0 + t^* u = x_0 + \left(u^T(x - x_0)\right) u$$
• The scalar product u^T(x − x_0) is the component of x − x_0 along the direction u.
• If u is not normalized, we replace u with its scaled version u/||u||_2 (u is a vector and ||u||_2 is a scalar):
$$z^* = x_0 + \left(\frac{u}{||u||_2}\right)^{\!T} (x - x_0)\, \frac{u}{||u||_2} = x_0 + \frac{u^T(x - x_0)}{||u||_2^2}\, u = x_0 + \frac{u^T(x - x_0)}{u^T u}\, u$$
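The closed-form expression for a general (not necessarily normalized) direction is easy to implement. A sketch with numpy; the function name and example values are my own:

```python
import numpy as np

# Projection of x on the line x0 + t*u, using t* = u.(x - x0) / u.u,
# which matches the last formula above for unnormalized u.
def project_on_line(x, x0, u):
    t = u @ (x - x0) / (u @ u)
    return x0 + t * u

x = np.array([2.0, 3.0])
x0 = np.array([0.0, 0.0])
u = np.array([2.0, 0.0])               # unnormalized direction

z = project_on_line(x, x0, u)
print(z)                               # (2, 0): foot of the perpendicular
# optimality check: the residual x - z is orthogonal to the direction u
print(np.isclose((x - z) @ u, 0.0))    # True
```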
1.4.3 Interpreting the Scalar Product
• We can interpret the scalar product (inner product) between two non-zero vectors x, u as the projection of x on the line of direction u passing through the origin.
• If u is normalized (||u||_2 = 1), then the projection of x on L is z^* = (u^T x)u. Its length is ||z^*||_2 = |u^T x| (corresponding to t in the figure above).
• In general, the scalar product u^T x / ||u||_2 is the component of x along the normalized direction u/||u||_2 defined by u.
1.4.4 Euclidean Projection on a Set
• A Euclidean projection of a point x_0 ∈ R^n on a set S ⊆ R^n is a point that achieves the smallest Euclidean distance from x_0 to the set.
• This corresponds to the solution to the optimization problem:
$$\min_x \; ||x - x_0||_2 : x \in S$$
• When the set S is convex, there is a unique solution to the above problem. In particular, the projection on an affine subspace is unique.
• Example. Let S be the hyperplane S = {x ∈ R^3 : 2x_1 + x_2 − x_3 = 1}. The projection of x_0 = 0 on S is aligned with the coefficient vector a = (2, 1, −1). Setting x = ta (i.e., x is a point in the direction a with magnitude t) gives the line through the origin that is perpendicular to S. We can solve for the scalar t and obtain t = 1/(a^T a) = 1/6:
$$x = ta = (2t, t, -t) \;\Rightarrow\; 2x_1 + x_2 - x_3 = 4t + t + t = 6t = 1$$
So the projection is x^* = a/(a^T a) = (1/3, 1/6, −1/6).
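This worked example can be reproduced in a few lines of numpy; for a hyperplane a^T x = b, projecting the origin gives (b / a^T a) a:

```python
import numpy as np

# Projecting x0 = 0 on the hyperplane a^T x = b, with a = (2, 1, -1)
# and b = 1, as in the example above.
a = np.array([2.0, 1.0, -1.0])
b = 1.0
x_star = (b / (a @ a)) * a
print(x_star)                          # (1/3, 1/6, -1/6)
print(np.isclose(a @ x_star, b))       # True: the projection lies on S
```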
1.5 Orthogonalization: The Gram-Schmidt Procedure
1.5.1 Orthogonalization
• A basis (u_i)_{i=1}^n is orthogonal if u_i^T u_j = 0 for i ≠ j. If in addition ||u_i||_2 = 1 for all i, we say that the basis is orthonormal.
• Orthogonalization refers to the procedure that finds an orthonormal basis of the span of given vectors.
• Given vectors a_1, ..., a_k ∈ R^n, an orthogonalization procedure finds an orthonormal basis for the span of the vectors a_1, ..., a_k.
• The orthogonalization procedure computes vectors q_1, ..., q_r ∈ R^n such that
$$S = \operatorname{span}(a_1, \ldots, a_k) = \operatorname{span}(q_1, \ldots, q_r)$$
where r is the dimension of S, and
$$q_i^T q_j = 0, \; i \neq j \quad \text{(mutually orthogonal)}$$
$$q_i^T q_i = 1, \; 1 \leq i \leq r \quad \text{(normalized)}$$
The vectors q_1, ..., q_r form an orthonormal basis for the span of the vectors a_1, ..., a_k.
1.5.2 Basic Step: Projection on a Line
• Consider the line L(q) = {tq : t ∈ R} passing through zero, where q ∈ R^n is given and normalized (||q||_2 = 1). The projection of a given point a ∈ R^n on the line is a vector a_proj located on the line that is closest to a, which corresponds to the problem:
$$\min_t \; ||a - tq||_2$$
• The projection of a on the line L(q) is the vector a_proj = t^* q, where t^* is the optimal value. The solution has a closed-form expression: a_proj = (q^T a) q.
• The vector a can be written as the sum of its projection a_proj and the component a − a_proj, which is orthogonal to the line, where a_proj = (q^T a)q and a − a_proj = a − (q^T a)q:
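The full Gram-Schmidt procedure repeats this basic step: subtract from each a_i its projections on the q's found so far, then normalize. A minimal sketch in numpy (the function name and example vectors are mine); near-zero residuals are skipped, which is how dependent input vectors get dropped:

```python
import numpy as np

# Minimal Gram-Schmidt: for each input vector, remove its components
# along the orthonormal vectors found so far, then normalize the rest.
def gram_schmidt(vectors, tol=1e-10):
    basis = []
    for a in vectors:
        r = a - sum((q @ a) * q for q in basis)  # residual after projections
        if np.linalg.norm(r) > tol:              # skip dependent vectors
            basis.append(r / np.linalg.norm(r))
    return basis

a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])
a3 = 2 * a1 - a2                                 # dependent: will be dropped
qs = gram_schmidt([a1, a2, a3])
print(len(qs))                                   # 2 = dim span(a1, a2, a3)
Q = np.column_stack(qs)
print(np.allclose(Q.T @ Q, np.eye(2)))           # orthonormal: Q^T Q = I
```

In practice `numpy.linalg.qr` computes the same orthonormal basis more stably; the loop above is only meant to mirror the procedure described in the text.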