The Kernel Trick, Gram Matrices, and Feature Extraction

The Kernel Trick, Gram Matrices, and Feature Extraction
CS6787 Lecture 4 — Fall 2017

Momentum for Principal Component Analysis
CS6787 Lecture 3.1 — Fall 2017

Principal Component Analysis
• Setting: find the dominant eigenvalue-eigenvector pair of a positive semidefinite symmetric matrix A:
$u_1 = \arg\max_x \frac{x^T A x}{x^T x}$
• Many ways to write this problem. For example, with the Frobenius norm $\|B\|_F = \sqrt{\sum_i \sum_j B_{ij}^2}$,
$\sqrt{\lambda_1}\, u_1 = \arg\min_x \|x x^T - A\|_F$

PCA: A Non-Convex Problem
• PCA is not convex in any of these formulations.
• Why? Think about the solutions to the problem: u and −u.
• Two distinct solutions → can't be convex.
• Can we still use momentum to run PCA more quickly?

Power Iteration
• Before we apply momentum, we need to choose what base algorithm we're using.
• Simplest algorithm: power iteration.
• Repeatedly multiply by the matrix A to get an answer:
$x_{t+1} = A x_t$

Why does Power Iteration Work?
• Let the eigendecomposition of A be $A = \sum_{i=1}^n \lambda_i u_i u_i^T$, with $\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_n$.
• Power iteration converges in direction because the cosine-squared of the angle to $u_1$ is
$\cos^2(\theta) = \frac{(u_1^T x_t)^2}{\|x_t\|^2} = \frac{(u_1^T A^t x_0)^2}{\|A^t x_0\|^2} = \frac{\lambda_1^{2t} (u_1^T x_0)^2}{\sum_{i=1}^n \lambda_i^{2t} (u_i^T x_0)^2} = 1 - \frac{\sum_{i=2}^n \lambda_i^{2t} (u_i^T x_0)^2}{\sum_{i=1}^n \lambda_i^{2t} (u_i^T x_0)^2} = 1 - \Omega\left(\left(\frac{\lambda_2}{\lambda_1}\right)^{2t}\right)$

What about a more general algorithm?
• Use both the current iterate and the history of past iterates:
$x_{t+1} = \alpha_t A x_t + \beta_{t,1} x_{t-1} + \beta_{t,2} x_{t-2} + \cdots + \beta_{t,t} x_0$
• for fixed parameters α and β.
• What class of functions can we express in this form?
• Notice: $x_t$ is always a degree-t polynomial in A times $x_0$.
• Can prove by induction that we can express ANY such polynomial.

Power Iteration and Polynomials
• Can also think of power iteration as a degree-t polynomial of A: $x_t = A^t x_0$.
• Is there a better degree-t polynomial to use than $f_t(x) = x^t$?
• If we use a different polynomial, then we get
$x_t = f_t(A) x_0 = \sum_{i=1}^n f_t(\lambda_i) u_i u_i^T x_0$
• Ideal solution: choose a polynomial with zeros at all the non-dominant eigenvalues.
• Practically, make $f_t(\lambda_1)$ as large as possible while keeping $|f_t(\lambda)| < 1$ for all $|\lambda| < \lambda_2$.

Chebyshev Polynomials Again
• It turns out that Chebyshev polynomials solve this problem.
• Recall: $T_0(x) = 1$, $T_1(x) = x$, and $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$.
• Nice property: $|x| \le 1 \Rightarrow |T_n(x)| \le 1$.

[Figure: plots of the Chebyshev polynomials $T_0(u) = 1$, $T_1(u) = u$, $T_2(u) = 2u^2 - 1$, and several higher-order polynomials over the range $[-1.5, 1.5]$.]
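As a quick numerical companion to the recurrence above (a NumPy sketch of my own, not part of the original slides), the code below evaluates $T_n$ directly and checks the boundedness on $[-1, 1]$; the last two lines preview the off-interval growth rate stated on the next slide.

```python
import numpy as np

def chebyshev(n, x):
    """Evaluate the Chebyshev polynomial T_n(x) with the recurrence
    T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x), starting from T_0(x) = 1, T_1(x) = x."""
    t_prev, t_curr = np.ones_like(x), x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

grid = np.linspace(-1.0, 1.0, 1001)
print(np.abs(chebyshev(10, grid)).max())          # max of |T_10| on [-1, 1] is 1 (up to rounding)

eps = 0.1
print(float(chebyshev(10, np.array(1.0 + eps))))  # explodes just outside the interval ...
print((1.0 + np.sqrt(2.0 * eps)) ** 10)           # ... at roughly the rate (1 + sqrt(2*eps))^n
```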
Chebyshev Polynomials Again
• It turns out that Chebyshev polynomials solve this problem.
• Recall: $T_0(x) = 1$, $T_1(x) = x$, and $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$.
• Nice properties: $|x| \le 1 \Rightarrow |T_n(x)| \le 1$, and $T_n(1+\epsilon) \approx \Theta\left(\left(1 + \sqrt{2\epsilon}\right)^n\right)$.

Using Chebyshev Polynomials
• So we can choose our polynomial f in terms of T.
• Want: $f_t(\lambda_1)$ to be as large as possible, subject to $|f_t(\lambda)| < 1$ for all $|\lambda| < \lambda_2$.
• To make this work, set
$f_n(x) = T_n\left(\frac{x}{\lambda_2}\right)$
• Can do this by running the update
$x_{t+1} = \frac{2A}{\lambda_2} x_t - x_{t-1} \;\Rightarrow\; x_t = T_t\left(\frac{A}{\lambda_2}\right) x_0$

Convergence of Momentum PCA
$\frac{x_t}{\|x_t\|} = \frac{\sum_{i=1}^n T_t\left(\frac{\lambda_i}{\lambda_2}\right) u_i u_i^T x_0}{\sqrt{\sum_{i=1}^n T_t^2\left(\frac{\lambda_i}{\lambda_2}\right) (u_i^T x_0)^2}}$
• Cosine-squared of the angle to the dominant component:
$\cos^2(\theta) = \frac{(u_1^T x_t)^2}{\|x_t\|^2} = \frac{T_t^2\left(\frac{\lambda_1}{\lambda_2}\right) (u_1^T x_0)^2}{\sum_{i=1}^n T_t^2\left(\frac{\lambda_i}{\lambda_2}\right) (u_i^T x_0)^2}$

Convergence of Momentum PCA (continued)
$\cos^2(\theta) = 1 - \frac{\sum_{i=2}^n T_t^2\left(\frac{\lambda_i}{\lambda_2}\right) (u_i^T x_0)^2}{\sum_{i=1}^n T_t^2\left(\frac{\lambda_i}{\lambda_2}\right) (u_i^T x_0)^2} \ge 1 - \frac{\sum_{i=2}^n (u_i^T x_0)^2}{T_t^2\left(\frac{\lambda_1}{\lambda_2}\right) (u_1^T x_0)^2} = 1 - \Omega\left(T_t^{-2}\left(\frac{\lambda_1}{\lambda_2}\right)\right)$

Convergence of Momentum PCA (continued)
$\cos^2(\theta) \ge 1 - \Omega\left(T_t^{-2}\left(\frac{\lambda_1}{\lambda_2}\right)\right) = 1 - \Omega\left(T_t^{-2}\left(1 + \frac{\lambda_1 - \lambda_2}{\lambda_2}\right)\right) = 1 - \Omega\left(\left(1 + \sqrt{\frac{2(\lambda_1 - \lambda_2)}{\lambda_2}}\right)^{-2t}\right)$
• Recall that standard power iteration had:
$\cos^2(\theta) = 1 - \Omega\left(\left(\frac{\lambda_2}{\lambda_1}\right)^{2t}\right) = 1 - \Omega\left(\left(1 + \frac{\lambda_1 - \lambda_2}{\lambda_2}\right)^{-2t}\right)$
• So the momentum rate is asymptotically faster than power iteration.

Questions?

The Kernel Trick, Gram Matrices, and Feature Extraction
CS6787 Lecture 4 — Fall 2017

Basic Linear Models
• For classification using a model vector w:
$\text{output} = \operatorname{sign}(w^T x)$
• Optimization methods vary; here's logistic regression (with $y_i \in \{-1, 1\}$):
$\operatorname{minimize}_w \; \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp(-w^T x_i y_i)\right)$

Benefits of Linear Models
• Fast classification: just one dot product.
• Fast training/learning: just a few basic linear algebra operations.
• Drawback: limited expressivity.
• Can only capture linear classification boundaries → bad for many problems.
• How do we let linear models represent a broader class of decision boundaries, while retaining the systems benefits?

The Kernel Method
• Idea: in a linear model we can think about the similarity between two training examples x and y as being $x^T y$.
• This is related to the rate at which a random classifier will separate x and y.
• Kernel methods replace this dot-product similarity with an arbitrary kernel function that computes the similarity between x and y:
$K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$

Kernel Properties
• What properties do kernels need to have to be useful for learning?
• Key property: the kernel must be symmetric: $K(x, y) = K(y, x)$.
• Key property: the kernel must be positive semi-definite:
$\forall c_i \in \mathbb{R},\; x_i \in \mathcal{X}: \quad \sum_{i=1}^n \sum_{j=1}^n c_i c_j K(x_i, x_j) \ge 0$
• Can check that the dot product has this property.

Facts about Positive Semidefinite Kernels
• Sum of two PSD kernels is a PSD kernel: $K(x, y) = K_1(x, y) + K_2(x, y)$ is a PSD kernel.
• Product of two PSD kernels is a PSD kernel: $K(x, y) = K_1(x, y) K_2(x, y)$ is a PSD kernel.
• Scaling by any function on both sides is a kernel: $K(x, y) = f(x) K_1(x, y) f(y)$ is a PSD kernel.

Other Kernel Properties
• Useful property: kernels are often non-negative: $K(x, y) \ge 0$.
• Useful property: kernels are often scaled such that $K(x, y) \le 1$, and $K(x, y) = 1 \Leftrightarrow x = y$.
• These properties capture the idea that the kernel is expressing the similarity between x and y.

Common Kernels
• Gaussian kernel/RBF kernel: the de-facto kernel in machine learning:
$K(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$
• We can validate that this is a kernel:
• Symmetric? ✅
• Positive semi-definite? ✅ WHY?
• Non-negative? ✅
• Scaled so that K(x, x) = 1? ✅
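As a small sanity check on these claims (a NumPy sketch of my own, not from the lecture), the snippet below builds the Gaussian kernel matrix for a handful of random points and verifies symmetry, the $K(x, x) = 1$ scaling, and positive semi-definiteness numerically. The eigenvalue test only confirms PSD-ness for this particular sample rather than proving it in general.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian/RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.random.default_rng(0).standard_normal((20, 3))
K = rbf_kernel(X, X)

print(np.allclose(K, K.T))                     # symmetric
print(np.allclose(np.diag(K), 1.0))            # K(x, x) = 1
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # no significantly negative eigenvalues on this sample
```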
Common Kernels (continued)
• Linear kernel: just the inner product $K(x, y) = x^T y$.
• Polynomial kernel: $K(x, y) = (1 + x^T y)^p$.
• Laplacian kernel: $K(x, y) = \exp\left(-\beta \|x - y\|_1\right)$.
• Last layer of a neural network: if the last layer outputs $\phi(x)$, then the kernel is $K(x, y) = \phi(x)^T \phi(y)$.

Classifying with Kernels
• An equivalent way of writing a linear model on a training set is
$\text{output}(x) = \operatorname{sign}\left(\sum_{i=1}^n w_i x_i^T x\right)$
• We can kernel-ize this by replacing the dot products with kernel evals:
$\text{output}(x) = \operatorname{sign}\left(\sum_{i=1}^n w_i K(x_i, x)\right)$

Learning with Kernels
• An equivalent way of writing linear-model logistic regression is
$\operatorname{minimize}_w \; \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp\left(-\left(\sum_{j=1}^n w_j x_j\right)^T x_i y_i\right)\right)$
• We can kernel-ize this by replacing the dot products with kernel evals:
$\operatorname{minimize}_w \; \frac{1}{n} \sum_{i=1}^n \log\left(1 + \exp\left(-y_i \sum_{j=1}^n w_j K(x_j, x_i)\right)\right)$

The Computational Cost of Kernels
• Recall: the benefit of learning with kernels is that we can express a wider class of classification functions.
• Recall: another benefit is that linear classifier learning problems are "easy" to solve because they are convex, and gradients are easy to compute.
• Major cost of learning naively with kernels: have to evaluate K(x, y).
• For SGD, need to do this effectively n times per update.
• Computationally intractable unless K is very simple.

The Gram Matrix
• Address this computational problem by pre-computing the kernel function for all pairs of training examples in the dataset.
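To make the payoff of precomputing concrete, here is a rough training sketch (my own illustration, assuming a precomputed Gram matrix $G$ with $G_{ij} = K(x_i, x_j)$, e.g. built with the hypothetical rbf_kernel helper above); it runs full-gradient descent on the kernelized logistic loss from the earlier slide using only lookups into $G$, never re-evaluating the kernel inside the loop.

```python
import numpy as np

def train_kernel_logreg(G, y, lr=0.5, num_iters=500):
    """Gradient descent on (1/n) * sum_i log(1 + exp(-y_i * sum_j w_j K(x_j, x_i))).

    G is the precomputed n x n Gram matrix with G[i, j] = K(x_i, x_j);
    y holds labels in {-1, +1}.
    """
    n = G.shape[0]
    w = np.zeros(n)
    for _ in range(num_iters):
        margins = y * (G @ w)                                   # y_i * sum_j w_j K(x_j, x_i)
        s = 1.0 / (1.0 + np.exp(np.clip(margins, -500, 500)))   # sigmoid(-margin), clipped for stability
        grad = -(G @ (s * y)) / n                               # gradient of the averaged logistic loss
        w -= lr * grad
    return w

# Hypothetical usage with the rbf_kernel sketch from earlier:
# G = rbf_kernel(X_train, X_train)
# w = train_kernel_logreg(G, y_train)
# predictions = np.sign(rbf_kernel(X_test, X_train) @ w)       # sign(sum_j w_j K(x_j, x_test))
```

With the Gram matrix in hand, each gradient step reduces to a dense matrix-vector product instead of n fresh kernel evaluations per example, which is the point the next slides build on.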