
The Field of Values of a Matrix and Neural Networks

George M. Georgiou, Senior Member, IEEE

George M. Georgiou is with the School of Computer Science and Engineering, California State University, San Bernardino, CA 92407, USA; email: [email protected].

Abstract—The field of values of a matrix, also known as the numerical range, is introduced in the context of neural networks. Using neural network techniques, an algorithm and a generalization are developed that find eigenpairs of a normal matrix. The dynamics of the algorithm can be observed on the complex plane. Only limited visualization is possible in the case when the matrix is Hermitian (or real symmetric) since the field of values is confined to the real line. The eigenpairs can serve as stored memories, which are recalled by using the algorithm. Shifting in the algorithm is also discussed, which assists in finding other eigenpairs. Trajectories of runs of the algorithm are visually presented, through which the behavior of the algorithms is elucidated.

Index Terms—Complex-valued neural networks, field of values, numerical range, eigenvectors, eigenvalues, normal matrices.

I. INTRODUCTION

In this paper, the field of values of a matrix [1], [2], which is also known as the numerical range, is introduced in the context of neural networks, and in particular, in that of complex-valued neural networks. It seems that the field of values has not been previously used in any significant way in neural networks or in other engineering applications. For an n × n matrix A with complex entries, the field of values F is defined as

F(A) ≡ {X*AX : X ∈ C^n and ‖X‖ = 1},   (1)

where X* denotes the conjugate transpose of X. F(A) is a connected, convex and compact subset of C. It can be thought of as a picture of the matrix that provides useful information about it. [3] For example, the F(A) of a Hermitian matrix is a line segment on the real line, whereas for a normal matrix, in general, it is a polygon.

A learning rule, and a generalization, that computes eigenvectors and eigenvalues of normal matrices will be derived. Properties of the field of values and neural techniques, such as gradient ascent and constrained optimization, were used in its development. Necessarily, the backdrop of the learning rule is the complex domain.

Complex-valued neural networks are an active area of research with many applications. [4]–[10] While in most cases complex-valued connection matrices have been taken to be Hermitian, e.g. that of the complex Hopfield network, there has been early work in the area that suggested that more general complex connection matrices provide a richer set of dynamic behavior. [11], [12] The present work mainly deals with normal matrices, which include Hermitian matrices as a subset, and in particular with the iterative computation of eigenvalues and eigenvectors of normal matrices.

There is a plethora of algorithms in neural networks which involve computing the eigenvectors and eigenvalues of matrices. Some well known examples include Oja's rule [13], for extracting the principal eigenvector of the correlation matrix of the inputs, and Sanger's algorithm [14] and the APEX algorithm [15], [16], the last two of which extract multiple eigenvectors by employing lateral connections at the output neurons, implicitly or explicitly. A fairly inclusive and comparative discussion of such algorithms can be found in [10]. In addition, the complex-valued counterparts of the algorithms are also found in the same reference. Some examples of neural computation (in the real domain) that compute eigenpairs of given matrices include [17]–[20].

A. The field of values

We will look into the properties of the field of values of an n × n matrix with complex entries; the set of such matrices is denoted by M_n(C), or simply by M_n.

Several definitions and results from the theory of matrices will be needed and will be given below. Proofs of the theorems and other information on F(A) can be found in these references [1], [2].

Theorem 1: Let A be in M_n. A is normal if and only if there exist a unitary matrix U and a diagonal matrix D such that A = UDU*.

In this previous theorem, the diagonal entries of D are the eigenvalues of A, and the columns of U are the corresponding eigenvectors.
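As an illustration of definition (1), F(A) can be explored numerically by evaluating X*AX for many random unit vectors X. The following NumPy sketch is not from the paper; it is a minimal illustration of the definition (random sampling tends to cluster toward the interior, so it complements rather than replaces the boundary construction discussed later).

```python
import numpy as np

def sample_field_of_values(A, num_samples=5000, rng=None):
    """Return sample points X* A X of F(A) for random unit vectors X (Eq. (1))."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[0]
    points = np.empty(num_samples, dtype=complex)
    for k in range(num_samples):
        X = rng.standard_normal(n) + 1j * rng.standard_normal(n)
        X /= np.linalg.norm(X)            # restrict X to the unit sphere
        points[k] = X.conj() @ A @ X      # X* A X, one point of F(A)
    return points

# Example: a random 3x3 complex matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
print(sample_field_of_values(A, num_samples=5, rng=rng))
```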


Definition 1: Let A in M_n be a given matrix. The field of values or numerical range of A is defined to be the subset of the complex plane

F(A) ≡ {X*AX : X ∈ C^n and ‖X‖ = 1}.   (2)

Thus the field of values is the set of points X*AX on the complex plane when X takes all values on the unit sphere. Of great importance is the nature and shape of F(A), which are discussed next.

B. The geometry of F(A)

Here we present properties of the field of values that are useful in understanding the geometry of F(A).

Theorem 2: Let A be in M_n. Then, F(A) is a compact and convex subset of the complex plane.

We denote the spectrum of a matrix A ∈ M_n, that is, its set of eigenvalues, by σ(A), and the convex hull of a set S ⊂ C by Co(S).

Property 1: If α ∈ C, F(αA) = αF(A).

Property 2: If β ∈ C and I is the identity matrix, F(A + βI) = F(A) + β.

Property 3 (Normality): If A is a normal matrix in M_n, then F(A) = Co(σ(A)).

Since σ(A) is a finite set of points, at most n, the field of values F(A) of a normal matrix A is a polygon, possibly collapsed to a line segment or to a point. Since all eigenvalues of a Hermitian matrix are real, and Hermitian matrices are normal, it can be concluded that their field of values is a line segment on the real line, with the endpoints being the extreme eigenvalues.

It has been proven that for any A ∈ M_2, F(A) is an ellipse, which, however, could degenerate to a line segment or a point. For n ≥ 3 dimensions, there is a great variety of shapes of F(·). However, there is no general characterization. [2, p. 48]

The direct sum of two matrices, A ⊕ B, A ∈ M_k and B ∈ M_l, is a new matrix C ∈ M_{k+l} formed by placing the matrices so that their main diagonals taken together are the diagonal of C, and the remaining entries are zero:

C = A ⊕ B ≡ [ A  0
              0  B ].   (3)

Property 4: For all A ∈ M_k and B ∈ M_l,

F(A ⊕ B) = Co(F(A) ∪ F(B)).   (4)

The latter property gives us a means of constructing complicated F(·) from simpler ones.

Definition 2: Let A ∈ M_n. A point α on the boundary of F(A) is called a sharp point [2, p. 50] if there are angles θ_1 and θ_2 with 0 ≤ θ_1 < θ_2 ≤ π for which

Re(e^{iθ}α) = max{Re(β) : β ∈ F(e^{iθ}A)}  for all θ ∈ (θ_1, θ_2),   (5)

where Re(·) indicates the real part.

Intuitively, a sharp point is a "corner" on the boundary of the field of values; that is, a point where there are two tangent vectors, depending on the direction of approach. Thus, the polygon vertices of the field of values of a normal matrix, its eigenvalues, are sharp points. An eigenvalue that happens to be collinear with two adjacent ones will not be a sharp point. Sharp points are called extreme points in [1], where an alternative definition is used.

Theorem 3: If α is a sharp point of F(A), where A ∈ M_n, then α is an eigenvalue of A.

The next theorem characterizes sharp points.

Theorem 4: Let A ∈ M_n and α be a sharp point of F(A). Then, the unit vector X for which α = X*AX is an eigenvector of A. [2, p. 55]

If A is a normal matrix with distinct eigenvalues, the last two theorems imply that if we find a unit vector X such that X*AX is a vertex of F(A), then X*AX is an eigenvalue of A and X is the corresponding eigenvector.

C. Examples of field of values

Plotting of the field of values of matrices can be done by generating its boundary point-by-point, and connecting the points. [2, p. 33] Figures 1 through 4 show the field of values of various matrices. The + signs in the plots indicate the eigenvalues.

D. Constructing normal matrices

A normal matrix with a given set of eigenvalues and corresponding eigenvectors can be constructed by using the unitary decomposition of normal matrices (Theorem 1):

A = UDU*.   (6)

The columns of the unitary matrix U are the normalized (orthogonal) eigenvectors and D is a diagonal matrix with entries being the corresponding eigenvalues.

To construct a normal matrix with given eigenvalues and random eigenvectors, the QR factorization [21] can be used. Given a matrix N with random entries, it can be decomposed as N = QR, where Q is a unitary matrix and R is an upper triangular matrix. Matrix R is discarded. Using Q in the place of U in Equation (6), normal matrix A is obtained.
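The construction of Section I-D translates directly into NumPy; the sketch below is illustrative (the helper name is ours), using the QR factorization of a random complex matrix to supply the unitary factor.

```python
import numpy as np

def random_normal_matrix(eigenvalues, rng=None):
    """Build A = Q D Q* with the prescribed eigenvalues and random
    orthonormal eigenvectors (the columns of Q), as in Section I-D."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(eigenvalues)
    N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(N)                       # Q is unitary; R is discarded
    D = np.diag(np.asarray(eigenvalues, dtype=complex))
    return Q @ D @ Q.conj().T, Q

# Example: eigenvalues at the six 6-th roots of unity (used in Section IV)
eigs = np.exp(1j * 2 * np.pi * np.arange(6) / 6)
C, Q = random_normal_matrix(eigs, rng=np.random.default_rng(1))
print(np.allclose(C @ C.conj().T, C.conj().T @ C))   # C is normal
print(np.sort_complex(np.linalg.eigvals(C)))         # the prescribed spectrum
```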


Fig. 1. F(A), A ∈ M_2. A is random.

Fig. 3. F(A), A ∈ M_5. A is Hermitian.

Fig. 2. F(A), A ∈ M_10. A is random.

Fig. 4. F(A), A ∈ M_3. A is normal.
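Plots such as Figures 1 through 4 can be produced with the point-by-point boundary procedure mentioned in Section I-C: for each rotation angle θ, a unit eigenvector of the largest eigenvalue of the Hermitian part of e^{iθ}A supplies one boundary point X*AX. This is a hedged sketch, not the author's plotting code; matplotlib is assumed for the plot.

```python
import numpy as np
import matplotlib.pyplot as plt

def field_of_values_boundary(A, num_angles=360):
    """Boundary points of F(A): for each angle theta, take a unit eigenvector X
    of the largest eigenvalue of the Hermitian part of e^{i theta} A;
    X* A X then lies on the boundary of F(A)."""
    boundary = []
    for theta in np.linspace(0.0, 2 * np.pi, num_angles, endpoint=False):
        R = np.exp(1j * theta) * A
        H = (R + R.conj().T) / 2           # Hermitian part of the rotated matrix
        _, V = np.linalg.eigh(H)           # eigenvalues in ascending order
        X = V[:, -1]                       # eigenvector of the largest eigenvalue
        boundary.append(X.conj() @ A @ X)
    return np.array(boundary)

# Example: F(A) of a random 5x5 Hermitian matrix collapses to a real segment (cf. Fig. 3)
rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2
pts = field_of_values_boundary(A)
plt.plot(pts.real, pts.imag, "-")
plt.plot(np.linalg.eigvalsh(A), np.zeros(5), "+")    # eigenvalues marked with +
plt.axis("equal")
plt.show()
```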

II. THE NEW LEARNING RULE

A. Description of the proposed method

A new learning algorithm is derived which can be used to find the eigenvectors of a given normal matrix. The algorithm works best for normal matrices that have distinct eigenvalues and no three of their eigenvalues lie on a line on the complex plane. The algorithm converges to corners in the field of values. For example, Figures 1 and 2 show that all eigenvalues are internal to the field of values. The learning rule presented here will not converge to any of the eigenvalues of the corresponding (non-normal) matrices.

Theorem 5: Let f be a convex function defined on the bounded, closed convex set Ω. If f has a maximum over Ω, it is achieved at an extreme point of Ω. [22, p. 198]

Since the field of values F(C) is a (polygonal) bounded, compact convex set, and f(z) = z̄z = |z|^2, z ∈ F(C), is a convex function, f(z) is maximized when z is an extreme point (a corner) of F(C). The overbar denotes complex conjugation. Note that f(z) is simply the square of the Euclidean distance of the point z ≡ Y*CY from the origin, for some Y ∈ C^n, ‖Y‖ = 1.

We use constrained gradient ascent to obtain Y that solves this optimization problem:

max q̄q,   ‖Y‖ = 1,   (7)

where we define q = Y*CY. Using λ as a Lagrange multiplier, the problem now becomes maximizing the function g:

g(Y) = q̄q − λ(Y*Y − 1).   (8)

The (complex) gradient of the function g: C^n → C with respect to its argument Y ∈ C^n is defined as

∇_Y g(Y) ≡ ∇_{Y_R} g(Y) + i ∇_{Y_I} g(Y),   (9)

where Y is split into real and imaginary parts: Y = Y_R + i Y_I, with Y_R, Y_I ∈ R^n. The gradients in the RHS of (9), involving vectors in R^n, have the usual meaning. As usual, i = √−1.

The gradient ∇_Y g(Y), which will provide the basis for the learning rule, is derived to be

∇_Y g(Y) = 2(qC* + q̄C)Y − 2λY.   (10)
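Before the formal derivation that follows, Equation (10) can be checked numerically: without the constraint term, the real and imaginary parts of ∇_Y E, with E = q̄q, should match central finite differences of E with respect to Y_R and Y_I, per definition (9). The following verification sketch is ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def E(Y):
    q = Y.conj() @ C @ Y
    return (np.conj(q) * q).real                 # E = q-bar q = |q|^2

q = Y.conj() @ C @ Y
analytic = 2 * (q * C.conj().T + np.conj(q) * C) @ Y   # first term of Eq. (10)

eps = 1e-6
numeric = np.empty(n, dtype=complex)
for k in range(n):
    e = np.zeros(n)
    e[k] = eps
    dRe = (E(Y + e) - E(Y - e)) / (2 * eps)            # dE/dY_R[k]
    dIm = (E(Y + 1j * e) - E(Y - 1j * e)) / (2 * eps)  # dE/dY_I[k]
    numeric[k] = dRe + 1j * dIm                        # definition (9)
print(np.max(np.abs(numeric - analytic)))              # small (finite-difference error)
```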


Proof: The first term in the RHS will first be derived. We define E = q̄q, for which

∇_Y E = 2(qC* + q̄C)Y.   (11)

Writing Y in terms of its real and imaginary parts, E is expanded as follows:

E = q̄q = (Y^T C̄ Ȳ)(Y*CY)
  = [(Y_R^T + iY_I^T) C̄ (Y_R − iY_I)] [(Y_R^T − iY_I^T) C (Y_R + iY_I)].   (12)

The gradient in Equation (11) becomes:

∇_Y(E) = ∂E/∂Y_R + i ∂E/∂Y_I   (13)
       = (q ∂q̄/∂Y_R + q̄ ∂q/∂Y_R) + i (q ∂q̄/∂Y_I + q̄ ∂q/∂Y_I)   (14)
       = q((C̄ + C*)Y_R + iC*Y_I − iC̄Y_I) + q̄((C + C^T)Y_R + iCY_I − iC^TY_I)
         + i [ q(iC̄Y_R − iC*Y_R + (C̄ + C*)Y_I) + q̄(iC^TY_R − iCY_R + (C + C^T)Y_I) ]   (15)
       = q(2C*Y) + q̄(2CY)   (16)
       = 2(qC* + q̄C)Y.   (17)

The second term of the RHS in Equation (8) is derived by expanding Y:

Y = (Y_{1R} + iY_{1I}, Y_{2R} + iY_{2I}, ..., Y_{nR} + iY_{nI})^T.   (18)

∇_Y(Y*Y) = ∇_Y(Y_{1R}^2 + Y_{1I}^2 + Y_{2R}^2 + Y_{2I}^2 + ... + Y_{nR}^2 + Y_{nI}^2)   (19)
         = 2Y.   (20)

At equilibrium, the gradient in Equation (10) equals zero, and Y*Y = 1. Setting Equation (10) to zero and left multiplying it by Y*, we have

2qY*C*Y + 2q̄Y*CY = 2λY*Y.   (21)

Using the fact that Y*C*Y = (Y*CY)* = q̄, the previous equation becomes

λ = 2q̄q.   (22)

Substituting λ in Equation (10), we arrive at this gradient learning rule:

∆Y = α((qC* + q̄C)Y − 2q̄qY),   (23)

where α is a small positive constant, the learning rate. A factor of 2 has been absorbed in it. An alternative derivation of (23), using Wirtinger calculus, appears in the Appendix.

The first term in (23) is for maximization, while the second is for normalization. In neural networks, the classic example of maximization with a normalization constraint is Oja's rule [13]. Its complex version can be found in this reference [10]. While Oja's rule also consists of a maximization and a normalization factor, it differs substantially in how it is applied from Equation (23): in Oja's rule the input is varied and the weights converge to the principal component, i.e. the eigenvector that corresponds to the largest eigenvalue of the correlation matrix of the inputs. In the rule derived above, once the initial value of Y is given, the algorithm modifies Y until convergence. If C is a normal matrix, in general, it will converge to an eigenvector Y, and the corresponding eigenvalue will be q = Y*CY.

By rewriting Equation (23) as

∆Y = α((qC* + q̄C) − 2q̄qI)Y
   = α(q(C* − q̄I) + q̄(C − qI))Y,   (24)

where I ∈ M_n is the identity matrix, we note that the quantity by which Y is multiplied is a Hermitian matrix. This follows from the fact that q is a complex scalar and that for any matrix A ∈ M_n, A + A* is Hermitian.

The algorithm of Equation (23), being a gradient optimization method, depends on the initial value of Y. It is possible that, as is the case with the power method algorithm [23], [24], if Y is initialized so that it does not have a component in the direction of an eigenvector, it will not converge to an eigenpair. To understand this scenario, consider a real C and a real initial vector Y: no imaginary component will be generated for Y and q in the course of the algorithm, even if all eigenpairs may have imaginary components. However, numerical roundoff errors may contribute to such a missing component, and the algorithm may work after all. It is best if the real and imaginary parts of the components of Y are initialized with non-zero values.

The algorithm will tend to converge to eigenvalues that are far away from zero. This is explained by the fact that the learning rule tries to maximize the distance of q from the origin.
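A direct implementation of the rule of Equation (23) is sketched below. It is an illustrative reading of the rule with the settings reported in Section IV (α = 0.01, a few hundred iterations); function and variable names are ours.

```python
import numpy as np

def find_eigenpair(C, alpha=0.01, iterations=500, rng=None):
    """Iterate Eq. (23): dY = alpha*((q C* + conj(q) C) Y - 2 conj(q) q Y),
    with q = Y* C Y, starting from a random complex unit vector."""
    rng = np.random.default_rng() if rng is None else rng
    n = C.shape[0]
    Y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    Y /= np.linalg.norm(Y)
    for _ in range(iterations):
        q = Y.conj() @ C @ Y
        Y = Y + alpha * ((q * C.conj().T + np.conj(q) * C) @ Y
                         - 2 * np.conj(q) * q * Y)
    q = Y.conj() @ C @ Y
    return q, Y                      # approximate eigenvalue and eigenvector

# Example: a normal C with eigenvalues at the six 6-th roots of unity
rng = np.random.default_rng(4)
N = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
Q, _ = np.linalg.qr(N)
C = Q @ np.diag(np.exp(1j * 2 * np.pi * np.arange(6) / 6)) @ Q.conj().T
q, Y = find_eigenpair(C, rng=np.random.default_rng(5))
print(q)                                   # should be near one of the 6-th roots of unity
print(np.linalg.norm(C @ Y - q * Y))       # small residual if the run converged
```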
A more general learning rule can be derived that will maximize the distance of q from an arbitrary point c ∈ C. Equation (8) becomes

ĝ(Y) = (q̄ − c̄)(q − c) − λ(Y*Y − 1).   (25)


Following similar steps as in the derivation of the previous learning rule, the new learning rule can be derived as:

∆Y = α(((q − c)C* + (q̄ − c̄)C)Y − (q̄(q − c) + q(q̄ − c̄))Y)
   = α(((q − c)C* + (q̄ − c̄)C − (q̄(q − c) + q(q̄ − c̄))I)Y)
   = α((q − c)(C* − q̄I) + (q̄ − c̄)(C − qI))Y.   (26)

Obviously, when c is at the origin, this learning rule collapses to the earlier one. The quantity by which Y is multiplied is, again, a Hermitian matrix. When the goal is to find as many eigenvalues and eigenvectors of the normal matrix C as possible, it is best to pick c such that it is in the interior of the field of values. Otherwise, as above, the algorithm may never converge to eigenvalues that are close to c: the algorithm simply favors those that are far away, unless, of course, it gets stuck in a local minimum, manifested as an eigenvalue closer in distance to c. A natural choice for c is the average value of the trace of matrix C:

c = (1/n) ∑_{i=1}^{n} λ_i = Tr(C)/n,   (27)

where the λ_i's are the eigenvalues, including multiplicities, and C ∈ M_n. If C is normal and the eigenvalues are distinct, i.e. each is a vertex of a polygon that contains the field of values F(C), c will be the centroid of F(C).

An alternative method to avoid using the learning rule of Equation (26) is to use a new matrix Ĉ = C − cI. This transformation is used in the shifted power iteration and shifted inverse iteration methods of computing eigenvalues and eigenvectors. [24] Then, by Property 2, F(Ĉ) = F(C) − c. Point c in F(C) will be translated to the origin. The original learning rule of Equation (23) can then be used. Note that if the eigenvalues of C are λ_1, λ_2, ..., λ_n, the eigenvalues of Ĉ are λ_1 − c, λ_2 − c, ..., λ_n − c. The respective eigenvectors of C and Ĉ are the same.

Proof: Let λ_i be an eigenvalue of C, and the corresponding eigenvector be Y_i. Then, CY_i = λ_iY_i, and

ĈY_i = (C − cI)Y_i = CY_i − cY_i = λ_iY_i − cY_i = (λ_i − c)Y_i.   (28)

Hence, λ_i − c is an eigenvalue of Ĉ and Y_i is the corresponding eigenvector.

This translation (shifting) method can be viewed and used as a deflation process: once an eigenpair (λ_i, Y_i) of C is found, the new matrix Ĉ = C − λ_iI is formed. Since λ_i has been translated to the origin, the learning rule of Equation (10) will not converge to it again, and another eigenpair will be computed. However, this translation cannot be used simultaneously with multiple distinct eigenvalues since only one eigenvalue can be translated to the origin at one time.
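The translation method above can be combined with the first rule: shift C by the centroid c = Tr(C)/n (or by an already-found eigenvalue, as deflation), run the rule of Equation (23) on Ĉ = C − cI, and add c back to the eigenvalue found. A sketch under these assumptions (names are ours):

```python
import numpy as np

def shifted_eigenpair(C, c, alpha=0.01, iterations=500, rng=None):
    """Run the rule of Eq. (23) on C_hat = C - c*I (Property 2: F(C_hat) = F(C) - c),
    then undo the shift on the recovered eigenvalue."""
    rng = np.random.default_rng() if rng is None else rng
    n = C.shape[0]
    C_hat = C - c * np.eye(n)
    Y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    Y /= np.linalg.norm(Y)
    for _ in range(iterations):
        q = Y.conj() @ C_hat @ Y
        Y = Y + alpha * ((q * C_hat.conj().T + np.conj(q) * C_hat) @ Y
                         - 2 * np.conj(q) * q * Y)
    return (Y.conj() @ C_hat @ Y) + c, Y      # eigenvalue of C; eigenvector is shared

# Example: eigenvalues 2 + (5-th roots of unity), so the centroid of F(C) is 2
rng = np.random.default_rng(6)
N = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
Q, _ = np.linalg.qr(N)
C = Q @ np.diag(2 + np.exp(1j * 2 * np.pi * np.arange(5) / 5)) @ Q.conj().T
c = np.trace(C) / 5                           # Eq. (27): the centroid
lam1, _ = shifted_eigenpair(C, c, rng=rng)
lam2, _ = shifted_eigenpair(C, lam1, rng=rng) # deflation: lam1 is moved to the origin
print(lam1, lam2)                             # lam2 is typically a different eigenvalue
```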
B. Discussion of new method in relation to other methods of finding eigenpairs

The new method (Equation (23)) starts from a random initial vector and iterates until it converges to an eigenvector; the corresponding eigenvalue can then be computed. At each iteration, the method multiplies a matrix with the Y vector and applies a normalizing term. It resembles the power method, which finds the dominant eigenpair. At every iteration, the power method computes the vector Y ← CY and then explicitly normalizes Y. Eventually, it converges to the eigenvector that corresponds to the dominant eigenvalue, which is required to be a single one for the method to work. As shown in the results section, the proposed method converges to eigenpairs even when all eigenvalues have the same magnitude, e.g. the case in Figure 6, depending on the initial value of Y and the geometry of the field of values. Hence, the aims of the two methods are different.

Another method that starts from an initial vector and iterates until convergence is Newton's method for finding eigenpairs. [25], [26] Newton's method requires that at each iteration an (n + 1) × (n + 1) system of linear equations is solved, or equivalently, that a matrix of the same dimensions is inverted. This complexity makes the method less neural-like, and it is not the aim of the paper to compare it with the new method.

When the goal is to find all eigenpairs of matrices, conventional techniques, such as the QR or the Jacobi method [23], are better suited for the task.

III. EIGENVALUES AND EIGENVECTORS AS STORED MEMORIES

Since both learning rules, Equations (23) and (26), converge to both eigenvectors and eigenvalues, these two quantities can serve as the stored memories. We note that the eigenvectors that correspond to distinct eigenvalues of normal matrices are orthogonal. It is well-known that the correlation matrix memory and the discrete Hopfield neural network can recall, in general, orthogonal vector memories perfectly. [27], [28] However, the basin of attraction and the trajectory of attraction cannot be easily visualized. The present method, using the field of values, provides a convenient way to visualize the dynamics of the learning rule in 2-D.


The trajectory of the value q = Y*CY, given an initial value of Y, on the complex plane, reveals both qualitative and quantitative dynamic information of the process. This type of information is limited to 1-D in systems where the matrix is Hermitian (or symmetric in the real case), which is the common case in Hopfield networks and matrix associative memories. Dynamics in 1-D is not conducive to visualization. The complex plane provides an appropriate medium for visualization.

Let X = (X_1, X_2, ..., X_n) in M_n, where the X_i's are the (column) vectors to be associated (stored) in the normal matrix C ∈ M_n. It is assumed that the X_i's are linearly independent. We associate each X_i with column vector U_i through this linear mapping:

X = PU,   (29)

where U = (U_1, U_2, ..., U_n) ∈ M_n is unitary, and P ∈ M_n. The unitary matrix U can be chosen in a variety of ways: it may be random, the unitary matrix in the QR factorization of X, a Hadamard matrix (a unitary matrix with entries in {1, −1}), etc.

The next step is to construct a normal matrix C ∈ M_n having the U_i's as its eigenvectors. For that purpose we use the unitary decomposition of normal matrices:

C = UDU*,   (30)

where D ∈ M_n is a diagonal matrix of which the d_i ≡ d_{i,i} entry is an eigenvalue of C with corresponding eigenvector U_i. Hence, X_i is associated with d_i as well. The eigenvalues d_i of C are the attractors during the recall process. It may be desirable that the memories are stored in a symmetric way, which avoids undue bias toward any memory during recall. For this purpose, we choose the d_i, the eigenvalues, to be the n-th roots of unity:

d_i = e^{i2πk/n},  k = 0, 1, ..., n − 1.   (31)

The stored information is in the set {P^{-1}, C}. Matrix P is invertible due to the linear independence of the X_i. The recall process is best observed in the field of values F(C).

Suppose that X_l ∈ C^n is a given probe vector, and it is desired to find the closest stored vector X_k. We first form Y_l ∈ C^n:

Y_l = P^{-1} X_l.   (32)

Vector Y_l is used as the initialization vector in either one of the learning rules, (23) or (26). Once the process converges to an eigenvector Y_i of C, the recalled vector is X_i = PY_i.
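An illustrative sketch of the storage and recall scheme of this section follows (our reading: U taken from a QR factorization, eigenvalues at the n-th roots of unity, recall via the rule of Equation (23); helper names are ours, and the quality of the recall depends on the probe and the learning-rate settings).

```python
import numpy as np

def store(X):
    """Store the linearly independent columns of X:  X = P U  with U unitary,
    C = U D U*  with the n-th roots of unity on the diagonal of D (Eqs. (29)-(31))."""
    n = X.shape[1]
    U, _ = np.linalg.qr(X)                    # one admissible choice of unitary U
    P = X @ U.conj().T                        # then P U = X  (U^{-1} = U*)
    d = np.exp(1j * 2 * np.pi * np.arange(n) / n)
    C = U @ np.diag(d) @ U.conj().T           # Eq. (30)
    return C, P

def recall(C, P, probe, alpha=0.01, iterations=1000):
    """Recall: Y_l = P^{-1} X_l (Eq. (32)); run the rule of Eq. (23);
    map the converged eigenvector back through P."""
    Y = np.linalg.solve(P, probe)
    Y = Y / np.linalg.norm(Y)
    for _ in range(iterations):
        q = Y.conj() @ C @ Y
        Y = Y + alpha * ((q * C.conj().T + np.conj(q) * C) @ Y
                         - 2 * np.conj(q) * q * Y)
    return P @ Y                              # recalled stored vector

# Example: store four memories, recall the third one from a slightly noisy probe
rng = np.random.default_rng(7)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
C, P = store(X)
probe = X[:, 2] + 0.05 * (rng.standard_normal(4) + 1j * rng.standard_normal(4))
x_rec = recall(C, P, probe)
print(np.linalg.norm(x_rec - X[:, 2]))        # typically small when the recall succeeds
```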
IV. RESULTS

To explore the efficacy of the learning rule of Equation (23) several experiments were conducted. In general the algorithm was shown to be quite robust. As long as the learning rate α was sufficiently small, the algorithm converged to an eigenvalue and the corresponding eigenvector. Beginning with Figure 5, each eigenvalue (vertex) is color coded. The initializing vector corresponds to the point furthest away from the eigenvalue (vertex of the polygon), and as the algorithm progresses, the initial point follows a trajectory that ends up at the eigenvalue. This color scheme is akin to the one used to describe the fractal behavior of Newton's method in finding roots of polynomials. [29] Each root has a different color which is the same for points in its region of attraction.

Figure 5 shows the trajectories of 500 runs of the learning rule in Equation (23). The distance of the final q to the actual eigenvalue, the error, was found to be less than 10^{-5} on the average. For each run, a normal matrix C ∈ M_6 was generated with random eigenvectors and given eigenvalues (Section I-D). The vertices of the regular hexagon are the eigenvalues of C, which are e^{i2πk/6}, k = 0, 1, ..., 5; i.e. the six 6-th roots of unity. The hexagon itself and its interior make up the field of values of C. Each initial Y was randomly generated and its trajectory has the color of the eigenvalue it converged to. The value of the learning rate used was α = 0.01. Each run was stopped after 500 iterations.

To clearly show the regions of attraction of each eigenvalue, the algorithm was run 5,000 times, each again with a random initial Y, while all other variables remained the same. The result is shown in Figure 6. The regions of attraction are well defined; however, as can be seen especially from Figure 6, they are not exclusive. On occasion, an initial value of q happens to follow a trajectory away from the closest eigenvalue.

Figures 7 (C ∈ M_6) and 8 (C ∈ M_5) each show 100 runs of the same rule (Equation (23)). The initial Ys were randomly generated. For each run, matrix C was generated with random eigenvectors and as eigenvalues the 6-th and 5-th roots of unity, for the respective figures. From the trajectories, it can be seen that eigenvalues (and their corresponding eigenvectors) that are furthest away from the origin are favored by the algorithm to converge to. This can be explained from the fact that the learning rule tries to maximize the distance of q from the origin. Eigenvalues that are not at a maximal distance from the origin, and yet the algorithm converges to them, correspond to local minima. Figures 9 and 10 show 100 runs of the generalized learning rule of Equation (26). The values of c are 0.5 + i 0.5 and −1 − i, respectively.


Fig. 5. The trajectories of 500 runs of the algorithm.

Fig. 7. The trajectories of 100 runs of the algorithm.

Fig. 6. The trajectories of 5,000 runs of the algorithm.

Fig. 8. The trajectories of 100 runs of the algorithm.

While the trajectories and convergence behavior in general are comparable to the first rule, an anomaly was observed: when the origin was outside the field of values, the algorithm sometimes converged to it, even though it was not an eigenvalue. This can be seen in Figure 10. The final Y values in those cases were the zero vector or, rarely, very close to it.

For higher dimensions, polygons approach smooth curves, and it becomes difficult to distinguish the individual eigenvalues (vertices). For example, Figure 11 shows a regular polygon with 100 vertices which approaches a circle. The vertices and the corresponding regions of attraction are not easily distinguishable. Each run (point in the trajectory) was stopped after 1,000 iterations.

The method appears to be quite robust in the sense that as long as the learning rate was small (0.01 in most of our cases) and the values of the matrix entries and the initialization weights were small, each component less than 1, the algorithm was well-behaved, i.e. it converged without becoming unstable.

It is noted that if at any time in the course of the two algorithms Y takes the value of the zero vector, Y cannot change; it is stuck at the zero vector. It seems unlikely for this to happen with the learning rule of Equation (23), since q is specifically designed to move away from the origin. This is not the case in the generalized algorithm, which is designed for q to move away from point c. While more analysis and experiments are needed to further explain the anomalous behavior of convergence to zero, it seems clear that the first algorithm is preferable. Instead of using the generalized algorithm, one can use the translation method together with the first algorithm, as explained at the end of Section II.

V. CONCLUSION

Although the field of values of a matrix is well-known in mathematics, it does not appear to have been used in any significant way in neural networks or in other fields of engineering. Using neural network techniques and the field of values, the two algorithms which were introduced find eigenpairs of normal matrices. It has been demonstrated that the generalized algorithm sometimes converges to the origin without it being an eigenvalue. This aberrant behavior was exhibited when the origin was outside the field of values. The first algorithm, which maximizes the distance of q from the origin, experimentally was shown to be stable, converging to an eigenpair given that the initial vector was random and the learning rate sufficiently small.


Fig. 9. The trajectories of 100 runs of the generalized algorithm, with c = 0.5 + i 0.5.

Fig. 10. The trajectories of 100 runs of the generalized algorithm, with c = −1 − i.

Fig. 11. The trajectories of 30 runs on a regular polygon with 100 vertices.

APPENDIX
ALTERNATIVE DERIVATION OF LEARNING RULE USING WIRTINGER'S CALCULUS

We derive the learning rule of Equation (23) using Wirtinger calculus. [30], [31] We would like to maximize the function g in Equation (8):

g(Y, Y*) = q̄q − λ(Y*Y − 1).   (33)

The Wirtinger gradient ∇_{Y*}(·) ≡ ∇ is as follows:

∇g = q∇q̄ + q̄∇q − λY
   = q∇q̄ + q̄CY − λY
   = q∇(Y*C*Y) + q̄CY − λY
   = qC*Y + q̄CY − λY.   (34)

At equilibrium the above equation will be equal to zero and Y*Y = 1. Setting the equation to zero and left multiplying by Y*, we proceed as in Equations (21) and (22) (without the factor 2, which is absorbed in the Wirtinger gradient):

qY*C*Y + q̄Y*CY = λY*Y,   (35)

λ = 2q̄q.   (36)

Finally, we arrive at the learning rule of Equation (23) by considering the gradient in (34):

∆Y = α((qC* + q̄C)Y − 2q̄qY).   (37)

REFERENCES

[1] K. E. Gustafson and D. K. Rao, Numerical Range: The Field of Values of Linear Operators and Matrices (Universitext), 1st ed. Springer, 1996.
[2] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge University Press, 1991.
[3] L. Hogben, Ed., Handbook of Linear Algebra, 1st ed. Chapman and Hall/CRC, 2006.
[4] A. Hirose, Ed., Complex-Valued Neural Networks: Theories and Applications. World Scientific Publishing, 2003.
[5] A. Hirose, Complex-Valued Neural Networks, ser. Studies in Computational Intelligence. Springer, 2006, vol. 32.
[6] ——, Complex-valued Neural Networks. Saiensu-sha, 2005, in Japanese.


[7] T. Nitta, Complex-valued Neural Networks: Utilizing High-dimensional Parameters, 1st ed. Information Science Reference, 2009.
[8] I. N. Aizenberg, Complex-Valued Neural Networks with Multi-Valued Neurons, ser. Studies in Computational Intelligence. Springer, 2011, vol. 353.
[9] S. Suresh, N. Sundararajan, and R. Savitha, Supervised Learning with Complex-valued Neural Networks, ser. Studies in Computational Intelligence. Springer, 2013, vol. 421.
[10] Z. Chen, S. Haykin, J. J. Eggermont, and S. Becker, Correlative Learning: A Basis for Brain and Adaptive Systems (Adaptive and Learning Systems for Signal Processing, Communications and Control Series), 1st ed. Wiley-Interscience, 2007.
[11] A. Hirose, "Fractal variation of attractors in complex-valued neural networks," Neural Processing Letters, vol. 1, no. 1, pp. 6–8, 1994.
[12] ——, "Non-hermitian associative memories spontaneously generating dynamical attractors," in Neural Networks, 1993. IJCNN '93-Nagoya. Proceedings of 1993 International Joint Conference on, vol. 3, Oct. 1993, pp. 2567–2570.
[13] E. Oja, "A simplified neuron model as a principal component analyzer," J. Math. Biology, vol. 15, pp. 267–273, 1982.
[14] T. Sanger, "Optimal unsupervised learning in a single-layer linear feedforward neural network," Neural Networks, vol. 2, pp. 459–473, 1989.
[15] K. I. Diamantaras and S.-Y. Kung, Principal Component Neural Networks: Theory and Applications, ser. Wiley Series on Adaptive and Learning Systems for Signal Processing, Communications, and Control. New York: Wiley, 1996.
[16] S.-Y. Kung, K. I. Diamantaras, and J.-S. Taur, "Adaptive principal component EXtraction (APEX) and applications," IEEE Transactions on Signal Processing, vol. 42, no. 5, pp. 1202–1217, 1994.
[17] Y. Liu, Z. You, and L. Cao, "A concise functional neural network computing the largest (smallest) eigenvalue and one corresponding eigenvector of a real symmetric matrix," in Neural Networks and Brain, 2005. ICNN&B '05. International Conference on, vol. 3, Oct. 2005, pp. 1334–1339.
[18] Q. Zhang, F. Feng, and F. Liu, "Neurodynamic approach for generalized eigenvalue problems," in Computational Intelligence and Security, 2006 International Conference on, vol. 1, Nov. 2006, pp. 345–350.
[19] Y. Shuijun and L. Diannong, "Neural network based approach for eigenstructure extraction," in Aerospace and Electronics Conference, 1995. NAECON 1995., Proceedings of the IEEE 1995 National, vol. 1, May 1995, pp. 102–105.
[20] T. Ying and H. Zhenya, "Neural network approaches for the extraction of the eigenstructure," in Neural Networks for Signal Processing [1996] VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop, Sep. 1996, pp. 23–32.
[21] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[22] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, 3rd ed. Springer, 2008.
[23] G. H. Golub and C. F. Van Loan, Matrix Computations. JHU Press, 2012, vol. 3.
[24] Y. Saad, Numerical Methods for Large Eigenvalue Problems, 2nd ed. Manchester University Press, 2011.
[25] P. Anselane and L. Rall, "The solution of characteristic value-vector problems by Newton's method," Numerische Mathematik, vol. 11, pp. 38–45, 1968.
[26] K. Datta, Y. Hong, and R. B. Lee, "Parametrized Newton's iteration for computing an eigenpair of a real symmetric matrix in an interval," Comput. Methods Appl. Math., vol. 3, no. 4, pp. 517–535, 2003.
[27] S. O. Haykin, Neural Networks and Learning Machines, 3rd ed. Prentice Hall, 2008.
[28] J. M. Zurada, Introduction to Artificial Neural Systems. St. Paul, MN: West Publishing Company, 1992.
[29] R. T. Stevens, Fractal Programming in C. IDG Books Worldwide, Inc., 1991.
[30] D. Brandwood, "A complex gradient operator and its application in adaptive array theory," in IEE Proceedings H (Microwaves, Optics and Antennas), vol. 130, no. 1. IET, 1983, pp. 11–16.
[31] M. F. Amin and K. Murase, "Learning algorithms in complex-valued neural networks using Wirtinger calculus," in Complex-Valued Neural Networks, A. Hirose, Ed. Wiley, 2013, pp. 75–102.

George M. Georgiou received the B.S. in Electrical Engineering degree from Louisiana Tech University in 1985, the M.S. in Electrical Engineering and M.S. in Mathematics degrees from Louisiana State University in 1987 and 1988, respectively, and the M.S. and Ph.D. degrees in Computer Science from Tulane University, New Orleans, Louisiana, USA, in 1990 and 1992, respectively. Currently, he is Professor of Computer Science and Engineering, and Associate Dean of the College of Natural Sciences at California State University, San Bernardino, USA. He served as Chair and Director of the School of Computer Science and Engineering at the same institution. His Ph.D. dissertation was the first on the subject of complex-valued neural networks. His research interests include complex-valued neural networks, machine learning, linear algebra methods, and self-organizing systems.
