
Trading Accuracy for Numerical Stability: Orthogonalization, Biorthogonalization and Regularization

Tarek A. Lahlou and Alan V. Oppenheim
Digital Signal Processing Group, Massachusetts Institute of Technology

The authors wish to thank Analog Devices, Bose Corporation, and Texas Instruments for their support of innovative research at MIT and within the Digital Signal Processing Group.

Abstract—This paper presents two novel regularization methods motivated in part by the geometric significance of biorthogonal bases in signal processing applications. These methods, in particular, draw upon the structural relevance of orthogonality and biorthogonality principles and are presented from the perspectives of signal processing, convex programming, continuation methods, and nonlinear projection operators. Each method is specifically endowed with either a homotopy or tuning parameter to facilitate tradeoff analysis between accuracy and numerical stability. An example involving a basis composed of real exponential signals illustrates the utility of the proposed methods on an ill-conditioned inverse problem, and the results are compared to standard regularization techniques from the signal processing literature.

Index Terms—orthogonality, biorthogonality, regularization, convex optimization, homotopy maps

I. INTRODUCTION

Signal processing algorithms, notably those in machine learning and big data applications, make ubiquitous use of dense and sparse linear algebra subroutines [1], [2]. Numerical errors often arise while implementing such subroutines and are typically explained using perturbation analysis tools. For example, finite-precision effects such as coefficient quantization are quantifiable by invoking sensitivity theorems after organizing the computation into a linear or nonlinear signal-flow graph, as discussed in, e.g., [3]. Also, the magnification and compounding of propagated errors are commonly understood via the condition number of a functional representation of the processing system. These unavoidable sources of error contribute significantly to the total accumulated error in an obtained solution, especially as problem sizes continue to scale.
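To make the role of conditioning concrete, the following minimal NumPy sketch (illustrative only, not taken from the paper; the sampling grid and decay rates are assumptions chosen for the example) builds a basis of sampled real exponential signals, the signal family revisited in Section V, computes its condition number from the singular values, and shows how a tiny perturbation of the observations is magnified in a least-squares solution.

```python
import numpy as np

# A small basis of sampled real exponential signals, similar in spirit to
# the ill-conditioned example of Section V. The decay rates and sampling
# grid below are illustrative choices, not values from the paper.
n, N = 64, 8
t = np.linspace(0.0, 1.0, n)
rates = np.linspace(0.5, 4.0, N)
G = np.exp(-np.outer(t, rates))          # column k is g_k(t) = exp(-rate_k * t)

# Relative conditioning via the ratio of extreme singular values.
s = np.linalg.svd(G, compute_uv=False)
print(f"condition number: {s[0] / s[-1]:.3e}")

# Error magnification: a tiny perturbation of the observation b can
# produce a much larger relative change in the least-squares solution.
x_true = np.random.default_rng(0).standard_normal(N)
b = G @ x_true
db = 1e-10 * np.random.default_rng(1).standard_normal(n)
x_hat = np.linalg.lstsq(G, b + db, rcond=None)[0]
print(f"relative input error:  {np.linalg.norm(db) / np.linalg.norm(b):.1e}")
print(f"relative output error: {np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true):.1e}")
```

Strongly correlated bases of decaying exponentials such as this one are a natural test case for the tradeoff between accuracy and stability studied below.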
Algorithms specifically designed to use orthogonality principles alleviate these issues and, among other reasons, receive extensive use in practice. Moreover, orthogonality in other contexts such as quantum measurement and asynchronous distributed optimization continues to provide inspiration for new signal processing algorithms, architectures, and applications [4], [5].

Signal processing techniques exploiting natural and efficient representations for signal models with inherent geometric structure often use non-orthogonal bases and frames that may generally suffer from conditioning issues, e.g. finite rate of innovation [6], compressed sensing [7], and transient signal analysis [8]-[11]. To address these issues, we restrict our attention in this paper to two classical approaches: orthogonalization and regularization. Traditional orthogonalization routines typically destroy the topological relevance of an original basis to the problem at hand while simultaneously weakening the informational significance of the solution obtained. The orthogonal Procrustes problem has been used for subspace whitening and machine learning in the matrix setting, i.e. generating the unitary matrix that optimally maps one matrix to another in the Frobenius norm sense [12], [13]. In this paper, we specifically translate and explicate these results using bases and further extend them as they pertain to biorthogonalization, nonlinear projections, and novel regularization methods. The proposed regularization methods permit natural characterizations of accuracy and stability, enabling standard tradeoff analysis to identify an acceptable balance. Regularization methods in signal processing have historically focused on directly obtaining a solution biased toward prespecified signal properties. In this paper, however, we emphasize the effect the proposed methods have on the linear problem's Euclidean structure itself.

This paper is outlined as follows: In Section II we collect together geometric facts surrounding orthogonality and biorthogonality that will be referenced throughout. The orthogonalization and biorthogonalization of a linear system via a nonlinear projection operator is the central focus of Section III. Two regularization methods for balancing numerical robustness in the sense of relative conditioning and accuracy in the sense of geometric structure are developed in Section IV. The proposed methods are then applied to the regularization of an ill-conditioned basis composed of real exponential signals in Section V as a numerical example. Throughout the presentation, we draw no distinction between various linear problems and the mathematical relationships connecting them, e.g. matched filtering, solving linear systems, changing bases, etc. In this way, the insights and methods presented may be readily applied to any linear problem equally well.

A. Notational conventions

Vectors are denoted using underscores, with subscripts indexing vectors as opposed to entries, i.e. $\underline{a}_1$ and $\underline{a}_2$ represent two distinct vectors. Boldface letters denote a system or collection of $N$ vectors, typically constituting a basis for $\mathbb{R}^N$, i.e. $\mathbf{a} = \{\underline{a}_k\}_{k=1}^N$. A biorthogonal set is denoted using $\sim$, i.e. $\tilde{\mathbf{a}}$ is biorthogonal to $\mathbf{a}$. The $p$-norm of a vector $\underline{a}$ for $p \geq 1$ is denoted $\|\underline{a}\|_p$. A capital letter denotes the matrix form of a set of vectors, i.e. $A$ has columns $\{\underline{a}_k\}$. The Kronecker delta is denoted $\delta_{k,j}$. For clarity we assume all systems are real-valued bases; rigorous extensions to complex-valued bases and frames follow analogously, e.g. by using appropriate conjugation and restricting arguments to the range of a system.
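As a minimal sketch of this notation (illustrative, not code from the paper), the biorthogonal set of a basis can be computed from its matrix form: the defining property $\langle \tilde{\underline{a}}_k, \underline{a}_j \rangle = \delta_{k,j}$ reads $\tilde{A}^T A = I_N$ in matrix form, so the dual system is $\tilde{A} = A^{-T}$.

```python
import numpy as np

# Construct a random basis of R^N and its biorthogonal (dual) set, using
# the standard identity A_dual = inv(A).T implied by <a~_k, a_j> = delta_{k,j}.
rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))      # columns a_k form a basis of R^N (a.s.)
A_dual = np.linalg.inv(A).T          # columns a~_k, the biorthogonal set

# Biorthogonality check: A_dual^T A is the identity (the Kronecker delta).
print(np.allclose(A_dual.T @ A, np.eye(N)))   # True
```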
II. THE TOPOLOGY OF THE STIEFEL MANIFOLD AND O(N)

Let $\mathbf{a}$ and $\mathbf{b}$ denote two general systems and define $BO$ as the nonnegative measure of their biorthogonality given by

$$BO(\mathbf{a}, \mathbf{b}) \triangleq \sum_{k,j} \left( \langle \underline{a}_k, \underline{b}_j \rangle - \delta_{k,j} \right)^2 \qquad (1)$$

which achieves a unique global minimum of zero provided that $\mathbf{b} = \tilde{\mathbf{a}}$. It is then straightforward to conclude that

$$BO(\mathbf{a}, \mathbf{a}) = 0 \iff \mathbf{a} \text{ is an orthonormal system.} \qquad (2)$$

The Stiefel manifold $V_{k,N}$ is, for $k = N$, the orthogonal group's principal homogeneous space and is written using (1) as [14]

$$V_{N,N} = \left\{ \mathbf{h} \subset \mathbb{R}^N : BO(\mathbf{h}, \mathbf{h}) = 0 \right\}. \qquad (3)$$

Represented as a quadratic form, (1) is convex quadratic with positive signature, i.e. all eigenvalues are positive valued; thus any optimization problem that uses (1) as a cost function or (3) as a feasible set is not precluded from being a convex problem. The orthogonal group of dimension $N$, denoted by $O(N)$ and described in set-builder notation using the matrix form of $\mathbf{h}$ as

$$O(N) = \left\{ H \in \mathbb{R}^{N \times N} : H^T H = H H^T = I_N \right\} \qquad (4)$$

where $I_N$ is the identity operator on $\mathbb{R}^N$, consists of two connected components $\mathcal{M}_1$ and $\mathcal{M}_{-1}$. Furthermore, $\mathcal{M}_1$ and $\mathcal{M}_{-1}$ each form a disjoint smooth manifold in $\mathbb{R}^{N \times N}$ characterized by

$$\mathcal{M}_i = \left\{ H \in O(N) : \det(H) = i \right\}, \quad i = \pm 1 \qquad (5)$$

where $\det(\cdot)$ denotes the determinant operator. Note that $\mathcal{M}_1$ is precisely the special orthogonal group $SO(N)$. From (5) and the definition of a connected component it immediately follows that

$$H \in \mathcal{M}_i \iff \tilde{H} \in \mathcal{M}_i, \quad i = \pm 1. \qquad (6)$$

We conclude this section by noting that appropriately selecting a base point for $V_{N,N}$ establishes a one-to-one correspondence with $O(N)$ and, therefore, justifies interchanging between the two notational descriptions of an orthogonal system, i.e. $\mathbf{h}$ and $H$.

Fig. 1. An illustration of the relationships between the matrix forms of $\mathbf{g}$ and $\tilde{\mathbf{g}}$, their adjoints, and their projections onto the orthogonal group via (13), where the projection operator is denoted using $\mathcal{P}(\cdot)$. The system $\mathbf{g}$ is chosen such that $\det(G) > 0$ and therefore the projections are onto $\mathcal{M}_1$.

III. ORTHOGONALIZATION AND BIORTHOGONALIZATION

Let $\mathbf{a}$ and $\mathbf{b}$ continue to denote two general systems and define $LS$ as the nonnegative Euclidean measure of their distance given by

$$LS(\mathbf{a}, \mathbf{b}) \triangleq \sum_k \| \underline{a}_k - \underline{b}_k \|_2^2 \qquad (7)$$
$$= \sum_k \langle \underline{a}_k - \underline{b}_k, \underline{a}_k - \underline{b}_k \rangle \qquad (8)$$

which achieves a unique global minimum of zero provided that $\mathbf{b} = \mathbf{a}$. In the sequel we shall denote by $\mathbf{g}$ a general prespecified system.

A. Optimal orthogonalization problems

Consider the optimization problem that identifies the nearest orthogonal system to $\mathbf{g}$ in the sense of (7), or equivalently the least-squares orthogonalization of $\mathbf{g}$, given by

$$\mathbf{h}_1 = \arg\min_{\mathbf{h}} \; LS(\mathbf{g}, \mathbf{h}) \quad \text{s.t.} \quad BO(\mathbf{h}, \mathbf{h}) = 0. \qquad (P1)$$

Similarly, consider the optimization problem that identifies the nearest orthogonal system to the biorthogonal system $\tilde{\mathbf{g}}$ in the sense of (7), given by

$$\mathbf{h}_2 = \arg\min_{\mathbf{h}} \; LS(\tilde{\mathbf{g}}, \mathbf{h}) \quad \text{s.t.} \quad BO(\mathbf{h}, \mathbf{h}) = 0. \qquad (P2)$$

We collectively refer to (P1) and (P2) as optimal orthogonalization problems and further interpret their solutions as orthogonal systems that maximally preserve the geometric structure inherent to $\mathbf{g}$ and $\tilde{\mathbf{g}}$, respectively.

C. Equivalence of LS and BO over $V_{N,N}$

The measures (1) and (7), respectively interpreted as optimal biorthogonalization and orthogonalization cost functions, produce generally unequal cost surfaces over the space of all $N$-dimensional bases. However, for fixed $\mathbf{g}$ and orthogonal $\mathbf{h}$, it follows that

$$BO(\mathbf{g}, \mathbf{h}) = \sum_{k,j} \left( \langle \underline{g}_k, \underline{h}_j \rangle^2 - 2 \langle \underline{g}_k, \underline{h}_j \rangle \delta_{k,j} + \delta_{k,j}^2 \right) \qquad (9)$$
$$= \sum_k \left( \langle \underline{g}_k, \underline{g}_k \rangle - 2 \langle \underline{g}_k, \underline{h}_k \rangle + \langle \underline{h}_k, \underline{h}_k \rangle \right) \qquad (10)$$
$$= \sum_k \langle \underline{g}_k - \underline{h}_k, \underline{g}_k - \underline{h}_k \rangle \qquad (11)$$
$$= LS(\mathbf{g}, \mathbf{h}) \qquad (12)$$

where the restriction of $\mathbf{h}$ to $V_{N,N}$ is used to obtain (10) from (9), in particular since $\langle \underline{a}, \underline{b} \rangle = \sum_j \langle \underline{a}, \underline{h}_j \rangle \langle \underline{h}_j, \underline{b} \rangle$ and $\delta_{k,j} = \langle \underline{h}_k, \underline{h}_j \rangle$.

D. Analytic solution and projection interpretation of (P1)-(P4)

In Subsection III-C, we established that the solution to (P1) solves (P3) and likewise that the solution to (P2) solves (P4). In this subsection, we first state and interpret the analytic expression that minimizes (P1), followed by discussing one way (P1) and (P2) may be argued to have the same minimizer. Indeed, the matrix form of the orthogonal system $\mathbf{h}_1$ which solves (P1) is given by

$$H_1 = U V^T \qquad (13)$$

where $G = U \Sigma V^T$ is the Singular Value Decomposition (SVD) of the matrix form of $\mathbf{g}$. Observe that verifying the orthogonality of (13) follows immediately from the closure property of $O(N)$.
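The following NumPy sketch (an illustration under the stated definitions, not code from the paper) verifies (13) and the equivalence (9)-(12) numerically, identifying the measures with the matrix forms $BO(\mathbf{g}, \mathbf{h}) = \|H^T G - I_N\|_F^2$ and $LS(\mathbf{g}, \mathbf{h}) = \|G - H\|_F^2$ that follow directly from the definitions above. It also checks the claim of Subsection III-D that (P1) and (P2) share a minimizer, by projecting both $G$ and $\tilde{G} = G^{-T}$ onto the orthogonal group.

```python
import numpy as np

def bo(G, H):
    # BO(g, h) = sum_{k,j} (<g_k, h_j> - delta_{k,j})^2 = ||H^T G - I||_F^2
    return np.linalg.norm(H.T @ G - np.eye(G.shape[1]), 'fro') ** 2

def ls(G, H):
    # LS(g, h) = sum_k ||g_k - h_k||_2^2 = ||G - H||_F^2
    return np.linalg.norm(G - H, 'fro') ** 2

rng = np.random.default_rng(0)
N = 5
G = rng.standard_normal((N, N))      # matrix form of a prespecified basis g

# Eq. (13): the nearest orthogonal system via the SVD of G.
U, s, Vt = np.linalg.svd(G)
H1 = U @ Vt

print(np.allclose(H1.T @ H1, np.eye(N)))   # H1 is orthogonal (closure of O(N))
print(np.isclose(bo(G, H1), ls(G, H1)))    # equivalence (12) over V_{N,N}

# H1 attains the least-squares distance among orthogonal matrices; compare
# against a few random orthogonal candidates as a sanity check on (P1).
for _ in range(5):
    Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
    assert ls(G, H1) <= ls(G, Q) + 1e-9

# (P1) and (P2) share the same minimizer: projecting the biorthogonal
# system G^{-T} onto O(N) recovers the same orthogonal matrix, since
# G^{-T} = U diag(1/s) V^T shares the singular vectors of G.
Ud, _, Vtd = np.linalg.svd(np.linalg.inv(G).T)
print(np.allclose(Ud @ Vtd, H1))           # True for distinct singular values
```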