Optimization and Dynamical Systems
Uwe Helmke (1) and John B. Moore (2)
2nd Edition
March 1996
(1) Department of Mathematics, University of Würzburg, D-97074 Würzburg, Germany.
(2) Department of Systems Engineering and Cooperative Research Centre for Robust and Adaptive Systems, Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT 0200, Australia.

Preface
This work is aimed at mathematics and engineering graduate students and researchers in the areas of optimization, dynamical systems, control systems, signal processing, and linear algebra. The motivation for the results developed here arises from advanced engineering applications and the emergence of highly parallel computing machines for tackling such applications. The problems solved are those of linear algebra and linear systems theory, and include such topics as diagonalizing a symmetric matrix, singular value decomposition, balanced realizations, linear programming, sensitivity minimization, and eigenvalue assignment by feedback control.

The tools are those not only of linear algebra and systems theory, but also of differential geometry. The problems are solved via dynamical systems implementation, either in continuous time or discrete time, which is ideally suited to distributed parallel processing. The problems tackled are indirectly or directly concerned with dynamical systems themselves, so there is feedback in that dynamical systems are used to understand and optimize dynamical systems. One key to the new research results has been the recent discovery of rather deep existence and uniqueness results for the solution of certain matrix least squares optimization problems in geometric invariant theory. These problems, as well as many other optimization problems arising in linear algebra and systems theory, do not always admit solutions which can be found by algebraic methods. Even for such problems that do admit solutions via algebraic methods, as for example the classical task of singular value decomposition, there is merit in viewing the task as a certain matrix optimization problem, so as to shift the focus from algebraic methods to geometric methods. It is in this context that gradient flows on manifolds appear as a natural approach to achieve construction methods that complement the existence and uniqueness results of geometric invariant theory.

There has been an attempt to bridge the disciplines of engineering and mathematics in such a way that the work is lucid for engineers and yet suitably rigorous for mathematicians. There is also an attempt to teach, to provide insight, and to make connections, rather than to present the results as a fait accompli.

The research for this work has been carried out by the authors while visiting each other's institutions. Some of the work has been written in conjunction with the writing of research papers in collaboration with PhD students Jane Perkins and Robert Mahony, and post-doctoral fellow Weiyong Yan. Indeed, the papers benefited from the book and vice versa, and consequently many of the paragraphs are common to both. Uwe Helmke has a background in global analysis with a strong interest in systems theory, and John Moore has a background in control systems and signal processing.
Acknowledgements
This work was partially supported by grants from Boeing Commercial Airplane Company, the Cooperative Research Centre for Robust and Adaptive Systems, and the German-Israeli Foundation. The LaTeX setting of this manuscript has been expertly achieved by Ms Keiko Numata. Assistance came also from Mrs Janine Carney, Ms Lisa Crisafulli, Mrs Heli Jackson, Mrs Rosi Bonn, and Ms Marita Rendina. James Ashton, Iain Collings, and Jeremy Matson lent LaTeX programming support. The simulations have been carried out by our PhD students Jane Perkins and Robert Mahony, and post-doctoral fellow Weiyong Yan. Diagrams have been prepared by Anton Madievski. Bob Mahony was also a great help in proofreading and polishing the manuscript. Finally, it is a pleasure to thank Anthony Bloch, Roger Brockett and Leonid Faybusovich for helpful discussions. Their ideas and insights had a crucial influence on our way of thinking.

Foreword
By Roger W. Brockett
Differential equations have provided the language for expressing many of the important ideas of automatic control. Questions involving optimization, dynamic compensation and stability are effectively expressed in these terms. However, as technological advances in VLSI reduce the cost of computing, we are seeing a greater emphasis on control problems that involve real-time computing. In some cases this means using a digital implementation of a conventional linear controller, but in other cases the additional flexibility that digital implementation allows is being used to create systems that are of a completely different character. These developments have resulted in a need for effective ways to think about systems which use both ordinary dynamical compensation and logical rules to generate feedback signals. Until recently the dynamic response of systems that incorporate if-then rules, branching, etc. has not been studied in a very effective way. From this latter point of view it is useful to know that there exist families of ordinary differential equations whose flows are such as to generate a sorting of the numerical values of the various components of the initial conditions, solve a linear programming problem, etc.

A few years ago, I observed that a natural formulation of a steepest descent algorithm for solving a least-squares matching problem leads to a differential equation for carrying out such operations. During the course of some conversations with Anthony Bloch it emerged that there is a simplified version of the matching equations, obtained by recasting them as flows on the Lie algebra and then restricting them to a subspace, and that this simplified version can be used to sort lists as well. Bloch observed that the restricted equations are identical to the Toda lattice equations in the form introduced by Hermann Flaschka nearly twenty years ago. In fact, one can see in Jürgen Moser's early paper on the solution of the Toda lattice a discussion of sorting couched in the language of scattering theory, and presumably one could have developed the subject from this, rather different, starting point. The fact that a sorting flow can be viewed as a gradient flow and that each simple Lie algebra defines a slightly different version of this gradient flow was the subject of a systematic analysis in a paper that involved collaboration with Bloch and Tudor Ratiu.

The differential equations for matching referred to above are actually formulated on a compact matrix Lie group and then rewritten in terms of matrices that evolve in the Lie algebra associated with the group. It is only with respect to a particular metric on a smooth submanifold of this Lie algebra (actually the manifold of all matrices having a given set of eigenvalues) that the final equations appear to be in gradient form. However, as equations on this manifold, the descent equations can be set up so as to find the eigenvalues of the initial condition matrix. This then makes contact with the subject of numerical linear algebra and a whole line of interesting work going back at least to Rutishauser in the mid 1950s and continuing to the present day. In this book Uwe Helmke and John Moore emphasize problems, such as computation of eigenvalues, computation of singular values, construction of balanced realizations, etc., involving more structure than just sorting or solving linear programming problems. This focus gives them a natural vehicle to introduce and interpret the mathematical aspects of the subject. A recent Harvard thesis by Steven Smith contains a detailed discussion of the numerical performance of some algorithms evolving from this point of view.

The circle of ideas discussed in this book has been developed in some other directions as well. Leonid Faybusovich has taken a double bracket equation as the starting point in a general approach to interior point methods for linear programming. Wing Wong and I have applied this type of thinking to the assignment problem, attempting to find compact ways to formulate a gradient flow leading to the solution. Wong has also examined similar methods for nonconvex problems, and Jeffrey Kosowsky has compared these methods with other flow methods inspired by statistical physics. In a joint paper with Bloch we have investigated certain partial differential equation models for sorting continuous functions, i.e. generating the monotone equi-measurable rearrangements of functions, and Saveliev has examined a family of partial differential equations of the double bracket type, giving them a cosmological interpretation. It has been interesting to see how rapidly the literature in this area has grown.

The present book comes at a good time, both because it provides a well reasoned introduction to the basic ideas for those who are curious and because it provides a self-contained account of the work on balanced realizations worked out over the last few years by the authors. Although I would not feel comfortable attempting to identify the most promising direction for future work, all indications are that this will continue to be a fruitful area for research.
Contents
Preface iii
Foreword v
1 Matrix Eigenvalue Methods  1
  1.1 Introduction  1
  1.2 Power Method for Diagonalization  5
  1.3 The Rayleigh Quotient Gradient Flow  14
  1.4 The QR Algorithm  30
  1.5 Singular Value Decomposition (SVD)  35
  1.6 Standard Least Squares Gradient Flows  38
2 Double Bracket Isospectral Flows  43
  2.1 Double Bracket Flows for Diagonalization  43
  2.2 Toda Flows and the Riccati Equation  58
  2.3 Recursive Lie-Bracket Based Diagonalization  68
3 Singular Value Decomposition  81
  3.1 SVD via Double Bracket Flows  81
  3.2 A Gradient Flow Approach to SVD  84
4 Linear Programming  101
  4.1 The Rôle of Double Bracket Flows  101
  4.2 Interior Point Flows on a Polytope  111
  4.3 Recursive Linear Programming/Sorting  117
5 Approximation and Control  125
  5.1 Approximations by Lower Rank Matrices  125
  5.2 The Polar Decomposition  143
  5.3 Output Feedback Control  146
6 Balanced Matrix Factorizations  163
  6.1 Introduction  163
  6.2 Kempf-Ness Theorem  165
  6.3 Global Analysis of Cost Functions  168
  6.4 Flows for Balancing Transformations  172
  6.5 Flows on the Factors X and Y  186
  6.6 Recursive Balancing Matrix Factorizations  193
7 Invariant Theory and System Balancing  201
  7.1 Introduction  201
  7.2 Plurisubharmonic Functions  204
  7.3 The Azad-Loeb Theorem  207
  7.4 Application to Balancing  209
  7.5 Euclidean Norm Balancing  220
8 Balancing via Gradient Flows  229
  8.1 Introduction  229
  8.2 Flows on Positive Definite Matrices  231
  8.3 Flows for Balancing Transformations  246
  8.4 Balancing via Isodynamical Flows  249
  8.5 Euclidean Norm Optimal Realizations  258
9 Sensitivity Optimization  269
  9.1 A Sensitivity Minimizing Gradient Flow  269
  9.2 Related L2-Sensitivity Minimization Flows  282
  9.3 Recursive L2-Sensitivity Balancing  291
  9.4 L2-Sensitivity Model Reduction  295
  9.5 Sensitivity Minimization with Constraints  298
A Linear Algebra  311
  A.1 Matrices and Vectors  311
  A.2 Addition and Multiplication of Matrices  312
  A.3 Determinant and Rank of a Matrix  312
  A.4 Range Space, Kernel and Inverses  313
  A.5 Powers, Polynomials, Exponentials and Logarithms  314
  A.6 Eigenvalues, Eigenvectors and Trace  314
  A.7 Similar Matrices  315
  A.8 Positive Definite Matrices and Matrix Decompositions  316
  A.9 Norms of Vectors and Matrices  317
  A.10 Kronecker Product and Vec  318
  A.11 Differentiation and Integration  318
  A.12 Lemma of Lyapunov  319
  A.13 Vector Spaces and Subspaces  319
  A.14 Basis and Dimension  320
  A.15 Mappings and Linear Mappings  320
  A.16 Inner Products  321
B Dynamical Systems  323
  B.1 Linear Dynamical Systems  323
  B.2 Linear Dynamical System Matrix Equations  325
  B.3 Controllability and Stabilizability  325
  B.4 Observability and Detectability  326
  B.5 Minimality  327
  B.6 Markov Parameters and Hankel Matrix  327
  B.7 Balanced Realizations  328
  B.8 Vector Fields and Flows  329
  B.9 Stability Concepts  330
  B.10 Lyapunov Stability  332
C Global Analysis  335
  C.1 Point Set Topology  335
  C.2 Advanced Calculus  338
  C.3 Smooth Manifolds  341
  C.4 Spheres, Projective Spaces and Grassmannians  343
  C.5 Tangent Spaces and Tangent Maps  346
  C.6 Submanifolds  349
  C.7 Groups, Lie Groups and Lie Algebras  351
  C.8 Homogeneous Spaces  355
  C.9 Tangent Bundle  357
  C.10 Riemannian Metrics and Gradient Flows  360
  C.11 Stable Manifolds  362
  C.12 Convergence of Gradient Flows  365
References 367
Author Index 389
Subject Index 395
CHAPTER 1
Matrix Eigenvalue Methods
1.1 Introduction
Optimization techniques are ubiquitous in mathematics, science, engineering, and economics. Least squares methods date back to Gauss, who developed them to calculate planet orbits from astronomical measurements. So we might ask: "What is new and of current interest in least squares optimization?" Our curiosity to investigate this question along the lines of this work was first aroused by the conjunction of two "events". The first "event" was the realization by one of us that recent results from geometric invariant theory by Kempf and Ness had important contributions to make in systems theory, and in particular to questions concerning the existence of unique optimum least squares balancing solutions. The second "event" was the realization by the other author that constructive procedures for such optimum solutions could be achieved by dynamical systems. These dynamical systems are in fact ordinary matrix differential equations (ODEs), being also gradient flows on manifolds, which are best formulated and studied using the language of differential geometry.

Indeed, the beginnings of what might be possible had been enthusiastically expounded by Roger Brockett (Brockett, 1988). He showed, quite surprisingly to us at the time, that certain problems in linear algebra could be solved by calculus. Of course, his results had their origins in earlier work dating back to that of Fischer (1905), Courant (1922) and von Neumann (1937), as well as that of Rutishauser in the 1950s (1954; 1958). Also there were parallel efforts in numerical analysis by Chu (1988). We also mention the influence of the ideas of Hermann (1979) on the development of applications of differential geometry in systems theory and linear algebra. Brockett showed that the tasks of diagonalizing a matrix, linear programming, and sorting could all be solved by dynamical systems, and in particular by finding the limiting solution of certain well behaved ordinary matrix differential equations. Moreover, these construction procedures were actually mildly disguised solutions to matrix least squares minimization problems.

Of course, we are not used to solving problems in linear algebra by calculus, nor does it seem that matrix differential equations are attractive for replacing linear algebra computer packages. So why proceed along such lines? Here we must look at the cutting edge of current applications which result in matrix formulations involving quite high dimensional matrices, and the emergent computer technologies with distributed and parallel processing such as in the connection machine, the hypercube, array processors, systolic arrays, and artificial neural networks. For such "neural" network architectures, the solution of high order nonlinear matrix differential (or difference) equations is not a formidable task, but rather a natural one. We should not exclude the possibility that new technologies, such as charge coupled devices, will allow N digital additions to be performed simultaneously rather than in N operations. This could bring about a new era of numerical methods, perhaps permitting the dynamical systems approach to optimization explored here to be very competitive.

The subject of this book is currently in an intensive state of development, with inputs coming from very different directions.
Starting from the seminal work of Khachian and Karmarkar, there has been a lot of progress in developing interior point algorithms for linear programming and nonlinear programming, due to Bayer, Lagarias and Faybusovich, to mention a few. In numerical analysis there is the work of Kostant, Symes, Deift, Chu, Tomei and others on the Toda flow and its connection to completely integrable Hamiltonian systems. This subject also has deep connections with torus actions and symplectic geometry. Starting from the work of Brockett, there is now an emerging theory of completely integrable gradient flows on manifolds, developed by Bloch, Brockett, Flaschka and Ratiu. We also mention the work of Bloch on least squares estimation in relation to completely integrable Hamiltonian systems. In our own work we have tried to develop the applications of gradient flows on manifolds to systems theory, signal processing and control theory. In the future we expect more applications to optimal control theory. We also mention the obvious connections to artificial neural networks and nonlinear approximation theory. In all these research directions, the development is far from being complete and a definite picture has not yet appeared.

It has not been our intention, nor have we been able, to cover thoroughly all these recent developments. Instead, we have tried to draw the emerging interconnections between these different lines of research and to raise the reader's interest in these fascinating developments. Our window on these developments, which we invite the reader to share, is of course our own research of recent years.

We see then that a dynamical systems approach to optimization is rather timely. Where better to start than with least squares optimization? The first step for the approach we take is to formulate a cost function which, when minimized over the constraint set, would give the desired result. The next step is to formulate a Riemannian metric on the tangent space of the constraint set, viewed as a manifold, such that a gradient flow on the manifold can be readily implemented, and such that the flow converges to the desired algebraic solution. We do not offer a systematic approach for achieving any "best" selection of the metric, but rather demonstrate the approach by examples. In the first chapters of the monograph, these examples will be associated with fundamental and classical tasks in linear algebra and in linear system theory, therefore representing more subtle, rather than dramatic, advances. In the later chapters new problems, not previously addressed by any complete theory, are tackled.

Of course, the introduction of methods from the theory of dynamical systems to optimization is well established, as in modern analysis of the classical steepest descent gradient techniques and the Newton method. More recently, feedback control techniques are being applied to select the step size in numerical integration algorithms. There are interesting applications of optimization theory to dynamical systems in the now well established field of optimal control and estimation theory. This book seeks to catalyze further interactions between optimization and dynamical systems.

We are familiar with the notion that Riccati equations are often the dynamical systems behind many least squares optimization tasks, and engineers are now comfortable with implementing Riccati equations for estimation and control.
Dynamical systems for other important matrix least squares optimization tasks in linear algebra, systems theory, sensitivity optimization, and inverse eigenvalue problems are studied here. At first encounter these may appear quite formidable and provoke caution. On closer inspection, we find that these dynamical systems are actually Riccati-like in behaviour and are often induced from linear flows. Also, it is very comforting that they are exponentially convergent, and converge to the set of global optimal solutions to the various optimization tasks. It is our prediction that engineers will become familiar with such equations in the decades to come.

To us the dynamical systems arising in the various optimization tasks studied in this book have their own intrinsic interest and appeal. Although we have not tried to create the whole zoo of such ordinary differential equations (ODEs), we believe that an introductory study of such optimizing flows is useful and worthy. At this stage it is too much to expect that each flow be competitive with the traditional numerical methods where these apply. However, we do expect applications in circumstances not amenable to standard numerical methods, such as in adaptive, stochastic, and time-varying environments, and to optimization tasks as explored in the later parts of this book. Perhaps the empirical tricks of modern numerical methods, such as the use of shift strategies in the QR algorithm, can accelerate our gradient flows in discrete time so that they will be more competitive than they are now. We already have preliminary evidence for this in our own research not fully documented here.

In this monograph, we first study the classical methods of diagonalizing a symmetric matrix, giving a dynamical systems interpretation to the methods. The power method, the Rayleigh quotient method, the QR algorithm, and standard least squares algorithms which may include constraints on the variables are studied in turn. This work leads to the gradient flow equations on symmetric matrices proposed by Brockett involving Lie brackets. We term these equations double bracket flows, although perhaps the term Lie-Brockett could have some appeal. These are developed for the primal task of diagonalizing a real symmetric matrix. Next, the related exercise of singular value decomposition (SVD) of possibly complex nonsymmetric matrices is explored by formulating the task so as to apply the double bracket equation. Also a first principles derivation is presented. Various alternative and related flows are investigated, such as those on the factors of the decompositions. A mild generalization of SVD is a matrix factorization where the factors are balanced. Such factorizations are studied by similar gradient flow techniques, leading into a later topic of balanced realizations. Double bracket flows are then applied to linear programming. Recursive discrete-time versions of these flows are proposed and rapprochement with earlier linear algebra techniques is made.

Moving on from purely linear algebra applications for gradient flows, we tackle balanced realizations and diagonal balancing as topics in linear system theory. This part of the work can be viewed as a generalization of the results for singular value decompositions. The remaining parts of the monograph concern certain optimization tasks arising in signal processing and control. In particular, the emphasis is on quadratic index minimization.
For signal processing, the parameter sensitivity costs relevant for finite-word-length implementations are minimized. Also, constrained parameter sensitivity minimization is covered so as to cope with scaling constraints in a digital filter design. For feedback control, the quadratic indices are those usually used to achieve trade-offs between control energy and regulation or tracking performance. Quadratic indices are also used for eigenvalue assignment. For all these various tasks, the geometry of the constraint manifolds is important and gradient flows on these manifolds are developed.

Digressions are included in the early chapters to cover such topics as Projective Spaces, Riemannian Metrics, Gradient Flows, and Lie Groups. Appendices are included to cover the relevant basic definitions and results in linear algebra, dynamical systems theory, and global analysis, including aspects of differential geometry.
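As a preview of the double bracket flows developed in Chapter 2, the following minimal sketch (an illustration of ours, not an algorithm from the text) integrates Brockett's equation $\dot{H} = [H,[H,N]]$, with $[A,B] = AB - BA$, by a crude forward Euler scheme. The flow is isospectral, and with a diagonal $N$ it drives a symmetric $H$ towards a diagonal matrix whose entries are ordered like those of $N$; the discretizations studied later in the book are more careful than this one.

```python
import numpy as np

def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

# Forward Euler integration of the double bracket flow H' = [H, [H, N]].
# N = diag(1, 2, 3) acts as a "sorting target": H tends to a diagonal
# matrix with its eigenvalues ordered like the entries of N.
N = np.diag([1.0, 2.0, 3.0])
H = np.array([[2.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 4.0]])
eigs0 = np.linalg.eigvalsh(H)

dt, steps = 0.01, 5000
for _ in range(steps):
    H += dt * bracket(H, bracket(H, N))

print(np.round(H, 4))        # nearly diagonal, diagonal entries increasing
print(eigs0)                 # spectrum of H(0) ...
print(np.linalg.eigvalsh(H)) # ... preserved up to (small) Euler error
```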
1.2 Power Method for Diagonalization
In this chapter we review some of the standard tools in numerical linear algebra for solving matrix eigenvalue problems. Our interest in such methods is not to give a concise and complete analysis of the algorithms, but rather to demonstrate that some of these algorithms arise as discretizations of certain continuous-time dynamical systems. This work then leads into the more recent matrix eigenvalue methods of Chapter 2, termed double bracket flows, which also have application to linear programming and topics of later chapters. There are excellent textbooks available where the following standard methods are analyzed in detail; one choice is Golub and Van Loan (1989).
The Power Method

The power method is a particularly simple iterative procedure to determine a dominant eigenvector of a linear operator. Its beauty lies in its simplicity rather than its computational efficiency. Appendix A gives background material in matrix results and in linear algebra.

Let $A : \mathbb{C}^n \to \mathbb{C}^n$ be a diagonalizable linear map with eigenvalues $\lambda_1, \dots, \lambda_n$ and eigenvectors $v_1, \dots, v_n$. For simplicity let us assume that $A$ is nonsingular and $\lambda_1, \dots, \lambda_n$ satisfy $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$. We then say that $\lambda_1$ is a dominant eigenvalue and $v_1$ a dominant eigenvector. Let
$$\|x\| = \Bigl( \sum_{i=1}^{n} |x_i|^2 \Bigr)^{1/2} \tag{2.1}$$
denote the standard Euclidean norm of $\mathbb{C}^n$. For any initial vector $x_0 \in \mathbb{C}^n$ with $\|x_0\| = 1$, we consider the infinite normalized Krylov sequence $(x_k)$ of unit vectors of $\mathbb{C}^n$ defined by the discrete-time dynamical system
$$x_k = \frac{A x_{k-1}}{\|A x_{k-1}\|} = \frac{A^k x_0}{\|A^k x_0\|}, \qquad k \in \mathbb{N}. \tag{2.2}$$
Appendix B gives some background results for dynamical systems. Writing $x_0 = \sum_{i=1}^n c_i v_i$ in the eigenbasis, we have $A^k x_0 = \sum_{i=1}^n c_i \lambda_i^k v_i$. Since the growth rate of the component of $A^k x_0$ corresponding to the eigenvector $v_1$ dominates the growth rates of the other components, we would expect that $(x_k)$ converges to the dominant eigenvector of $A$:
$$\lim_{k \to \infty} x_k = \lambda \frac{v_1}{\|v_1\|} \qquad \text{for some } \lambda \in \mathbb{C} \text{ with } |\lambda| = 1. \tag{2.3}$$
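The iteration (2.2) is only a few lines in code. The following minimal NumPy sketch (an illustration of ours, not part of the original text) demonstrates the convergence claim (2.3) for a matrix with a dominant eigenvalue:

```python
import numpy as np

def power_method(A, x0, num_iters=50):
    """Normalized power iteration (2.2): x_k = A x_{k-1} / ||A x_{k-1}||."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x

# Dominant eigenvalue 3 with dominant eigenvector e_1 = (1, 0, 0).
A = np.diag([3.0, 2.0, 1.0])
x0 = np.ones(3)                          # a generic start (c_1 != 0)
print(np.round(power_method(A, x0), 6))  # approx. (1, 0, 0), up to sign
```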
Of course, if $x_0$ is an eigenvector of $A$, so is $x_k$ for all $k \in \mathbb{N}$. Therefore we would expect (2.3) to hold only for generic initial conditions, that is, for almost all $x_0 \in \mathbb{C}^n$. This is indeed quite true; see Golub and Van Loan (1989), Parlett and Poole (1973).

Example 2.1 Let
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \quad \text{with} \quad x_0 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Then
$$x_k = \frac{1}{\sqrt{1 + 2^{2k}}} \begin{bmatrix} 1 \\ 2^k \end{bmatrix}, \qquad k \ge 1,$$
which converges to $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for $k \to \infty$.

Example 2.2 Let
$$A = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}, \qquad x_0 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Then
$$x_k = \frac{1}{\sqrt{1 + 2^{2k}}} \begin{bmatrix} 1 \\ (-2)^k \end{bmatrix}, \qquad k \in \mathbb{N},$$
and the sequence $(x_k \mid k \in \mathbb{N})$ has
$$\left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \end{bmatrix} \right\}$$
as a limit set; see Figure 2.1.
[Figure 2.1: the power method iterates $x_k$ of Example 2.2 on the unit circle, starting from $x_0$.]
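The oscillatory behaviour of Example 2.2 is easy to reproduce numerically. This short check (ours, for illustration) prints iterates that alternate between neighbourhoods of $(0, 1)^T$ and $(0, -1)^T$:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, -2.0]])
x = np.array([1.0, 1.0]) / np.sqrt(2.0)
for k in range(1, 7):
    x = A @ x
    x /= np.linalg.norm(x)
    print(k, np.round(x, 4))
# The sign of the second component alternates with k, matching the
# limit set {(0, 1), (0, -1)} of Example 2.2.
```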