Low-Rank Approximation
Ivan Markovsky

Low-Rank Approximation
Algorithms, Implementation, Applications

January 19, 2018

Springer


Preface

Simple linear models are commonly used in engineering despite the fact that the real world is often nonlinear. At the same time as being simple, however, the models have to be accurate. Mathematical models are obtained from first principles (natural laws, interconnection, etc.) and experimental data. Modeling from first principles aims at exact models. This approach is the default one in the natural sciences. Modeling from data, on the other hand, allows us to tune complexity versus accuracy. This approach is common in engineering, where experimental data is available and a simple approximate model is preferred to a complicated exact one. For example, optimal control of a complex (high-order, nonlinear, time-varying) system is currently intractable; however, using simple (low-order, linear, time-invariant) approximate models and robust control methods may achieve sufficiently high performance.

The topic of the book is data modeling by reduced-complexity mathematical models. A unifying theme of the book is low-rank approximation: a prototypical data modeling problem. The rank of a matrix constructed from the data corresponds to the complexity of a linear model that fits the data exactly. The data matrix being full rank implies that there is no exact low-complexity linear model for that data. In this case, the aim is to find an approximate model. The approximate modeling approach considered in the book is to find a small (in some specified sense) modification of the data that renders the modified data exact. The exact model for the modified data is an optimal (in the specified sense) approximate model for the original data. The corresponding computational problem is low-rank approximation of the data matrix. The rank of the approximation allows us to trade off accuracy versus complexity.

Apart from the choice of the rank, the book covers two other user choices: the approximation criterion and the matrix structure. These choices correspond to prior knowledge about the accuracy of the data and the model class, respectively. An example of a matrix structure, called Hankel structure, corresponds to the linear time-invariant model class. The book presents local optimization, subspace, and convex relaxation-based heuristic methods for Hankel structured low-rank approximation.
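As an illustrative aside (not part of the book's text or its software), the connection between rank, model complexity, and approximation described in the preceding paragraphs can be demonstrated in a few lines of Python with NumPy. The sketch below computes the best unstructured rank-r approximation of a data matrix by truncating its singular value decomposition and shows that a Hankel matrix built from a trajectory of a second-order linear time-invariant system has rank two. The function names lowrank_approx and hankel, the noise level, and the example system are illustrative assumptions, not material from the book.

```python
# Minimal NumPy sketch (illustrative only; not the book's software package).
import numpy as np


def lowrank_approx(D, r):
    """Best rank-r approximation of D in the Frobenius norm,
    obtained by truncating the singular value decomposition."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]


def hankel(w, L):
    """Hankel matrix with L rows built from the scalar sequence w."""
    T = len(w)
    return np.array([w[i:i + T - L + 1] for i in range(L)])


# Unstructured case: noisy data generated by a rank-2 "true" matrix.
rng = np.random.default_rng(0)
D0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 30))  # exact, rank 2
D = D0 + 0.01 * rng.standard_normal(D0.shape)                     # noisy, full rank
Dh = lowrank_approx(D, 2)        # closest rank-2 matrix to D (Frobenius norm)
print(np.linalg.matrix_rank(D), np.linalg.matrix_rank(Dh))        # expect: 30 2

# Structured case: a trajectory of the second-order autonomous LTI system
# y(t) = 1.5 y(t-1) - 0.7 y(t-2); its Hankel matrix has rank 2.
y = [1.0, 1.5]
for t in range(2, 40):
    y.append(1.5 * y[t - 1] - 0.7 * y[t - 2])
print(np.linalg.matrix_rank(hankel(np.array(y), 5)))              # expect: 2
```

Unlike the unstructured case above, which has this closed-form solution via the singular value decomposition, Hankel structured low-rank approximation has no such closed-form solution in general; that is what the local optimization, subspace, and convex relaxation methods mentioned above address.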
Low-rank approximation is a core problem in applications. Generic examples in systems and control are model reduction and system identification. Low-rank approximation is equivalent to the principal component analysis method in machine learning. Indeed, dimensionality reduction, classification, and information retrieval problems can be posed and solved as particular low-rank approximation problems. Sylvester structured low-rank approximation has applications in computer algebra for the decoupling, factorization, and common divisor computation of polynomials.

The book covers two complementary aspects of data modeling: stochastic estimation and deterministic approximation. The former aims to find, from noisy data generated by a low-complexity system, an estimate of that data generating system. The latter aims to find, from exact data generated by a high-complexity system, a low-complexity approximation of the data generating system. In applications, both the stochastic estimation and deterministic approximation aspects are present: the data is imprecise due to measurement errors and is possibly generated by a complicated phenomenon that is not exactly representable by a model in the considered model class. The development of data modeling methods in system identification and signal processing, however, has been dominated by the stochastic estimation point of view. The approximation error is represented in the mainstream data modeling literature as a random process. However, the approximation error is deterministic, so it is not natural to treat it as a random process. Moreover, the approximation error does not satisfy stochastic regularity conditions such as stationarity, ergodicity, and Gaussianity. These aspects complicate the stochastic approach. An exception to the stochastic paradigm in data modeling is the behavioral approach, initiated by J. C. Willems in the mid 1980s. Although the behavioral approach is motivated by the deterministic approximation aspect of data modeling, it does not exclude the stochastic estimation approach. This book uses the behavioral approach as a unifying language in defining modeling problems and presenting their solutions.

The theory and methods developed in the book lead to algorithms, which are implemented in software. The algorithms clarify the ideas and, vice versa, the software implementation clarifies the algorithms. Indeed, the software is the ultimate unambiguous description of how the theory and methods are applied in practice. The software allows the reader to reproduce and to modify the examples in the book. The exposition reflects the sequence: theory → algorithms → implementation. Correspondingly, the text is interwoven with code that generates the numerical examples being discussed. The reader can try out these methods on their own problems and data. Experimenting with the methods and working on the exercises would lead you to a deeper understanding of the theory and hands-on experience with the methods.

Leuven, Ivan Markovsky
January 19, 2018


Contents
1 Introduction
  1.1 Classical and behavioral paradigms for data modeling
  1.2 Motivating example for low-rank approximation
  1.3 Overview of applications
  1.4 Overview of algorithms
  1.5 Notes and references

Part I Linear modeling problems

2 From data to models
  2.1 Static model representations
  2.2 Dynamic model representations
  2.3 Stochastic model representation
  2.4 Exact and approximate data modeling
  2.5 Notes and references

3 Exact modeling
  3.1 Kung's realization method
  3.2 Impulse response computation
  3.3 Stochastic system identification
  3.4 Missing data recovery
  3.5 Notes and references

4 Approximate modeling
  4.1 Unstructured low-rank approximation
  4.2 Structured low-rank approximation
  4.3 Nuclear norm heuristic
  4.4 Missing data estimation
  4.5 Notes and references

Part II Applications and generalizations

5 Applications
  5.1 Model reduction
  5.2 System identification
  5.3 Approximate common factor of two polynomials
  5.4 Pole placement by a low-order controller
  5.5 Notes and references

6 Data-driven filtering and control
  6.1 Model-based vs data-driven paradigms
  6.2 Missing data approach
  6.3 Estimation and control examples
  6.4 Solution via matrix completion
  6.5 Notes and references

7 Nonlinear modeling problems
  7.1 A framework for nonlinear data modeling
  7.2 Nonlinear low-rank approximation
  7.3 Computational algorithms
  7.4 Identification of polynomial time-invariant systems
  7.5 Notes and references

8 Dealing with prior knowledge
  8.1 Data preprocessing
  8.2 Approximate low-rank factorization
  8.3 Complex least squares with constrained phase
  8.4 Blind identification with deterministic input model
  8.5 Notes and references

A Total least squares
B Solutions to the exercises
C Proofs
Notation
Index


Chapter 1
Introduction

... inputs (or causes) and the variables corresponding to the B matrix are outputs (or consequences) in the sense that they are uniquely determined by the inputs and the model. Note that in the classical paradigm the input/output partition of the variables is postulated a priori. Unless ...