arXiv:2102.13566v1 [cs.LG] 26 Feb 2021

SPARSE APPROXIMATION IN LEARNING VIA NEURAL ODES

CARLOS ESTEVE YAGÜE* AND BORJAN GESHKOVSKI*

Abstract. We consider the continuous-time, neural ordinary differential equation (neural ODE) perspective of deep supervised learning, and study the impact of the final time horizon $T$ in training. We focus on a cost consisting of an integral of the empirical risk over the time interval, and $L^1$-parameter regularization. Under homogeneity assumptions on the dynamics (typical for ReLU activations), we prove that any global minimizer is sparse, in the sense that there exists a positive stopping time $T^*$ beyond which the optimal parameters vanish. Moreover, under appropriate interpolation assumptions on the neural ODE, we provide quantitative estimates of the stopping time $T^*$, and of the training error of the trajectories at the stopping time. The latter stipulates a quantitative approximation property of neural ODE flows with sparse parameters. In practical terms, a shorter time horizon in the training problem can be interpreted as considering a shallower residual neural network (ResNet), and since the optimal parameters are concentrated over a shorter time horizon, such a consideration may lower the computational cost of training without discarding relevant information.

Contents
1. Introduction
2. Preliminary lemmas
3. Proof of Theorem 1.1
4. Asymptotic interpolation
5. Concluding remarks
References

Keywords. Deep Learning; Neural ODEs; Supervised Learning; Sparsity; Optimal control; Nonlinear systems.

AMS Subject Classification. 49J15; 49M15; 49J20; 49K20; 93C20; 49N05.

1. Introduction

Sparsity is a highly desirable property in many machine learning and optimization tasks due to the inherent reduction of computational complexity.
When induced by $\ell^1$-regularization, for instance, it has been used extensively to simplify machine learning tasks by automatically selecting a strict subset of the available features. An illustrative example is the well-known Lasso (least absolute shrinkage and selection operator, [Santosa and Symes, 1986; Tibshirani, 1996]), which consists in minimizing the sum of a least squares cost function and an $\ell^1$-penalty for an affine parametric model, and which forces a subset of the trainable parameters to become zero. As a consequence, the associated features may be pruned. Following this line of reasoning, in this work we study supervised learning problems viewed from a continuous-time, neural ODE perspective, and we demonstrate the appearance of sparsity patterns for $L^1$-regularized minimization problems.

Date: March 1, 2021. *Equal contribution.

1.1. Background. We recall that supervised learning addresses the problem of predicting from data, which consists in approximating an unknown function $f: \mathcal{X} \to \mathcal{Y}$ from $N$ known and possibly noisy samples $\{(\vec{x}_i, \vec{y}_i = f(\vec{x}_i))\}_{i=1}^N$. Depending on the nature of the space of labels $\mathcal{Y}$, one distinguishes two types of supervised learning tasks, namely classification (labels take values in a finite set of $m$ classes, e.g. $\mathcal{Y} = \{1, \ldots, m\}$) and regression (labels take continuous values in $\mathcal{Y} \subset \mathbb{R}^m$). Heuristically, supervised learning consists in constructing a map $f_{\text{approx}}: \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ which, desirably, is such that for any $x \in \mathcal{X}$ and for any Borel measurable set $A \subset \mathcal{Y}$, $f_{\text{approx}}(x)(A) \simeq 1$ whenever $f(x) \in A$, and $f_{\text{approx}}(x)(A) \simeq 0$ whenever $f(x) \notin A$; here, $\mathcal{P}(\mathcal{Y})$ denotes the space of probability measures on $\mathcal{Y}$. In other words, one looks for a map $f_{\text{approx}}$ which approximates the map $x \mapsto \delta_{f(x)}$, where $\delta_z$ stands for the Dirac measure centered at $z$.
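To make concrete how an $\ell^1$-penalty such as the one in the Lasso mentioned above produces exact zeros, the sketch below solves a small Lasso problem $\min_\theta \tfrac12\|A\theta - y\|_2^2 + \lambda\|\theta\|_1$ with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is a standard proximal-gradient method, not a method from this paper; the function names, the $\tfrac12$ scaling of the least squares term and the toy data are assumptions made purely for illustration.

```python
def soft_threshold(z, tau):
    """Proximal map of tau*|.|: shrink z toward 0, returning exactly 0.0 when |z| <= tau."""
    if z > tau:
        return z - tau
    if z < -tau:
        return z + tau
    return 0.0


def lasso_ista(A, y, lam, step, n_iter=500):
    """ISTA for min_theta 0.5*||A theta - y||_2^2 + lam*||theta||_1.

    A is a dense matrix given as a list of rows; convergence requires
    step <= 1 / ||A||_2^2 (squared spectral norm of A).
    """
    n_rows, n_cols = len(A), len(A[0])
    theta = [0.0] * n_cols
    for _ in range(n_iter):
        # Residual r = A theta - y of the least squares term.
        r = [sum(A[i][j] * theta[j] for j in range(n_cols)) - y[i] for i in range(n_rows)]
        # Gradient of the smooth part: A^T r.
        grad = [sum(A[i][j] * r[i] for i in range(n_rows)) for j in range(n_cols)]
        # Gradient step, then the l1 proximal step, which zeroes small entries.
        theta = [soft_threshold(theta[j] - step * grad[j], step * lam) for j in range(n_cols)]
    return theta


# Orthonormal design: the Lasso solution is soft_threshold(y_j, lam) coordinate-wise.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(lasso_ista(identity, [3.0, 0.5, -2.0], lam=1.0, step=1.0))  # [2.0, 0.0, -1.0]
```

The middle coefficient, whose data value lies below the threshold $\lambda$, is set exactly to zero, which is the feature-pruning effect described above.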
The map $f_{\text{approx}}$ is often chosen from a class of parametric functions, and, as one only has $N$ samples of $f$, the parameters are tuned to fit $f_{\text{approx}}$ to these data by minimizing a specific loss functional. Deep neural networks constitute a popular method for constructing $f_{\text{approx}}$: they are parametrized computational architectures which propagate each individual sample of the input data $\{\vec{x}_i\}_{i=1}^N \in \mathbb{R}^{d \times N}$ across a sequence of affine parametric maps and simple nonlinearities. The so-called residual neural networks (ResNets, [He et al., 2016]) may, in the simplest case, be cast as schemes of the form

$$
\begin{cases}
x_i^{k+1} = x_i^k + \sigma\left(w^k x_i^k + b^k\right) & \text{for } k \in \{0, \ldots, N_{\text{layers}} - 1\},\\
x_i^0 = \vec{x}_i \in \mathbb{R}^d
\end{cases}
\tag{1.1}
$$

for all $i \in \{1, \ldots, N\} := [N]$. The unknown states are $x_i^k \in \mathbb{R}^d$ for any $i \in [N]$, $\sigma$ is an explicit scalar, Lipschitz continuous nonlinear function defined component-wise in (1.1), $\{w^k, b^k\}_{k=0}^{N_{\text{layers}}-1}$ are optimizable parameters (controls) with $w^k \in \mathbb{R}^{d \times d}$ and $b^k \in \mathbb{R}^d$, and $N_{\text{layers}} > 1$ designates the number of layers, referred to as the depth. Due to the inherent dynamical systems nature of ResNets, several recent works have studied an associated continuous-time formulation in some detail, a trend started with the works [E, 2017; Haber and Ruthotto, 2017]. This perspective is motivated by the simple observation that, for any $i \in [N]$ and for $T > 0$, (1.1) is roughly the forward Euler scheme for the neural ordinary differential equation (neural ODE)

$$
\begin{cases}
\dot{x}_i(t) = \sigma\left(w(t) x_i(t) + b(t)\right) & \text{for } t \in (0, T),\\
x_i(0) = \vec{x}_i \in \mathbb{R}^d.
\end{cases}
\tag{1.2}
$$

We shall focus on parametrizing $f_{\text{approx}}$ by the flows of neural ODEs such as (1.2). This may be done by setting $f_{\text{approx}}: x \mapsto \mu(x(T))$, where $x(T)$ solves (1.2) with $x(0) = x$, and $\mu: \mathbb{R}^d \to \mathcal{P}(\mathcal{Y})$ is chosen appropriately. In practice, the time-dependent parameters $[w, b]$ are found by solving the regularized empirical risk minimization problem

$$
\min_{[w,b]} \underbrace{\frac{1}{N}\sum_{i=1}^{N} \mathrm{loss}\left(P x_i(T), \vec{y}_i\right)}_{:=\mathcal{E}(x(T))} + \left\|[w,b]\right\|_{L^p(0,T;\mathbb{R}^{d_u})}^p,
\tag{1.3}
$$

where $p \in \{1, 2\}$, $P: \mathbb{R}^d \to \mathbb{R}^m$ is assumed to be a given affine map¹, and $\mathrm{loss}(\cdot, \cdot): \mathbb{R}^m \times \mathcal{Y} \to \mathbb{R}_+$ is such that $x \mapsto \mathrm{loss}(x, y)$ is continuous for all $y \in \mathcal{Y}$, $\mathrm{loss}(x, y) \neq 0$ whenever $\mu(x) \neq \delta_y$, and $\mathrm{loss}(x, y) \to 0$ when $\mu(x) \to \delta_y$ in an appropriate sense of measures (e.g., for the Wasserstein distance). Common examples of loss functions include the cross-entropy loss for classification tasks

$$
\mathrm{loss}\left(Px, \vec{y}\right) := -\log\left(\frac{e^{(Px)_{\vec{y}}}}{\sum_{j=1}^{m} e^{(Px)_j}}\right),
\tag{1.4}
$$

where $Px \in \mathbb{R}^m$ and $\vec{y} \in [m]$, in which case $\mu := \mathrm{softmax} \circ P$, or the mean squared error (MSE) loss for regression tasks

$$
\mathrm{loss}\left(Px, \vec{y}\right) := \left\|Px - \vec{y}\right\|_{\ell^2}^2,
$$

where now $\vec{y} \in \mathcal{Y} \subset \mathbb{R}^m$, in which case $\mu(x) := \delta_{Px}$.

Note that in (1.1) the time-step $h = T/N_{\text{layers}}$ is fixed (equal to 1), and each time-instance of a discretization of (1.2) would represent a different layer of the derived neural network (1.1). We therefore see that, when the time-step is fixed, the time horizon $T$ in (1.2) may serve as an indicator of the number of layers $N_{\text{layers}}$ in the discrete-time context. Thus, good a priori knowledge of the dynamics of the learning problem over longer time horizons is desirable in view of discovering approximation and generalization properties of the trained neural ODE flow. This perspective has been taken in [Esteve et al., 2020a] for $L^2$-regularized supervised learning problems. Herein, we complete this study with new results and insights for $L^1$-regularized learning problems.

1.2. Problem setting. We assume we are given a training dataset $\{(\vec{x}_i, \vec{y}_i)\}_{i=1}^N$, where $\vec{x}_i \in \mathcal{X} \subset \mathbb{R}^d$ and $\vec{y}_i \in \mathcal{Y}$.
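Given such a dataset, the discrete-time objects above can be assembled in a few lines. The following is a minimal sketch, not the authors' code: it integrates (1.2) by forward Euler with $N_{\text{layers}}$ piecewise-constant parameter values, so that $h = T/N_{\text{layers}}$ and the choice $T = N_{\text{layers}}$ recovers the ResNet (1.1), and then evaluates the regularized empirical risk (1.3) with either the cross-entropy loss (1.4) or the MSE loss. Assumptions made here: $P$ is taken linear (zero bias), labels for (1.4) are 0-based indices rather than elements of $[m]$, and the pointwise norm inside $\|[w,b]\|_{L^p(0,T;\mathbb{R}^{d_u})}$ is the Euclidean norm of all entries of $[w(t), b(t)]$.

```python
import math


def relu(z):
    return max(z, 0.0)


def euler_neural_ode(x0, ws, bs, T, sigma=relu):
    """Forward Euler for x'(t) = sigma(w(t) x(t) + b(t)) on (0, T).

    ws[k] (d x d, list of rows) and bs[k] (length d) are the constant parameter
    values on the k-th of len(ws) equal subintervals; the step is h = T / len(ws).
    """
    h = T / len(ws)
    x = list(x0)
    for w, b in zip(ws, bs):
        pre = [sum(w[r][c] * x[c] for c in range(len(x))) + b[r] for r in range(len(x))]
        x = [x[r] + h * sigma(pre[r]) for r in range(len(x))]
    return x


def cross_entropy(z, label):
    """(1.4) for z = P x in R^m and a 0-based class index, via a stable log-sum-exp."""
    z_max = max(z)
    log_sum_exp = z_max + math.log(sum(math.exp(zj - z_max) for zj in z))
    return log_sum_exp - z[label]


def mse(z, target):
    """Squared l2 distance ||z - target||^2 used for regression."""
    return sum((zj - tj) ** 2 for zj, tj in zip(z, target))


def regularized_risk(xs0, ys, ws, bs, T, P, loss, p=1):
    """Discretization of (1.3): (1/N) sum_i loss(P x_i(T), y_i) + ||[w, b]||_{L^p}^p."""
    risk = 0.0
    for x0, y in zip(xs0, ys):
        xT = euler_neural_ode(x0, ws, bs, T)
        Px = [sum(P[r][c] * xT[c] for c in range(len(xT))) for r in range(len(P))]
        risk += loss(Px, y)
    risk /= len(xs0)
    # For piecewise-constant controls, int_0^T |u(t)|^p dt = h * sum_k |u_k|^p.
    h = T / len(ws)
    penalty = sum(
        h * math.sqrt(sum(v * v for row in w for v in row) + sum(v * v for v in b)) ** p
        for w, b in zip(ws, bs)
    )
    return risk + penalty


# Two 1-D samples, m = 2 classes, zero parameters on 4 layers with T = 4 (h = 1):
# the flow is the identity map, so the value is log(1 + e^-2), approx. 0.1269.
ws, bs = [[[0.0]]] * 4, [[0.0]] * 4
P = [[1.0], [-1.0]]
print(regularized_risk([[1.0], [-1.0]], [0, 1], ws, bs, T=4.0, P=P, loss=cross_entropy))
```

With vanishing parameters the penalty term is zero and the trajectories do not move, so only the data-fitting term remains.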
We henceforth set $d_x := d \times N$ and consider stacked neural ODEs of the form

$$
\begin{cases}
\dot{x}(t) = f(x(t), u(t)) & \text{for } t \in (0, T),\\
x(0) = x^0 \in \mathbb{R}^{d_x},
\end{cases}
\tag{1.5}
$$

where $T > 0$ and $x^0 = [\vec{x}_1; \ldots; \vec{x}_N] \in \mathbb{R}^{d_x}$. The nonlinearity $f: \mathbb{R}^{d_x} \times \mathbb{R}^{d_u} \to \mathbb{R}^{d_x}$ may take the form

$$
f(x, u) = \sigma\left(\begin{bmatrix} w & & \\ & \ddots & \\ & & w \end{bmatrix} x + \begin{bmatrix} b \\ \vdots \\ b \end{bmatrix}\right)
\tag{1.6}
$$

for $x \in \mathbb{R}^{d_x}$ and $u = [w, b] \in \mathbb{R}^{d_u}$ with $d_u := d^2 + d$, and $\sigma \in \mathrm{Lip}(\mathbb{R})$ is defined component-wise so that each component of $f$ coincides with the canonical neural ODE given in (1.2). Permutations may also be considered, e.g.

$$
f(x, u) = \begin{bmatrix} w & & \\ & \ddots & \\ & & w \end{bmatrix} \sigma(x) + \begin{bmatrix} b \\ \vdots \\ b \end{bmatrix}.
\tag{1.7}
$$

The key assumption we make in what follows is that $f$ is 1-homogeneous with respect to the parameters $u$, i.e.

$$
f(x, \alpha u) = \alpha f(x, u) \quad \text{for all } (x, u) \in \mathbb{R}^{d_x} \times \mathbb{R}^{d_u} \text{ and } \alpha > 0.
\tag{1.8}
$$

This is clearly the case for $f$ parametrized as in (1.7), whilst for (1.6) we shall moreover assume that $\sigma$ is 1-homogeneous; a canonical example of such an activation function is the ReLU $\sigma(x) = \max\{x, 0\}$.

¹ In practice, $P$ is either part of the trainable parameters, or its coefficients may be chosen at random. Whilst we fix $P$ for technical purposes, numerical experiments indicate that the results presented in what follows persist when $P$ is optimized as well.

Remark 1. Since $\sigma \in \mathrm{Lip}(\mathbb{R})$, for any $x^0 \in \mathbb{R}^{d_x}$ and $u \in L^1(0, T; \mathbb{R}^{d_u})$, (1.5) with $f$ as above admits a unique solution $x \in C^0([0, T]; \mathbb{R}^{d_x})$.
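Assumption (1.8) has a consequence worth making explicit: taking $u = 0$ gives $f(x, 0) = \alpha f(x, 0)$ for every $\alpha > 0$, hence $f(x, 0) = 0$, so on any time interval where the parameters vanish the trajectory of (1.5) stays constant. The sketch below checks (1.8) numerically for the stacked right-hand sides (1.6) and (1.7); the helper names and the sample values of $x$, $w$, $b$ are arbitrary choices for illustration only. It shows (1.6) is 1-homogeneous with the ReLU, (1.7) is 1-homogeneous for any $\sigma$ (here $\tanh$), whereas (1.6) with $\tanh$ is not.

```python
import math
from functools import partial


def relu(z):
    return max(z, 0.0)


def matvec(w, v):
    """Product of a d x d matrix (list of rows) with a length-d vector."""
    return [sum(w_r[c] * v[c] for c in range(len(v))) for w_r in w]


def blocks(x, d):
    """Split the stacked state x in R^{d N} into its N blocks x_1, ..., x_N in R^d."""
    return [x[s:s + d] for s in range(0, len(x), d)]


def f_16(x, w, b, sigma=relu):
    """Right-hand side (1.6): each block x_i is mapped to sigma(w x_i + b)."""
    return [sigma(z + b_r) for xi in blocks(x, len(b)) for z, b_r in zip(matvec(w, xi), b)]


def f_17(x, w, b, sigma=math.tanh):
    """Right-hand side (1.7): each block x_i is mapped to w sigma(x_i) + b."""
    return [z + b_r for xi in blocks(x, len(b))
            for z, b_r in zip(matvec(w, [sigma(v) for v in xi]), b)]


def is_homogeneous(f, x, w, b, alpha, tol=1e-9):
    """Check f(x, alpha * u) == alpha * f(x, u) for u = [w, b], up to rounding."""
    lhs = f(x, [[alpha * v for v in row] for row in w], [alpha * v for v in b])
    rhs = [alpha * v for v in f(x, w, b)]
    return all(abs(l - r) <= tol * (1.0 + abs(r)) for l, r in zip(lhs, rhs))


# d = 2 features, N = 3 samples stacked into x in R^6.
x = [0.5, -1.0, 2.0, 0.3, -0.7, 1.2]
w = [[1.0, -0.5], [0.25, 2.0]]
b = [0.1, -0.2]
print(is_homogeneous(f_16, x, w, b, alpha=2.5))  # True: ReLU is 1-homogeneous
print(is_homogeneous(f_17, x, w, b, alpha=2.5))  # True: holds for any sigma
print(is_homogeneous(partial(f_16, sigma=math.tanh), x, w, b, alpha=2.5))  # False
print(f_16(x, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The last line illustrates $f(x, 0) = 0$: with vanishing parameters the right-hand side of (1.5) is zero, which is why parameters vanishing beyond a stopping time $T^*$ leave the state frozen at $x(T^*)$.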