Blind Demixing via Wirtinger Flow with Random Initialization
Jialin Dong ([email protected]), ShanghaiTech University
Yuanming Shi ([email protected]), ShanghaiTech University

Abstract

This paper concerns the problem of demixing a series of source signals from the sum of bilinear measurements. The problem spans diverse areas such as communication, image processing, and machine learning. However, semidefinite programming for blind demixing is prohibitive for large-scale problems due to its high computational complexity and storage cost. Although several efficient algorithms have been developed recently that enjoy fast convergence rates and are even regularization-free, they still call for spectral initialization. To find a simple initialization approach that works as well as spectral initialization, we propose to solve the blind demixing problem via Wirtinger flow with random initialization, which yields a natural implementation. To reveal the efficiency of this algorithm, we provide a global convergence guarantee for randomly initialized Wirtinger flow applied to blind demixing. Specifically, we show that with sufficient samples, the iterates of randomly initialized Wirtinger flow enter a local region that enjoys strong convexity and strong smoothness within a few iterations at the first stage. At the second stage, the iterates of randomly initialized Wirtinger flow further converge linearly to the ground truth.

1 INTRODUCTION

Suppose we are given an observation vector y ∈ C^m in the frequency domain, generated from the sum of bilinear measurements of unknown vectors x_i^♮ ∈ C^N, h_i^♮ ∈ C^K, i = 1, ···, s, i.e.,

    y_j = \sum_{i=1}^{s} b_j^* h_i^\natural x_i^{\natural *} a_{ij}, \quad 1 \le j \le m,        (1)

where a_{ij} ∈ C^N and b_j ∈ C^K are design vectors. Here, the first K columns of the matrix F form the matrix B := [b_1, ···, b_m]^* ∈ C^{m×K}, where F ∈ C^{m×m} is the unitary discrete Fourier transform (DFT) matrix with FF^* = I_m. Our goal is to recover {x_i^♮} and {h_i^♮} from the sum of bilinear measurements, which is known as blind demixing [1, 2].

This problem spans a wide scope of applications ranging from image processing [3, 4] and machine learning [5, 6] to communication [7, 8]. Specifically, by solving the blind demixing problem, both the original images and the corresponding convolutional kernels can be recovered from a blurred image [3]. The problem has also been exploited to demix and deconvolve calcium imaging recordings of neuronal ensembles [4]. Blind demixing further finds applications in machine learning [5, 6], where feature maps are convolved with filters individually and summed over all users to approximate the input vector. In the context of communication, it enables the source signals of distinct users to be recovered without knowing the channel vectors, thereby enabling low-latency communication [7].

Although blind demixing plays a vital role in various areas, solving it is in general highly intractable. Moreover, some applications of the blind demixing problem involve large-scale data, so efficient algorithms with optimality guarantees are urgently needed to recover the signal vectors from a single observation vector. Instead of the computationally expensive convex method [1], a nonconvex algorithm, regularized gradient descent [9], has recently been proposed to solve the blind demixing problem efficiently.
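Before turning to the algorithms, the following NumPy sketch illustrates the measurement model (1) by generating a small synthetic blind demixing instance. It is only an illustrative sketch: the function name, the dimensions, and the choice of i.i.d. complex Gaussian entries for the design vectors a_ij are assumptions made here (a standard statistical model in this line of work), not a prescription from the paper; B is formed from the first K columns of the unitary DFT matrix as described above.

```python
import numpy as np

def make_blind_demixing_instance(s=3, K=8, N=10, m=256, seed=0):
    """Generate a synthetic instance of the measurement model in Eq. (1)."""
    rng = np.random.default_rng(seed)

    # Unitary DFT matrix F (so that F F^* = I_m); B collects its first K columns,
    # hence the j-th row of B is b_j^*.
    F = np.fft.fft(np.eye(m)) / np.sqrt(m)
    B = F[:, :K]

    # Design vectors a_ij and ground-truth signals h_i, x_i
    # (i.i.d. complex Gaussian entries -- an assumption for this sketch).
    A = (rng.standard_normal((s, m, N)) + 1j * rng.standard_normal((s, m, N))) / np.sqrt(2)
    h_true = rng.standard_normal((s, K)) + 1j * rng.standard_normal((s, K))
    x_true = rng.standard_normal((s, N)) + 1j * rng.standard_normal((s, N))

    # y_j = sum_i (b_j^* h_i) (x_i^* a_ij), j = 1, ..., m  -- Eq. (1).
    y = np.zeros(m, dtype=complex)
    for i in range(s):
        y += (B @ h_true[i]) * (A[i] @ x_true[i].conj())
    return B, A, h_true, x_true, y
```

The small dimensions above are only for quick experimentation; in the theory, the sample size m has to be sufficiently large relative to s, K, and N for recovery to be possible.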
This method [9], however, requires extra regularization and still yields conservative computational optimality guarantees. To elude the regularization and develop an algorithm with progressive computational guarantees, the least squares estimation approach was proposed in [2]:

    P: \quad \mathop{\mathrm{minimize}}_{\{h_i\},\,\{x_i\}} \;\; f(h, x) := \sum_{j=1}^{m} \Big| \sum_{i=1}^{s} b_j^* h_i x_i^* a_{ij} - y_j \Big|^2,        (2)

which is solved via Wirtinger flow with spectral initialization, harnessing the benefits of low computational complexity, freedom from regularization, a fast convergence rate with an aggressive step size, and computational optimality guarantees [2]. Nevertheless, the theoretical guarantees of this algorithm depend on the spectral initialization, which computes the largest singular vector of a data matrix [2]. To find a natural implementation for practitioners, we propose to solve the blind demixing problem via randomly initialized Wirtinger flow. Our goal in this paper is to confirm the efficiency of this algorithm with theoretical guarantees. Specifically, we shall verify that the iterates of randomly initialized Wirtinger flow reach a local region that enjoys strong convexity and strong smoothness within a few iterations at the first stage. At the next stage, we further show that the iterates of randomly initialized Wirtinger flow converge linearly to the ground truth.
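To make the update procedure concrete, below is a minimal NumPy sketch of Wirtinger flow with random initialization for problem (2). It is a sketch under our own assumptions, not the paper's reference implementation: the function name, the Gaussian initialization scale, the step size eta, and the per-component normalization of the step (a common heuristic in the blind deconvolution literature) are illustrative choices; the Wirtinger gradients themselves follow from differentiating f in (2) with respect to the conjugates of h_i and x_i.

```python
import numpy as np

def wirtinger_flow_random_init(B, A, y, eta=0.1, iters=500, seed=1):
    """Vanilla Wirtinger flow for the least-squares objective in Eq. (2),
    started from a random point (no spectral initialization).

    B : (m, K) matrix whose j-th row is b_j^*
    A : (s, m, N) array with A[i, j] = a_ij
    y : (m,) observation vector
    """
    rng = np.random.default_rng(seed)
    s, m, N = A.shape
    K = B.shape[1]

    # Random initialization: i.i.d. complex Gaussian entries (the scale is a choice).
    h = (rng.standard_normal((s, K)) + 1j * rng.standard_normal((s, K))) / np.sqrt(K)
    x = (rng.standard_normal((s, N)) + 1j * rng.standard_normal((s, N))) / np.sqrt(N)

    for _ in range(iters):
        # Per-component factors and residuals e_j = sum_i (b_j^* h_i)(x_i^* a_ij) - y_j.
        bh = np.stack([B @ h[i] for i in range(s)])            # entries b_j^* h_i
        ax = np.stack([A[i] @ x[i].conj() for i in range(s)])  # entries x_i^* a_ij
        e = (bh * ax).sum(axis=0) - y

        # Wirtinger gradients (derivatives of f w.r.t. the conjugates of h_i and x_i):
        #   grad_{h_i} f = sum_j e_j (a_ij^* x_i) b_j
        #   grad_{x_i} f = sum_j conj(e_j) (b_j^* h_i) a_ij
        grad_h = np.stack([B.conj().T @ (e * ax[i].conj()) for i in range(s)])
        grad_x = np.stack([A[i].T @ (e.conj() * bh[i]) for i in range(s)])

        # Gradient step; the normalization below is a common heuristic,
        # not necessarily the exact step-size rule analyzed in the paper.
        hn = np.array([np.linalg.norm(h[i]) ** 2 for i in range(s)])
        xn = np.array([np.linalg.norm(x[i]) ** 2 for i in range(s)])
        h = h - (eta / xn)[:, None] * grad_h
        x = x - (eta / hn)[:, None] * grad_x
    return h, x
```

Given an instance from make_blind_demixing_instance above, the returned factors can only be compared with the ground truth up to the inherent per-component scaling ambiguity (h_i, x_i) → ((1/ᾱ_i) h_i, α_i x_i), which is why progress is measured with the alignment-aware distance function defined in Section 1.2.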
1.1 State-of-the-Art Algorithms

Despite the general intractability of blind demixing, it can be effectively solved by several algorithms under proper statistical models. In particular, semidefinite programming was developed in [1] to solve the blind demixing problem by lifting the bilinear model into the matrix space. However, this approach is computationally prohibitive for large-scale problems due to its high computation and storage cost. To address this issue, a nonconvex algorithm, regularized gradient descent with spectral initialization [9], was further developed to optimize the variables in the natural space. Nevertheless, the theoretical guarantees for regularized gradient descent [9] provide a pessimistic convergence rate and require carefully designed initialization. The Riemannian trust-region optimization algorithm without regularization was further proposed in [7] to improve the convergence rate. However, this second-order algorithm brings unique challenges in providing statistical guarantees. Recently, theoretical guarantees concerning regularization-free Wirtinger flow with spectral initialization for blind demixing were provided in [2]. However, this regularization-free method still calls for spectral initialization. In this paper, we aim to explore the random initialization strategy for a natural implementation and to theoretically verify the efficiency of randomly initialized Wirtinger flow.

Based on the random initialization strategy, a line of research studies the benign global landscapes and aims to design generic saddle-point escaping algorithms, e.g., noisy stochastic gradient descent [10], the trust-region method [11], and perturbed gradient descent [12]. With a sufficient sample size, these algorithms are guaranteed to converge globally for phase retrieval [11], matrix recovery [13], matrix sensing [13], robust PCA [14] and shallow neural networks [15], where all local minima are as good as the global ones and all saddle points are strict. However, the theoretical results developed in [10, 11, 12, 13, 14, 15] are fairly general and may yield pessimistic convergence rate guarantees. Moreover, these saddle-point escaping algorithms are more complicated to implement than natural vanilla gradient descent or Wirtinger flow. To advance the theoretical analysis of gradient descent with random initialization, a fast global convergence guarantee for randomly initialized gradient descent applied to phase retrieval was recently provided in [16].

1.2 Wirtinger Flow with Random Initialization

In this paper, we solve the blind demixing problem via randomly initialized Wirtinger flow, harnessing the benefits of computational efficiency, freedom from regularization, and freedom from careful initialization. Our main contribution is to provide the global convergence guarantee for randomly initialized Wirtinger flow.

Wirtinger flow with random initialization is an iterative algorithm with a vanilla gradient descent update procedure, i.e., without regularization. Specifically, the gradient step of Wirtinger flow is represented by the notion of Wirtinger derivatives, i.e., the derivatives of real-valued functions over complex variables. To simplify the presentation, we denote f(z) := f(h, x), where

    z = \begin{bmatrix} z_1 \\ \vdots \\ z_s \end{bmatrix} \in \mathbb{C}^{s(N+K)} \quad \text{with} \quad z_i = \begin{bmatrix} h_i \\ x_i \end{bmatrix} \in \mathbb{C}^{N+K}.        (3)

Furthermore, we define the discrepancy between the estimate z and the ground truth z^♮ as the distance function, given as

    \mathrm{dist}(z, z^\natural) = \Big( \sum_{i=1}^{s} \mathrm{dist}^2(z_i, z_i^\natural) \Big)^{1/2},        (4)

where

    \mathrm{dist}^2(z_i, z_i^\natural) = \min_{\alpha_i \in \mathbb{C}} \big( \| (1/\overline{\alpha}_i)\, h_i - h_i^\natural \|_2^2 + \| \alpha_i x_i - x_i^\natural \|_2^2 \big) / d_i^2

for i = 1, ···, s. Here, d_i = ‖h_i^♮‖_2^2 + ‖x_i^♮‖_2^2 and each α_i is the alignment parameter. Let A^* and a^* denote the conjugate transposes of a matrix A and a vector a, respectively. For each i = 1, ···, s,

[Figure 1: Numerical results. Panels (a) and (b) plot the relative error against the iteration count for different K and different κ, respectively; panels (c)-(f) track the quantities α_{h_i}^t, β_{h_i}^t, α_{x_i}^t, β_{x_i}^t and the ratios α_{h_i}^t/β_{h_i}^t, α_{x_i}^t/β_{x_i}^t over the iterations, with Stage I and Stage II marked.]
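To make the error metric in Eq. (4) concrete, the sketch below evaluates dist(z, z^♮) numerically by searching over each complex alignment parameter α_i. The function names and the Nelder-Mead search over the real and imaginary parts of α_i are our own implementation choices; a more careful implementation would solve the one-dimensional alignment problem in closed form or guard against α_i ≈ 0.

```python
import numpy as np
from scipy.optimize import minimize

def dist_sq_component(h, x, h_true, x_true):
    """dist^2(z_i, z_i^nat) from Eq. (4): minimize over the alignment parameter alpha_i."""
    d = np.linalg.norm(h_true) ** 2 + np.linalg.norm(x_true) ** 2

    def objective(v):
        # v packs Re(alpha_i) and Im(alpha_i).
        alpha = v[0] + 1j * v[1]
        return (np.linalg.norm(h / np.conj(alpha) - h_true) ** 2
                + np.linalg.norm(alpha * x - x_true) ** 2) / d ** 2

    # Start the search from alpha_i = 1 (no rescaling).
    return minimize(objective, x0=np.array([1.0, 0.0]), method="Nelder-Mead").fun

def dist(h, x, h_true, x_true):
    """dist(z, z^nat): aggregate the per-component distances over i = 1, ..., s."""
    return np.sqrt(sum(dist_sq_component(h[i], x[i], h_true[i], x_true[i])
                       for i in range(len(h))))
```

This is the quantity one would track across iterations to observe the two-stage behavior described above: a first stage in which the randomly initialized iterates approach the region of strong convexity and smoothness, followed by a second stage of linear convergence to the ground truth.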