1 STFT Phase Retrieval: Uniqueness Guarantees and Recovery Algorithms

Kishore Jaganathan† Yonina C. Eldar‡ Babak Hassibi† †Department of Electrical Engineering, Caltech ‡Department of Electrical Engineering, Technion, Israel Institute of Technology

Abstract—The problem of recovering a signal from its Fourier We consider STFT phase retrieval, which is the problem magnitude is of paramount importance in various fields of engi- of reconstructing a signal from its STFT magnitude. In some neering and applied physics. Due to the absence of Fourier phase applications of phase retrieval, it is easy to obtain such information, some form of additional information is required in order to be able to uniquely, efficiently and robustly identify measurements. One example is Frequency Resolved Optical the underlying signal. Inspired by practical methods in optical Gating (FROG), which is a general method for measuring imaging, we consider the problem of signal reconstruction from ultrashort laser pulses [31]. Fourier ptychography [32]–[34], the Short-Time Fourier Transform (STFT) magnitude. We first a technology which has enabled X-ray, optical and electron develop conditions under which the STFT magnitude is an microscopy with increased spatial resolution without the need almost surely unique signal representation. We then consider a semidefinite relaxation-based algorithm (STliFT) and provide for advanced lenses, is another popular example. In applica- recovery guarantees. Numerical simulations complement our tions such as speech processing, it is natural to work with theoretical analysis and provide directions for future work. the STFT instead of the Fourier transform as the spectral Index Terms—Short-Time Fourier Transform (STFT), Phase content of speech changes over time [35]. The key idea, when Retrieval, Super-Resolution, Semidefinite Relaxation. using STFT measurements, is to introduce redundancy in the magnitude-only measurements by maintaining a substantial overlap between adjacent short-time sections. This mitigates I.INTRODUCTION the uniqueness and algorithmic issues of phase retrieval. In many physical measurement systems, the measurable In this work, our contribution is two-fold: quantity is the magnitude-square of the Fourier transform of (i) Uniqueness guarantees: Researchers have previously de- the underlying signal. The problem of reconstructing a signal veloped conditions under which the STFT magnitude uniquely from its Fourier magnitude is known as phase retrieval [1], identifies signals (up to a global phase). However, either prior [2]. This reconstruction problem is one with a rich history information on the signal is assumed in order to provide and occurs in many areas of engineering and applied physics the guarantees, or the guarantees are limited. For instance, such as optics [3], X-ray crystallography [4], astronomical the results provided in [37] require exact knowledge of a imaging [5], speech recognition [6], computational biology [7], small portion of the underlying signal. In [39], the guaran- blind channel estimation [8] and more. We refer the readers to tees developed are for the setup in which adjacent short- [9]–[11] for a comprehensive survey of classical approaches. time sections differ in only one index. These limitations are Recent reviews can be found in [12], [13]. primarily due to a small number of adversarial signals which It is well known that phase retrieval is an ill-posed problem cannot be uniquely identified from their STFT magnitude. [14]. In order to be able to uniquely identify the underlying arXiv:1508.02820v3 [cs.IT] 1 Apr 2016 Here, in contrast, we develop conditions under which the STFT signal, various methods have been explored, which can be magnitude is an almost surely unique signal representation. broadly classified into two categories: (i) Additional prior in- In particular, we show that, with the exception of a set of formation: common approaches include bounds on the support signals of measure zero, non-vanishing signals can be uniquely of the signal [9]–[11] and sparsity constraints [17]–[24]. (ii) identified (up to a global phase) from their STFT magnitude if Additional magnitude-only measurements: popular examples adjacent short-time sections overlap (Theorem III.1). We then include the use of structured illuminations and masks [24]– extend this result to incorporate sparse signals which have a [30], and Short-Time Fourier Transform (STFT) magnitude limited number of consecutive zeros (Corollary III.1). measurements [31]–[39]. (ii) Recovery algorithms: Researchers have previously de- Copyright (c) 2014 IEEE. Personal use of this material is permitted. veloped efficient iterative algorithms based on classic op- However, permission to use this material for any other purposes must be timization frameworks to solve the STFT phase retrieval obtained from the IEEE by sending a request to [email protected] problem. Examples include the Griffin-Lim (GL) algorithm K. Jaganathan and B. Hassibi were supported in part by the National Science Foundation under grants CCF-0729203, CNS-0932428 and CIF- [38] and STFT-GESPAR for sparse signals [39]. While these 1018927, by the Office of Naval Research under the MURI grant N00014- techniques work well in practice, they do not have theoretical 08-1-0747, and by a grant from Qualcomm Inc. guarantees. In [36] and [43], a semidefinite relaxation-based Y. C. Eldar was supported in part by the European Union’s Horizon 2020 Research and Innovation Program through the ERC-BNYQ Project, and in STFT phase retrieval algorithm, called STliFT (see Algorithm part by the Israel Science Foundation under Grant 335/14. 1 below), was proposed. In this work, we conduct extensive 2

x[0] x[1] x[2] x[3] x[4] x[5] x[6] x[0] x[1] x[2] x[3] x[4] x[5] x[6]

r =0 x w 0 w[4] w[3] w[2] w[1] w[0]

r =1 x w 1 w[4] w[3] w[2] w[1] w[0]

r =2 x w 2 w[4] w[3] w[2] w[1] w[0]

Fig. 1: Sliding window interpretation of the STFT for N = 7, W = 5 and L = 4. The shifted window overlaps with the signal for 3 shifts, and hence R = 3 short-time sections are considered.

numerical simulations and provide theoretical guarantees for The STFT can be interpreted as follows: Suppose wr STliFT. In particular, we conjecture that STliFT can recover denotes the signal obtained by shifting the flipped window most non-vanishing signals (up to a global phase) from their w by rL time units (i.e., wr[n] = w[rL n]) and is the STFT magnitude if adjacent short-time sections differ in at Hadamard (element-wise) product operator.− The rth◦ column most half the indices (Conjecture IV.1). When this condition of Yw, for 0 r R 1, corresponds to the N point DFT ≤ ≤ − is satisfied, we argue that one can super-resolve (i.e., discard of x wr. In essence, the window is flipped and slid across ◦ high frequency measurements) and reduce the number of the signal (see Figure 1 for a pictorial representation), and Yw measurements to (4 + o(1))N, where N is the length of corresponds to the Fourier transform of the windowed signal the complex signal. Therefore, STliFT recovers most non- recorded at regular intervals. This interpretation is known as vanishing signals uniquely, efficiently and robustly, using an the sliding window interpretation. order-wise optimal number of phaseless measurements. Let Zw be the N R measurements corresponding to × We prove this conjecture for the setup in which the exact the magnitude-square of the STFT of x with respect to w 2 knowledge of a small portion of the underlying signal is so that Zw[m, r] = Yw[m, r] . Let Wr, for 0 r available (Theorem IV.1). For particular choices of STFT R 1, be the N N diagonal matrix with diagonal elements≤ ≤ − × parameters, this portion vanishes asymptotically, due to which (wr[0], wr[1], . . . , wr[N 1]). STFT phase retrieval can be this setup is asymptotically reasonable. We also prove this mathematically stated as:− conjecture for the case in which adjacent short-time sections find x (2) differ in only one index (Theorem IV.2). We then extend 2 these results to incorporate sparse signals which have a limited subject to Zw[m, r] = fm, Wrx h i number of consecutive zeros (Corollary IV.1). for 0 m N 1 and 0 r R 1, where fm is The rest of the paper is organized as follows. In Section 2, the conjugate≤ ≤ of the−mth column≤ of the≤ N −point DFT matrix we mathematically formulate STFT phase retrieval and estab- and ., . is the inner product operator. In fact, STFT phase lish our notation. We present uniqueness guarantees in Section retrievalh i can be equivalently stated by only considering the 3. Section 4 considers the STliFT algorithm and provides measurements corresponding to 0 r R 1 and 1 recovery guarantees. Numerical simulations are presented in m M, for any parameter M satisfying≤ ≤ 2W− M N≤ Section 5. Section 6 concludes the paper. (see≤ Section VII for details). This equivalence≤ significantly≤ reduces the number of measurements when W N, which is  II.PROBLEM SETUP typically the case in practical methods. In Section 4, we further reduce the number of measurements per short-time section Let x = (x[0], x[1], . . . , x[N 1])T be a signal of length N through super-resolution. In particular, we consider the setup and w = (w[0], w[1], . . . , w[W − 1])T be a window of length − with 2L W N and 4L M N. W . The STFT of x with respect to w, denoted by Y , is 2 w We use≤ the≤ following definitions:≤ ≤ A signal is said to be defined as: x non-vanishing if x[n] = 0 for all 0 n N 1. Similarly, a N−1 6 ≤ ≤ − X −i2π mn window w is said to be non-vanishing if w[n] = 0 for all 0 Yw[m, r] = x[n]w[rL n]e N (1) 6 ≤ − n W 1. Further, a signal x is said to be sparse if it is not n=0 non-vanishing,≤ − i.e., x[n] = 0 for at least one 0 n N 1. for 0 m N 1 and 0 r R 1, where the ≤ ≤ − parameter≤ L denotes≤ − the separation≤ in time≤ between− adjacent III.UNIQUENESS GUARANTEES  N+W −1  short-time sections and the parameter R = L denotes In this section, we review existing results regarding the the number of short-time sections considered. uniqueness of STFT phase retrieval and present our uniqueness 3

N Uniqueness if the first L samples are known a priori, 2L ≤ W ≤ 2 and w is non-vanishing [37] Non-vanishing signals Uniqueness up to a global phase if L = 1, 2 ≤ W ≤ N+1 , W − 1 {x[n] 6= 0 for all 0 ≤ n ≤ N − 1} 2 coprime with N and mild conditions on w [39] N Uniqueness up to a global phase for almost all signals if L < W ≤ 2 and w is non-vanishing [This work] Uniqueness for signals with at most W − 2L consecutive zeros if the first L samples, starting from the first non-zero sample, are known a priori, N 2L ≤ W ≤ 2 and w is non-vanishing [37] Sparse signals No uniqueness for most signals with W consecutive zeros [39] {x[n] = 0 for at least one 0 ≤ n ≤ N − 1} Uniqueness up to a global phase and time-shift for almost all signals with N less than min{W − L, L} consecutive zeros if L < W ≤ 2 and w is non-vanishing [This work] TABLE I: Uniqueness results for STFT phase retrieval (i.e., 2W M N). ≤ ≤ guarantees. The results are summarized in Table I. magnitude if w, L, M satisfy { } In STFT phase retrieval, the global phase of the signal (i) w is non-vanishing cannot be determined due to the fact that signals x and eiφx, N (ii) L < W 2 for any φ, always have the same STFT magnitude regardless (iii) 2W M≤ N. of the choice of w,L . In contrast, in classic phase retrieval, ≤ ≤ signals which differ{ from} each other by a global phase, time- Proof: The proof is based on a technique commonly shift and/ or conjugate-flip (together called trivial ambiguities) known as dimension counting. The outline is as follows (see cannot be distinguished from each other as they have the same Section VIII for details): Fourier magnitude [12], [17], [21]. Consider the short-time sections r and r +1. Since adjacent Observe that L < W is a necessary condition in order to short-time sections overlap (due to L < W ), there exists at least one index, say n , where both x wr and x wr have be able to uniquely identify most signals. If L > W , then 0 ◦ ◦ +1 the STFT magnitude does not contain any information from non-zero values. Since W N , there can be at most 2W distinct windowed some locations of the signal. When L = W , adjacent short- ≤ 2 signals x wr (up to a phase) that have the same Fourier time sections do not overlap and hence STFT phase retrieval is ◦ W equivalent to a series of non-overlapping instances of classic magnitude [14]. Consequently, x[n0] is restricted to 2 values by the rth column of the STFT magnitude (let r denote phase retrieval. Since there is no way of determining the rela- S W tive phase, time-shift or conjugate-flip between the windowed the set of these values). Similarly, x[n0] is restricted to 2 signals corresponding to the various short-time sections, most values by the r + 1th column of the STFT magnitude (denote the set of these values by r ). signals cannot be uniquely identified. For example, suppose S +1 By construction, r r = φ as the STFT magnitude w,L is chosen such that L = W = 2 and w[n] = 1 for S ∩ S +1 6 { } T is generated by an underlying signal x0, i.e., x0[n0] r all 0 n W 1. Consider the signal x1 = (1, 2, 3) of ∈ S ∩ ≤ ≤ − T r+1. Using Lemma VIII.1 and Theorem VIII.1, we show that, length N = 3. Signals x1 and x2 = (1, 2, 3) have the S − − for almost all non-vanishing signals, r r+1 has cardinality same STFT magnitude. In fact, more generally, signals x1 and S ∩S (1, eiφ2, eiφ3)T , for any φ, have the same STFT magnitude. one. In other words, x wr is uniquely identified (up to a phase) almost surely. ◦ Since adjacent short-time sections overlap, non-vanishing A. Non-vanishing signals signals are uniquely identified up to a global phase from the For some specific choices of w,L , it has been shown that knowledge of x wr (up to a phase) for 0 r R 1 if w all non-vanishing signals can be{ uniquely} identified from their is non-vanishing.◦ ≤ ≤ − STFT magnitude up to a global phase. In [39], it is proven that the STFT magnitude uniquely identifies non-vanishing signals B. Sparse signals up to a global phase for L = 1 if the window w is chosen such While the aforementioned results provide guarantees for that the N point DFT of ( w[0] 2, w[1] 2,..., w[N 1] 2) is non-vanishing signals, they do not say anything about sparse non-vanishing, 2 W N| +1 and| | W | 1 is coprime| − with| N. 2 signals. Reconstruction of sparse signals involves certain chal- In [37], the authors≤ prove≤ that if the first−L samples are known lenges which are not encountered in the reconstruction of non- a priori, then the STFT magnitude can uniquely identify non- vanishing signals. vanishing signals for any L if the window w is chosen such The following example is provided in [39] to show that the that it is non-vanishing and 2L W N . 2 time-shift ambiguity cannot be resolved for some classes of In this work, we prove the≤ following≤ result for non- sparse signals and some choices of w,L : Suppose w,L is vanishing signals: chosen such that L 2, W is a multiple{ of} L and w[n{] = 1}for ≥ Theorem III.1. Almost all non-vanishing signals can be all 0 n W 1. Consider a signal x1 of length N L+1 uniquely identified (up to a global phase) from their STFT such≤ that it≤ has− non-zero values only within an interval≥ of the 4 form [(t 1)L + 1, (t 1)L + L p] [0,N 1] for some using projections. The objective is shown to be monotonically integers 1− p L 1−and t 1.− The⊂ signal x− obtained by decreasing as the iterations progress. An important feature of ≤ ≤ − ≥ 2 time-shifting x1 by q p units (i.e., x2[i] = x1[i q]) has the GL algorithm is its empirical ability to converge to the the same STFT magnitude.≤ The issue with this class− of sparse global minimum when there is substantial overlap between signals is that the STFT magnitude is identical to the Fourier adjacent short-time sections. However, no theoretical recovery magnitude because of which the time-shift and conjugate-flip guarantees are available. To establish such guarantees, we rely ambiguities cannot be resolved. on a semidefinite relaxation approach. It is also shown that some sparse signals cannot be uniquely recovered even up to the trivial ambiguities for some choices A. Semidefinite relaxation-based algorithm of w,L using the following example: Consider two non- Semidefinite relaxation has enjoyed considerable success overlapping{ } intervals [u , v ], [u , v ] [0,N 1] such that 1 1 2 2 in provably and stably solving several quadratic-constrained u v > W , and take a signal x ⊂supported− on [u , v ] 2 1 1 1 1 problems [40]–[42]. The steps to formulate such problems and−x supported on [u , v ]. The magnitude-square of the 2 2 2 as a semidefinite program (SDP) are as follows: (i) Embed STFT of x + x and of x x are equal for any choice 1 2 1 2 the problem in a higher dimensional space using the transfor- of L. The difficulty with this− class of sparse signals is that mation ?, a process which converts the problem of the two intervals with non-zero values are separated by a X = xx recovering a signal with quadratic constraints into a problem distance greater than W because of which there is no way of recovering a rank-one matrix with affine constraints. (ii) of establishing relative phase using a window of length W . Relax the rank-one constraint to obtain a convex program. These examples demonstrate the fact that sparse signals If the convex program has a unique solution ?, are harder to recover than non-vanishing signals in this X0 = x0x0 then is the unique solution to the quadratic-constrained setup. Since the aforementioned issues are primarily due to a x0 problem (up to a global phase). Many recent results in related large number of consecutive zeros, the uniqueness guarantees problems like generalized phase retrieval [42] and phase for non-vanishing signals have been extended to incorporate retrieval using random masks [28], [29] suggest that one sparse signals with limits on the number of consecutive zeros. can provide conditions, which when satisfied, ensure that the In [37], it was shown that if L consecutive samples, starting convex program has a unique solution ?. from the first non-zero sample, are known a priori, then the X0 = x0x0 A semidefinite relaxation-based STFT phase retrieval al- STFT magnitude can uniquely identify signals with less than gorithm, called STliFT, was explored in [36] and [43]. The W 2L consecutive zeros for any L if the window w is chosen details of the algorithm are provided in Algorithm 1. In such− that it is non-vanishing and 2L W N . 2 the following, we develop conditions on x , w,L which Below, we extend Theorem III.1≤ to prove≤ the following 0 ensure that the convex program (4) has X{ = x x}? as the result for sparse signals: 0 0 0 unique solution. Consequently, under these conditions, STliFT Corollary III.1. Almost all sparse signals with less than uniquely recovers the underlying signal up to a global phase. min W L, L consecutive zeros can be uniquely identified (up to{ a− global} phase and time-shift) from their STFT magni- Algorithm 1 STliFT tude if w, L, M satisfy Input: STFT magnitude measurements for { } Zw[m, r] 1 m (i) w is non-vanishing M and 0 r R 1, w,L . ≤ ≤ (ii) L < W N Output: Estimate≤ ≤ xˆ−of the{ underlying} signal x . ≤ 2 0 (iii) 2W M N. • Obtain ˆ by solving: ≤ ≤ X Proof: The min W L, L bound on consecutive zeros minimize trace(X) (4) ensures the following:{ For− sufficient} pairs of adjacent short- subject to Z [m, r] = trace(W?f f ? W X) time sections, there is at least one index among the overlapping w r m m r and non-overlapping indices respectively, where the underly- X < 0 ing signal has a non-zero value. We refer the readers to Section for 1 m M and 0 r R 1. IX for details. ≤ ≤ ? ≤ ≤ − • Return xˆ, where xˆxˆ is the best rank-one approximation of Xˆ . IV. RECOVERY ALGORITHMS The classic alternating projection algorithm to solve phase Based on extensive numerical simulations, we conjecture retrieval [9] has been adapted to solve STFT phase retrieval the following: by Griffin and Lim [38]. To this end, STFT phase retrieval is Conjecture IV.1. The convex program (4) has a unique reformulated as the following least-squares problem: ? solution X0 = x0x0, for most non-vanishing signals x0, if R−1 N−1 X X  22 (i) w is non-vanishing min Zw[m, r] fm, Wrx . (3) N x − h i (ii) 2L W r=0 m=0 2 (iii) 4L ≤ M ≤ N. The Griffin-Lim (GL) algorithm attempts to minimize this ≤ ≤ objective by starting with a random initialization and imposing The number of phaseless measurements considered can be the time domain and STFT magnitude constraints alternately calculated as follows: The total number of short-time sections 5

N+W −1 is L . For each short-time section, M = 4L phaseless when W = 2, at most 4N + 8 phaseless measurements measurementsd e are sufficient. Hence, the total number of phase- are considered. Unlike Theorem IV.1, no prior information is less measurements is N+W −1 4L 4 (N + W ) + 2W . necessary. d L e × ≤ Consequently, when W = o(N), this number is (4 + o(1))N, Theorems IV.1 and IV.2 can be seamlessly extended to which is order-wise optimal. In fact, in generalized phase incorporate sparse signals: retrieval, it is conjectured that (4 o(1))N phaseless mea- Corollary IV.1. The convex program (4) has a unique feasible surements are necessary [45]. − matrix ?, for almost all sparse signals which The proof techniques used in [28] and [42] are not applica- X0 = x0x0 x0 have at most W 2L consecutive zeros, if ble in the STFT setup. In [28] and [42], the measurement − vectors are chosen from a random distribution such that (i) w is non-vanishing they satisfy the restricted isometry property. Furthermore, the (ii) 2L W N ≤ ≤ 2 randomness in the measurement vectors is used to construct (iii) 4L M N ≤ ≤ approximate dual certificates based on concentration inequali- (iv) Either L = 1 or x [n] for i n < i + L is known a 0 0 ≤ 0 ties. In the STFT setup, testing whether the given measurement priori, where i0 is the smallest index such that x0[i0] = 0. vectors satisfy the restricted isometry property is difficult. 6 Also, due to the lack of randomness in the measurement Proof: See Section X. vectors, a different approach is required to construct dual certificates. B. Noisy Setting In the following, we develop a proof technique for the STFT setup, and use it to prove Conjecture IV.1, with additional In practice, the measurements are contaminated by additive assumptions. noise, i.e., the measurements are of the form 2 Theorem IV.1. The convex program (4) has a unique feasible Zw[m, r] = fm, Wrx + z[m, r] ? h i matrix X0 = x0x0, for almost all non-vanishing signals x0, if for 1 m M and 0 r R 1, where zr = (z[0, r],≤ z[1, r],≤ . . . , z[M 1,≤ r])T is≤ the− additive noise cor- (i) w is non-vanishing − (ii) N responding to the rth short-time section and 4L M N. 2L W 2 ≤ ≤ (iii) 4L ≤ M ≤ N STliFT, in the noisy setting, can be implemented as follows: ≤ ≤  L  Suppose for all . The constraints in (iv) x0[n] for 0 n is known a priori. zr 2 η 0 r R 1 ≤ ≤ 2 the convexk programk ≤ (4) can be≤ replaced≤ − by Proof: See Section X. M 2 While it is sufficient to show that (4) has a unique solution X  ? ?  2 ? Zw[m, r] trace(Wr fmfmWrX) η (5) X0 = x0x0, observe that Theorem IV.1 ensures that (4) has − ≤ a unique feasible matrix. This is a stronger condition, and m=1 as a consequence, the choice of the objective function does for 0 r R 1. We recommend the use of trace not matter in the noiseless setting. While this might suggest minimization≤ as≤ the objective− function. Numerical simulations that the requirements of the setup are strong, we argue that strongly suggest that STliFT can recover most non-vanishing it is not the case. In fact, this phenomenon is also observed signals stably in the noisy setting under certain conditions. in generalized phase retrieval (Section 1.3 in [42]) and phase The details of the simulations are provided in the following retrieval using random masks (Theorem 1.1 in [28]). section. L Theorem IV.1 assumes prior knowledge of the first 2 samples, i.e., half of the second short-time section is requiredd e V. NUMERICAL SIMULATIONS to be known a priori. This is not a lot of prior information if W N, which is typically the case. When W = o(N), the In this section, we demonstrate the empirical abilities of fraction of the signal that is required to be known a priori is STliFT using numerical simulations. W less than N , which tends to 0 as N . In the first set of simulations, we evaluate the performance → ∞ of STliFT as a function of window and shift lengths. We Theorem IV.2. The convex program (4) has a unique feasible choose N = 32, and vary L, W . For each choice of L, W , matrix X = x x?, for almost all non-vanishing signals x , 0 0 0 0 we consider M = 4L phaseless{ } measurements and{ perform} if 100 trials. In every trial, we choose a random signal such that (i) w is non-vanishing the values in each location are drawn from an i.i.d. standard (ii) 2 W N 2 complex normal distribution. We select the window w such (iii) 4 ≤ M ≤ N that w[n] = 1 for all 0 n W 1. The probability of (iv) L≤= 1. ≤ successful recovery as a function≤ ≤ of −L, W is plotted in Fig. { } Proof: This is a direct consequence of Theorem IV.1. The 2a. value of x0[0] (and hence x0[0], without loss of generality) Observe that STliFT successfully recovers the underlying N can be inferred from the STFT magnitude if L = 1. signal with very high probability when 2L W 2 and When L = 1, the number of phaseless measurements is fails with very high probability when 2L >≤ W . The≤ choice 4(N + W ), which is again order-wise optimal. For example, of L, W = N , N uses only six short-time sections and { } { 4 2 } 6

1 1

2 2

3 3

4 4

5 5 L (shift length) L (shift length)

6 6

7 7

8 8

2 4 6 8 10 12 14 16 4 8 12 16 20 24 28 32 W (window length) M (measurements per short-time section)

(a) N = 32, M = 4L, and various choices of {L, W }. (b) N = 32, W = 16, and various choices of {L, M} (demonstrates super-resolution).

Fig. 2: Probability of successful recovery using STliFT in the noiseless setting (white region: success with probability 1, black region: success with probability 0).

0 The normalized mean-squared error, given by L=4,W=16 L=8,W=16 L=2,W=8 2 L=4,W=8 x0 cxˆ -10 NMSE = min || − || , (6) |c|=1 x 2 || 0|| -20 is plotted as a function of SNR in Fig. 3. The linear re- lationship between them shows that STliFT stably recovers -30 the underlying signal in the presence of noise. Also, it can NMSE (dB) be observed that the choices of W, L which correspond to { } -40 significant overlap between adjacent short-time sections tend to recover signals more stably compared to values of W, L { } -50 which correspond to less overlap, which is not surprising.

-60 5 10 15 20 25 30 35 40 SNR (dB) VI.CONCLUSIONSAND FUTURE DIRECTIONS In this work, we considered the STFT phase retrieval Fig. 3: NMSE (dB) vs SNR (dB) using STliFT in the noisy N problem. We showed that, if L < W 2 , then almost setting for N = 32, M = 2W . all non-vanishing signals can be uniquely≤ identified from their STFT magnitude (up to a global phase), and extended this result to incorporate sparse signals which have less than min W L, L consecutive zeros. { − } N For 2L W 2 , we conjectured that most non- STliFT recovers the underlying signal with very high probabil- vanishing signals≤ can≤ be recovered (up to a global phase) ity, which, given the limited success of semidefinite relaxation- by a semidefinite relaxation-based algorithm (STliFT). When based algorithms in the Fourier phase retrieval setup, is very W = o(N), through super-resolution, we reduced the number encouraging. of phaseless measurements to (4 + o(1))N. We proved this conjecture for the setup in which the first  L  samples In the second set of simulations, we evaluate the perfor- 2 + 1 mance of STliFT as a function of shift length and measure- are known, and for the case in which L = 1. We argued that ments per short-time section. We choose N = 32 and W = 16, the additional assumptions are asymptotically reasonable when W N, which is typically the case in practical methods. We and vary L, M . For each choice of L, M , we perform  100 trials{ as before.} The probability of{ successful} recovery then extended these results to incorporate sparse signals which have at most W 2L consecutive zeros. as a function of L, M is plotted in Fig. 2b. Observe that − recovery is successful{ even} in the 4L M < 2W regime. Natural directions for future study include a proof of this ≤ conjecture without the additional assumptions, and a stability In the third set of simulations, we evaluate the performance analysis in the noisy setting. Also, a thorough analysis of the of STliFT in the noisy setting. We choose M = 2W , the rest phase transition at 2L = W will provide a more complete of the parameters are the same as the first set of simulations. characterization of STliFT. 7

Acknowledgements: We would like to thank Mordechai [21] K. Jaganathan, S. Oymak and B. Hassibi, “Recovery of sparse 1- Segev and Oren Cohen for introducing us to the STFT phase D signals from the magnitudes of their Fourier transform,” IEEE retrieval problem, and for many insightful discussions. International Symposium on Information Theory Proceedings (2012): 1473-1477. [22] K. Jaganathan, S. Oymak and B. Hassibi, “Sparse phase retrieval: Uniqueness guarantees and recovery algorithms,” REFERENCES arXiv:1311.2745 (2015). [23] Y. Shechtman, A. Beck and Y. C. Eldar, “GESPAR: Efficient [1] A. L. Patterson, “A Fourier series method for the determination phase retrieval of sparse signals,” IEEE Transactions on Signal of the components of interatomic distances in crystals,” Physical Processing 62, no. 4 (2014): 928-938. Review 46, no. 5 (1934): 372. [24] E. J. Candes, Y. C. Eldar, T. Strohmer and V. Voroninski, “Phase [2] A. L. Patterson, “Ambiguities in the X-ray analysis of crystal retrieval via matrix completion,” SIAM Journal on Imaging structures,” Physical Review 65, no. 5-6 (1944): 195. Sciences 6.1 (2013): 199-225. [3] A. Walther, “The question of phase retrieval in optics,” Journal [25] Y. J. Liu et al. “Phase retrieval in X-ray imaging based on using of Modern Optics 10, no. 1 (1963): 41-49. structured illumination,” Physical Review A 78, no. 2 (2008): [4] R. P. Millane, “Phase retrieval in crystallography and optics,” 023817. JOSA A 7, no. 3 (1990): 394-411. [26] I. Johnson, K. Jefimovs, O. Bunk, C. David, M. Dierolf, J. Gray, [5] J. C. Dainty and J. R. Fienup, “Phase retrieval and image D. Renker and F. Pfeiffer, “Coherent diffractive imaging using reconstruction for astronomy,” Image Recovery: Theory and phase front modifications,” Physical review letters 100, no. 15 Application (1987): 231-275. (2008): 155503. [6] L. Rabiner and B. H. Juang, “Fundamentals of speech recogni- [27] E. G. Loewen and E. Popov, “Diffraction gratings and applica- tion,” Prentice Hall (1993). tions,” CRC Press (1997). [7] M. Stefik, “Inferring DNA structures from segmentation data,” [28] E. J. Candes, X. Li, and M. Soltanolkotabi, “Phase retrieval Artificial Intelligence 11, no. 1 (1978): 85-114. from coded diffraction patterns,” Applied and Computational [8] B. Baykal, “Blind channel estimation via combining autocorrela- Harmonic Analysis (2014). tion and blind phase estimation,” IEEE Transactions on Circuits [29] D. Gross, F. Krahmer, and R. Kueng, “Improved recovery and Systems 51, no. 6 (2004): 1125-1131. guarantees for phase retrieval from coded diffraction patterns”, [9] R. W. Gerchberg and W. O. Saxton, “A practical algorithm for arXiv:1402.6286 (2014). the determination of the phase from image and diffraction plane [30] K. Jaganathan, Y. C. Eldar and B. Hassibi, “Phase retrieval pictures,” Optik 35 (1972): 237. with masks using convex optimization,” IEEE International [10] J. R. Fienup, “Phase retrieval algorithms: A comparison,” Ap- Symposium on Information Theory Proceedings (2015): 1655- plied Optics 21, no. 15 (1982): 2758-2769. 1659 . [11] H. H. Bauschke, P. L. Combettes and D. R. Luke, “Phase [31] R. Trebino, “Frequency-resolved optical gating: The measure- retrieval, error reduction algorithm, and Fienup variants: A view ment of ultrashort laser pulses,” Springer (2002). from convex optimization,” JOSA A 19, no. 7 (2002): 1334- [32] M. J. Humphry, B. Kraus, A. C. Hurst, A. M. Maiden, J. 1345. M. Rodenburg, “Ptychographic electron microscopy using high- [12] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. angle dark-field scattering for sub-nanometre resolution imag- Miao and M. Segev, “Phase retrieval with application to optical ing,” Nature Communications 3 (2012). imaging,” IEEE Magazine 32, no. 3 (2015): [33] J. M. Rodenburg, “Ptychography and related diffractive imag- 87-109. ing methods,” Advances in Imaging and Electron Physics 150 [13] K. Jaganathan, Y. C. Eldar and B. Hassibi, “Phase retrieval: An (2008): 87-184. overview of recent developments,” arXiv:1510.07713 (2015). [34] G. Zheng, R. Horstmeyer and C. Yang, “Wide-field, high- [14] E. M. Hofstetter, “Construction of time-limited functions with resolution Fourier ptychographic microscopy,” Nature photonics specified autocorrelation functions,” IEEE Transactions on In- 7, no. 9 (2013): 739-745. formation Theory 10, no. 2 (1964): 119-126. [35] J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth [15] M. H. Hayes and J. H. McClellan, “Reducible polynomials in compression of noisy speech,” Proceedings of the IEEE 67.12 more than one variable,” Proceedings of the IEEE 70, no. 2 (1979): 1586-1604. (1982): 197-198. [36] K. Jaganathan, Y. C. Eldar and B. Hassibi, “Recovering signals [16] J. M. Ortega and W. C. Rheinboldt, “Iterative solution of from the short-time Fourier transform magnitude,” IEEE Inter- nonlinear equations in several variables,” Vol. 30. Siam (1970). national Conference on Acoustics, Speech and Signal Processing [17] Y. M. Lu and M. Vetterli, “Sparse spectral factorization: Unicity (2015). and reconstruction algorithms,” IEEE International Conference [37] S. H. Nawab, T. F. Quatieri, and J. S. Lim, “Signal recon- on Acoustics, Speech and Signal Processing (2011): 5976-5979. struction from short-time Fourier transform magnitude,” IEEE [18] A. Szameit, Y. Shechtman, E. Osherovich, E. Bullkich, P. Transactions on Acoustics, Speech and Signal Processing 31, Sidorenko, H. Dana, S. Steiner, E. B. Kley, S. Gazit, T. Cohen- no. 4 (1983): 986-998. Hyams, S. Shoham, M. Zibulevsky, I. Yavneh, Y. C. Eldar, O. [38] D. Griffin and J. S. Lim, “Signal estimation from modified Cohen and M. Segev, “Sparsity-based single-shot subwavelength short-time Fourier transform,” IEEE Transactions on Acoustics, coherent diffractive imaging,” Nature Materials [Online], Sup- Speech and Signal Processing 32, no. 2 (1984): 236-243. plementary Info (2012). [39] Y. C. Eldar, P. Sidorenko, D. G. Mixon, S. Barel and O. Cohen, [19] S. Mukherjee and C. Seelamantula, “An iterative algorithm “Sparse phase retrieval from short-time Fourier measurements,” for phase retrieval with sparsity constraints: Application to IEEE Signal Processing Letters 22, no. 5 (2015): 638-642. frequency domain optical coherence tomography,” IEEE Interna- [40] M. X. Goemans and D. P. Williamson, “Improved approxima- tional Conference on Acoustics, Speech and Signal Processing tion algorithms for maximum cut and satisfiability problems (2012): 553556. using semidefinite programming,” Journal of the ACM (JACM), [20] Y. Shechtman, Y. C. Eldar, A. Szameit and M. Segev, “Sparsity 42(6), 1115-1145 (1995). based sub-wavelength imaging with partially incoherent light [41] I. Waldspurger, A. d’Aspremont and S. Mallat, “Phase recovery, via quadratic ,” Optics Express 19 (2011): maxcut and complex semidefinite programming,” Mathematical 14807-14822. Programming 149.1-2 (2015): 47-81. 8

[42] E. J. Candes, T. Strohmer, and V. Voroninski, “Phaselift: Exact matrix. Note that aw[m, r] for 0 m N 1 can be trivially ≤ ≤ − and stable signal recovery from magnitude measurements via calculated from bw[m, r] for 0 m 2W 2. ≤ ≤ − convex programming,” Communications on Pure and Applied Consequently, if the N point DFT is used and 2W M Mathematics 66, no. 8 (2013): 1241-1274. ≤ ≤ [43] D. L. Sun and J. O. Smith, “Estimating a signal from a mag- N is satisfied, the affine constraints in (4) can be rewritten in nitude spectrogram via convex optimization,” arXiv:1209.2076 terms of aw and X as: (2012). N−1−m [44] L. R. Rabiner and R. W. Schafer, “Digital processing of speech X ? aw[m, r] = X[n, n + m]w[rL n]w [rL (n + m)]. signals,” Prentice Hall, (1978). − − [45] R. Balan, P. Casazza and D. Edidin, “On signal reconstruction n=0 without phase,” Applied and Computational Harmonic Analysis 20, no. 3 (2006): 345-356. [46] E. J. Candes and T. Tao, “Decoding by linear programming,” VIII.PROOFOF THEOREM III.1 Information Theory, IEEE Transactions on 51, no. 12 (2005): 4203-4215. The symbol is used to denote equality up to a global [47] E. J. Candes and B. Recht, “Exact matrix completion via convex phase and time-shift≡ §. We say that two signals x and x are optimization,” Foundations of Computational mathematics 9.6 1 2 (2009): 717-772. distinct if x1 x2, and equivalent if x1 x2. Let denote6≡ the set of all distinct non-vanishing≡ complex [48] B. Recht, M. Fazel and P. Parrilo, “Guaranteed minimum- P rank solutions of linear matrix equations via nuclear norm signals of length N. is a manifold of dimension 2N 1, i.e., minimization,” SIAM Review 52, no. 3 (2010): 471-501. locally resembles aP real 2N 1 dimensional space. This− can [49] K. Jaganathan, S. Oymak and B. Hassibi, “Sparse phase re- beP seen as follows: In order to− discard the global phase of non- trieval: Convex algorithms and limitations,” Information Theory Proceedings (ISIT), IEEE International Symposium on (2013): vanishing signals, we can assume that x[n0] is real and positive 1022-1026. at one index n0, without loss of generality. Hence, x[n0] can [50] R. A. Horn and C. R. Johnson, “Matrix analysis,” Cambridge take any value in R+, and x[n], for each 0 n N 1 2 ≤ ≤ − university press (2012). not equal to n0, can take any value in R 0, 0 , due to the [51] L. Vandenberghe and S. Boyd, “Semidefinite programming,” one-to-one correspondence between and\{ 2. } SIAM review (1996): 49-95. C R Let c be the set of distinct non-vanishing complex signalsP which⊂ P cannot be uniquely identified from their STFT APPENDIX magnitude if w is chosen such that it is non-vanishing and N VII.EQUIVALENT DEFINITION OF STFT PHASE RETRIEVAL W . We show that c has measure zero in . In order ≤ 2 P P Since we consider N point DFT and W satisfies W N , to do so, our strategy is as follows: ≤ 2 STFT phase retrieval can be equivalently stated in terms of We first characterize c using Lemma VIII.1. In particular, P the short-time autocorrelation a [14]: we show that c is a finite union of images of continuously w P differentiable maps from R2N−2 to . Since is a manifold find x (7) of dimension 2N 1, the following resultP completesP the proof: subject to − Theorem VIII.1 ([16], Chapter 5). If f : RN0 RN1 is N−1−m → X ? ? a continuously differentiable map, then the image of f has aw[m, r] = x[n]w[rL n]x [n + m]w [rL (n + m)] N1 − − measure zero in , provided N0 < N1. n=0 R for 0 m N 1 and 0 r R 1. We use the following notation in this section: If g is a ≤ ≤ − ≤ ≤ − T The knowledge of the short-time autocorrelation is sufficient signal of length lg, then g = (g[0], g[1], . . . , g[lg 1]) such − for all the guarantees provided in this paper. Note that the rth that g[0], g[lg 1] = 0 and g[n] = 0 outside the interval { − } 6 column of Zw and the rth column of aw are Fourier pairs. [0, lg 1]. The vector g˜ denotes the conjugate-flipped version − ? ? ? T Hence, for a particular r, if Zw[m, r] for 0 m N 1 is of g, i.e., g˜ = (g [lg 1], g [lg 2], . . . , g [0]) . Let ur and ≤ ≤ − − − available, then aw[m, r] for 0 m N 1 can be calculated vr denote the smallest and largest index where the windowed ≤ ≤ − by taking an inverse Fourier transform. The following lemma signal x wr has a non-zero value respectively. ◦ shows that 2W phaseless measurements per short-time section Lemma VIII.1. Consider two signals x x of length N are sufficient to infer the short-time autocorrelation. 1 2 which have the same STFT magnitude. If6≡ the window w is N Lemma VII.1. Zw[m, r] for 1 m 2W 1 is sufficient chosen such that it is non-vanishing and W , then, for ≤ ≤ − ≤ 2 to calculate aw[m, r] for 0 m N 1. each r, there exists signals gr and hr, of lengths lgr and lhr ≤ ≤ − respectively, such that Proof: If the window length is W , then aw has non-zero (i) x wr gr ? hr, x wr gr ? h˜r values only in the interval 0 m W 1 and N W + 1 1 ◦ ≡ 2 ◦ ≡ ≤ ≤ − − ≤ (ii) lgr + lhr 1 = vr ur + 1 m N 1. Let bw be the signal obtained by circularly − − ≤ − (iii) gr[lgr 1] = 1 and gr[0], hr[0], hr[lhr 1] = 0 shifting aw by W 1 rows, so that bw has non-zero values only − { − } 6 in the interval 0 − m 2W 2. Since the submatrix of the where ? is the convolution operator. Further, there exists at N point DFT matrix≤ obtained≤ − by considering the first 2W 1 least one r such that − columns and any 2W 1 rows is invertible (the Vandermonde (iv) l 2, h [0] is real and positive. − hr r structure is retained), Zw[m, r] for 1 m 2W 1 and ≥ ≤ ≤ − § bw[m, r] for 0 m 2W 2 are related by an invertible For non-vanishing signals, there is no ambiguity due to time-shift. ≤ ≤ − 9

Proof: In Lemma 7.1 of [22], it is shown that if two 1] = 1 is fixed, there are vr ur 1 other terms and each can be − − non-equivalent signals of length N have the same Fourier chosen from R2. Hence, this set is a subset of R2(vr −ur −1). ⊆ magnitude and if the DFT dimension is at least 2N (this Since the short-time sections r and r + 1 overlap in the would imply that they have the same autocorrelation), then index vr, gr ? hr and gr+1 ? hr+1 must be consistent in this there exists signals g and h, of lengths lg and lh respectively, index, i.e., gr, hr must satisfy: such that one signal can be decomposed as g?h and the other { } 1 1 signal can be decomposed as g?h˜. For each r, the rth column hr[lhr 1] = gr [0]hr [0]. (9) w[0] − w[W 1] +1 +1 of the STFT magnitude is equivalent to the Fourier magnitude − of the windowed signal x w . The DFT dimension is N, and r Due to Lemma VIII.1, gr ? h˜r and gr+1 ? h˜r+1 must also be the windowed signal length◦ is (which is less than vr ur + 1 consistent in this index up to a phase, i.e., gr, hr must also N − { } or equal to 2 ). Since, for every r, x1 wr and x2 wr have satisfy: the same Fourier magnitude, the aforementioned◦ result◦ proves (i). w[0] ? hr[0] gr+1[0]hr [lh,r+1 1]. (10) The conditions (ii) and (iii) are properties of convolution ≡ w[W 1] +1 − − (see Lemma 7.1 of [22] for details), and therefore hold for Observe that is used in (10), due to the fact that the equality every r. ≡ is only up to a phase. However, hr[0] is real and positive (see Furthermore, if lhr = 1 for all 0 r R 1, then x1 x2. Lemma VIII.1), due to which (10) fixes hr[0]. ≤ ≤ − iφ1 ≡ Hence, lhr 2 for at least one r. For this r, since e x1 and Consequently, the set of admissible values of the variables, iφ2 ≥ e x2 have the same STFT magnitude, hr[0] can be assumed 2N−2 excluding hr[0] and hr[lhr 1], is a subset of R , as 2(N to be real and positive without loss of generality. Hence, (iv) − − vr+1+ur 1+vr+1 ur+1+1+vr ur 1) = 2N 2. For each holds for at least one r. − − − − − point in this set, hr[0] and hr[lhr 1] are uniquely determined. Consequently, for each x c, condition (iv) of Lemma It is straightforward to check that− the map from this set to ∈ P rlr lr+1 f VIII.1 holds for at least one r. Let c c denote the rlr lr+1 P ⊂ P † is continuously differentiable. Consequently, c is the set of signals for which lhr = lr 2 and lh,r = lr . It P Q 2N−2 ≥ +1 +1 image of a continuously differentiable map from R to . suffices to show that for each r, lr and lr+1, there exists a P rlr lr+1 rlr lr+1 (ii) L < W 1 : set c c , which is the image of a continuously − Q ⊇ P 2N−2 Consider the setup for which 2L W . The set of differentiable map from R to . ≥2(vr+1−ur+1+1) gr+1, hr+1 , as earlier, is a subset of R . We first show the arguments for theP L = W 1 case as the { } The short-time sections r and r + 1 overlap in the interval expressions are simple and provide intuition for− the technique. [ur , vr]. Let vr ur +1 = T (the number of indices in the Then, we show the arguments for the L < W 1 case. +1 − +1 − overlapping interval). Due to 2L W , we have T = W L (i) L = W 1 : W ≥ − ≤ rl−r lr+1 2 . Hence, for each choice of gr+1, hr+1 , gr, hr must The set c is constructed as follows: Consider the b c { } { } Q satisfy: variables gr, hr, gr+1, hr+1 satisfying lhr = lr 2 and { } ≥ n+ur+1−ur lh,r+1 = lr+1, and x[n] for n [0, ur) (vr+1,N 1]. The 1 T ∈ ∪ − X map f = (f0, f1, . . . , fN−1) from these variables to is the gr[m]hr[n + ur+1 ur m] (11) wr[ur + m] − − following: P m=0 n  X 1 x[n] for n [0, ur) (vr+1,N 1] = gr+1[m]hr+1[n m]  ∈ ∪ − wr+1[ur+1 + m] − Pn−ur m=0  gr[m]hr[n ur m]  m=0 − − fn = for n [ur, vr] (8) for 0 n T 1. In addition, gr, hr must also satisfy: ∈ ≤ ≤ − { } Pn−ur+1  m=0 gr+1[m]hr+1[n ur+1 m] T −1  − − 1 X 1  for n [ur+1, vr+1]. hr[0] gr+1[m]hr+1[T 1 m] w[0] ≡ wr [ur + m] − − ∈ m=0 +1 +1 Observe that, for n = ur+1 = vr, fn has two definitions. (12) W The variables can admit only those values for which the two If lhr +1, then the T bilinear equations (11) can be ≥ b 2 c definitions have the same value. In the following, we show written as Ghr = c, where G has upper triangular structure that there is a one-to-one correspondence between the set of with unit diagonal entries, due to which rank(G) = T . The 2N−2 2(lgr −1) admissible values of the variables and a subset of R . set of gr is a subset of R . For each choice of gr, the Each x[n], for n [0, ur) (vr+1,N 1], can be cho- terms hr[lhr T ], . . . , hr[lhr 1] are fixed by Ghr = c. 2 ∈ ∪ − { − − } sen from R . The set of gr+1, hr+1 is a subset of The constraint (12) fixes the value of hr[0], as earlier. Each 2(vr+1−ur⊂+1+1) { } R , which can be seen as follows: gr+1[lg,r+1 of the remaining (lhr 1 T ) terms of hr may be chosen − − − 1] = 1 is fixed (see Lemma VIII.1), there are vr+1 ur+1 + 1 from 2. − R other terms and each can be chosen from R2. Hence,⊆ the set of admissible values of the variables, ⊆ For each choice of gr+1, hr+1 , consider the set of excluding hr[lhr T ], hr[lhr T + 1], . . . , hr[lhr 1] { } { − − − } gr, hr excluding the terms hr[0] and hr[lhr 1]: gr[lgr and , is a subset of 2N−2, due to the fact that { } − − hr[0] R 2(N vr + ur 1 + vr ur + 1 + vr ur T ) = †When r = R, we consider the short-time section r − 1 instead of r + 1. − +1 − +1 − +1 − − 2N 2 (as lgr + lhr 1 = vr ur + 1). For each point in We show the detailed calculations for the case when short-time section r + 1 − − − is considered, the arguments are symmetric for r − 1. this set, hr[0] and hr[lhr T ], . . . , hr[lhr 1] are uniquely { − − } 10 determined. The rest of the arguments are identical to those The proof strategy is as follows: We begin by focusing our of L = W 1. attention on short-time section r = 1. We show that the prior −W If lgr + 1 instead, then the bilinear equations (11) information available, along with the affine autocorrelation ≥ b 2 c can be equivalently written as Hgr = c, the same arguments measurements corresponding to r = 1 and the positive may be applied to draw the same conclusion. For the setup semidefinite constraint, will ensure that every feasible matrix ? with 2L > W , the same arguments hold for the short-time of (4) satisfies X[n, m] = x0[n]x0[m] for 0 n, m L. sections r and r+t, where t is the largest integer such that the We then apply this argument incrementally, i.e.,≤ we show≤ that short-time sections r and r+t overlap (this ensures T W ). the affine measurements corresponding to short-time section ≤ b 2 c r, along with the entries of X uniquely determined and the IX.PROOFOF COROLLARY III.1 positive semidefinite constraint, will ensure that X[n, m] = ? x [n]x [m] for ur n, m vr, where ur and vr denote We now extend Theorem III.1 to incorporate sparse signals. 0 0 ≤ ≤ Let S denote the set of all distinct complex signals of length the smallest and largest index where wr has a non-zero value N withP a support S. Here, S is a binary vector of length N, respectively. Consequently, the entries along the diagonal and the first W L off-diagonals of every feasible matrix of (4) such that x[n] = 0 whenever S[n] = 1 and x[n] = 0 whenever − S[n] = 0. Further,6 S has less than min L, W L consecutive match the entries along the diagonal and the first W L off- { − } ? − zeros. diagonals of the matrix x0x0. Since the entries are sampled S S from a rank one matrix with non-zero diagonal entries (i.e., Let c denote the set of signals which cannot be P ⊂ P ? uniquely identified from their STFT magnitude if w is chosen x0x0), there is exactly one positive semidefinite completion, N S which is the rank one completion x x? [50]. such that it is non-vanishing and W . We show that c 0 0 ≤ 2 P T has measure zero in S. Let s0 = (x0[0], x0[1], . . . , x0[L]) be a length L + 1 P In the proof of Theorem III.1, in order to show dimension subsignal of x0, and S be the (L + 1) (L + 1) submatrix × reduction, we used the fact that for sufficient pairs of adjacent of X corresponding to the first L + 1 rows and columns. We ? short-time sections r and r + 1, the following holds: now show that S = s0s0 is the only feasible matrix under the (i) There is at least one index in the non-overlapping indices constraints of (4).  L  [ur, ur+1 1] or [vr + 1, vr+1] where the signals x1 and Since x0[n] for 0 n 2 is known a priori, we have − ? ≤ ≤  L  x have a non-zero value. This ensures that h [0] is not S[n, m] = x0[n]x0[m] for 0 n, m . Let (S) = c 2 r ≤ ≤ 2 A constrained by gr+1, hr+1 in general. This condition can denote these constraints due to prior information, along with be ensured by imposing{ the} constraint that the sparse signal the affine constraints corresponding to r = 1. In particular, cannot have L consecutive zeros. (S) = c denotes the following set of constraints: A (ii) There is at least one index in the overlapping indices jLk S[n, m] = x [n]x?[m] for 0 n, m , [ur+1, vr] where the signals x1 and x2 have a non-zero value. 0 0 ≤ ≤ 2 This ensures that hr[0] is constrained by gr+1, hr+1 (10) L−m { } X ? for signals which cannot be uniquely identified by their STFT aw[m, 1] = S[n, n + m]w[L n]w [L (n + m)]. − − magnitude. This condition can be ensured by imposing the n=0 constraint that the sparse signal cannot have W L consecutive − For each feasible matrix S, these set of measurements fix zeros.  L   L  (i) the 2 + 1 2 + 1 submatrix, corresponding to the The only difference in the proof is the following: Unlike in  L  × first 2 + 1 rows and columns, of S (ii) the appropriately the case of non-vanishing signals, there is time-shift ambiguity. weighted sum along the diagonal and each off-diagonal of S Hence, the constraint (12) is replaced by: (2L W is implicitly used here). n ≤ 1 X 1 Lemma X.1. If ? satisfies , then it is the hr[0] gr+1[m]hr+1[n m] S0 = s0s0 (S) = c w[0] ≡ wr [ur + m] − A m=0 +1 +1 only positive semidefinite matrix which satisfies (S) = c. (13) A Proof: Let T be the set of Hermitian matrices of the form for some 0 n T 1. This fixes the value of hr[0] to ≤ ≤ − one of at most T values, due to which there is a dimension ? ? n T = S = s0v + vs0 : v C reduction. { ∈ } and T ⊥ be its orthogonal complement. The set T may be ? X.PROOFOF THEOREM IV.1 interpreted as the tangent space at s0s0 to the manifold of Hermitian matrices of rank one. Influenced by [42], we use We first show the arguments for the case 2W M S and S ⊥ to denote the projection of a matrix S onto the N (short-time autocorrelation known) as the expressions≤ are≤ T T subspaces and ⊥ respectively. simple and provide intuition. Then, we show the arguments T T Standard duality arguments in semidefinite programming for the case 4L M < 2W (super-resolution). show that the following are sufficient conditions for ? (i) 2W M ≤ N : S0 = s0s0 The affine≤ constraints≤ in (4) can be rewritten as (see Section to be the unique optimizer of (4): VII): (i) Condition 1: S T and (S) = 0 S = 0. (ii) Condition 2: There∈ exists aAdual certificate⇒ D in the N−1−m ? X ? range space of obeying: aw[m, r] = X[n, n + m]w[rL n]w [rL (n + m)]. − − A n=0 • Ds0 = 0 11

? • rank(D) = L We already have S[n, m] = x [n]x [m] for 0 n, m L 0 0 ≤ ≤ • D 0. from above. Let (S) = c denote these constraints, along < A The proof of this result is based on KKT conditions, with the affine constraints corresponding to r = 2. Due ? and can be found in any standard reference on semidefinite to 2L W , Lemma X.1 proves that S0 = s0s0 is the ≤ programming (for example, see [51]). only psd matrix which satisfies the prior information and We first show that Condition 1 is satisfied. The set of the measurements corresponding to r = 1, 2. Applying this constraints in (S) = 0 due to prior information fix the argument incrementally, the entries along the diagonal and the entries of the firstA  L + 1 rows and columns of S to 0. Since first W L off-diagonals of every feasible matrix of (4) match 2 − ? ? T the entries along the diagonal and the first W L off-diagonals S = s0v + vs0 for some v = (v[0], v[1], . . . , v[L]) (due  L  ? − of the matrix x0x0. to S T ), we infer that v[n] = icx0[n] for 0 n 2 , for some∈ real constant c. Indeed, the equations≤ of the≤ form Sparse signals: The arguments can be seamlessly extended ? ? to incorporate sparse signals. s0[n]v [n] + v[n]s0[n] = 0 imply v[n] = icnx0[n], for some ? ? (i) The fact that there exists a unique positive semidefinite real constant cn. The equations s0[n]v [m] + v[n]s0[m] = 0 imply cn = cm. completion once the diagonal and the first W L off-diagonal ? − The set of constraints in (S) = 0 due to the measurements entries are sampled from x0x0 holds when x0 has less than A corresponding to r = 1, along with v[n] = icx0[n] for 0 W L consecutive zeros. L L ≤ −     (ii) Note that the length of s2 is at most L, as it corresponds n 2 , imply v[n] = icx0[n] for 2 + 1 n L. Hence, for≤S T , (S) = 0 implies v = ics , which≤ in≤ turn implies to the locations in the window where the entries are not ∈ A 0 S = ics s? + ics s? = 0. determined. Since we know x0[n] for i0 n < i0 + L a − 0 0 0 0 ≤ We next establish Condition 2. For simplicity of notation, priori, where i0 is the smallest index such that x0[i0] = 0, 6 we consider the case where for the length of s1 is W L. Redefine s1 so that it corre- w[n] = 1 0 n W − 1. For a general non-vanishing w, the same arguments≤ ≤ hold− sponds to the locations in the window where the entries are (the Toeplitz matrix considered is appropriately redefined with determined, starting from the smallest index which has a non- weights). zero value in order to ensure s1[0] = 0. If x0 has at most 6 ? W 2L consecutive zeros, then the length of s is at least The range space of is the set of all L+1 L+1 matrices − 1 which are a sum of theA following two matrices:× The first matrix (W L) (W 2L) = L. Hence, a lower triangular Toeplitz  L   L  − − − can have any value in the + 1 + 1 submatrix matrix L, satisfying Ls1 + s2 = 0, always exists. 2 × 2 corresponding to the first  L + 1 rows and columns, and (ii) 4L M < 2W : 2 ≤ has a value zero outside this submatrix (dual of the set of The range space of the dual certificate is the set of all L+1 × constraints due to prior information). The second matrix has L+1 matrices which are a sum of the following two matrices:  L   L  a Toeplitz structure (dual of the measurements corresponding The first matrix can have any value in the 2 + 1 2 + 1  L  × to r = 1). submatrix corresponding to the first 2 + 1 rows and  L  columns, and has a value zero outside this submatrix (dual Suppose s1 is the vector containing the first + 1 entries 2 of the set of constraints due to prior information). The second of s0 and s2 is the vector containing the remaining entries matrix has the form PM ? ? , where is of s0. Here, s1 corresponds to the locations where we have m=1 αmWr fmfmWr αm real-valued for each (dual of the measurements correspond- knowledge of the entries and s2 corresponds to the locations m where the entries are not determined. Let L be a lower trian- ing to r = 1). T gular  L   L + 1 Toeplitz matrix satisfying Ls + s = 0. Let l = (l[0], l[1], . . . , l[N 1]) be a vector that satisfies: 2 2 1 2 − × l L m Such an L always exists if s1[0] is non-zero and the length (i) l[0] = 1, l[n] = l[N n] = 0 for 1 n 2 1 of s1 is greater than or equal to the length of s2. Let Λ be Pm − Pm ? ≤ ≤ −  L   L  (ii) n=0 x0[n]l[m n] = n=0 x0[n]l[N m + n] = 0 any 2 + 1 2 + 1 positive semidefinite matrix with rank  L  − −  L  × for 2 + 1 m L satisfying Λs = 0. Again, such a Λ always exists (any ? ≤ ≤ 2 1 (iii) f l = 0 for M + 1 m N. positive semidefinite matrix with eigenvectors perpendicular m ≤ ≤ These constraints together can be written as . When to s ). Consider the following dual certificate: Al = b 1 M 4 L , the matrix A is square or wide, and almost   2 L?L + Λ L? always≥ (pseudo)d e invertible. This can be seen as follows: the   determinant of A is a polynomial function of the entries of D =   . (14) LI L  x0, due to which it is either always zero or almost surely non- 2 zero. By substituting x0[0] = 1 and x0[n] = 0 for n = 0, it ? 6 Clearly, D is in the range space of . Also, Ds0 = 0 by is straightforward to check that the determinant is non-zero. construction. From the Schur complement,A it is straightforward Hence, such an l almost always exists. to see that rank(D) = L and D < 0. If the last row in (14) is chosen as (l[L], l[L 1], . . . , l[0]), ? − We have shown that S0 = s0s0 is the only positive then we have: (i) The lower right block is an identity matrix. semidefinite matrix which satisfies the prior information and (ii) Ls1 + s2 = 0 is satisfied. (iii) Since b is a real ? the measurements corresponding to r = 1. Redefine s0 and S vector, l satisfies l[n] = l [N n]. Therefore, l is in the T PM − such that s0 = (x0[0], x0[1], . . . , x0[2L]) is the 2L+1 length range space of m=1 αmfm where αm is real-valued, due subsignal of x and S is the (2L + 1) (2L + 1) submatrix of to which the resulting second matrix is in the range space of × PM ? ? X corresponding to the first 2L + 1 rows and columns. m=1 αmWr fmfmWr. 12

Therefore, D satisfies all the requirements. The arguments Babak Hassibi (M’08) was born in Tehran, , are applied incrementally as earlier, with M 4L for r > 1. in 1967. He received the B.S. degree from the ≥ in 1989, and the M.S. and Ph.D. degrees from Stanford University in 1993 and 1996, respectively, all in electrical engineering. He has been with the California Institute of Tech- Kishore Jaganathan was born in Tamil Nadu, India, nology since January 2001, where he is currently the in 1989. He received the B.Tech degree from Indian Gordon M Binder/ Amgen Professor Of Electrical Institute of Technology, Madras in 2010, and the Engineering. From 2008-2015 he was Executive Of- M.S. degree from California Institute of Technology, ficer of Electrical Engineering, as well as Associate Pasadena in 2011, both in electrical engineering. Director of Information Science and Technology. He has been with the California Institute of Tech- From October 1996 to October 1998 he was a research associate at the nology since 2011, where he is currently pursuing Information Systems Laboratory, Stanford University, and from November his Ph.D. degree in electrical engineering, under the 1998 to December 2000 he was a Member of the Technical Staff in the supervision of Prof. Babak Hassibi. At Caltech, he Mathematical Sciences Research Center at Bell Laboratories, Murray Hill, was awarded the Atwood Fellowship in 2010. He NJ. His research interests include wireless communications and networks, is also the recipient of the Qualcomm Innovation robust estimation and control, adaptive signal processing and linear algebra. He is the coauthor of the books (both with A.H. Sayed and T. Kailath) Fellowship, in 2014. His current research interests include signal processing, 2 convex optimization and statistical machine learning. Indefinite Quadratic Estimation and Control: A Unified Approach to H and H∞ Theories (New York: SIAM, 1999) and Linear Estimation (Englewood Cliffs, NJ: Prentice Hall, 2000). He is a recipient of an Alborz Foundation Fellowship, the 1999 O. Hugo Schuck best paper award of the American Automatic Control Council (with H. Hindi and S.P. Boyd), the 2002 National Yonina C. Eldar (S’98—M’02—SM’07—F’12) re- Science Foundation Career Award, the 2002 Okawa Foundation Research ceived the B.Sc. degree in Physics in 1995 and the Grant for Information and Telecommunications, the 2003 David and Lucille B.Sc. degree in Electrical Engineering in 1996 both Packard Fellowship for Science and Engineering, the 2003 Presidential Early from Tel-Aviv University (TAU), Tel-Aviv, Israel, Career Award for Scientists and Engineers (PECASE), and the 2009 Al-Marai and the Ph.D. degree in Electrical Engineering and Award for Innovative Research in Communications, and was a participant Computer Science in 2002 from the Massachusetts in the 2004 National Academy of Engineering “Frontiers in Engineering” Institute of Technology (MIT), Cambridge. program. From January 2002 to July 2002 she was a He has been a Guest Editor for the IEEE Transactions on Information Postdoctoral Fellow at the Digital Signal Processing Theory special issue on “space-time transmission, reception, coding and signal Group at MIT. She is currently a Professor in the processing” was an Associate Editor for Communications of the IEEE Trans- Department of Electrical Engineering at the Tech- actions on Information Theory during 2004-2006, and is currently an Editor nion - Israel Institute of Technology, Haifa, Israel, where she holds the for the Journal “Foundations and Trends in Information and Communication” Edwards Chair in Engineering. She is also a Research Affiliate with the and for the IEEE Transactions on Network Science and Engineering. He is Research Laboratory of Electronics at MIT and was a Visiting Professor at an IEEE Information Theory Society Distinguished Lecturer for 2016-2017. Stanford University, Stanford, CA. Her research interests are in the broad areas of statistical signal processing, sampling theory and compressed sensing, optimization methods, and their applications to biology and optics. Dr. Eldar has received numerous awards for excellence in research and teaching, including the IEEE Signal Processing Society Technical Achieve- ment Award (2013), the IEEE/AESS Fred Nathanson Memorial Radar Award (2014), and the IEEE Kiyo Tomiyasu Award (2016). She was a Horev Fellow of the Leaders in Science and Technology program at the Technion and an Alon Fellow. She received the Michael Bruno Memorial Award from the Rothschild Foundation, the Weizmann Prize for Exact Sciences, the Wolf Foundation Krill Prize for Excellence in Scientific Research, the Henry Taub Prize for Excellence in Research (twice), the Hershel Rich Innovation Award (three times), the Award for Women with Distinguished Contributions, the Andre and Bella Meyer Lectureship, the Career Development Chair at the Technion, the Muriel & David Jacknow Award for Excellence in Teaching, and the Technions Award for Excellence in Teaching (two times). She received several best paper awards and best demo awards together with her research students and colleagues including the SIAM outstanding Paper Prize, the UFFC Outstanding Paper Award, the Signal Processing Society Best Paper Award and the IET Circuits, Devices and Systems Premium Award, and was selected as one of the 50 most influential women in Israel. She is a member of the Young Israel Academy of Science and Humanities and the Israel Committee for Higher Education, and an IEEE Fellow. She is the Editor in Chief of Foundations and Trends in Signal Processing, a member of the IEEE Sensor Array and Multichannel Technical Committee and serves on several other IEEE committees. In the past, she was a Signal Processing Society Distinguished Lecturer, member of the IEEE Signal Processing Theory and Methods and Bio Imaging Signal Processing technical committees, and served as an associate editor for the IEEE Transactions On Signal Processing, the EURASIP Journal of Signal Processing, the SIAM Journal on Matrix Analysis and Applications, and the SIAM Journal on Imaging Sciences. She was Co-Chair and Technical Co-Chair of several international conferences and workshops. She is author of the book Sampling Theory: Beyond Bandlimited Systems and co-author of the books Compressed Sensing and Convex Optimization Methods in Signal Processing and Communications, all published by Cambridge University Press.