<<

1 On the Asymptotic Equivalence of Circulant and Toeplitz Matrices Zhihui Zhu and Michael B. Wakin

Abstract—Any sequence of uniformly bounded N × N Her- Hermitian Toeplitz matrices {HN } by defining a function mitian Toeplitz matrices {HN } is asymptotically equivalent to a eh(f) ∈ L2([0, 1]) with Fourier series2 certain sequence of N ×N circulant matrices {CN } derived√ from Z 1 the Toeplitz matrices in the sense that kHN − CN kF = o( N) −j2πkf as N → ∞. This implies that certain collective behaviors of the h[k] = eh(f)e df, k ∈ Z, 0 eigenvalues of each Toeplitz are reflected in those of the ∞ X corresponding and supports the utilization of h(f) = h[k]ej2πkf , f ∈ [0, 1]. the computationally efficient (instead of e the Karhunen-Loeve` transform) in applications like coding and k=−∞ filtering. In this paper, we study the asymptotic performance of the individual eigenvalue estimates. We show that the asymptotic Usually eh(f) is referred to as the symbol or generating equivalence of the circulant and Toeplitz matrices implies the function for the N × N Toeplitz matrices {HN }. individual asymptotic convergence of the eigenvalues for certain Suppose eh ∈ L∞([0, 1]). Szego’s˝ theorem [1] states that types of Toeplitz matrices. We also show that these estimates N−1 asymptotically approximate the largest and smallest eigenvalues 1 X Z 1 lim ϑ(λl(HN )) = ϑ(eh(f))df, (1) for more general classes of Toeplitz matrices. N→∞ N l=0 0 Keywords—Szego’s˝ theorem, Toeplitz matrices, circulant matri- ces, asymptotic equivalence, Fourier analysis, eigenvalue estimates where ϑ is any function continuous on the range of eh. As one example, choosing ϑ(x) = x yields

N−1 1 X Z 1 I.INTRODUCTION lim λl(HN ) = eh(f)df. N→∞ N 0 A. Szego’s˝ Theorem l=0 Toeplitz matrices are of considerable interest in statistical In words, this says that as N → ∞, the average eigenvalue of signal processing and information theory [1–5]. An N × N HN converges to the average value of the symbol eh(f) that 1 HN has the form generates HN . As a second example, suppose eh(f) > 0 (and thus λl(HN ) > 0 for all l ∈ [N] and N ∈ N) and let ϑ be the  h[0] h[−1] h[−2] . . . h[−(N − 1)]  log function. Then Szego’s˝ theorem indicates that h[1] h[0] h[−1]   1 . 1 Z    .  lim log (det (H )) = log h(f) df. HN =  h[2] h[1] h[0] .  N e   N→∞ N 0  . ..   . .  This relates the of the Toeplitz matrix to its h[N − 1] ··· h[0] symbol. Szego’s˝ theorem has been widely used in the areas of or H [m, n] = h[m − n]; m, n ∈ [N] := {0, 1,...,N − N signal processing, communications, and information theory. 1}. The of a random vector obtained by A paper and review by Gray [2, 7] serve as a remarkable sampling a wide-sense stationary (WSS) random process is an elementary introduction in the engineering literature and offer example of such a matrix. arXiv:1608.04820v2 [cs.IT] 22 Feb 2017 a simplified proof of Szego’s˝ original theorem. The result has H Throughout this paper, we consider N that is Hermitian, also been extended in several ways. For example, the Avram- HH = H H i.e., N N , and we suppose that the eigenvalues of N Parter theorem [8, 9], a generalization of Szego’s˝ theorem, λ (H ) ≥ · · · ≥ λ (H ) are denoted and arranged as 0 N N−1 N . relates the collective asymptotic behavior of the singular values A AH Here the Hermitian of a matrix is denoted by . of a general (non-Hermitian) Toeplitz matrix to the absolute Szego’s˝ theorem [1] describes the collective asymptotic value of its symbol, i.e., |h(f)|. Tyrtyshnikov [10] proved that behavior (as N → ∞) of the eigenvalues of a sequence of e Szego’s˝ theorem holds if eh(f) ∈ R and eh(f) ∈ L2([0, 1]), and This work was supported by NSF grant CCF-1409261. Zamarashkin and Tyrtyshnikov [11] further extended Szego’s˝ Z. Zhu and M. B. Wakin are with the Department of Electrical Engineering and Computer Science, Colorado School of Mines, Golden, CO 80401 USA. 2This can also be interpreted using the discrete-time Fourier transform P∞ −j2πkf e-mail: {zzhu,mwakin}@mines.edu. (DTFT). That is, we can define bh(f) = k=−∞ h[k]e = eh(1 − f). 1Through the paper, finite-dimensional vectors and matrices are indicated However, it is more common to view h as the of the symbol by bold characters and we index such vectors and matrices beginning at 0. eh; see [1, 6]. 2 theorem to the case when eh(f) ∈ R and eh(f) ∈ L1([0, 1]). in O(N log N) time3 using the fast Fourier transform (FFT). Sakrison [12] extended Szego’s˝ theorem to high dimensions. The individual eigenvalues of the circulant matrix approximate Gazzah et al. [13] and Gutierrez-Guti´ errez´ and Crespo [14] those of the Toeplitz matrix. We study the quality of this extended Gray’s results on Toeplitz and circulant matrices approximation. to block Toeplitz and block circulant matrices and derived An N × N circulant matrix CN is a special Toeplitz matrix Szego’s˝ theorem for block Toeplitz matrices. of the form Most relevant to our work, Bogoya et al. [15] studied the  c[0] c[1] c[2] . . . c[N − 1]  individual asymptotic behavior of the eigenvalues of Toeplitz c[N − 1] c[0] c[1] matrices by interpreting Szego’s˝ theorem in probabilistic lan-    .  guage. In the case that the range of h is connected, Bogoya et  .  e CN =  c[N − 2] c[N − 1] c[0] .  . al. related the eigenvalues to the values obtained by sampling  . .   . ..  the symbol eh(f) uniformly in frequency on [0, 1]. c[1] ··· c[0] B. Motivation Circulant matrices arise naturally in applications involving Despite the power of Szego’s˝ theorem, in many scenarios the discrete Fourier transform (DFT) [3]; in particular, any (such as certain coding and filtering applications [2, 3]), one circulant matrix can be unitarily diagonalized using the DFT matrix. Circulant matrices offer a nontrivial but simple set may only have access to HN and not eh. In such cases, it is still desirable to have practical and efficiently computable of objects that can be used for problems involving Toeplitz matrices. For example, the product H x can be computed in estimates of individual eigenvalues of HN . We elaborate on N two example applications below. O(N log N) time by embedding HN into a (2N−1)×(2N−1) circulant matrix and using the FFT to perform matrix-vector i. Estimating the of a positive-definite multiplication. Also Gray [2, 7] showed that Toeplitz and Toeplitz matrix. The linear system HN y = b arises naturally circulant matrices are asymptotically equivalent in a certain in many signal processing and estimation problems such as sense; this implies that their eigenvalues have similar collective linear prediction [4, 5]. The condition number κ(HN ) of the behavior. See Section II for formal definitions. Finally, we note Toeplitz matrix HN is important when solving such systems. that circulant matrices have been used as preconditioners [21, For example, the speed of solving such linear systems via 22] of Toeplitz matrices in iterative methods for solving linear the widely used conjugate gradient method is determined systems of the form HN y = b. by the condition number: the larger κ(HN ), the slower the We consider estimates for the eigenvalues of a Toeplitz convergence of the algorithm. In case of large κ(HN ), pre- matrix obtained from a well-constructed circulant matrix. The conditioning can be applied to ensure fast convergence. Thus eigenvalues of the circulant matrix can be computed efficiently estimating the smallest and largest eigenvalues of a symmetric without constructing the whole matrix; one merely applies the positive-definite Toeplitz matrix (such as the covariance matrix FFT to the first row of the matrix. We do not provide new of a random vector obtained by sampling a stationary random circulant approximations to Toeplitz matrices in this paper; process) is of considerable interest [16, 17]. rather we sharpen the analysis on the asymptotic equivalence of ii. Spectrum sensing algorithm for cognitive radio. Toeplitz and certain circulant matrices [2, 3, 7] by establishing Spectrum sensing is a fundamental task in cognitive ratio, results in terms of individual eigenvalues rather than collective which aims to best utilize the available spectrum by identifying behavior. To the best of our knowledge, this is the first work unoccupied bands [18–20]. Zeng and Ling [20] have proposed that provides guarantees for asymptotic equivalence in terms spectrum sensing methods for cognitive radio based on the of individual eigenvalues. eigenvalues of a Toeplitz covariance matrix. These eigenvalue- based algorithms overcome the noise uncertainty problem D. Circulant Approximations to H which exists in alternative methods based on energy detection. N We consider the following circulant approximations that Aside from the above applications, approximate and effi- have been widely used in information theory and applied ciently computable eigenvalue estimates can also be used as the mathematics. starting point for numerical algorithms that iteratively compute 1) CeN : Bogoya et al. [15] proved that the samples of the eigenvalues with high . symbol eh are the main asymptotic terms of the eigenvalues of the Toeplitz matrix HN . Given only HN , one practical C. Contributions strategy for estimating the eigenvalues is to first approxi- th In this paper, we consider estimates for the eigenvalues of a mate eh by the (N − 1) partial Fourier sum SN−1(f) = Toeplitz matrix that are obtained through a two-step process: PN−1 j2πfk k=−(N−1) h[k]e . Then construct a circulant matrix l 1) Transform the Toeplitz matrix into a circulant matrix whose eigenvalues are samples of SN−1(f), i.e., SN−1( ). using a certain procedure described below. N We let CeN denote the corresponding circulant matrix, whose 2) Compute the eigenvalues of the circulant matrix. 3 Both of these steps can be performed efficiently; in particular, We say g1(N) = O(g2(N)) if and only if there exist a positive real the eigenvalues of an N ×N circulant matrix can be computed number t and M ∈ N such that g1(N) ≤ tg2(N) for all N ≥ M. 3 top row (ec[0], ec[1],..., ec[N − 1]) can be obtained as (c[0], c[1],..., c[N − 1]) of CN can be obtained as follows N−1 1 N−1 l 1 X n j2πkn/N X j2πkl/N c[k] = S ( )e c[k] = σN ( )e e N N−1 N N N n=0 l=0 N−1 N−1 N−1 N−1 n 1 X X 0 1 X 1 X X 0 j2πl(k+k0)/N = h[k0]ej2π(k+k )n/N = h[k ]e N N N n=0 k0=−(N−1) l=0 n=0 k0=−n N−1 N−1 ! N−1 n N−1 ! 1 X X X 1 0 X 0 X 1 j2π(k+k0)n/N = h[k0] ej2πl(k+k )/N = h[k ] e N N N n=0 0 k0=−(N−1) n=0 k =−n l=0 1  h[0], k = 0, = ((N − k)h[−k] + kh[N − k]) . = h[−k] + h[N − k], k = 1, 2,...,N − 1, N Pearl [3] first analyzed such a circulant approximation and where the last line utilizes the fact its applications in coding and filtering. The same circulant approximation (referred to as an optimal preconditioner) was N−1  0 X 1 0 1, mod(k + k ,N) = 0, ej2π(k+k )n/N = also proposed by Chan [22]. The optimal preconditioner is the N 0, otherwise. solution to the following optimization problem n=0

minimize kCN − HN kF 2) CbN : Following the same strategy, we first compute the  N−1 th over all N × N circulant matrices. One can verify that C is 2 partial Fourier sum N the solution to the above problem. N−1 b 2 c X j2πfk S N−1 (f) = h[k]e . E. Main Results b 2 c N−1 k=−b 2 c As a reminder, we assume throughout this paper that each n o HN is Hermitian; this ensures that all CN ∈ CeN , CbN , CN Let CbN denote the N ×N circulant matrix whose eigenvalues l are Hermitian as well. Let {λl (CN )} denote the eigenval- are samples of S N−1 (f), i.e, S N−1 ( ). With simple l∈[N] b 2 c b 2 c N n o ues of the circulant matrix CN for all CN ∈ CeN , CbN , CN . manipulations, the top row (bc[0], bc[1],..., bc[N − 1]) of CbN is given by Let λl(CN ) be permuted that such that λρ(0)(CN ) ≥ λρ(1)(CN ) ≥ · · · ≥ λρ(N−1)(CN ). In this paper, we establish  N−1  h[−k], 0 ≤ k ≤ b 2 c, the following results. c[k] = N+1 b h[N − k], d 2 e ≤ k < N,  Theorem I.1. Suppose that the sequence h[k] is absolutely 0, k = N/2, summable. Then when N is even, and lim max λl(HN ) − λρ(l)(CN ) = 0, (2) N→∞ l∈[N]  N−1 h[−k], 0 ≤ k ≤ b 2 c, bc[k] = N+1 n o h[N − k], d 2 e ≤ k < N, for all CN ∈ CeN , CbN , CN . when N is odd. Theorem I.1 states that the individual asymptotic conver- Strang [21] first employed such circulant matrices as precon- gence of the eigenvalues between the Toeplitz matrices H n o N ditioners to speed up the convergence of iterative methods for and circulant matrices CN ∈ CeN , CbN , CN holds as long solving Toeplitz linear systems. This approach is quite simple. as h[k] is absolutely summable. Its proof involves the uniform The underlying idea is that the sequence h[k] usually decays convergence of a Fourier series and the fact that the equal quickly as k grows large, and thus we keep the largest part distribution of two sequences implies individual asymptotic of the Toeplitz matrix and fill in the remaining part to form a equivalence of two sequences in a certain sense. By utilizing circulant approximation. the Sturmian separation theorem [24], we also provide the 3) CN : In the Fourier analysis literature, it is known that convergence rate for band Toeplitz matrices as follows. Cesaro` sum has rather better convergence than the partial th Fourier sum [23]. The N Cesaro` sum is defined as Theorem I.2. Suppose that h[k] = 0 for all |k| > r, i.e., HN is a band Toeplitz matrix when N > r. Then PN−1 n=0 Sn(f) σN (f) = . 1 N max λl(HN ) − λρ(l)(CN ) = O( ) (3) l∈[N] N We use CN to denote the N × N circulant matrix whose n o l as N → ∞ for all CN ∈ CeN , CbN , CN . eigenvalues are samples of σN (f), i.e., σN ( N ). The top row 4

Utilizing the fact that the Cesaro` sum has rather better The above results—characterizing the individual asymptotic convergence than the partial Fourier sum, the following result convergence of the eigenvalues between Toeplitz and circulant establishes a weaker condition on h[k] for the individual matrices—serve as complements to the literature on asymptotic asymptotic convergence of the eigenvalues between HN and equivalence that has focused on the collective behavior of CN . the eigenvalues. Before moving on, we briefly review said literature. In [2, 7], Gray showed the asymptotic equivalence4 Theorem I.3. Suppose that h[k] is square summable and of {HN } and {CeN } when the sequence h[k] is absolutely eh ∈ L∞([0, 1]) is Riemann integrable and the essential range h i summable. Pearl showed the asymptotic equivalence of {HN } of eh is ess inf eh, ess sup eh , i.e., the essential range of eh is and {CN } when the sequence h[k] is square summable and connected. Then HN and CN have bounded eigenvalues for all N ∈ N. The −1 spectrum of the preconditioned matrix CN HN asymptotically lim max λl(HN ) − λρ(l)(CN ) = 0. (4) N→∞ l∈[N] clustering around one was investigated in [10, 27–29]. Finally, as noted previously, Bogoya et al. [15] studied the individual asymptotic behavior of the eigenvalues of Toeplitz Note that the sequence h[k] being absolutely summable ∞ matrices by interpreting Szego’s˝ theorem in probabilistic lan- implies that h[k] is square summable, that eh ∈ L ([0, 1]) is guage. To the best of our knowledge, [15] was the first work Riemann integrable, and that its range is connected [23] (actu- that provided conditions under which Szego’s˝ theorem implies ally eh is continuous). However, the converse of this statement individual asymptotic eigenvalue estimates by sampling the does not hold. We provide an example in Section IV-B. symbol eh(f) uniformly in frequency on [0, 1]. Our estimates Finally, the following result concerns the convergence of for the eigenvalues of a Toeplitz matrix differ from [15] in that the largest and smallest eigenvalues for more general classes they are only dependent on the entries of HN (instead of the of Toeplitz matrices. symbol eh(f)). As we attempt to forge a connection between Theorem I.4. Suppose that eh ∈ L∞([0, 1]) is Riemann equal distribution of two sequences of matrices and individual integrable. Then asymptotic equivalence of the eigenvalues, we bridge the gap between equal distribution of two real sequences (see  lim λ0 (HN ) = lim λρ(0) CN = ess sup eh, Definition II.1) and individual asymptotic equivalence of the N→∞ N→∞  two real sequences (see Definition II.2). In particular, we lim λN−1 (HN ) = lim λρ(N−1) CN = ess inf eh. N→∞ N→∞ provide two conditions under which equal distribution implies individual asymptotic equivalence in Theorems III.1 and III.3, which may themselves be of independent interest. For our Remark. Theorem I.4 works only for the circulant matrix CN proof, as motivated by [15], we utilize the same approach of and not CeN or CbN . This is closely related to the fact that interpreting asymptotic equivalence in probabilistic language. the partial Cesaro` sum has better convergence than the partial However, [15] involves the quantile function, while our ap- Fourier sum [23]. proach involves only the cumulative distributive function and Remark. Theorem I.4 only requires eh to be bounded and works on the sequences directly with proof by contradiction. Riemann integrable, while Theorem I.3 requires the range of Moreover, [15] requires the sequences of the eigenvalues to eh to be connected. be strictly inside the range of eh,5 while our work covers more Directly computing the eigenvalues of the Toeplitz matrix general cases where the sequences of the eigenvalues can be 3 HN would generally require O(N ) flops. By exploiting the outside of the range of eh (since the eigenvalues of CbN and CeN special structure of Toeplitz matrices, Trench [25] presented can be outside the range of eh) as illustrated in Theorem III.1. an iterative algorithm (which combines the Levinson-Durbin See also our remark at the end of Section III-C. Finally, while algorithm with an iterative root-finding procedure) requiring Theorem III.3 in the present work requires two sequences to 2 O(N ) operations per eigenvalue. Laudadio et al. [17] sum- be inside the essential range of a function, the range of this marized several algorithms to estimate the smallest eigen- function is not required to be connected; connectedness is value of a symmetric positive-definite Toeplitz matrix. These needed in [15] so that the quantile function defined there is 2 algorithms need O(N ) flops. Luk and Qiao [26] proposed uniformly continuous. a fast algorithm (which consists of a Lanczos-type tridiago- The rest of the paper is organized as follows. Section II nalization procedure and a QR-type diagonalization method) 2 states preliminary results on the asymptotic equivalence of that computes the eigenvalues in O(N log N) operations. In Toeplitz and circulant matrices. We prove our main results contrast, it is clear from Section I-D that the top row of n o in Section III. Section IV presents examples to illustrate our CN ∈ CeN , CbN , CN can be computed at most in O(N) results, and Section V concludes the paper. flops because of the closed-form expressions. Computing the eigenvalues for the corresponding N × N circulant matrices 4 via the FFT requires only O(N log N) flops, and at the We define asymptotically equivalent sequences of matrices in Section II. 5It is unclear whether Szego’s˝ theorem implies individual asymptotic eigen- same time, Theorems I.1–I.4 ensure that these eigenvalues are value estimates by sampling the symbol eh(f) uniformly if the eigenvalues asymptotically equivalent to the eigenvalues of the Toeplitz are outside the range of eh. For this case, besides a connected range for eh, we matrix. suspect more conditions are needed to ensure the results in [15] still hold. 5

II.PRELIMINARIES for every continuous function ϑ on [0, 2]. However, these two Before proceeding, we introduce some notation used sequences are not individually asymptotically convergent since throughout the paper. Let R (·) be the range of a function max |uN,l − vN,l| = 1 and ess R (·) be the essential range of a function. Suppose l∈[N] g(x): R → R. Then for any N ∈ N. ess R(g) = {y ∈ R | ∀ > 0 : µ({x : |g(x) − y| < }) > 0} , The asymptotic equivalence of two sequences of matrices is where µ(·) is the Lebesgue measure of a set (i.e., the length defined as follows. of an interval when the set is an interval). For any Ω ⊂ R, let Definition II.3. [2, 7] Two sequences of N × N matrices int (Ω) be the interior of the set Ω. We say the essential range {AN } and {BN } (where AN and BN denote N×N matrices) of the function g(x): R → R is connected if its essential are said to be asymptotically equivalent if range is a real interval (a set of real numbers that any number kA − B k lies between two numbers in the set is also included in the lim N √ N F = 0 set). N→∞ N 6 A. Asymptotically Equivalent Matrices and {AN } and {BN } are uniformly absolutely bounded , i.e., there exists a constant M < ∞ such that We begin with the notion of equal distribution of two real sequences, using a definition attributed to Weyl [1]. kAN k2 , kBN k2 ≤ M, ∀N ∈ N. Definition II.1 (equal distribution [1]). Assume that the se- ∞ ∞ quences {{uN,l}l∈[N]}N=1 and {{vN,l}l∈[N]}N=1 are abso- lutely bounded, i.e., there exist a, b such that a ≤ uN,l ≤ Following the convention in Gray’s monograph [7], we write b and a ≤ vN,l ≤ b for all l ∈ [N] and N ∈ N. AN ∼ BN if {AN } and {BN } are asymptotically equivalent. ∞ ∞ Then {{uN,l}l∈[N]}N=1 and {{vN,l}l∈[N]}N=1 are equally This kind of asymptotic equivalence is transitive, i.e., if AN ∼ distributed if BN and BN ∼ CN , then AN ∼ CN . Additional properties N−1 of ∼ can be found in [7]. The following result concerns the 1 X lim (ϑ (uN,l) − ϑ (vN,l)) = 0. asymptotic eigenvalue behavior of asymptotically equivalent N→∞ N l=0 Hermitian matrices. for every continuous function ϑ on [a, b]. Theorem II.4. [7, Theorem 2.4] Let {AN } and {BN } be asymptotically equivalent sequences of Hermitian We define the notion of individual asymptotic equivalence ∞ matrices with eigenvalues {{λl (AN )}l∈[N]}N=1 and of two real sequences as follows. ∞ {{λl (BN )}l∈[N]}N=1. Then there exist constants a and b Definition II.2 (individual asymptotic equivalence). Assume such that ∞ ∞ that the sequences {{uN,l}l∈[N]}N=1 and {{vN,l}l∈[N]}N=1 are arranged in decreasing order and are absolutely bounded, a ≤ λl (AN ) , λl (BN ) ≤ b, ∀ l ∈ [N],N ∈ N. i.e., there exist a, b such that a ≤ u ≤ · · · ≤ u ≤ N,N−1 N,1 Let ϑ be any function continuous on [a, b]. We have uN,0 ≤ b and a ≤ vN,N−1 ≤ · · · ≤ vN,1 ≤ vN,0 ≤ b for all ∞ ∞ N ∈ . Then {{u } } and {{v } } are N−1 N N,l l∈[N] N=1 N,l l∈[N] N=1 1 X individually asymptotically equivalent if lim (ϑ (λl (AN )) − ϑ (λl (BN ))) = 0. N→∞ N l=0 lim max |uN,l − vN,l| = 0. N→∞ l∈[N] In light of this theorem, Definition II.3 can be viewed as We note that individual asymptotic equivalence is stronger the matrix equivalent of Definition II.1. One can also define ∞ than equal distribution since if {{uN,l}l∈[N]}N=1 and the matrix equivalent of Definition II.2, although we will not ∞ {{vN,l}l∈[N]}N=1 are individually asymptotically equivalent, need this. they are equally distributed (see Appendix B, where it is proved that (6) implies (5) without using any of the as- sumptions of Theorem III.1). However, equal distribution in B. Asymptotic Equivalence of Circulant and Toeplitz Matrices general does not imply individual asymptotic convergence of Any circulant matrix CN is characterized by its top row. two real sequences. As a simple example, let uN,0 = 1, uN,l = Let  j2πf0  0, vN,0 = 2, vN,l = 0 for all l ∈ {1, 2,...,N −1} and N ∈ N. e We have  ej2πf1  e :=   ∈ N , f ∈ [0, 1] N−1 f  .  C 1 X  .  lim (ϑ (uN,l) − ϑ (vN,l)) j2πf(N−1) N→∞ N e l=0

1 6 = lim (ϑ(1) − ϑ(2)) = 0 We say a sequence of matrices {AN } is uniformly absolutely bounded if N→∞ N there exists a constant M such that kAN k2 ≤ M is for all N ∈ N. 6

denote a length-N vector of samples from a discrete-time λρ(0)(CN ) ≥ λρ(1)(CN ) ≥ · · · ≥ λρ(N−1)(CN ). Then complex exponential signal with digital frequency f. Note that N−1 N−1 1 X lim ϑ(λl(HN )) − ϑ(λρ(l)(CN )) = 0  X j2π(N−l)(k+n)/N N→∞ N CN e(N−l)/N [k] = c[n]e l=0 n=0 N−1 ! for every function ϑ that is continuous on [a, b] and CN ∈ j2π(N−l)k/N X −j2πln/N n o =e c[n]e , CeN , CbN , CN . Here [a, b] is the smallest interval that n=0 covers all the eigenvalues of HN , CeN , CbN , and CN . which implies that Proof. This result follows simply from Lemmas II.5 and B.1. N−1 ! X −j2πln/N CN e(N−l)/N = c[n]e e(N−l)/N . n=0

n 1 o Thus the normalized DFT vectors √ el/N are III.PROOFSOF MAIN THEOREMS N l∈[N] the eigenvectors of any circulant matrix CN , and the corre- A. Proof of Theorem I.1 sponding eigenvalues are obtained by taking the DFT of the We first provide a strong condition under which the equal first row of CN . Specifically, distribution of two sequences is equivalent to individual N−1 asymptotic equivalence. X −j2πln/N λl (CN ) = c[n]e , ∞ Theorem III.1. Assume that the sequences {{uN,l}l∈[N]}N=1 n=0 ∞ and {{vN,l}l∈[N]}N=1 are absolutely bounded, i.e., there exist 0 0 0 0 which can be computed efficiently via the FFT. We note that a , b such that b ≥ uN,0 ≥ uN,1 ≥ · · · ≥ uN,N−1 ≥ a and 0 0 b ≥ vN,0 ≥ vN,1 ≥ · · · ≥ vN,N−1 ≥ a for all N ∈ . {λl (CN )}l∈[N] are not necessarily arranged in any particular N order; namely, they do not necessarily decrease with l. Furthermore, suppose there exists a non-constant continuous function g(x):[c, d] → such that For a sequence of Toeplitz matrices {HN } and their re- R spective circulant approximations discussed in Section I-D, the lim uN,0 = lim vN,0 = max g(x), following result establishes asymptotic equivalence in terms N→∞ N→∞ x∈[c,d] of the collective behaviors of the eigenvalues. As a reminder, lim uN,N−1 = lim vN,N−1 = min g(x), we assume that each H is Hermitian; this ensures that all N→∞ N→∞ x∈[c,d] n o N CN ∈ CeN , CbN , CN are Hermitian as well. and N−1 Lemma II.5. Suppose that the sequence h[k] is square 1 X 1 Z d summable and {H }, {C }, {C }, {C } are uniformly ab- lim ϑ(uN,l) = ϑ(g(x))dx < ∞ N eN bN N N→∞ N d − c solutely bounded. Then l=0 c for every function ϑ that is continuous on [a, b], where [a, b] is H ∼ C ∼ C ∼ C , N bN eN N the smallest interval that covers [a0, b0] and the range of g(x). and Then the following are equivalent: N−1 N−1 1 X 1 X lim (ϑ(λl(HN )) − ϑ(λl(CN ))) = 0, lim (ϑ(uN,l) − ϑ(vN,l)) = 0; (5) N→∞ N N→∞ N l=0 l=0 where ϑ is any continuous function on [a, b] and CN ∈ n o lim max |uN,l − vN,l| = 0. (6) CeN , CbN , CN . Here [a, b] is the smallest interval that N→∞ l∈[N] covers all the eigenvalues of HN , CeN , CbN , and CN . Proof. See Appendix A. Proof (of Theorem III.1). See Appendix B. A stronger result follows simply from the elementary view of Weyl’s theory of equal distribution [30], which is pre- If eh(f) ≡ C is a constant function, then HN and CN are sented in Lemma B.1. As a reminder, we do assume that diagonal matrices with all diagonals being C for all CN ∈ the eigenvalues of each Toeplitz matrix are ordered such that n o C , C , C and N ∈ . Thus λ (H ) = λ (C ) = C λ (H ) ≥ · · · ≥ λ (H ). eN bN N N l N l N 0 N N−1 N n o Lemma II.6. Suppose that the sequence h[k] is square for all l ∈ [N] and CN ∈ CeN , CbN , CN . The following summable and {HN }, {CeN }, {CbN }, {CN } are uniformly ab- result establishes the range of the eigenvalues of CN and HN solutely bounded. Let λl(CN ) be permuted that such that for the case when eh(f) is not a constant function. 7

∞ Lemma III.2. Suppose that eh ∈ L ([0, 1]) and eh is not for all f ∈ [0, 1] and N ≥ N0. The Cesaro` sum σN (f) also a constant function. Also let λl(CN ) be permuted such that converges to eh(f) uniformly on [0, 1] as N → ∞. λρ(0)(CN ) ≥ λρ(1)(CN ) ≥ · · · ≥ λρ(N−1)(CN ). Then Since the eigenvalues of CeN and CbN are, respectively, the samples of SN−1(f) and S N−1 (f), we conclude that {CeN } ess inf eh < λN−1(HN ) ≤ λρ(N−1)(CN ) b 2 c and {CbN } are uniformly absolutely bounded. Lemma III.2 and implies that {C } and {H } are also uniformly absolutely λ (C ) ≤ λ (H ) < sup h. N N ρ(0) N 0 N ess e bounded. We next show limN→∞ maxl λl (CN ) = maxf∈[0,1] eh(f) n o  C ∈ C , C , C Proof (of Lemma III.2). We first rewrite λl CN as for all N eN bN N . The extreme value theorem N−1 states that eh(f) must attain a maximum and a minimum each  X −j2πln λl CN = c[n]e N at least once since eh(f) ∈ R is continuous on [0, 1]. Let n=0 N−1 fb := arg max eh(f) X 1 −j2πln f = ((N − n)h[−n] + nh[N − n]) e N N n=0 denote any point at which eh achieves its maximum value. Also   1 1 let = HN √ el/N , √ el/N . l N N blN := arg min fb− l∈[N] N By definition, λ0(HN ) = maxkvk2=1hHN v, vi and λ (H ) = min hH v, vi N−1 N kvk2=1 N , we obtain denote any closest on-grid point to fb. For arbitrary  > 0, by uniform convergence, there exists N0 such that λN−1(HN ) ≤ λl(CN ) ≤ λ0(HN ), ∀ l. ! N l For arbitrary v ∈ , kvk2 = 1, we extend v to an infinite bN C λ (CN ) − eh ≤  blN sequence v[n], n ∈ Z by zero-padding. Then N N−1 N−1 X ∗ X for all N ≥ N . Noting that blN − f ≤ 1 and h is hHN v, vi = v [m] h[m − n]v[n] 0 N b 2N e m=0 n=0 continuous on [0, 1], there exists N1 ∈ N so that ∞ ∞ ! X ∗ X   l = v [m] h[m − n]v[n] bN eh fb − eh ≤  m=−∞ n=−∞ N Z 1 2 = |ve(f)| eh(f)df when N ≥ N1. Thus we conclude 0   N−1 P j2πfn λl (CN ) − eh fb ≤ 2 where ve(f) = n=0 v[n]e . If eh(f) is not a constant bN function of [0, 1], we conclude for all N ≥ max {N0,N1}. Since  is arbitrary, Z 1 2 ess inf eh = |ve(f)| df · ess inf eh < hHN v, vi lim max λl (CN ) = max eh(f) 0 N→∞ l f∈[0,1] Z 1 2 n o  < |ve(f)| df · ess sup eh = ess sup eh. for all CN ∈ CeN , CbN , CN . Noting that λl CN ≤ 0 λ0(HN ) ≤ maxf∈[0,1] eh(f), we obtain

lim max λl (HN ) = max eh(f). Theorem I.1 holds trivially when eh(f) is a constant function N→∞ l f∈[0,1] since for this case H and C have the same eigenvalues for n N o N The asymptotic argument for the smallest eigenvalues can be all CN ∈ CeN , CbN , CN and N ∈ N. In what follows, we obtained with a similar approach. It follows from Lemma II.5 that H ∼ C ∼ C ∼ C and from Szego’s˝ theorem (1) suppose eh(f) is not a constant function. The assumption of N bN eN N absolute summability of the sequence h[k] indicates that its that N−1 Z 1 h(f) [0, 1] 1 X DTFT e is continuous on , and moreover, its partial lim ϑ(λl(HN )) = ϑ(eh(f))df. N→∞ N Fourier sum SN (f) converges uniformly to eh(f) on [0, 1] as l=0 0 N → ∞ [23]. Thus, given  > 0, there exists N0 ∈ N such that Finally, the proof of Theorem I.1 is completed by applying Theorem III.1 with g = eh.

eh(f) − SN−1(f) ≤  8

B. Proof of Theorem I.2 We first note that [15, Theorem 1.6] provides a similar Proof (of Theorem III.3). See Appendix D. convergence rate for individual asymptotic eigenvalue esti- mates for band Toeplitz matrices by sampling the symbol eh(f) uniformly in the frequency on [0, 1]. As illustrated in Remark. Theorem III.1 requires that g is continuous and that Appendix C, when h[k] = 0 for all |k| > r, the eigenvalues the extreme values of the sequences asymptotically converge to g of both CeN and CbN are equivalent to the DFT samples of the extreme values of (but meanwhile the extreme values of Pr j2πfk the sequences can be outside of the range of g). Theorem III.3 SN−1(f) = eh(f) = h[k]e . Thus, (3) for CeN k=−r requires the sequences to be strictly inside the range of g (but C and bN follows directly from [15, Theorem 1.6], whose proof g can have discontinuities). involves the quantile function and a refinement of Szego’s˝ Now we are ready to prove Theorem I.3. If eh(f) ≡ C is asymptotic formula in terms of the number of eigenvalues  inside a given interval in [31]. To keep the paper self-contained a constant function, then λl (HN ) = λl CN = C for all and to exhibit an alternative approach, we also prove (3) l ∈ [N]. Thus Theorem I.3 holds trivially. On the other hand, suppose that eh ∈ L∞([0, 1]) is not a constant function and the for CeN and CbN by directly bounding the error between h i the eigenvalues of the band Toeplitz and circulant matrices. essential range of eh is ess inf eh, ess sup eh . It follows from The complete proof is given in Appendix C. We outline the     Lemma III.2 that λl (HN ) , λl CN ∈ int R eh for all main idea here. Let [HN ]N−r be the (N − r) × (N − r) matrix obtained by deleting the last r columns and the last l ∈ [N] and N ∈ N. Using Lemma II.5 and Szego’s˝ theorem (see (1)), the fact that h[k] is square summable together r rows of HN . Similar notation holds for [CeN ]N−r. Note with the fact that {HN } and {CN } are uniformly absolutely that [H]N−r and [CeN ]N−r have the same eigenvalues when bounded imply N > 2r since [H]N−r is exactly the same as [CeN ]N−r. Also C is equivalent to C when N > 2r. We first apply the N−1 1 bN eN 1 X Z Sturmian separation theorem for the Toeplitz and circulant lim ϑ(λl(HN )) = ϑ(eh(f))df, N→∞ N matrices to obtain a bound on the distance between λl(HN ) l=0 0 and λ (C ). We then utilize the fact that h(f) is Lipschitz ρ(l) eN e and continuous to guarantee the closeness between λl(CeN ) and N−1 λl+r(CeN ). Finally, we show λl(CeN ) is close to λl(CN ) since 1 X  lim ϑ(λl(HN )) − ϑ(λl(CN )) = 0 the Cesaro` sum and partial Fourier sum converge to the same N→∞ N function in this case. l=0 h i for all ϑ that are continuous on ess inf eh, ess sup eh . Finally, (4) follows from Theorem III.3 with g = h, u = λ (H ) C. Proof of Theorem I.3 e N,l l N and vN,l = λρ(l)(CN ). This completes the proof of Theorem We first provide another condition (which, informally speak- I.3. ing, is weaker than that in Theorem III.1) under which the equal distribution of two sequences implies individual asymp- totic equivalence. Remark. We note that, as guaranteed by Lemma III.2 that Theorem III.3. Assume that b ≥ u ≥ u ≥ · · · ≥ λl(CN ) and λl(HN ) are always inside the range of eh for N,0 N,1 all l ∈ [N] and N ∈ , Theorem III.3 can also be utilized uN,N−1 ≥ a and b ≥ vN,0 ≥ vN,1 ≥ · · · ≥ vN,N−1 ≥ a. N Furthermore, suppose there is a Riemann integrable function to prove Theorem I.1 (i.e., (2)) for CN = CN . However, g(x):[c, d] → [a, b] such that we cannot apply Theorem III.3 for the other two classes of circulant approximations since their eigenvalues can be outside uN,l, vN,l ∈ int (ess R(g)) , ∀ l ∈ [N],N ∈ N, of the range of eh. and N−1 1 X 1 Z d D. Proof of Theorem I.4 lim ϑ(uN,l) = ϑ(g(x))dx < ∞ N→∞ N d − c Our proof of Theorem I.4 appears in Appendix E. l=0 c for all ϑ that are continuous on [a, b]. Then the following are equivalent: N−1 1 X IV. SIMULATIONS lim (ϑ(uN,l) − ϑ(vN,l)) = 0; (7) N→∞ N In this section, we provide several examples to illustrate our l=0 theory. In the legends of Figures 1–3, we refer to the circulant approximations C , C , and C as Circulant1, Circulant2, lim max |uN,l − vN,l| = 0. (8) eN bN N N→∞ l and Circulant3, respectively. 9

 2 sin(πW k) 1 Due to the gap (between 0 to 1) in the range of the A. h[k] = W πk ,W = 4 window function eh, the eigenvalues of HN and CN have In our first example, the sequence h[k] is absolutely different behavior in the transition region. To better illustrate summable and the corresponding symbol this, Figures 3(d) and 3(e), respectively, show λl(HN ) and λ (C ) for N = 2048. We see that the eigenvalues of the  1 − f , 0 ≤ f ≤ W ρ(l) N f  W Toeplitz matrix H cover the range [0, 1] somewhat uniformly, h(f) = tri( ) = 1−f N e 1 − W , 1 − W ≤ f ≤ 1 C 0 1/2 W  while the eigenvalues of the N tend to cluster around , , 0, otherwise and 1 (there are none near 1/4 or 3/4). The following result is a triangular signal, which is continuous on [0, 1]. Fig- formally explains the transition behavior of the eigenvalues of HN . ure 1(a) shows eh, Figure 1(b) shows λl(HN ), λρ(l)(CeN ), sin(2πW k) 1 λρ(l)(CbN ) and λρ(l)(CN ) for N = 500, and Figure 1(c) shows Lemma IV.1. [6, 32, 33] Let h[k] = πk with W = 4 . 1 maxl∈[N] λl(HN ) − λρ(l)(CN ) against the N for Fix  ∈ (0, 2 ). Then there exist constants C1,C2 and N1 such n o that the distance between any 2 consecutive eigenvalues of all CN ∈ CeN , CbN , CN . As guaranteed by Theorem I.1, it C1 HN inside (, 1 − ) is bounded from below by ln(N) and can be observed in Figure 1(c) that the individual asymptotic C2 from above by ln(N) ; that is convergence of eigenvalues holds for all CeN , CbN , and CN . C1 C2 ≤ λl(HN ) − λl+1(HN ) ≤ 1+(−1)k ln(N) ln(N) B. h[k] = j2πk In this case, the sequence h[k] is not absolutely summable for all  ≤ λl+1(HN ) ≤ λl(HN ) ≤ 1 −  and N ≥ N1. Also and the symbol 1 λb 1 Nc−1 ≥ ≥ λd 1 Ne  1 2 2 2 2f, 0 < f ≤ 2 eh(f) = 1 for all N ∈ . 2f − 1, 2 < f ≤ 1 N is not continuous, but its range is connected. Figure 2(a) On the other hand, we have the following result on the eigenvalues of CN . shows eh, Figure 2(b) shows λl(HN ), λρ(l)(CeN ), λρ(l)(CbN ) sin(2πW k) 1 and λ (CN ) for N = 500, and Figure 2(c) shows Lemma IV.2. Let h[k] = with W = . Then ρ(l) πk 4 maxl∈[N] λl(HN ) − λρ(l)(CN ) against the dimension N  n o  1 = 0, l = N/4, 3N/4, for all C ∈ C , C , C . It is observed from λl CN − N eN bN N 2 ≥ α, l ∈ [N] and l 6= N/4, 3N/4, Figure 2(c) that the individual asymptotic convergence of the eigenvalues holds for CN —as guaranteed by The- with α = 0.4 if N is a multiple of 4. orem I.3—but not for C and C . Figure 2(c) also eN bN Proof. See Appendix F. shows that the errors maxl∈[N] λl(HN ) − λρ(l)(CeN ) and

With more sophisticated analysis, we believe that the above maxl∈[N] λl(HN ) − λρ(l)(CbN ) converge to the size of the Gibbs jump (≈ 0.089). result could be improved to α ≈ 0.45. This is suggested by Figure 3(e). Combining Lemmas IV.1 and IV.2, we conclude that sin(2πW k) 1 max λ (H ) − λ (C ) ≈ 0.4 N → C. h[k] = πk ,W = 4 l∈[N] l N ρ(l) N approaches as ∞ and N is a multiple of 4. In this example, the sequence h[k] is not absolutely Finally, Figure 3(f) plots λ0(HN ) − λρ(0)(CN ) and summable and the symbol λN−1(HN ) − λρ(N−1)(CN ) against the dimension N. As  0, W < f ≤ 1 − W, can be observed, the largest and smallest eigenvalues of C eh(f) = N 1, otherwise, converge to the largest and smallest eigenvalues of HN , respectively. This is as guaranteed by Theorem I.4. is a rectangular window function, which is not contin- uous and whose range is not connected. Figure 3(a) V. CONCLUSIONS shows eh, Figure 3(b) shows λl(HN ), λρ(l)(CeN ), λρ(l)(CbN ) It is well known that any sequence of uniformly bounded and λρ(l)(CN ) for N = 2048, and Figure 3(c) shows Hermitian Toeplitz matrices is asymptotically equivalent to cer- maxl∈[N] λl(HN ) − λρ(l)(CN ) against the dimension N for n o tain sequences of circulant matrices derived from the Toeplitz all CN ∈ CeN , CbN , CN . Figure 3(c) illustrates that the matrices. We have provided conditions under which the asymp- individual asymptotic convergence of eigenvalues does not totic equivalence of the matrices implies the individual asymp- hold for the circulant matrices CeN , CbN , and CN . Indeed, totic convergence of the eigenvalues. Our results suggest that the sequence h[k] does not meet the assumptions in either instead of directly computing the eigenvalues of a Toeplitz Theorem I.1 or Theorem I.3. matrix, one can compute a fast spectrum approximation using 10

1 1 0.025 Toeplitz Circulant1 Circulant1 Circulant2 0.02 Circulant2 Circulant3 Circulant3 0.015 e h(f) 0.5 0.01

0.005

0 0 0 0 0.25 0.5 0.75 1 0 250 499 100 400 700 1000 1300 1600 1900 f l N (a) (b) (c)

Fig. 1. (a) Illustration of a continuous symbol eh(f). (b) The eigenvalues of the Toeplitz matrix HN and the circulant approximations CeN , CbN , and CN , n o arranged in decreasing order. Here N = 500. (c) A plot of maxl∈[N] λl(HN ) − λρ(l)(CN ) versus the dimension N for all CN ∈ CeN , CbN , CN .

1 1.0855 0.12 1 Toeplitz Circulant1 0.1 Circulant2 Circulant3 0.08 e Circulant1 h(f) 0.5 0.06 Circulant2 Circulant3 0.04

0.02 0 0 −0.0855 0 0 0.5 1 0 250 499 101 401 701 1001 1301 1601 1901 f l N (a) (b) (c)

Fig. 2. (a) Illustration of a discontinuous symbol eh(f). (b) The eigenvalues of the Toeplitz matrix HN and the circulant approximations CeN , CbN , and CN , n o arranged in decreasing order. Here N = 500. (c) A plot of maxl∈[N] λl(HN ) − λρ(l)(CN ) versus the dimension N for all CN ∈ CeN , CbN , CN .

the FFT. This is long known, but we provide new guarantees APPENDIX A for the asymptotic convergence of the individual eigenvalues. PROOFOF LEMMA II.5 Some numerical examples have demonstrated the dependence of the convergence behavior on the properties of the symbol of the Toeplitz matrix. An interesting question would be whether it is possible to extend our analysis to general (non- It follows from the definition of CbN that Hermitian) Toeplitz matrices, along the lines of the Avram- Parter theorem [8, 9]. In addition, it would also be of interest to extend our analysis to the asymptotic equivalence of block 2 HN − CbN Toeplitz and block circulant matrices. F N−1 b 2 c X  2 2 = k |h[k] − h[−N + k]| + |h[−k] − h[N − k]| k=1 bN/2c X  2 2 + k |h[k]| + |h[−k]| N−1 k=b 2 c+1 ACKNOWLEDGEMENTS bN/2c X  2 2 2 2 ≤ 2k |h[k]| + |h[−k]| + |h[N − k]| + |h[k − N]| k=1 N−1 X  2 2 The authors would like to thank the reviewers for their ≤ 2k |h[k]| + |h[−k]| . constructive comments and helpful suggestions. k=1 11

1 1.0895 0.16 1 Toeplitz Circulant1 0.155 Circulant2 0.15 Circulant3 e 0.145 h(f) 0.5 0.14

0.135 Circulant1 0.13 Circulant2 0 Circulant3 0 −0.0895 0.125 0 0.25 0.5 0.75 1 0 1024 2047 500 2000 3500 5000 f l N (a) (b) (c) −4 x 10 1 1 8 minimum eigenvalue maximum eigenvalue 6

0.5 0.5 4

2

0 0 0 0 1024 2047 0 1024 2047 500 2000 3500 5000 l l N (d) (e) (f)

Fig. 3. (a) Illustration of a discontinuous symbol eh(f) whose range is not connected. (b) The eigenvalues of the Toeplitz matrix HN and the circulant approximations CeN , CbN , and CN , arranged in decreasing order. Here N = 2048. (c) A plot of maxl∈[N] λl(HN ) − λρ(l)(CN ) versus the dimension n o N for all CN ∈ CeN , CbN , CN . (d) The eigenvalues of the Toeplitz matrix HN . (e) The eigenvalues of the circulant matrix CN , arranged in decreasing order. (f) A plot of λ0(HN ) − λρ(0)(CN ) and λN−1(HN ) − λρ(N−1)(CN ) versus the dimension N.

Fix  > 0. By assumption that the sequence h[k] is square Noting that {HN } and {CbN } are uniformly absolutely summable, there exists N0 such that bounded by assumption, we conclude HN ∼ CbN . The proofs ∞ of H ∼ C and H ∼ C follow from the same approach. X 2 2 N eN N N |h[k]| + |h[−k]| ≤ . Invoking Theorem II.4 completes the proof.

k=N0 Thus we have 1 2 HN − CbN APPENDIX B N F PROOFOF THEOREM III.1 N0−1 1 X  2 2 ≤ 2k |h[k]| + |h[−k]| Set N k=1 1 N Fg(α) := µ {x ∈ [c, d]: g(x) ≤ α} , (9) 1 X  2 2 d − c + 2k |h[k]| + |h[−k]| N 1 k=N F (α) := # {l ∈ [N], u ≤ α} , 0 uN N N,l N0−1 N 1 X  2 2 X  2 2 1 ≤ 2k |h[k]| + |h[−k]| + 2 |h[k]| + |h[−k]| Fv (α) := # {l ∈ [N], vN,l ≤ α} . N N N k=1 k=N0 ≤ + 2 = 3 Here, µ(E) is the Lebsegue measure of a subset E ∈ R. ∞ Definition II.1 states that the sequences {{uN,l}l∈[N]}N=1 when N ≥ max {N0,N1} with N1 ≥ ∞   and {{vN,l}l∈[N]}N=1 are asymptotically equally distributed PN0−1 2 2 k=1 2k |h[k]| + |h[−k]| /. Since  is arbitrary, if we obtain N−1 2 1 X 1 lim (ϑ (uN,l) − ϑ (vN,l)) = 0 lim HN − CbN = 0. N→∞ N N→∞ N F l=0 12

0 α+α1 for all ϑ that are continuous on [a, b]. Here [a, b] is the α −  < α1 < α and let α1 = 2 ∈ R(g). Noting that ∞ smallest interval that covers the sequences {{uN,l}l∈[N]}N=1 g is continuous, we have ∞ and {{vN,l}l∈[N]}N=1.   0 α − α1 Trench [30] strengthens this definition by showing the µ x ∈ [c, d]: |g(x) − α1| < > 0. following result. 2 Lemma B.1. [30, Asymptotcially (absolutely) equal distribu- Thus, we obtain tion] Assume that b ≥ uN,0 ≥ uN,1 ≥ · · · ≥ uN,N−1 ≥ a 1 Fg(α) − Fg(α1) = µ {x ∈ [c, d]: α1 < g(x) ≤ α} and b ≥ vN,0 ≥ vN,1 ≥ · · · ≥ vN,N−1 ≥ a. The following are d − c equivalent: 1 1 PN−1 ≥ µ {x ∈ [c, d]: α1 < g(x) < α} > 0. 1) limN→∞ N l=0 (ϑ (uN,l) − ϑ (vN,l)) = 0 for all ϑ d − c that are continuous on [a, b]; 1 PN−1 Similarly, we have Fg(α) < Fg(α2) for α < α2 < α + . 2) limN→∞ N l=0 |ϑ (uN,l) − ϑ (vN,l)| = 0 for all ϑ that are continuous on [a, b]. We are now ready to prove the main part. First we show ∞ ∞ Here the sequences {{uN,l}l∈[N]}N=1 and {{vN,l}l∈[N]}N=1 that (6) implies (5). Fix ϑ being some continuous function on are said to be absolutely asymptotically equally distributed [30] [a, b] and  > 0. The Weierstrass approximation theorem states if that there exists a p on [a, b] such that N−1  1 X |ϑ(t) − p(t)| ≤ lim |ϑ (uN,l) − ϑ (vN,l)| = 0 N→∞ N 3 l=0 for all t ∈ [a, b]. Since p is a polynomial, there exists a constant for all ϑ that are continuous on [a, b]. C such that Viewing g :[c, d] → R as a random variable, in probabilistic |p(t2) − p(t1)| ≤ C |t2 − t1| language, Fg is the cumulative distribution function (CDF) for any a ≤ t1 ≤ t2 ≤ b. Also (6) implies that there exists an associated to g. Also FuN and FvN can be viewed as the CDF N0 ∈ such that of the discrete random variables uN : {0, 1,...,N − 1} → R N defined by uN (l) = uN,l and vN : {0, 1,...,N − 1} →  R |uN,l − vN,l| ≤ , ∀ l ∈ [N] defined by vN (l) = vN,l, respectively. It is well known that 3C the CDF of a random variable is right continuous and non- for all N ≥ N0. Therefore, we have decreasing. The following result, known as the Portmanteau Lemma, gives two equivalent descriptions of weak conver- |ϑ(uN,l) − ϑ(vN,l)| gence in terms of the CDF and the means of the random ≤ |ϑ(uN,l) − p(uN,l)| + |p(uN,l) − p(vN,l)| variables. + |ϑ(vN,l) − p(vN,l)| Lemma B.2. [34, Portmanteau Lemma] The following are    ≤ + C + =  equivalent: 3 3C 3 1 PN−1 1 R d 1) limN→∞ N l=0 ϑ(uN,l) = d−c c ϑ(g(x))dx, for all for all l ∈ [N] and N ≥ N0. Thus bounded, continuous functions ϑ; N−1 N−1 2) lim F (α) = F (α) for every point α at which F 1 X 1 X N→∞ uN g g (ϑ (u ) − ϑ (v )) ≤ |ϑ(u ) − ϑ(v )| is continuous. N N,l N,l N N,l N,l l=0 l=0 Despite the fact that Fg(α) is right continuous and non- ≤ decreasing everywhere, some stronger results about F (α) can g for all N ≥ N . Since  is arbitrary, this implies (5). be obtained by utilizing the fact that g is continuous on [c, d]. 0 Now let us show that (5) implies (6). We prove the statement Lemma B.3. Let Fg(α) be defined as in (9). Then Fg(α) is (5) ⇒ (6) by contradiction. Suppose (6) is not true, i.e., there ∞ strictly increasing on R(g), i.e., for every α ∈ int (R(g)), exists an increasing sequence {Mk0 }k0=1 and 1 > 0 such that there exists  > 0 such that, for each pair (α1, α2) satisfying max uMk0 ,l − vMk0 ,l ≥ 21 l∈[M 0 ] min g(x) ≤ α −  < α1 < α < α2 < α +  ≤ max g(x), k x∈[c,d] x∈[c,d] 0 0 for all k ≥ 1. Let lk = arg maxl∈[M 0 ] uMk0 ,l − vMk0 ,l we have k denote any point at which uMk0 ,l − vMk0 ,l achieves its max- Fg(α1) < Fg(α) < Fg(α2). imum, which implies uM 0 ,l 0 − vM 0 ,l 0 ≥ 21. Without k k k k ∞ loss of generality, we suppose {Mk00 , lk00 }k00=1 is a subse- ∞ 0 0 0 quence of {Mk , lk }k =1 such that uMk00 ,lk00 ≤ vMk00 ,lk00 , i.e.,

Proof (of Lemma B.3). Since g(x):[c, d] → R is con- uMk00 ,lk00 ≤ vMk00 ,lk00 − 21. tinuous, there exists  such that (α − , α + ) ⊂ R(g) for Note that uMk00 ,lk00 ≤ vMk00 ,lk00 − 21 together with

α ∈ int (R(g)). Let α1 be an arbitrary value such that limN→∞ vN,0 = maxx∈[c,d] g(x) implies uMk00 ,lk00 < 13

00 maxx∈[c,d] g(x) for all sufficiently large k . Thus we let which implies ∞ ∞ {Mk, lk}k=1 be a subsequence of {Mk00 , lk00 }k00=1 such that vMk,lk − uMk,lk−d3Mke uMk,lk < maxx∈[c,d] g(x). By assumption that

=vMk,lk − uMk,lk + uMk,lk − uMk,lk−d3Mke lim uN,N−1 = lim vN,N−1 = min g(x), N→∞ N→∞ x∈[c,d] ≥21 − 1 ≥ 1. there exist k0 ∈ N and αk ∈ int (R(g)) such that Now taking ϑ(t) = t, we obtain

1 Mk−1 0 ≤ αk − uMk,lk < 1 X 2 |ϑ(uMk,l) − ϑ(vMk,l)| Mk 7 l=0 and Fg is continuous at αk for all k ≥ k0. Noting that Fg l is right continuous everywhere and strictly increasing at all 1 Xk ≥ |ϑ(u ) − ϑ(v )| α (which is shown in Lemma B.3), there exist  (α ) > 0 Mk,l Mk,l k 2 k Mk l=l −d M e (which depends on αk) and 3 > 0 (which is independent of k 3 k  α ) such that  (α ) ≤ 1 , F is continuous at α +  (α ), lk k 2 k 2 g k 2 k 1 X and ≥ |ϑ(uMk,l) − ϑ(vMk,lk )| Mk l=lk−d3Mke Fg(αk + 2(αk)) ≥ Fg(αk) + 33. (10) ≥31 > 0 Lemma B.2 indicates that for all k ≥ k1. This contradicts Lemma B.1. Thus we have lim F (α) = F (α) proved that (5) implies (6). uMk g Mk→∞ for every point α at which Fg is continuous. Thus there exist k1 ∈ N, k1 ≥ k0 such that APPENDIX C

Fu (αk) − Fg (αk) < 3, PROOFOF THEOREM I.2 Mk (11) We first establish the following useful results. Fu (αk + 2(αk)) − Fg (αk + 2(αk)) < 3 Mk Lemma C.1. Let u0, u1, . . . , uN−1 ∈ R be an unordered for all k ≥ k1. Thus, we have sequence of N elements. We decreasingly arrange this se- quence so that uρ(0) ≥ uρ(1) ≥ · · · ≥ uρ(N−1). Then for F (α +  (α )) − F (α ) uMk k 2 k uMk k any r ∈ {1, 2 ...,N − 1}, we have =F (α +  (α )) − F (α +  (α )) + F (α +  (α )) uMk k 2 k g k 2 k g k 2 k max max uρ(l) − uρ(l+r0) − F (α ) + F (α ) − F (α ) 1≤r0≤r l∈[N−r0−1] g k g k uMk k ≤ max max |ul − ul+r0 | . ≥Fg(αk + 2(αk)) − Fg(αk) − Fg(αk) − Fu (αk) 1≤r0≤r l∈[N−r0−1] Mk

− Fu (αk + 2(αk)) − Fg(αk + 2(αk)) Mk Proof (of Lemma C.1). The proof is straightforward for the ≥33 − 3 − 3 = 3 case r = 1. If the sequence is constant, then k ≥ k for all 1, where the last line follows from (10) and (11). max uρ(l) − uρ(l+1) = max |ul − ul+1| = 0. Noting that the above equation is equivalent to l∈[N−2] l∈[N−2] 1 Suppose the sequence is not constant, i.e., there exist at least # {l ∈ [M ], α < u ≤ α +  (α )} ≥  , k k Mk,l k 2 k 3 l1, l2 ∈ [N] so that ul 6= ul . Let Mk 1 2 0 we have l = arg max uρ(l) − uρ(l+1) l∈[N−2] 1 # {l ∈ [M ], u < u ≤ α +  (α )} k Mk,lk Mk,l k 2 k u − u Mk denote any point at which ρ(l) ρ(l+1) achieves its maxi- 1 mum. Search the sequence {ul}l∈[N] to find ul00 that is smaller ≥ # {l ∈ [M ], α < u ≤ α +  (α )} ≥  . 00 0 k k Mk,l k 2 k 3 than u 0 and its index l is closest to ρ(l ). Thus Mk ρ(l )

Thus, we obtain max {|ul00 − ul00+1| , |ul00 − ul00−1|} ≥ max uρ(l) −uρ(l+1). l∈[N−2] 0 ≤ u − u ≤ α +  (α ) − u ≤  , Mk,lk−d3Mke Mk,lk k 2 k Mk,lk 1 Suppose r ≥ 2. Similarly, the proof for a constant sequence

7 is obvious. Suppose the sequence is not constant. Let If Fg is continuous at uMk,lk , then we can pick αk = uMk,lk . Otherwise, since Fg is continuous almost everywhere, there always exists an α that is 0 0 k {l , r } = arg max max uρ(l) − uρ(l+r00). 00 00 close to uMk,lk and such that Fg is continuous at αk. 1≤r ≤r l∈[N−r −1] 14

If there are several pairs {l0, r0} have the same values, we choose the one that r0 has the smallest value. If r0 = 1, the Lemma C.3. [24, Sturmian separation theorem] Let A be proof is similar to the case r = 1. We suppose r0 ≥ 2. Thus N 0 an N×N and let [AN ]N−1 be the (N − 1)× there exist at least r elements that are smaller than uρ(l0) and 0 (N − 1) matrix obtained by deleting the last column and the only r −1 elements that are greater than uρ(l0+r0) and smaller last row of AN . Also let λ0(AN ) ≥ · · · ≥ λN−1(AN ) and than uρ(l0). Search the sequence {ul}l∈[N] to find ul00 that is 00 0 0 λ0([AN ]N−1) ≥ · · · ≥ λN−2([AN ]N−1) respectively denote smaller than uρ(l0) and its index l is the r -th closest to ρ(l ). the descending eigenvalues of A and [A ] . Then Without loss of generality, suppose l00 < ρ(l0). N N N−1 If ul00 ≤ uρ(l0+r0), we have λl(AN ) ≥ λl([AN ]N−1) ≥ λl+1(AN )

max |ul00 − ul00+r00 | ≥ uρ(l0) − uρ(l0+r0) for all 0 ≤ l ≤ N − 2. 1≤r00≤r0

00 00 0 The above Sturmian separation theorem forms the founda- since there is at least one element in {ul, l + 1 ≤ l ≤ l + r } tion of the following analysis. We note that Zizler et.al. [31] that is greater than or equal to uρ(l0). utilized the Sturmian separation theorem to prove a refinement 00 000 0 00 If ul00 > uρ(l0+r0), there exists l ≤ l ≤ 2ρ(l ) − l such of Szego’s˝ asymptotic formula in terms of the number of that ul000 is smaller than or equal to uρ(l0+r0) (otherwise, there eigenvalues inside a given interval. 0 are r elements that are greater than uρ(l0+r0) and smaller than uρ(l0)). Also near ul000 , there must exist at least one element Now we are well equipped to prove Theorem I.2. In what that is not smaller than uρ(l0). Then follows, we consider N > 2r. Note that in this case CeN is equivalent to CbN and the eigenvalues of CeN are the DFT r max max {|ul000 − ul000+r00 | , |ul000 − ul000−r00 |} P j2πfk 1≤r00≤r0 samples of SN−1(f) = eh(f) = k=−r h[k]e . Recall that [HN ]N−r is the (N − r) × (N − r) matrix obtained by ≥uρ(l0) − uρ(l0+r0). deleting the last r columns and the last r rows of HN . Similar This completes the proof. notation holds for [CeN ]N−r. Note that [H]N−r is exactly the same as [CeN ]N−r as they have the same elements when N > 2r. Thus [H]N−r and In words, the largest error between the contiguous elements [CeN ]N−r have the same eigenvalues. Let λl([CeN ]N−r) be of a sequence is not magnified when the sequence is rearranged permuted such that in decreasing (or increasing) order. The following result establishes that the largest error be- λρ(0)([CeN ]N−r) ≥ · · · ≥ λρ(N−r−1)([CeN ]N−r). tween two sequences is not magnified when both of the sequences are rearranged in decreasing (or increasing) order. We first consider the simple case when r = 1. It follows from the Sturmian separation theorem that Lemma C.2. Let u0, . . . , uN−1 ∈ R and v0, . . . , vN−1 ∈ R be two unordered sequences of N elements. We decreasingly λl(HN ) ≥ λl([HN ]N−1) ≥ λl+1(HN ), arrange these sequences so that uρ(0) ≥ uρ(1) ≥ · · · uρ(N−1) λρ(l)(CeN ) ≥ λρ(l)([CeN ]N−1) ≥ λρ(l+1)(CeN ) and vρ(0) ≥ vρ(1) ≥ · · · vρ(N−1). Then for all 0 ≤ l ≤ N − 2. This implies the following relationship max uρ(l) − vρ(l) ≤ max |ul − vl| . l∈[N−1] l∈[N−1] between λl(HN ) and λρ(l)(CeN )

λl(HN ) ≤λl−1([HN ]N−1) = λρ(l−1)([CeN ]N−1) Proof (of Lemma C.2). Let ≤λρ(l−1)(CeN ), ∀ l = 1, 2,...,N − 1, 0 (12) r = arg max uρ(r) − vρ(r) λl(HN ) ≥λl([HN ]N−1) = λρ(l)([CeN ]N−1) r∈[N−1] ≥λρ(l+1)(CeN ), ∀ l = 0, 1,...,N − 2 denote any point at which uρ(r) − vρ(r) achieves its max- 0 which give imum and let l be the index of uρ(r0). Without loss of generality, we suppose uρ(r0) ≥ vρ(r0). If vl0 ≤ vρ(r0), we have   λ (H ) − λ C ul0 − vl0 ≥ uρ(r0) − vρ(r0). Otherwise suppose vl0 > vρ(r0), l N ρ(l) eN which implies r0 ≥ 1. Since there are only r0 elements n   0 = max λl (HN ) − λρ(l) CeN , in {ul}l∈[N] that are greater than uρ(r0) and r elements in 00   o {vl} that are greater than vρ(r0), there must exist l such l∈[N] λρ(l) CeN − λl (HN ) that u 00 ≥ u 0 and v 00 ≤ v 0 . Hence l ρ(r ) l ρ(r ) n     ≤ max λρ(l−1) CeN − λρ(l) CeN , u 00 − v 00 ≥ u 0 − v 0 . l l ρ(r ) ρ(r )    o λρ(l) CeN − λρ(l+1) CeN 15

1 for all 1 ≤ l ≤ N − 2. Applying Lemma C.1 with r = 1, we uniform samples of eh(f) with grid size N . Similarly, we have obtain  

λN−1 (HN ) − λρ(N−1) CeN   n   max λl (HN ) − λρ(l) CeN = max λρ(N−1) CN − λN−1 (HN ) , 1≤l≤N−2 e      o ≤ max λρ(l) CeN − λρ(l+1) CeN λN−1 (HN ) − λρ(N−1) CeN 0≤l≤N−2        ≤ max λl CeN − λl+1 CeN . ≤ max λρ(N−1) CeN − min eh(f), 0≤l≤N−2 f∈[0,1]    o λρ(N−2) CeN − λρ(N−1) CeN Note that eh(f) is Lipschitz continuous since it is continuously 1 ≤K . differentiable. There exists a Lipschitz constant K such that, N for all f and f in [0, 1], 1 2 Along with (13), we conclude   1 max λl (HN ) − λρ(l) CeN ≤ K . eh(f1) − eh(f2) ≤ K |f1 − f2| . 0≤l≤N−1 N Now we consider the case r > 1. Repeatedly applying the Sturmian separation theorem r times yields From the fact that the eigenvalues of CeN are the DFT samples   l λ (H ) ≥ λ ([H ] ) ≥ λ (H ), of eh(f), i.e., λl CeN = eh( N ), it follows that l N l N N−r l+r N λρ(l)(CeN ) ≥ λρ(l)([CeN ]N−r) ≥ λρ(l+r)(CeN )   for all 0 ≤ l ≤ N − r − 1. Noting that [HN ]N−r is the same max λl (HN ) − λρ(l) CeN 1≤l≤N−2 as [CeN ]N−r, we have     ≤ max λl CeN − λl+1 CeN 0≤l≤N−2 (13) λl(HN ) ≤λl−r([HN ]N−r) = λρ(l−r)([CeN ]N−r)

l l + 1 1 ≤λρ(l−r)(CeN ), ∀ l = r, r + 1,...,N − 1, ≤ max eh( ) − eh( ) ≤ K . 0≤l≤N−2 N N N λl(HN ) ≥λl([HN ]N−r) = λρ(l)([CeN ]N−r)

≥λρ(l+r)(CeN ), ∀ l = 0, 1,...,N − r − 1 Utilizing the fact that λ (H ) ≤ max h(f) and 0 N f∈[0,1] e which give λ (H ) ≥ min h(f) (see Lemma III.2) and apply- N−1 N f∈[0,1] e ing (12) with l = 0 which gives   max λl (HN ) − λρ(l) CeN r≤l≤N−r−1 n     = max max λl (HN ) − λρ(l) CeN , λ0 (HN ) ≥ λρ(1) CeN , r≤l≤N−r−1   o λρ(l) CeN − λl (HN ) we have n     ≤ max max λρ(l−r) CeN − λρ(l) CeN , r≤l≤N−r−1      o λρ(l) CeN − λρ(l+r) CeN λ0 (HN ) − λρ(0) CeN n       ≤ max λρ(l) CeN − λρ(l+r) CeN = max λ0 (HN ) − λρ(0) CeN , 0≤l≤N−r−1   o     0 λ CN − λ0 (HN ) ≤ max max λl CeN − λl+r CeN ρ(0) e 1≤r0≤r 0≤l≤N−r0−1  r   ≤K , ≤ max max eh(f) − λρ(0) CeN , f∈[0,1] N    o where the third inequality follows from Lemma C.1. Since λρ(0) CeN − λρ(1) CeN 1 λr−1(HN ) ≤ · · · ≤ λ0(HN ) ≤ max eh(f), ≤K , f∈[0,1] N   0 0 we bound λr (HN ) − λρ(r0) CeN , r ≤ r − 1 by consid-     where the second inequality follows because λl CeN are ering the following two cases: if λρ(r0) CeN ≤ λr0 (HN ), 16 we have as N → ∞. Finally,   λr0 (HN ) − λρ(r0) CN max λρ(l)(CN ) − λl(HN ) e 0≤l≤N−1   r ≤ max eh(f) − λρ(r−1) CeN ≤ K ; = max λρ(l)(CN ) − λρ(l)(CeN ) + λρ(l)(CeN ) − λl(HN ) f∈[0,1] N 0≤l≤N−1   ≤ max λρ(l)(CN ) − λρ(l)(CeN ) if λρ(r0) CeN > λr0 (HN ), we have 0≤l≤N−1

  + max λρ(l)(CeN ) − λl(HN ) λρ(r0) CeN − λr0 (HN ) 0≤l≤N−1     ≤ max λl(CN ) − λl(CeN ) ≤λρ(r0) CeN − λρ(r0+r) CeN 0≤l≤N−1

    ≤ max max λl CeN − λl+r00 CeN + max λρ(l)(CeN ) − λl(HN ) 1≤r00≤r 0≤l≤N−r00−1 0≤l≤N−1 r 1 ≤K =O( ) N N where the second line follows because λr0 (HN ) ≥ as N → ∞, where the second inequality follows from   Lemma C.2. λρ(r0+r) CeN and the third line follows from Lemma C.1. Thus we have   r λr0 (HN ) − λρ(r0) CeN ≤ K N APPENDIX D PROOFOF THEOREM III.3 for all 0 ≤ r0 ≤ r − 1. Similarly, Theorem D.1. (Riemann-Lebesgue theorem [35, Theorem   ∞ 0 7.48]) The function g(x) ∈ L ([a, b]) is Riemann integrable λN−r (HN ) − λρ(N−r0) CeN  over [a, b] if and only if it is continuous almost everywhere in   [a, b]. ≤ max λρ(N−r0) CeN − min eh(f), f∈[0,1]    o Despite the fact that Fg(α) is right continuous and non- λρ(N−r0−r) CeN − λρ(N−r0) CeN decreasing everywhere, some stronger results about Fg(α) can n r r o r be obtained at some point α since g(x) is Riemann integrable. ≤ max K ,K = K , N N N Lemma D.2. Suppose g(x):[c, d] → [a, b] is Riemann for all 1 ≤ r0 ≤ r. Therefore, integrable and let Fg(α) be defined as in (9). Then Fg(α) is strictly increasing at α if α ∈ int (ess R(g)), i.e., there   1 exists  > 0 such that, for every pair (α1, α2) such that max λl (HN ) − λρ(l) CeN ≤ Kr . 0≤l≤N−1 N α −  < α1 < α < α2 < α + , Fg(α1) < Fg(α) < Fg(α2). for all N > 2r. Note that Sr+1(f) = Sr+2(f) = ··· = SN−1(f) which Proof (of Lemma D.2). Since g(x):[c, d] → [a, b] is Riemann gives integrable and α ∈ int (ess R(g)), there exists  such that (α − , α + ) ⊂ ess R(g). Let α be an arbitrary value such PN−1 Pr 1 Sn(f) Sn(f) N − r − 1 α− < α < α α0 = α+α1 ∈ ess R(g) σ (f) = n=0 = n=0 + S (f). that 1 and let 1 2 . It follows N N N N r+1 from the definition of essential range that Thus   0 α − α1 Pr µ x ∈ [c, d]: |g(x) − α1| < > 0. n=0 Sn(f) r + 1 2 |σN (f) − SN−1(f)| = − Sr+1(f) N N Thus, we obtain r X 1 1 = S (f) − (r + 1)S (f) F (α) − F (α ) = µ {x ∈ [c, d]: α < g(x) ≤ α} n r+1 N g g 1 1 n=0 d − c 1 1 =O( ) ≥ µ {x ∈ [c, d]: α1 < g(x) < α} > 0. N d − c uniformly on [0, 1] as N → ∞. Therefore, Similarly, we have Fg(α) < Fg(α2) for α < α2 < α + .

max λl(CN ) − λl(CeN ) 0≤l≤N−1 We are now ready to prove the main part using the same approach that was used to prove Theorem III.1. l l 1 = max σN ( ) − SN−1( ) = O( ) First, we show that (8) implies (7). This part is the same as 0≤l≤N−1 N N N those in Appendix B. 17

Now let us show that (7) implies (8). We prove the statement which implies (7) ⇒ (8) by contradiction. Suppose (8) is not true, i.e., there ∞ vMk,lk − uMk,lk−d3Mke exists an increasing sequence {Mk0 }k0=1 and 1 > 0 such that =v − u + u − u Mk,lk Mk,lk Mk,lk Mk,lk−d3Mke max uM 0 ,l − vM 0 ,l ≥ 21 k k ≥21 − 2 ≥ 21 − 1 ≥ 1. l∈[Mk0 ]

0 Now taking ϑ(t) = t, we obtain 0 for all k ≥ 1. Let lk = arg maxl∈[M 0 ] uM 0 ,l − vM 0 ,l k k k denote any point at which u − v achieves its maxi- Mk−1 Mk0 ,l Mk0 ,l 1 X |ϑ(u ) − ϑ(v )| mum, which implies uM 0 ,l 0 − vM 0 ,l 0 ≥ 21. Without loss Mk,l Mk,l k k k k Mk ∞ l=0 of generality, we suppose {Mk, lk}k=1 is a subsequence of ∞ l {M 0 , l 0 } 0 such that u ≤ v , i.e., u ≤ k k k k =1 Mk,lk Mk,lk Mk,lk 1 X vMk,lk − 21. We also suppose Fg is continuous at uMk,lk be- ≥ |ϑ(uMk,l) − ϑ(vMk,l)| Mk cause otherwise, one can always pick a u ∈ int (ess R(g)) l=l −d M e bMk,lk k 3 k that is close enough to uM ,l and such that Fg is continuous l k k 1 Xk at uMk,lk since Fg is continuous almost everywhere. b ≥ |ϑ(uMk,l) − ϑ(vMk,lk )| Mk By assumption, uMk,lk ∈ int (ess R(g)). Noting that Fg l=l −d M e is right continuous everywhere and strictly increasing at k 3 k ≥31 > 0 all uMk,lk (which is shown in Lemma D.2), there exist

2(uMk,lk ) > 0 (which depends on uMk,lk and for notational for all k ≥ k1. This contradicts Lemma B.1. simplicity, we rewrite as 2) and 3 > 0 (which is independent 1 of uMk,lk ) such that 2 ≤ 2 , Fg is continuous at uMk,lk + 2, and APPENDIX E ROOFOF HEOREM Fg(uMk,lk + 2) ≥ Fg(uMk,lk ) + 33. (14) P T I.4 sin(πNf) Lemma E.1. Let DN (f) := sin(πf) denote the Dirichlet Lemma B.2 indicates that 1 kernel. Fix 0 < W < 2 . We have lim Fu (α) = Fg(α) Mk Z 1 Mk→∞ 2 |DN (f)| df = N, ∀ N ∈ N, for every point α at which Fg is continuous. Thus there exists 0 Z 1−W k0 ∈ N such that 2 |DN (f)| df = O(1), when N → ∞. W Fu (uM ,l ) − Fg (uM ,l ) ≤3, Mk k k k k (15) Fu (uM ,l + 2) − Fg (uM ,l + 2) ≤3 Mk k k k k sin(πNf) Proof (of Lemma E.1). Noting that DN (f) = sin(πf) = j2πfN N−1 for all k ≥ k0. Thus, we have e P j2πfn ejπf n=0 e , we have F (u +  ) − F (u ) 2 uMk Mk,lk 2 uMk Mk,lk N−1 N−1 ! N−1 ! 2 X j2πfn X j2πfn X −j2πfm   |DN (f)| = e = e e = FuM (uMk,lk + 2) − Fg(uMk,lk + 2) k n=0 n=0 m=0 + (F (u +  ) − F (u )) N−1 N−1 g Mk,lk 2 g Mk,lk X X   = ej2πf(n−m). + Fg(uM ,l ) − Fu (uM ,l ) k k Mk k k n=0 m=0 ≥F (u +  ) − F (u ) g Mk,lk 2 g Mk,lk It follows that

N−1 N−1 − Fu (uM ,l + 2) − Fg(uM ,l + 2) Z 1 Z 1 Mk k k k k 2 X X j2πf(n−m) |DN (f)| df = e df − Fg(uM ,l ) − Fu (uM ,l ) 0 0 n=0 m=0 k k Mk k k N−1 N−1 Z 1 ≥33 − 3 − 3 = 3 X X = ej2πf(n−m)df = N. 0 for all k ≥ k0, where the last line follows from (14) and (15). n=0 m=0 Note that the above equation is equivalent to 1 Fix 0 < W < 2 . For any f ∈ [W, 1−W ], |DN (f)| is bounded 1 1 above by sin(πW ) . Therefore, # {l ∈ [Mk], uMk,lk < uMk,l ≤ uMk,lk + 2} ≥ 3. Mk Z 1−W Z 1−W 2 1 1 Then |DN (f)| df ≤ 2 df ≤ 2 . W W sin (πW ) sin (πW )

0 ≤ uMk,lk−d3Mke − uMk,lk ≤ uMk,lk + 2 − uMk,lk = 2, 18

Combining (16) and (18) yields Since eh is bounded and Riemann integrable over [0, 1],  1 1 it follows from the Riemann-Lebesgue theorem that eh is λl0 CN =hHN √ el0/N , √ el0/N i continuous almost everywhere in [0, 1]. Thus we can select N N f ∈ [0, 1] and a positive number W such that   0      ≥ ess sup eh − 1 −  − 4 2  4 ess sup eh eh(f) − ess sup eh ≤ 4   2  ≥ ess sup eh − − + − 4 4 16 ess sup h 2 N e holds almost everywhere for |f − f0| ≤ W . For any v ∈ C , we have ≥ ess sup eh − 

 1  hHN v, vi for all N ≥ max N1, W . Noting that λl0 CN ≤ Z 1  2 λρ(0) CN ≤ ess sup eh, we have = |ve(f)| eh(f)df 0 f +W Z 0 Z λρ(0) (CN ) − ess sup eh ≤  = |v(f)|2 h(f)df + |v(f)|2 h(f)df e e f∈[0,1] e e f0−W f∈ /[f0−W,f0+W ] for all N ≥ N0. Since  is arbitrary, we conclude Z f0+W    2 ≥ ess sup eh − |v(f)| dfe 4 e  f0−W lim λρ(0) CN = ess sup eh. Z N→∞   2 − max ess inf eh , ess sup eh · |v(f)| df.e f∈[0,1] e With a similar argument, we have f∈ /[f0−W,f0+W ] (16)  lim λρ(N−1) CN = ess inf eh. N→∞ The DTFT of e is l/N  Noting that λρ(0) CN ≤ λ0 (HN ) ≤ ess sup eh and   ess inf h ≤ λ (H ) < λ (C ) (see Lemma III.2), −jπN f− √l e N−1 N ρ(N−1) N e N l we obtain el/N (f) = DN (f − ). (17) e −jπ(f− l ) N e N  lim λρ(0) CN = lim λ0 (HN ) = ess sup eh N→∞ N→∞ 1 0  Fix eh,  and W . If N ≥ W , there always exists l such that lim λρ(N−1) CN = lim λN−1 (HN ) = ess inf eh. 0 N→∞ N→∞ l − f ≤ W N 0 2 . It follows from Lemma E.1 that

l0 W 2 W Z N + 2 Z 1− 2 1 1 2 √ el0/N (f) df = 1 − |DN (f)| df l0 W N W N − 2 N 2 = 1 − o(1) APPENDIX F PROOFOF LEMMA IV.2 l0 W l0 W as N → ∞. Note that [ N − 2 , N + 2 ] ⊂ [f0 − W, f0 + W ].  1 The following result indicates that the main lobe of the Thus there exists N1 ∈ N such that for all N ≥ max N1, W Dirichlet kernel contains most of its energy. sin(πNf) Lemma F.1. Let DN (f) = be the Dirichlet kernel. Z f0+W 2 sin(πf) 1  Then √ el0/N (f) dfe ≥ 1 − , f0−W N 4 ess sup eh 1 Z N Z 2 |D (f)|2 df ≥ 0.45N. 1 N √ el0/N (f) dfe (18) 0 f∈[0,1] N e f∈ /[f0−W,f0+W ]  ≤  . 2 · max ess inf h , ess sup h |sin(πNf)| e e Proof (of Lemma F.1 ). Noting that |DN (f)| = |sin(πf)| ≥ 19

|sin(πNf)| l 1 3 |πf| , we have Now for any l ∈ [N], N ∈ [0, 4 ) ∪ ( 4 , 1], the main lobe of l 1 3 DN (f − N ) is inside the interval [0, 4 ] ∪ [ 4 , 1]. Thus 1 1 2 2 Z N Z N Z π 2 sin(πNf) N sin(f) |DN (f)| df ≥ df = df λl(CN ) πf π f 0 0 0 1 2 2 Z 4 Z 1 Z π 1−cos(2f) 1 l 1 l N 2 = DN (f − ) df + DN (f − ) df = df N 0 N N 3 N π f 2 4 0 1 ∞ Z N N Z π X (−1)k+1(2f)2k 2 2 = df ≥ |DN (f)| df ≥ 0.9. 2π (2k)!f 2 N 0 0 k=1 ∞ k+1 2k−1 l 1 3 N X (−1) (2π) Similarly, for any l ∈ [N], N ∈ ( 4 , 4 ), we have = π (2k)!(2k − 1) k=1 λl(CN ) 8 1 2 2 (k+1) 2k−1 Z 4 Z 1 N X (−1) (2π) 1 l 1 l ≥ = DN (f − ) df + DN (f − ) df π (2k)! (2k − 1) N N N 3 N k=1 0 4 1 ≥ 0.45N, 2 Z N ≤1 − |D (f)|2 df ≤ 0.1. N N where the third line follows from the common Taylor series 0 2k P∞ k (2f) cos(2f) = k=0(−1) (2k)! , and the fifth line holds from The proof is completed by noting that 0 ≤ λl(CN ) ≤ 1 for the following inequality all l ∈ [N].

∞ X (−1)(k+1)(2π)2k−1 (2k)! (2k − 1) k=9 ∞ REFERENCES X (2π)4n−3 (2π)4n−1 = − ≥ 0 (4n − 2)! (4n − 3) (4n)! (4n − 1) [1] U. Grenander and G. Szego,˝ Toeplitz Forms and Their Applications, n=5 vol. 321. Univ of California Press, 1958. [2] R. Gray, “On the asymptotic eigenvalue distribution of Toeplitz matri- since ces,” IEEE Trans. Inf. Theory, vol. 18, no. 6, pp. 725–730, 1972. (2π)4n−1 (2π)2 (2π)4n−3 [3] J. Pearl, “On coding and filtering stationary signals by discrete Fourier ≤ transforms,” IEEE Trans. Inf. Theory, vol. 19, no. 2, pp. 229–232, 1973. (4n)! (4n − 1) (4n)(4n − 1) (4n − 2)! (4n − 3) [4] J. Makhoul, “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, (2π)4n−3 no. 4, pp. 561–580, 1975. ≤ (4n − 2)! (4n − 3) [5] T. Kailath, A. Vieira, and M. Morf, “Inverses of Toeplitz operators, innovations, and orthogonal ,” SIAM Rev., vol. 20, no. 1, pp. 106–119, 1978. for all n ≥ 2. [6] P. Deift, A. Its, and I. Krasovsky, “Eigenvalues of Toeplitz matrices in Suppose N is a multiple of 4. Note that the bulk of the spectrum,” Bull. Inst. Math., Acad. Sin. (N.S.), pp. 437– 461, 2012. Z 1 2 [7] R. M. Gray, “Toeplitz and circulant matrices: A review,” Commun. Inf. 1 Theory, vol. 2, no. 3, pp. 155–239, 2005. λl(CN ) = √ el/N eh(f)df 0 N [8] F. Avram, “On bilinear forms in gaussian random variables and Toeplitz 1 2 2 matrices,” Probab. Theory Related Fields, vol. 79, no. 1, pp. 37–45, Z 4 Z 1 1 1 1988. = √ el/N (f) df + √ el/N (f) df, 3 0 N 4 N [9] S. V. Parter, “On the distribution of the singular values of Toeplitz matrices,” Appl., vol. 80, pp. 115–130, 1986. where e (f) is defined in (17). If l = N/4, √1 e (f) = [10] E. E. Tyrtyshnikov, “A unifying approach to some old and new theorems el/N N el/N on distribution and clustering,” Linear Algebra Appl., vol. 232, pp. 1– 1 1 √ D (f − ) . Thus 43, 1996. N N 4 [11] N. L. Zamarashkin and E. E. Tyrtyshnikov, “Distribution of eigenvalues λ (C ) and singular values of Toeplitz matrices under weakened conditions N/4 N on the generating function,” Sbornik: Mathematics, vol. 188, no. 8, 1 2 2 Z 4 Z 1 pp. 1191–1201, 1997. 1 1 1 1 = DN (f − ) df + DN (f − ) df [12] D. Sakrison, “An extension of the theorem of Kac, Murdock and Szego˝ N 4 N 3 4 0 4 to N dimensions,” IEEE Trans. Inf. Theory, vol. 15, no. 5, pp. 608–610, Z 1/2 Z 1 1969. 1 2 1 1 2 1 = |DN (f)| df = |DN (f)| df = . [13] H. Gazzah, P. A. Regalia, and J.-P. Delmas, “Asymptotic eigenvalue N 0 N 2 0 2 distribution of block Toeplitz matrices and application to blind SIMO channel identification,” IEEE Trans. Inf. Theory, vol. 47, no. 3, 1 Similarly, we have λ3N/4(CN ) = 2 . pp. 1243–1251, 2001. 20

[14] J. Gutierrez-Guti´ errez´ and P. M. Crespo, “Asymptotically equivalent sequences of matrices and hermitian block Toeplitz matrices with continuous symbols: Applications to MIMO systems,” IEEE Trans. Inf. Theory, vol. 54, no. 12, pp. 5671–5680, 2008. [15] J. Bogoya, A. Bttcher, S. Grudsky, and E. Maximenko, “Maximum norm versions of the Szego˝ and Avram Parter theorems for Toeplitz matrices,” J. Approx. Theory, vol. 196, no. 0, pp. 79 – 100, 2015. [16] A. Dembo, “Bounds on the extreme eigenvalues of positive-definite Toeplitz matrices,” IEEE Trans. Inf. Theory, vol. 34, no. 2, pp. 352– 355, 1988. [17] T. Laudadio, N. Mastronardi, and M. Van Barel, “Computing a lower bound of the smallest eigenvalue of a symmetric positive-definite Toeplitz matrix,” IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4726– 4731, 2008. [18] J. Mitola III and G. Q. Maguire Jr, “Cognitive radio: Making software radios more personal,” IEEE Personal Commun., vol. 6, no. 4, pp. 13– 18, 1999. [19] S. Haykin, “Cognitive radio: Brain-empowered wireless communica- tions,” IEEE J. Select. Areas Commun., vol. 23, no. 2, pp. 201–220, 2005. [20] Y. Zeng and Y.-C. Liang, “Eigenvalue-based spectrum sensing algo- rithms for cognitive radio,” IEEE Trans Commun., vol. 57, no. 6, pp. 1784–1793, 2009. [21] G. Strang, “A proposal for Toeplitz matrix calculations,” Stud. Appl. Math., vol. 74, no. 2, pp. 171–176, 1986. [22] T. F. Chan, “An optimal circulant preconditioner for Toeplitz systems,” SIAM J. Sci. Statis. Comput., vol. 9, no. 4, pp. 766–771, 1988. [23] T. W. Korner,¨ Fourier analysis. Cambridge university press, 1989. [24] R. A. Horn and C. R. Johnson, eds., Matrix Analysis. New York, NY, USA: Cambridge University Press, 1986. [25] F. W. Trench, “Numerical solution of the eigenvalue problem for hermitian toeplitz matrices,” SIAM J. Matrix Anal. Appl., vol. 10, no. 2, pp. 135–146, 1989. [26] F. T. Luk and S. Qiao, “A fast eigenvalue algorithm for hankel matrices,” Linear Algebra Appl., vol. 316, no. 1-3, pp. 171–182, 2000. [27] R. H. Chan, “Circulant preconditioners for hermitian Toeplitz systems,” SIAM J. Matrix Anal. Appl., vol. 10, no. 4, pp. 542–550, 1989. [28] R. H. Chan and G. Strang, “Toeplitz equations by conjugate gradients with circulant preconditioner,” SIAM J. Sci. Statis. Comput., vol. 10, no. 1, pp. 104–119, 1989. [29] R. H. Chan, X.-Q. Jin, and M.-C. Yeung, “The spectra of super- optimal circulant preconditioned Toeplitz systems,” SIAM J. Numer. Anal., vol. 28, no. 3, pp. 871–879, 1991. [30] W. F. Trench, “An elementary view of Weyl’s theory of equal distribu- tion,” Amer. Math. Monthly, vol. 119, no. 10, pp. 852–861, 2012. [31] P. Zizler, R. A. Zuidwijk, K. F. Taylor, and S. Arimoto, “A finer aspect of eigenvalue distribution of selfadjoint band Toeplitz matrices,” SIAM J. Matrix Anal. Appl., vol. 24, no. 1, pp. 59–67, 2002. [32] D. Slepian, “Prolate Spheroidal Wave Functions, Fourier analysis, and uncertainty. V- The discrete case,” Bell Syst. Tech. J., vol. 57, no. 5, pp. 1371–1430, 1978. [33] Z. Zhu and M. B. Wakin, “Approximating sampled sinusoids and multiband signals using multiband modulated DPSS dictionaries,” to appear in J. Fourier Anal. Appl., 2016. [34] A. W. Van der Vaart, Asymptotic , vol. 3. Cambridge university press, 2000. [35] T. M. Apostol, Mathematical Analysis (2nd ed.). Addison Wesley Publishing Company, 1974.