Signal Processing 84 (2004) 1255–1266 www.elsevier.com/locate/sigpro
Kernel smoothing ofperiodograms under Kullback–Leibler discrepancy Jan Hannig, Thomas C.M. Lee∗
Department of Statistics, Colorado State University, Fort Collins, CO 80523-1877, USA
Received 5 June 2003; received in revised form 31 March 2004
Abstract
Kernel smoothing on the periodogram is a popular nonparametric method for spectral density estimation. Most important in the implementation ofthis method is the choice ofthe bandwidth, or span, forsmoothing. One idealized way ofchoosing the bandwidth is to choose it as the one that minimizes the Kullback–Leibler (KL) discrepancy between the smoothed estimate and the true spectrum. However, this method fails in practice, as the KL discrepancy is an unknown quantity. This paper introduces an estimator for this discrepancy, so that the bandwidth that minimizes the unknown discrepancy can be empirically approximated via the minimization ofit. It is shown that this discrepancy estimator is consistent. Numerical results also suggest that this empirical choice of bandwidth often outperforms some other commonly used bandwidth choices. The same idea is also applied to choose the bandwidth for log-periodogram smoothing. ? 2004 Elsevier B.V. All rights reserved.
Keywords: Bandwidth selection; Kullback–Leibler discrepancy; Periodogram and log-periodogram smoothing; Relative entropy; Spectral density estimation
1. Introduction One important component ofthe kernel smooth- ing approach is the choice ofthe bandwidth for Smoothing the periodogram or the log-periodogram smoothing. This paper proposes a bandwidth selec- is a popular method for performing nonparametric tion method that attempts to locate the bandwidth spectral density estimation. Various approaches have that minimizes the unknown Kullback–Leibler (KL) been proposed. These include spline smoothing (e.g., discrepancy between the smoothed estimate and the [9,13,19]), kernel smoothing (e.g., [4,10,11,18]) and true spectrum. The idea behind the proposed method wavelet techniques (e.g., [5,15,20]). The approach that is as follows. An estimator for the unknown KL dis- this paper considers is kernel smoothing. Some ap- crepancy is ÿrst constructed for the class of spectrum pealing features of this approach are that it is sim- estimates (i.e., kernel-smoothed periodograms) that ple to use, easy to understand and straightforward to this paper considers. It is shown that this discrepancy interpret. estimator is consistent under a commonly used model for periodograms. Then the bandwidth that minimizes this discrepancy estimator is chosen as the ÿnal band- ∗ Corresponding author. Tel.: +1-970-491-5269; fax: +1-970- width. As mentioned in [14], the rationale is that the 491-7895. E-mail addresses: [email protected] (J. Hannig), bandwidth that minimizes the discrepancy estimator [email protected] (T.C.M. Lee). should also approximately minimize the unknown
0165-1684/$ - see front matter ? 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2004.04.007 1256 J. Hannig, T.C.M. Lee / Signal Processing 84 (2004) 1255–1266 discrepancy. Also, Chow [3, pp. 293–294] argues that following kernel estimator for fj: the KL discrepancy is a better distance measure than 2n−1 2n−1 the more commonly used squared distance measure. fˆ = K (! − ! )I K (! − ! ); This point will be elaborated more later. j h m j m h l j m=−n The rest ofthis paper is organized as follows.Sec- l=−n tion 2 provides some background material on the pe- j =0;:::;n− 1: (2) riodogram and the KL discrepancy. The proposed KL 1 · discrepancy estimator, together with some related pre- In the above Kh(·)= h K( h ), where the kernel func- vious work, are presented in Section 3. Section 4 tion K is (usually taken as) a symmetric probability brieEy discusses the smoothing oflog-periodogram. density function and the bandwidth h is a nonnega- Some empirical and theoretical properties ofthe pro- tive smoothing parameter that controls the amount of ˆ posed method are reported in Sections 5 and 6, re- smoothing. Note that fj is a function of h, but, for spectively. Conclusions are oFered in Section 7, while simplicity, this dependence is suppressed from its no- technical details are deferred to the appendix. tation. It is well known that the choice of h is much more crucial than the choice of K (e.g., see [22]). Also, in most other kernel smoothing problems the limits of the two summations in (2) are 0 and n − 1. However, 2. Background since in the present setting boundary eFects can be handled by periodic smoothing, the limits are changed 2.1. Periodogram smoothing from 0 and n − 1to−n and 2n − 1, respectively. ˆ The estimator fj can also be interpreted as a Suppose that {xt} is a real-valued, zero mean sta- weighted average ofthe Ijs. It is because one could tionary process with unknown spectral density f, and write that a ÿnite-sized realization x0;:::;x2n−1 of {xt} is 2 n−1 observed. The goal is to estimate f by using those ˆ fj = Wm−jIm with observed xts. The periodogram is deÿned as m=−n 2 2 n−1 K (! − ! ) 1 W = h m j : (3) I(!)= xt exp (−i!t) ; m−j 2n−1 2 × 2n Kh(!l − !j) t=0 l=−n √ i= −1;!∈ [0; 2 ): Notice that the weights Wms sum to unity.
To simplify notation, write ! =2 j=(2n), f =f(! ) j j j 2.2. Why KL discrepancy? and Ij =I(!j). Since the spectral density f is symmet- ric about ! = , in the sequel the focus will be on f j The KL discrepancy for measuring the distance be- for j =0;:::;n− 1. Also, as f is periodic with period tween two probability density functions (pdfs) g (x) 2 , one has f = f and I = I for j =1;:::;n− 1. 1 −j j −j j and g (x) is deÿned as A frequently adopted model for I is (e.g., see 2 j [4,10,15,17]) g1(x) d(g1;g2)= g1(x) log dx g2(x) Ij = fj j;j=0;:::;n− 1; (1) (e.g., see [2]). Note that d(g1;g2) = d(g2;g1). where the js are independent standard exponential In order to use d(g1;g2) for comparing a true ˆ random variables. Therefore E(Ij)=fj and var(Ij)= spectrum f and an estimate f, one needs to com- 2 pare them frequency by frequency. At frequency ! , fj . Due to its unacceptably large variance, Ij is j seldom used as an estimate of fj. the pdf gf(x) corresponding to f is, under model One possible way for obtaining better estimates (1), an exponential distribution with mean fj. That ˆ for fj is to smooth the Ijs. This paper considers the is, gf(x) = exp(−x=fj)=fj;x¿0. For f, a natural J. Hannig, T.C.M. Lee / Signal Processing 84 (2004) 1255–1266 1257 candidate for the corresponding pdf is exponential given by with mean fˆ . Denote this pdfas g (x), and thus j fˆ n−1 k ˆ ˆ ˆ 1 2k ˆ g ˆ(x) = exp(−x=fj)=fj;x¿0. We choose to measure h;k = fj − Wm−jIj+m f n k I ˆ j=0 m=−k j+m m=−k the distance between f and f at frequency !j with ˆ ˆ ˆ k fˆ exp(−x=fj) exp(−x=fj)=fj + W − log j + − 1 ; d(g ˆ;gf)= log dx m−j f ˆ Ij fj exp(−x=fj)=fj m=−k fˆ fˆ where ≈ 0:577216 is Euler’s constant and k is a = j − log j − 1: f f pre-speciÿed positive integer parameter whose value j j is chosen independent of n. It will be shown in The- One could also use d(g ;g ) instead of d(g ;g ), orem 1 that its eFect is asymptotically negligible. We f fˆ fˆ f ˆ but for the reason given in Appendix A, we choose propose choosing h as the minimizer of h;k . Empirical properties ofour estimator are reported in Section 5, d(gfˆ;gf). Therefore our overall KL discrepancy for measuring the distance between the whole spectrum while its consistency is established in Section 6. f and fˆ is ˆ 3.1. Construction of h;k n−1 ˆ ˆ ˆ 1 fj fj KL(f; f)= − log − 1 : ˆ n fj fj This subsection outlines the construction of h;k . j=0 The goal is to seek an unbiased estimator for (f;ˆ f). Our target is to choose the h that minimizes (f;ˆ f). KL KL First realize that estimating (f;ˆ f) is equivalent Notice that (f;ˆ f) is an unknown quantity, so di- KL KL to estimating the two quantities, log(fˆ =f ) and fˆ =f , rect minimization is not possible. j j j j ˆ A more frequently used discrepancy measure, based for all j. We estimate the former by log(fj=Ij) − ,as E{log(fˆ =I ) − } = E{log(fˆ =f )}. The latter, due to on the L2 norm, is j j j j the presence of1 =fj, poses a bigger challenge. It is 1 n−1 because under model (1) E(1=I )=∞, and so fˆ =I (f; fˆ)= (f − fˆ )2: j j j L2 n j j cannot be used as a building block for estimating j=0 ˆ fj=fj. To overcome this diMculty, we make use ofthe fact that if f is locally smooth, then f :::f are ˆ j−k j+k However, as discussed in [3], KL(f; f) is often a (approximately) identical for small k. This implies ˆ more relevant measure than L (f; f). It is because 2 that the periodogram ordinates Ij−k ;:::;Ij+k are ap- the former considers the distance for the whole distri- proximately independent and identically distributed bution while the latter only considers the mean. Also, as exponential with mean f . Therefore we have since (f;ˆ f) has the same form as the theoretical j KL E{1=( k I )}≈1=(2kf ) and hence 1=f can deviance, it utilizes the asymptotic likelihood ofthe m=−k j+m j j k periodogram. Therefore, bandwidth selection meth- be estimated by 2k= m=−k Ij+m. Thus, there are two ˆ ˆ major advantages ofintroducing k: (1) it overcomes ods that are targeting KL(f; f) (or E{KL(f; f)}) seem to be more desirable than those that are targeting the problem ofinÿnite expectation E(1=Ij)=∞, and (f; fˆ) (or E{ (f; fˆ)}). (2) it can be treated as a device for controlling the L2 L2 bias and variance ofour estimator for1 =fj. ˆ To proceed we decompose fj=fj into two parts: ˆ k ˆ k 3. Estimating the KL discrepancy fj Ij+m (fj − m=−k WmIj+m) = Wm + : fj fj fj This section presents the main contribution m=−k We estimate the ÿrst part by its expectation, i.e., ofthis paper, namely, a consistent estimator for ˆ ˆ k KL(f; f). This estimator is denoted as h;k , and it is m=−k Wm. Since that the numerator ofthe second 1258 J. Hannig, T.C.M. Lee / Signal Processing 84 (2004) 1255–1266