Nonparametric Density Estimation
Nonparametric Density Estimation

Data: $X_1, \dots, X_n \overset{iid}{\sim} P$, where $P$ is a distribution with density $f(x)$.

Aim: Estimation of the density $f(x)$.

Parametric density estimation:
• Fit a parametric model $\{f(x \mid \theta) \mid \theta \in \Theta\}$ to the data → parameter estimate $\hat\theta$
• Estimate $f(x)$ by $f(x \mid \hat\theta)$
• Problem: choice of a suitable model → danger of misfits
• Complex models (e.g. mixtures) are difficult to fit

Nonparametric density estimation:
• Few assumptions (e.g. the density is smooth)
• Exploratory tool

Example: Velocities of galaxies
• Velocities in km/sec of 82 galaxies from 6 well-separated conic sections of an unfilled survey of the Corona Borealis region.
• Multimodality is evidence for voids and superclusters in the far universe.

[Figure: density estimates for the velocity of galaxies (1000 km/s): kernel estimate (h=0.814), kernel estimate (h=0.642), normal mixture model (k=4).]

Histogram

Histogram estimator
For constants $a_0$ and $h$, let $a_k = a_0 + kh$ and
$$H_k = \#\{X_i \mid X_i \in (a_{k-1}, a_k]\}$$
be the number of observations in the $k$th interval $(a_{k-1}, a_k]$. Then
$$\hat f_{hist}(x) = \frac{1}{hn} \sum_{k} H_k \, 1_{(a_{k-1}, a_k]}(x)$$
is the histogram estimator of $f(x)$.

Advantages:
• Easy to compute
Disadvantages:
• Sensitive to the choice of the offset $a_0$
• Nonsmooth estimator

[Figure: five shifted histograms with bin width 0.5 and the averaged histogram, for the duration of eruptions of the Old Faithful geyser (x axis: Duration).]
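As a sketch of the histogram estimator and of the offset sensitivity that the shifted-histogram figure illustrates, the R code below evaluates $\hat f_{hist}$ on a grid for five offsets $a_0$ and averages the results. It assumes that R's built-in faithful data set (column eruptions) stands in for the Old Faithful durations and that the five histograms are shifted by $h/5 = 0.1$; the helper hist_density() is hypothetical, not part of R.

```r
# Histogram estimator with bins (a_{k-1}, a_k], a_k = a0 + k*h:
#   f_hist(x) = H_k / (h n)  for x in (a_{k-1}, a_k],
# where H_k counts the observations in that bin.
# hist_density() is a hypothetical helper written for this sketch.
hist_density <- function(x, data, a0, h) {
  n <- length(data)
  # y lies in (a_{k-1}, a_k] exactly when k = ceiling((y - a0) / h)
  bin_index <- function(y) ceiling((y - a0) / h)
  data_bins <- bin_index(data)
  # H_k for the bin containing each evaluation point
  counts <- vapply(bin_index(x), function(k) sum(data_bins == k), integer(1))
  counts / (h * n)
}

durations <- faithful$eruptions   # Old Faithful eruption durations (minutes)
grid <- seq(0, 7, by = 0.001)

# Five histograms with bin width 0.5; offsets shifted by 0.1 (assumed shifts)
offsets <- seq(0, 0.4, by = 0.1)
shifted <- sapply(offsets, function(a0) hist_density(grid, durations, a0, h = 0.5))
averaged <- rowMeans(shifted)     # the averaged shifted histogram
```

Each column of shifted is one histogram estimate on the grid; plot(grid, averaged, type = "l") would draw the averaged version, which depends much less on any single choice of $a_0$.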
Kernel Density Estimation, May 20, 2004

Centered Histogram

Aim: Estimate the density $f(x)$ at a point $x$.
Idea: Shift the histogram so that it is centered on $x$:
$$\hat f_{rect}(x) = \frac{1}{hn} \, \#\{X_i \mid X_i \in (x - h/2, x + h/2]\}$$

Advantages:
• Exact computation (and plot) of the estimate for all $x$
• Depends on only one parameter: the bin width $h$
Disadvantages:
• Can yield very noisy estimates
• Nonsmooth estimator

[Figure: centered histogram estimate for the Old Faithful eruption durations (x axis: Duration).]

The centered histogram estimator can be rewritten as
$$\hat f_{rect}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{x - X_i}{h}\right)$$
where
$$K(x) = 1_{(-\frac{1}{2}, \frac{1}{2}]}(x)$$
is the indicator function of the interval $(-\frac{1}{2}, \frac{1}{2}]$. The function $K$ is called a kernel or filter.
→ Use different (smooth) kernel functions $K(x)$.

Kernel Estimators

Let $K(x)$ be a function such that
• $K(x) \ge 0$,
• $\int K(x)\,dx = 1$.
Then the kernel density estimator with kernel $K$ and bandwidth $h$ is given by
$$\hat f_K(x) = \frac{1}{hn} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right).$$

Common kernel functions:
• Rectangular kernel
• Triangular kernel
• Normal kernel

[Figure: the rectangular kernel (on [-0.5, 0.5]), the triangular kernel (on [-1, 1]) and the normal kernel (shown on [-3, 3]), each with the corresponding density estimate for the Old Faithful eruption durations.]

Kernel Estimators
Statistical properties

• The expectation of $\hat f_K(x)$ is
$$E\big(\hat f_K(x)\big) = \int \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy = \int K(z)\, f(x - hz)\,dz = f(x) + O(h^2),$$
where the second step substitutes $z = (x - y)/h$, and the $O(h^2)$ term assumes a symmetric kernel with finite second moment and a twice-differentiable $f$.
→ The bias of $\hat f_K(x)$ decreases as $h$ gets smaller.
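To make the kernel estimator concrete, here is a direct R sketch of $\hat f_K$ with the three kernels listed above. Assumptions: R's built-in faithful data stand in for the Old Faithful durations, $h = 0.3$ is chosen only for illustration, the triangular kernel is taken as $\max(1 - |u|, 0)$ on $[-1, 1]$ to match the support in the figure, and kernels and kde() are hypothetical names.

```r
# Kernel density estimator f_K(x) = (1 / (h n)) * sum_i K((x - X_i) / h).
# kernels and kde() are hypothetical helpers written for this sketch.
kernels <- list(
  rectangular = function(u) as.numeric(u > -0.5 & u <= 0.5),  # indicator of (-1/2, 1/2]
  triangular  = function(u) pmax(1 - abs(u), 0),               # triangle on [-1, 1]
  normal      = function(u) dnorm(u)                           # standard normal density
)

kde <- function(x, data, h, K) {
  n <- length(data)
  # For each evaluation point x0, add up the kernel weights K((x0 - X_i) / h)
  vapply(x, function(x0) sum(K((x0 - data) / h)), numeric(1)) / (h * n)
}

durations <- faithful$eruptions   # Old Faithful eruption durations (minutes)
grid <- seq(0, 7, by = 0.001)
h <- 0.3                          # illustrative bandwidth, not from the slides

# One estimate per kernel, all with the same bandwidth h
estimates <- lapply(kernels, function(K) kde(grid, durations, h, K))

# For the normal kernel, density() with bw = h targets the same estimator;
# density() uses a binned FFT approximation, so agreement is only approximate.
d <- density(durations, bw = h, kernel = "gaussian", n = 2048)
max_gap <- max(abs(kde(d$x, durations, h, kernels$normal) - d$y))
```

R's density() scales every kernel so that bw is the kernel's standard deviation. That coincides with $h$ here only for the normal kernel, which is why the comparison uses kernel = "gaussian".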
• The variance of $\hat f_K(x)$ is
$$\operatorname{var}\big(\hat f_K(x)\big) \approx \frac{f(x)}{nh} \int K(z)^2\,dz.$$
→ The variance of $\hat f_K(x)$ vanishes as $nh \to \infty$.

Conclusions:
• Restrictions on the bandwidth: $h \to 0$ and $nh \to \infty$ as $n \to \infty$.
• Theory suggests that $h \propto n^{-1/5}$, but the constant of proportionality depends on the unknown density.
• Trade-off between bias and variance:
  · Undersmoothing: if the bandwidth is too small, the variance becomes large.
  · Oversmoothing: if the bandwidth is too large, the bias becomes large.

Kernel Estimators
Examples: Old Faithful and Galaxies

[Figure: kernel estimates for the Old Faithful eruption durations (h = 0.03, 0.06, 0.12, 0.24, 0.48, 0.96) and for the galaxy velocities (h = 0.2, 0.4, 0.6, 0.8, 1.6, 3.2).]

Kernel Estimates
How to do it in R?

• In R, kernel density estimates can be computed with the command density():
    plot(density(Y, bw = 0.2, kernel = "gaussian"), type = "l")
• By default, h is chosen according to the following rule of thumb:
$$\hat h = 0.9 \min(s, R/1.34)\, n^{-1/5},$$
where s is the sample standard deviation and R is the interquartile range.
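The rule of thumb is simple enough to write out directly. The sketch below, assuming R's built-in faithful data as the sample, computes it by hand and compares it with bw.nrd0(), the function density() uses for its default bw = "nrd0". The helper rule_of_thumb() is hypothetical. Unlike bw.nrd0(), it does not guard against min(s, R/1.34) being zero (bw.nrd0() then falls back to other scale values).

```r
# Rule of thumb h = 0.9 * min(s, R / 1.34) * n^(-1/5), written out by hand.
# rule_of_thumb() is a hypothetical helper, not an R function.
rule_of_thumb <- function(y) {
  n <- length(y)
  s <- sd(y)    # sample standard deviation
  R <- IQR(y)   # interquartile range
  0.9 * min(s, R / 1.34) * n^(-1/5)
}

y <- faithful$eruptions   # Old Faithful eruption durations (minutes)
h_hand <- rule_of_thumb(y)
h_r <- bw.nrd0(y)         # R's implementation of the same rule
print(c(by_hand = h_hand, bw.nrd0 = h_r))
```

The factor $n^{-1/5}$ is the rate suggested by the theory above; the constant $0.9 \min(s, R/1.34)$ is the rule's stand-in for the unknown density-dependent constant.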
• Better methods for selecting h are due to, e.g., Sheather and Jones (1991) and can be invoked with the command bw.SJ():
    h <- bw.SJ(Y)                  # solve-the-equation method
    plot(density(Y, bw = h))
    h <- bw.SJ(Y, method = "dpi")  # direct-plug-in method
    plot(density(Y, bw = h))

[Figure: density estimates for the Old Faithful eruption durations with the rule-of-thumb (h=0.335), direct-plug-in (h=0.165) and solve-the-equation (h=0.14) bandwidths.]
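The three bandwidth choices in the figure can be compared side by side. This sketch assumes R's built-in faithful eruption durations as the data Y; it uses only bw.nrd0(), bw.SJ() and density() from the stats package, and trapezoid() is a hypothetical helper that checks each estimate integrates to roughly 1. The printed bandwidths depend on the R version and data, so no values are claimed here.

```r
y <- faithful$eruptions   # assumed stand-in for Y: Old Faithful durations (minutes)

bws <- c(
  rule_of_thumb      = bw.nrd0(y),                # default used by density()
  direct_plug_in     = bw.SJ(y, method = "dpi"),
  solve_the_equation = bw.SJ(y, method = "ste")   # bw.SJ's default method
)
print(round(bws, 3))

# One Gaussian kernel estimate per selected bandwidth
fits <- lapply(bws, function(b) density(y, bw = b, kernel = "gaussian"))

# Trapezoidal area under each estimate on density()'s output grid (close to 1)
trapezoid <- function(d) sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
areas <- vapply(fits, trapezoid, numeric(1))
print(round(areas, 3))
```

Calling plot(fits$solve_the_equation) draws one of the estimates; the smaller plug-in bandwidths resolve the two modes of the eruption durations more sharply than the rule of thumb.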