Nonparametric Density Estimation

iid data: $X_1, \dots, X_n \sim P$, where $P$ is a distribution with density $f(x)$.

Aim: Estimation of the density $f(x)$.

Parametric density estimation:
• Fit a parametric model $\{f(x \mid \theta) \mid \theta \in \Theta\}$ to the data → parameter estimate $\hat\theta$
• Estimate $f(x)$ by $f(x \mid \hat\theta)$
• Problem: choice of a suitable model → danger of misfits
• Complex models (e.g. mixtures) are difficult to fit

Nonparametric density estimation:
• Few assumptions (e.g. the density is smooth)
• Exploratory tool

Example: Velocities of galaxies
• Velocities in km/sec of 82 galaxies from 6 well-separated conic sections of an unfilled survey of the Corona Borealis region.
• Multimodality is evidence for voids and superclusters in the far universe.

[Figure: Density estimates for the galaxy velocities: kernel estimate (h=0.814), kernel estimate (h=0.642) and normal mixture model (k=4); x-axis: Velocity of galaxy (1000 km/s), y-axis: Density.]

Histogram

For constants $a_0$ and $h$, let $a_k = a_0 + kh$ and

$$H_k = \#\{X_i \mid X_i \in (a_{k-1}, a_k]\}$$

be the number of observations in the $k$th interval $(a_{k-1}, a_k]$. Then

$$\hat f_{\mathrm{hist}}(x) = \frac{1}{hn} \sum_{k} H_k \, \mathbf{1}_{(a_{k-1}, a_k]}(x)$$

is the histogram estimator of $f(x)$.

Advantages:
• Easy to compute

Disadvantages:
• Sensitive to the choice of the offset $a_0$
• Nonsmooth estimator

[Figure: Five shifted histograms with bin width 0.5 and the averaged histogram, for the duration of eruptions of the Old Faithful geyser; x-axis: Duration, y-axis: Density.]
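The following is a minimal Python sketch (standard library only) of the histogram estimator $\hat f_{\mathrm{hist}}$ with bins $(a_{k-1}, a_k]$. It uses hypothetical toy data, not the Old Faithful durations. The sketch evaluates the estimate at one point for two offsets $a_0$ with the same bin width, which illustrates the sensitivity to the offset listed among the disadvantages. The helper `averaged_shifted_histogram()` is a hypothetical name. It averages m histograms whose offsets are assumed to be equally spaced over one bin width, because the slides do not say how the five shifts in the figure were chosen.

```python
import math


def bin_index(x, a0, h):
    """Index k of the bin (a_{k-1}, a_k] containing x, where a_k = a0 + k*h."""
    # Smallest integer k with a0 + k*h >= x.
    return math.ceil((x - a0) / h)


def histogram_estimate(x, data, a0, h):
    """Histogram estimator: H_k / (h n) for the bin k that contains x."""
    k = bin_index(x, a0, h)
    # H_k: number of observations in the same half-open bin as x.
    h_k = sum(1 for xi in data if bin_index(xi, a0, h) == k)
    return h_k / (h * len(data))


def averaged_shifted_histogram(x, data, a0, h, m):
    """Average of m histograms with offsets a0 + j*h/m, j = 0, ..., m-1."""
    # Assumption: the shifts are equally spaced over one bin width.
    return sum(histogram_estimate(x, data, a0 + j * h / m, h) for j in range(m)) / m


# Hypothetical toy data (not the Old Faithful durations).
sample = [1.0, 1.2, 1.4, 2.1, 2.3, 3.7]

# Same bin width h = 0.5, two different offsets a0: the estimate at x = 1.3 changes.
print(round(histogram_estimate(1.3, sample, a0=0.0, h=0.5), 3))   # 0.667
print(round(histogram_estimate(1.3, sample, a0=0.25, h=0.5), 3))  # 0.333
print(round(averaged_shifted_histogram(1.3, sample, a0=0.0, h=0.5, m=2), 3))  # 0.5
```

With $a_0 = 0$, the point x = 1.3 lies in the bin (1.0, 1.5], which holds two observations. With $a_0 = 0.25$, it lies in (1.25, 1.75], which holds only one, so the estimate halves. Averaging shifted histograms smooths out this dependence on $a_0$.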
Kernel Density Estimation, May 20, 2004

Centered Histogram

Aim: Estimate the density $f(x)$ at a point $x$.

Idea: Shift the histogram so that a bin is centered on $x$:

$$\hat f_{\mathrm{rect}}(x) = \frac{1}{hn} \, \#\{X_i \mid X_i \in (x - h/2,\, x + h/2]\}$$

Advantages:
• Exact computation (and plot) of the estimate for all $x$
• Only depends on one parameter: the bin width $h$

Disadvantages:
• Can yield very noisy estimates
• Nonsmooth estimator

[Figure: Centered histogram estimate for the Old Faithful eruption durations; x-axis: Duration, y-axis: Density.]

The centered histogram estimator can be rewritten as

$$\hat f_{\mathrm{rect}}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\!\left(\frac{x - X_i}{h}\right)$$

where

$$K(x) = \mathbf{1}_{(-\frac{1}{2}, \frac{1}{2}]}(x)$$

is the indicator function for the interval $(-\frac{1}{2}, \frac{1}{2}]$. The function $K$ is called a kernel or filter.
→ Use different (smooth) kernel functions $K(x)$.

Kernel Estimators

Let $K(x)$ be a function such that
• $K(x) \ge 0$,
• $\int K(x)\,dx = 1$.

Then the kernel density estimator with kernel $K(\cdot)$ and bandwidth $h$ is given by

$$\hat f_K(x) = \frac{1}{hn} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right).$$

Common kernel functions:
• Rectangular kernel
• Triangular kernel
• Normal kernel

[Figure: Each kernel (rectangular on [−0.5, 0.5], triangular on [−1, 1], normal shown on [−3, 3]) next to the resulting density estimate for the Old Faithful durations; x-axis: Duration, y-axis: Density.]

Kernel Estimators: Statistical properties

• The expectation of $\hat f_K(x)$ is

$$\mathbb{E}\big(\hat f_K(x)\big) = \int \frac{1}{h} K\!\left(\frac{x - y}{h}\right) f(y)\,dy = \int K(z)\, f(x - hz)\,dz = f(x) + O(h^2).$$

(The second equality substitutes $z = (x - y)/h$. The $O(h^2)$ rate assumes a twice-differentiable $f$ and a symmetric kernel with finite second moment. Under these conditions, the first-order term $-h f'(x) \int z K(z)\,dz$ of the Taylor expansion vanishes.)
→ The bias of $\hat f_K(x)$ decreases as $h$ gets smaller.
• The variance of $\hat f_K(x)$ is

$$\operatorname{var}\big(\hat f_K(x)\big) \approx \frac{f(x)}{nh} \int K(z)^2\,dz.$$

→ The variance of $\hat f_K(x)$ vanishes as $nh \to \infty$.

Conclusions:
• Restrictions on the bandwidth: $h \to 0$ and $nh \to \infty$ as $n \to \infty$.
• Theory suggests that $h \propto n^{-1/5}$, but the constant of proportionality depends on the unknown density.
• Trade-off between bias and variance:
  · Undersmoothing: if the bandwidth is too small, the variance becomes large.
  · Oversmoothing: if the bandwidth is too large, the bias becomes large.

[Figure (Kernel Estimators: Examples: Old Faithful and Galaxies): kernel estimates for the Old Faithful eruption durations with h = 0.03, 0.06, 0.12, 0.24, 0.48, 0.96 and for the galaxy velocities with h = 0.2, 0.4, 0.6, 0.8, 1.6, 3.2; x-axes: Duration, Velocity; y-axis: Density.]

Kernel Estimates: How to do it in R?

• In R, kernel density estimates can be computed with the function density():

    # Y: numeric data vector; bw is the bandwidth, kernel = "gaussian" selects the normal kernel
    plot(density(Y, bw = 0.2, kernel = "gaussian"), type = "l")

• By default, h is chosen according to the following rule of thumb:

$$\hat h = 0.9 \min(s, R/1.34)\, n^{-1/5}$$

where $s$ is the sample standard deviation and $R$ is the interquartile range.
• Better methods for selecting h are due to e.g. Sheather and Jones (1991) and can be invoked with the function bw.SJ():

    h <- bw.SJ(Y)                  # solve-the-equation method (default)
    plot(density(Y, bw = h))
    h <- bw.SJ(Y, method = "dpi")  # direct-plug-in method
    plot(density(Y, bw = h))

[Figure: Kernel estimates for the Old Faithful eruption durations with the bandwidth chosen by the rule of thumb (h=0.335), the direct-plug-in method (h=0.165) and the solve-the-equation method (h=0.14); x-axis: Duration, y-axis: Density.]
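As a cross-check of the formulas above outside R, here is a minimal Python sketch (standard library only; this is not R's implementation). It computes $\hat f_K$ with the rectangular, triangular and normal kernels, along with the rule-of-thumb bandwidth $\hat h = 0.9 \min(s, R/1.34)\, n^{-1/5}$. The exact kernel shapes (rectangular on $(-\frac{1}{2}, \frac{1}{2}]$, triangular on $[-1, 1]$), the quartile method and the toy data are assumptions of this sketch. R's density() rescales every kernel so that bw is the kernel's standard deviation. For the normal kernel this agrees with $h$ here, but for the rectangular and triangular kernels the two bandwidth scales differ. The Sheather and Jones selectors are not reproduced.

```python
import math
import statistics


def rectangular(u):
    # Indicator of (-1/2, 1/2]; with this kernel f_K is the centered histogram.
    return 1.0 if -0.5 < u <= 0.5 else 0.0


def triangular(u):
    # Supported on [-1, 1], peak value 1 at u = 0; integrates to 1.
    return max(0.0, 1.0 - abs(u))


def normal(u):
    # Standard normal density.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, kernel=normal):
    """Kernel density estimate f_K(x) = 1/(h n) * sum_i K((x - X_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (h * len(data))


def rule_of_thumb_bandwidth(data):
    """h = 0.9 * min(s, R / 1.34) * n^(-1/5), s = sample sd, R = interquartile range."""
    n = len(data)
    s = statistics.stdev(data)
    # Quartiles by linear interpolation ("inclusive" method, assumed to match R's default).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # No fallback for zero spread (R's bw.nrd0 has one); this sketch assumes s > 0 and R > 0.
    return 0.9 * min(s, (q3 - q1) / 1.34) * n ** (-0.2)


# Hypothetical toy data (not the Old Faithful durations).
sample = [1.0, 1.2, 1.4, 2.1, 2.3, 3.7]
h = rule_of_thumb_bandwidth(sample)
grid = [0.5 + 0.5 * j for j in range(8)]  # x = 0.5, 1.0, ..., 4.0
for name, kernel in [("rectangular", rectangular), ("triangular", triangular), ("normal", normal)]:
    # Each value is a density height at x, not a probability.
    print(name, [round(kde(x, sample, h, kernel), 3) for x in grid])
```

With the rectangular kernel, kde() reproduces the centered histogram $\hat f_{\mathrm{rect}}$ (up to which endpoint of the window is closed), so its output is a step function of x. The triangular and normal kernels give continuous estimates.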
