
Lecture 8: Kernel Density Estimator

D. Jason Koskinen ([email protected])
Advanced Methods in Applied Statistics, Feb - Apr 2019
University of Copenhagen, Niels Bohr Institute
Photo by Howard Jackman

Overview
• Much of what we have covered has been parameter estimation, but using analytic or predefined density expressions
• Today we cover density estimates built from the data itself
• These methods are regularly employed on finite data samples that need smoothing or that require non-parametric methods to get a PDF
• The last few slides of this lecture contain extended literature for further reading

Histogram
• The histogram is one of the simplest forms of a data-driven non-parametric density estimator
• But the only two histogram parameters, bin width and bin position(s), are arbitrary (a short code sketch after the Bayesian Blocks notes below illustrates this bin-width sensitivity)
[Figure: "Circle Area Histogram" from Lecture 1; frequency vs. area of circle from MC [m^2], drawn with bin widths of 0.1, 1, and 3 m^2]

Histograms
• The histogram is one of the simplest forms of a data-driven non-parametric density estimator
• But the only two histogram parameters, bin width and bin position(s), are arbitrary
• Histograms are not smooth, but ideally our density estimator/function is smooth
• More dimensions require more data in order to have a multi-dimensional histogram which can match the true PDF
• We can avoid some of these issues with density estimates by using something more sensible

Wish List
• For density estimates, what do we want?
  • Non-parametric, i.e. no explicit requirement for the form of the PDF
  • (Easily) extendable to higher dimensions
  • Uses the data to get local point-wise density estimates which can be combined into an overall density estimate
  • Smooth, or at least smoother than a 'jagged' histogram
  • Preserves real probabilities, i.e. any transformation has to give PDFs which integrate to 1 and never go negative
• The answer: Kernel Density Estimation (KDE)
  • Sometimes the final "E" stands for "Estimator" too

Bayesian Blocks
• An alternative to constant bins for histograms is to use Bayesian Blocks, developed by J.D. Scargle
• Bayesian Blocks are very useful for determining time-varying changes
• Covers many, but not all, of the wish list items
• Example: the Crab flux, $\mathrm{Crab\ Flux} = \frac{dN}{dE} \propto E^{-\Gamma}$ (*arXiv:1308.6698)
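To make the bin-width arbitrariness concrete, here is a minimal sketch (not from the lecture; the Lecture 1 circle-area sample is stood in for by made-up normally distributed values) that draws the same data with the three bin widths shown on the slide:

```python
# Minimal sketch: how the arbitrary bin-width choice changes a histogram's
# shape. The circle-area sample is mimicked with an invented normal sample.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
areas = rng.normal(loc=85.0, scale=4.0, size=2000)  # stand-in for the MC areas

fig, ax = plt.subplots()
for width in (0.1, 1.0, 3.0):  # the three bin widths from the slide
    bins = np.arange(70, 100 + width, width)
    ax.hist(areas, bins=bins, histtype="step", label=f"bin width {width} m$^2$")
ax.set_xlabel("Area of Circle from MC [m^2]")
ax.set_ylabel("Frequency")
ax.legend()
plt.show()
```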
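Scargle's algorithm is implemented in astropy as `astropy.stats.bayesian_blocks`; a minimal sketch follows (the event times are invented, and the `p0` false-alarm setting is just an illustrative choice):

```python
# Minimal sketch: adaptive bin edges from Scargle's Bayesian Blocks,
# applied to a fake event list with a burst in the middle.
import numpy as np
import matplotlib.pyplot as plt
from astropy.stats import bayesian_blocks

rng = np.random.default_rng(7)
t = np.sort(np.concatenate([rng.uniform(0, 100, 500),    # steady rate
                            rng.uniform(45, 55, 300)]))  # burst

edges = bayesian_blocks(t, fitness="events", p0=0.01)  # adaptive bin edges
plt.hist(t, bins=edges, histtype="step", density=True)
plt.xlabel("time")
plt.ylabel("rate (normalized)")
plt.show()
```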
Basics of Data-Driven Estimation
• Fixed regions in parameter space are expected to have approximately equal probabilities
• The smaller the region(s), the better supported our assumption that the probability is constant
• The more data in each region, the more accurate the density estimate
• We will keep the region fixed and find some compromise: large enough to collect some data, but small enough that our constant-probability assumption is reasonable
• For a more thorough treatment of the original idea, see the articles by Parzen and Rosenblatt listed in the last slides of this lecture

Example in 1 Dimension
• Start with a uniform probability box around data points:
  $$K(u) = \begin{cases} 0.5, & \text{for } |u| \le 1 \\ 0, & \text{for } |u| > 1 \end{cases}$$
• Notice that this kernel K(u) is normalized. The kernel is always normalized!! The total width is 2*1, so the 'height' is 1/(2*1)
• For a 1D set of data points $x_i$, $u = (x - x_i)/h$, where the bandwidth h sets the distance around a data point that has a non-zero probability
• The value of h is set by the analyzer
• Here, h = 1

KDE by Pieces
• For 2 data points (x1 and x2) we can produce a data-driven probability density P(x) using a box of total width 2 and height 0.5 centered at each data point
[Figure: P(x) vs. x, showing the probability contribution from x1 and the contribution from x2, each a box of half-width 1 and height K = 0.5]

Final KDE
• Combine the contributions, normalize the new function to 1, and we have generated a data-driven PDF: a Kernel Density Estimator using a box kernel (see the sketch below)
[Figure: the resulting box-kernel KDE P(x) built from the two data points x1 and x2]
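A from-scratch sketch of the box-kernel construction described on these slides, assuming two illustrative data points x1 = 2.0 and x2 = 4.5: each point contributes a box of half-width h, and dividing by the number of points keeps the total integral at 1.

```python
# Box-kernel KDE built by pieces: P(x) = (1/(n*h)) * sum_i K((x - x_i)/h)
import numpy as np
import matplotlib.pyplot as plt

def box_kernel(u):
    """K(u) = 0.5 for |u| <= 1, else 0; integrates to 1 by construction."""
    return np.where(np.abs(u) <= 1.0, 0.5, 0.0)

def box_kde(x, data, h=1.0):
    """Sum the box contribution from every data point, then normalize."""
    u = (x[:, None] - data[None, :]) / h
    return box_kernel(u).sum(axis=1) / (len(data) * h)

data = np.array([2.0, 4.5])          # the two data points x1 and x2
x = np.linspace(0, 7, 1000)
p = box_kde(x, data, h=1.0)

print("integral of P(x):", p.sum() * (x[1] - x[0]))  # ~1, as a PDF requires
plt.plot(x, p)
plt.xlabel("x")
plt.ylabel("P(x)")
plt.show()
```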
Hyper-Cube
• Our extended 'region' definition is a D-dimensional hyper-cube with sides of length 2h
• We include all points within the hyper-cube volume via some weighting scheme. This weighting is known as the kernel (K), which for this KDE is:
  $$K(\vec{x}_i) = \begin{cases} 0.5, & \vec{x}_i \text{ in region } R_d \\ 0, & \vec{x}_i \text{ outside region } R_d \end{cases}$$
  for some region $R_d$ centered at the point $\vec{x}_d$ (a code sketch after the Kernel Characteristics notes below shows this weighting)
• Sometimes you will see the kernel written as K(u), where u is the 'distance' from $\vec{x}_i$ to $\vec{x}_d$

Kernel Characteristics
• The integrated kernel always equals 1:
  $$\int_{-\infty}^{+\infty} K(u)\, du = 1$$
• If u is multidimensional, then so is the integration, but the integral is always 1
• Even if the kernel is not transformed to be 1D, e.g. K(x, y, z) instead of K(u), the integral of the kernel is always 1:
  $$\iiint K(x, y, z)\, dx\, dy\, dz = 1 \quad \text{(3D, in Cartesian coordinates } x, y, z\text{)}$$
  $$\int \cdots \int K(s_1, \ldots, s_k)\, ds_1 \cdots ds_k = 1 \quad \text{(}k\text{ dimensions, represented by } \vec{s}\,\text{)}$$
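A minimal sketch of the hyper-cube weighting above, with invented 2D sample data and an arbitrary evaluation point:

```python
# Hyper-cube kernel: a point x_i gets weight 0.5 (per the slide's definition)
# if it falls inside the D-dimensional cube of side 2h centered on x_d.
import numpy as np

def hypercube_kernel(x_i, x_d, h=1.0):
    """0.5 if x_i lies within the hyper-cube of half-width h around x_d."""
    inside = np.all(np.abs(x_i - x_d) <= h, axis=-1)
    return np.where(inside, 0.5, 0.0)

rng = np.random.default_rng(3)
data = rng.normal(size=(1000, 2))   # invented 2D sample
x_d = np.array([0.0, 0.0])          # evaluation point
weights = hypercube_kernel(data, x_d, h=0.5)
print("points inside the cube:", np.count_nonzero(weights))
```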
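The normalization property is easy to verify numerically; this sketch checks the box kernel from the earlier slides in 1D, plus a standard Gaussian kernel in 1D and 2D (conventional kernel choices, not specific forms from the lecture):

```python
# Numerical check that kernels integrate to 1, in 1D and in 2D.
import numpy as np
from scipy.integrate import quad, dblquad

# 1D box kernel; 'points' flags the discontinuities at |u| = 1 for quad
box = lambda u: 0.5 if abs(u) <= 1 else 0.0
print(quad(box, -2.0, 2.0, points=[-1.0, 1.0])[0])            # -> 1.0

# 1D Gaussian kernel
gauss = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
print(quad(gauss, -np.inf, np.inf)[0])                        # -> 1.0

# 2D product kernel K(x, y); the double integral is also 1
gauss2d = lambda y, x: gauss(x) * gauss(y)
print(dblquad(gauss2d, -np.inf, np.inf, -np.inf, np.inf)[0])  # -> 1.0
```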