Lecture 8: Kernel Density Estimator
D. Jason Koskinen [email protected]
Advanced Methods in Applied Statistics, Feb - Apr 2019
University of Copenhagen, Niels Bohr Institute
Photo by Howard Jackman

Overview
• Much of what we have covered has been parameter estimation using analytic or otherwise explicitly defined density expressions
• Today we cover density estimates built from the data itself
• These methods are regularly employed on finite data samples that need smoothing, or that require non-parametric methods to get a PDF
• Last few slides of this lecture contain extended literature for further reading
Histogram
• The histogram is one of the simplest forms of a data-driven non-parametric density estimator
• But the only two histogram parameters (bin width and bin position(s)) are arbitrary
[Figure: "Circle Area Histogram" from Lecture 1; the same MC sample histogrammed with bin widths of 0.1 m^2, 1 m^2, and 3 m^2. x-axis: Area of Circle from MC [m^2], y-axis: Frequency]
Histograms
• Histograms are not smooth, but ideally our density estimator/function is smooth
• More dimensions require more data in order to have a multi-dimensional histogram which can match the true PDF
• We can avoid some of these issues by using a more sensible density estimate
Wish List
• For density estimates, what do we want?
  • Non-parametric, i.e. no explicit requirement for the form of the PDF
  • (Easily) extendable to higher dimensions
  • Uses the data to get local point-wise density estimates which can be combined into an overall density estimate
  • Smooth, or at least smoother than a 'jagged' histogram
  • Preserves real probabilities, i.e. any transformation has to give PDFs which integrate to 1 and never go negative
• The answer: Kernel Density Estimation (KDE)
  • The 'E' sometimes stands for "Estimator" as well
Bayesian Blocks
• An alternative to constant bins for histograms is to use Bayesian Blocks, developed by J.D. Scargle
• Bayesian Blocks are very useful for determining time-varying changes
• Covers many, but not all, of the wish list items
[Figure: Crab Nebula flux (dN/dE) vs. energy with Bayesian Block binning; from arXiv:1308.6698]

Basics of Data Driven Estimation
• Small, fixed regions in parameter space are assumed to have approximately constant probability
  • The smaller the region(s), the better supported our assumption of constant probability
  • The more data in each region, the more accurate the density estimate
• We will keep the region fixed and find some compromise: large enough to collect some data, but small enough that our constant-probability assumption is reasonable
• For a more thorough treatment of the original idea, see the articles by Parzen and Rosenblatt linked in the last slides of this lecture
Example in 1 dimension
• Start with a uniform probability box around data points:
$$K(u) = \begin{cases} 0.5, & \text{for } |u| \le 1 \\ 0, & \text{for } |u| > 1 \end{cases}$$
• Notice that this kernel K(u) is normalized; the kernel is always normalized! The total width is 2*1, so the 'height' is 1/(2*1)
• For a 1D set of data points x_i, u = (x - x_i)/h, where h is the half-width around a data point within which the kernel is non-zero
• The value of h is set by the analyzer
• Here, h = 1
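As a minimal sketch (in Python, assuming numpy; the function name is illustrative), this box kernel and a numerical check of its normalization might look like:

```python
import numpy as np

def box_kernel(u):
    """Uniform 'box' kernel: 0.5 for |u| <= 1, else 0."""
    return np.where(np.abs(u) <= 1, 0.5, 0.0)

# Numerical check that the kernel integrates to 1 over u
u = np.linspace(-2, 2, 100001)
print(np.trapz(box_kernel(u), u))  # ~1.0
```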
KDE by pieces
• For 2 data points (x1 and x2) we can produce a data-driven probability density P(x) using a box of total width = 2 and height = 0.5 centered at each data point
[Figure: box-kernel probability contributions of half-width 1 centered at x1 and x2]
• Combine the contributions, normalize the new function to 1, and we have generated a data-driven PDF
[Figure: the resulting kernel density estimate P(x) vs. x, built from the box kernels at x1 and x2]

Hyper-Cube
• Our extended 'region' definition is a D-dimensional hyper-cube with lengths equal to 2h on each side
• We include all points within the hyper-cube volume via some weighting scheme. This is known as the kernel (K), which for this KDE is:

$$K(\vec{x}_i) = \begin{cases} 0.5, & \vec{x}_i \text{ inside region } R_d \\ 0, & \vec{x}_i \text{ outside region } R_d \end{cases}$$
• Sometimes you will see the kernel written as K(u), where u is the 'distance' from x_i to x_d
Kernel Characteristics
• The integrated kernel always equals 1
$$\int_{-\infty}^{+\infty} K(u)\,du = 1$$
• If 'u' is multidimensional, then so is the integration, but the integral is always 1
• Even if the kernel is not transformed to be 1D, e.g. K(x, y, z) instead of K(u), the integral of the kernel is always 1
$$\iiint K(x, y, z)\,dx\,dy\,dz = 1 \qquad \text{(3D in Cartesian coordinates } x, y, z\text{)}$$
[Figure: a D-dimensional hyper-cube of side length 2h centered on the evaluation point x_d]

• Everything within the hyper-cube is included with an appropriate value (also known as a weight)
• For the density estimator we need:
  • To normalize by the total number of events in the sample (N)
  • The kernel is always normalized, i.e. its integral or sum equals 1
• The PDF estimator (P_KDE) is now constructed from the individual data points, evaluated at a single point $\vec{x}$:

$$P_{KDE}(\vec{x}) = \frac{1}{N} \sum_{n=1}^{N} K\!\left(\frac{\vec{x} - \vec{x}_n}{h}\right)$$
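A minimal sketch of this estimator (assuming numpy; the function name and the choice to fold the 1/(2h)^D normalization into the denominator are illustrative, not from the slides):

```python
import numpy as np

def hypercube_kde(x, data, h):
    """P_KDE(x) with a hyper-cube (box) kernel in D dimensions.

    x: evaluation point of shape (D,); data: sample of shape (N, D).
    The kernel height 1/(2h)^D makes each kernel integrate to 1 over x,
    so the overall estimate integrates to 1.
    """
    inside = np.all(np.abs(np.asarray(x) - np.asarray(data)) <= h, axis=1)
    return inside.sum() / (len(data) * (2 * h) ** len(x))

# 1D check: data as shape (12, 1), evaluation point as shape (1,)
data = np.array([[1], [2], [5], [6], [12], [15], [16], [16], [22], [22], [22], [23]])
print(hypercube_kde([6.0], data, h=1.5))  # 2 points inside -> 2/(12*3) ~ 0.0556
```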
Exercise 1
• Take the fixed-length hyper-cube KDE out for a spin in 1D
• Using the following data [1,2,5,6,12,15,16,16,22,22,22,23] for the finite data sample and h=1.5
  • Normalize the kernel! If the total width is 2h, then K(u)=1/(2h) inside the window and K(u)=0 outside
  • Because the length is 2h, to be in the 'hyper-cube' each data point x_i only needs to be within h of x in each dimension
• Calculate P_KDE(x=6), P_KDE(x=10.1), P_KDE(x=20.499), and P_KDE(x=20.501)
• Code this by hand, i.e. no external packages
Exercise 1 Example
• Calculate P_KDE(x=6) by taking all 12 data points and seeing if they are within ±h of x=6, i.e. in the range 4.5 to 7.5
• We include the data point at x=6 itself in the KDE
$$P_{KDE}(x=6) = \frac{1}{12} \sum_{n=1}^{N} K\!\left(\frac{6 - x_n}{1.5}\right)$$
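A possible by-hand solution in plain Python (no external packages, as the exercise requests; names are illustrative):

```python
data = [1, 2, 5, 6, 12, 15, 16, 16, 22, 22, 22, 23]
h = 1.5

def p_kde_box(x):
    # Each point within +/- h of x contributes a normalized kernel K = 1/(2h)
    n_inside = sum(1 for xi in data if abs(x - xi) <= h)
    return n_inside / (2 * h * len(data))

for x in [6, 10.1, 20.499, 20.501]:
    print(f"P_KDE(x={x}) = {p_kde_box(x):.5f}")
```

With this kernel, P_KDE(x=6) = 2/(3*12) ≈ 0.05556, P_KDE(x=10.1) = 0, and the pair at x=20.499/20.501 gives the 0 vs. 0.08333 jump quoted on the 'Hyper-Cube' slide below.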
KDE Comments
• The function K() is known as the kernel, and h is the bandwidth
• Larger bandwidths mean more smoothing, but can remove real features
• Smaller bandwidths approach the true PDF better, but need lots of data points, otherwise the estimates are 'bumpy'
• The fixed-window KDE is similar to a histogram, but has better support for local densities
Hyper-Cube
• We could use the hyper-cube kernel to construct a probability density, but this kernel has a few drawbacks
  • We have discrete jumps in density and limited smoothness
  • Nearby points in x can have sharp differences in probability density, e.g. P_KDE(x=20.499)=0 but P_KDE(x=20.501)=0.08333
  • All data have equal weighting and contribution regardless of distance to the estimation point
• So let’s switch to a different kernel with weights that decrease smoothly as a function of distance from the estimation point
Gaussian Kernel
• The generic KDE expression remains similar:

$$P_{KDE}(\vec{x}) = \frac{1}{N} \sum_{n=1}^{N} K\!\left(\frac{\vec{x} - \vec{x}_n}{h}\right)$$

but now with a gaussian kernel:

$$K(\vec{x}, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{|\vec{x} - \vec{x}_n|^2}{2\sigma^2}}$$
• The kernel at each data point now contributes a non-zero probability over (-∞, +∞), smoothly, with a weight that decreases as a function of distance
• Each data point and corresponding kernel integrates to 1 over the whole parameter space
Exercise 2
• Redo exercise 1 using the new gaussian kernel
  • For the gaussian width, use σ=3
• Calculate the KDE two ways:
  • Writing software where you code the gaussian function yourself
  • Using an external package
• Plot the density estimate P_KDE(x) over the range -10 < x < 35
• If you have time, plot the individual kernel contributions too
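A sketch of both approaches (assuming numpy, scipy, and matplotlib; note that scipy's gaussian_kde takes a bandwidth factor that multiplies the sample standard deviation, so σ=3 has to be converted):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

data = np.array([1, 2, 5, 6, 12, 15, 16, 16, 22, 22, 22, 23])
sigma = 3.0
x = np.linspace(-10, 35, 1000)

# By hand: average of gaussian kernels centered on each data point
u = (x[:, None] - data[None, :]) / sigma
p_hand = (np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * sigma)).mean(axis=1)

# External package: convert sigma to scipy's bandwidth factor
kde = gaussian_kde(data, bw_method=sigma / data.std(ddof=1))

plt.plot(x, p_hand, label="by hand")
plt.plot(x, kde(x), "--", label="scipy.stats.gaussian_kde")
plt.xlabel("x"); plt.ylabel("p(x)"); plt.legend()
plt.show()
```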
Exercise 2 KDE plot
[Figure: P_KDE(x) with gaussian kernels (σ=3.00) for the Exercise 1 data, plotted over -10 < x < 35]
Compact Kernel
• The gaussian kernel contributes across the whole space (infinite support), but sometimes we want compact support, i.e. zero outside of a specific range
  • Maybe some parameters are constrained to be non-negative
  • We may know the physical system has either boundaries or effective cut-offs
• A common compact support kernel is the Epanechnikov kernel*:

$$K(u) = \begin{cases} \frac{3}{4}(1 - u^2), & \text{for } |u| \le 1 \\ 0, & \text{for } |u| > 1 \end{cases}$$

*https://en.wikipedia.org/wiki/Kernel_(statistics)

Exercise 3
• Redo exercise 2 using the Epanechnikov kernel with a bandwidth that you choose
• In a nicely formatted table, compare P_KDE(x=6), P_KDE(x=10.1), P_KDE(x=20.499), and P_KDE(x=20.501) between the 3 different kernels: Parzen-Rosenblatt, gaussian, and Epanechnikov
• Use either your by-hand version or an external package
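A sketch of the comparison (assuming numpy; h=1.5 for the box kernel and σ=3 for the gaussian are carried over from Exercises 1 and 2, while h=1.5 for the Epanechnikov kernel is an arbitrary choice):

```python
import numpy as np

data = np.array([1, 2, 5, 6, 12, 15, 16, 16, 22, 22, 22, 23])

# Kernels normalized to unit integral in u
box   = lambda u: np.where(np.abs(u) <= 1, 0.5, 0.0)            # Parzen-Rosenblatt
gauss = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
epan  = lambda u: np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def p_kde(x, h, kernel):
    # (1/(N*h)) * sum_n K((x - x_n)/h)
    return kernel((x - data) / h).sum() / (len(data) * h)

print(f"{'x':>8} {'box':>10} {'gaussian':>10} {'epan':>10}")
for x in [6, 10.1, 20.499, 20.501]:
    vals = [p_kde(x, 1.5, box), p_kde(x, 3.0, gauss), p_kde(x, 1.5, epan)]
    print(f"{x:>8} " + " ".join(f"{v:10.5f}" for v in vals))
```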
Kernel Bandwidth
• Every KDE is, unfortunately, strongly influenced by the kernel bandwidth, which is a user-defined free parameter
[Figure: P_KDE(x) for the same data with gaussian kernels of σ=0.20, σ=3.00, and σ=7.00, each plotted over -10 < x < 35]
Bandwidth Selection
• An analytic approach to bandwidth selection is to choose a bandwidth which minimizes the mean integrated square error (MISE):

$$\mathrm{MISE}(h) = E\!\left[\int \big(P_{KDE}(\vec{x}) - P(\vec{x})\big)^2\, dx\right]$$

• But analytically this requires some known form of the underlying distribution
• Assuming that the underlying distribution is gaussian, the optimal bandwidth is

$$\hat{h} \approx 1.06\, \hat{\sigma}\, N^{-1/5}$$

where $\hat{\sigma}$ is the standard deviation from the data and N is the number of data points
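For example, the rule-of-thumb bandwidth for the Exercise 1 data (a minimal sketch assuming numpy):

```python
import numpy as np

data = np.array([1, 2, 5, 6, 12, 15, 16, 16, 22, 22, 22, 23])
sigma_hat = data.std(ddof=1)                      # sample standard deviation
h_opt = 1.06 * sigma_hat * len(data) ** (-1 / 5)
print(h_opt)                                      # ~5.3 for this sample
```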
Bandwidth Selection: Non-parametric
• Instead of using a known functional form, we can use subsets of the data as a cross-validation of the kernel bandwidth, starting from the integrated square error (ISE):

$$\mathrm{ISE}(\hat{f}_h) = \int \big(\hat{f}_h(y) - f(y)\big)^2\, dy = \int \hat{f}_h(y)^2\, dy - 2\int \hat{f}_h(y) f(y)\, dy + \int f(y)^2\, dy$$

• The last term does not depend on h, and the cross term can be estimated from the data itself, which leads to the least squares cross-validation (LSCV) expression

$$\mathrm{LSCV}(h) = \int \hat{f}_h(y)^2\, dy - \frac{2}{N}\sum_{i=1}^{N} \hat{f}_{-i}(y_i)$$

• Here $\hat{f}_{-i}(y_i)$ is the kernel estimator built from the data omitting the data point $y_i$, also known as the "leave-one-out" density estimator

*https://projecteuclid.org/euclid.ss/1113832723
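A sketch of LSCV with a gaussian kernel (assuming numpy and scipy; the integration limits and the scanned bandwidth grid are arbitrary choices):

```python
import numpy as np
from scipy.integrate import quad

data = np.array([1, 2, 5, 6, 12, 15, 16, 16, 22, 22, 22, 23])

def f_hat(y, pts, h):
    # Gaussian-kernel density estimate built from the points in `pts`
    u = (y - pts) / h
    return np.sum(np.exp(-0.5 * u**2)) / (np.sqrt(2 * np.pi) * h * len(pts))

def lscv(h):
    # First term: integral of f_hat^2 over the (effective) support
    int_f2, _ = quad(lambda y: f_hat(y, data, h) ** 2, -40, 70)
    # Second term: leave-one-out estimate at each omitted data point
    loo = [f_hat(data[i], np.delete(data, i), h) for i in range(len(data))]
    return int_f2 - 2.0 * np.mean(loo)

h_grid = np.linspace(0.5, 10.0, 40)
print(h_grid[np.argmin([lscv(h) for h in h_grid])])
```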
KDE in 2D, and more
• While the previous examples and work were for 1 dimension, the kernels work just fine in additional dimensions
  • No escaping the curse of dimensionality :-(
  • Similar to all other multi-dimensional problems, anything beyond 3D is difficult to visualize
• Kernel bandwidths do not have to be the same in each dimension
  • Either specify the bandwidth in each dimension, or
  • Transform the parameter space(s) to be uniform for a given bandwidth
Exercise 4
• There are many online tutorials covering different 2D density estimation problems in R, python, MatLab, etc.
  • Eruptions of the "Old Faithful" geyser
  • Rendering of text and numerals
  • Spread of diseases
  • Geographical population densities
• After working through your own particular choice, use the 500 pseudo-experiment bootstrap from the lecture "Parameter Estimation and Confidence Intervals", exercise 1b, to produce a 2D KDE (a sketch is given below)
• Because the LLH method gives precise contours, you can compare them against the contours from the KDE
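A sketch of a 2D KDE with contours (assuming numpy, scipy, and matplotlib; the correlated-gaussian toy sample below is a stand-in for the 500 bootstrap estimates, which are not reproduced here):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Stand-in for the 500 (parameter1, parameter2) bootstrap estimates
rng = np.random.default_rng(8)
sample = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.6], [0.6, 2.0]], size=500)

kde = gaussian_kde(sample.T)  # gaussian_kde expects shape (ndim, npoints)

# Evaluate on a grid and draw density contours over the scattered sample
gx, gy = np.mgrid[-4:4:100j, -4:6:100j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
plt.contour(gx, gy, density)
plt.scatter(sample[:, 0], sample[:, 1], s=4, alpha=0.3)
plt.show()
```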
The End
More KDE comments
• The kernel is symmetric about each data point
  • Makes sense, because the region near the data point should have a similar probability for a narrow (enough) bandwidth
  • Kernel symmetry is not technically a requirement, but in practice symmetry is often desirable because then the average of the kernel is centered on the data point
• The kernel density estimator PDF is often used for Monte Carlo sampling
  • E.g. N-body simulations (galaxy formation, astrophysical large-scale structure, disease propagation in an ecosystem, etc.) may take 2 months to generate 200 data points across 3 dimensions or parameters, while the real data samples are much, much larger. In order to use our N-body PDF, we can sample from a smoothed PDF from a KDE.
• Because KDEs require 'subjective' input, clearly state the kernel, bandwidth, and any optimization used in an analysis
Further Info
• Fixed kernel-width window (Parzen-Rosenblatt window):
  • Parzen: http://www.jstor.org/stable/2237880
  • Rosenblatt: http://projecteuclid.org/euclid.aoms/1177728190
• Nice list of various kernels: https://en.wikipedia.org/wiki/Kernel_(statistics)
• Very nice review article on kernel bandwidth selection: https://projecteuclid.org/euclid.ss/1113832723
• Collection of other cross-validation techniques: https://cran.r-project.org/web/packages/kedd/vignettes/kedd.pdf
Further Info (cont.)
• Variable bandwidth kernels:
  • Z. I. Botev et al., Kernel Density Estimation via Diffusion, Annals of Statistics 38(5), 2916-2957 (2010).
  • I. S. Abramson, On bandwidth variation in kernel estimates: A square root law, Annals of Statistics 10(4), 1217-1223 (1982).
• Any other suggestions?