Crimestat III
Total Page:16
File Type:pdf, Size:1020Kb
CrimeStat III Part III: Spatial Modeling Chapter 8 Kernel Density Interpolation In this chapter, we discuss tools aimed at interpolating incidents, using the kernel density approach. Interpolation is a technique for generalizing incident loca tion s to an entir e area. Whereas the spatia l distribution and hot spot statistics provide statistica l summaries for the data in cidents themselves, interpola tion techniques generalize those data incidents to the entire region. In particular, they provide density estimates for all parts of a region (i.e., at any location ). The densit y estim ate is an in tensit y varia ble, a Z- value, that is estimated at a particular location. Consequently, it can be displayed by either surface maps or contour maps that show the intensity at all locations. There are many interpola tion techniques, such as Krigin g, trend surfaces, local regression models (e.g., Loess, splines), and Dirichlet tessellations (Anselin, 1992; Cleveland, Grosse and Shyu, 1993; Venables and Ripley, 1997). Most of these require a variable that is being estimated as a function of location. However, kernel density estim ation is an in terpola tion technique that is appropria te for in dividual poin t loca tion s (Silverman, 1986; Härdle, 1991; Bailey and Gatrell, 1995; Burt and Barber, 1996; Bowman and Azalini, 1997). Kernel Density Estimation Kernel density estimation involves placing a symmetrical surface over each point, evaluating the distance from the point to a reference location based on a mathematical function, and summing the value of all the surfaces for that reference location. This procedure is repeated for all reference locations. It is a technique that was developed in the late 1950s as an alternative method for estimating the density of a histogram (Rosenbla tt, 1956; Wh it tle, 1958; Parzen, 1962). A histogr am is a gr aphic r epresentation of a frequency distribution. A continuous varia ble is divided into intervals of size, s (the interval or bin width), and the number of cases in each interval (bin) are counted and displayed as block diagrams. The histogram is assumed to represent a smooth, underlying distribution (a densit y fu nction ). H owever, in order to estim ate a smooth densit y fu nction from the histogram, traditionally researchers have linked adjacent variable intervals by connecting the midpoints of the intervals with a series of lines (Figure 8.1). Unfortunately, doing this causes three statistical problems (Bowman and Azalini, 1997): 1. Information is discarded because all cases within an interval are assigned to the midpoin t. The wider the in terva l, the gr eater the in formation loss. 2. The technique of connectin g t he midpoin ts leads to a discontin uous and not smooth density function even though the underlying density function is assumed to be smooth. To compensate for this, researchers will reduce the width of the interval. Thus, the density function becomes smoother with 8.1 Figure 8.1: Constructing A Density Estimate From A Histogram Method of Connecting Midpoints 50 40 30 Frequency 20 10 0 1 3 5 7 2 4 6 Variable Classification Interval (bin) smaller interval widths, although still not very smooth. Further, there are limits to this technique as the sample size decreases when the bin width gets smaller, eventually becoming too small to produce reliable estimates. 3. The technique is dependent on an arbitrarily defined interval size (bin width). By making the interval wider, the estimator becomes cruder and, conversely, by making the interval narrower, the estimator becomes finer. However, t he underlyin g densit y distribution is assumed to be smooth and continuous and not dependent on the interval size of a histogram. To handle this problem, Rosenblatt (1956), Whittle (1958) and Parzen (1962) developed the kernel densit y method in order to avoid the first two of these difficult ies; the bin width issue still r emain s. What they did was to pla ce a smooth kernel function, rather than a block, over each point and sum the functions for each location on the scale. Figure 8.2 illustrates the process with five point locations. As seen, over each location, a symmetrical kernel function is placed; by symmetrical is meant that is falls off with distance from each point at an equal rate in both directions around each point. In this case, it is a normal distribution, but other types of symmetrical distribution have been used. The underlying density distribution is estimated by summing the individual kernel functions at all locations to produce a smooth cumulative density function. Notice that the functions are summed at every point along the scale and not just at the point locations. The adva ntages of this are that, first, each poin t contributes equally t o the densit y surface and, second, the resulting density function is continuous at all points along the scale. The thir d problem mention ed above, interva l size, still r emain s sin ce the width of the kernel function can be varied. In the kernel density literature, this is called bandwidth and refers essentia lly t o the width of the kernel. Figure 8.3 shows a kernel wit h a narrow bandwidth placed over the same five points while figure 8.4 shows a kernel with a wider bandwidth placed over the points. Clearly, the smoothness of the resulting density function is a consequence of the bandwidth size. There are a number of different kernel fu nction s that have been used, a side from the normal distribution, such as a triangular function (Burt and Barber, 1996) or a quartic function (Bailey and Gatrell, 1995). Figure 8.5 illustrates a quartic function. But the normal is the most commonly used (Kelsall and Diggle, 1995a). The normal distribution function has the followin g fu nction al for m: 2 dij 1 - [--------- ] g(x ) = E{ [W * I ] * ----------- * e 2*h 2 } (8.1) j i i h2 * 2B where dij is the distance between an in cident loca tion and any r eference poin t in the region, h is the standard deviation of the normal distribution (the bandwidth), Wi is a weight at the point location and Ii is an intensity at the point location. This function extends to infinity in all directions and, thus, will be applied to any location in the region. 8.3 Figure 8.2: Kernel Density Estimate Summing of Normal Kernel Functions for 5 Points 0.7 0.6 Kernel density estimate 0.5 0.4 Density Kernels over individual points 0.3 0.2 0.1 0.0 0 2 4 6 8 10 12 14 16 18 20 1 3 5 7 9 11 13 15 17 19 Relative Location Figure 8.3: Kernel Density Estimate Smaller Bandwidth 1.0 0.9 0.8 Kernel density estimate 0.7 0.6 0.5 Density 0.4 0.3 0.2 0.1 0.0 0 2 4 6 8 10 12 14 16 18 20 1 3 5 7 9 11 13 15 17 19 Relative Location Figure 8.4: Kernel Density Estimate Larger Bandwidth 0.5 0.4 Kernel density estimate 0.3 Density Kernels over individual points 0.2 0.1 0.0 0 2 4 6 8 10 12 14 16 18 20 1 3 5 7 9 11 13 15 17 19 Relative Location Figure 8.5: Kernel Density Estimate Summing of Quartic Kernel Function 0.18 0.16 0.14 Kernel density estimate 0.12 0.10 Quartic functions over individual points Density 0.08 0.06 0.04 0.02 0 0 2 4 6 8 10 12 14 16 18 20 1 3 5 7 9 11 13 15 17 19 Relative Location In CrimeStat, there are four alternative kernel functions that can be used, all of which have a circumscribed radius (unlike the normal distribution). The quartic function is applied to a limit ed area around ea ch incident point defined by the radius, h. It falls off gradually with distance until the radius is reached. Its functional form is: 1. Outside the specified radius, h: g(xj) = 0 (8.2) 2. Within the specified radius, h: 2 3 dij g(x ) = E [W * I ] * [ ----------] * [1 - -------] 2 (8.3) j { i i 2 2 } h * B h where dij is the distance between an in cident loca tion and any r eference poin t in the region, h is the radius of the search area (the bandwidth), Wi is a weight at the poin t loca tion and I i is an intensity at the point location. The triangular (or conical) distribution falls off evenly with distance, in a linear rela tion ship. Compared to the quartic fu nction , it falls off more rapidly. It also has a circumscribed radius and is, therefore, applied to a limited area around each incident point, h. Its functional form is: 1. Outside the specified radius, h: g(xj) = 0 (8.4) 2. Within the specified radius, h: E g(xj) = [K - K/h ] * dij (8.5) where K is a constant. In CrimeStat, the constant K is initially set to 0.25 and then re- scaled to ensure that either the densities or probabilities sum to their appropriate values (i.e., N for densit ies and 1.00 for proba bilit ies). The negative exponential (or peaked) distribution falls off very rapidly with distance up to the circumscribed radius. Its functional form is: 1. Outside the specified radius, h: g(xj) = 0 (8.6) 2. Within the specified radius, h: E -K*dij g(xj) = A*e (8.7) 8.8 where A is a constant and K is an exponent.