Visualizing Multivariate Spatial Correlation with Dynamically Linked Windows1
Total Page:16
File Type:pdf, Size:1020Kb
Visualizing Multivariate Spatial Correlation with Dynamically Linked Windows1 LUC ANSELIN, IBNU SYABRI, and OLEG SMIRNOV Spatial Analysis Laboratory (SAL) Department of Agricultural and Consumer Economics University of Illinois, Urbana-Champaign Urbana, IL 61801 Abstract Several recent efforts have focused on adding exploratory data analysis functional- ity to geographic information systems (GIS) by integrating established statistical soft- ware with a GIS. In this paper, we outline an alternative approach, where the function- ality is built from scratch, using a combination of small libraries of dedicated func- tions, rather than relying on the full scope of existing software suites. The suggested approach is modular and freestanding. Within an overall framework of dynamically linked windows, it combines a cartographic representation of data on a map with tra- ditional statistical graphics, such as histograms, box plots, and scatterplots. It extends earlier work on the visualization of spatial autocorrelation to a multivariate setting, in- troducing a Moran Scatterplot Matrix and Multivariate LISA Maps. The new program (DynESDA2) works on both point and polygon coverages, implements true brushing of maps, as well as the usual linking and brushing between maps and statistical graphs. Key Words: spatial statistics, exploratory spatial data analysis, visualization, dynamic graphics, geocomputation, GIS. 1 Introduction With the proliferation of user-friendly geographic information systems, the easy access to geocoded data and the increased interest in substantive “spatial” research questions, the demand for sophisticated spatial analytical tools has increased considerably in recent years (Goodchild et al. 2000). Building on an early interest in linking standard statistical software packages and GIS in order to carry out generic data analyses, several more recent efforts have focused on adding exploratory spatial data analysis (ESDA) functionality. The approach commonly taken is to establish a link between a statistical package and a GIS by means of remote procedure calls and a client/server architecture (often referred to as “close coupling”). There are by now quite a few implementations of this idea, for example, linking statistical software packages such as S-Plus, XGobi, or XploRe to GIS software, 1Contact: [email protected]. This research was supported in part by NSF Grants SBR-9410612, BCS- 9978058, (to the Center for Spatially Integrated Social Science, CSISS), and by a grant from the National Consor- tium on Violence Research (NCOVR). NCOVR is supported under grant SBR-9513040 from the National Science Foundation. The considerable contributions by Shuming Bao to the development of the initial versions of the SpaceStat-ArcView linkage software are gratefully acknowledged. Programming help was provided by Yanqui Ren. This paper includes and elaborates on materials earlier outlined in Anselin et al. (2002). Parts were pre- sented at sessions at the BioMedware Conference on Space-Time Information Systems, Ann Arbor, MI, Jan. 11–12, 2002, and the Annual Meetings of the Association of American Geographers, Los Angeles, CA, March 20–23, 2002. Comments from participants at these sessions are appreciated. Visualizing Multivariate Spatial Correlation 2 such as ArcView and Arc/Info (for recent reviews of the relevant literature, see Anselin 1998, 2000, Symanzik et al. 2000, Zhang and Griffith 2000, Wise et al. 2001). In this paper, we report on an ongoing software tools development project carried out as part of the activities of the U.S. Center for Spatially Integrated Social Science (CSISS) to facilitate exploratory spatial data analysis. We approach the problem in a different way than most recent linking efforts in that the ESDA functionality is built from scratch, rather than by connecting existing software suites. This is accomplished by using a combination of small libraries of dedicated functions as software “components.” The approach is modular and extensible, as well as completely freestanding. The current set of tools is built around ESRI’s MapObjects Lite software components to implement mapping and data base access. This is augmented with functionality to carry out the spatial analysis (written in C++). It does not require the use of a particular GIS, although it adheres to the ESRI shapefile format (ESRI 1995) for data input. The user interface consists of fully dynamically linked windows that include multiple cartographic (thematic) representations of data on maps as well as traditional statistical graphics, such as histograms, box plots, and scatterplots. It also includes several devices to visualize spatial autocorrelation in lattice (or regional) data, such as the Moran Scatterplot and LISA maps (Anselin 1995, 1996, 2000). In addition, the visualization of spatial autocorrelation has been extended to apply to multivariate settings, introducing the concept of a Moran Scatterplot Matrix and Multivariate LISA Maps. In the remainder of this paper, we first provide some background and situate our ap- proach among a number of other efforts that have been reported in the recent literature. We then proceed by presenting the methodological approach to carry out visualization of multivariate spatial correlation. Next, we turn to the DynESDA2 software itself and out- line its architecture and functionality. We close with some concluding comments on future directions. 2 Background The current framework, referred to as DynESDA2, is the latest iteration in an ongoing effort to augment the visualization and spatial data manipulation functionality of a GIS with an analytical engine that contains spatial statistical and spatial econometric methods.2 The original outline of the conceptual framework for such an integration can be found in Anselin and Getis (1992) and Goodchild et al. (1992). The first implementation of this inte- grated framework consisted of interfacing the spatial econometric and ESDA functionality of SpaceStat (Anselin 1992, 2002) with ESRI’s Arc/Info GIS in Anselin et al. (1993). The interaction between the two software packages was based on so-called “loose coupling,” which consisted of moving data and location-specific results back and forth between Space- Stat as the analytical engine and Arc/Info as the visualization engine. This early effort was more a proof of concept than a practical tool, as it suffered from performance problems and limitations for dynamic interaction due to the design of Arc/Info (as well as from the use of two different operating systems, Unix for Arc/Info and DOS for SpaceStat). Arc/Info was used as the basis for an integrated or linked framework by a number of others as well (in a Unix environment), although using a different architecture. For example, dynamically linked windows from XGobi were interfaced with Arc/Info by Symanzik et al. (1994b), based on a client/server architecture, but only limited interaction was possible with the maps in Arc/Info. Similarly, the extension of Arc/Info with spatial statistical functionality 2More extensive descriptions of the evolution of software tools can be found in Anselin et al. (1993), Anselin and Bao (1996, 1997), Bao and Anselin (1998), Anselin and Smirnov (1999a,b) and Anselin (2000). Visualizing Multivariate Spatial Correlation 3 implemented in the SAGE Project uses a client/server architecture to avoid performance problems with loose coupling (Haining et al. 1996, 1998, 2000, Wise et al. 2001). The popularization of the ArcView desktop GIS software in the mid-1990s saw this GIS become the primary focus of extension efforts.3 Initially, spatial statistical functionality was added by means of the built-in Avenue scripting language. For example, Zhang and Griffith (1997) provide spatial autocorrelation statistics through the application of Avenue scripts. Also, in Anselin and Bao (1996, 1997) and Bao and Anselin (1998), Avenue scripts are used to implement the link with SpaceStat. Performance problems, both in terms of speed as well as in terms of the size of problems that could be handled, quickly led to the adoption of different designs. Popular among these was the use of remote procedure calls to link ArcView with other (statistical) software. For example, this is applied in the series of integration efforts in a Unix environment between exploratory software such as XGobi and XploRe on the one hand, and the ArcView GIS on the other hand, by Cook, Symanzik and co-workers (see, e.g., Cook et al. 1996, 1997, Symanzik et al. 1994a, 1997, 1998, 2000). Similarly, the link between the S-Plus statistical software and ArcView is based on remote procedure calls (implemented in Avenue scripts), allowing S-Plus commands to be invoked from within ArcView and vice versa (Bao et al. 2000). While exploiting the functionality of ArcView for interactive mapping and querying, combined with the linking and brushing capabilities in the EDA software, these interfaces were still limited by the constraints on the number of links that could be kept open simultaneously (a limitation of the remote procedure call implementations on these systems). Also, to the extent that they relied on built-in Avenue scripts for some spatial data handling functionality, they tended to be slow and limited in the number of spatial objects that could be handled. The SpaceStat and DynESDA extensions for ArcView in a Microsoft Windows envi- ronment (Anselin and Smirnov 1999a,b, Anselin 2000) were designed to address some of these performance issues. While they also suffer from some of