Point Process Models for Household Distributions Within Small Areal Units

Demographic Research: a free, expedited, online journal of peer-reviewed research and commentary in the population sciences, published by the Max Planck Institute for Demographic Research, Konrad-Zuse Str. 1, D-18057 Rostock, GERMANY. www.demographic-research.org

DEMOGRAPHIC RESEARCH, VOLUME 26, ARTICLE 22, PAGES 593-632
PUBLISHED 13 JUNE 2012
http://www.demographic-research.org/Volumes/Vol26/22/
DOI: 10.4054/DemRes.2012.26.22

Research Article

Zack W. Almquist¹
Carter T. Butts¹,²

This publication is part of the Special Collection on “Spatial Demography”, organized by Guest Editor Stephen A. Matthews.

© 2012 Zack W. Almquist & Carter T. Butts. This open-access work is published under the terms of the Creative Commons Attribution NonCommercial License 2.0 Germany, which permits use, reproduction & distribution in any medium for non-commercial purposes, provided the original author(s) and source are given credit. See http://creativecommons.org/licenses/by-nc/2.0/de/

¹ Corresponding author. Department of Sociology; University of California, Irvine. Email: [email protected].
² Department of Statistics and Institute for Mathematical Behavioral Sciences; University of California, Irvine.

Table of Contents

1 Introduction
2 Human settlement patterns
3 Background on spatial data and household distributions
  3.1 Spatial data
  3.2 Household distributions
4 Point process models and simulation
  4.1 Constant-intensity N-conditioned Poisson process model (uniform)
  4.2 Low-discrepancy sequence model (quasi-random)
  4.3 Inhomogeneous Poisson process model (attraction)
  4.4 Point stacking and building heights
5 Standard statistical measures for point processes
  5.1 Ripley’s K function
  5.2 Nearest neighbor measures
  5.3 Scan statistics and baseline models
6 Comparison data: U.S. Census geography and household parcel lots
  6.1 U.S. Census geography
  6.2 Household distribution data in the US
  6.3 Urban, suburban, and rural classification
7 Comparison measure
8 Analysis and results
  8.1 Software
  8.2 Comparison of point distributions
9 Example: Network diffusion over a spatially embedded network
  9.1 Spatial Bernoulli Graphs and Simulation
  9.2 Network diffusion
  9.3 Simulated diffusion over Portland, OR
10 Conclusion and discussion
11 Acknowledgments
References

Abstract

Spatio-demographic data sets are increasingly available worldwide, permitting ever more realistic modeling and analysis of social processes ranging from mobility to disease transmission. The information provided by these data sets is typically aggregated by areal unit, for reasons of both privacy and administrative cost. Unfortunately, such aggregation does not permit fine-grained assessment of geography at the level of individual households. In this paper, we propose to partially address this problem via the development of point process models that can be used to effectively simulate the location of individual households within small areal units.

1. Introduction

Spatio-demographic data sets are increasingly available worldwide, permitting ever more realistic modeling and analysis of social processes ranging from mobility to disease transmission.
The information provided by these data sets is typically aggregated by areal unit (e.g., the state, county, tract, block group, and block hierarchy of the U.S. Census), for reasons of both privacy and administrative cost. Unfortunately, such aggregation does not permit fine-grained assessment of geography at the level of individual households, a scale that is potentially important for accurate modeling of micro-social processes such as transmission of disease between households, daily mobility patterns, or patterns of interpersonal contact. While the potential to model such phenomena across large geographical areas thus exists, efforts are hampered by a lack of data on household location.

In this paper, we propose to partially address this problem via the development of point process models that can be used to effectively simulate the location of individual households within small areal units. Given basic information such as number of households, general pattern of land use, and/or population of neighboring units, our objective is to identify a probability distribution over household locations within a polygonal region whose average spatial properties reflect the corresponding properties of the unobserved true household distribution in that region. Examples of targeted properties include standard point process descriptives (Ripley 1988; Diggle 2003), such as the mean nearest neighbor distance, measures of spatial clustering (e.g., the F and G functions), and the mean K function value. While the resulting distributions will not reproduce household locations with perfect fidelity, the approximations may nevertheless prove adequate for modeling of basic social processes. The models and test procedures proposed in this research also provide relatively generic techniques for statistical treatment of other forms of geocoded point data localized only to an areal unit (e.g., locations of individuals, events, or landmarks).
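To make one of the targeted descriptives concrete, the following is a minimal sketch of the mean nearest neighbor distance for a planar point pattern. It is written in Python purely for illustration (the paper's own modeling is in R), uses a brute-force search rather than any spatial index, and the function name and the coordinates are hypothetical.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point.

    Brute-force O(n^2) search; adequate for the small point counts
    found within a single small areal unit.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to the nearest point other than itself.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)


# Hypothetical household coordinates in arbitrary planar units.
households = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
print(mean_nearest_neighbor_distance(households))
```

Computing the same statistic on observed and simulated patterns for one areal unit gives a simple basis for comparing them.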
While the problem of imputing household locations can be approached in many ways, our focus within this paper is on the application of simple, scalable models that require no extra information (beyond areal unit and household count) from the analyst. Such models can be employed in virtually any setting, and are a natural starting point for any more complex modeling effort. To that end, we begin with two baseline models (a constant-intensity N-conditioned Poisson process and a low-discrepancy sequence model) that incorporate only population density within the areal unit. We then extend the density-based models by incorporating additional information from the areal units themselves, using an inhomogeneous Poisson framework in which households are more likely to be found near polygonal borders (a common phenomenon in the observed data).

To evaluate these simple point process models, we compare their behavior with observed household location distributions from three different communities. Test samples consist of household location data from Portland, OR, Deschutes County, OR, and Irvine, CA², with areal units given by the 2000 U.S. Census. All modeling is performed in R (R Development Core Team 2010). Our test cases include examples of urban, suburban, and rural settings, with varying spatial scale and levels of population density.

Evaluation of the suggested point processes on our three communities suggests that simple models can provide quite reasonable approximations of household location distributions for small areal units. Performance degrades substantially for larger units, although the inhomogeneous model shows some potential within more urbanized regions. Practical suggestions are given for the use of these and related point processes within large-scale simulations, and for applications of this technique to settings beyond the U.S. (and the developed world more generally).
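The two baseline models can be illustrated with a short sketch. The code below is not the authors' implementation (which is in R); it is a Python illustration under stated assumptions: the areal unit is a simple polygon with nonzero area, points are drawn in its bounding box and kept only if they fall inside the polygon, and the low-discrepancy variant uses a Halton sequence with bases 2 and 3 as one possible quasi-random choice. All function names and the example polygon are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def halton(index, base):
    """index-th element (1-based) of the van der Corput sequence."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def simulate_households(polygon, n, method="uniform", seed=None):
    """Place exactly n points inside polygon by bounding-box rejection.

    method="uniform": constant-intensity process conditioned on n points.
    method="halton":  quasi-random (low-discrepancy) placement.
    Assumes the polygon has nonzero area; otherwise the loop never ends.
    """
    if method not in ("uniform", "halton"):
        raise ValueError("method must be 'uniform' or 'halton'")
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    points, k = [], 0
    while len(points) < n:
        k += 1
        if method == "uniform":
            u, v = rng.random(), rng.random()
        else:
            u, v = halton(k, 2), halton(k, 3)
        x = xmin + u * (xmax - xmin)
        y = ymin + v * (ymax - ymin)
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical L-shaped areal unit containing 50 households.
block = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
uniform_pts = simulate_households(block, 50, method="uniform", seed=1)
halton_pts = simulate_households(block, 50, method="halton")
print(len(uniform_pts), len(halton_pts))
```

Conditioning on the household count means every simulated pattern has exactly the number of points reported for the unit, so only the locations vary between draws. The border-attraction model is not sketched here because its intensity function is not specified in this section.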
The remainder of the paper is organized as follows: (1) a general discussion of human settlement patterns; (2) background on spatial data and household distribution; (3) an introduction to the proposed point process models; (4) standard statistical measures for point processes to be used for evaluative purposes; (5) comparison data and U.S. Census geography to be used for our evaluation study; (6) the comparison measures used for our evaluation study; (7) evaluation study analysis and results; (8) a spatially informed network diffusion example; and, finally, (9) conclusion and discussion.

2. Human settlement patterns

Settlement patterns play an important role in shaping human interaction and the demographic processes which result. A classic example is that of marriage in modern Western societies: couples in such societies rarely marry without prior meeting and extensive face-to-face interaction, and marriage is thus disproportionately propinquitous (Bossard 1932). Many demographic processes, such as mortality, fertility, and mobility, are also influenced by human settlement patterns (see, e.g., Freeman and Sunshine 1976; Guilmoto and Rajan 2001; Binka, Indome, and Smith 1998); however, making use of such geographical information is frequently difficult due to limitations on data availability. For example, in the United States information on population within aggregate areal units is readily available (e.g., via the U.S. Census), but the coordinates of individuals and households are undisclosed due to privacy concerns. There is thus a distinct need for a methodology to generate household (or individual) distributions over small-scale areal units such as census geography, so as to inform statistical models, agent-based simulations, and the like.

Adding to the difficulty of this problem is the need for plausible models to be easily computable. For instance, the year 2000 U.S. Census reports population in over 8 million

² Data from Deschutes County Geographic Information Systems (GIS) office;
