ARTIGO ARTICLE 1753 Epidemiologia, Geografia médi- Neste artigo, faz-se uma revisão acerca

Resumo de aspectos conceituais e práticos relacionados à de aspectos conceituais e práticos relacionados aplicação de dados georreferenciados na pesquisa abor- epidemiológica. Iniciando com a tradicional dagem de mapeamento de doenças da geografia em médica, discute-se heuristicamente com base tópicos exemplos da literatura epidemiológica, como tipos de dados georreferenciados, implica- es- ções para a análise de dados, autocorrelação desta- pacial e as principais estratégias analíticas, cando-se os estudos de mapeamento da distribui- ção espacial de eventos de saúde, detecção de agre- gados espaciais de casos, avaliação de exposição em estudos de saúde ambiental e estudos ecológi- cos. Os comentários finais salientam tópicos espe- ciais que merecem desenvolvimentos futuros, in- cluindo os dilemas relacionados à incorporação do conceito de espaço na pesquisa epidemiológica, aspectos relacionados à qualidade dos dados e con- fidencialidade, o papel dos estudos epidemiológi- cos na pesquisa com dados espaciais, análise de sensibilidade e modelos espaço-temporais. Palavras-chave ca, Métodos epidemiológicos, Análise de pequenas áreas, Estudos ecológicos, Conglomerados , Medical geography, Medical Epidemiology,

This paper reviews some conceptual 1,2 Abstract and practical issues regarding the application of and practical issues regarding the application georeferenced data in epidemiologic research. of Starting with the disease mapping tradition of geo- geographical medicine, topics such as types referenced data, implications for data analysis, ap- spatial and main analytical on proaches are heuristically discussed, relying most examples from the epidemiologic literature, of them concerning mapping disease distribution, detection of disease spatial clustering, evaluation of exposure in environmental health investiga- tion and ecological correlation studies. As for con- cluding remarks, special topics that deserve fur- ther development, including the misuses of the concept of space in epidemiologic research, issues the to data quality and confidentiality, related role of epidemiologic designs for spatial research, sensitivity analysis and spatiotemporal modeling, are presented. words Key Epidemiologic methods, Small-area analysis, Eco- logical studies, Clustering Georeferenced data in epidemiologic research epidemiologic data in Georeferenced em epidemiologia Dados georreferenciados Instituto de Medicina Departamento de Endemias Social, Universidade do Estado do Rio de Janeiro. 2 Samuel Pessoa, Escola Nacional de Saúde Pública, Fundação Oswaldo Cruz. Rua Leopoldo Bulhões 1480, Manguinhos. 21041-210 Rio de Janeiro RJ. [email protected] 1 Guilherme Loureiro Werneck Werneck Loureiro Guilherme 1754

Werneck, G. L. G. Werneck, Introduction For nearly every epidemiologist, however, John Snow´s investigation of the cholera epidemic in Georeferenced data, also known as spatial, geo- London is the most famous work demonstrating graphical or geospatial data, are the basic pieces the importance of studying the geography of dis- of information needed to identify the geographic ease11. Considered a classical example of epide- location of phenomena across the Earth’s surface. miologic reasoning, which led to development of In general, georeferenced data consist of measure- a water-borne theory of cholera transmission, his ments or observations taken at specific locations work also became famous in medical geography (points referenced by latitude and longitude) or because he used a dot-map to plot the location of within specific regions (areal data). In epidemio- cholera deaths around the Broad Street pump in logic research, this type of information is mainly London Soho´s district. Nevertheless, the belief used to investigate the relationship between geo- that Snow used the dot-map to determine the referenced health events data and aspects related source of the cholera outbreak and to make a causal both to individual characteristics (e.g. genetic, be- connection between the removal of the pump and havioral and demographic) and contextual fac- the end of the outbreak seems not to be support- tors (e.g. socioeconomic neighborhood conditions, ed by evidence12. As a matter of fact, Snow already physical environment). Mapping disease distri- had his theory on the transmission of cholera bution, detection of disease spatial clustering, eval- before collecting data to test it13. uation of exposure in environmental health in- Despite the longstanding tradition of disease vestigation and ecological correlation studies are mapping and geographically oriented research, some examples of possible applications of geo- during the first half of the 20th century epidemi- referenced data in epidemiologic studies. ologists were more inclined to focus their research The geographical distribution of disease has on the time dimensions of disease distribution, been considered a key element in epidemiologic and progressively more and more on individual research, as indicated by the importance given to characteristics, a result of the rising emphasis on the description of health events according to “per- the biological causes of disease8,14. son, place and time” in the classic epidemiology Advances in geographic information systems textbooks1,2. In fact, studies on the geographical (GIS) permitted a remarkable increase in the effi- distribution of diseases come back to the 18th and ciency of processing and analysis of complex geo- 19th centuries, when the term “medical geogra- referenced data involving different variables at a phy” was devised3. variety of geographical scales, providing new tools Among the precursors of geographic studies for epidemiologists to incorporate place and space of disease are the physicians James Lind4, mainly in their investigations15. Since the 1970s, GIS and recognized for his work on scurvy, and Leon- related technologies, such as remote sensing, have hard Ludwig Finke5, who published in 1792 what spread rapidly to many scientific and technical has been considered the most detailed conceptu- fields, including public health and epidemiolo- al contribution on medical geography written to gy14-17. Today GIS and remote sensing are con- that point and the first to systematize world-wide sidered important tools in environmental health data5,6. A long tradition on disease mapping also research and disease surveillance15,18, and have started at that time. Aparently the first dot map been used to investigate patterns of disease spread applied to public health problems is due to Val- and inform control strategies for infectious dis- entine Seaman, in 1798, describing the distribu- eases, in particular vector-borne and zoonotic tion of yellow fever cases in New York7. diseases19,20, to help define the boundaries of the Early roots of epidemiology as a discipline can communities or neighborhoods where study par- be found at this time, with research focusing main- ticipants reside in multilevel studies21,22, and to ly on the relationship between societal conditions estimate socio-demographic and environmental and health8. Various researchers applied what variables21,23. This process of increasing incor- modern epidemiology usually calls ecological study poration of GIS in epidemiologic research should designs to address the problem of disease varia- not be considered simply as a result of techno- tion across different places. For instance, André- logical forces, but must be put in the context of Michel Guerry, in 1833, explored the variation of the renewed efforts detected after the 1960s to suicide and homicide rates across regions of integrate social sciences and epidemiology8. France9 and Engels, in 1892, cited evidence of vari- In this review, I intend to introduce the basic ation in mortality rates across different cities and approaches used by epidemiologists to deal with streets surrounding Manchester10. georeferenced data. I will focus on the strategies 1755 Ciência & Saúde Coletiva, 13(6):1753-1766, 2008

for analysis of this type of data, building upon ature could have been measured at any place, examples from different areas of epidemiologic hypothetically there are an infinite number of research. I wish to keep apart as much places that you can place monitoring stations, as possible, emphasizing the conceptual issues but unfortunately you have, in general, only a behind the techniques. sample of possible locations to get data on tem- For the sake of completeness, it is necessary perature. On the other hand, a discrete spatial to mention that many authors use the terms “spa- domain that you can count the number tial epidemiology” or “geographical epidemiolo- of locations in which observations are taken. An gy” to define this area of investigation24-28. None- example is the number of dengue cases recorded theless, I prefer not to support an unnecessary in neighborhoods of Rio de Janeiro in January autonomy to what I see as a field of epidemio- 2008. Here the spatial domain is discrete: you can logic practice that just carry within its roots a count the number of neighborhoods that con- criticsm concernig the excessive focus of modern figure the region of analysis. epidemiology on the individual causes of disease. A fixed spatial domain is the one that does not change from one realization of the spatial process to the next30. Heuristically, a realization of a spa- Types of georeferenced data tial process is simply a set of georeferenced obser- vations. Imagine you asked your research assis- Broadly speaking, geospatial data can be point tants to do a fieldwork using some kind of instru- referenced or area referenced. Point referenced ment that measures soil contamination by Ar- data are observations registered at specific loca- senic, a known carcinogenic substance. The re- tions that might be identified by latitude and lon- sults of your study are a set of measures taken at gitude, for instance the location of cases of dis- different spatial locations, and might be consid- ease and the location of air pollution monitoring ered one realization of a spatial process. However, stations. Area referenced data are observations if you send other assistant researchers to do the specific to a region (e.g. tracts, neighbor- same work, then they will probably take soil sam- hoods). For both you may also have measured ples from different locations, and this will be an- attributes, such as demographic characteristics other realization of the spatial process. Anyway, of the disease case, specific measurements of pol- although there are different samples (realizations), lutants in each monitoring station, rates of dis- the spatial domain has not changed between the ease by census tracts, socioeconomic variables first and second study. for neighborhoods and so on. Hence, georefer- A random spatial domain can be conceived enced health data combine the usual informa- in situations where the focus has switched from tion available in epidemiologic studies, that is, studying the attribute itself to studying the loca- values for attributes of some object, with infor- tions. Imagine you label your data as 0 or 1, be- mation about their locations. ing 1 whenever the Arsenic level exceeds a given When deciding to choose the appropriate threshold (say the level above which soil con- analysis strategy for georeferenced data, one needs tamination is considered unacceptable) and 0 to consider the for the spatial otherwise. Throw away all points returning a process underlying the available data. A possible value of 0 and keep only those with values of 1. approach is to distinguish data types according Now there is no interest in studying the attribute, to the nature of the spatial domain in which the because all points represent the same condition data is observed29,30. By spatial domain I infor- (Arsenic concentration above the threshold). The mally where things are being observed. The interest is now on the spatial arrangement of spatial domain might be continuous or discrete observations, which is, together with the number (do not confuse this with the attribute being of points observed, the outcome of the spatial measured, that is, whether the variable measured random process30. Here, the spatial domain is is continuous or not) and fixed or random30. random since it will change in every new realiza- A continuous spatial domain means that what tion. Even when attributes are available at each you are observing can be, theoretically, measured location, the important statistical feature of these everywhere within that domain. For instance, data is the random domain30. imagine that you are interested in obtaining data Based on the nature of the spatial domain, on temperature. You will probably gather this georeferenced data can be divided in three sub- data from monitoring stations that are located types: geostatistical data, areal or regional data at specific points in the space. However, temper- and spatial point patterns29,30. 1756

Werneck, G. L. G. Werneck, Geostatistical data Areal, regional or lattice data

Consist of measurements that can potentially Involve observations associated with spatial be taken in any location in space, although the regions. Data are not measurable at any location actual data are sampled at specific locations in a in space and are artificially gathered at sites or spatial continuum (fixed and continuous spatial areas usually defined for statistical or adminis- domain). Statistical approaches to analyze this trative purposes. The whole spatial domain in type of data are commonly known as geostatis- exhaustively divided in areas (fixed and discrete tics. is a branch of applied mathe- spatial domain). Regional data can be regularly matics developed in the early fifties to help obtain or irregularly spaced. An example of regular re- better predictions in mineral prospection31. Usu- gional data is that obtained by remote sensing. ally the geostatistical approach uses the observed Sensing devices on board of satellites measure data to estimate values at unsampled locations reflected or emitted electromagnetic energy from and produce a continuous surface showing the the earth surface, and data are displayed in a se- variation of the attribute across space. The spatial ries of small rectangles (pixels) of the same size47. regression technique known as is the usu- However, most data used in epidemiologic re- al interpolation method used in this setting31. Typ- search are irregularly spaced, for example, rates ical geostatistical approaches have been used in ep- of disease or socioeconomic indicators measured idemiologic research for predicting the spatial dis- at county, neighborhood or census tracts level. tribution of insect vectors of infectious disease32-34 Regional data form the basis of the so-called and of air and soil pollutants35-37. multiple-group ecological study48 in which the However, one should be aware that, in gener- units of analysis are, in general, geographically al, data on health events of individuals are only defined areas. The aims of ecological studies are usually registered where population exists. There- to describe geographical patterns of disease fre- fore, strictly speaking, geostatistics is probably not quencies and risk factors, and estimate putative the canonical approach for disease data, since cases ecological correlations between variables. In the can occur only in selected inhabited parts of larg- purely descriptive ecological study no data on er geographical regions. Because of the patchiness exposure is available, and the major interest is to of population distribution, a continuous surface map the spatial distribution of disease rates, aim- of disease rates would not make a sense in many ing at detecting areas with a significant higher (or epidemiological applications. Of course it is pos- lower) incidence as compared to some expected sible to partially circumvent the problem by de- rate. The most common approach for this ob- limiting study areas that are completely populat- jective is to use a choropleth map, which is a class ed and by relaxing the assumption of a continu- of quantitative thematic maps49. A choropleth ous spatial domain. Actually, it seems that the geo- map, also called area or shaded map, is a carto- statistical approach is flexible enough to support graphic representation employing color or shad- different types of geospatial data, and recent de- ing schemes, graded in intensity from light to velopments have added more and more flexibility dark, to depict variability of data distribution to this type of analysis38,39. In fact, there are many across regions49,50. The name derives from the examples of applications of kriging and other Greek words choros (place), and pleth (value, smoothing techniques to produce continuous quantity)49,50. Choropleth maps appear frequently surfaces for data that are not typically continu- in epidemiologic studies, and many national at- ously distributed across space, such as the preva- lases of mortality and disease distribution have lence and parasitic load of ascariasis obtained by used this approach51-53. household coproparasitologic surveys in Duque Ecological or geographic correlation studies de Caxias (Brazil)40,41, positivity for rotavirus in- are those that examine the spatial variation of fection in stool specimens collected at laborato- environmental, socioeconomic, demographic and ries across the United States42, prevalence of ma- lifestyle factors in relation to health events, all laria infection in different survey sites in Mali43, measured on a geographic (ecologic) scale24. Ex- incidence of visceral leishmaniasis in census tracts posure variables defined for regions are classi- of Teresina (Brazil)44, epidemiologic surveillance fied in three categories: aggregate measures based data of influenza-like illness in France45 and Ja- on individual data (e.g. proportion of smokers, pan46, and breast and cervical cancer mortality in mean income), environmental measures (e.g. New England counties (United States)38. temperature, air pollution), and global measures or contextual attributes (e.g. population density, 1757 Ciência & Saúde Coletiva, 13(6):1753-1766, 2008

social cohesion)48. Outcomes are generally ex- events (e.g. the location of disease cases), and are pressed as incidence or mortality rates for that referred as unmarked patterns30,59. However, if region. This type of approach has long tradition you have additional variables attached to the lo- in sociology54 and comes from that discipline one cations (e.g. socio-demographics, time since di- of the most famous example: Durkheim´s study agnostic for cases) then it is called a marked pro- on suicide55. Durkheim compared suicide rates cess. A case-control study in which the geograph- between Prussian provinces classified according ical location of both cases and controls is known to the proportion of population that was Protes- and additional variables are collected is a typical tant. He found that suicide rates were higher at marked point pattern frequently used in epide- provinces with higher proportion of Protestants. miologic research. Although Durkheim had not actually concluded Spatial point patterns arise in many fields of from that evidence that suicide was more fre- investigation, for example, in plant ecology to quently among Protestants, this individual-level study the spatial distribution of shrubland60 and inference coming from an ecological study be- very large trees in a forest61, in the analysis of came “the” example of a typical bias called “eco- urban land use to study the distribution of fast- logical fallacy”. The ecological fallacy refers to the food restaurants around schools62, in criminol- fact that the degree of association between an ogy to study patterns of urban crimes63, and in exposure and disease may differ in ecological data, geology to investigate spatial patterns of volcano as compared to the same association measured eruptions64. In these areas of investigation, hav- using data obtained on individuals56. Since none ing the background hypothesis of a complete of the regions were entirely Protestant or non- spatial random distribution might be interest- Protestant, it is not possible to exclude the hy- ing, at least for a starting point, because there is pothesis that are just the minority (Catholics or no a priori obvious reason for spatial heteroge- Jews) that are committing suicide in the provinc- neity in the distribution of these events59. In epi- es with higher proportion of Protestants. demiology, however, population variation across There are several examples of ecological cor- space is a major factor leading to geographical relation studies in epidemiologic research. For in- clustering of disease cases and should always be stance, various ecological studies investigated the considered in the analysis, since we are interested association between per capita consumption of in spatial arrangements that are not explained by specific alcoholic drinks and mortality from heart this specific feature. Urban land occupation and disease across countries57. Most of them suggest- the heterogeneous distribution of risk factors ed that wine was more effective in reducing risk of across space are other important factors that mortality than beer or spirits, but evidences from might explain spatial patterns. individual studies indicate that the benefits are Although methods for were attributable primarily to the alcohol content rather developed specifically for each of these three types than to other components of each drink57. In of data, it is common to see methods originally Northeast Brazil, an ecologic study in 165 munic- developed for one type being applied to analyze ipalities of the State of Ceará found that, among data of a different category. For instance, region- others, the level of inequality, population growth, al data might be artificially allocated to some and the presence of a railroad in the municipality point inside the area (e.g. the seat of the state were predictors of the incidence rate of leprosy58. capital or the centroid, defined as the center of gravity of the region). Doing so, data will be dis- Spatial point patterns played as points and might be analyzed as geo- statistical data or point patterns. Also, the loca- Comprise a set of locations of events, usually tion of cases might be aggregated in regions and indexed by geographic coordinates (latitude and analyzed as lattice data. These approaches, when longitude), in a defined study region59. A spatial applied, need to have their underlying assump- point pattern is an example of the above-mentioned tions clarified and justified. situation of a random spatial domain. In this case the locations of the events themselves are the phe- nomena of interest, and the investigation focuses on whether the pattern is exhibiting complete spa- Spatial autocorrelation: a key concept tial randomness, clustering, or regularity59. In the simplest situation, spatial point pat- The key issue in the analysis of georeferenced data terns include only information on the location of is that they often exhibit some spatial structure 1758

Werneck, G. L. G. Werneck, in the sense that there is a tendency of observa- Spatial autocorrelation arises as a result of the tions closer together to be more alike than obser- heterogeneous distribution of risk factor across vations farther apart29. The property that geo- space. On the other hand, spatial autocorrela- graphically nearby values of a variable tend to be tion due to mechanisms means that similar on a map, that is, high values tend to be areas or observations close together are more located near high values, and low values near low alike because proximity facilitates transmission. values, is called spatial autocorrelation65. In this The reaction mechanism is the major mechanism case we are saying that exist some positive spatial underlying spatial autocorrelation in non-trans- autocorrelation, because a negative spatial auto- missible diseases, and interaction is implicated in correlation means that nearby regions or points spatial clustering of transmissible diseases. These tend to be different, that is, one showing high two mechanisms may operate at the same time, values of the attribute and the neighbor low val- in particular for infectious diseases. The term spa- ues, and vice-versa. tial dependence seems to be more appropriate in In ordinary statistical analysis, researchers the case of the interaction mechanism, since we often use the correlation coefficient to measure are dealing with the so-called dependent happen- the direction and strength or degree of the rela- ings, that is, disease incidence in individuals or tionship between a pair of quantitative vari- regions depends on the prevalence of the infec- ables66. A variant of conventional correlation is tion in the population70. However, some authors the serial correlation, which refers to the correla- do not make this distinction and use the terms tion between measures of a single variable over spatial autocorrelation and spatial dependence successive time intervals65. The geographic ver- interchangeably25,71. sion of serial correlation is called spatial auto- There are two major reasons for taking spa- correlation (the prefix “auto-” means self), the tial autocorrelation in consideration in epidemio- relationship between observations on the same logic analysis of georeferenced data. First, most variable taken at different locations in space65. conventional statistical approaches assume that To assess the nature and degree of spatial au- observations are independent, which is clearly vi- tocorrelation, it is necessary to represent the spa- olated when spatial autocorrelation exists (actu- tial arrangement of observations in order to get a ally, the statistical assumption refers to the error sense of how close or distant there are apart from structure, but lets leave formalities behind)65. In each other67. Then we use a set of rules, called this situation, it is necessary to take spatial auto- weighting function, to express the degree of prox- correlation among observations into account in imity between observations67. For instance, for order to obtain valid estimates of regression coef- each possible pair of observations in space we may ficients, confidence intervals, and significance lev- attribute a value of one if the observations are els29. In the statistical jargon, positive spatial au- nearby (sometimes we say that they are neigh- tocorrelation increases the likelihood of the null bors) and zero otherwise. There are many other hypothesis rejection when it is true65. For instance, options for defining these weights, and they may take a study on the association between social dep- be based on distances between points (geostatis- rivation and the incidence of breast cancer, both tical and point pattern data) or centroids (areal variables measured at the municipality level (re- data). In this case, pairs of observations might be gional data). Consider that the distribution of both defined as neighbors using a dichotomous classi- variables show spatial autocorrelation. Even if the fication (yes/no) or entering the actual distance as truth is that there exist no association between a measure of the degree of proximity between deprivation and breast cancer incidence (null hy- observations. For areal data it is common to use pothesis), the results of a study ignoring the lack indicators of proximity based on whether the re- of independence between observations might well gions share a boundary or not. In any case, it is find as statistically significant such association, important to consider the fact that any measure thus rejecting the null hypothesis when this is true. of spatial autocorrelation will be influenced by Because of spatial autocorrelation, there is some the choice of the neighboring weights. redundancy in the information provided by geo- There are basically two major mechanisms referenced data, meaning that more spatially au- responsible for spatial autocorrelation to occur tocorrelated than independent observations are with disease data: reaction and interaction68,69. needed to attain similar information65. Werneck The reaction mechanism implies that neighbor- and Maguire72 show an example on how ignoring hoods or nearby observations behave similarly spatial autocorrelation may bias regression coef- because they share a common background risk. ficients and estimates. 1759 Ciência & Saúde Coletiva, 13(6):1753-1766, 2008

The second reason to consider spatial auto- scriptive purposes, hypothesis generation, pre- correlation is conceptual. The detailed description diction and forecasting. One main interest in ep- of how things are distributed in space might be idemiology is prediction of future occurrences at used to support prediction. For instance, based specific geographic locations, which can be im- on available georeferenced tuberculosis data at the plemented through space-time models75,76. Model municipality level, Braga73 used geostatistical ap- adjustment involves the use of the autocorrela- proaches to obtain better estimates in areas where tion structure to obtain more accurate measures surveillance reporting of cases was considered in- of effect and standard errors estimates. appropriate. Lagrotta et al.74 used the spatial pat- Gatrell and Bailey77 proposed that the analy- terns of entomological parameters related to the ses of georeferenced data may be divided into distribution of Aedes aegypti to identify areas for three broad classes: mapping, exploratory tech- targeting control actions. Explicitly accounting for niques and modeling methods. spatial autocorrelation in a statistical model might also be used as a proxy for unknown or unmea- Mapping sured variables and improve model specification65. In other situations, spatial autocorrelation should Mapping is the most elementary technique be considered as alternative explanations for in- employed to describe the basic spatial features of terpreting study results. Suppose that you imple- disease data. For point patterns data, a dot map ment a community trial for controlling vector- might be used to make simple displays of the lo- borne disease in which the intervention proposed cation (usually the residency) of disease cases78. is spraying houses and the peridomestic environ- One should be aware that visual clustering of dots ment with insecticides. The design you choose is might reflect only the heterogeneous distribution some kind of community intervention trial, in of population across space. Interpretation of dot which the area under study in divided into blocks, maps is also problematic since the mapped loca- some of them randomly allocated to receive in- tion might not reflect the place where the causal tervention and the other not. Interpretation of mechanism leading to disease actually operated. eventual changes in the incidence of the infection This is especially important for non-infectious in the study blocks should consider the possibility diseases with long latency and induction periods. of a spillover effect, that is, an area with no inter- Even for some infectious diseases, such as tegu- vention might benefit from being a neighbor of mentary leishmaniasis, the risk of infection is an area in which spraying was performed (or, on mainly associated with economic and leisure ac- the contrary, spraying would increase even more tivities in forested areas, including fishing, hunt- mosquito population in surrounding areas due ing, and ecotourism, and the household does not to a repellent effect of insecticide). indicate where transmission occurred. In any case, As a matter of fact, spatial autocorrelation has a dot map is an interesting option to show spatial a dual nature67. Sometimes it is considered a sta- density of phenomena, but two critical choices tistical nuisance asking for new analytical meth- should be made: dot size and dot value or quan- ods. At times, it is regarded as an intrinsic charac- tity to be represented by each dot49. Using a small teristic of spatial processes carrying essential in- dot size leads to the sense of a sparse distribution, formation to be considered when interpreting re- but choosing a large dot size may be unattractive sults of studies using georeferenced data65,67. In a because it increases too much the density of the certain way, this duality represents the focus of distribution. If a dot will represent multiple events statisticians and geographers, respectively. Epide- (then dot location does not necessarily relate to miologists have only recently incorporated spa- the specific location of a phenomena), choose a tial analysis in their framework, and still need to dot value so that the dots just coalesce in the most develop meanings for spatial autocorrelation that dense area on the map but a few dots are still are appropriate for their field of investigation. represented in the sparse areas of the map49. Con- sider also using different symbol or colors to rep- resent variations in the attribute being mapped, Analytical approaches for instance to discriminate between cases and controls in a case-control study78. Statistical methods for spatial data analysis gen- Choropleth maps is the most commonly erally are used for either characterization of spa- strategy for the visual display of regional data. tial structure or model adjustment. Character- Disease maps of this type usually show directly ization of spatial structure is well suited for de- age- and sex- standardized rates or standardized 1760

Werneck, G. L. G. Werneck, mortality (or morbidity) ratios (SMR), achieved duce a probability map81. Assuming that the by indirect standardization24. Many problems number of cases of disease in each region follows emerge when mapping areal data, in particular a Poisson distribution with a constant mean, a concerning the choices regarding scale or resolu- map is created which shows those areas with tion, number and boundaries of classes, and the unusually high or low rates. Although useful as color or shading scheme24,78. The same data an exploratory tool, the Poisson model assumes drawn in different levels of resolutions (e.g. cen- independent observations and ignores the possi- sus tracts and municipalities) may lead to differ- bility of spatial autocorrelation between neigh- ent patterns and interpretations. Aggregation of boring areas29. small areas usually lead to loss of information, A proposed solution to reveal spatial patterns and the result is a spatial pattern that is pushed that may be hidden by noisy data is to smooth towards the values of the more populated re- the incidence rates using Bayesian methods79,80,82. gions24. A related question is the differences in The smoothed estimates represent a compromise geometry and size of areal units, larger ones dom- between the actual observed value and the mean inating the depicted pattern. The use of grada- value of the whole region or some local value tions of color and shading helps simplifying the taking into consideration the possible dependen- message to be transmitted, but arbitrary color cy between neighboring areas79,80,82. Although intensities and types may induce the reader to calculations might be complex the idea is simple, focus on specific areas, drawing attention to some that is, the technique estimates the incidence rate features and leaving other aspects less evident78. in a given region as a weighted average of the Most mapping softwares supply different rates in this region and its neighbors. However, ways to generate data classes. If data has a rect- one should be concerned about the degree of angular distribution, classes showing equal in- smoothing to employ, because it will determine a tervals might be a good choice. The natural tradeoff between smoothing away areas with truly breaks technique scrutinizes the actual data dis- high incidence (low sensitivity) or not identifying tribution to find cut points for creating classes. correctly areas with low risk (low specificity)24. The equal area procedure forces the classes to represent approximately the same area in the Exploratory methods map, but the number of units may vary across classes. Cutoff points based on oblige Models for describing georeferenced data of- the classes to have approximately the same num- ten decompose spatial variation into two main ber of units. If the data has normal distribution components83,84: data = large-scale variation + you can create classes representing deviations small-scale variation. from the mean value. All these techniques may Large-scale variation, first order effect or lack be useful in some situations, but if there exists a of stationarity, is a regular variation, analogous clear epidemiological meaning for defining class- to secular trend in time-series, but taking place in es (e.g. an accepted threshold point for surveil- space83. It is also called spatial gradient and refers lance of infectious diseases) this should be con- to the variation in the mean value of the process sidered. Concerning the number of classes, there (e.g. incidence rate) in space and can be repre- is no universal accepted rule, but it is rare to see sented as a function of geographical coordinates maps showing less than four and more than elev- or of variables that have their spatial distribu- en classes. In general more classes are used if you tion akin to the outcome81,83. For point patterns have more geographical units. Anyway, the meth- data, first order effect is often describes as the od used to generate data classes and the number “intensity” of the process, defined as the mean of classes to be displayed are critical points to be number of events per unit of area at a spatial considered because they may have strong influ- location59. The so-called kernel estimation is the ence on interpretation of such maps. usual technique employed for examining large- Small-area mapping studies have the addi- scale variation in spatial point patterns59. tional problem of instability of rate estimates for Explanations for large-scale variation usual- areas with small populations. In this case, the ly rely on variables that vary slowly across large addition or deletion of one or two events can geographical regions, such as altitude, tempera- cause dramatic changes in the observed value, ture, vegetation, and some socioeconomic char- and the most extreme estimates in a map tend to acteristics. For instance, the decreasing north- be provided by the least reliable data79,80. south gradient in prostate cancer mortality in One way to address this problem is to pro- the United States is inversely associated to the 1761 Ciência & Saúde Coletiva, 13(6):1753-1766, 2008

ultraviolet radiation gradient85. Montenegro et tion is both a statistical necessity and an addi- al.86 found a spatial gradient in the incidence rates tional source of information on how the process of leprosy in the State of Ceará, Brazil, with a evolve across space. Most ecological regression tendency of high rates to be concentrated on the studies, however, do not include spatial autocor- north-south axis in the middle region of the state, relation parameters in their analytical models. In which was, at least in part, attributed to the pro- this sense, these are not spatial studies, since the cess of urbanization and the heterogeneous dis- geographical structure of data is not being al- tribution of underlying factors such as crowd- lowed for in the analysis. Non-spatial ecological ing, social inequality, access to health services or analysis work as if the geographical areas are in- environmental characteristics that determine the dependent from each other (no spatial autocor- transmission of Mycobacterium leprae. relation structure), and results from this type of Small-scale variation around the gradient, or analysis would be the same if you randomly re- second order effect, results from clustering of high arrange geographical units across space. In epi- or low values across the region and from spatial demiologic research, one of the first approaches autocorrelation81,83. To study small-scale varia- to include spatial autocorrelation in the analysis tion accurately, it is often necessary to remove of ecological data is due to Cook and Pocock89. the effects of large-scale variation83. Unfortunate- They re-analyzed data from an ecological study ly, this decomposition is not unique, and re- in Great Britain during 1969-73 based on 253 searchers have too much flexibility to “explain” towns aiming at uncover the possible contribu- spatial variation as almost totally due to small- tions of drinking water quality, climate, air pol- scale variation or to assume a certain degree of lution, blood groups, and socioeconomic factors large-scale variation and leave less to be explained on the regional variations of cardiovascular mor- as local clustering83. tality90. They showed that previous results based Once the spatial gradient has been identified on ordinary regression methods overstate the and removed by using trend surface analysis, significance of the regression coefficients89. polishing techniques, or other smooth- There are basically two major ways in which ing techniques87,, the residual spatial variation or spatial autocorrelation can be managed in regres- small-scale variation might be evaluated by us- sion modeling: as spatial-lag and spatial error ing, for instance, the variogram (geostatistical models84. Spatial-lagged regression models take data), the Moran or Geary autocorrelation coef- care of spatial autocorrelation by considering that ficients (regional data), and the K-function (point the dependent (outcome) variable in one area is patterns)81. For example, Jones et al.88 used the K- affected by variables in nearby areas84. For in- function in a case-control study to determine the stance, when studying mortality from respirato- degree of clustering in road traffic accidents out- ry diseases in areas as a function of air pollution, comes. In Teresina, Brazil, after removing the spa- it is possible to include as exposure variables not tial gradient by using smoothing techniques, Wer- only the measure of pollution in that area but neck et al.44 employed the Moran autocorrelation also measures taken at neighborhoods. An alter- index to identify small-scale variation in the inci- native is to include the outcome as lagged vari- dence of visceral leishmaniasis across census tracts. ables. For example, the incidence of tuberculosis in one area might be expressed in a regression Modeling model as a function of socioeconomic variables and the incidence of tuberculosis in nearby re- The objective here is to describe associations gions. The first model is called a regression mod- between exposure variables and some health out- el with spatial lagged explanatory variables and come, for instance, between incidence rates and the last as a regression model with spatial lagged socioeconomic variables, using regression mod- response variable84. els that explicitly take into consideration the spa- Alternatively, when there exist some unmea- tial structure of data. sured spatially correlated variables that have an The ecological study is the typical epidemio- effect on the outcome or when the major source logic design used to study such associations when of autocorrelation in the dependent variable is variables are measured at the group level, most an interaction mechanism, a spatial error model frequently defined as geographical areas. As pre- is recommended84,91. In this case, the error for viously stated, georeferenced data used in these the model in one area is correlated with the error studies are usually spatially autocorrelated and terms in its neighboring locations84,91. Okwi et explicitly taking care of the spatial autocorrela- al.92 using spatial-lag response and spatial error 1762

Werneck, G. L. G. Werneck, models found that a variety of geographic fac- Data availability and quality tors, such as soil type, distance/travel time to public resources, elevation, type of land use, and In Brazil and some other countries, national demographic variables significantly explain spa- registries of sociodemographic and health data tial patterns of poverty in rural Kenya. are important sources of information for geo- Regression approaches for spatial point pat- referenced studies. However, the availability and terns are less common in the epidemiologic liter- quality of essential information is an issue of ature, but some relatively recent developments major concern24. For instance, on propose methods for analyzing georeferenced socioeconomic variables is a common problem, case-control studies using nonparametric binary as well as, the lack of coverage of the whole pop- regression models using kernel and generalized ulation and diagnostic errors. Reliability and va- additive model methods93. Webster et al.94 used lidity studies of large databases are essential for similar approaches to investigate the association judging the possibility of bias in spatial investi- between residence and breast cancer on Cape gations based on secondary data. Cod, Massachusetts (USA) using data from pop- ulation-based case-control studies. Confidentiality

There are two major issues regarding confi- Facing up to the future dentiality24. The first is related to the possibility of using the address recorded in secondary regis- The vast accumulation of methods and tech- try data (e.g. mortality or hospital admittance niques for analyzing georeferenced data in the data) to assess the location of disease cases. Since last years is also posing many challenges to epi- these are personal data, the issue of whether in- demiologists willing to use such approaches in dividual consent is necessary arises. Without no their studies. Beyond those problems already doubt the use of this type of information should mentioned above in the text, there are some oth- only be allowed on the basis of specific rules and er that deserve further consideration. conditions, but the need for individual consent would greatly impair the development of research The concept of space in this field. The second, is the use of very small- area data which might permit the identification It is well recognized that there is an acritical of neighborhoods and even city blocks. If data incorporation of “space” as a category of analy- will be used to map environmental hazards or sis in epidemiologic research. Most of the time, clusters of disease one should be concerned about space is considered just as geometric space indi- the impact not only on community perceptions cating where things occur95,96, or just used for the but also on the value of properties24. differentiation of social conditions96, or as a cir- cumstance of spatial factors inducing risk96. Epidemiologic design Therefore, space in many epidemiological appli- cations is a concept with no social or historical Most analyses of georeferenced data are based dimensions97. Space, however, is both the medi- on aggregated or secondary data. It means that um and the outcome of social relations98, a social there is no research planning for defining what construct resulting from the human action, or- and how variables will be collected. In general, ganized in a society, over landscape99. Barcellos these studies lack essential information for un- and Sabroza100 used the expression “the place derstanding disease processes and are based on behind the case” when analyzing environmental available data collected for other purposes than risk conditions for leptospirosis, which I regard the specific research. The development of spatial as an inspired terminology to stress that the links techniques is an important feature to be between risk conditions and occurrence of dis- implemented in surveys aiming at describing spa- ease are directly determined by socioeconomic, tial variation of health events101,102. Case-control cultural and social factors operating in space96. studies have been used more frequently, but still To be more useful, epidemiologic studies focus- there are some issues of concern. For instance, ing on georeferenced data should embrace the unmatched designs are more clearly interpreted concept of social space or, at least, explicitly indi- since the assumption of cases and controls com- cate the concepts of space they are using. ing from the same spatial distribution leads to the 1763 Ciência & Saúde Coletiva, 13(6):1753-1766, 2008

expectation that their spatial patterns will not dif- tiotemporal context have also some tradition. fer, unless some social or environmental risk fac- Although there exists a tradition on the study of tor exist103. However, the same reasoning will not space-time clustering of non-infectious disease105, directly apply in matched case-control studies103. investigations using space-time modeling in this area are still infrequently seen in epidemiology. Sensitivity analysis The difficulties in specifying and interpreting models with a complex autocorrelation structure Since spatial data analysis involves a series of that goes not only in space and time alone, but choices (e.g. neighboring weights, issues in map- also considers the interaction between the two ping, variety of data types and models), sensitiv- components (e.g. incidence today in one specific ity analysis is an area that deserves more atten- area depends not only on what happened in this tion. Using sensitivity analysis to check whether area in the past and what happens today in the the results of a study are too much rooted or neighboring areas, but also on what happened in dependent on specific choices is a way to incor- the past in nearby areas) is probably the main porate a quantitative approach to improve the reason for not using such models regularly in qualitative judgments that are expected in the dis- epidemiologic research. cussion of epidemiologic studies104.

Space-time models Spatial analytical techniques were developed first for use in geology, ecology, and agriculture, In infectious disease epidemiology, heteroge- and only recently have been largely integrated to neities in space and time has been considered re- the epidemiologic agenda of investigation. At this sponsible for the increase in transmission. In gen- point in time, when there is a wide- of avail- eral, such heterogeneities have been more fre- able software for applying such models in an al- quently incorporated in the dynamical modeling most mechanical way, a critical incorporation of framework, but diffusion models based on ex- methods for the analysis of georeferenced data is tensions of the autorregressive models to the spa- a major challenge for epidemiology.

Acknowledgements

This work was partially supported by CNPq - 483675/2006-9 - 410590/2006-1 - 308889/2007-0.

References

1. MacMahon B, Pugh TFH. Epidemiology: principles 9. Guerry A-M. Essai sur la statistique moral de la France. and methods. Boston: Little Brown & Co.; 1970. A Translation of Andre-Michel Guerry’s Essay on 2. Lilienfeld AM, Lilienfeld DE. Foundations of epide- the Moral Statistics of France (1883): a sociological miology. 2nd ed. New York: Oxford University Press; report to the French Academy of Science; edited and 1980. translated by Hugh P. Whitt and Victor W. Reinking. 3. Meade M, Florin J, Gesler W. Medical geography. Lewinston: Edwin Mellen Press; 2002. New York: Guilford Press; 1988. 10. Engels F. A situação da classe trabalhadora em Ingla- 4. Barret FA. ‘SCURVY’ Lind’s medical geography. Soc terra. Porto: Afrontamento; 1975. Sci Med 1991; 33:347-353. 11. Snow J. Sobre a maneira de transmissão da cólera. 5. Barrett FA. A medical geographical anniversary. Soc Rio de Janeiro: USAID; 1967. Sci Med 1993; 37:701-710. 12. McLeod KS. Our sense of Snow: the myth of John 6. Light RU. The progress of medical geography. Geogr Snow in medical geography. Soc Sci Med 2000; 50: Rev 1944; 34:636-641. 923-935. 7. Barrett FA. Finke’s 1792 map of human diseases: the 13. Vandenbroucke JP, Eelkman Rooda HM, Beukers first world disease map? Soc Sci Med 2000; 50:915- H. Who made John Snow a hero? Am J Epidemiol 921. 1991;133:967-973. 8. Krieger N. Epidemiology and social sciences: to- 14. Moore DA, Carpenter TE. Spatial analytical meth- wards a critical reengagement in the 21st century. ods and geographic information systems: use in Epidemiol Rev 2000; 22:155-163. health research and epidemiology. Epidemiol Rev 1999; 21:143-161. 1764

Werneck, G. L. G. Werneck, 15. Cromley EK. GIS and disease. Annu Rev Public 35. Leem JH, Kaplan BM, Shim YK, Pohl HR, Gotway Health 2003; 24:7-24. CA, Bullard SM, Rogers JF, Smith MM, Tylenda 16. Krieger N. Place, space, and health: GIS and epide- CA. Exposures to air pollutants during pregnancy miology. Epidemiology 2003;14:384-385. and preterm delivery. Environ Health Perspect 2006; 17. Clarke KC, McLafferty SL, Tempalski BJ. On epide- 114:905-910. miology and geographic information systems: a re- 36. Jerrett M, Buzzelli M, Burnett RT, DeLuca PF. Par- view and discussion of future directions. Emerging ticulate air pollution, social confounders, and mor- Infectious Diseases 1996; 2:85-92. tality in small areas of an industrial city. Soc Sci Med 18. Nuckols JR, Ward MH, Jarup L. Using geographic 2005; 60:2845-2863. information systems for exposure assessment in 37. Ersoy A, Yunsel TY, Cetin M. Characterization of environmental epidemiology studies. Environ Health land contaminated by past heavy metal mining us- Perspect 2004; 112:1007-1015. ing geostatistical methods. Arch Environ Contam 19. Rogers DJ, Randolph SE. Studying the global dis- Toxicol 2004; 46:162-175. tribution of infectious diseases using GIS and RS. 38. Goovaerts P. Geostatistical analysis of disease data: Nat Rev Microbiol 2003; 1:231-237. estimation of cancer mortality risk from empirical 20. Beck LR, Lobitz BM, Wood BL. Remote sensing frequencies using Poisson kriging. Int J Health Geogr and human health: new sensors and new opportu- 2005; 4:31. nities. Emerg Infect Dis 2000; 6:217-227. 39. Goovaerts P. Geostatistical analysis of disease data: 21. Tatalovich Z, Wilson JP, Milam JE, Jerrett ML, accounting for spatial support and population den- McConnell R. Competing definitions of contextual sity in the isopleth mapping of cancer mortality risk environments. Int J Health Geogr 2006; 5:55. using area-to-point Poisson kriging. Int J Health 22. Santos SM. A importância do contexto social de mora- Geogr 2006; 5:52. dia na auto-avaliação de saúde [tese]. Rio de Janeiro 40. Campos MR, Valencia LI, Fortes BP, Braga RC, (RJ): Escola Nacional de Saúde Pública; 2008. Medronho RA. Distribuição espacial da infecção 23. Correia VR, Monteiro AM, Carvalho MS, Werneck por Ascaris lumbricoides. Rev. Saude Publica 2002; GL. Uma aplicação do sensoriamento remoto para 36:69-74. a investigação de endemias urbanas. Cad Saude 41. Fortes BP, Ortiz Valencia LI, Ribeiro SV, Medronho Publica 2007; 23:1015-1028. RA. Modelagem geoestatística da infecção por As- 24. Elliott P, Wartenberg D. Spatial epidemiology: cur- caris lumbricoides. Cad Saúde Pública 2004; 20:727- rent approaches and future challenges. Environ 734. Health Perspect 2004; 112:998-1006. 42. Török TJ, Kilgore PE, Clarke MJ, Holman RC, Bre- 25. Rezaeian M, Dunn G, St Leger S, Appleby L. Geo- see JS, Glass RI. Visualizing geographic and tem- graphical epidemiology, spatial analysis and geo- poral trends in rotavirus activity in the United States, graphical information systems: a multidisciplinary 1991 to 1996. Pediatr Infect Dis J 1997; 16:941-946. glossary. J Epidemiol Community Health 2007; 61:98- 43. Kleinschmidt I, Bagayoko M, Clarke GP, Craig M, 102. Le Sueur D. A spatial statistical approach to malaria 26. Lawson AB. Statistical methods in spatial epidemiolo- mapping. Int J Epidemiol 2000; 29:355-361. gy. New York: Wiley; 2001. 44. Werneck GL, Costa CH, Walker AM, David JR, Wand 27. Elliott P, Cuzik J, English D, Stern R, editors. Geo- M, Maguire JH. The urban spread of visceral leish- graphical and environmental epidemiology. Oxford: maniasis: clues from spatial analysis. Epidemiology Oxford University Press; 1996. 2002; 13:364-367. 28. Bailey TC. Spatial statistical methods in health. Cad 45. Carrat F, Valleron AJ. Epidemiologic mapping us- Saude Publica 2001; 17:1083-1098. ing the “kriging” method: application to an influen- 29. Cressie N. Statistics for Spatial Data. New York: John za-like illness epidemic in France. Am J Epidemiol Wiley; 1993. 1992; 135:1293-1300. 30. Schabenberger O, Gotway CA. Statistical methods 46. Sakai T, Suzuki H, Sasaki A, Saito R, Tanabe N, for spatial data analysis. Boca Raton: Chapman & Taniguchi K. Geographic and temporal trends in Hall/CRC; 2005. influenzalike illness, Japan, 1992-1999. Emerg In- 31. Wackernagel H. Multivariate geostatistics. New York: fect Dis 2004; 10:1822-1826. Springer; 1995. 47. DeMers MN. Fundamentals of Geographic Informa- 32. Nicholson MC, Mather TN. Methods for evaluat- tion Systems. New York: John Wiley & Sons; 1997. ing Lyme disease risks using geographic informa- 48. Morgenstern H. Ecologic studies. In: Rothman KJ, tion systems and geospatial analysis. J Med Entomol Greenland S, editors. Modern epidemiology. 2nd edi- 1996; 33:711-720. tion. Philadelphia: Lippincott-Raven; 1998. 33. Allen TR, Lu GY, Wong D. Integrating remote sens- 49. Dent BD. : thematic map design. 3rd edi- ing, terrain analysis, and geostatistics for mosquito tion. Dubuque, Iowa: Wm. C. Brown; 1993. surveillance and control. In: Proceedings of the 50. Cliff AD, Haggett P. Atlas of disease distributions: American Society for Photogrammetry and Remote analytic approaches to epidemiologic data. Oxford: Sensing (ASPRS) Annual Conference; 2003; Anchor- Blackwell; 1988. age, Alaska, USA. p. 1-10. 51. Walter SD, Birnie SE. Mapping mortality and mor- 34. Carbajo AE, Curto SI, Schweigmann NJ. Spatial dis- bidity patterns: an international comparison. Int J tribution pattern of oviposition in the mosquito Epidemiol 1991; 20:678-689. Aedes aegypti in relation to urbanization in Buenos 52. Pickle LW, Mungiole M, Jones GK, White AA. Atlas Aires: southern fringe bionomics of an introduced of United States mortality. Hyattsville, Maryland: vector. Med Vet Entomol 2006; 20:209-218. National Center for Health Statistics; 1996. 1765 Ciência & Saúde Coletiva, 13(6):1753-1766, 2008

53. Brasil. Ministério da Saúde. Atlas de Saúde do Brasil. 74. Lagrotta MT, Silva WC, Souza-Santos R. Identifica- [acessado 2008 Mai 02]. Disponível em: http://www. tion of key areas for Aedes aegypti control through saude.gov.br/svs/atlas geoprocessing in Nova Iguaçu, Rio de Janeiro State, 54. Robinson WS. Ecological correlations and the behav- Brazil. Cad Saúde Pública 2008; 24:70-80. ior of individuals. Am Sociol Rev 1950; 15:351–357. 75. Linde A van der, Witzko KH, Jockel KH. Spatial- 55. Durkheim E. Suicide: a study in sociology. New York: temporal analysis of mortality using splines. Bio- Free Press; 1951. metrics 1995; 51:1352-1360. 56. Walter SD. The ecologic method in the study of 76. Cliff AD, Haggett P, Smallman-Raynor M. An ex- environmental health. II. Methodologic issues and ploratory method for estimating the changing speed feasibility. Environ Health Perspect 1991; 94:67-73. of epidemic waves from historical data. Int J Epide- 57. Rimm EB, Klatsky A, Grobbee D, Stampfer MJ. miol 2008; 37:106-112. Review of moderate alcohol consumption and re- 77. Gatrell AC, Bailey TC. Interactive spatial data analy- duced risk of coronary heart disease: is the effect sis in medical geography. Soc Sci Med 1996; 42:843- due to beer, wine, or spirits. BMJ 1996; 312:731-736. 855. 58. Kerr-Pontes LR, Montenegro AC, Barreto ML, Wer- 78. Lawson AB, Williams FLR. An introductory guide to neck GL, Feldmeier H. Inequality and leprosy in disease mapping. Chichester: Wiley; 2001. Northeast Brazil: an ecological study. Int J Epidemi- 79. Devine OJ, Louis TA, Halloran ME. Empirical Bayes ol 2004; 33:262-269. methods for stabilizing incidence rates before map- 59. Gatrell AC, Bailey TC, Diggle PJ, Rowlingson BS. ping. Epidemiology 1994; 5:622-630. Spatial point patterns and its application in geo- 80. Assunção RM, Barreto SM, Guerra HL, Sakurai E. graphical epidemiology. Trans Inst Br Geogr 1996; Mapas de taxas epidemiológicas: uma abordagem 21:256-274. Bayesiana. Cad Saúde Pública 1998; 14:713-723. 60. Haase P. Spatial Pattern Analysis in Ecology Based 81. Bailey TC, Gatrell AC. Interactive spatial data anal- on Ripley’s K-Function: Introduction and Methods ysis. New York: Longman; 1995. of Edge Correction. J Veg Sci 1995; 6:575-582. 82. Bernardinelli L, Montomoli C. Empirical Bayes ver- 61. Neeff T, Biging GS, Dutra LV, Freitas CC, Santos JR. sus fully Bayesian analysis of geographical varia- Modeling spatial tree patterns in the tapajós forest tion in disease risk. Stat Med 1992; 11:983-1007. using interferometric height. Revista Brasileira de 83. Richardson S. Statistical methods for geographical Cartografia 2005; 57:1-6. correlation studies. In: Elliot P, Cuzick J, English D, 62. Austin SB, Melly SJ, Sanchez BN, Patel A, Buka S, Stern R, editors. Geographical and enviromental ep- Gortmaker SL. Clustering of fast-food restaurants idemiology, Oxford: Oxford University Press; 1996. around schools: a novel application of spatial sta- p.181-204. tistics to the study of food environments. Am J Pub- 84. Haining R. Spatial data analysis in the social and lic Health 2005; 95:1575-1581. environmental sciences. Cambridge: Cambridge 63. Craglia M, Haining R, Wiles P. A Comparative Eval- University Press; 1990. uation of Approaches to Urban Crime Pattern Anal- 85. Schwartz GG, Hanchette CL. UV, latitude, and spa- ysis. Urban Studies 2000; 37:711-729. tial trends in prostate cancer mortality: all sunlight 64. Bishop MA. Point pattern analysis of eruption points is not the same (United States). Cancer Causes Con- for the Mount Gambier volcanic sub-province: a quan- trol 2006; 17:1091-1101. titative geographical approach to the understanding 86. Montenegro AC, Werneck GL, Kerr-Pontes LR, Bar- of volcano distribution. Area 2007; 39:230-241. reto ML, Feldmeier H. Spatial analysis of the distri- 65. Griffith DA. Spatial autocorrelation. [acessado 2008 Mai bution of leprosy in the State of Ceará, Northeast 02]. Disponível em: http://www.utdallas.edu/~dagrif- Brazil. Mem Inst Oswaldo Cruz 2004; 99:683-686. fith/Taiwan_lectures/background%20readings/ESM- 87. Cressie N, Read TRC. Spatial data analysis of re- 2005.pdf gional counts. Biom J 1989; 31:699–719. 66. Griffith DA. Spatial autocorrelation: a primer. Wash- 88. Jones AP, Langford IH, Bentham G. The applica- ington, D.C.: Association of American Geographers; tion of K-function analysis to the geographical dis- 1987. tribution of road traffic accident outcomes in Nor- 67. Odland J. Spatial autocorrelation. Beverly Hill: Sage; folk, England. Soc Sci Med 1996; 42:879-885. 1988. 89. Cook DG, Pocock SJ. Multiple Regression in Geo- 68. Cliff AD, Ord JK. Spatial Processes. London: Pion; graphical Mortality Studies, with Allowance for Spa- 1981. tially Correlated Errors. Biometrics 1983; 39:361-371. 69. Marshall RJ. A review of methods for the statistical 90. Pocock SJ, Shaper AG, Cook DG, Packham RF, analysis of spatial patterns of disease. J R Statist Soc A Lacey RF, Powell P, Russell PF.British Regional 1991; 154:421-441. Heart Study: geographic variations in cardiovascu- 70. Halloran ME, Struchiner CJ. Study design for de- lar mortality, and the role of water quality. Br Med J pendent happenings. Epidemiology 1991; 2:331-338. 1980;280:1243-1249. 71. Lloyd CD. Local models for spatial analysis. Boca Ra- 91. Anselin L. Spatial . In: Baltagi B, edi- ton: CRC Press; 2007. tor. Companion to Theoretical Econometrics. Oxford: 72. Werneck GL, Maguire JH. Spatial modeling using Blackwell; 2001. p.310–330. mixed models: an ecologic study of visceral leish- 92. Okwi PO, Ndeng’e G, Kristjanson P, Arunga M, maniasis in Teresina, Piauí State, Brazil. Cad Saúde Notenbaert A, Omolo A, Henninger N, Benson T, Pública 2002; 18:633-637. Kariuki P, Owuor J. Spatial determinants of poverty 73. Braga JU. O uso da modelagem espacial na estimativa in rural Kenya. Proc Natl Acad Sci USA 2007; dos dados da tuberculose no Brasil [tese]. Rio de Janei- 104:16769-16774. ro (RJ): Instituto de Medicina Social, UERJ; 1997. 1766

Werneck, G. L. G. Werneck, 93. Kelsall JE, Diggle PJ. Spatial variation in risk of 101. Gyapong JO, Remme JH. The use of grid sampling disease: a nonparametric binary regression ap- methodology for rapid assessment of the distribu- proach. Appl Statist 1998; 47:559-573. tion of bancroftian filariasis. Trans R Soc Trop Med 94. Webster T, Vieira V, Weinberg J, Aschengrau A. Hyg 2001; 95:681-686. Method for mapping population-based case-con- 102. Arbia G. The use of GIS in spatial surveys. Inter- trol studies: an application using generalized ad- national Statistical Review1993; 61:339-359. ditive models. Int J Health Geogr 2006; 5:26. 103. Glass GE. Update: spatial aspects of epidemiolo- 95. Curtis S, Jones IR. Is there a place for geography gy: the interface with medical geography. Epidemi- in the analysis of health inequality? Soc Health Il- ol Rev 2000; 22:136-139. ness 1998; 20:645-672. 104. Greenland S. Basic methods for sensitivity analysis 96. Barcellos C. Elos entre geografia e epidemiologia. of biases. Int J Epidemiol 1996; 25:1107-1116. Cad Saúde Pública 2000; 16:607-609. 105. Werneck GL, Struchiner CJ. Estudos de agregados 97. Costa MCN, Teixeira MGLC. A concepção de “es- de doença no espaço-tempo: conceitos, técnicas e paço” na investigação epidemiológica. Cad Saúde desafios. Cad Saúde Pública 1997; 13:611-624. Pública 1999; 15:271-279. 98. Barcellos C, Bastos FI. Geoprocessamento, ambi- ente e saúde, uma união possível? Cad Saúde Pública 1996; 12:389-397. 99. Kearns RA, Joseph AE. Space in its place: develop- ing the link in medical geography. Soc Sci Med 1993; 37:711-717. 100. Barcellos C, Sabroza PC. The place behind the case: leptospirosis risks and associated environ- mental conditions in a flood-related outbreak in Artigo apresentado em 08/05//2008 Rio de Janeiro. Cad Saúde Pública 2001; 17(Sup- Aprovado em 30/05/2008 pl):59-67. Versão final apresentada em 20/06/2008