<<

International Journal of Geo-Information

Article On the Statistical Distribution of the Nonzero Spatial Parameter in a Simultaneous

Qing Luo 1,2 , Daniel A. Griffith 3 and Huayi Wu 1,2,* 1 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China; [email protected] 2 Collaborative Innovation Center of Geospatial Technology, Wuhan University, Wuhan 430079, China 3 School of Economic, Political, and Policy Sciences, The University of Texas at Dallas, Richardson, TX 75080, USA; dagriffi[email protected] * Correspondence: [email protected]; Tel.: +86-27-6877-8311

 Received: 27 October 2018; Accepted: 6 December 2018; Published: 12 December 2018 

Abstract: This paper focuses on the spatial autocorrelation parameter ρ of the simultaneous autoregressive model, and furnishes its distribution for nonzero values, for two regular square (rook and queen) tessellations as well as a hexagonal case with rook connectivity, using Monte Carlo simulation with a large sample size. The regular square lattice directly relates to increasingly used, remotely sensed images, whereas the regular hexagonal configuration is frequently used in sampling and aggregation situations. Results suggest an asymptotic normal distribution for estimated ρ. More specifically, this paper posits functions between ρ and its for three adjacency structures, which makes hypothesis testing implementable and furnishes an easily-computed version of the asymptotic variance for ρ at zero for each configuration. In addition, it also presents three examples, where the first employed a simulated dataset for a zero spatial autocorrelation case, and the other two used two empirical datasets—of these, one is a block dataset for Wuhan (with a Moran coefficient of 0.53, allowing a null hypothesis of, e.g., ρ = 0.7) to illustrate a moderate spatial autocorrelation case, and the other is a remotely sensed image of the Yellow Mountain region, China (with a Moran coefficient of 0.91, allowing a null hypothesis of, e.g., ρ = 0.95) to illustrate a high spatial autocorrelation case.

Keywords: simultaneous autoregressive model; spatial autocorrelation parameter; nonzero null hypothesis; ; asymptotic variance

1. Introduction Spatial autocorrelation (SA) is a common phenomenon of spatial analyses, where there is a common naive hypothesis that SA is zero (e.g., [1–3]) for datasets involving, for example, georeferenced demographic, social economic, and remotely sensed image variables. A more reasonable postulate would be nonzero SA. Gotelli and Ulrich [4] (p. 171) also pointed out that one of the important challenges in null modeling testing in ecology is “creating null models that perform well with ... varying degrees of SA in species occurrence data”. This is a challenge that is not unique to ecology. A main problem hindering the positing of a nonzero SA hypothesis, or varying degrees of SA, is the unknown sampling distribution of the SA parameter, which may be denoted by ρ of the simultaneous autoregressive (or SAR, which is called the “spatial error model” in spatial [5] (p. 5)) model in this paper. This parameter quantifies the degree of self-correlation in the error term (e.g., [6] (p. 42, which can be rewritten as Equation (1)); as when it deviates further from zero, its distribution becomes more skewed and peaked, and its variance decreases. Figure1 shows this phenomenon,

ISPRS Int. J. Geo-Inf. 2018, 7, 476; doi:10.3390/ijgi7120476 www.mdpi.com/journal/ijgi ISPRS Int. J. Geo-Inf. 2018, 7, x 476 FOR PEER REVIEW 22 of of 25 from zero, its distribution becomes more skewed and peaked, and its variance decreases. Figure 1 whereshows itsthis connectivity phenomenon, matrix where comes its connectivity from the Wuhan matrix census comes blockfrom datathe Wuhan that are census employed block in data the thatmoderate are employed SA case. in Four the histogramsmoderate SA with case. density Four curveshistograms for differentwith densityρ values curves of 0,for 0.5, different 0.9, and 𝜌 values0.99 reveal of 0, that0.5, 0.9, as ρ andincreases, 0.99 reveal the that distribution as 𝜌 increases, moves the further distribution from zero moves (see further the red from reference zero (see line perpendicularthe red reference to theline horizontal perpendicular axis), to and the its hori shapezontal becomes axis), and narrower. its shape becomes narrower.

FigureFigure 1. 1. SelectedSelected frequency distributions distributions of of simultaneous simultaneous autoregressive autoregressive (SAR)-estimated ρ𝜌..

To gain an understanding of the complications relatedrelated to figuringfiguring out the sampling distribution of the test statistic under under the the nonzer nonzeroo hypothesis, hypothesis, it it is is always always enlightening enlightening to to look look back back to to Pearson’s Pearson’s r (i.e.,r (i.e., correlation ) coefficient) in in standard standard statistics and and the the autocorrelation autocorrelation coefficient coefficient in in models, because the SA coefficient coefficient resembles the “a “auto”uto” version of the former, and extends the latter from aa uni-directionuni-direction (time (time ) dimension) to to a multi-directionsa multi-directions (space (space dimension) dimension) case. case. A parallel A parallel to what to whatis commonly is commonly done in done spatial in scenarios spatial isscenarios that a routine is that null a hypothesisroutine null often hypothesis sets the (auto)correlation often sets the (auto)correlationcoefficient to zero coefficient in classical to statistics zero in orclassical time series statisti (e.g.,cs or [7, 8time]). In series order (e.g., to test [7,8]). the nonzeroIn order correlationto test the nonzerocoefficient correlation in standard coefficient statistics, in a Fisher’sstandard Z-transformation statistics, a Fisher’s is used Z-transformation so that the transformed is used so statistic that the is transformedapproximately statistic normally is approximately distributed and itsnormally variance distributed is stabilized and (e.g., its [ 9variance] (pp. 55–56); is stabilized [10] (p. 487); (e.g., [11 [9]]). (pp.However, 55–56); for [10] the (p. original 487); [11]).ρ, the However, distribution for the becomes original skewed 𝜌, the as distributionρ approaches becomes its extreme skewed values. as 𝜌 approachesFortunately, its Provost extreme [12 values.] suggests Fortunately, closed-form Provos representationst [12] suggests of closed-form the density representations function and integer of the densitymoments function of the sampleand integer correlation moments coefficient of the sample to furnish correlation a solution coefficient for the distribution to furnish a problem. solution Forfor thetime distribution series , problem. For time the research series autocorrelat on a statisticalions, the distribution research on of a statistical nonzero coefficient distribution was of adeveloped nonzero coefficient from a more was practicaldeveloped viewpoint. from a more For pr example,actical viewpoint. Ames and For Reiterexample, [13 ]Ames established and Reiter the [13]distribution established of autocorrelation the distribution coefficients of autocorrelation in economic coefficients time series in by economic using empirical time series datasets by sousing that empiricalhypothesis datasets testing forso thethat significance hypothesis of (auto)correlationtesting for the significance among economic of (auto)correlation variables could haveamong “a economicmore appropriate variables basis” could (p. have 638) “a than more the appropriate null hypothesis basis” on (p. a basis 638) ofthan zero the (auto)correlation. null hypothesis Moreon a basisrecent of literature zero (auto)correlation. [14] pertaining Mo to thisre recent topic focusedliterature on [14] the pertaining effect of the to nonzerothis topic coefficient focused on the effectdistribution of the nonzero of the Durbin-Watson coefficient on testthe estimator.distribution Similarly of the Durbin-Watson to the time series test literature, estimator. studies Similarly on the to thedistribution time series of the nonzeroliterature, autocorrelation studies on coefficient/parameterthe distribution of arethe rare nonzero in spatial autocorrelation statistics. One coefficient/parameterrelated work dates back are to rare 1967, in inspatial which statistics Mead [15. One] verified related that work the Fisher’sdates back Z-transformation to 1967, in which did Meadnot help [15] to verified obtain a that stabilized the Fisher’s variance, Z-transformation and even its generalized did not help form to obtain could onlya stabilized help in variance, instances and of a veryeven smallits generalizedρ (i.e., 0, ± 0.05,form ±could0.1). Thisonly earlier help in work instances provides of evidencea very small that the𝜌 (i.e., variance 0, ±0.05, problem ±0.1). seems This earlierto be a majorwork obstacleprovides for evidence establishing that athe sampling variance distribution problem forseems nonzero to beρ. Thisa major paper obstacle furnishes for a possibleestablishing solution a sampling for this distribution problem by for representing nonzero 𝜌. the This variance paper furnishes of ρˆ as a function a possible of solutionρ, and validates for this problemthe asymptotic by representing normality of the this variance distribution of through𝜌 as a Montefunction Carlo of simulation𝜌, and validates experiments. the asymptotic Moreover, normality of this distribution through Monte Carlo simulation experiments. Moreover, it also

ISPRS Int. J. Geo-Inf. 2018, 7, 476 3 of 25 it also establishes a function between ρ and the Moran Coefficient (MC, also known as Moran’s I, [16]), which bridges the most widely used SA statistic and spatial autoregressive model. This paper contributes to the literature by establishing properties of the sampling distribution of nonzero ρ, supplementing the asymptotic variance known for zero ρ in a regression framework. Its results also pertain to the conditional autoregressive (CAR) model, which is called the conditionally specified Gaussian (CSG) in spatial econometrics [17] (pp. 197 and 201), as well as the autoregressive response (AR) model, which is called the spatial lag model in spatial econometrics [5] (p. 5). A SAR model can also be written as a CAR model (e.g., [18] (p. 149); [19] (p. 123); [20] (p. 68)), both SAR and AR are of second-order (i.e., they also specify spatial correlation in terms of neighbors of neighbors), and their pure SA versions with no independent variables are the same model. The focus is on the SAR here because it is the most commonly used specification in spatial statistics, and the SAR uses the row standardized spatial weights matrix W, not matrix C (see Section2). It is also necessary to explain the motivation of this paper from a more general perspective. Because Cliff and Ord [21] systematically introduced hypothesis testing for the existence of SA latent in spatial data, SA-related research, in terms of both its theoretical and applied aspects, has flourished in a wide of domains employing datasets with geographical or locational information. Sokal and Oden [22,23] introduced SA into biology, which inspired biologists to take SA into account in their work because most biological or ecological datasets are closely related to geographical locations. Legendre [24] constructed a paradigm for ecologists to describe and test for SA, as well as to introduce spatial structure into ecological models. Other fields dealing with SA include, for example, spatial (see [25] for a thorough overview), spatial econometrics (e.g., [26]), and urban planning (e.g., [27]), which often use Geographical Information Science (GIS, [28]) methods or tools. In all of these domains, testing the existence or presence of SA is a solved problem, where the next question which naturally arises is how the degree of SA could be tested. This paper furnishes an answer to this question. This article consists of five sections. Section2 presents model specification and parameter estimation. Section3 furnishes the sampling distribution of the SA parameter estimate. Section4 gives one simulated example for zero SA by setting the null hypothesis to be ρ0 = 0, as well as two empirical examples. Empirical analyses from over the years disclose that most socio-economic/demographic attributes have a degree of correlation ranging from 0.4 to 0.6, which indicates a relevant range for ρ within 0.65 to 0.85 [29]; thus, a moderate SA case sets the null hypothesis as ρ0 = 0.7, whereas a strong SA case sets the null hypothesis as ρ0 = 0.95. Section5 presents conclusions and discussion. There are also appendices with some supplementary materials.

2. Model Specification and Parameter Estimation This paper focuses on the SAR model and its SA parameter ρ. The first use of the word simultaneous as a descriptor of an autoregressive model was by Whittle [30], whose seminal article introduced an expression known as the SAR model for two-dimensional stationary processes. Concomitantly with the development of spatial statistics, the SAR model was frequently used by geographers, econometricians, ecologists, and other spatial researchers as one of the very popular specifications for describing georeferenced data. This SAR model is specified as follows:

Y = ρWY + (I − ρW)Xβ + ε, (1) where Y is a response variable whose realizations y1, y2, ··· , yn can be observations of a geographic attribute (e.g., average house price) distributed across n regions, ρ is a SA parameter, W is a stochastic version (row standardized) of the n-by-n (n2 entries) binary contiguity matrix C (the entry of the ith row and jth column is 1 if region i and j are adjacent, and 0 otherwise)—both W and C reflect the spatial adjacency of a geographical phenomenon, I is the n-by-n (n2 entries) identity matrix, X is a ISPRS Int. J. Geo-Inf. 2018, 7, 476 4 of 25 n-by-(m+1) (nm + n entries) matrix with m explanatory variables (i.e., covariates) and one unit vector for the intercept, β is a coefficient vector of the order (m+1)-by-1, and ε is a n-by-1 white-noise error term, which conforms to a standard multivariate normal distribution, ε ∼ MVN0, σ2I. In Equation (1), Y appears on both sides of the equal sign, which is why the regression has the prefix, auto; when the equation is rewritten for individual observations, n similar equations simultaneously appear. Without loss of generality, X can be void of covariates and contain only the intercept term, rendering a pure SA specification. Then, Equation (1) is reduced to:

Y = ρWY + β0(I − ρW)1 + ε, (2) where 1 is the n-by-1 vector of ones contained in matrix X. Equation (2) is also known as a pure SAR model, and is the specification employed in this paper. For the pure SAR specification, the error model and lag model are equivalent. The parameters that need to be estimated in an SAR model are ρ, β (for Equation (2), it is only 2 β0), and σ . As pointed out by Ord [31] (p. 122), the estimator of ρ is inconsistent, and even if this problem is revised by choosing an auxiliary matrix, the estimator is less efficient than a maximum likelihood estimator (MLE). Thus, ML estimation is a commonly used technique to estimate ρ (e.g., [18,21]; [32]). Griffith [33] (pp. 176–177) provides explicit MLEs of these parameters, and furnishes an equivalent form for the MLE of ρ that can be executed with the SAS code employed by this paper, and |ρ| < 1 for square tessellation and the rook’s adjacency (for regular square queen, and hexagonal adjacencies, see §3.1). However, a well-known problem of the MLE is the computing burden of its determinant constituting its Jacobian term, i.e., ln(|I − ρW|), which becomes especially troublesome when a sample size becomes large. Griffith [34–38] (This work includes finding approximations of the Jacobian for regular, as well as irregular surface partitioning, exploring analytical or approximated expressions of eigenvalues of matrix C and matrix W, and figuring out a simplified algorithm for calculating MLEs for massive sample sizes. In addition, this work establishes criteria to check the rationality of posited eigenvalue approximations, and to evaluate the Jacobian term approximations. Analyses summarized in this paper are based upon these contributions.) contributed some simplifications that reduced the computational burden of this factor. Another issue meriting attention here is the sampling variance of the model estimated ρ (i.e., ρˆSAR, which is simplified to ρˆ hereafter), because it measures the uncertainty or quantifies the precision of the ML estimate. Capturing how this variance changes is helpful for evaluating the efficiency of the estimator; in addition, it is always necessary to know at least the first- and the second-order central moments of a sampling distribution of a parameter estimator so that can be conducted. Ord [31] (p. 124) suggests an asymptotic variance- matrix of ρ and σ2, from which an expression of the asymptotic variance of ρˆ can be obtained. This lays the foundation for exploring the analytical expression of the sampling variance of ρˆ in this paper. Another approach that can be used to estimate the SA parameter is the general method of moments (GMM) furnished by, among others, Kelejian and Prucha [39,40]. Walde et al. [41] present a thorough comparison between GMM- and MLE-based methods for very large-data spatial models, and suggest employing the former. A very important reason that led them to choose the GMM for large spatial data model estimation was that the GMM had a “directly computable” (p. 164) of ρ, which is also implemented in a MLE framework in this paper. Section3 describes this implementation in detail.

3. The Sampling Distribution of ρˆ Before discussing the sampling distribution, it is necessary to introduce three surface partitionings on which all works of this paper are based: a regular square tessellation with rook connectivity (squares A and B are defined to be neighbors if they have a common edge), a regular square tessellation with queen connectivity (squares A and B are defined to be neighbors if they have a common edge or a common vertex), and a regular hexagonal tessellation with rook connectivity (hexagons A and ISPRS Int. J. Geo-Inf. 2018, 7, 476 5 of 25

B are defined to be neighbors if they have a common edge). There are definitely many types of geographical configurations, e.g., some theoretical types that have been discussed in [42], a more practical one that is used to construct grids for the earth’s surface (i.e., ISEA3H, [43]), a soccer ball-like ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 5 of 25 configuration (i.e., a combination of hexagons and pentagons). However, only these three were employed,three were becauseemployed, the because rook and the queen rook areand frequently-used queen are frequently-used criteria in many criteria GIS in and many spatial GIS dataand analysis-relatedspatial data analysis-related software or packagessoftware or (e.g., packages Esri ArcMap; (e.g., Esri spdep ArcMap; in R, spdep [6]); moreover,in R, [6]); thesemoreover, two adjacencythese two schemesadjacency with schemes a more with regular a squaremore regular lattice relatesquare to lattice increasingly relate used,to increasingly remotely sensed used, images.remotely Also, sensed a hexagonal images. Also, configuration a hexagonal is often configur used ination spatial is often sampling used designin spatial [44 ]sampling (pp. 24–25) design and aggregation[44] (pp. 24–25) [43]. and In summary,aggregation these [43]. three In summary, configurations these three are frequently configurations used inare practice, frequently and used many in irregularpractice, latticesand many tend irregular to have lattices connectivities tend to thathave are connectivities between a regular that are square between and a hexagon regular square lattice (e.g.,and hexagon [42]). As lattice shown (e.g., in Figure [42]).2 ,As for shown a square in Figure tessellation, 2, for aA squarehas four tessellation, neighbors A for has a rookfour adjacency,neighbors eightfor a neighborsrook adjacency, for a queen eight adjacency, neighbors and for sixa queen neighbors adjacency, for a hexagonal and six tessellation.neighbors for Suppose a hexagonal there aretessellation.n observations Suppose of a there geographical are 𝑛 observations attribute distributed of a geographical over one of attribute these landscapes, distributed considering over one the of regularity,these landscapes, and let consideringn = P × Q, wherethe regularity,P is the numberand let 𝑛=𝑃×𝑄 of rows, Q, whereis the number 𝑃 is the of number columns, of rows, and the 𝑄 2 2 dimensionis the number of the of columns, connectivity and matrix the dimension is n-by-n of (i.e., theP connectivity× Q ). matrix is n-by-n (i.e., 𝑃 ×𝑄).

(a) (b) (c)

FigureFigure 2.2.Three Three regular regular surface surface partitionings: partitionings: (a) square(a) square configuration configuration with rookwith adjacency;rook adjacency; (b) square (b) configurationsquare configuration with queen with adjacency; queen adjacency; (c) hexagonal (c) hexagonal configuration. configuration.

3.1. The Relationship between ρˆ and the MC 3.1. The Relationship between 𝜌 and the MC In the literature, the most widely used statistic for quantifying SA is the MC. From a model In the literature, the most widely used statistic for quantifying SA is the MC. From a model perspective, ρ is the parameter that has the same function as the MC. Although these two quantities perspective, 𝜌 is the parameter that has the same function as the MC. Although these two quantities are related, they are not equivalent, and thus should not be interchangeably used [1]. Establishing are related, they are not equivalent, and thus should not be interchangeably used [1]. Establishing their explicit relationship quantitatively supports evaluation of the SA level latent in georeferenced their explicit relationship quantitatively supports evaluation of the SA level latent in georeferenced data. Griffith [45] (pp. 33–34) points out that the relationship curve of ρˆ against the MC is logistic data. Griffith [45] (pp. 33–34) points out that the relationship curve of 𝜌 against the MC is logistic ρˆ (or(or sigmoid).sigmoid). This paper explores their relationship function further further so so that that 𝜌 cancan bebe quantitativelyquantitatively describeddescribed byby thethe MCMC forfor thethe threethree surfacesurface partitioningspartitionings discusseddiscussed inin thisthis paper.paper. n ρ SupposeSuppose aa geographicalgeographical configurationconfiguration has 𝑛 units. units. The estimates of ρ and and thethe MCMC areare closelyclosely MCM C M relatedrelated toto matrixmatrix 𝑴𝑪𝑴,, where where 𝑪is is defined defined as as done done previously, previously, 𝑴is the is the projection projection matrix matrix defined defined as I − J n J as 𝑰−𝑱/ ,⁄ and, 𝑛 andis a 𝑱 n-by-n is a matrixn-by-n whosematrix entries whose are entries all ones. are Jong all etones. al. [46Jong] (pp. et 21–22)al. [46] demonstrated (pp. 21–22) T  that MCextreme = n/1 C1 λextreme, which is indicative of the relationship between extreme MC demonstrated that 𝑀𝐶 = (𝑛𝟏⁄)𝑪𝟏 𝜆, which is indicative of the relationship between MCM valuesextreme and MC extreme values and eigenvalues extreme (denotedeigenvalues by (denotedλ) of by; in𝜆) otherof 𝑴𝑪𝑴 words,; in other the maximum words, the eigenvalue maximum correspondseigenvalue corresponds to the maximum to the MC maximum value, while MC thevalue, minimum while the eigenvalue minimum corresponds eigenvalue to corresponds the minimum to MCthe minimum value. Because MC value. this equation Because restricts this equation the feasible restricts range the of feasible the MC, range it is reasonable of the MC, to it generate is reasonable a set of sample MC values by: to generate a set of sample MC values by: n MC = i = n i T 𝑛λi, 1, 2, . . . , . (3) 𝑀𝐶 1=C1 𝜆 ,𝑖 =1,2,…,𝑛. (3) 𝟏𝑪𝟏

Ordering the eigenvalues such that 𝜆 ≥𝜆 ≥...≥𝜆, the corresponding 𝜌 can be estimated for each by employing eigenvectors 𝑬=(𝐸,𝐸,…,𝐸) to replace 𝒀 in Equation (2). That is:

𝐸 =𝜌𝑾𝐸 +𝛽(𝑰−𝜌𝑾)𝟏 + 𝜺, 𝑖 = 1, 2, …,𝑛. (4)

The eigenvalue 𝜆 and eigenvector 𝐸 (𝑖 = 1, 2, … , 𝑛) correspond to the ith MC value, and this 𝐸 is also the response variable in the pure SAR model rendering an estimate for the ith 𝜌 value.

ISPRS Int. J. Geo-Inf. 2018, 7, 476 6 of 25

Ordering the eigenvalues such that λ1 ≥ λ2 ≥ ... ≥ λn, the corresponding ρ can be estimated for each by employing eigenvectors E = (E1, E2,..., En) to replace Y in Equation (2). That is:

Ei = ρWEi + β0(I − ρW)1 + ε, i = 1, 2, . . . , n. (4) ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 6 of 25 The eigenvalue λi and eigenvector Ei (i = 1, 2, . . . , n) correspond to the ith MC value, and this Ei is also theTo responseexplore the variable relationship in the pure between SAR modelthe MC rendering and 𝜌, 14 an groups estimate of for experiments the ith ρ value. with different sampleTo exploresizes were the conducted relationship for between the three the configurat MC andionsρˆ, 14(i.e., groups regular of square experiments rook, regular with different square samplequeen, sizesand hexagonal were conducted cases), forwhere the three𝑛 ranged configurations from 25 to (i.e., 4900 regular (that is, square 𝑛 = rook, 5 × 5, regular 10 × 10, square … ,60× queen,60, 65 × and 65, hexagonal 70 × 70, where cases), 𝑃 where and 𝑄n increaseranged from in increments 25 to 4900 (thatof five). is, n The= 5 theoretical× 5, 10 × 10, relationship... , 60 × 60,functions 65 × 65,for 70different× 70, where spatialP andconfigurationsQ increase were in increments defined, ofand five). the parameters The theoretical were relationship estimated. functionsTable 1 summarizes for different the spatial resulting configurations expressions. were The defined, bold and𝒆 is thethe parametersbase of the werenatural estimated. logarithm, Table and1 summarizes𝜆 is the minimum the resulting eigenvalue expressions. of matrix The bold 𝑾 e(rookis the connectivity base of the natural 𝜆 is logarithm, about −1, andqueenλmin 𝜆is the is minimumapproximately eigenvalue −0.53, whereas of matrix 𝜆W(rookof the connectivity hexagonal tessellationλmin is about is around−1, queen −0.57).λmin is approximately −0.53, whereas λmin of the hexagonal tessellation is around −0.57). Table 1. Theoretical functions for 𝜌 versus the Moran coefficient (MC). Table 1. Theoretical functions for ρ versus the Moran coefficient (MC). Geographical Configurations Function Forms Parameter Estimates Geographical Configurations Function𝑎 Forms𝑑 Parameter Estimates Square rook 𝜌 = + 𝑎=2,𝑏 =−8,𝑐̂ =0,𝑑 =1 1+𝒆a ∗ d 𝜆 Square rook ρˆ = ∗ + + aˆ = 2, bˆ = −8, cˆ = 0, dˆ = 1 1+eb MC c λmin Square queen 𝑎 𝑑 𝑎=5.8,𝑏 = −0.96, 𝑐̂ =8,𝑑 =1 𝜌 = + = ˆ = − = ˆ = Square queen = a |+| d aˆ 5.8, b 0.96, cˆ 8, d 1 Hexagonal ρˆ 1+𝒆|MC+b|c 𝜆 𝑎 = 5.5, 𝑏 = −0.96, 𝑐̂ =6.7,𝑑 =1 Hexagonal 1+e λmin aˆ = 5.5, bˆ = −0.96, cˆ = 6.7, dˆ = 1

Figure 3 presents the selected fitted curves. This figure depicts the 70-by-70 (𝑛 = 4900) configurationFigure3 presents results for the the selected square-rook, fitted curves.square-queen, This figureand hexagonal depicts thecases. 70-by-70 It presents ( n the= 4900fitted) configurationcurves (red), computed results for with the square-rook, the theoretical square-queen, equation, superimposed and hexagonal on cases. the observed It presents data the (blue), fitted curveswhere (red),the horizontal computed axis with indicates the theoretical the values equation, of MC superimposed, and the vertical on the axis observed stands for data the (blue), values where of 𝜌 the horizontal axis indicates the values of MC, and the vertical axis stands for the values of ρ (here, ρ (here, 𝜌 denotes both the MC-estimated 𝜌 and the SAR-estimated 𝜌. ); the regression lines of 𝜌 denotes(estimated both by the the MC-estimated functions of ρtheand MC the inSAR-estimated Table 1) versusρ .);𝜌 theare regressionshown in Appendix lines of ρˆMC A.(estimated Figure 3a byreveals the functions that for ofthe the square MC in rook Table connectivity,1) versus ρˆ arethe shown feasible in range Appendix of theA. MC Figure is 3approximatelya reveals that for [−1, the 1], squareand the rook range connectivity, of 𝜌 is (−1,1 the) feasible. For the range square of queen the MC (Figure is approximately 3b) and hexagonal[−1, 1], (Figure and the 3c) range cases, of ρtheis (MC−1, is 1) .about For the (−0.51, square 1] queen; 𝜌 is (Figureapproximately3b) and hexagonal(−1.90, 1) (Figureand (−1.74,3c) cases, 1), respectively the MC is about (These(− intervals0.51, 1]; ρ is approximately (−1.90, 1) and (−1.74, 1), respectively (These intervals of ρ verify the inequality of 𝜌 verify the inequality 𝜆 < 𝜌 <𝜆 (Ord, 1975, p124). While the most frequently used −1 −1 λintervalmin < isρˆ (<−1,1λmax) (refer(Ord, to 1975, Section p124). 2), Whileintervals the beyond most frequently this range used can interval be transformed is (−1, 1) onto(refer it.). to SectionThese theoretical2), intervals and beyond original this scatter range plots can bematch transformed perfectly, ontowith it.).bivariate These regression theoretical 𝑅 ands of original nearly 1 2 scatter(see Appendix plots match A Figure perfectly, A1). with bivariate regression R s of nearly 1 (see AppendixA Figure A1).

(a) (b) (c)

FigureFigure 3.3. FittedFitted curvescurves ofof thethe MCMC againstagainst ρ𝜌for for 70-by-70 70-by-70 ( n(𝑛= 4900 = 4900)) surface surface partitionings: partitionings: ((aa)) squaresquare rookrook case;case; ((bb)) squaresquare queenqueen case;case; and, and, ( c(c)) hexagonal hexagonal case. case.

3.2. The Sampling Variance of 𝜌 The sampling variance of 𝜌 is important because it quantifies the uncertainty with which 𝜌 is estimated. Ord [31] (p. 124) proposes the asymptotic variance-covariance matrix1 of 𝜌 and 𝜎 (the value of diagonal entries of the variance- of error term 𝜺); for illustrative purposes, it is rewritten as:

1 Ord, 1975, Appendix B, contains a typographical error: a minus should be inserted after the second equality sign of 𝛼.

ISPRS Int. J. Geo-Inf. 2018, 7, 476 7 of 25

3.2. The Sampling Variance of ρˆ The sampling variance of ρˆ is important because it quantifies the uncertainty with which ρ is estimated. Ord [31] (p. 124) proposes the asymptotic variance-covariance matrix1 of ρ and σ2 (the value of diagonal entries of the variance-covariance matrix of error term ε); for illustrative purposes, it is rewritten as: !−1   n tr(B) V σ2, ρ = σ2 2σ2 , (5) tr(B) σ2trBTB − ασ2

−1 n 2 2 where B = (I − ρW) W, α = − ∑ i=1λi /(1 − ρλi) , λi is the ith eigenvalue of matrix W, “tr” is the matrix trace operator, and superscript “T” is the matrix transpose operator. An asymptotic formula of the sampling variance of ρˆ derived from Equation (5) by inverting the 2-by-2 matrix is T  n 2 2 2 Var(ρˆ)asy = (n/2)/∆, where ∆ = (n/2)tr B B + (n/2) ∑ i=1λi /(1 − ρλi) − [tr(B)] . Hence, the variance expansion may be written as:

1 Var(ρˆ) = . (6) asy 2 h i2 T T −1 −1 n λi 2 −1 tr(W (I − ρW ) (I − ρW) W) + ∑i=1 2 − n tr((I − ρW) W) (1−ρλi)

This equation is still not easy to compute because it involves (inverse) matrix operations, as well as eigenvalues. The following sub-sections present some simplifications which only consist of sample size (or number of observations) n, and the extreme eigenvalues of matrix W (specifically, for zero SA cases, only P and Q).

3.2.1. The Sampling Variance of ρˆ at Zero

T n 2 −1 When ρ = 0, Equation (6) becomes Var(ρ = 0)asy = 1/[tr(W W) + ∑ i=1λi ]. Because W = D C (D. is a diagonal matrix whose n diagonal entries are inverse row sums of matrix C), C and D−1 2 T T −1 −1 2 −2 are square matrices, and cij = cij because C is binary, tr(W W) = tr(C D D C) = tr(C D ) = n n 2 n 2 n n ∑ i=1[∑ j=1cij /(∑ j=1cij) ] = ∑ i=1(1/ ∑ j=1cij), then the asymptotic variance of ρ at zero is:

1 Var(ρ = 0) = . (7) asy n 1 n 2 ∑i=1 n + ∑i=1 λ ∑j=1 cij i

Table2 summarizes this variance for different surface partitionings; the formulae for summation of the inverse row sum of matrix C and summation of squared eigenvalues of matrix W are listed in AppendixB.

Table 2. Asymptotic variance of ρˆ at zero for different surface partitionings.

Landscape Asymptotic Variance for ρ=0. Square rook 72/(36PQ + 23P + 23Q + 36) Square queen 2400/(600PQ + 639P + 639Q + 794) Hexagonal 180/(60PQ + 55P + 60Q + 71)

The expressions in Table2 make Equation (7) extremely easy to compute, especially when the sample size is large. However, functions for ρ at nonzero points are more complex. As has been argued by [15], the variance of the inter-plant competition coefficient (which is the sampling variance of ρ that is discussed in this paper) cannot be stabilized by the Fisher Z-transformation, and even its generalized form only works on very weak spatial interactions (0, ±0.05, ±0.1) for three specific spatial

1 Ord, 1975, Appendix B, contains a typographical error: a minus should be inserted after the second equality sign of α. ISPRS Int. J. Geo-Inf. 2018, 7, 476 8 of 25 configurations (i.e., 7, 12, and 19 hexagonally arranged points; see Figure1 on p. 193). More recently, GriffithISPRS Int. and J. Geo-Inf. Chun 2018 [47, ]7, emphasized x FOR PEER REVIEW that better quantifying the spatial variability of SA estimates is 8 ofstill 25 a challenge. By exploring the distribution of the variance, this paper finds that the sampling variance ofgroupρˆ is awas function 𝑃×𝑄 of. Theρ (which SA parameter is implemented 𝜌 was withuniformlyρˆ), which sampled is depicted across by its a feasible Beta distribution ranges, namely, curve with𝜆 equal<𝜌<𝜆 parameters =1 larger , where than 1. 𝜆 is an eigenvalue of matrix 𝑾 . Theoretical functions of asymptotic variance versus 𝜌 are presented in Table 3. In the formulae, 𝑎 is the variance at zero 3.2.2. The Sampling Variance of ρˆ at Nonzero Values (see Table 2), 𝐺 is the standardized 𝜌 with form 𝐺=(𝜌−1⁄ 𝜆)(1𝜆⁄ ⁄ −1⁄ 𝜆), and 𝜆 and To𝜆 conduct are the the maximum experiments, and30 minimu groupsm of eigenvalues different sample of matrix sizes, 𝑾 ranging, respectively. from 5-by-5 to 150-by-150 with P and Q increasing in increments of five, were employed; the sample size of each group was Table 3. Theoretical functions for asymptotic variance versus 𝜌. −1 P × Q. The SA parameter ρ was uniformly sampled across its feasible ranges, namely, λmin < ρˆ < −1 λmaxGeographical= 1, where λ is an eigenvalue of matrix W. Theoretical functions of asymptotic Forms Parameter Estimates (𝑹𝟐) versusConfigurationsρˆ are presented in Table3. In the formulae, a0 is the variance at zero (see Table2), G is the standardized ρˆ with form G = (ρˆ − 1/λmin)/(1/λmax − 1/λmin), and17.9181λmax and λmin are the maximum 𝑎= + 6.7144 , (0.9999) andSquare minimum rook eigenvalues of matrix W, respectively. 𝑛. 𝑏 =2.4,𝑐̂ =2.4 Table 3. Theoretical functions for asymptotic variance32.4112 versus ρˆ. 𝑉𝑎𝑟(𝜌) =𝑎∗𝑎 ∗𝐺 𝑎= + 6.4694, (0.9990) 𝑛. ( ) 2 Geographical Configurations∗ 1−𝐺 Function Forms 2.0836Parameter Estimates (R ) Square queen 𝑏 = .17.9181+ 2.1107, (0.9975) 𝑛aˆ = 0.5395 + 6.7144 , (0.9999) Square rook Var(ρˆ) = a ∗ a ∗ Gb−1 n asy 0 1.0641 ˆ = = − b 2.4, cˆ 2.4 ∗(1 − G)c 1 𝑐̂ = + 2.2843, (0.9974) 𝑛.= 32.4112 + ( ) aˆ n0.4482 6.4694, 0.9990 Square queen 30.1752bˆ = 2.0836 + 2.1107, (0.9975) 𝑎= n0.3519− 0.0160, (0.9941) .= 1.0641 + ( ) 𝑛 cˆ n0.2982 2.2843, 0.9974 3.002330.1752 𝑏 = aˆ = 0.3065+ 0.5299,− 0.0160, (0.9966)(0.9941) 𝑉𝑎𝑟(𝜌) .n 𝑛 bˆ = 3.0023 + 0.5299, (0.9966) ∗ n0.1121 =𝑎∗𝑎 ∗𝐺 0.0110 ∗ (ln(𝑛) −6)cˆ =+ 0.02357, 𝑛 400 Hexagonal ∗ 𝑐̂ = ( 2 , ∗ (1−𝐺) Var(ρˆ)asy = −0.02330.0110 ∗ (ln∗ ((ln𝑛()n)−6)− 6) ++0.02357, 0.02357,n ≤ 𝑛400 400 Hexagonal , c∗G+b−1 e∗G+d−1 −0.0233 ∗ (ln(n) − 6)2 + 0.02357, n > 400 a ∗ a0 ∗ G ∗ (1 − G) (0.9917) 𝑑 =𝑛.∗().(0.9917− 0.0634,) (0.9900) ˆ −0.0647∗ln (n)+0.4876 d = n ln 𝑛 − 0.0634, (0.9900) = ln n − ( ) 𝑒̂ = eˆ 3.35013− 1.1161,1.1161, (0.9890)0.9890 3.35013

FigureFigure4 4 shows shows selected selected results results of of 150-by-150 150-by-150 ( (n𝑛= 22, = 22,500 500)) latticeslattices forfor thethe squaresquare rook,rook, squaresquare queen,queen, andand hexagonalhexagonal cases.cases. ItIt presentspresents theoreticaltheoretical plotsplots (red)(red) superimposedsuperimposed onon thethe originaloriginal scatterscatter plotsplots (blue), showing showing that that the the sampling sampling variance variance of of𝜌 isρˆ Beta-distributedis Beta-distributed with with equal equal shape shape and scale and scaleparameters, parameters, and andthat thatthe thetheoretical theoretical and and the the orig originalinal plots plots closely closely correspond, correspond, which which is alsoalso 2 corroboratedcorroborated byby thethe bivariatebivariate regressionregression plotsplots andand theirtheir accompanyingaccompanying R𝑅 valuesvalues ofof nearlynearly 11 (see(see AppendixAppendixA A Figure Figure A2 A2).).

(a) (b) (c)

FigureFigure 4.4. FittedFitted curvescurves ofof thethe asymptoticasymptotic variancevariance versusversus ρˆ𝜌for for 150-by-150 150-by-150 ( n(𝑛= 22, = 50022,500)) adjacencies:adjacencies: ((aa)) squaresquare rook rook case; case; ( b(b)) square square queen queen case; case; and, and, ( c(c)) hexagonal hexagonal case. case.

ToTo evaluateevaluate thethe validityvalidity ofof thethe theoreticaltheoretical equationsequations listedlisted inin TableTable3 ,3, simulation simulation experiments experiments werewere conducted.conducted.

3.2.3. Simulation Experiments For each of the three cases, 24 groups of experiments of different sample sizes (from 10-by-10 to 125-by-125) were done, and for each sample size, combinations of two treatments (i.e., employing an

ISPRS Int. J. Geo-Inf. 2018, 7, 476 9 of 25 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 9 of 25

3.2.3.approximated Simulation Jacobian Experiments term [36,37] and employing [35] a new algorithm for MLEs) were applied for the pure SAR model. For a square rook case, 𝜌 took 21 values (from −0.9 to 0.9 with a 0.1 increment, For each of the three cases, 24 groups of experiments of different sample sizes (from 10-by-10 to and ±0.95) within its feasible range (−1,1), and 10,000 replications were executed per value per 125-by-125) were done, and for each sample size, combinations of two treatments (i.e., employing an method. For a square queen case, 𝜌 took 29 values (from −1.8 to 0.9 with a 0.1 increment, and 0.95) approximated Jacobian term [36,37] and employing [35] a new algorithm for MLEs) were applied for within its feasible range (−1.9,1), and 10,000 replications were executed per value per method. For a the pure SAR model. For a square rook case, ρ took 21 values (from −0.9 to 0.9 with a 0.1 increment, hexagonal case, 𝜌 took 28 values (from −1.6 to 0.9 with a 0.1 increment, and −1.65, 0.95) within its and ±0.95) within its feasible range (−1, 1), and 10,000 replications were executed per value per feasible range (−1.74,1), and 10,000 replications were executed per value per method. Table 4 shows method. For a square queen case, ρ took 29 values (from −1.8 to 0.9 with a 0.1 increment, and 0.95) different treatment combinations that were employed by the three cases with sample size 100 (i.e., within its feasible range (−1.9, 1), and 10,000 replications were executed per value per method. For a 10-by-10 lattice). Full information about the simulations appears in Appendix C (Table A2). For a hexagonal case, ρ took 28 values (from −1.6 to 0.9 with a 0.1 increment, and −1.65, 0.95) within its specific 𝜌, 10,000 simulated values were generated whose was approximately feasible range (−1.74, 1), and 10,000 replications were executed per value per method. Table4 shows normal, and from which its and variance were extracted. different treatment combinations that were employed by the three cases with sample size 100 (i.e., 10-by-10 lattice). Full information about the simulations appears in AppendixC (Table A2). For a Table 4. Methods employed with a 10-by-10 lattice *. specific ρ, 10,000 simulated values were generated whose frequency distribution was approximately normal, and fromTreatment which its mean andExactJacob variance were extracted.AppJacob1 AppJacob2 AppJacob3 Original MLEs SR, SQ, H - - SQ, H Griffith’s MLEs (2015)Table 4. Methods - employed with a SR 10-by-10 lattice SR*. SQ, H * SR, TreatmentSQ, and H represent square ExactJacob rook, squareAppJacob1 queen, and hexagonal AppJacob2 cases, respectively. AppJacob3 AppJacob1 and AppJacob2 refer to approximated Jacobian Equations (10) and (11) in [35]. AppJacob3 is Original MLEs SR, SQ, H - - SQ, H Griffith’sEquation MLEs (8). (2015) - SR SR SQ, H * SR, SQ, and H represent square rook, square queen, and hexagonal cases, respectively. AppJacob1 and AppJacob2 referThese to approximated experiments Jacobian contend Equations with (10) two and (11)complications: in [35]. AppJacob3 approximating is Equation (8). the Jacobian term, and clarifying its derivation for the model in SAS. For the square rook adjacency case,These two experimentsforms of approximation contend with furnished two complications: by [36] were approximating employed. For the the Jacobian square term,queen and and clarifyinghexagon its adjacency derivation cases, for thethe nonlinearfollowing regression approximation model [37] in SAS.was used: For the square rook adjacency case, two forms of approximation furnished𝑙𝑛(1−𝜌∗𝜆 by [36] were) employed. For the square queen and hexagon 𝑱𝒂𝒄 =𝛼 ∗ +1−𝛿 ∗𝑙𝑛(1−𝜌∗𝜆)+𝛼 adjacency cases, the following approximation𝜌∗𝜆 [37] was used: 𝑙𝑛(1−𝜌∗𝜆) (8) ∗ +1−𝛿 ∗𝑙𝑛(1−𝜌∗𝜆), h 𝜌∗𝜆i h i ln(1−ρ∗λmin) ln(1−ρ∗λmax) Jac = α ∗ + 1 − δ ∗ ln(1 − ρ ∗ λ ) + α ∗ + 1 − δ ∗ ln(1 − ρ ∗ λmax) , (8) 1 ρ∗λmin 1 min 2 ρ∗λmax 2 where 𝑱𝒂𝒄 denotes an approximated Jacobian term. The default derivative of the Jacobian term whereemployedJac denotes in SAS is an misleading. approximated Its correct Jacobian form, term. Equation The default(A2), is derivativepresented in of Appendix the Jacobian D. Griffith term employed[35] (p. 2149) in SAS furnishes is misleading. a new algorithm Its correct to avoid form, the Equation massive matrix (A2), iscalculation presented in in the Appendix MLEs, whichD. Griffitheffectively [35] reduces (p. 2149) the furnishes execution a time. new This algorithm algorithm to avoid is employed the massive in this matrix paper. calculation in the MLEs,Figure which 5 effectively includes 10-by-10 reduces theresults execution for the time.regular This square algorithm rook, isregular employed square in thisqueen, paper. and regular hexagonalFigure5 includestessellation 10-by-10 cases. resultsIn all forthese the figures, regular squarethe theoretical rook, regular values square are presented queen, and by regular smooth hexagonalorange curves. tessellation Selected cases. sampling In all these distribution figures, thes theoreticalof weak, moderate, values are presentedand strong by positive smooth orange𝜌 for a curves.10-by-10 Selected lattice sampling are presented distributions in Appendix of weak, E to moderate, illustrate and the strong asymptotic positive normalityρ for a 10-by-10 of its sampling lattice aredistribution. presented in AppendixE to illustrate the asymptotic normality of its sampling distribution.

(a) (b) (c)

FigureFigure 5. Variance5. Variance comparison comparison plots plots for 10-by-10 for 10-by-10 (n = 100(𝑛) tessellations: = 100) tessellations: (a) square (a) rooksquare case; rook (b) case; square (b) queensquare case; queen (c) hexagonal case; (c) hexagonal case. case.

ISPRS Int. J. Geo-Inf. 2018, 7, 476 10 of 25

Figure5a portrays variance values for a systematic sample of 21 points (from −0.95 to 0.95) and a square rook case. Not including the orange graph, there are three other colored graphs in Figure5a, where the blue stands for the exact values, and the red and green represent two different approximated calculations. The exact and approximated values are close, while a small gap appears between the blue-red-green and the orange graphs around the peak. The gap almost disappears when the sample size becomes 400 (20-by-20, see AppendixF Figure A4), and then appears again, but has no big change as the sample size increases. The gap is the difference between the theoretical and the exact values, and depicts the accuracy of the theoretical formulae—a smaller gap indicates a better approximation. Figure5b portrays variance values for a systematic sample of 29 points (from −1.8 to 0.95) and a square tessellation with a queen adjacency case. To obtain the two results, one (the red) was calculated by approximating the Jacobian term and using a simplified algorithm for ML estimation [35], whereas the other (the green) was calculated by approximating the Jacobian term and using the conventional ML estimation implementation. Results calculated by these two methods are coincident, and they are not only considerably close to the exact graph (the blue in Figure5b), but also close to the theoretical curves; the variance plots for larger sample sizes are shown in AppendixF (Figure A5), in which the theoretical (orange) and one approximation (red) are portrayed (except for the 20-by-20 case) because executing the nonlinear regression without the simplified algorithm is extremely time-consuming, and the gaps between the orange and the red are negligible. Figure5c portrays variance values for a systematic sample of 28 points (from −1.65 to 0.95) computed for the hexagonal case. Here, the two approximations almost perfectly match the exact and theoretical values; variance plots for larger sample sizes appear in AppendixF (Figure A6); the exact would be slightly bigger around the maximum values than the theoretical ones when sample size exceeds 1600 (40-by-40), whereas values along the two sides match reasonably well. Figure6 presents convergence plots of variances at ρ = 0. The top row presents convergence curves whose horizontal axis is the square root of the sample size (from 5 to 150), and whose vertical axis is the . Different colors indicate different calculation methods; for example, the orange denotes the theoretical calculated variance of ρ at zero, the blue denotes the asymptotic (formulae in Table2) computed variance, the green and red respectively denote variance calculated with simulation experiments using the two methods approximating the Jacobian term (two methods exist for the square rook adjacency, whereas only one for the square queen and hexagonal adjacencies). The bottom row shows the standard-deviation-to-square-root-of-sample-size ratio (from 5 to 150); the green and red lines denote those standard deviations calculated with simulation experiments versus the asymptotically calculated standard deviations. Figure6(a1,a2) are from the results of a square rook case, Figure6(b1,b2) display the results from a square queen case, and Figure6(c1,c2) portray the results from a hexagonal case. All of the trajectory curves appear to converge to zero, and all standard deviation ratios fluctuate around 1, when sample size goes to infinity. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 10 of 25

Figure 5a portrays variance values for a systematic sample of 21 points (from −0.95 to 0.95) and a square rook case. Not including the orange graph, there are three other colored graphs in Figure 5a, where the blue stands for the exact values, and the red and green represent two different approximated calculations. The exact and approximated values are close, while a small gap appears between the blue-red-green and the orange graphs around the peak. The gap almost disappears when the sample size becomes 400 (20-by-20, see Appendix F Figure A4), and then appears again, but has no big change as the sample size increases. The gap is the difference between the theoretical and the exact values, and depicts the accuracy of the theoretical formulae—a smaller gap indicates a better approximation. Figure 5b portrays variance values for a systematic sample of 29 points (from −1.8 to 0.95) and a square tessellation with a queen adjacency case. To obtain the two results, one (the red) was calculated by approximating the Jacobian term and using a simplified algorithm for ML estimation [35], whereas the other (the green) was calculated by approximating the Jacobian term and using the conventional ML estimation implementation. Results calculated by these two methods are coincident, and they are not only considerably close to the exact graph (the blue in Figure 5b), but also close to the theoretical curves; the variance plots for larger sample sizes are shown in Appendix F (Figure A5), in which the theoretical (orange) and one approximation (red) are portrayed (except for the 20-by-20 case) because executing the nonlinear regression without the simplified algorithm is extremely time-consuming, and the gaps between the orange and the red are negligible. Figure 5c portrays variance values for a systematic sample of 28 points (from −1.65 to 0.95) computed for the hexagonal case. Here, the two approximations almost perfectly match the exact and theoretical values; variance plots for larger sample sizes appear in Appendix F (Figure A6); the exact variances would be slightly bigger around the maximum values than the theoretical ones when sample size exceeds 1600 (40-by-40), whereas values along the two sides match reasonably well. Figure 6 presents convergence plots of variances at 𝜌=0. The top row presents convergence curves whose horizontal axis is the square root of the sample size (from 5 to 150), and whose vertical axis is the standard deviation. Different colors indicate different calculation methods; for example, the orange denotes the theoretical calculated variance of 𝜌 at zero, the blue denotes the asymptotic (formulae in Table 2) computed variance, the green and red respectively denote variance calculated with simulation experiments using the two methods approximating the Jacobian term (two methods exist for the square rook adjacency, whereas only one for the square queen and hexagonal adjacencies). The bottom row shows the standard-deviation-to-square-root-of-sample-size ratio (from 5 to 150); the green and red lines denote those standard deviations calculated with simulation experiments versus the asymptotically calculated standard deviations. Figure 6(a1,a2) are from the results of a square rook case, Figure 6(b1,b2) display the results from a square queen case, and Figure ISPRS6(c1,c2) Int. J.portray Geo-Inf. 2018the ,results7, 476 from a hexagonal case. All of the trajectory curves appear to converge11 of 25to zero, and all standard deviation ratios fluctuate around 1, when sample size goes to infinity.

ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 11 of 25 (a1) (b1) (c1)

(a2) (b2) (c2)

FigureFigure 6.6. ConvergenceConvergence andand variancevariance ratioratio plotsplots atat ρ𝜌=0= 0.(. (a1a1,,a2a2)) SquareSquare rookrook variancevariance convergenceconvergence andand ratioratio plots; plots; (b1 (,b1b2,)b2 square) square queen queen variance variance convergence convergence and ratio and plots; ratio ( c1plots;,c2) hexagonal (c1,c2) hexagonal variance convergencevariance convergence and ratio and plots. ratio plots.

4. A Simulated Zero Spatial Autocorrelation Scenario, and Two Nonzero Empirical Examples 4. A Simulated Zero Spatial Autocorrelation Scenario, and Two Nonzero Empirical Examples SectionSection3 3furnishes furnishes the the sampling sampling distribution distribution of the of nonzerothe nonzero SA parameter SA parameter of the pureof the SAR pure model; SAR onemodel; possible one possible application application of this contribution of this contribution is to conduct is to hypothesisconduct hypothesis testing for testing a nonzero for aρ nonzero. Thus, the 𝜌. nextThus, section the next includes section two includes empirical two examplesempirical thatexam representples that moderate represent and moderate strong and positive strong SA, positive and to illustrateSA, and ato zero illustrate SA case a (rarelyzero SA encountered case (rarely in enco empiricaluntered data), in simulatedempirical datadata), were simulated generated data as were well. ( − ) sd( ) Consideringgenerated as the well. normality Considering of its sampling the normality distribution, of its asampling Z test of the distribution, form ρ ρa0 Z/ testρ ofcan the be usedform to evaluate the estimated ρ value. Let the significance level α be 0.05, and for a two-sided hypothesis (𝜌−𝜌)⁄𝑠𝑑(𝜌) can be used to evaluate the estimated 𝜌 value. Let the significance level 𝛼 be 0.05, ± − testand asfor employed a two-sided in thishypothesis paper, thetest critical as employed values in would this paper, be 1.96 the (i.e.,critical any valu valuees would less than be ±1.961.96, (i.e., or biggerany value than less 1.96 than leads −1.96, to a or rejection bigger ofthan the 1.96 null leads hypothesis). to a rejection of the null hypothesis). 4.1. A Description of the Selected Datasets 4.1. A Description of the Selected Datasets For the zero SA case, a 100-by-100 (n = 10, 000) landscape (Goslee [48] furnishes R code for For the zero SA case, a 100-by-100 (𝑛 = 10,000) landscape (Goslee [48] furnishes R code for generating square lattices) with 10,000 numbers conforming to N(0, 1) randomly assigned on it was generating square lattices) with 10,000 numbers conforming to 𝑁(0,1) randomly assigned on it was generated. Figure7a illustrates this landscape, with colors ranging from red to green portraying generated. Figure 7a illustrates this landscape, with colors ranging from red to green portraying values of different levels. For the moderate SA case, a census dataset with 184 subdistricts for values of different levels. For the moderate SA case, a census dataset with 184 subdistricts for Wuhan, China (2010) (The statistical data come from the “Tabulation on the 2010 population census of Wuhan, China (2010) (The statistical data come from the “Tabulation on the 2010 population census China”, http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm, and the blocks map comes from of China”, http://www.stats.gov.cn/tjsj/pcsj/rkpc/6rp/indexch.htm, and the blocks map comes from the Wuhan Land Resources and Planning Bureau) was employed. The target variable was set to be the the Wuhan Land Resources and Planning Bureau) was employed. The target variable was set to be percentage of 0-to19-year-old individuals in the population (denoted as 0–19%). Figure7b illustrates the percentage of 0-to19-year-old individuals in the population (denoted as 0–19%). Figure 7b the geographic distribution of 0–19% (ranging from 7.83 to 39.74) across Wuhan administrative districts, illustrates the geographic distribution of 0–19% (ranging from 7.83 to 39.74) across Wuhan with the green representing areas of low rates and the red representing areas of high rates. Relatively administrative districts, with the green representing areas of low rates and the red representing low 0–19% areas cluster in the central urban sections, while relatively high 0–19% areas cluster areas of high rates. Relatively low 0–19% areas cluster in the central urban sections, while relatively in suburban regions. Because of the irregularity of its configuration, new simulation experiments high 0–19% areas cluster in suburban regions. Because of the irregularity of its configuration, new were conducted to obtain the sampling variance of the SA parameter. Detailed information can be simulation experiments were conducted to obtain the sampling variance of the SA parameter. found in AppendixG. For the strong SA case, a Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Detailed information can be found in Appendix G. For the strong SA case, a Landsat 7 Enhanced remotely sensed image downloaded from the USGS EarthExplore (https://earthexplorer.usgs.gov/) Thematic Mapper Plus (ETM+) remotely sensed image downloaded from the USGS EarthExplore was employed. This image is of the Yellow Mountain region of China on 8 October 2002. Its original (https://earthexplorer.usgs.gov/) was employed. This image is of the Yellow Mountain region of China on 8 October 2002. Its original form is 7811-by-7051 (𝑛 = 55,075,361) pixels, and it includes spectral bands B1-B8, with B1-B7 at a spatial resolution of 30 meters, and B8 having a spatial resolution of 15 meters. In order to be consistent with the magnitude of the other cases, a 100-by-100 patch, which displays the topographic landscape with green areas representing lower elevation and buff-to-white strips depicting mountain ridges with bare rocks, was cropped from the original image, and constitutes the study area across which the normalized difference vegetation index (NDVI) was calculated. The adjacency scheme employed to identify neighbors for all three cases was the rook (i.e., edge connections only), because for the irregular configuration (Figure 7b), the rook and queen (i.e., edge and point connections) would render very close results as adjacent administrative districts always have common boundaries; and for the two square cases (Figure 7a,c),

ISPRS Int. J. Geo-Inf. 2018, 7, 476 12 of 25 form is 7811-by-7051 (n = 55, 075, 361) pixels, and it includes spectral bands B1-B8, with B1-B7 at a spatial resolution of 30 meters, and B8 having a spatial resolution of 15 meters. In order to be consistent with the magnitude of the other cases, a 100-by-100 patch, which displays the topographic landscape with green areas representing lower elevation and buff-to-white strips depicting mountain ridges with bare rocks, was cropped from the original image, and constitutes the study area across which the normalized difference vegetation index (NDVI) was calculated. The adjacency scheme employed to identify neighbors for all three cases was the rook (i.e., edge connections only), because for the irregular configuration (Figure7b), the rook and queen (i.e., edge and point connections) would render ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 12 of 25 very close results as adjacent administrative districts always have common boundaries; and for the twoalthough square numerical cases (Figure results7a,c), althoughwould be numerical different, resultsthe adjacency would be different,definition the has adjacency little impact definition on the hasconclusion. little impact on the conclusion.

(a) (b) (c)

FigureFigure 7. 7.Landscapes Landscapes with with different different degrees degrees of of spatial spatial autocorrelation autocorrelation (SA): (SA): (a ()a a) a 100-by-100 100-by-100 lattice lattice acrossacross which which pseudo-random pseudo-random numbers numbers from afrom standard a st normalandard distribution normal distri are randomlybution are distributed; randomly (bdistributed;) 184 census ( blocksb) 184 ofcensus Wuhan blocks (2010) of with Wuhan 0–19% (2010) (ranging with from 0–19% 7.83 (ranging to 39.74, withfrom green7.83 to representing 39.74, with lowgreen rates representing and red representing low rates highand rates)red re displayed;presenting ( chigh) 100-by-100 rates) displayed; pixels clipped (c) 100-by-100 from the Yellow pixels Mountainclipped from region the image Yellow (2002, Mountain with the region green representingimage (2002, vegetation with the green and buff-to-white representing strips vegetation depicting and mountainbuff-to-white ridges). strips depicting mountain ridges).

InIn the the case case of of moderate moderate positive positive SA, SA, the the response response variable variable 0–19% 0–19% (YR, (YR, as as denoted denoted in in Table Table5 )5) waswas subjected subjected to to an an exponential exponential transformation transformation to to make make it it better better align align with with a a bell-shaped bell-shaped curve, curve, hencehence satisfying satisfying an an assumption assumption of of the the pure pure SAR SAR model; model; the the Shapiro-Wilk Shapiro-Wilk test test statistic statistic for for the the original original 0–19%0–19% was was 0.91, 0.91, increasing increasing to to 0.99 0.99 after after this this transformation. transformation. ForFor the the strong strong positive positive SA SA case, case, an an exponentialexponential transformation transformation was was also also applied applied to to NDVI NDVI to to achieve achieve better better symmetry; symmetry; considering considering its its largelarge sample sample size, size, the Kolmogorov-Smirnovthe Kolmogorov-Smirnov (KS) normality (KS) was employed, test was and employed, the transformation and the decreasedtransformation the KS decreased statistic from the 0.4250KS statistic to 0.0238. from Thus, 0.4250 the to transformed 0.0238. Thus, 0–19% the transformed and NDVI constituted 0–19% and theNDVI response constituted variables the of response those SAR variables models forof whichthose SAR the moderate models for and which strong the levels moderate of SA parameters and strong ρlevelswere estimated,of SA parameters respectively. 𝜌 were Data estimated, transformation respectively. and normality Data transformation testing details and for normality model residuals testing aredetails presented for model in Appendix residualsH are, and presented scatter in plots Append of residual-predictedix H, and scatter valuesplots of for residual-predicted assessing the homoscedasticityvalues for assessing of model the residuals, and thus of model the validity residuals, of the and regression thus the model, validity are of also the included.regression model, are also included.

Table 5. Results of zero, moderate, and strong positive spatial autocorrelation (SA) examples.

Dataset Random Landscape Wuhan Census Blocks Yellow Mountain Image Pseudo-random Response variable 3.67 − 11.79 ∗ 𝒆.∗ 1.46 ∗ 𝒆.∗ −2.70 numbers

Null hypothesis (𝐻) 𝜌 =0 𝜌 =0.7 𝜌 =0.95 MC 0.0051 0.5304 0.9077 𝜌 0.0101 0.7080 0.9697 𝑠𝑑(𝜌) 0.0139 0.0598 0.0019 z-score 0.7254 0.1338 10.1762 p-value 0.4682 0.8935 <2.5314e-24 *95% CI of 𝜌 [−0.0171, 0.0373] [0.5908, 0.8252] [0.9659, 0.9735]

Conclusion Failed to reject 𝐻 Failed to reject 𝐻 Reject 𝐻 *95% confidence intervals of model estimated 𝜌 are calculated to determine the range in which the

𝜌 will not be rejected.

4.2. Results and Explanations Results are summarized in Table 5, which also includes the MC values. There are two indexes appearing in this table to indicate the level of SA: the MC, and the SAR model estimated 𝜌 (i.e., 𝜌), both of which represent the same level (i.e., nearly zero, moderate, and

ISPRS Int. J. Geo-Inf. 2018, 7, 476 13 of 25

Table 5. Results of zero, moderate, and strong positive spatial autocorrelation (SA) examples.

Dataset Random Landscape Wuhan Census Blocks Yellow Mountain Image Pseudo-random Response variable 3.67 − 11.79 ∗ e−6.59∗YR 1.46 ∗ e4.78∗NDVI − 2.70 numbers

Null hypothesis (H0) ρ0 = 0 ρ0 = 0.7 ρ0 = 0.95 MC 0.0051 0.5304 0.9077 ρˆ 0.0101 0.7080 0.9697 sd(ρˆ) 0.0139 0.0598 0.0019 z-score 0.7254 0.1338 10.1762 p-value 0.4682 0.8935 <2.5314e-24 *95% CI of ρˆ [−0.0171, 0.0373] [0.5908, 0.8252] [0.9659, 0.9735] Conclusion Failed to reject H0 Failed to reject H0 Reject H0

*95% confidence intervals of model estimated ρ are calculated to determine the range in which the ρ0 will not be rejected.

4.2. Results and Explanations Results are summarized in Table5, which also includes the MC values. There are two indexes appearing in this table to indicate the level of SA: the MC, and the SAR model estimated ρ (i.e., ρˆ), both of which represent the same level (i.e., nearly zero, moderate, and strong) of SA, but with different values; the MC is directly calculated with the observations, and ρˆ is the ML estimate for the pure SAR model in which the neighborhood was defined by rook adjacency. For the random case, the MC value is 0.0051, which indicates nearly zero SA, while ρˆ is 0.0101; the z-score and p-value indicate a failure to reject the null hypothesis of ρ0 = 0, which verifies that those pseudo-random numbers conforming to N(0, 1) were randomly distributed on the 100-by-100 lattice. For the Wuhan census blocks case, the MC value of 0.53 indicates moderate SA, while ρˆ = 0.7080 corresponds to a first-order spatial correlation of roughly 0.45 (e.g., [29] (§2.2.1); [45] (p. 33)). Accordingly, a failure to reject the null hypothesis ρ0 = 0.7 for moderate SA is understandable. In the case of the Yellow Mountain remotely-sensed image, the MC value of 0.9077 indicates a high level of SA, and ρˆ = 0.9697 corresponds to a first-order spatial correlation of roughly 0.90. For this strong SA case, setting the null hypothesis of ρ0 = 0.95 is still rejected. Reasons for the rejection may include: (1) when referring to the logistic curve in Figure3a, those ρ values that indicate strong positive SA may be restricted to a very narrow interval, say (0.95, 1), while ρ0 = 0.95 corresponds to a moderate MC value (around 0.5); and, (2) the variance of ρˆ approaches 0 when the sample size becomes large and ρ approaches 1, and the nearly zero variance leads to a high z-score and a narrow confidence interval, and thus an extremely significant p-value.

5. Conclusions and Discussion The main contribution of this paper is furnishing the sampling distribution of the nonzero SA parameter of the SAR model, which is frequently employed in a wide range of disciplines whose study observations are connected with geographical attributes or locations. More specifically, the sampling distribution was constructed for three specific spatial structures (i.e., regular square rook and queen, and hexagonal tessellations); the former two are usually used for remotely sensed images (raster data), while the latter is preferred for spatial sampling designs and aggregations. As shown with graphics and functions, the curve shape of the parameter ρ against the MC is sigmoid, which indicates that the MC should better differentiate SA levels and is more coincident with intuition (e.g., a MC value of 0.5 quantifies moderate SA, whereas the ρ value for the same level may be larger than 0.7, which conjures up a vision of moderate-to-strong SA at first glance). This may suggest that the MC could be better for assessing the level of SA. In addition, the sampling variance of the parameter seems to conform to a beta distribution with equal scale and shape parameters (larger than 1). A difference in these distributions between the square rook and the other two is that the parabola-shaped curve is ISPRS Int. J. Geo-Inf. 2018, 7, 476 14 of 25 symmetric at zero for the square rook adjacency, whereas the symmetry is at negative points for the square queen and hexagonal adjacencies. One merit of this contribution is that it implements hypothesis testing for a nonzero null hypothesis. For illustrative purposes, two empirical examples for moderate and strong SA were selected, as was a zero null hypothesis case employing simulated data exhibiting a random map pattern. For this random case, the result indicates a failure to reject the null hypothesis, which is in accordance with our expectation. For the moderate case, prior knowledge that ρ and the MC have a logistic relationship (i.e., MC values beyond, e.g., (−0.3, 0.3) correspond to extreme ρ values) results in positing a null hypothesis of ρ0 = 0.7; thus, the null hypothesis is not rejected. For the strong SA case, the null hypothesis of ρ0 = 0.95 is rejected mainly because of the large sample size (i.e., n = 10, 000) and the closeness between 0.95 and 1, both of which result in an extremely small variance. These examples verify that the uncovered sampling distribution results are credible (the standard error for ρ0 = 0 is 0.0141, corresponding to a z-score of 0.7188, which has a very small difference of 0.0066 with the z-score 0.7254 appearing in Table5). In addition, a by-product of this contribution is the visualization of statistical power curves for SA statistics [42]. Because the SA parameter can be expressed in terms of the MC, their power curves can be plotted with a common measurement scale (see AppendixI). Future work needs to refine, and hopefully simplify, the variance expression for the hexagonal case; one way to achieve this end is to increase the number of experiments (the current variance-ρ function is based upon 14 groups of datasets (i.e., 7-by-7 to 70-by-70) to estimate its parameters). Extensions can be made for more complicated landscapes, such as a combination of hexagons and pentagons, partitionings with distorted hexagons (i.e., ISEA3H as a hexagonal Discrete Global Grid System; [43]); these may be described with appropriate spatial weights matrices, where for the former, expressions like those listed in Table2 need to be derived, and for the latter, a spatial weights matrix containing metrics based upon side length or hexagon area may be more reasonable than a binary version. An important issue illustrated by the last example meriting emphasis is that the formal hypothesis testing or its p-value may no longer make sense because of the large-to-massive sample size involved. For example, the “confidence interval” of 0.9697 is [0.9659, 0.9735], in which the null hypothesis value should be contained; otherwise, the null hypothesis is rejected. A cause of this “always significant” phenomenon is the extremely small variance for a very big sample size, which creates the necessity to develop a new criterion to substitute for statistical significance when analyzing spatial data with large-to-massive sample sizes. This is a meaningful research theme that needs attention in the future.

Author Contributions: The second author and the third author are advisors of the first author. The first author participated in all the stages of writing this paper, including experimentation, , and manuscript preparation. The second author supervised the entire process, including consultation on the experiments, evaluating the results, and reviewing versions of the manuscript. The third author also supervised the entire process, including scheduling of research taskes, data provision, and hardware and computing resources support. Funding: This work, in part, was supported by the China Scholarship Council, grant number 201406270075, and by the National Key Research and Development Program of China, grand number 2017YFB0503802. Acknowledgments: The authors thank Bin Li for his participation in the middle and the final stage discussions of this paper. The authors thank Zhipeng Gui, Zelong Yang, and Xu Gao for providing access to the computing resources on the team’s private cloud. The authors also thank anonymous reviewers for their suggestions to improve this paper. Conflicts of Interest: The authors declare no conflict of interests. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 14 of 25 pentagons, partitionings with distorted hexagons (i.e., ISEA3H as a hexagonal Discrete Global Grid System; [43]); these may be described with appropriate spatial weights matrices, where for the former, expressions like those listed in Table 2 need to be derived, and for the latter, a spatial weights matrix containing metrics based upon side length or hexagon area may be more reasonable than a binary version. An important issue illustrated by the last example meriting emphasis is that the formal hypothesis testing or its p-value may no longer make sense because of the large-to-massive sample size involved. For example, the “” of 0.9697 is [0.9659, 0.9735], in which the null hypothesis value should be contained; otherwise, the null hypothesis is rejected. A cause of this “always significant” phenomenon is the extremely small variance for a very big sample size, which creates the necessity to develop a new criterion to substitute for when analyzing spatial data with large-to-massive sample sizes. This is a meaningful research theme that needs attention in the future.

Author Contributions: The second author and the third author are advisors of the first author. The first author participated in all the stages of writing this paper, including experimentation, data collection, and manuscript preparation. The second author supervised the entire process, including consultation on the experiments, evaluating the results, and reviewing versions of the manuscript. The third author also supervised the entire process, including scheduling of research taskes, data provision, and hardware and computing resources support.

Funding: This work, in part, was supported by the China Scholarship Council, grant number. 201406270075, and by the National Key Research and Development Program of China, grand number 2017YFB0503802.

Acknowledgments: The authors thank Bin Li for his participation in the middle and the final stage discussions of this paper. The authors thank Zhipeng Gui, Zelong Yang, and Xu Gao for providing access to the computing resources on the team’s private cloud. The authors also thank anonymous reviewers for their suggestions to ISPRS Int. J. Geo-Inf. 2018, 7, 476 15 of 25 improve this paper.

Conflicts of Interest: The authors declare no conflict of interests. Appendix A. Bivariate Regression Results Appendix A. Bivariate Regression Results

(a) (b) (c) Figure A1. Bivariate regression results of 𝜌 as a function of 𝜌 for three 70-by-70 (𝑛 = 4900) FigureFigure A1.A1.Bivariate Bivariate regression regression results results of ρˆofMC 𝜌as aas function a function of ρˆ forof three𝜌 for 70-by-70 three 70-by-70 (n = 4900 (𝑛) regular = 4900) tessellations:regular tessellations: (a) square (a) rook square adjacency rook adjacency case; (b) squarecase; (b queen) square adjacency queen case;adjacency and, (case;c) hexagonal and, (c) casehexagonal data. case data.

(a) (b) (c) Figure A2. Bivariate regression results of the asymptotic variance of ρˆ as a function of its theoretical version for three 150-by-150 (n = 22, 500 ) regular tessellations: (a) square rook adjacency case; (b) square queen adjacency case; and, (c) hexagonal case.

Appendix B. Components of the Denominator of the Asymptotic Variance of ρˆ at Zero for Different Landscapes

Table A1. Components of the denominator of Equation (7).

n n c n 2 Landscape ∑i=11/∑j=1 ij ∑i=1λi Square rook [3PQ + 2(P + Q) + 4]/12 [18PQ + 11(P + Q) + 12]/72 Square queen [15PQ + 18(P + Q) + 28]/120 [300PQ + 279(P + Q) + 234]/2400 Hexagonal (5PQ + 5P + 6Q + 8)/30 (30PQ + 25P + 24Q + 23)/180

Appendix C. Basic Information about the Simulation Experiments Because calculating the exact Jacobian is very time consuming, and nonlinear calculation iterations for the conventional MLEs are slow, sample sizes larger than 10-by-10 did not employ the pure SAR model with an exact Jacobian term and the conventional MLE implementation (except for the square queen 15-by-15 and 20-by-20 cases (SQ15-20), and the hexagonal 15-by-15 to 40-by-40 cases (H15-40)). Hence, the number of methods for all three cases using a 10-by-10 lattice is 3 (Table A2). For the square rook 15-by-15 to 125-by-125, two approximated Jacobian terms, plus Griffith’s new MLE equations [35] were employed. For the square queen 15-by-15, 20-by-20, and hexagonal 15-by-15 to 40-by-40, cases, an approximated Jacobian term plus the new MLE equations, as well as an approximated Jacobian term plus the conventional MLE implementation, were used; thus, the number of methods for these cases is two. The remaining square queen and hexagonal cases employed an approximated Jacobian term plus the new MLE equations; thus, the number of method here is one. ISPRS Int. J. Geo-Inf. 2018, 7, 476 16 of 25

Table A2. Simulation experiments employing ρ within its feasible range.

Regular Square Rook Regular Square Queen Regular Hexagonal Tessellation Sample Number of Number of Sample Number of Number of Sample Number of Number of Size Methods Replications Size Methods Replications Size Methods Replications 10-by-10 3 3*21*10,000 10-by-10 3 3*29*10,000 10-by-10 3 3*28*10,000 15-by-15 2 2*21*10,000 15-by-15 2 2*29*10,000 15-by-15 2 2*28*10,000 20-by-20 2 2*21*10,000 20-by-20 2 2*29*10,000 20-by-20 2 2*28*10,000 25-by-25 2 2*21*10,000 25-by-25 1 29*10,000 25-by-25 2 2*28*10,000 30-by-30 2 2*21*10,000 30-by-30 1 29*10,000 30-by-30 2 2*28*10,000 35-by-35 2 2*21*10,000 35-by-35 1 29*10,000 35-by-35 2 2*28*10,000 40-by-40 2 2*21*10,000 40-by-40 1 29*10,000 40-by-40 2 2*28*10,000 45-by-45 2 2*21*10,000 45-by-45 1 29*10,000 45-by-45 1 28*10,000 50-by-50 2 2*21*10,000 50-by-50 1 29*10,000 50-by-50 1 28*10,000 55-by-55 2 2*21*10,000 55-by-55 1 29*10,000 55-by-55 1 28*10,000 60-by-60 2 2*21*10,000 60-by-60 1 29*10,000 60-by-60 1 28*10,000 65-by-65 2 2*21*10,000 65-by-65 1 29*10,000 65-by-65 1 28*10,000 70-by-70 2 2*21*10,000 70-by-70 1 29*10,000 70-by-70 1 28*10,000 75-by-75 2 2*21*10,000 75-by-75 1 29*10,000 75-by-75 1 28*10,000 80-by-80 2 2*21*10,000 80-by-80 1 29*10,000 80-by-80 1 28*10,000 85-by-85 2 2*21*10,000 85-by-85 1 29*10,000 85-by-85 1 28*10,000 90-by-90 2 2*21*10,000 90-by-90 1 29*10,000 90-by-90 1 28*10,000 95-by-95 2 2*21*10,000 95-by-95 1 29*10,000 95-by-95 1 28*10,000 100-by-100 2 2*21*10,000 100-by-100 1 29*10,000 100-by-100 1 28*10,000 105-by-105 2 2*21*10,000 105-by-105 1 29*10,000 105-by-105 1 28*10,000 110-by-110 2 2*21*10,000 110-by-110 1 29*10,000 110-by-110 1 28*10,000 115-by-115 2 2*21*10,000 115-by-115 1 29*10,000 115-by-115 1 28*10,000 120-by-120 2 2*21*10,000 120-by-120 1 29*10,000 120-by-120 1 28*10,000 125-by-125 2 2*21*10,000 125-by-125 1 29*10,000 125-by-125 1 28*10,000

Appendix D. Correct Derivation of the Jacobian Term for SAS For a pure SAR model (Equation (2)), the nonlinear model executed in SAS is of the form [33]:

Y ∗ Jac = [ρWY + (1 − ρ)β01 + ε] ∗ Jac. (A1)

Taking the derivative with respect to ρ on both sides of Equation (A1), and considering

∂(Y ∗ Jac) ∂Y ∂Jac = Jac + Y ∗ ∂ρ ∂ρ ∂ρ yields ∂Y ∂Jac Jac = [ρWY + (1 − ρ)β 1 − Y] + (WY − β 1) ∗ Jac. (A2) ∂ρ 0 ∂ρ 0 The default derivative of the Jacobian term employed in SAS is ∂(Y ∗ Jac)/∂ρ; however, what should be calculated is Equation (A2)2.

Appendix E. Asymptotic Normality of the Sampling Distribution of the Spatial Autocorrelation (SA) Parameter ρ Considering the prevalence of positive SA latent in geographical datasets, selecting frequency distributions of positive parameters with weak (ρ = 0.3), moderate (ρ = 0.7), and strong (ρ = 0.95) SA is representative. Figure A3a–c include these plots for the 10-by-10 square rook, square queen, and hexagonal cases, respectively.

2 There is a typo in [36] (p. 864), the brace needs to be removed. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 16 of 25

For a pure SAR model (Equation (2)), the nonlinear model executed in SAS is of the form [33]:

𝒀∗𝑱𝒂𝒄 = 𝜌𝑾𝒀 + (1−𝜌)𝛽𝟏+𝜺] ∗ 𝑱𝒂𝒄. (A1) Taking the derivative with respect to 𝜌 on both sides of Equation (A1), and considering 𝜕(𝒀 ∗ 𝑱𝒂𝒄) 𝜕𝒀 𝜕𝑱𝒂𝒄 = 𝑱𝒂𝒄 +𝒀∗ 𝜕𝜌 𝜕𝜌 𝜕𝜌 yields 𝜕𝒀 𝜕𝑱𝒂𝒄 𝑱𝒂𝒄 = 𝜌𝑾𝒀 + (1−𝜌)𝛽 𝟏−𝒀] + (𝑾𝒀 −𝛽 𝟏) ∗ 𝑱𝒂𝒄. (A2) 𝜕𝜌 𝜕𝜌 The default derivative of the Jacobian term employed in SAS is 𝜕(𝒀 ∗ 𝑱𝒂𝒄)⁄ 𝜕𝜌; however, what should be calculated is Equation (A2)2.

Appendix E. Asymptotic Normality of the Sampling Distribution of the Spatial Autocorrelation (SA) Parameter 𝝆 Considering the prevalence of positive SA latent in geographical datasets, selecting frequency distributions of positive parameters with weak (𝜌=0.3), moderate (𝜌=0.7), and strong (𝜌 = 0.95) ISPRSSA is Int. representative. J. Geo-Inf. 2018, 7 ,Figures 476 A3a–c include these plots for the 10-by-10 square rook, square queen,17 of 25 and hexagonal cases, respectively.

(a) (b)

(c)

FigureFigure A3.A3. FrequencyFrequency distributionsdistributions ofof weak,weak, moderate,moderate, andand strongstrong positivepositive SASA parameters:parameters: ((aa)) squaresquare tessellation,tessellation, rookrook adjacency;adjacency; ( (bb)) square square tessellation, tessellation, queen queen adjacency; adjacency; and, and, ( c(c)) hexagonal hexagonal adjacency. adjacency.

Simulation experiments were conducted to obtain these distributions, which become thinner and taller as the preset ρ value changes from 0.3 to 0.95. Estimated and standard deviations are shown in the legends, and all means are marked on the /density curve plots as red reference2 There is a lines.typo in [36] Despite (p. 864), their the brace widths needs and to be heights, removed. all of these distributions are bell-shaped; thus, the SA parameter ρ is asymptotic normally distributed, with changing mean and variance across its feasible range. ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 17 of 25

Simulation experiments were conducted to obtain these distributions, which become thinner and taller as the preset 𝜌 value changes from 0.3 to 0.95. Estimated means and standard deviations are shown in the legends, and all means are marked on the histogram/density curve plots as red ISPRSreference Int. J. Geo-Inf.lines. 2018Despite, 7, 476 their widths and heights, all of these distributions are bell-shaped; thus,18 of the 25 SA parameter 𝜌 is asymptotic normally distributed, with changing mean and variance across its feasible range. Appendix F. ρ versus the Variance of ρˆ: Simulation Results for Three Adjacencies Appendix F. 𝝆 versus the Variance of 𝝆: Simulation Experiment Results for Three Adjacencies

20-by-20 30-by-30 40-by-40

50-by-50 60-by-60 70-by-70

80-by-80 90-by-90 100-by-100

110-by-110 120-by-120 125-by-125

FigureFigure A4.A4. TheoreticalTheoretical variancesvariances (orange)(orange) versusversus exactexact variancesvariances (red(red andand green)green) obtainedobtained byby approximatingapproximating the the Jacobian Jacobian term term and and employing employing new new maximum maximum likelihood likelihood estimators estimators (MLEs) (MLEs) with with differentdifferent sample sample sizes sizes for for the the square square tessellation tessellation with with rook rook adjacency. adjacency.

ISPRS Int. J. Geo-Inf. 2018, 7, 476 19 of 25 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 18 of 25

20-by-20 30-by-30 40-by-40

50-by-50 60-by-60 70-by-70

80-by-80 90-by-90 100-by-100

110-by-110 120-by-120 125-by-125

FigureFigure A5. A5. TheoreticalTheoretical variancesvariances (orange)(orange) versusversus exactexact variancesvariances (red(red andand green)green) obtainedobtained byby approximatingapproximating the the Jacobian Jacobian term term and and employing employing new new MLEs MLEs with differentwith different sample sample sizes for sizes the squarefor the tessellationsquare tessellation with queen with adjacency. queen adjacency.

ISPRS Int. J. Geo-Inf. 2018, 7, 476 20 of 25 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 19 of 25

20-by-20 30-by-30 40-by-40

50-by-50 60-by-60 70-by-70

80-by-80 90-by-90 100-by-100

110-by-110 120-by-120 125-by-125

FigureFigure A6. A6. TheoreticalTheoretical variancesvariances (orange)(orange) versusversus exactexact variancesvariances (red(red andand green)green)obtained obtainedby by approximatingapproximating the the Jacobian Jacobian term term and and employing employing new MLEsnew MLEs with differentwith different sample sample sizes for sizes the squarefor the tessellationsquare tessellation with hexagonal with hexagonal adjacency. adjacency.

AppendixAppendix G. G. The The Sampling Sampling Distribution Distribution of of the the SA SA Parameter Parameterρ for𝝆 for the the Wuhan Wuhan Census Census Blocks BlocksDataset Dataset TheThe 184 184 census census blocks blocks of Wuhan of Wuhan do not constitutedo not cons a regulartitute tessellation,a regular sotessellation, sampling distributionsso sampling obtaineddistributions with theobtained three ideal with configurations the three ideal are configurations no longer suitable are forno thislonger irregular suitable surface for this partitioning irregular case.surface Thus, partitioning experiments case. need Thus, to be experiments conditional onneed this to empirical be conditional configuration on this in empirical order to constructconfiguration the correctin order sampling to construct distribution. the correct The sampling same procedures distributi ason. those The insame Section procedures3 are repeated. as those The in Section theoretical 3 are repeated. The theoretical functions for 𝜌versus the MC, and for its sampling variance versus 𝜌, are expressed by Equations (A3) and (A4). 4.3462 0.9619 𝜌 = + , (A3) 1+𝑒|.|. −0.7646

ISPRS Int. J. Geo-Inf. 2018, 7, 476 21 of 25 functions for ρˆ versus the MC, and for its sampling variance versus ρ, are expressed by Equations (A3) and (A4). 4.3462 0.9619 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEERρˆWuhan REVIEW= 8.3721 + , 20 (A3) of 25 1 + e|MC−0.9839| −0.7646 1.0229 1.4125 Var(ρˆ ) = 0.0761 ∗ G .∗ (1 − G) ., (A4) 𝑉𝑎𝑟(Wuhan𝜌) = 0.0761 ∗ 𝐺 ∗ (1−𝐺) , (A4) G = ( ˆ − ) ( − ) = = − wherewhere 𝐺=(𝜌ρWuhan −11/λ⁄ 𝜆min/)(1𝜆⁄1/⁄λmax −11/⁄λ 𝜆min); for; for this this landscape, landscape,λmax 𝜆1, and=1λmin, and 𝜆0.7646 =. 2 Figure−0.7646 A7. Figurepresents A7 their presents fitted curves,their fitted where curves, the R wherefor the the logistic 𝑅 curvefor the and logistic the beta curve curve and with the equal beta parameterscurve with largerequal parameters than one are larger 0.9975 than and one 0.9993, are 0.9975 respectively. and 0.9993, respectively.

(a1) (a2)

(b1) (b2)

FigureFigure A7.A7. CurveCurve fitting fitting plots plots and and bivariate bivariate regression regression plots plots for for Wuhan Wuhan census census blocks blocks data. data. (a1,a2 (a1) The,a2) originalThe original MC- ρˆMC-scatter𝜌 (blue) plot superimposed(blue) superimposed on the theoreticalon the theoretical MC-ρ curve MC- (red),𝜌 curve and (red), a bivariate and a regressionbivariate regression scatter plot scatter for the plot original for the and original predicted and ρpredicted;(b1,b2) the𝜌; ( originalb1,b2) theρˆ -var( originalρˆ ) scatter 𝜌-var( plot𝜌) scatter (blue) superimposedplot (blue) superimposed on the theoretical on theρ ˆtheoretical-var(ρˆ ) curve 𝜌-var( (red),𝜌) curve and a (red), bivariate and regressiona bivariate scatter regression plot forscatter the originalplot for andthe original predicted and Var( predictedρˆ ). Var(𝜌).

Appendix H. A Data Analysis for One Simulated and Two Empirical Examples Appendix H. A Data Analysis for One Simulated and Two Empirical Examples The simulated data were sampled from a standard normal distribution (Figure A8a) within The simulated data were sampled from a standard normal distribution (Figure A8a) within the the interval [–4,4]. The accompanying quantile-quantile (Q-Q) normal plot of the regression model interval [–4,4]. The accompanying quantile-quantile (Q-Q) normal plot of the regression model residuals (Figure A8b) conforms almost perfectly to its theoretical normal counterpart. The symmetry residuals (Figure A8b) conforms almost perfectly to its theoretical normal counterpart. The along the horizontal axis of the scatter plot of model residual-predicted values (Figure A8c) indicates symmetry along the horizontal axis of the scatter plot of model residual-predicted values (Figure homoscedaticity of the residuals; thus, results estimated by the model are reliable. A8c) indicates homoscedaticity of the residuals; thus, results estimated by the model are reliable.

ISPRSISPRS Int.Int. J.J. Geo-Inf.Geo-Inf. 20182018,, 77,, 476x FOR PEER REVIEW 2122 of 25 25 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 21 of 25

(a) (b) (c) (a) (b) (c) Figure A8. Statistical properties of simulated data and their regression residuals: (a) a histogram of Figure A8.A8. StatisticalStatistical propertiesproperties ofof simulatedsimulated datadata andand theirtheir regressionregression residuals:residuals: (a)) aa histogramhistogram ofof simulated data; (b) a Q-Q plot of the SAR model residuals; and, (c) a scatter plot of residual-predicted simulatedsimulated data;data; ((bb)) aa Q-QQ-Q plotplot of the SAR model residuals; and, (c) a scatter plot of residual-predicted values for assessing the homoscedasticity of model residuals. valuesvalues forfor assessingassessing thethe homoscedasticityhomoscedasticity ofof modelmodel residuals.residuals. Things are different for the two empirical cases. For the Wuhan census blocks data, the young Things are different for thethe twotwo empiricalempirical cases.cases. For the Wuhan censuscensus blocks data, the young population ratio range is 7.83, 39.74], the mean and the standard deviation are 18.33 and 4.67, populationpopulation ratio range is [7.83,7.83, 39.74]39.74],, thethe meanmean andand thethe standardstandard deviation are 18.33 and 4.67, respectively, and the is 17.93, which indicates right . An exponential respectively,respectively, andand the the median median is 17.93, is which17.93, indicateswhich rightindicates skewness. right An skewness. exponential An transformation exponential improvestransformation the Shapiro-Wilk improves the normality Shapiro-Wilk diagnostic normality statistic diagnostic from 0.91 statistic to 0.99, from and 0.91 the to p0.99,-value and from the transformation improves the Shapiro-Wilk normality diagnostic statistic from 0.91 to 0.99, and the p-value −from9 5.3 × 10 to 0.06, which indicates a normal distribution. The normal Q-Q plot (Figure 5.3p-value× 10 fromto 5.3 0.06, × 10which to indicates0.06, which a normalindicates distribution. a normal distribution. The normal The Q-Q normal plot Q-Q (Figure plot A9 (Figureb) of A9b) of the regression residuals shows deviation in the left tail; more specifically, three points theA9b) regression of the regression residuals residuals shows deviation shows deviation in the left in tail; the more left specifically,tail; more specifically, three points three deviate points far deviate far from the reference line. The approximated symmetry along the horizontal axis of the fromdeviate the far reference from the line. reference The approximated line. The approximated symmetry alongsymmetry the horizontal along the axishorizontal of the scatteraxis of plotthe scatter plot of residual-predicted values (Figure A9c) indicates that the model residuals are not ofscatter residual-predicted plot of residual-predicted values (Figure values A9c) indicates(Figure A9c) that indicates the model that residuals the model are not residuals heteroscedatic, are not heteroscedatic, meaning that results estimated by the model are be trustworthy. meaningheteroscedatic, that results meaning estimated that results by the esti modelmated are by be the trustworthy. model are be trustworthy.

(a) (b) (c) (a) (b) (c) FigureFigure A9.A9.Statistical Statistical properties properties of theof transformedthe transformed young young population population ratio and ratio its regressionand its regression residuals: Figure A9. Statistical properties of the transformed young population ratio and its regression (residuals:a) a histogram (a) a histogram of the exponential of the exponential transformed transformed young population young population ratio; (b) aratio; Q-Q (b plot) a Q-Q of the plot SAR of residuals: (a) a histogram of the exponential transformed young population ratio; (b) a Q-Q plot of modelthe SAR residuals; model and,residuals; (c) a scatter and, plot(c) a of scatter residual-predicted plot of residual-predicted values for assessing values the for homoscedasticity assessing the the SAR model residuals; and, (c) a scatter plot of residual-predicted values for assessing the ofhomoscedasticity model residuals. of model residuals. homoscedasticity of model residuals. ForFor thethe YellowYellow Mountain Mountain region region image, image, the the range range of itsof NDVIits NDVI is [− is0.2318, −0.2318, 0.3252 0.3252]], the mean, the and For the Yellow Mountain region image, the range of its NDVI is −0.2318, 0.3252], the standardmean and deviation standard are deviation 0.1107 and are 0.0941, 0.1107 respectively, and 0.0941, andrespectively, the median and is 0.1287,the median which is indicates 0.1287, left mean and standard deviation are 0.1107 and 0.0941, respectively,λ∗NDVI and the median is 0.1287, skewness;which indicates thus, the left exponential skewness; transformation thus, the exponentiala + b ∗ e transformationwas applied 𝑎+𝑏∗𝒆 to the raw∗ data, was and the which indicates left skewness; thus, the exponential transformation 𝑎+𝑏∗𝒆∗ was nonlinearapplied to regression the raw procedure data, and calculated the nonlinear estimates maximizingregression procedure the likelihood calculated are aˆ = − estimates2.70, bˆ = 1.46, applied to the raw data, and the nonlinear regression procedure calculated estimates andmaximizingλˆ = 4.78 .the The likelihood KS normality are 𝑎 test = −2.70, for the 𝑏 original= 1.46, and NDVI 𝜆 =4.78 as well. asThe for KS the normality transformed test for data the yields maximizing the likelihood are 𝑎 = −2.70, 𝑏 = 1.46, and 𝜆 =4.78. The KS normality test for the aoriginal KS statistic NDVI that as well decreases as for from the transformed 0.4250 to 0.0238, data withyields respective a KS statistic p-values that increasingdecreases from less original NDVI− as16 well as for −the5 transformed data yields a KS statistic that decreases from than0.42502.20 to ×0.0238,10 towith2.33 respective× 10 . Thep-values histogram, increasing as well from as theless densitythan 2.20 curve × 10 of the to transformed 2.33 × 0.4250 to 0.0238, with respective p-values increasing from less than 2.20 × 10 to 2.33 × NDVI10. The appearing histogram, in Figure as well A10 asa, the indicates density an curve approximate of the transformed normal distribution. NDVI appearing The pure in SARFigure model 10. The histogram, as well as the density curve of the transformed NDVI appearing in Figure descriptionA10a, indicates of this an transformedapproximate NDVI normal was distribu estimated,tion. The and pure then itsSAR residuals model description were assessed. of this A Q-Q A10a, indicates an approximate normal distribution. The pure SAR model description of this plottransformed for the residuals NDVI was appears estimated, in Figure and A10 thenb that its hasresiduals a conspicuous were assessed. deviation A fromQ-Q theplot Q-Q for plotthe line transformed NDVI was estimated, and then its residuals were assessed. A Q-Q plot for the occuringresiduals in appears the right in tail Figure of the A10b quantile. that Slighthas a heteroscedaticity conspicuous deviation of model from residuals the Q-Q (the rangeplot line widens residuals appears in Figure A10b that has a conspicuous deviation from the Q-Q plot line aoccuring little after in −the1) canright be tail detected of the inquantile. the residual-predicted Slight heteroscedaticity scatter plot of model (Figure residuals A10c). (the range occuring in the right tail of the quantile. Slight heteroscedaticity of model residuals (the range widens a little after −1) can be detected in the residual-predicted scatter plot (Figure A10c). widens a little after −1) can be detected in the residual-predicted scatter plot (Figure A10c).

ISPRS Int. J. J. Geo-Inf. 2018, 7, x 476 FOR PEER REVIEW 2223 of of 25 ISPRS Int. J. Geo-Inf. 2018, 7, x FOR PEER REVIEW 22 of 25

(a) (b) (c) (a) (b) (c) Figure A10. Improved normality of transformed normalized difference vegetation index (NDVI) and FigureFigure A10.A10.Improved Improved normality normality of of transformedtransformed normalized normalized difference difference vegetation vegetation index index (NDVI) (NDVI) and and regression residuals: (a) a histogram of the Exponential transformed NDVI; (b) a Q-Q plot of the SAR regressionregression residuals: residuals: ( a(a)) a a histogram histogram of of the the Exponential Exponential transformed transformed NDVI; NDVI; ( b(b)) a a Q-Q Q-Q plot plot of of the the SAR SAR model residuals; and, (c) a scatter plot of residual-predicted values for assessing the modelmodel residuals; residuals; and, and, (c) a scatter(c) a plotscatter of residual-predicted plot of residual-pre valuesdicted for assessing values the for homoscedasticity assessing the homoscedasticity of model residuals. ofhomoscedasticity model residuals. of model residuals. Appendix I. Statistical Power Visualization ofof DifferentDifferent SASA StatisticsStatistics inin aa SingleSingle PlotPlot Appendix I. Statistical Power Visualization of Different SA Statistics in a Single Plot One ofof thethe applications applications of of this this paper’s paper’s contribution contribution is the is visualizationthe visualization of statistical of statistical power power curves One of the applications of this paper’s contribution is the visualization of statistical power curvesof different of different SA statistics/parameters. SA statistics/parameters. The Geary The Geary Ratio (GR,Ratio also (GR, known also known as Geary’s as Geary’s c) power c) power curve curves of different SA statistics/parameters. The Geary Ratio (GR, also known as Geary’s c) power curvecan be can constructed be constructed as well. as well. In Figure In Figure A11, A11, the green the green depicts depicts the MC’sthe MC’s power, power, the redthe red presents presents the curve can be constructed as well. In Figure A11, the green depicts the MC’s power, the red presents theGR’s GR’s power, power, and and the bluethe blue shows shows the SARthe SARρ’s power.𝜌’s power. These These graphics graphics indicate indicate that ρthatis uniformly 𝜌 is uniformly more the GR’s power, and the blue shows the SAR 𝜌’s power. These graphics indicate that 𝜌 is uniformly powerfulmore powerful than thethan MC the and MC theand GR, the whileGR, while the MC the isMC uniformly is uniformly more more powerful powerful than than the the GR GR only only for more powerful than the MC and the GR, while the MC is uniformly more powerful than the GR only positivefor positive SA forSA thefor the hexagonal hexagonal tessellation. tessellation. for positive SA for the hexagonal tessellation.

(a) (b) (c) (a) (b) (c) Figure A11. Statistical power curves curves for for the the MC MC (green), (green), the the GR GR (red), (red), and and the the SAR SAR 𝜌ρ (blue):(blue): (a) the Figure A11. Statistical power curves for the MC (green), the GR (red), and the SAR 𝜌 (blue): (a) the 7-by-7 squaresquare tessellation,tessellation, rook rook adjacency adjacency case; case; (b )(b the) the 7-by-7 7-by-7 square square tessellation, tessellation, queen queen adjacency adjacency case; 7-by-7 square tessellation, rook adjacency case; (b) the 7-by-7 square tessellation, queen adjacency case;and, (and,c) the (c 7-by-7) the 7-by-7 hexagonal hexagonal tessellation tessellation case. case. case; and, (c) the 7-by-7 hexagonal tessellation case. References

1. Li, H.; Calder, C.A.; Cressie, N. Beyond Moran’s I: Testing for Spatial Dependence Based on the Spatial Autoregressive Model. Geogr. Anal. 2007, 39, 357–375. [CrossRef] 2. Robinson, P.M.; Rossi, F. Refined tests for spatial correlation. Econom. Theory 2015, 31, 1249–1280. [CrossRef] 3. López, F.; Matilla-García, M.; Mur, J.; Marín, M.R. A non-parametric spatial independence test using symbolic entropy. Reg. Sci. Urban Econ. 2010, 40, 106–115. [CrossRef] 4. Gotelli, N.J.; Ulrich, W. Statistical challenges in null model analysis. Oikos 2012, 121, 171–180. [CrossRef] 5. Griffith, D.A. Spatial statistics: A quantitative geographer’s perspective. Spat. Stat. 2012, 1, 3–15. [CrossRef] 6. Bivand, R. Package ‘spdep’. Available online: https://cran.r-project.org/web/packages/spdep/spdep.pdf (accessed on 24 October 2018). 7. Hong, Y.; White, H. Asymptotic Distribution Theory for Nonparametric Entropy Measures of Serial Dependence. Econometrica 2005, 73, 837–901. [CrossRef] 8. Matilla-García, M.; Marín, M.R. A non-parametric independence test using permutation entropy. J. Econom. 2008, 144, 139–155. [CrossRef] 9. Montgomery, D.C.; Peck, E.A.; Vining, G.G. Introduction to Analysis; Wiley: Hoboken, NJ, USA, 2012. 10. Burt, J.E.; Barber, G.M.; Rigby, D.L. Elementary Statistics for Geographers; Guilford Publications: New York, NY, USA, 2009.

ISPRS Int. J. Geo-Inf. 2018, 7, 476 24 of 25

11. Bewick, V.; Cheek, L.; Ball, J. Statistics review 7: Correlation and regression. Crit. Care 2003, 7, 451–459. [CrossRef] 12. Provost, B.S. Closed-Form Representations of the Density Function and Integer Moments of the Sample Correlation Coefficient. Axioms 2015, 4, 268–274. [CrossRef] 13. Ames, E.; Reiter, S. Distributions of Correlation Coefficients in Economic Time Series. J. Am. Stat. Assoc. 1961, 56, 637–656. [CrossRef] 14. Lee, M.Y. The Effect of Nonzero Autocorrelation Coefficients on the Distributions of Durbin-Watson Test Estimator: Three Autoregressive Models. Expert J. Econ. 2014, 2, 85–99. 15. Mead, R. A Mathematical Model for the Estimation of Inter-Plant Competition. Biometrics 1967, 23, 189–205. [CrossRef][PubMed] 16. Moran, P.A.P. Notes on Continuous Stochastic Phenomena. Biometrika 1950, 37, 17–23. [CrossRef][PubMed] 17. Lesage, J.; Kelly Pace, R. Introduction to Spatial Econometrics; CRC Press: Boca Raton, FL, USA, 2009; Volume 1. 18. Cliff, A.D.; Ord, J.K. Spatial Processes: Models & Applications; Pion: London, UK, 1981. 19. Griffith, D.A.; Layne, L.J. A Casebook for Spatial Statistical Data Analysis: A Compilation of Analyses of Different Thematic Data Sets; Oxford University Press: Oxford, UK, 1999. 20. Ver Hoef, J.M.; Hanks, E.M.; Hooten, M.B. On the relationship between conditional (CAR) and simultaneous (SAR) autoregressive models. Spat. Stat. 2018, 25, 68–85. [CrossRef] 21. Cliff, A.D.; Ord, J.K. Spatial Autocorrelation; Pion Limited: London, UK, 1973. 22. Sokal, R.R.; Oden, N.L. Spatial autocorrelation in biology: 1. Methodology. Biol. J. Linn. Soc. 1978, 10, 199–228. [CrossRef] 23. Sokal, R.R.; Oden, N.L. Spatial autocorrelation in biology: 2. Some biological implications and four applications of evolutionary and ecological interest. Biol. J. Linn. Soc. 1978, 10, 229–249. [CrossRef] 24. Legendre, P. Spatial Autocorrelation: Trouble or New Paradigm? Ecology 1993, 74, 1659–1673. [CrossRef] 25. Auchincloss, A.H.; Gebreab, S.Y.; Mair, C.; Diez Roux, A.V. A Review of Spatial Methods in Epidemiology, 2000–2010. Annu. Rev. Public Health 2012, 33, 107–122. [CrossRef] 26. Anselin, L.; Florax, R.; Rey, S.J. Advances in Spatial Econometrics: Methodology, Tools and Applications; Springer: Berlin/Heidelberg, Germany, 2004. 27. Mustafa, A.; Van Rompaey, A.; Cools, M.; Saadi, I.; Teller, J. Addressing the determinants of built-up expansion and densification processes at the regional scale. Urban Stud. 2018, 55, 3279–3298. [CrossRef] 28. Goodchild, M.F. Geographical information science. Int. J. Geogr. Inf. Syst. 1992, 6, 31–45. [CrossRef] 29. Bartlett, M.S. The Statistical Analysis of Spatial Pattern; Chapman and Hall: Boca Raton, FL, USA, 1975. 30. Whittle, P. On stationary processes in the plane. Biometrika 1954, 41, 434–449. [CrossRef] 31. Ord, K. Estimation Methods for Models of Spatial . J. Am. Stat. Assoc. 1975, 70, 120–126. [CrossRef] 32. Cressie, N.A.C. Statistics for Spatial Data; Wiley: Hoboken, NJ, USA, 1993. 33. Griffith, D.A. Estimating Spatial Autoregressive Model Parameters with Commercial Statistical Packages. Geogr. Anal. 1988, 20, 176–186. [CrossRef] 34. Griffith, D.A. Simplifying the normalizing factor in spatial autoregressions for irregular lattices. Pap. Reg. Sci. 1992, 71, 71–86. [CrossRef] 35. Griffith, D.A. Approximation of Gaussian spatial autoregressive models for massive regular square tessellation data. Int. J. Geogr. Inf. Sci. 2015, 29, 2143–2173. [CrossRef] 36. Griffith, D.A. Faster maximum likelihood estimation of very large spatial autoregressive models: an extension of the Smirnov–Anselin result. J. Stat. Comput. Simul. 2004, 74, 855–866. [CrossRef] 37. Griffith, D.A. Extreme eigenfunctions of adjacency matrices for planar graphs employed in spatial analyses. Linear Algebra Its Appl. 2004, 388, 201–219. [CrossRef] 38. Griffith, D.A. Eigenfunction properties and approximations of selected incidence matrices employed in spatial analyses. Linear Algebra Its Appl. 2000, 321, 95–112. [CrossRef] 39. Kelejian, H.H.; Prucha, I.R. A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. Int. Econ. Rev. 1999, 40, 509–533. [CrossRef] 40. Kelejian, H.H.; Prucha, I.R. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econom. 2010, 157, 53–67. [CrossRef] 41. Walde, J.; Larch, M.; Tappeiner, G. Performance contest between MLE and GMM for huge spatial autoregressive models. J. Stat. Comput. Simul. 2008, 78, 151–166. [CrossRef] ISPRS Int. J. Geo-Inf. 2018, 7, 476 25 of 25

42. Luo, Q.; Griffith, D.A.; Wu, H. The Moran Coefficient and the Geary Ratio: Some Mathematical and Numerical Comparisons. In Advances in Geocomputation; Springer International Publishing: Cham, Switzerland, 2017; pp. 253–269. 43. Sahr, K.; White, D.; Kimerling, A.J. Geodesic Discrete Global Grid Systems. Cartogr. Geogr. Inf. Sci. 2003, 30, 121–134. [CrossRef] 44. Chun, Y.; Griffith, D.A. Spatial Statistics and : Theory and Applications for Geographic Information Science and Technology; SAGE Publications: Thousand Oaks, CA, USA, 2013. 45. Griffith, D.A. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding Through Theory and Scientific Visualization; Springer: Berlin, Germany, 2003. 46. Jong, P.; Sprenger, C.; Veen, F. On Extreme Values of Moran’s I and Geary’s c. Geogr. Anal. 1984, 16, 17–24. [CrossRef] 47. Griffith, A.D.; Chun, Y. Spatial Autocorrelation and Uncertainty Associated with Remotely-Sensed Data. Remote Sens. 2016, 8, 535. [CrossRef] 48. Goslee, S.C. Behavior of Vegetation Sampling Methods in the Presence of Spatial Autocorrelation. Plant Ecol. 2006, 187, 203–212. [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).