An application of Geographically Weighted Regression to Agricultural Data for Small Area Estimates

Chiara Bocci, Alessandra Petrucci and Emilia Rocco1

1 Dipartimento di Statistica “G. Parenti”, Università degli Studi di Firenze, viale Morgagni, 59 – 50134 Firenze, e-mail: [email protected]fi.it, [email protected]fi.it, [email protected]fi.it

Abstract: The aim of this work is to evaluate whether knowledge of the geographical location of farms, used in a Geographically Weighted Regression model, can produce accurate estimates of agricultural surface areas or production in sub-regional domains.

Keywords: Small area estimation; spatial data; stationary process; geographically weighted regression.

1 Introduction

The Italian Fifth Agricultural Census, carried out in 2000, made it possible to locate farms on the territory, introducing a new challenge for statistical analysis. The geographical location of each farm can be particularly useful information for the analysis of many phenomena in the agricultural field (Lipizzi, 2004; Bocci et al., 2005). In particular, in this paper we underline the usefulness of farm geographical location for the analysis of non-stationary spatial phenomena.

“Global” dependence models, such as the classical regression model, assume that the phenomenon is independent of the spatial location of the data. For many agricultural phenomena, applying one of these models could lead to wrong conclusions and generate spatially autocorrelated residuals. Hence the need for statistical techniques able to take into account the spatial variability of the phenomenon.

The technique considered in this paper is Geographically Weighted Regression (GWR), a model that allows non-stationary local phenomena to be represented (Fotheringham et al., 2002). We applied the GWR methodology to estimate, at municipal level, the agricultural surface area or production of the main crops of each area, using sample data similar to the current Farm Structure Survey data. First results from the estimation of the area allocated to grapevines in the 44 municipalities of the Florence province suggest that this application deserves a thorough examination.

2 Methodology

GWR extends the traditional regression framework by allowing local rather than global parameters to be estimated, so that the classical regression model is rewritten as

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n, \tag{1}$$

where $(u_i, v_i)$ denotes the coordinates of the $i$-th point in space, $\beta_k(u_i, v_i)$ is a realisation of the continuous function $\beta_k(u, v)$ at point $i$, $x_{i1}, x_{i2}, \dots, x_{ip}$ are the explanatory variables at point $i$ and $\varepsilon_i$ are error terms. For a given data set, the local parameters $\beta_k(u_i, v_i)$ are estimated by weighted least squares. The weights $w_{ij}$, for $j = 1, \dots, n$, at each location $(u_i, v_i)$ are obtained as a continuous function of the distance between point $i$ and the other data points. Let

$$\boldsymbol{\beta} = \begin{pmatrix}
\beta_0(u_1, v_1) & \beta_1(u_1, v_1) & \cdots & \beta_p(u_1, v_1) \\
\vdots & \vdots & \ddots & \vdots \\
\beta_0(u_n, v_n) & \beta_1(u_n, v_n) & \cdots & \beta_p(u_n, v_n)
\end{pmatrix} \tag{2}$$

be the matrix of the local parameters. Each row is estimated by

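$$\hat{\boldsymbol{\beta}}(i) = \left(X^{T} W(i)\, X\right)^{-1} X^{T} W(i)\, y, \tag{3}$$

where $i = 1, \dots, n$ indexes the row of matrix (2), $X$ is the $n \times (p+1)$ matrix of explanatory variables (with a first column of ones for the intercept $\beta_0$), $y$ is the vector of the dependent variable and $W(i)$ is an $n \times n$ spatial weighting matrix of the form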

$$W(i) = \operatorname{diag}\left[w_{i1}, w_{i2}, \dots, w_{in}\right]. \tag{4}$$

The estimator in (3) is a least squares estimator, but the weighting matrix is not constant: $W(i)$ has to be computed for each point $i$, and the $w_{ij}$ indicate the proximity of each data point to the location of $i$, so that data points closer to $i$ carry more weight in the estimation of $\hat{\boldsymbol{\beta}}(i)$ than those farther away. Note, however, that equation (3) can be evaluated at any point, even at locations where no data have been observed.

The choice of the weighting scheme $W(i)$ is a relevant step in the GWR procedure. Several weighting functions can be defined, the most common kernels being the Gaussian and the bi-square functions. For a comprehensive overview of kernel functions in GWR analysis see Fotheringham et al. (2002).
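As a minimal illustration of (3) and (4), the following NumPy sketch computes one row $\hat{\boldsymbol{\beta}}(i)$ by weighted least squares and stacks the rows into the matrix (2). It assumes that $X$ already contains a leading column of ones for $\beta_0$ and that the weights $w_{ij}$ have been computed beforehand by some kernel; the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def local_coefficients(X, y, w):
    """Row i of matrix (2): beta_hat(i) = (X^T W(i) X)^{-1} X^T W(i) y, eq. (3).

    X : (n, p+1) design matrix; first column of ones for the intercept beta_0
    y : (n,) vector of the dependent variable
    w : (n,) weights w_i1, ..., w_in, i.e. the diagonal of W(i) in eq. (4)
    """
    XtW = X.T * w  # equals X.T @ np.diag(w) without building the n x n matrix
    # Solve the weighted normal equations instead of inverting X^T W(i) X
    return np.linalg.solve(XtW @ X, XtW @ y)


def gwr_coefficient_matrix(X, y, weights):
    """Stack beta_hat(i), i = 1..n, into the n x (p+1) matrix of eq. (2).

    weights : (n, n) array whose i-th row holds w_i1, ..., w_in (hypothetical input,
              produced by whatever kernel is chosen)
    """
    return np.vstack([local_coefficients(X, y, weights[i]) for i in range(len(y))])
```

With all weights equal to one, every row reduces to the ordinary (global) least squares fit, which is a convenient sanity check; solving the normal equations avoids forming the explicit inverse that appears in (3).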

A modified bi-square function that takes into account only the $N$ nearest neighbours of $i$ is

$$w_{ij} = \begin{cases}
\left[1 - (d_{ij}/b)^2\right]^2 & \text{if } j \text{ is one of the } N \text{ nearest neighbours of } i, \\
0 & \text{otherwise,}
\end{cases} \tag{5}$$

where $d_{ij}$ is the distance between points $i$ and $j$ and $b$ is the distance from $i$ to its $N$-th nearest neighbour. This kernel varies in space and has an adaptive bandwidth that depends on the density of the data points. Consequently, calibrating the model also involves choosing $N$, the number of data points included in the estimation of the local parameters. The appropriate bandwidth, or equivalently the appropriate value of $N$, can be obtained by a least squares approach using the cross-validation criterion

$$\mathrm{CV} = \sum_{i=1}^{n} \left[y_i - \hat{y}_{\neq i}(b)\right]^2, \tag{6}$$

where $\hat{y}_{\neq i}(b)$ is the fitted value of $y_i$ with the observation for point $i$ omitted from the calibration process.
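The calibration described by (5) and (6) can be sketched as a leave-one-out grid search over candidate values of $N$. This is a minimal NumPy sketch, not the authors' code: it assumes planar coordinates with Euclidean distance, distinct locations (so that $b > 0$), a design matrix $X$ with a leading column of ones, and candidate values of $N$ large enough for each local system to be non-singular. Because point $i$ is left out, its neighbours are searched among the other $n - 1$ points. The names `bisquare_weights`, `cv_score` and `choose_N` are hypothetical.

```python
import numpy as np


def bisquare_weights(dist, N):
    """Adaptive bi-square weights of eq. (5) from a vector of distances d_ij.

    Only the N nearest points get a positive weight; b is the distance to the
    N-th nearest one, so that point itself receives weight 0.
    """
    nn = np.argsort(dist)[:N]              # indices of the N nearest points
    b = dist[nn[-1]]                       # adaptive bandwidth b
    w = np.zeros_like(dist, dtype=float)
    w[nn] = (1.0 - (dist[nn] / b) ** 2) ** 2
    return w


def cv_score(coords, X, y, N):
    """Cross-validation criterion of eq. (6) for a given number of neighbours N."""
    n = len(y)
    # Euclidean distance matrix between all data points (assumption: planar coordinates)
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    score = 0.0
    for i in range(n):
        others = np.delete(np.arange(n), i)   # omit observation i from the calibration
        w = bisquare_weights(d[i, others], N)
        Xo, yo = X[others], y[others]
        XtW = Xo.T * w                        # X^T W(i) without forming the diagonal matrix
        beta_i = np.linalg.solve(XtW @ Xo, XtW @ yo)
        score += (y[i] - X[i] @ beta_i) ** 2  # squared leave-one-out residual
    return score


def choose_N(coords, X, y, candidates):
    """Grid search over candidate values of N; returns the minimiser and all scores."""
    scores = {N: cv_score(coords, X, y, N) for N in candidates}
    return min(scores, key=scores.get), scores
```

Note that, with (5) as written, the $N$-th neighbour sits exactly at distance $b$ and receives weight zero, so only $N - 1$ points effectively contribute to each local fit.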

3 Application and results

As stated above, the objective of this work is to investigate how the information on the spatial location of farms gathered by the Fifth Italian Agricultural Census can be used in a GWR model to predict, at municipal level, the average (or total) agricultural surface area or production of a specific crop.

The results obtained by a first application of this methodology to the estimation of the agricultural area allocated to grapevines in the 44 municipalities of the Florence province encourage further study. In this first application, the implemented procedure considers:

1. a simple random sample of 300 farms drawn from the census list; the sample size was set roughly equal to that of the sample used in the Farm Structure Survey (FSS, 2003);
2. as spatial information, the geographical coordinates of each farm's administrative centre; the Euclidean distance between them is used as the distance between the corresponding data points;
3. for the geographically weighted regression model, the bi-square weighting function with adaptive bandwidth shown in (5); the number of nearest-neighbour data points included in the calibration of each local model is determined by the cross-validation criterion shown in (6);
4. as explanatory variables, the utilised agricultural surface (SAU) and the European size unit (UDE), the only statistically significant ones among those available from the Agricultural Census;
5. the total municipal area allocated to grapevines, calculated as the sum over all the farms belonging to the same municipality.

The results are shown in Figure 1: for more than 50% of the municipalities the error is below 10%, and this share rises to 77% when errors below 20% are considered. A more thorough study of the available explanatory variables is nevertheless needed to achieve more complete and stable results. In addition to agricultural surface area, we also intend to estimate production. Since no real data are available for comparison, a simulation study based on the current Farm Structure Survey data will be carried out.
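The aggregation in step 5 and the error shares reported for Figure 1 can be sketched as follows. The paper does not state how the error is defined, so this sketch assumes the relative absolute error $|\hat{T} - T| / T$ of each municipal total, and it assumes that a predicted and an observed (census) grapevine area are available for every farm; the function names and toy data are hypothetical.

```python
import numpy as np


def municipal_totals(municipality, values):
    """Sum farm-level values within each municipality code."""
    codes, idx = np.unique(municipality, return_inverse=True)
    return codes, np.bincount(idx, weights=values)


def relative_errors(municipality, predicted, observed):
    """Relative absolute error |T_hat - T| / T of the estimated municipal totals.

    Assumption: 'error' is read as the relative absolute error of the total;
    the exact definition is not given in the paper. Every true total T must be > 0.
    """
    codes, t_hat = municipal_totals(municipality, predicted)
    _, t_true = municipal_totals(municipality, observed)
    return codes, np.abs(t_hat - t_true) / t_true


def share_below(errors, threshold):
    """Fraction of municipalities whose relative error is below the threshold."""
    return np.mean(errors < threshold)
```

For instance, `share_below(err, 0.10)` would give the fraction of municipalities whose estimated total is within 10% of the census total.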

[Figure 1 not reproduced; its legend listed the ISTAT code (Cod) and name (Comune) of each municipality.]

FIGURE 1. Error of the estimates of the total area allocated to grapevines for the 44 municipalities of the Florence province (with ISTAT codes).

References

Bocci, C., Petrucci, A., Rocco, E. (2005). L’uso di informazioni di tipo spaziale in ambito agricolo: un’applicazione relativa al comune di Fiesole. In: Atti del convegno “Metodi d’Indagine e di Analisi per le Politiche Agricole”, Università di , 21-22 ottobre 2004, 75-83.

Fotheringham, A.S., Brunsdon, C., Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester, UK: John Wiley & Sons.

Lipizzi, F. (2004). L’integrazione dei disegni territoriali del Censimento della popolazione e degli edifici e del Censimento dell’agricoltura. In: Atti del Convegno “L’Informazione Statistica e le Politiche Agricole (ISPA2004)”, Università di Cassino, 6 maggio 2004, 109-118.