Dependence and Regimes in Applied Spatial Regression Analysis.

Paper to be presented at the 36th European Regional Science Association Congress, Zurich, 1996

Jørgen Lauridsen

Institute of Economics, University, DK-5230 Odense M. Fax +45 66158790 E-mail [email protected]

Abstract. Recent results have shown that the presence of geographical dependency among regions in a cross section has serious consequences for the reliability of traditional tests for structural (in)stability. The present paper illustrates how the Chow-test for switching regimes is affected by geographical dependency. Accordingly, asymptotic variants of the Chow-test, which incorporate geographical correlation processes in the error term of a linear regression model, are set up. Applications of these Chow-tests are illustrated on models explaining municipal elderly care services in a cross section of 275 Danish municipalities. The derivation of different regime structures, using natural regimes, univariate sorting and multivariate clustering, is illustrated and evaluated.

1. Introduction.

In empirical regional research, it is usually assumed that the relations under consideration are stable over the spatial structure. Danish political research is no exception. Almost all investigations of intermunicipal service variation explain this variation by linear regression of some service measure on a set of explanatory variables. In these policy-output investigations, the municipalities are assumed to be in an equilibrium state, such that a common model with fixed coefficients can be assumed for all municipalities. In contrast, Lauridsen (1995, 1996) provides evidence of strong heterogeneities in these models, as the error terms of the models seem to vary systematically with the explanatory variables.

In the present paper, the focus is on models accounting for heterogeneity of a spatial nature. Specifically, spatial heterogeneity will be formulated as regime models. In connection with this, the implications of another spatial effect - spatial dependence among the spatial units - will be considered. It is well known (see Lauridsen (1995, 1996) and Anselin (1988)) that the presence of spatially autocorrelated error terms might have serious consequences for the estimated model parameters, their significance, and misspecification tests. For example, the power of the popular Breusch-Pagan test for heteroscedasticity is uncontrollable if the error terms are spatially autocorrelated. This paper aims to formulate and estimate regime models and to test these models against a common model by a spatial Chow-test, suggested by Anselin (1990), taking into account the implications of spatial dependence.

The paper consists of 6 parts. Following this introduction, part 2 reviews models which implement spatial effects in the form of spatial regimes and spatially autocorrelated error terms. After this, a spatial Chow-test for structural regimes is presented in part 3. As the application of the Chow-test presupposes that the regime structure is defined in advance, part 4 briefly considers different heuristics for developing regime structures. In part 5, a practical case is considered, namely a set of models explaining the intermunicipal variation in elderly care. Following Lauridsen (1995, 1996), where strong evidence of spatial dependency and structural instability in these models is provided, structural regimes will be developed and evaluated by the spatial Chow-test. Finally, part 6 offers a few conclusions and suggestions for future research.

2. Structural instability and spatially autocorrelated errors in regression models.

In linear regression models based on cross section observations, spatial heterogeneity might be defined in a variety of forms. The form considered in this paper is specified by assuming a separate set of regression coefficients for each regime. In this layout, a test for homogeneity corresponds to a test for equal coefficients across the regimes.

Formally, the presence of spatially autocorrelated errors will disturb such tests, as they are based on the assumption that the covariance matrix of the error terms is diagonal. Spatial autocorrelation implies that errors in contiguous spatial units have a non-zero covariance, violating the assumption of a diagonal covariance matrix.

For the moment, it will be assumed that the regime structure is known; the derivation of regime structures will be discussed in part 4.

On a spatial structure of $n$ units, $G$ spatial regimes are defined, each consisting of $n_g$ spatial units, $g = 1, 2, \dots, G$, such that $n = n_1 + n_2 + \dots + n_G$. Assuming $k$ explanatory variables, of which one might be a constant term, the unlimited (unrestricted) model for each regime reads as

$y_g = X_g \beta_g + \varepsilon_g$ , $E(\varepsilon_g \varepsilon_g') = \Psi_g$ , $g = 1, 2, \dots, G$, where

$y_g$ is an $n_g$ vector of observations for the dependent variable, $X_g$ is an $n_g$ by $k$ matrix of explanatory variables, $\beta_g$ is a $k$ vector of regression coefficients, $\varepsilon_g$ is an $n_g$ vector of error terms, and $\Psi_g$ is an $n_g$ by $n_g$ matrix of variances and covariances.

Opposed to the unlimited model, the limited (restricted) model, assuming equal regression coefficients for all regimes, reads as

$y_g = X_g \beta + \varepsilon_g$ , $E(\varepsilon_g \varepsilon_g') = \Psi_g$ , $g = 1, 2, \dots, G$,

the only difference being that the vector of regression coefficients, $\beta$, is the same for each regime.

For notational simplicity, both models might be written compactly as

(1) $y = X\beta + \varepsilon$ , $E(\varepsilon\varepsilon') = \Psi$,

defining for both models

$y = (y_1', \dots, y_G')'$ , $\varepsilon = (\varepsilon_1', \dots, \varepsilon_G')'$ , and $\Psi = \mathrm{diag}(\Psi_1, \dots, \Psi_G)$.

For the unlimited model, define

$X = \mathrm{diag}(X_1, \dots, X_G)$ , and $\beta = (\beta_1', \dots, \beta_G')'$ ,

whereas the corresponding definitions for the limited model are

$X = (X_1', \dots, X_G')'$ , and $\beta$ a $k$ vector of coefficients.
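As an illustration of the two stacked forms, and not part of the original exposition, the case of G = 2 regimes can be written out explicitly. The layout below assumes only the definitions given above, writing $\beta$ for the coefficient vectors and $\varepsilon$ for the error terms.

```latex
% Unlimited model, G = 2: block-diagonal X, one coefficient vector per regime
\[
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
=
\begin{pmatrix} X_1 & 0 \\ 0 & X_2 \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix}
\]
% Limited model, G = 2: stacked X, one common coefficient vector
\[
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
=
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \beta
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix}
\]
```

In this two-regime case the unlimited form carries 2k coefficients and the limited form k, so that k restrictions separate the two models.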

Assuming the heterogeneity to be fully captured by the varying coefficients, the covariance matrix of the error terms for each regime reads as

$\Psi_g = \sigma^2 I_{n_g}$ , $g = 1, \dots, G$,

where $\sigma^2$ is the common error variance for all spatial units, and $I_{n_g}$ is an $n_g$ dimensional identity matrix. Following this assumption, the covariance matrix for the entire system takes the well known spherical form

$\Psi = \mathrm{diag}(\Psi_1, \dots, \Psi_G) = \sigma^2 I_n$ ,

where $I_n$ is an $n$ dimensional identity matrix, so that both forms of model (1) might simply be estimated by Ordinary Least Squares regression (OLS). This is valid for the limited as well as the unlimited model.

If spatial dependence in the error terms is present, the method of OLS loses validity, as described in Lauridsen (1995) and Anselin (1988). In this case, assuming dependence among spatial units within as well as between regimes, the error term for a single spatial unit in (1) consists of an autoregressive, interdependent part and an independent part. Formally,

(2) $\varepsilon = \lambda W \varepsilon + \mu$ , $\mu \sim N(0, \sigma^2 I_n)$ ,

where $\lambda$ is an autocorrelation parameter, and the dependencies are specified in the $n$ by $n$ matrix $W$, defined as

$W_{ij} = 1$ if regions $i$ and $j$ are assumed interdependent, and $W_{ij} = 0$ otherwise.

In this definition, the dependent parts of the error terms are specified in the $n$ vector $W\varepsilon$, the i'th element of this product simply being the sum of the error terms in those regions with which region i is assumed to be interdependent. The autocorrelation coefficient $\lambda$, assumed to be less than one in absolute value, measures the sensitivity of the error made in spatial region i to the errors made in contiguous spatial units. When $W$ is row standardized, i.e. each element of $W$ is divided by the sum of the elements in the corresponding row, the product $W\varepsilon$ has as its i'th element the average - instead of the simple sum - of the errors in contiguous regions. This definition of $W$ is rather ad hoc; other specifications - giving rise to weighted averages in the product $W\varepsilon$ - might be considered as well. Furthermore, the unsystematic - i.e. independent - error made in spatial unit i is accounted for as element i of the error vector $\mu$.
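The difference between the binary and the row-standardized weight matrix can be sketched in a few lines of Python. The four-unit chain of neighbours and the error values below are hypothetical and serve only to show the sum versus the average of neighbouring errors; the function names are illustrative.

```python
def row_standardize(W):
    """Divide each row of W by its row sum (rows without neighbours stay zero)."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out


def spatial_lag(W, e):
    """Return the vector W e: one weighted sum of neighbouring errors per unit."""
    return [sum(w * x for w, x in zip(row, e)) for row in W]


# Hypothetical chain of 4 units: 1-2, 2-3 and 3-4 are neighbours.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
e = [1.0, 2.0, 4.0, 8.0]          # hypothetical error terms

lag_sum = spatial_lag(W, e)                    # sums of neighbouring errors
lag_avg = spatial_lag(row_standardize(W), e)   # averages of neighbouring errors
print(lag_sum, lag_avg)
```

For unit 2, whose neighbours are units 1 and 3, the binary matrix yields the sum of the two neighbouring errors and the row-standardized matrix their average.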

A convenient way of rewriting (2) is

$\varepsilon - \lambda W \varepsilon = \mu$ , or

$(I_n - \lambda W)\varepsilon = \mu$ , or, with $B = I_n - \lambda W$ ,

$\varepsilon = B^{-1}\mu$ .

Assuming the heterogeneity to be fully captured in the varying regression coefficients - the $\beta_g$'s - the covariance matrix of the error term $\varepsilon$ becomes

$\Psi = E(\varepsilon\varepsilon') = E((B^{-1}\mu)(B^{-1}\mu)') = B^{-1} E(\mu\mu') (B^{-1})' = \sigma^2 (B'B)^{-1}$ .
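A small worked case may make the nondiagonality concrete. The two-unit layout below is a hypothetical illustration, assuming two mutually dependent units, with $\lambda$ the autocorrelation parameter, $\sigma^2$ the variance of the independent error $\mu$, and $\Psi$ written for the covariance matrix of the error term.

```latex
\[
W = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
B = I_2 - \lambda W = \begin{pmatrix} 1 & -\lambda \\ -\lambda & 1 \end{pmatrix}, \qquad
B'B = \begin{pmatrix} 1+\lambda^2 & -2\lambda \\ -2\lambda & 1+\lambda^2 \end{pmatrix}
\]
\[
\Psi = \sigma^2 (B'B)^{-1}
     = \frac{\sigma^2}{(1-\lambda^2)^2}
       \begin{pmatrix} 1+\lambda^2 & 2\lambda \\ 2\lambda & 1+\lambda^2 \end{pmatrix}
\]
```

The off-diagonal element $2\lambda\sigma^2/(1-\lambda^2)^2$ vanishes only for $\lambda = 0$, in which case the covariance matrix reduces to the spherical form $\sigma^2 I_2$.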

It is obvious from the definition of $B$ that this covariance matrix does not meet the assumption of a diagonal structure, which is fundamental for a series of test procedures in standard non-spatial econometrics. This also matters for tests for structural instability, which are in general based on the statistical distance between the estimated error terms of the unlimited and the limited models. For the case of the Chow-test, well known from a time series setup, nondiagonality of the error covariance matrix $\Psi$ in the spatial setup prevents direct application of the standard F-distributed test in a spatial regimes context. This will be discussed further in the next section.

3. Testing structural instability : A spatial Chow-test.

The Chow-test for structural instability is based on the quadratic form

(3) $C_G = e_L' \hat{\Psi}_U^{-1} e_L - e_U' \hat{\Psi}_U^{-1} e_U$ ,

where $e_U$ and $e_L$ are the estimates of the error $\varepsilon$ in the unlimited and limited models respectively, and $\hat{\Psi}_U$ is the estimate of the error covariance matrix $\Psi$ in the unlimited model. Essentially, this quadratic form measures the statistical distance between the sums of squared errors in the two models, relative to the variation in the unlimited model. From asymptotic theory, it is well known that - under the hypothesis of no structural instability - such a distance follows an asymptotic $\chi^2$ distribution with $q$ degrees of freedom, $q$ being the number of restricted coefficients, which for the models in (1) with two regimes gives $q = k$ (in general, $q = (G-1)k$).

In a traditional OLS setup, the estimated covariance matrix $\hat{\Psi}_U$ reduces to the diagonal form $s_U^2 I_n$, $s_U^2$ being the estimated common variance of the error terms. This reduces the quadratic form (3) to

(4) $C_G = (e_L'e_L - e_U'e_U)/s_U^2$ ,

in which, for two regimes, the numerator $e_L'e_L - e_U'e_U$ and the unlimited residual sum of squares $e_U'e_U$ are, after scaling by $\sigma^2$, independent exact $\chi^2$ variables with $k$ and $n-2k$ degrees of freedom respectively. Therefore, dividing each of them by its degrees of freedom gives the exact Chow-test

$C = \dfrac{(e_L'e_L - e_U'e_U)/k}{e_U'e_U/(n-2k)}$ ,

which follows an F-distribution with $k$ and $n-2k$ degrees of freedom under the hypothesis of no structural instability. With $s_U^2 = e_U'e_U/(n-2k)$, this is simply $C = C_G/k$.
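The exact Chow-test for two regimes can be sketched as follows. This is a minimal illustration, assuming a model with a constant and a single regressor (k = 2) so that OLS has a closed form; the data and the function names are hypothetical.

```python
def ols_sse(x, y):
    """Fit y = a + b*x by OLS and return the residual sum of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x1, y1, x2, y2, k=2):
    """Exact Chow F statistic for two regimes with k coefficients each."""
    sse_u = ols_sse(x1, y1) + ols_sse(x2, y2)   # unlimited: one fit per regime
    sse_l = ols_sse(x1 + x2, y1 + y2)           # limited: one pooled fit
    n = len(x1) + len(x2)
    return ((sse_l - sse_u) / k) / (sse_u / (n - 2 * k))


# Hypothetical data: the slope is positive in regime 1 and negative in regime 2.
x1, y1 = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
x2, y2 = [1.0, 2.0, 3.0, 4.0], [5.0, 4.1, 2.9, 2.0]

f_stat = chow_f(x1, y1, x2, y2)
print(f_stat)   # to be compared with an F(k, n - 2k) critical value
```

Because the unlimited design matrix is block diagonal, OLS on the unlimited model decouples into one regression per regime, which is why the unlimited residual sum of squares is obtained by adding the regime-wise sums.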

Obviously, this result is not valid in a spatial setup, in which case the nondiagonality of the estimated covariance matrix $\hat{\Psi}_U$ prevents the reduction from (3) to (4). Consequently, tests for structural instability in such a setup must be based directly on the asymptotic $\chi^2$ distribution of (3).
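For the autocorrelated error process (2), the inverse covariance matrix is $B'B/\sigma^2$ with $B = I_n - \lambda W$, so each quadratic form in (3) equals the sum of squares of the spatially filtered residuals $e - \lambda W e$, divided by $\sigma^2$. The sketch below uses this identity; it assumes that estimates of $\lambda$ and $\sigma^2$ from the unlimited model are already available, and all numbers and names are hypothetical.

```python
def filtered_ss(e, W, lam):
    """Return e' B'B e with B = I - lam*W, i.e. the sum of squares of e - lam*W e."""
    u = [ei - lam * sum(w * ej for w, ej in zip(row, e))
         for ei, row in zip(e, W)]
    return sum(ui * ui for ui in u)


def spatial_chow(e_l, e_u, W, lam, sigma2):
    """Quadratic form (3) for errors following the autoregressive process (2)."""
    return (filtered_ss(e_l, W, lam) - filtered_ss(e_u, W, lam)) / sigma2


# Hypothetical row-standardized weights for 3 units in a chain.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
e_l = [1.0, -2.0, 1.5]    # hypothetical residuals, limited model
e_u = [0.5, -1.0, 0.5]    # hypothetical residuals, unlimited model

c_g = spatial_chow(e_l, e_u, W, lam=0.4, sigma2=0.5)
print(c_g)   # to be compared with a chi-square critical value with q degrees of freedom
```

With `lam=0.0` the function returns the OLS form (4). A probability value could be obtained from a chi-square survival function, for example `scipy.stats.chi2.sf`, which is left out here to keep the sketch free of dependencies.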

4. Exploratory development and evaluation of alternative regimes - some suggestions.

Regimes in a cross section model may be defined according to several principles. A common feature of these principles is that they are of a rather ad hoc nature. Consequently, application of alternative principles is called for in an actual investigation.

In the present paper, three methods for grouping of spatial observations in regimes will be considered and illustrated: Natural regimes, sorting by one characteristic variable, and multivariate clustering.

Natural regimes are regime structures defined by political, geographical and other borderlines of an exogenously defined character. For example, the 275 Danish municipalities naturally group into 14 regimes, according to the 14 Danish counties; a similar structure might be found in most countries. The conceptual adequacy of such a grouping, of course, depends on the degree of diversity among regimes with respect to the behaviour under consideration in the actual model.

Sorting by one characteristic variable gives rise to regimes which are quite diversified with respect to the sorting criterion or sorting variable. As in the case of natural regimes, the adequacy of such a regime structure depends on the adequacy of the sorting criterion in diversifying the spatial units with respect to the behaviour under study. Therefore, the dependent variable or the OLS error terms are often used as criteria for such a sorting.

Apart from the problem of choosing the sorting criterion, the choice of split points and consequently the number of regimes must be considered. A natural criterion is to search for a sufficiently high value of the ratio of the between-regimes sum of squared deviations to the overall sum of squared deviations.
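The ratio criterion for a given set of split points can be sketched as follows. The values of the sorting variable and the split points are hypothetical, and the function assumes that no split point produces an empty regime.

```python
def between_total_ratio(values, split_points):
    """Ratio of the between-regimes to the overall sum of squared deviations.

    The values are sorted first; split_points are the positions in the
    sorted sequence at which a new regime starts.
    """
    v = sorted(values)
    bounds = [0] + sorted(split_points) + [len(v)]
    groups = [v[a:b] for a, b in zip(bounds, bounds[1:])]
    grand = sum(v) / len(v)
    total = sum((x - grand) ** 2 for x in v)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return between / total


# Hypothetical sorting variable with an obvious gap between 3 and 10.
scores = [11.0, 2.0, 10.0, 1.0, 12.0, 3.0]

ratio_at_gap = between_total_ratio(scores, [3])    # split between 3 and 10
ratio_off_gap = between_total_ratio(scores, [1])   # split between 1 and 2
print(ratio_at_gap, ratio_off_gap)
```

Splitting at the gap gives a ratio close to one, whereas a poorly placed split leaves most of the variation within the regimes.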

Multivariate clustering of spatial units is based upon the same idea as the sorting principle. According to a set of grouping variables, a grouping of spatial units is sought such that the centroids of the groups have large statistical distances. Two principally different strategies are briefly outlined in the following. For a comprehensive treatment, see Sharma (1996) and Johnson and Wichern (1992).

Strategy 1 : Hierarchical clustering:

Step 1: Let each observation be a cluster.

Step 2: Select the two clusters which have minimal distance, and merge them into one.

Step 3: Recalculate the cluster centroids and distances, and go to step 2.

Strategy 1 gives a hierarchy of cluster structures with n, n-1, n-2, .. , 2, 1 clusters of observations. The number of clusters, G, can be determined by inspection of the cluster structures for different values of G. On the one hand, the ratio of the between-group sum of squared distances to the within-group sum of squared distances should be high. On the other hand, the number of clusters should be reasonably low. In particular, clusters with too few observations might make the estimation of the unlimited model infeasible and should consequently be avoided.
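The three steps of strategy 1 can be sketched as below. The sketch assumes squared Euclidean distance between cluster centroids (the centroid method), which is only one of several possible distance definitions and is not stated in the text; the points are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(c) / len(points) for c in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hierarchical(points, n_clusters):
    """Merge the two clusters with the closest centroids until n_clusters remain.

    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]               # step 1
    while len(clusters) > n_clusters:
        cents = [centroid([points[i] for i in c]) for c in clusters]   # step 3
        _, a, b = min((sq_dist(cents[a], cents[b]), a, b)              # step 2
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical standardized grouping variables for 5 spatial units.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0]]
three = hierarchical(points, 3)
print(three)
```

Calling the function for successive values of `n_clusters` reproduces the hierarchy of structures from which G is selected.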

An alternative strategy is given in:

Strategy 2 : Non-hierarchical clustering (G-means clustering):

Step 1: Divide the spatial units into G arbitrary groups, and calculate the group centroids.

Step 2: If some unit is closer to another group, then move it to this group.

Step 3: Recalculate the group means, and go to step 2.

Strategy 2 results in a regime structure with G regimes, the number G being given in advance, for example by inspection of the output from a hierarchical clustering. Formally, the alternative name G-means clustering refers to a variant where G means are specified, and units are assigned and reassigned iteratively according to these means, which might be found by choosing a G-cluster solution from a hierarchical clustering output.
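Strategy 2 can be sketched as follows. The sketch is a batch variant, in which all units are reassigned to their nearest group mean before the means are recalculated, rather than moving one unit at a time; squared Euclidean distance, the starting grouping and the data are assumptions made for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(c) / len(points) for c in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def g_means(points, labels, max_iter=100):
    """Reassign units to the nearest group mean until no unit moves.

    labels holds the initial group of each unit (step 1); a group that
    loses all its units simply disappears.
    """
    labels = list(labels)
    for _ in range(max_iter):
        groups = sorted(set(labels))
        cents = {g: centroid([p for p, l in zip(points, labels) if l == g])
                 for g in groups}                                  # step 3
        new = [min(groups, key=lambda g: sq_dist(p, cents[g]))     # step 2
               for p in points]
        if new == labels:
            break
        labels = new
    return labels


# Hypothetical units and an arbitrary starting division into G = 2 groups.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [1.0, 0.0], [6.0, 5.0]]
final = g_means(points, [0, 1, 0, 1, 0, 1])
print(final)
```

Passing the cluster memberships of a hierarchical solution as `labels` corresponds to seeding the procedure with a G-cluster solution as described above.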

5. Regime models for intermunicipal variation in elderly care.

In Lauridsen (1995), regression models explaining the intermunicipal service variations in the service areas of child daycare, public schools and elderly care are considered. As a general principle, OLS models were developed with service measures as dependent variables, and political, economic and socio-demographic variables as explanatory variables. For the elderly care service area, three service measures were considered:

EXP : Expenditure on elderly care per inhabitant aged 65+,
COV : Coverage; the number of rest home places per 100 inhabitants aged 65+,
STF : Staff; the number of full-time (37 hours/week) employees per 100 rest home places.

As explanatory variables, the following were considered:

SCARE : Staff in elderly care; the number of full-time (37 hours/week) employees per 100 inhabitants aged 65+,
SH65 : Percentage population share of inhabitants aged 65+,
GSH65 : Growth in SH65 over the last 4 years,
FWK : Female workforce participation, percent,
LOGPOP : Logarithm of population size,
LOGPOPD : Logarithm of population density,
TAX : Tax base per citizen.

These variables are observed for the year 1989, for all 275 Danish municipalities. For an overview of the Danish municipality structure, see figure 1.

(figure 1 - about here)

The OLS models from Lauridsen (1995), for the year 1989, are repeated in table 1:

(table 1 - about here)

Based upon tests for spatial dependence, alternative models with spatial effects were estimated. In particular, models with autocorrelated errors were estimated. These models are repeated in table 2:

(table 2 - about here)

It is evident from these tables that the error terms are spatially autocorrelated, as measured by the $\lambda$ parameters and their p-values. Furthermore, LM-tests for omitted spatial effects show that the models are quite unstable.

Three specific forms of regime structures are considered.

Structure 1 postulates the existence of regimes by counties, giving 14 regimes. The Danish county structure is shown in figure 2.

(figure 2 - about here)

Structure 2 was derived by sorting on the first principal component of the estimated errors from the OLS models for the three service measures, i.e. the models in table 1. This means that the first principal component is assumed to be a characteristic variable describing the structural instability. Using the SAS procedure PROC PRINCOMP, it was found that this first principal component accounted for 38 percent of the variation among these errors. From table 3 it is seen that the second and the third principal components account for 33 and 29 percent of this variation, respectively. The interpretation of the first principal component, as seen from the eigenvector reported in table 3, is that it contrasts the errors of the COV and EXP models with the error of the STF model. This seems reasonable, as high positive deviations in the first two are mainly financed by lower staffing. Furthermore, the second and third principal components indicate a contrast between the overall expenditure model and the specific expenditure component models.

(table 3 - about here)

Initially, a structure with 14 regimes was developed. Of these, 3 regimes consisted of 1 observation, and 1 regime of 2 observations. These 4 regimes were merged into the regimes to which the centroid distance was smallest. As a result, a structure with 10 regimes was arrived at, as shown in figure 3.

(figure 3 - about here)
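The sorting variable behind structure 2 can be sketched in Python as follows. The sketch computes the scores on the first principal component of three residual series by power iteration; it assumes that the correlation matrix is analysed, and the residual values and names are hypothetical rather than taken from the models in table 1.

```python
def first_pc_scores(columns, iters=500):
    """Scores on the first principal component of the correlation matrix."""
    n = len(columns[0])
    z = []
    for col in columns:                        # standardize each residual series
        m = sum(col) / n
        s = (sum((x - m) ** 2 for x in col) / (n - 1)) ** 0.5
        z.append([(x - m) / s for x in col])
    p = len(z)
    R = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
         for i in range(p)]                    # correlation matrix
    v = [1.0] + [0.0] * (p - 1)                # power iteration: leading eigenvector
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(v[j] * z[j][i] for j in range(p)) for i in range(n)]


# Hypothetical OLS residuals for 6 municipalities from three models.
res_exp = [120.0, -80.0, 15.0, -40.0, 60.0, -75.0]
res_cov = [0.8, -0.5, 0.1, -0.6, 0.3, -0.1]
res_stf = [-2.0, 1.5, 0.5, 1.0, -1.5, 0.5]

scores = first_pc_scores([res_exp, res_cov, res_stf])
order = sorted(range(len(scores)), key=lambda i: scores[i])   # units sorted by score
print(order)
```

Regimes are then formed by cutting the sorted sequence at chosen split points. The sign of a principal component is arbitrary, so the sorted order may come out reversed without affecting the resulting grouping.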

Structure 3 was derived by multivariate clustering, based on the 3 service measures EXP, COV and STF. As the scales of these measures are not comparable, they were standardized ahead of the clustering procedure. Initially, a hierarchical clustering was employed, from which a structure with 14 clusters was selected. Next, this structure was passed as input to a nonhierarchical clustering. Finally, clusters with 1 and 2 observations were assigned to larger clusters, according to the centroid distance criterion. The remaining structure consisted of 8 regimes, as shown in figure 4.

(figure 4 - about here)

To give some measure of the relative adequacy of the 3 regime structures, multivariate ANOVAs were carried out with the SAS procedure PROC GLM, with the three service measures EXP, COV and STF as dependent variables and the regime structures 1-3 as classifying variables. Table 4 shows results from multivariate F-tests for no overall regime effect, using 4 different criteria for calculation of the tests. For all 3 regime structures, high F-values and correspondingly small probability values indicate significant differences among regimes. Regime structures 2 and 3 obviously have higher between-regimes variation, as might be expected from the construction of these structures. Furthermore, it is seen that structure 2 is superior to structure 3 only with respect to F-tests based on Pillai's Trace, whereas the opposite is the case for the other 3 criteria.

(table 4 - about here)

A further investigation in the form of univariate ANOVAs gave F-tests for hypotheses of no regime effects as reported in table 5. These F-tests support the conclusions drawn from the multivariate tests: Structure 3 is superior to structure 2, which in turn is superior to structure 1. The only exception is that structure 1 seems slightly superior to structure 2 with respect to EXP.

(table 5 - about here)

Finally, spatial Chow-tests were calculated, based upon residuals from models with spatially autocorrelated errors. For each choice 1-3 of regime structure, and for each service measure, some important results are reported in table 6. These results are R-squares ($R^2$), log likelihood values (L), Akaike's Information Criteria (AIC), the correlation parameters ($\lambda$), and Chow-tests ($C_G$) with probability values for the hypothesis of no regime structure. Results are reported for the unlimited (subscript U) as well as the limited (subscript L) models. The only exception is the EXP model, where the alternative model for regime structure 1 (by counties) could not be estimated because of singularity problems.

(table 6 - about here)

For all 3 regime structures, the alternative models rank much better than the no-regime model with respect to R-square values as well as log likelihood and AIC values. This supports the general impression from the $C_G$ values, which show superiority of the alternative regime models for all 3 choices of regime structure. For regime structures 2 and 3 this is quite reasonable, but for regime structure 1 these findings are somewhat surprising, as the political decision to partition the Danish municipalities into 14 counties was motivated by an intention of creating administrative units which were as alike as possible. If this goal were fulfilled, the inter-county variation should be small compared to the intra-county variation, and, consequently, the Chow-test values should not be very high.

For the EXP models, there are some discrepancies, as the $R^2$ and AIC measures favour regime structure 3, whereas the L and $C_G$ values point to regime structure 2.

For the COV models, all criteria indicate superiority of regime structures 2 and 3 over 1. In this case, however, the AIC and $C_G$ measures agree in pointing to regime structure 3 as the best alternative, as opposed to the $R^2$ and L values, which favour regime structure 2.

For the STF models, the superiority of regime structures 2 and 3 over 1 is evident too. In this case, however, all measures clearly indicate the superiority of regime structure 2 over 3.

Finally, a few words about the correlation parameters ($\lambda$) are called for. For all three service measures, high correlation parameters are found for the limited models. For the regime models, high values are found for regime structure 2 only, whereas the correlation is almost eliminated for regime structures 1 and 3. These findings indicate that dependence in some cases - but not in others - might be eliminated by partitioning the observations into regimes. In other words: Spatial dependency and spatial heterogeneity are not conceptually interchangeable, although one effect might dominate the other in specific situations with specific choices of regime structure. This is especially underlined by the fact that regime structure 2 - which should be the most obvious candidate for eliminating error correlation - increases the dependencies instead of reducing them.

6. Conclusions and further research directions.

In the present paper, the models discussed in Lauridsen (1995, 1996) are further developed. Based upon the findings in Lauridsen (1995, 1996) - that spatial effects in the form of spatial dependencies and heterogeneity are pronounced in policy-output models explaining intermunicipal service variation - this paper shows that models which account for spatial heterogeneity in the form of regime structures perform much better than simple no-regime models.

Different ways of defining regime structures - natural regimes, sorting and multivariate clustering - are illustrated. It is shown that the sorting and clustering procedures are very promising ways of deriving regimes, which are superior to naturally defined regimes - although this might not always be the case.

Also, the necessity of considering spatial dependencies and spatial heterogeneity simultaneously is underlined in this paper as well as in Lauridsen (1995, 1996). This is all the more evident as conflicting findings are obtained with respect to the interchangeability of these two effects: In some cases, spatial regimes remove spatial dependencies, in other cases they strengthen them.

In accordance with Lauridsen (1995, 1996), which concluded that spatial models should take into account the implications of spatial effects - especially in the form of dependencies and heteroscedasticity - this paper provides evidence that further research in the direction of formulating spatial heterogeneity is promising and desirable.

Although it can hardly be expected that a unique procedure for defining regimes will be arrived at, further empirical research aiming to develop and elaborate on regime searching procedures would be very stimulating and highly informative for regional research - especially for those branches of regional research which take into account the nature and implications of spatial structures.

Literature.

Anselin, L. 1988: Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.

Anselin, L. 1990: Dependence and Spatial Structural Instability in Applied Regression Analysis. Journal of Regional Science, 30, pp. 185-207.

Lauridsen, J. 1995: Anvendelse af regionaløkonomiske metoder i analyse af kommunal udgiftsadfærd. Ph.D. thesis, Odense Universitet.

Lauridsen, J. 1996: Regional Econometric Modelling in the Explanation of Intermunicipal Service Variation. Occasional Papers, 1, Odense Universitet.

Sharma, S. 1996: Applied Multivariate Techniques. Wiley, New York.

Johnson, R.A. and D.W. Wichern 1992: Applied Multivariate Statistical Analysis. Prentice-Hall, London.

Figure 1: The Danish municipality structure. (Map of the Danish municipalities, labelled by name.)

Figure 2: The Danish county structure. (Map of the counties: Bornholms amt, Nordjyllands amt, Viborg amt, Århus amt, Ringkøbing amt, Frederiksborg amt, Københavns amt, Vejle amt, Ribe amt, Roskilde amt, Vestsjællands amt, Fyns amt, Sønderjyllands amt and Storstrøms amt; Frederiksberg and København are labelled separately.)

Figure 3: Regime structure 2 : 10 regimes by first principal component for OLS-errors. (Map of the Danish municipalities, shaded by regime number 1 to 10.)

[Map of the municipalities shaded by regime; legend classes 1 to 8.]

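Figure 4: Regimes structure 3: 8 regimes by multivariate clustering on EXP, COV, STF.

______________________________________________________________
Variable      Estimates in column order EXP, COV, STF
______________________________________________________________
CONSTANT      -8608.19 (0.181)    10.30 (0)    30.49 (0.021)
COV           1534.74 (0)
STF           32.35 (0.004)
SCARE         2257.10 (0)    -0.480 (0)
SH65          -260.56 (0.001)
GSH65         -0.082 (0)    0.589 (0.003)
FWK           204.41 (0.010)
LOGPOP        5.12 (0.001)
LOGPOPD       949.14 (0)    -0.360 (0.002)
TAX           0.0002 (0.034)
______________________________________________________________
R-SQUARE      0.60    0.26    0.18
LOG LIKEL.    -2609.10    -540.35    -1179.93
______________________________________________________________
Table 1. Linear regression models for EXP, COV and STF. Numbers in parentheses are significance probabilities. Rows with fewer than three entries list only the equations in which the variable appears, in column order with empty cells omitted.

______________________________________________________________
Variable      Estimates in column order EXP, COV, STF
______________________________________________________________
CONSTANT      -8995.68 (0.227)    10.64 (0)    29.36 (0.025)
COV           1540.12 (0)
STF           33.44 (0.003)
SCARE         2245.74 (0)    -0.555 (0)
SH65          -251.81 (0.002)
GSH65         -0.084 (0)    0.593 (0.002)
FWK           207.51 (0.009)
LOGPOP        5.25 (0.001)
LOGPOPD       929.82 (0)    -0.365 (0.003)
TAX           0.0002 (0.043)
LAMBDA        0.24 (0.107)    0.43 (0.005)    0.22 (0.080)
______________________________________________________________
R-SQUARE      0.592    0.326    0.19
LOG LIKEL.    -2607.02    -540.27    -1179.82
AIC           5229.04    1089.54    2368.65
______________________________________________________________
LM-TEST FOR   142.95 (0)    3.324 (0.334)    150.03 (0)
HETEROG.
______________________________________________________________
Table 2. Models for EXP, COV and STF with correlated errors. Numbers in parentheses are significance probabilities. Rows with fewer than three entries list only the equations in which the variable appears, in column order with empty cells omitted.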

______________________________________________________________
Eigenvalues of correlation matrix

          Eigenvalue    Proportion    Cumulative
PRIN1     1.135         0.378         0.378
PRIN2     0.996         0.332         0.710
PRIN3     0.869         0.290         1.000

Eigenvectors

          PRIN1         PRIN2         PRIN3
E(EXP)     0.192         0.977         0.089
E(COV)    -0.700         0.073         0.711
E(STF)     0.688        -0.198         0.698
______________________________________________________________
Table 3. Results from principal components analysis of errors from linear models in Table 1.
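The Proportion and Cumulative columns of Table 3 can be checked against the eigenvalues. The following is a minimal sketch in Python, not part of the original analysis; the variable names are illustrative assumptions. It relies only on the fact that the eigenvalues of a correlation matrix sum to the number of variables, here 3.

```python
# Eigenvalues of the correlation matrix of the OLS errors, copied from Table 3.
eigenvalues = {"PRIN1": 1.135, "PRIN2": 0.996, "PRIN3": 0.869}

# For a correlation matrix the eigenvalues sum to the number of variables (3).
total = sum(eigenvalues.values())

rows = []
cumulative = 0.0
for name, value in eigenvalues.items():
    proportion = value / total      # share of total variance for this component
    cumulative += proportion        # running share up to this component
    rows.append((name, round(proportion, 3), round(cumulative, 3)))

for name, proportion, cum in rows:
    print(f"{name}: proportion={proportion:.3f} cumulative={cum:.3f}")
```

The computed shares reproduce the table to three decimals, so the first principal component, which defines regimes structure 2, carries roughly 38 percent of the total variance of the three error series.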

______________________________________________________________
                     Regimes st. 1    Regimes st. 2    Regimes st. 3
                     (counties)       (princ. 1)       (m-cluster)

Wilks' Lambda        0.448            0.005            0.029
  F                  4.44             56.09            56.66
  Pr>F               0.0001           0.0001           0.0001

Pillai's Trace       0.676            2.739            1.969
  F                  4.08             43.62            36.99
  Pr>F               0.0001           0.0001           0.0001

Hotelling-Lawley     0.976            13.304           7.838
  F                  4.81             65.63            73.49
  Pr>F               0.0001           0.0001           0.0001

Roy's Gr. Root       0.658            5.694            4.365
  F                  13.22            114.31           166.17
  Pr>F               0.0001           0.0001           0.0001
______________________________________________________________
Table 4. Multivariate ANOVA - F-tests for hypotheses of no overall regimes effects on EXP, COV, STF.

______________________________________________________________
         Regimes st. 1     Regimes st. 2     Regimes st. 3
         (counties)        (princ. 1)        (m-cluster)

EXP      4.76 (0.0001)     3.17 (0.0012)     69.28 (0.0001)
COV      4.58 (0.0001)     23.15 (0.0001)    83.37 (0.0001)
STF      5.25 (0.0001)     21.98 (0.0001)    69.53 (0.0001)
______________________________________________________________
Table 5. Univariate ANOVA - F-tests (Pr>F) for hypotheses of no regimes effect.

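____________________________________________________________________________________________________
                   R²_U     R²_L    L_U         L_L         AIC_U     AIC_L     C_G      P(C_G)   λ_U     λ_L
____________________________________________________________________________________________________
EXP:
R.S.1 (County)     -        0.59    -           -2607.02    -         5229.04   -        -        -       0.24
R.S.2 (PRIN1)      0.80             -2886.41                5113.82             469.57   0        0.59
R.S.3 (M-CLUS)     0.82             -2499.26                5111.52             337.76   0        0
____________________________________________________________________________________________________
COV:
R.S.1 (County)     0.47     0.33    -493.91     -540.27     1100.82   1089.54   779.03   0        0       0.43
R.S.2 (PRIN1)      0.86             -357.44                 795.87              779.03   0        0.48
R.S.3 (M-CLUS)     0.81             -352.56                 770.11              792.48   0        0.08
____________________________________________________________________________________________________
STF:
R.S.1 (County)     0.436    0.19    -1128.75    -1179.82    2370.50   2368.65   130.02   0        0       0.22
R.S.2 (PRIN1)      0.78             -1006.40                2093.80             709.82   0        0.31
R.S.3 (M-CLUS)     0.72             -1032.11                2129.24             542.88   0        0
____________________________________________________________________________________________________
Table 6. R², L, AIC, C_G, λ values for limited and unlimited (regimes-) models, with subscripts L and U, respectively.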