
A suggestion on improving human migration models through incorporation of regional identities

Willem R. J. Vermeulen BSc

August 25, 2018

Supervisors: Rick Quax PhD, Debraj Roy PhD
Assessor: Valeria Krzhizhanovskaya PhD
UvA student id: 10561714
VU student id: 2606467

Abstract

Human migration is a complex phenomenon that is influenced by many different factors. The migration process is commonly modelled using gravity models or radiation models. In this paper we propose that human migration models can be improved by embedding regional identities into the model. This is tested by adding three different sets of Dutch identity regions to a gravity model. Through analysis of the Dutch internal migration data between 1996 and 2016, we discover that individuals are more likely to move towards municipalities located within the same identity region. We also find that this influence of identity becomes larger when the identity regions are as small and as well defined as possible.

Contents

Acknowledgements

1 Introduction
  1.1 Research question and hypotheses
    1.1.1 Introducing regional identities to a human migration model increases its predictive value
    1.1.2 Distance is handled differently when regional identities are introduced
    1.1.3 Smaller regional identities have a higher predictive power than larger identities

2 Methodologies
  2.1 Model specification
  2.2 Used migration data
  2.3 Fitting a standard gravity model for human migration
  2.4 Expansion of the gravity model
    2.4.1 Specification of regional identities
    2.4.2 Introduction of the different sets of identity regions
  2.5 Comparison of the importance of identity in different sets of regions
  2.6 Creation of other sets of identity regions
    2.6.1 Randomly generated regions
    2.6.2 Randomly generated spatially clustered regions
  2.7 Optimisation of a set of regions

3 The significance of the influence of the specified identity regions
  3.1 Differences in the mean ICM values
  3.2 Differences in the median ICM values
  3.3 Sensitivity to parameter changes in the optimisation algorithm

4 Discussion
  4.1 Specification of the model
  4.2 Specification of the ICM value
    4.2.1 Other influences that can contribute to the ICM value
  4.3 Identity regions
    4.3.1 Optimisation technique
    4.3.2 Quality
    4.3.3 Challenges in the usage of identity regions

5 Conclusions

6 Future work

Appendices

A Influences on migration decisions
  A.1 Considerations in migration decisions
  A.2 Economic benefits
  A.3 Availability of amenities
  A.4 Travel distance
  A.5 Information distance
  A.6 Social distance
  A.7 Household optimisation
  A.8 Family-cycle considerations
  A.9 Policies & disasters

B Specification of identity regions

C Tactics to increase the average ICM value
  C.1 Using network metrics
  C.2 Similarity of migration behaviour
  C.3 Reassigning municipalities
  C.4 Simulated annealing
  C.5 Discussion

D Geographical distributions of ICM values for different sets of regions

List of Figures

2.1 Visual representation of the steps taken to assign municipalities to spatially clustered regions using the k-means algorithm. This algorithm can be applied to generate different numbers of regions. This number of regions is controlled by the variable k. To be able to compare the generated regions with a certain set of identity regions, this k is set equal to the number of regions present in this set of identity regions.

2.2 A visual representation of the optimisation algorithm. Given a certain starting configuration, it is determined what regions are located within a distance of 20 kilometres from each existing municipality. For each of these municipalities, the change in the global average ICM value is measured when a municipality would be part of that region. Every municipality that should be part of another region than it already was, is then relocated with a chance of 50%. This process is repeated until no more municipalities are relocated for three iterations.

3.1 The mean ICM values for 250 sets of twelve randomly generated regions, 250 sets of twelve randomly generated spatially clustered regions, fifty sets of twelve optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 2 regions, compared to the ICM value of the original NUTS 2 regions.

3.2 Whatever changes in parameters we make to the human migration model, the mean value of the ICM values of the randomly generated spatially clustered regions is always significantly lower than the ICM value of the NUTS 2 regions. For each of the parameter configurations, thirty different randomly generated spatially clustered regions were generated.

3.3 The mean ICM values for 250 sets of forty randomly generated regions, 250 sets of forty randomly generated spatially clustered regions, fifty sets of forty optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 3 regions, compared to the ICM value of the original NUTS 3 regions.

3.4 Whatever changes in parameters we make to the human migration model, the mean value of the ICM values of the randomly generated spatially clustered regions is always significantly lower than the ICM value of the NUTS 3 regions. For each of the parameter configurations, thirty different randomly generated spatially clustered regions were generated.

3.5 The mean ICM values for 250 sets of seventy randomly generated regions, 250 sets of seventy randomly generated spatially clustered regions, fifty sets of seventy optimised randomly generated spatially clustered regions and fifty sets of optimised literature regions, compared to the ICM value of the original regions specified through literature.

3.6 Whatever changes in parameters we make to the human migration model, the mean value of the ICM values of the randomly generated spatially clustered regions is always significantly lower than the ICM value of the literature defined regions. For each of the parameter configurations, thirty different randomly generated spatially clustered regions were generated.

3.7 The median ICM values for 250 sets of twelve randomly generated regions, 250 sets of twelve randomly generated spatially clustered regions, fifty sets of twelve optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 2 regions, compared to the ICM value of the original NUTS 2 regions.

3.8 The median ICM values for 250 sets of forty randomly generated regions, 250 sets of forty randomly generated spatially clustered regions, fifty sets of forty optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 3 regions, compared to the ICM value of the original NUTS 3 regions.

3.9 The median ICM values for 250 sets of seventy randomly generated regions, 250 sets of seventy randomly generated spatially clustered regions, fifty sets of seventy optimised randomly generated spatially clustered regions and fifty sets of optimised literature regions, compared to the ICM value of the original regions specified through literature.

3.10 Differences between the mean ICM values of the optimised randomly spatially clustered regions and the mean ICM values of the optimised predefined regions for each predefined set of regions, using three different distance cut-offs in the optimisation algorithm. For each of the parameter configurations, thirty different optimised regions were generated.

3.11 Differences between the median ICM values of the optimised randomly spatially clustered regions and the median ICM values of the optimised predefined regions for each predefined set of regions, using three different distance cut-offs in the optimisation algorithm. For each of the parameter configurations, thirty different optimised regions were generated.

4.1 Three different approaches to defining identity regions: with hard boundaries, fuzzy boundaries, or by looking at the existing connections between two municipalities.

D.1 The ICM values calculated for each municipality when the Netherlands are split into the NUTS 2 regions. All ICM values are positive. The ICM values in the southern part of Limburg, Zeeland, and the Northern parts of and are all larger than the ICM values in other parts of the country. The average ICM value for municipalities located within these twelve regions is 20.91, the median ICM value is 12.35. Municipal boundary data used in this map is acquired from the Basisregistratie Kadaster (2016).

D.2 The ICM values calculated for each municipality when the Netherlands are split into the NUTS 3 regions. There are municipalities scattered throughout the country with relatively high ICM values. The average ICM value for municipalities located within these forty regions is 59.57, the median ICM value is 33.03. The lower ICM values are clustered around the centre of the country. Municipal boundary data used in this map is acquired from the Basisregistratie Kadaster (2016).

D.3 The ICM values calculated for each municipality when the Netherlands are split into the seventy regions specified by literature. The average ICM value for municipalities located within these regions is 73.91, the median ICM value is 42.34. Most ICM values are positive, but clusters of lower ICM values are found in North-Holland and Utrecht. The ICM values of Texel, and the Wijdemeren municipalities are even negative. Municipal boundary data used in this map is acquired from the Basisregistratie Kadaster (2016).

D.4 The ICM values for each municipality in four different scenarios in which the Netherlands are split into twelve random regions, disregarding any distance. In each of these scenarios about half of the municipalities have a negative ICM value, and half of the municipalities have a positive ICM value.

D.5 The ICM values for each municipality in four different scenarios in which the Netherlands are split into twelve randomly spatially clustered regions. When the ICM values in these scenarios are compared to the ICM values created by the original regions, it becomes apparent that the ICM values of some municipalities become negative in the randomly generated spatially clustered regions, whereas all ICM values in the original regions were positive.

D.6 The ICM values for each municipality in four different scenarios in which the Netherlands are split into twelve randomly spatially clustered regions, and then further optimised. As this optimisation technique is based on chance, different optima are found. When the generated ICM values are compared to the ICM values of the original regions, we see that the ICM values are not distributed in a similar way. Whereas the variance in the ICM values in the original regions is very low, we see that there occur various high ICM values and negative ICM values in the randomly generated spatially clustered regions.

D.7 The ICM values for each municipality in four different scenarios in which the NUTS 2 regions are further optimised. Since this optimisation technique is partially based on chance, different optima are found. When the generated ICM values are compared to the ICM values of the original regions, we find that the ICM values are not distributed in a similar way. Whereas the variance in the ICM values in the original regions is very low, we see that there occur various high ICM values and some negative ICM values in the randomly generated spatially clustered regions. When compared to the ICM values of the non-optimised randomly spatially clustered regions, we do however find that the number of negative ICM values is decreased.

D.8 The ICM values for each municipality in four different scenarios in which the Netherlands are split into forty random regions, disregarding any distance. In each of these scenarios most municipalities have a negative ICM value. The other 15% of the regions have slightly positive ICM values. In each of the four situations, we find that there are extremely positive and extremely negative ICM values.

D.9 The ICM values for each municipality in four different scenarios in which the Netherlands are split into forty randomly spatially clustered regions. As opposed to the municipalities in the original NUTS 3 regions, municipalities within these randomly spatially clustered regions have more negative ICM values.

D.10 The ICM values for each municipality in four different scenarios in which the Netherlands are split into forty randomly spatially clustered regions, and then further optimised. As this optimisation technique is based on chance, different optima are found. Most municipalities have positive ICM values, except for the two cases the municipalities on the Frisian islands had negative ICM values, as well as the one case a municipality in Friesland had a negative ICM value.

D.11 The ICM values for each municipality in four different scenarios in which the NUTS 3 regions are further optimised. As this optimisation technique is based on chance, different optima are found. In all four scenarios almost all municipalities have positive ICM values, except for the municipalities of Texel and . When the ICM values of the optimised regions are compared to the ICM values of the municipalities located in the original regions, it becomes clear that ICM values of municipalities located all over the country are increased.

D.12 The ICM values for each municipality in four different scenarios in which the Netherlands are split into seventy random regions, disregarding any distance. In each of these scenarios most municipalities have a negative ICM value. The other 10% of the regions have slightly positive ICM values. In each of the four situations, we find that there are extremely positive and extremely negative ICM values.

D.13 The ICM values for each municipality in four different scenarios in which the Netherlands are split into seventy randomly spatially clustered regions. When the ICM values in these scenarios are compared to the ICM values of the original regions, it becomes apparent that the ICM values in a lot of municipalities are actually higher than they were before. On the other hand, more municipalities then do have negative ICM values. This pattern could be explained by the fact that municipalities in certain parts of the Netherlands are larger than in others. When the regional centres are spread in an equal way over the country by using the k-means algorithm, this means that municipalities that should belong to the same larger identity region are less likely to be assigned to the same region.

D.14 The ICM values for each municipality in four different scenarios in which the Netherlands are split into seventy randomly spatially clustered regions, and then further optimised. As this optimisation technique is based on chance, different optima are found. Even though most ICM values are positive, each of the four different scenarios contains at least one municipality with a negative ICM value. On the Frisian islands, and in one municipality in Zeeland these negative ICM values appear more than once. When the ICM values of the optimised regions are compared to the ICM values of the original randomly spatially clustered regions, we find that the number of municipalities with negative ICM values has decreased.

D.15 The ICM values for each municipality in four different scenarios in which the seventy regions specified by literature are further optimised. As this optimisation technique is based on chance, different optima are found. In three of the four scenarios all municipalities have positive ICM values, in one scenario three municipalities have negative ICM values. When the ICM values of the optimised regions are compared to the ICM values of the municipalities located in the original regions, it becomes clear that ICM values of municipalities located all over the country are increased.

List of Tables

2.1 Sources for the values of the different variables used in the regression.

2.2 Different parameters used in the extended gravity model.

2.3 Different sets of identity regions embedded in the human migration model.

B.1 Dutch municipalities that existed in 2016, split into 70 different identity regions.

Acknowledgements

A special thanks to Rick Quax, Debraj Roy and Wessel Klijnsma for your advice and insights, and to my family and girlfriend for the continuous support.

CHAPTER 1 Introduction

Human migration decisions are based on a broad spectrum of factors, which makes it difficult to predict human migration behaviour accurately. The choice to move to a certain destination is likely influenced by its economic prospects (Smith, 1776; Peters, 1984; J. Kok, 2004) and the availability of amenities (Tiebout, 1956; P. Graves and Linneman, 1979), as well as by the travel distance (Ravenstein, 1885; Ravenstein, 1889; Grigg, 1977; Peters, 1984), information distance (P. Bouman and W. Bouman, 1967; J. Kok, 2004) and social distance (Ravenstein, 1876; Ravenstein, 1885; Weber, 1899; Hipp et al., 2012) between both locations (for more details, see Appendix A). The way in which these variables influence the migration decision differs for each individual. After all, each individual has different personal connections (Bauer and Zimmermann, 1997; Massey, 2015), requirements (Harts and Hingtsman, 1986; J. Kok, 2004) and aspirations (Greenwood, 1985; Lucassen, 2000).

It is important for governments and companies to be able to predict human migration flows beforehand, because this can help them plan ahead. These predictions can be made using different types of models. A popular type of migration model is the gravity model, which was first introduced by Zipf (1946) and has been applied by many others since (Greenwood, 1985; Cummins, 2009; Anderson, 2011). In these models interactions between different locations are specified as a direct function of their mutual geographical distance and their population mass, the latter serving as a proxy for the economic prospects of a location. Another popular, more recent type of migration model is the radiation model, which was introduced by Simini et al. (2012) and has been applied and extended since (Yang et al., 2014; Ren et al., 2014; Kang et al., 2015). In these models residents create intervening opportunities for migrants, which means that geographical distance only has an indirect influence on the generated migration flows. Both types of models can come in different forms. A good example of this variety in models is an artificial neural network model that included the traditional variables as well as intervening variables and amenity variables (Robinson and Dilkina, 2017).

1.1 Research question and hypotheses

In this research we examine one of the factors that could contribute to the human migration decision: regional identity. This is done by introducing three different sets of Dutch identity regions to a basic gravity model. One set of regions is created through literature research, to test the possibilities of defining such regions when no predefined regions are available for a certain area. The other two sets of regions are defined by Eurostat for statistical purposes, and are used as a comparison. Through this analysis we try to confirm three different hypotheses.

H1 The introduction of regional identities in a human migration model can help in explaining anomalies in local migration behaviour.

H2 The introduction of regional identities in a human migration model will have a significant impact on the way distance is handled in the model.

H3 Smaller, more specific regional identities can explain the anomalies in local migration behaviour better than larger, more generic regional identities.

1.1.1 Introducing regional identities to a human migration model increases its predictive value

As described before, human migration is based on many factors. A basic gravity model for human migration disregards a lot of these factors, and only takes the distance and population size factors into account. The availability of amenities and the economic prospects of a region could correlate with the number of people living there, and the geographical distance could correlate with the travel distance and information distance. Social distance, however, is not included in this model.

The similarities and differences in identity are an important part of the experienced social distance. Even though identity is a complex concept, we thus hypothesise that introducing regional identities into the human migration model can at least partially account for a missing factor in the model.

1.1.2 Distance is handled differently when regional identities are introduced

When regional identity is introduced to the human migration model, we hypothesise that this will have an effect on the way distance is handled. Most migration movements that also involve regional identities will take place over shorter distances. If this means that migration numbers over shorter distances are increased towards municipalities that share the same regional identity, this would also mean that the distance equation would initially have been fitted to contaminated data. Short-distance migration flows might have been bigger, but not only because of the small distance between both locations.

1.1.3 Smaller regional identities have a higher predictive power than larger identities

When the specified regional identities are smaller, we hypothesise that these regions can explain anomalies in local migration behaviour better than larger, more generic regional identities. Larger regional identities might actually consist of many smaller identities that are combined together. We would thus hypothesise that larger regional identities would still have some predictive power, but that the regions are actually inaccurate.

CHAPTER 2 Methodologies

In this chapter we first specify a basic gravity model for human migration, after which this model is fitted on Dutch data collected between 1996 and 2016. We then specify three different sets of identity regions and introduce them to the gravity model. After creating this model, we define a way of comparing the influence of identity on migration for different sets of identity regions, and define ways to create and optimise the definitions of such regional identities.

2.1 Model specification

As mentioned in the introduction, there are two types of models that are often used to model migration. Even though radiation models seem to work slightly better on a larger scale than gravity models, a gravity model is used in this research. The decisive factor is that a gravity model uses the distance variable explicitly, whereas the radiation model uses it only indirectly (Piovani et al., 2018). The influence of distance on the migration process and the impact of the introduction of identity regions on that distance variable would otherwise be hard to determine.

The most basic gravity model often used to model human migration is shown in Equation 2.1. Within this equation the populations pa and pb of municipalities a and b are positively related to the number of people that migrate from municipality a to b, Ma→b. When more people live in municipality a, a larger number of people can leave that location, and when there are more people living in municipality b, it is likely that there are more opportunities in that municipality (Weber, 1899; J. Kok, 2004). This could make people more willing to move there.

Alongside the influences of the population sizes of both municipalities, the distance ∆a→b between those two municipalities is also included in this model. This distance can account for the influence of travel distance and information distance between two locations. As the distance between two municipalities becomes larger, it becomes less likely that individuals move between those two municipalities. The G variable is a proportionality constant that differs depending on the geographical context and time scale in which the function is applied.

$$M_{a \to b} = G \cdot \frac{p_a^{\alpha} \cdot p_b^{\beta}}{\Delta_{a \to b}^{\gamma}} \tag{2.1}$$

This equation can then be rewritten in a linear form by taking the logarithm of both sides, as shown in Equation 2.2. By doing this, a generalised linear model (GLM) can be applied to find the values of α, β, γ and δ. The δ variable is introduced to accurately determine the value of G, which equals exp(δ).

$$\ln(M_{a \to b}) = \alpha \cdot \ln(p_a) + \beta \cdot \ln(p_b) - \gamma \cdot \ln(\Delta_{a \to b}) + \delta + \epsilon_{a \to b} \tag{2.2}$$
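To illustrate how the logarithmic form turns the fit into a linear problem, the sketch below generates synthetic flows from known parameters and recovers them with ordinary least squares. This is only an illustration: it uses a plain least-squares solver written for this example rather than the GLM applied in this research, and the parameter values, population ranges and distances are all made up.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_log_gravity(rows):
    """Least-squares fit of the log-linear gravity model.

    rows holds (p_a, p_b, distance, migrants) tuples with positive values.
    Returns [alpha, beta, gamma, delta].
    """
    k = 4
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for p_a, p_b, dist, migrants in rows:
        # The distance column is negated so that its coefficient is +gamma.
        x = [math.log(p_a), math.log(p_b), -math.log(dist), 1.0]
        y = math.log(migrants)
        for i in range(k):
            xty[i] += x[i] * y
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    return solve_linear_system(xtx, xty)

# Hypothetical "true" parameters, chosen only for this illustration.
TRUE_ALPHA, TRUE_BETA, TRUE_GAMMA, TRUE_DELTA = 0.25, 0.23, 0.48, -1.6

random.seed(1)
rows = []
for _ in range(2000):
    p_a = random.uniform(1e3, 8e5)
    p_b = random.uniform(1e3, 8e5)
    dist = random.uniform(1.0, 300.0)
    log_m = (TRUE_ALPHA * math.log(p_a) + TRUE_BETA * math.log(p_b)
             - TRUE_GAMMA * math.log(dist) + TRUE_DELTA
             + random.gauss(0.0, 0.05))  # error term
    rows.append((p_a, p_b, dist, math.exp(log_m)))

alpha, beta, gamma, delta = fit_log_gravity(rows)
G = math.exp(delta)  # proportionality constant of the original form
```

The recovered `alpha`, `beta`, `gamma` and `delta` should lie close to the values used to generate the data, and `G` follows from `delta` by exponentiation.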

2.2 Used migration data

Some of the internal migration events within the Netherlands take place within municipalities, whereas others take place between two different municipalities. This means that both intramunicipal and intermunicipal migration data should be used when we want to include all migration events. Such migration data is available for the years 1995 to 2016 via Statistics Netherlands.¹

The migration data is collected directly from the Dutch civil registration database. This means that the data is as accurate as possible: only migration movements that were not correctly registered are missing. Even though no data is available on the percentage of unregistered movements, this shortcoming is not expected to have a large effect on the outcomes of this research: Bouhuijs and Meijer (2017) showed that in 2016 96.26% (95% CI [95.72%, 96.81%]) of Dutch citizens were registered at the right address.

For each year the number of intermunicipal migrants between every combination of municipalities is recorded, as well as the number of intramunicipal migrants. This does not mean that data for the same number of municipalities is recorded each year: because of municipal mergers, the number of municipalities has decreased from 625 in 1996 to 390 in 2016 (Centraal Bureau voor de Statistiek, 2018b; Centraal Bureau voor de Statistiek, 2018c). To be able to create maps and compare migration data in different years, we artificially merge municipalities to form the municipalities that existed in 2016.
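One way such an artificial merge could be carried out is to relabel every historical municipality with its 2016 successor and sum the flows that end up with the same origin and destination. The sketch below assumes that each historical municipality maps to exactly one successor and that flows between two municipalities that later merged are counted as intramunicipal; the municipality names and the `SUCCESSOR_2016` mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from a historical municipality to its 2016 successor.
SUCCESSOR_2016 = {"A_old": "A", "B_old": "A", "C": "C"}

def merge_flows(flows, successor):
    """Aggregate (origin, destination) -> migrants onto the 2016 municipalities."""
    merged = defaultdict(int)
    for (origin, destination), migrants in flows.items():
        key = (successor[origin], successor[destination])
        merged[key] += migrants
    return dict(merged)

# Made-up flows for one year in the historical municipal layout.
flows_1996 = {
    ("A_old", "B_old"): 10,  # becomes intramunicipal once A_old and B_old merge
    ("A_old", "C"): 4,
    ("B_old", "C"): 6,
    ("C", "C"): 50,
}
flows_merged = merge_flows(flows_1996, SUCCESSOR_2016)
```

Municipalities that were split over several successors would need an additional allocation rule, which this sketch leaves out.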

2.3 Fitting a standard gravity model for human migration

To be able to fit the gravity model to the migration data, we also need data on the population size of all municipalities and on the distances between all locations. While municipal population data can easily be acquired through Statistics Netherlands (Centraal Bureau voor de Statistiek, 2018a), it is harder to acquire data on the distance travelled by each individual.

The distance travelled by every migrant between the same two municipalities will always differ slightly, because migrants do not all live at the same location within a municipality. Because it is impossible to know the exact migration distance for each migration event, the distance travelled is approximated by the length of the straight line between the geographical centres of both municipalities in kilometres. This can be done because this distance is highly correlated with the travel time between two locations (Phibbs and Luft, 1995). Even though there are cases where this assumption does not hold, such as when certain geographical boundaries cannot be crossed or the population centre is located far from the geographical centre of a municipality, we assume this does not have a significant impact.
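As a small illustration of this approximation, the sketch below computes the straight-line distance between two centres. It assumes that the centres are available as planar (x, y) coordinates in metres in some projected coordinate system; the coordinates shown are made up.

```python
import math

def centre_distance_km(centre_a, centre_b):
    """Straight-line distance in kilometres between two (x, y) centres in metres."""
    dx = centre_a[0] - centre_b[0]
    dy = centre_a[1] - centre_b[1]
    return math.hypot(dx, dy) / 1000.0

# Hypothetical centres that lie 3 km apart in x and 4 km apart in y.
distance = centre_distance_km((120_000.0, 480_000.0), (123_000.0, 484_000.0))
```

With centres given as latitude and longitude, a great-circle formula would be needed instead of the planar distance used here.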

This way of approximating the distance travelled cannot be used for the intramunicipal migration data. The distance between the centre of a municipality and the centre of that very same municipality is always zero. Under the assumption that most migration movements take place over shorter distances, we estimate the intramunicipal travel distance to be about $\frac{1}{2}\sqrt{|\ldots|}$ instead.

¹ Between 1996 and 2010 a separate intermunicipal migration data set was released each year (Centraal Bureau voor de Statistiek, 2005a, 2005b, 2005c, 2005d, 2005e, 2005f, 2005g, 2005h, 2005i, 2006, 2007, 2008, 2009, 2010, 2011). Intermunicipal migration data after 2010 is all collected in one single data set, which is updated on a yearly basis (Centraal Bureau voor de Statistiek, 2017). All intramunicipal migration data is available in one single data set, also updated on a yearly basis (Centraal Bureau voor de Statistiek, 2018d).

| | Intramunicipal migration | Intermunicipal migration |
|---|---|---|
| Population data | Centraal Bureau voor de Statistiek, 2018a | Centraal Bureau voor de Statistiek, 2018a |
| Migration data | Centraal Bureau voor de Statistiek, 2018d | Multiple sources, see footnote 1 |
| Distance data | Estimate: $\frac{1}{2}\sqrt{|\ldots|}$ | Estimate: distance between the centres of both municipalities |

Table 2.1: Sources for the values of the different variables used in the regression.

The linear form of the gravity model presented in Equation 2.2 can then be fitted on the data using a Generalised Linear Model (GLM). GLMs are flexible generalisations of linear regressions, in which the response variables can have a non-normal error distribution model (Nelder and Wedderburn, 1972). Because logarithms are used in this linear form and it is impossible to take the logarithm of zero, the cases in which no people migrate between two municipalities should be processed before fitting the equation.

There are two options to solve this problem: disregard the migration data when no migrants move between two municipalities in a certain year, or modify all the measured migration data in such a way that all connections have some migrants. Because choosing the first approach would mean that information is lost on municipalities that did not attract migrants, we choose the second option. Every number of migrants is increased by two, as this value minimises the deviance of the model. A regression on this data resulted in Equation 2.3 (χ²(4, N = 4,750,471) = 1.4232 · 10⁶, P ≤ 0.001).

$$M_{a \to b} = \exp(-1.6175) \cdot \frac{p_a^{0.2433} \cdot p_b^{0.2327}}{\Delta_{a \to b}^{0.4760}} + \epsilon_{a \to b} \tag{2.3}$$
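To give a feeling for what the fitted coefficients imply, the sketch below evaluates Equation 2.3 without its error term for two made-up municipality pairs that differ only in distance. The populations and distances are hypothetical. Because the regression was run on migrant counts increased by two, the returned values are on that shifted scale; the sketch does not attempt to undo the shift.

```python
import math

# Coefficients as reported in Equation 2.3.
ALPHA, BETA, GAMMA, DELTA = 0.2433, 0.2327, 0.4760, -1.6175

def predicted_flow(p_a, p_b, distance_km):
    """Evaluate the fitted gravity model without the error term."""
    return math.exp(DELTA) * p_a ** ALPHA * p_b ** BETA / distance_km ** GAMMA

# Hypothetical pair of municipalities, once 10 km and once 40 km apart.
near = predicted_flow(100_000, 50_000, 10.0)
far = predicted_flow(100_000, 50_000, 40.0)

# The population terms cancel, so the ratio depends on the distance only.
ratio = far / near  # equals 4 ** -GAMMA
```

With the reported distance exponent, a fourfold increase in distance roughly halves the predicted flow, regardless of the population sizes involved.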

2.4 Expansion of the gravity model

The impact of the regional identities can be examined by expanding the gravity model with a categorical variable ι that is true if both municipalities have the same regional identity and false if they do not. Following this introduction, the linear version of the gravity model is also adjusted to Equation 2.4. The used parameters are shown in Table 2.2.

$$\ln(M_{a \to b}) = \alpha \cdot \ln(p_a) + \beta \cdot \ln(p_b) - \gamma \cdot \ln(\Delta_{a \to b}) + \delta + \iota + \epsilon_{a \to b} \tag{2.4}$$

| Parameter | Description |
|---|---|
| α | Influence of pa on the number of migrants |
| β | Influence of pb on the number of migrants |
| γ | Influence of ∆a→b on the number of migrants |
| δ = log(G) | Normalisation constant of the regression function |
| ϵa→b | Difference between the calculated number of migrants and Ma→b, different for each set of (a, b) |
| ι = log(I) | Increase in the number of migrants when both municipalities are located in the same identity region |
| ∆a→b | Distance between municipality a and municipality b |
| Ma→b | Number of migrants between municipality a and municipality b |
| pa | Population of municipality a |
| pb | Population of municipality b |

Table 2.2: Different parameters used in the extended gravity model

2.4.1 Specification of regional identities

In order to incorporate regional identities into the model, these should first be specified by forming identity regions. To be able to comprehend both the importance of identity in regions of certain sizes and the significance of choosing the right clusters of municipalities, three different sets of identity regions are used as shown in Table 2.3.

The first two sets of identity regions consist of administrative regions, designed to compare regions of certain sizes within the EU. The first set consists of the twelve NUTS 2 regions or provinces (Eurostat, 2013), the second set of forty NUTS 3 regions or COROP regions (Centraal Bureau voor de Statistiek, 2015; Eurostat, 2013). Because of their administrative origin, these regions are easy to obtain, but might not be fully accurate. Regions within both sets must have a minimal number of residents to allow for accurate statistical comparison (Eurostat, 2018). Because it is not guaranteed that every existing identity region has enough residents, this could mean that different smaller identity regions were combined to reach this population threshold.

The third set of identity regions is manually specified through a literature study. It consists of seventy long-standing historical identity regions. Details on these regions are found in Appendix B. Even though this specification process is a complex and demanding task, it is often also a necessity, because prespecified administrative regions are not always available.

| Data set | Regions | Specification |
|---|---|---|
| NUTS-2 (Provinces) | 12 | Eurostat, 2013 |
| NUTS-3 (COROP regions) | 40 | Centraal Bureau voor de Statistiek, 2015; Eurostat, 2013 |
| Literature study | 70 | Various literature sources, as specified in Appendix B |

Table 2.3: Different sets of identity regions embedded in the human migration model.

2.4.2 Introduction of the different sets of identity regions

Using the same data as before, new models can be fitted on each of the three predefined sets of identity regions. The formula fitted on the NUTS 2 regions is shown in Equation 2.5 (χ²(4, N = 4,750,471) = 1.4028 · 10⁶, P ≤ 0.001), the formula generated using the NUTS 3 regions in Equation 2.6 (χ²(4, N = 4,750,471) = 1.2729 · 10⁶, P ≤ 0.001) and the formula that is based on the regions defined through literature in Equation 2.7 (χ²(4, N = 4,750,471) = 1.2712 · 10⁶, P ≤ 0.001). This means that the deviance is decreased by respectively 1.4%, 10.6% and 10.7%. In all cases the influence of identity was significant (P < 0.001).

$$ M_{a \to b} = \exp(-2.0427) \cdot \frac{p_a^{0.2452} \cdot p_b^{0.2343}}{\Delta_{a \to b}^{0.3987}} \cdot \exp(0.3824)^{[\mathrm{region}(a) = \mathrm{region}(b)]} + \epsilon_{a \to b} \qquad (2.5) $$

$$ M_{a \to b} = \exp(-2.3727) \cdot \frac{p_a^{0.2491} \cdot p_b^{0.2361}}{\Delta_{a \to b}^{0.3373}} \cdot \exp(1.2482)^{[\mathrm{region}(a) = \mathrm{region}(b)]} + \epsilon_{a \to b} \qquad (2.6) $$

$$ M_{a \to b} = \exp(-2.3398) \cdot \frac{p_a^{0.2499} \cdot p_b^{0.2383}}{\Delta_{a \to b}^{0.3493}} \cdot \exp(1.5360)^{[\mathrm{region}(a) = \mathrm{region}(b)]} + \epsilon_{a \to b} \qquad (2.7) $$

In these three equations two variables have changed by more than one tenth: γ and δ. The δ variable would previously have contained part of the ι variable, and is thus likely to have a lower value when the ι variable is introduced. Likewise, the value of γ is lowered because the variable would no longer have to account for the effects that identity has on shorter distance migration.
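To make the role of the identity term concrete, the fitted equations can be evaluated with a few lines of Python. This is a minimal sketch: the coefficients are copied from Equations 2.3 and 2.5 to 2.7, while the names `MODELS`, `predicted_flow` and `identity_multiplier` are hypothetical and the error term ϵ is left out.

```python
import math

# (delta, alpha, beta, gamma, iota), copied from Equations 2.3 and 2.5-2.7.
# The basic model has no identity term, so its iota is set to zero here.
MODELS = {
    "basic": (-1.6175, 0.2433, 0.2327, 0.4760, 0.0),
    "nuts2": (-2.0427, 0.2452, 0.2343, 0.3987, 0.3824),
    "nuts3": (-2.3727, 0.2491, 0.2361, 0.3373, 1.2482),
    "literature": (-2.3398, 0.2499, 0.2383, 0.3493, 1.5360),
}


def predicted_flow(model, pop_a, pop_b, distance, same_region=False):
    """Evaluate the deterministic part of a fitted gravity model."""
    delta, alpha, beta, gamma, iota = MODELS[model]
    log_flow = (
        delta
        + alpha * math.log(pop_a)
        + beta * math.log(pop_b)
        - gamma * math.log(distance)
    )
    if same_region:
        # Iverson bracket [region(a) = region(b)] equals one
        log_flow += iota
    return math.exp(log_flow)


def identity_multiplier(model):
    """Factor by which a shared identity region scales the predicted flow."""
    return math.exp(MODELS[model][4])
```

Because the identity term enters multiplicatively, the ratio between a same-region and a different-region prediction equals exp(ι) for any populations and distance.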

2.5 Comparison of the importance of identity in different sets of regions

The ι values found cannot be compared directly, because the values of the α, β, γ and δ parameters differ as well. This means that we should find another way of comparing different identity regions. By analysing the differences in the values of the ϵ variables of the basic gravity model specified in Equation 2.3, we could do just that.

These ϵ variables could only be analysed to find the minimal influence of identity, as we just argued that the δ and γ already partially compensate for the effects that regional identity has on the human migration network. To be able to determine this minimal influence, the ϵ values are first split into two different categories as specified in Equation 2.8.

$$ \epsilon = \begin{cases} \epsilon_{in}, & \text{if municipalities in the same identity region} \\ \epsilon_{out}, & \text{if municipalities not in the same identity region} \end{cases} \qquad (2.8) $$

A two-sample Kolmogorov–Smirnov test between the distribution of all ϵin values and the distribution of all ϵout values reveals that both distributions are not the same for each of the three sets of prespecified identity regions (p < 0.001) (Kolmogorov, 1933). This means that there still are differences between interregional and intraregional migration flows that cannot be explained by the basic gravity model.
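The statistic behind such a two-sample test is the largest gap between the two empirical distribution functions. The sketch below computes only that statistic in plain Python. The function name `ks_statistic` is hypothetical, and obtaining a p-value would require a statistics library, which is not shown here.

```python
from bisect import bisect_right


def ks_statistic(sample_in, sample_out):
    """Largest absolute difference between the two empirical CDFs."""
    a = sorted(sample_in)
    b = sorted(sample_out)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A value of zero means the two samples have identical empirical distributions, and a value of one means they do not overlap at all.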

As a tool to compare the influence of identity in different municipalities, the Identity Comparison Measure or ICM is specified in Equation 2.9. This measure uses these ϵ values. The ICM value takes on positive values when people are more likely to move towards municipalities that are located in the same area than towards municipalities that are not, and negative values if this is not the case.

$$ \mathrm{ICM} = \operatorname{avg}(\epsilon_{in}) - \operatorname{avg}(\epsilon_{out}) \qquad (2.9) $$

This ICM value can thus tell us about the difference between the variance that can be explained in the intraregional migration flows, and the explained variance in the interregional migration data. A positive ICM value indicates that the model has more difficulties to explain the intraregional migration behaviour than it has to explain the interregional migration behaviour.

An ICM value can be calculated for each separate municipality. This is done by only using all ϵin and ϵout values for all migration flows originating in that particular municipality. Which migration flows are part of the intraregional migration figures and which flows are part of interregional migration figures depends on the used identity regions.
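The per-municipality calculation can be sketched as follows. The data layout (a dictionary of residuals keyed by origin and destination) and the name `icm_per_municipality` are assumptions made for this illustration. Origins that lack either intraregional or interregional flows are skipped, because Equation 2.9 is undefined for them.

```python
from collections import defaultdict
from statistics import mean


def icm_per_municipality(residuals, region_of):
    """Compute ICM = avg(eps_in) - avg(eps_out) for every origin municipality.

    residuals: dict mapping (origin, destination) -> epsilon of the basic model
    region_of: dict mapping municipality -> identity region label
    """
    eps_in = defaultdict(list)
    eps_out = defaultdict(list)
    for (origin, destination), eps in residuals.items():
        if region_of[origin] == region_of[destination]:
            eps_in[origin].append(eps)
        else:
            eps_out[origin].append(eps)
    return {
        origin: mean(eps_in[origin]) - mean(eps_out[origin])
        for origin in eps_in
        if origin in eps_out
    }


# Hypothetical residuals for three municipalities in two regions
example_residuals = {
    ("A", "B"): 0.5, ("A", "C"): -0.1,
    ("B", "A"): 0.3, ("B", "C"): 0.1,
    ("C", "A"): 0.0, ("C", "B"): 0.2,
}
example_regions = {"A": "north", "B": "north", "C": "south"}
example_icm = icm_per_municipality(example_residuals, example_regions)
```

In this toy example municipality C is the only member of its region, so it has no intraregional flows and receives no ICM value.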

A comparison between these different ICM values can give an indication as to what regions contain stronger identities, or whether the municipality is part of the right identity region. This does not imply that an ICM value can be translated directly to the influence regional identity has on migration. Instead, the ICM value defines the unexplained differences in the remaining deviance that could not be included in the γ and δ variables.

2.6 Creation of other sets of identity regions

When an ICM value is positive, it cannot directly be concluded that the predefined identity regions have a real influence on the human migration decision. Positive ICM values might also occur in randomly generated regions, just because the municipalities within each region are located in proximity to one another. This possibility can be excluded by further comparing the ICM values of the predefined identity regions with the average ICM values generated by the same number of randomly generated regions.

2.6.1 Randomly generated regions

A set of random regions can be generated by assigning each municipality to a random region, while making sure that every region consists of at least two municipalities. Such a set of regions is expected to have a mean ICM value of zero, because random combinations of municipalities are not likely to hold the same identity. This also means that the ι variable in the extended gravity model would be close to zero.
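One possible way to generate such random regions is sketched below. The text does not state how the minimum size was enforced, so seeding every region with two municipalities first is an assumption, as is the function name `random_regions`.

```python
import random


def random_regions(municipalities, k, rng=random):
    """Assign every municipality to one of k regions, each with >= 2 members."""
    shuffled = list(municipalities)
    if len(shuffled) < 2 * k:
        raise ValueError("need at least two municipalities per region")
    rng.shuffle(shuffled)
    assignment = {}
    # First give every region two municipalities to satisfy the minimum size
    for region in range(k):
        assignment[shuffled[2 * region]] = region
        assignment[shuffled[2 * region + 1]] = region
    # Then spread the remaining municipalities uniformly at random
    for municipality in shuffled[2 * k:]:
        assignment[municipality] = rng.randrange(k)
    return assignment
```

Passing a seeded `random.Random` instance as `rng` makes a generated set of regions reproducible.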

2.6.2 Randomly generated spatially clustered regions

Using randomly generated regions might however not be realistic. In real life, identity regions are not scattered all over the country, but present in a cluster of municipalities. A more realistic approach to generating random regions would thus enforce that municipalities located within a region should at least form a spatial cluster together.

The k-means algorithm is used to create such spatial clusters of municipalities (MacQueen, 1967). Instead of randomly assigning each municipality to a region, a random centre point is assigned to each region. Each municipality is then assigned to the closest centre point, after which the centre point of each region becomes the geographical centre of the municipalities belonging to that region. Some municipalities might then be located closer to the centre point of another region, which means that such a municipality is relocated to that other region. This process is repeated until no more municipalities are reassigned to another region. A visual representation of this algorithm is shown in Figure 2.1.

Figure 2.1: Visual representation of the steps taken to assign municipalities to spatially clustered regions using the k-means algorithm.

This algorithm can be applied to generate different numbers of regions. This number of regions is controlled by the variable k. To be able to compare the generated regions with a certain set of identity regions, this k is set equal to the number of regions present in this set of identity regions.

When this algorithm is applied, the generated regions are likely to partly overlap with the identity regions, because both types of regions are spatially clustered. This means that the ICM values become positive, and the ι values larger than they were in the randomly generated regions.
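The clustering steps described in this section can be sketched in plain Python. This is an illustrative version of the standard k-means iteration, not the thesis implementation: planar coordinates, initial centres drawn from the municipality locations, and the name `kmeans_regions` are all assumptions.

```python
import math
import random


def kmeans_regions(coords, k, rng=random, max_iter=100):
    """Cluster municipalities into k spatial regions.

    coords: dict mapping municipality -> (x, y) centre coordinates
    Returns a dict mapping municipality -> region index.
    """
    names = list(coords)
    # Assumption: the random centre points start at k random municipalities
    centres = [coords[name] for name in rng.sample(names, k)]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            name: min(range(k), key=lambda r: math.dist(coords[name], centres[r]))
            for name in names
        }
        if new_assignment == assignment:
            break  # no municipality was reassigned, so the regions are stable
        assignment = new_assignment
        for region in range(k):
            members = [coords[n] for n in names if assignment[n] == region]
            if members:  # keep the old centre if a region ends up empty
                centres[region] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment
```

Unlike the purely random regions, this sketch does not enforce a minimum of two municipalities per region, so a check on the region sizes would still be needed before ICM values are calculated.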

2.7 Optimisation of a set of regions

It could furthermore be interesting to see whether a set of regions can be optimised further. If a certain predefined set of regions cannot be further optimised, or the ICM values of the optimised randomly spatially clustered regions are lower or similar to the ICM value of the predefined region, we would expect that these predefined regions are well defined.

When a set of regions is optimised, we want to increase the average ICM value. When this value is increased, the differences in the predictive value of the model between the interregional and intraregional migration are enlarged. As this difference becomes larger, the strength of the identity contained within the defined regions also becomes larger. Under the constraint that every resulting region has at least two municipalities, we specified an algorithm to increase the ICM values in Algorithm 1. Without this constraint it would become impossible to calculate the ICM value.

Data: a set of regions, each containing at least two municipalities
Result: a set of regions with a better average ICM value than before

initialise current regions filled with municipalities;
do
    initialise new regions empty;
    for every municipality in the Netherlands do
        determine current region;
        determine the regions neighbouring the municipality;
        determine optimal region using Equation 2.10;
        if optimal region different than current region and current region will have at least two municipalities in the new regions and fifty percent chance then
            add municipality to the optimal region in the new regions;
        else
            add municipality to the current region in the new regions;
        end
    end
    current regions become the new regions;
while not every municipality in optimal region and these municipalities are not the same for the last five iterations;

Algorithm 1: Optimisation algorithm used to increase the average ICM value for a set of regions.

The function that is used to find the optimal region for a certain municipality is shown in Equation 2.10. In this function the notation Ma→b,year is used to denote that only migration data for that year is used in the M function. This allows us to take the median value of the corrected migration data, which means that outlier data will have very little influence on the optimisation process. In this formula the effects of a municipality relocation are evaluated by looking at the change in the ICM values of the municipalities in that region, and the change in the ICM value of the municipality itself.

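Because identity regions are usually not scattered all over the country, this optimisation algorithm is limited to only assign a municipality to one of the regions that also contains a neighbouring municipality.

$$ \text{value for optimal region}(a) = \max\left( \frac{1}{\left|\{\, b \mid \forall\ \text{municipality } b \in \text{region},\ a \neq b \,\}\right|} \cdot \sum \left\{ \operatorname{med}(\{ M_{a \to b, year} \mid \forall\, year \in years \}) + \operatorname{med}(\{ M_{b \to a, year} \mid \forall\, year \in years \}) \;\middle|\; \forall\ \text{municipality } b \in \text{region} \right\} \right) \qquad (2.10) $$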
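A possible reading of Equation 2.10 is sketched below: for every candidate region, the yearly values in both directions are reduced to their medians, summed over the other municipalities in the region, and averaged. The data layout, the names `region_score` and `optimal_region`, and the treatment of missing flows as zero are assumptions. What exactly enters as M (raw or corrected migration data) is left to the caller.

```python
from statistics import median


def region_score(a, region_members, flows_by_year):
    """Average median two-way flow between a and the other region members.

    flows_by_year: dict mapping year -> dict mapping (origin, destination) -> M
    """
    others = [b for b in region_members if b != a]
    if not others:
        return float("-inf")  # a region without other members cannot be scored
    total = 0.0
    for b in others:
        # Assumption: a missing flow in some year counts as zero
        total += median(year.get((a, b), 0) for year in flows_by_year.values())
        total += median(year.get((b, a), 0) for year in flows_by_year.values())
    return total / len(others)


def optimal_region(a, candidate_regions, flows_by_year):
    """Pick the candidate region with the highest score for municipality a.

    candidate_regions: dict mapping region label -> set of municipalities
    """
    return max(
        candidate_regions,
        key=lambda r: region_score(a, candidate_regions[r], flows_by_year),
    )
```

In the optimisation algorithm the candidate regions would be restricted to those containing a neighbouring municipality, which is why the candidates are passed in explicitly here.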

In order to prevent the algorithm from creating a deadlock situation, there is only a fifty percent chance of reassigning a municipality to the determined optimal region, given that there will still be two municipalities left in the region the municipality belonged to. When this probability is not introduced, situations can occur in which two municipalities that should be in the same region can never end up together. Once all municipalities are located in their optimal region or the same set of municipalities is relocated for five iterations, the municipality relocation process is ended.
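A single pass of the relocation step can be sketched as follows. This is one interpretation of Algorithm 1 rather than the original code: the optimal regions are assumed to be computed beforehand, region sizes are tracked while the pass runs so that no region drops below two municipalities, and the names `relocation_pass` and `move_probability` are hypothetical.

```python
import random


def relocation_pass(regions, optimal_region_of, rng=random, move_probability=0.5):
    """Run one iteration of the stochastic relocation step.

    regions: dict mapping municipality -> current region
    optimal_region_of: dict mapping municipality -> preferred region
    Returns the new municipality -> region assignment.
    """
    sizes = {}
    for region in regions.values():
        sizes[region] = sizes.get(region, 0) + 1

    new_regions = {}
    for municipality, current in regions.items():
        target = optimal_region_of[municipality]
        # The region that is left must keep at least two municipalities
        can_leave = sizes[current] - 1 >= 2
        if target != current and can_leave and rng.random() < move_probability:
            new_regions[municipality] = target
            sizes[current] -= 1
            sizes[target] = sizes.get(target, 0) + 1
        else:
            new_regions[municipality] = current
    return new_regions
```

Repeating this pass until the assignment stops changing, or until the same municipalities keep moving for several iterations, gives the stopping behaviour described above.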

The resulting region configuration is a local optimum. Because there are many of such local optima, this means that the algorithm will have to be executed several times to find the optimal configuration that can be reached from a particular starting configuration. Because the algorithm does not accept changes that lower the ICM value, this does not necessarily mean that the optimal configuration of regions can be reached. As a result, we cannot say for sure that the most optimal local optimum accessible is in fact the global optimum.

Figure 2.2: A visual representation of the optimisation algorithm. Given a certain starting configuration, it is determined what regions are located within a distance of 20 kilometres from each existing municipality. For each of these municipalities, the change in the global average ICM value is measured when a municipality would be part of that region. Every municipality that should be part of another region than it already was, is then relocated with a chance of 50%. This process is repeated until no more municipalities are relocated for three iterations.

When the starting configuration is created through thorough research, we could argue that this resulting configuration could actually be the global optimum. It could after all be considered to be very unlikely that the optimal configuration would differ significantly from a well researched configuration. When more complicated methods are used to actually find the global optimum, complications can arise, as further explained in Appendix C.

CHAPTER 3 The significance of the influence of the specified identity regions

The ICM values of the generated random, randomly spatially clustered, optimised randomly spatially clustered and optimised identity regions can be compared with the ICM value of the prespecified identity regions. By comparing these ICM value distributions we can determine whether the effects we attribute to regional identity could not be attributed to chance as well.

In the following sections we will compare the distributions of mean and median values for each of the mentioned types of regions. Because the gravity model is dependent on various variables that are estimated from the data, as well as an estimated distance variable, we have to test whether our conclusions will also hold if these parameters are slightly varied. In each case, the α, β, γ and ∆ variables are varied by 10% to support the conclusions we draw on the differences that exist between the ICM values of the randomised spatially clustered regions and the ICM value of the predefined regions.

Besides these distributions of average ICM values, each of these single data sets also has a geographical distribution of ICM values. A closer examination of these ICM values can help in identifying municipalities that are located in the wrong region beforehand, or help in understanding the effects of the used algorithms. The geographical distributions of the data sets are included in Appendix D.

3.1 Differences in the mean ICM values

As seen in Figure 3.1, the ICM value of the NUTS 2 regions lies within the 95% confidence interval of the ICM values of the randomly spatially clustered regions. This means that the NUTS 2 regions cannot be distinguished from the randomly spatially clustered regions. Despite this, Figure 3.2 shows that the ICM value of the NUTS 2 regions remains higher than the average ICM value of the randomly spatially clustered regions, even when parameters change (0.96 standard deviations under default parameters).

Figure 3.1: The mean ICM values for 250 sets of twelve randomly generated regions, 250 sets of twelve randomly generated spatially clustered regions, fifty sets of twelve optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 2 regions, compared to the ICM value of the original NUTS 2 regions.

Figure 3.2: Whatever changes in parameters we make to the human migration model, the mean value of the ICM values of the randomly generated spatially clustered regions is always significantly lower than the ICM value of the NUTS 2 regions. For each of the parameter configurations, thirty different randomly generated spatially clustered regions were generated.

As seen in Figure 3.3, the ICM value of the NUTS 3 regions is significantly higher than the ICM values of the randomly spatially clustered regions. This means that the NUTS 3 regions can easily be distinguished from the randomly spatially clustered regions. This large difference is also shown in Figure 3.4. Even when parameters in the model change, the difference between the randomly spatially clustered regions and the predefined NUTS 3 regions is high (9.19 standard deviations under default parameters).

Figure 3.3: The mean ICM values for 250 sets of forty randomly generated regions, 250 sets of forty randomly generated spatially clustered regions, fifty sets of forty optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 3 regions, compared to the ICM value of the original NUTS 3 regions.

Figure 3.4: Whatever changes in parameters we make to the human migration model, the mean value of the ICM values of the randomly generated spatially clustered regions is always significantly lower than the ICM value of the NUTS 3 regions. For each of the parameter configurations, thirty different randomly generated spatially clustered regions were generated.

As seen in Figure 3.5, the ICM value of the literature defined regions is also significantly higher than the ICM values of the randomly spatially clustered regions. Though the difference is smaller than it was in the NUTS 3 region comparison, we still find that there are large differences between the average ICM value of the randomly spatially clustered regions and the ICM value of the literature defined regions (2.90 standard deviations under default parameters). Even when parameters in the model change, the differences between the randomly spatially clustered regions and the literature defined regions are high, as shown in Figure 3.6.
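The distances reported in standard deviations in this section can be computed as sketched below. The function name `standard_deviations_above` is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def standard_deviations_above(reference_icm, random_icms):
    """Distance of a reference ICM value from the mean of the random sets.

    reference_icm: mean ICM value of the predefined identity regions
    random_icms: mean ICM values of the randomly spatially clustered sets
    The result is expressed in sample standard deviations of random_icms.
    """
    return (reference_icm - mean(random_icms)) / stdev(random_icms)
```

For example, `standard_deviations_above(3.0, [1.0, 2.0, 3.0])` evaluates to 1.0, because the random sets have a mean of 2.0 and a sample standard deviation of 1.0.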

Figure 3.5: The mean ICM values for 250 sets of seventy randomly generated regions, 250 sets of seventy randomly generated spatially clustered regions, fifty sets of seventy optimised randomly generated spatially clustered regions and fifty sets of optimised literature regions, compared to the ICM value of the original regions specified through literature.

Figure 3.6: Whatever changes in parameters we make to the human migration model, the mean value of the ICM values of the randomly generated spatially clustered regions is always significantly lower than the ICM value of the literature defined regions. For each of the parameter configurations, thirty different randomly generated spatially clustered regions were generated.

3.2 Differences in the median ICM values

Whereas the mean ICM value of the NUTS 2 distribution was located within the distribution of the mean ICM values of the randomly generated spatially clustered regions, the median ICM value of the distribution is lower than all the median ICM values of the randomly generated spatially clustered regions. A comparison of these median values is shown in Figure 3.7. This figure also shows that the median values of the optimised NUTS 2 regions are generally lower than the median values of the optimised randomly generated spatially clustered regions.

On the other hand, the median values of the NUTS 3 distributions and literature defined regions are located within the distribution of median ICM values of the randomly generated spatially clustered regions. As can be seen in Figures 3.8 and 3.9, the median ICM values of both optimised distributions are indistinguishable as well.

Figure 3.7: The median ICM values for 250 sets of twelve randomly generated regions, 250 sets of twelve randomly generated spatially clustered regions, fifty sets of twelve optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 2 regions, compared to the ICM value of the original NUTS 2 regions.

Figure 3.8: The median ICM values for 250 sets of forty randomly generated regions, 250 sets of forty randomly generated spatially clustered regions, fifty sets of forty optimised randomly generated spatially clustered regions and fifty sets of optimised NUTS 3 regions, compared to the ICM value of the original NUTS 3 regions.

Figure 3.9: The median ICM values for 250 sets of seventy randomly generated regions, 250 sets of seventy randomly generated spatially clustered regions, fifty sets of seventy optimised randomly generated spatially clustered regions and fifty sets of optimised literature regions, compared to the ICM value of the original regions specified through literature.

3.3 Sensitivity to parameter changes in the optimisation algorithm

Within the optimisation algorithm one parameter is used, which is arbitrarily chosen: the distance parameter. In the design of this algorithm, the decision was made that every municipality could only be relocated to a region that was located within a twenty kilometre distance of that municipality. It can be argued that this distance cut-off makes sense, because the largest distance between a municipality and its nearest neighbour is larger than ten kilometres but lower than twenty kilometres. It would however be even better to test the differences in the outcomes of the optimisation algorithm when different distance cut-offs are used. We therefore test whether the difference between the ICM value distributions of the optimised randomly spatially clustered regions and the ICM value distributions of the optimised predefined regions differs significantly when this distance cut-off is changed to either ten or thirty kilometres.

Figure 3.10: Differences between the mean ICM values of the optimised randomly spatially clustered regions and the mean ICM values of the optimised predefined regions for each predefined set of regions, using three different distance cut-offs in the optimisation algorithm. For each of the parameter configurations, thirty different optimised regions were generated.

As the cut-off distance in the optimisation algorithm changes, we see in Figure 3.10 that this has different effects on the differences between the mean ICM values of the optimised randomly spatially clustered regions and the mean ICM values of the optimised predefined regions for each predefined set of regions. For the NUTS 3 regions, we see that the difference between the mean ICM values is large and the optimised predefined regions have larger mean ICM values, regardless of the applied cut-off distance. In case of the NUTS 2 and literature defined regions, we see that the choice of a different cut-off distance has an effect on the conclusions we could draw when the cut-off distance was twenty kilometres. When the cut-off distance is set to ten kilometres, the optimised NUTS 2 regions have a lower mean ICM value than the optimised randomly spatially clustered regions. Perhaps the distance is too small to actually allow all municipalities to move. On the other hand, an increase of the cut-off distance to thirty kilometres causes the mean ICM value of the optimised literature defined regions to become similar to the mean ICM value of the optimised randomly spatially clustered regions. This might be explained by the fact that the literature defined regions are very small, and a large cut-off distance could cause cases in which not all municipalities in a region are located in one cluster.

As seen in Figure 3.11, the median ICM value of the optimised predefined sets of regions is generally lower than the median ICM value of the optimised randomly spatially clustered regions. As the cut-off distance increased, the difference between the optimised literature regions and the randomly generated spatially clustered regions became worse. Considering that the median ICM value of the literature regions was not better when we started the algorithm, this can likely only be explained by the fact that most "optimal changes" for the literature regions are located within a ten kilometre radius, while this is not the case for the randomly generated spatially clustered regions. Even though we cannot reach a definite conclusion or interpretation of the NUTS 3 data, we see the same pattern emerge in the set of NUTS 2 regions.

Figure 3.11: Differences between the median ICM values of the optimised randomly spatially clustered regions and the optimised predefined regions for each predefined set of regions, using three different distance cut-offs in the optimisation algorithm. For each of the parameter configurations, thirty different optimised regions were generated.

CHAPTER 4 Discussion

The initial expansion of the gravity model with three different sets of identity regions gave us three different equations that showed how the influence of these identity regions should be incorporated into the model. As shown in Equation 2.5, individuals were 1.46 times more likely to move towards a location if people in that location live in the same NUTS 2 region. The introduction of these NUTS 2 regions to the gravity model decreased the deviance of the model by 1.4%.

When the NUTS 3 and literature defined regions are added, a much larger effect is seen. As seen in Equations 2.6 and 2.7, people are respectively 3.48 and 4.65 times more likely to move to a certain location when that location lies in the same identity region. After the introduction of these regions to the gravity model, the deviance of the model decreased by 10.6% and 10.7% respectively.
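The reported deviance reductions can be reproduced from the test statistics given with Equations 2.3 and 2.5 to 2.7. The sketch below assumes that those reported values are the deviances being compared, and the names `BASIC`, `EXTENDED` and `relative_reduction` are hypothetical.

```python
# Values reported with Equation 2.3 (basic) and Equations 2.5-2.7 (extended)
BASIC = 1.4232e6
EXTENDED = {
    "NUTS 2": 1.4028e6,
    "NUTS 3": 1.2729e6,
    "literature": 1.2712e6,
}


def relative_reduction(basic, extended):
    """Fraction by which the extended model lowers the basic model's value."""
    return (basic - extended) / basic


# Percentage reductions, rounded to one decimal as in the text
reductions = {
    name: round(100 * relative_reduction(BASIC, value), 1)
    for name, value in EXTENDED.items()
}
```

Under that assumption the three percentages come out as 1.4, 10.6 and 10.7, matching the figures quoted in the text.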

It thus seems that the identity regions should be small to be able to actually contribute to better predictions. This idea is further supported by the mean ICM value comparison graphs shown in Figures 3.1, 3.3 and 3.5. Whereas the average ICM values of the NUTS 2 regions could not be distinguished from the randomly spatially clustered regions, the average ICM values of the smaller NUTS 3 and literature defined regions could. These conclusions stayed the same when the different parameters of the model were altered by 10%.

Even though the maximal ICM value of all three sets of optimised identity regions is higher than the maximal ICM value of their optimised randomly generated spatially clustered counterparts, we find that the distributions are sensitive to changes in the distance cut-off used in the optimisation algorithm. In Figure 3.10 we see that only the mean value of the optimised NUTS 3 regions is always higher. When the median values are compared, we also find that the NUTS 2 regions perform worst: the median ICM value is, regardless of the cut-off distance chosen, lower than the median ICM value of the optimised randomly generated spatially clustered regions.

Considering all the different tests and the increased explanatory value of the extended models, we accept H1. The specified identity regions do have a significant influence on the human migration behaviour. While this also applies to the used NUTS 2 regions, we would recommend using smaller sized regions whenever this is possible. When these smaller regions were applied, the models became better at predicting the migratory movements, and the effects of the prespecified regional identities became distinguishable from the randomly spatially clustered regions. We could therefore also accept H3: smaller regional identities can explain the anomalies in local migration behaviour better than the larger regional identities.

When the three different sets of identity regions were introduced to the gravity model, the γ parameter, which controls the way distance has influence on the number of migrants, changed. It decreased from 0.4760 in the original model to 0.3987 (NUTS 2 regions), 0.3373 (NUTS 3 regions) and 0.3493 (literature defined regions). This is not likely to have a very large effect on the average number of people that migrate over smaller distances, because part of these migration numbers are also increased by the ι parameter when both locations are located in the same identity region.

The change in the γ parameter is more important in longer distance migration: as this parameter is lowered, the number of migration movements over longer distances drops. Comparing the γ parameters of the basic gravity model (0.4760) and the gravity model extended with NUTS 3 identity regions (0.3373), we find that the model estimates for the number of migrants over a distance of 100 kilometres are seven times higher in the original model than in the model with the NUTS 3 identity regions. This means that we should accept H2, as the introduction of regional identities seems to have a great impact on the way that distance is handled in the model.

To be able to reach these conclusions, we had to make several decisions about the way the model, ICM value and identity regions are specified. In the following sections we will discuss those decisions, and analyse the quality of the used identity regions.

4.1 Specification of the model

The migration data could have been explained using either a gravity or a radiation model. Because it would be easier to understand the influence of identity on the way distance is incorporated in the model when this distance is explicitly included, we chose to use a gravity model. A radiation model would make this analysis more complicated than necessary.

In this gravity model the distances between two municipalities are approximated by taking the geographical distance between the centres of these municipalities. This is an assumption that works for the Netherlands, because the country is flat and has many roads and bridges, except perhaps for two municipalities that are located on opposite shores of the IJsselmeer. When travel times do not necessarily correlate with the geographical distance, it might be better to find the actual travel times between two locations instead, taking into account the location of the population centres within those municipalities as well.

The population parameters in the model were fitted using the intermunicipal and intramunicipal migration data that is available from Statistics Netherlands. In some cases no-one had migrated between two certain municipalities in a certain year. This data could not be excluded, as that would mean a lot of longer distance migration data could not be used in the regression. To solve this problem we have added two migrants to every migration flow, as this minimised the deviance of the original model.

4.2 Specification of the ICM value

The introduction of a set of identity regions to a migration model does not only create an ι parameter for the model, but also has an influence on the other existing parameters. Because the other parameters are not the same for each set of identity regions, the ι parameters cannot be compared directly. We have introduced the ICM value to solve this problem. For each municipality, this value represents the difference between the unexplained variance of the intraregional and interregional migration after applying the basic model.

While this ICM value makes the different distributions and configurations comparable, it can be confusing that the ICM value does not directly match the influence of identity on migration itself. Only the ι parameter in the extended model can be used to determine that influence.

4.2.1 Other influences that can contribute to the ICM value

Even though the ICM value was designed to be able to compare the effects of the identity regions, it could be that other factors also contribute to the differences in the remaining variance. These factors would attract relatively more migrants on a regional scale than on an interregional scale.

Pull factors that have an equal effect on people living within the identity region and outside of the identity region should have no effect on the ICM value. We would expect that most amenities, such as industrial areas, beautiful nature, and housing, have such equal effects.

Most of the influences that increase the ICM value will be part of the regional identity itself. An important factor in migration that might also increase the ICM values of municipalities is the economic motivation behind migration. When the industry in a region is highly specialised, we might find that people are more likely to move within their region than outside the region, because they want to keep working in the same industry.

The presence of industries could also have the opposite effect: when employment possibilities in a certain identity region are very low, people could decide to migrate towards another region where they can get a job. In such cases a lot of people can move out of the identity region, thus decreasing the ICM values. A similar situation occurs when there are few or no universities available in a region. As education possibilities in the identity region are very low, people could decide to move somewhere they can get the education they want.

When people are forced to migrate, this can have a negative effect on the ICM values of regions. Good examples of forced migration are hard to come by, as it does not happen very often. But when hundreds of newly registered asylum seekers are relocated to another immigration centre, or factories are relocated and employees have to move, the effects will be hard to ignore.

Even though the ICM value is not only influenced by the effects of identity itself, we expect that the positive and negative effects created by other factors are very limited and not very prevalent. We therefore do not expect these factors to have a high influence on the average ICM value of all regions.

4.3 Identity regions

In this research we used three different sets of identity regions. Two of these sets are specified and used by the government to analyse the developments of regions, and one set was specified through literature research.

There are clear benefits in using hard-bounded identity regions to incorporate these identities in a model. The regions can easily be specified through extensive literature research, and can be applied almost everywhere. After discussing the quality of the used identity regions, we will also discuss the challenges in using these identity regions, and suggest other approaches that can be used to include regional identity in the model. The main problem with these approaches is that they often require more data to be applied: data that is not always present.

4.3.1 Optimisation technique

In the optimisation technique that is used to find better configurations of identity regions, we made the decision to only introduce changes that increase the ICM value. Without this decision, municipalities could be added to the wrong regions as well. This would in turn make it more difficult to optimise the locations of other municipalities.

This decision comes at a risk: there is a chance that another subset of all possible combinations of regions is not searched, even though the optimal combination of regions is located within that subset. This would mean that the optimal configuration could not necessarily be reached when the prespecified regions are used as a starting point. To be able to reach that optimum, sets of randomly generated spatially clustered regions would have to be used.
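The accept-only-improvements rule discussed above can be sketched as a generic greedy local search. This is a minimal illustration and not the algorithm used in this research: the function `greedy_reassign`, the `score` callback (a stand-in for the average ICM value) and the toy data are all hypothetical, and any spatial constraints on which municipalities may move are left out.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Assignment = Dict[Hashable, Hashable]  # municipality -> region


def greedy_reassign(
    assignment: Assignment,
    regions: Sequence[Hashable],
    score: Callable[[Assignment], float],
    max_rounds: int = 100,
) -> Tuple[Assignment, float]:
    """Move one municipality at a time, keeping only moves that raise the score."""
    current = dict(assignment)  # do not mutate the caller's starting point
    best = score(current)
    for _ in range(max_rounds):
        improved = False
        for unit in list(current):
            kept = current[unit]
            for region in regions:
                if region == kept:
                    continue
                current[unit] = region
                candidate = score(current)
                if candidate > best:  # strict improvement only
                    best, kept, improved = candidate, region, True
            current[unit] = kept  # restore the best accepted region
        if not improved:
            break  # local optimum for this starting point
    return current, best


# Toy score: number of municipalities placed in a hidden "preferred" region.
preferred = {"a": 0, "b": 1, "c": 1}
start = {"a": 0, "b": 0, "c": 1}
result, value = greedy_reassign(
    start, [0, 1], lambda asg: sum(asg[u] == preferred[u] for u in asg)
)
print(result, value)
```

Because rejected moves are never kept, the outcome depends on the starting configuration, which is the risk described in the paragraph above.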

On the other hand, Figures 3.3 and 3.5 show that the average ICM values of the optimised prespecified identity regions are generally higher than those of the optimised randomly generated spatially clustered regions. In case of the literature regions these distributions did not even overlap. This would imply that the ICM values of these optimised prespecified regions could be very near the globally optimal average ICM value, and that the prespecified regions are a better starting position to find an optimal configuration than a set of randomly generated spatially clustered regions.

In Figure 3.3 we also notice something strange: the optimisation algorithm actually decreased the average ICM value of the NUTS 3 regions. Because we use a local optimisation strategy to increase the average ICM value, the optimisation algorithm increased the median ICM value instead, as shown in Figure 3.8. While this shows that the algorithm does work on a local scale, the algorithm might be improved further to work better on a global scale.

4.3.2 Quality

It is very difficult to determine the exact quality of the used predefined regions. Quality can after all mean different things in different contexts. It could be more useful to describe the quality of these regions instead.

The lack of quality of the predefined regions lies mostly within the fact that several municipalities can be relocated in such a way that their negative ICM values become positive. It seems that the allocations of these municipalities are obvious flaws. Apart from that, other municipalities that do have positive ICM values can be relocated to increase the average ICM value even further. Comparisons between the predefined regions and optimised regions are shown in Figures 3.1, 3.3 and 3.5.

Whether the fact that the predefined regions can still be optimised is a lack of quality could be a topic of debate. After all, all the predefined regions have higher average ICM values than the randomly generated spatially clustered regions. It could therefore be argued that these sets of predefined regions are defined in a quite unique way, as most configurations with similar regions have a much lower ICM value.

The fact that the average ICM value of the optimised randomly generated spatially clustered regions could in some cases become higher than the original average ICM value of the corresponding set of regions could matter. It would mean that it would be very easy to define regions that perform better than the regions created beforehand. This is the case with the NUTS 2 regions as shown in Figures ?? and 3.10. This supports the earlier presented evidence that the NUTS 2 regions might not be the ideal regions to use when incorporating regional identities in a human migration model.

4.3.3 Challenges in the usage of identity regions

The most important challenge in the usage of these identity regions is that in practice, identity regions are usually not hard bounded. Some municipalities could be part of multiple identities, and in other municipalities there might be small minority groups of people that hold another identity. A simple approach to this problem would be to introduce fuzzy boundaries. In this case, municipalities that are located close to a certain region are assumed to house people that also have the same regional identity as people living in that region.
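One way the fuzzy boundaries suggested above could be expressed is sketched below. Everything in it is an assumption made for illustration: the exponential decay with distance, the `scale_km` parameter, and the min/max combination rule borrowed from fuzzy set theory. None of this was implemented or tested in this research.

```python
import math
from typing import Dict


def fuzzy_membership(distance_to_region_km: float, scale_km: float = 10.0) -> float:
    """Membership in [0, 1]: 1 inside the region, decaying with distance outside it."""
    if distance_to_region_km <= 0:
        return 1.0  # inside or on the boundary
    return math.exp(-distance_to_region_km / scale_km)


def shared_identity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Degree to which two municipalities share any regional identity.

    Uses the fuzzy intersection (min) per region and the union (max) over regions.
    """
    regions = set(a) | set(b)
    return max((min(a.get(r, 0.0), b.get(r, 0.0)) for r in regions), default=0.0)


# A municipality 5 km outside region "A" that lies fully inside region "B".
border_town = {"A": fuzzy_membership(5.0), "B": 1.0}
core_town = {"A": 1.0}
print(shared_identity(core_town, border_town))
```

With memberships restricted to 0 and 1, `shared_identity` reduces to the binary same-region indicator used in this research, so hard boundaries are a special case of this sketch.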

A more sophisticated approach to this problem would be to research the social connectivity between municipalities. By doing this, all different identities that make up a certain municipality could accurately be represented. This would create an identity network as shown in Figure 4.1. It could however be difficult to use this approach, because detailed data on every individual's social connections has to be acquired. Such data is often hard to acquire or not available at all. As a result, this way of incorporating regional identities is not applicable in most situations.

Figure 4.1: Three different approaches to defining identity regions: with hard boundaries, fuzzy boundaries, or by looking at the existing connections between two municipalities.

Another challenge with using a binary approach to regional identity arises when those regions are embedded into a model. The number of migrants is multiplied by a constant factor when two municipalities are located in the same region, but the strength of a regional identity could actually differ between identity regions. People could either value their common identity differently, or might hold multiple identities at once. Unfortunately these differences are difficult to estimate beforehand.
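The binary approach can be made concrete with a small sketch of a gravity-type prediction in which a single multiplicative factor is applied to same-region pairs. The default distance exponent (0.3373) and identity factor (3.48) are the values reported in the conclusions for the NUTS 3 regions; the constant `k` and the population exponents `alpha` and `beta` are placeholders, not fitted values, and the function name `predicted_flow` is hypothetical.

```python
def predicted_flow(
    pop_origin: float,
    pop_dest: float,
    distance_km: float,
    same_region: bool,
    k: float = 1.0,        # placeholder constant (assumption)
    alpha: float = 1.0,    # placeholder origin population exponent (assumption)
    beta: float = 1.0,     # placeholder destination population exponent (assumption)
    gamma: float = 0.3373,           # distance exponent reported for NUTS 3
    identity_factor: float = 3.48,   # same-region multiplier reported for NUTS 3
) -> float:
    """Gravity-type flow with one constant multiplier for same-region pairs."""
    base = k * pop_origin ** alpha * pop_dest ** beta / distance_km ** gamma
    return base * (identity_factor if same_region else 1.0)


inside = predicted_flow(50000, 30000, 20.0, same_region=True)
outside = predicted_flow(50000, 30000, 20.0, same_region=False)
print(inside / outside)
```

Letting the strength of identity differ per region, as discussed above, would amount to replacing the single `identity_factor` with a lookup keyed by region, at the cost of one extra parameter per region.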

Neither is it very clear how many identity regions should be used in a model. Finding out how many regions should actually be present in the data set can be a challenge as well. There will not always be one right answer: related regional identities could be combined or split, as evidenced by the relatively small differences in ICM values between the set of NUTS 3 regions and the regions defined through literature research. As long as the used regions are well researched, they can be used.

CHAPTER 5 Conclusions

All in all, we have extended a basic gravity model for human migration with the notion of identity regions. This model was fitted using all Dutch intramunicipal and intermunicipal migration data between 1996 and 2016. The three different sets of identity regions used in this research all had a significant effect on the model, and decreased the deviance. When the NUTS 3 and literature defined identity regions were added to the model, people seemed to be 3.48 and 4.65 times more likely, respectively, to move to a certain location when it was located within the same identity region.

By choosing to model the identity regions with strict boundaries, we created a way of incorporating identity regions that can easily be used in other models as well. For each model it should be possible to create a set of identity regions through literature research. This means that the influence of regional identity on human migration behaviour can no longer be neglected, and should be included in future human migration models.

We have also shown that it would be better to use smaller regions whenever that is possible. In this research the larger NUTS 2 regions could not decrease the deviance of the model very well (1.4% versus 10.6% and 10.7%), nor were the effects caused by these regions distinguishable from the effects caused by the same number of randomly generated spatially clustered regions.

When the identity regions were added to the gravity model, this had an effect on the way distance was handled. Instead of dividing the potential number of migrants by $\mathrm{distance}^{0.4760}$, the models using the NUTS 3 and literature defined identity regions divided the potential number of migrants by $\mathrm{distance}^{0.3373}$ and $\mathrm{distance}^{0.3493}$ respectively. This is a very interesting finding. Because a lot of municipalities that are located within proximity of one another are located within the same identity region, the predicted migration numbers between those municipalities are often increased by the identity factor in the equation. This means that on average, the total number of people that migrate over smaller distances will not differ much when these numbers are compared with the basic gravity model. This is not the case for longer distance migration. In these migration flows, the identity factor does not enlarge the calculated migration numbers. This means that the original model can predict up to seven times more migrants over one hundred kilometre distances than the extended models.

CHAPTER 6 Future work

Because regional identity seems to be an important factor in the migration decision, it would be interesting to expand this research to radiation models as well, or add in more variables that are already used in other models.

It could also be interesting to look at the influence of regional identity in different circumstances, as we only looked at the influence of regional identity on recent internal Dutch migration figures. It would be interesting to see what would happen in more segregated societies, or in societies where most people travel a lot.

Additionally, it could be interesting to look at the way the influence of identity has evolved over time. By looking at internal migration data over a hundred-year time period, changes could be detected that could then be mapped to changes that occurred in a certain society. This might help in understanding those regional identities a bit better.

Apart from these different applications, the current application could be further improved by using median or quantile values in the ICM function, instead of mean values. In this particular case, this was not possible because the number of data points was too small for certain regions, and there were too many zero-migration data points in the data set. When larger regions are used over larger time periods this additional research might increase the predictive value of the ICM value.

This does not mean that this research cannot be improved upon. The optimisation algorithm did not always fully work as intended, and could thus be improved. In the current situation, the average ICM value of the literature defined regions dropped when the algorithm was applied. Furthermore, it might be interesting to adjust the algorithm to allow parts of regions to move towards other regions, instead of single municipalities. By doing this, pairs of municipalities could be relocated more efficiently.

29 30 Bibliography

van der Aa, A.J. (1839). Aardrijkskundig woordenboek der Nederlanden: eerste deel, A. Gor- inchem: Jacobus Noorduyn. — (1840a). Aardrijkskundig woordenboek der Nederlanden: derde deel, C en D. Gorinchem: Ja- cobus Noorduyn. — (1840b). Aardrijkskundig woordenboek der Nederlanden: tweede deel, B. Gorinchem: Jacobus Noorduyn. — (1843). Aardrijkskundig woordenboek der Nederlanden: vierde deel, E-G. Gorinchem: Jacobus Noorduyn. — (1844). Aardrijkskundig woordenboek der Nederlanden: vijfde deel, H. Gorinchem: Jacobus Noorduyn. — (1845). Aardrijkskundig woordenboek der Nederlanden: zesde deel, I-K. Gorinchem: Jacobus Noorduyn. — (1846a). Aardrijkskundig woordenboek der Nederlanden: achtste deel, N. O. Gorinchem: Ja- cobus Noorduyn. — (1846b). Aardrijkskundig woordenboek der Nederlanden: negende deel. Gorinchem: Jacobus Noorduyn en zoon. — (1846c). Aardrijkskundig woordenboek der Nederlanden: zevende deel, L-M. Gorinchem: Ja- cobus Noorduyn. — (1847a). Aardrijkskundig woordenboek der Nederlanden: tiende deel, S. Gorinchem: Jacobus Noorduyn en zoon. — (1847b). Beschrijving van den Krimpenerwaard en Lopikerwaard. Schoonhoven: S. E. van Nooten. — (1848a). Aardrijkskundig woordenboek der Nederlanden: elfde deel. Gorinchem: Jacobus No- orduyn en zoon. — (1848b). Aardrijkskundig woordenboek der Nederlanden: twaalfde deel. Gorinchem: Jacobus Noorduyn en zoon. — (1851). Aardrijkskundig woordenboek der Nederlanden: dertiende en laatste deel, Z en aan- hangsel. Gorinchem: Jacobus Noorduyn. van Aitzema, L. (1664). Historie of Verhael van Saken van Staet en Oorlogh, In, ende omtrent de Vereenigde Nederlanden, Beginnende met het vervolch van ’t Jaer 1657 ende eyndigende met het eynde van ’t Jaer 1660. ’s-Gravenhage: Johan Vely. Anderson, J.E. (2011). “The Gravity Model”. eng. In: 3.1, pp. 133–160. issn: 1941-1383. Baardman, C. (1965). Het oude land van Brederode; Vianen, Lexmond, Ameide, Tienhoven, Meerkerk. ’s-Gravenhage: J.N. Voorhoeve. Bachiene, W.A. (1773). 
Beschryving der Vereenigde Nederlanden, De welke gevonden worden in het Werk van den Heer A. F. Busching, en uitmaakt het Vierde Deel van dat Werk. Amsterdam & Utrecht: Steven van Elsveldt en Holtrop, en Abraham van Paddenburg. — (1775). Beschryving der Vereenigde Nederlanden, De welke gevonden worden in het Werk van den Heer A. F. Busching, en uitmaakt het Vierde Deels Derde Stuk van dat Werk. Amsterdam & Utrecht: Steven van Elsveldt en Holtrop, en Abraham van Paddenburg. — (1777). Beschryving der Vereenigde Nederlanden, De welke gevonden worden in het Werk van den Heer A. F. Busching, en uitmaakt het Vierde Deels Vyfde Stuk van dat Werk. Amsterdam & Utrecht: De Wed. van Elsveldt en Holtrop, en Abraham van Paddenburg.

31 Backer, J.M.C. (1838). Iets over Gooiland. Amsterdam: Johannes Müller. Basisregistratie Kadaster (2016). Bestuurlijke Grenzen 2016. Accessed: 2018-06-23. url: https: / / www . pdok . nl / nl / producten / pdok - downloads / basis - registratie - kadaster / bestuurlijke-grenzen-historie. Bauer, Th. and K.F. Zimmermann (1997). “Network Migration of Ethnic Germans”. In: The International Migration Review 31.1, pp. 143–149. doi: 10.2307/2547262. Becker, G.S. (1975). Human Capital: A Theoretical and Empirical Analysis, with Special Reference to Education, Second Edition. NBER. van Bemmel, A. (1760). Beschryving der stad Amersfoort. Utrecht: Henrikus Spruyt. Berkvens, A.M.J.A., G.H.A. Venner, and G. Spijkerboer (1996). Het Gelderse land- en stadsrecht van het Overkwartier van Roermond 1620. Arnhem: Stichting tot Uitgaaf der Bronnen van het Oud-Vaderlandse Recht. isbn: 978-90-8005-124-9. Blauw, P.W. and C Pastor (1980). Terug naar de stad : een onderzoek naar de retour migratie van gezinnen met kinderen naar de grote stad. Vol. 3c. Rijksuniversiteit Utrecht, Geografisch Instituut. Bodvarsson, Ö.B. and H. van den Berg (2013). “The Determinants of International Migration: Theory”. In: The Economics of Immigration: Theory and Policy. New York: Springer Sci- ence+Business Media. Chap. 2, pp. 27–57. doi: 10.1007/978-1-4514-2116-0_2. Boekestijn, C. (1961). “Binding aan een Streek: Een Empirisch Onderzoek naar de Migratie en de Animo voor Migratie uit de Provincie Zeeland onder Jongere Arbeiders in Walcheren en Zuid-Beveland”. PhD thesis. Bouhuijs, I. and R. Meijer (2017). Adreskwaliteit 2016. ’s-Gravenhage: Centraal Bureau voor de Statistiek. Bouman, P.J. and W.H. Bouman (1967). De groei van de grote werkstad: Een studie over de bevolking van Rotterdam. Assen: Van Gorcum & Comp N.V. Busching, A.F. (1773). Nieuwe Geographie of Aardrijksbeschryving. Vierde Deels Eerste Stuk. Amsterdam en Utrecht: Steven van Esveldt en Abraham van Paddenburg. Castaing Gachassin, M. (2013). 
“Should I Stay or Should I Go? The Role of Roads in Migration Decisions”. In: Journal of African Economies 22.5, pp. 796–826. issn: 0963-8024. Centraal Bureau voor de Statistiek (2005a). Verhuisde personen tussen gemeenten, 1996. Ac- cessed: 2018-05-24. url: https : / / opendata . cbs . nl / statline / # / CBS / nl / dataset / 70928NED/table. — (2005b). Verhuisde personen tussen gemeenten, 1997. Accessed: 2018-05-24. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/37159/table. — (2005c). Verhuisde personen tussen gemeenten, 1998. Accessed: 2018-05-24. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/37457/table. — (2005d). Verhuisde personen tussen gemeenten, 1999. Accessed: 2018-05-24. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/37408/table. — (2005e). Verhuisde personen tussen gemeenten, 2000. Accessed: 2018-05-24. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/37533/table. — (2005f). Verhuisde personen tussen gemeenten, 2001. Accessed: 2018-05-24. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/70016NED/table. — (2005g). Verhuisde personen tussen gemeenten, 2002. Accessed: 2018-05-24. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/70653NED/table. — (2005h). Verhuisde personen tussen gemeenten, 2003. Accessed: 2018-05-23. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/70800NED/table. — (2005i). Verhuisde personen tussen gemeenten, 2004. Accessed: 2018-05-23. url: https : //opendata.cbs.nl/statline/#/CBS/nl/dataset/70948NED/table. — (2006). Verhuisde personen tussen gemeenten, 2005. Accessed: 2018-05-23. url: https:// opendata.cbs.nl/statline/#/CBS/nl/dataset/71213NED/table. — (2007). Verhuizing; van personen, tussen gemeenten, 2006. Accessed: 2018-05-23. url: https: //opendata.cbs.nl/statline/#/CBS/nl/dataset/71449NED/table. — (2008). Verhuisde personen tussen gemeenten, 2007. Accessed: 2018-05-23. url: https:// opendata.cbs.nl/statline/#/CBS/nl/dataset/71766NED/table. — (2009). 
Verhuisde personen tussen gemeenten, 2008. Accessed: 2018-05-24. url: https:// opendata.cbs.nl/statline/#/CBS/nl/dataset/80225ned/table.

32 — (2010). Verhuisde personen tussen gemeenten, 2009. Accessed: 2018-05-24. url: https:// opendata.cbs.nl/statline/%5C#/CBS/nl/dataset/80525ned/table. — (2011). Verhuisde personen tussen gemeenten, 2010. Accessed: 2018-05-23. url: https:// opendata.cbs.nl/statline/#/CBS/nl/dataset/81333ned/table. — (2015). Indeling van Nederland in 40 COROP-gebieden: Gemeentelijke indeling van Nederland op 1 januari 2015. — (2017). Tussen gemeenten verhuisde personen. Accessed: 2018-04-03. url: https://opendata. cbs.nl/statline/#/CBS/nl/dataset/81734NED/table. — (2018a). Bevolking op 1 januari en gemiddeld; geslacht, leeftijd en regio. Accessed: 2018-04-30. url: https://opendata.cbs.nl/#/CBS/nl/dataset/03759ned/table. — (2018b). Gemeentelijke indeling op 1 januari 1996. Accessed: 2018-06-18. url: https:// www.cbs.nl/nl-nl/onze-diensten/methoden/classificaties/overig/gemeentelijke- indelingen - per - jaar / indeling % 20per % 20jaar / gemeentelijke - indeling - op - 1 - januari-1996. — (2018c). Gemeentelijke indeling op 1 januari 2016. Accessed: 2018-06-18. url: https://www. cbs.nl/nl- nl/onze- diensten/methoden/classificaties/overig/gemeentelijke- indelingen - per - jaar / indeling % 20per % 20jaar / gemeentelijke - indeling - op - 1 - januari-2016. — (2018d). Verhuisde personen; binnen gemeenten, tussen gemeenten, regio. Accessed: 2018-06- 01. url: https://opendata.cbs.nl/statline/#/CBS/nl/dataset/60048ned/table. de Cock, J.K. (1980). Bijdrage tot de historische geografie van Kennemerland in de Middeleeuwen op fysisch-geografische grondslag. Arnhem: Gysbers & van Loon. isbn: 90-6235-033-X. Cummins, M. (2009). “Revisiting the Migration-Development Nexus: A Gravity Model Ap- proach”. In: url: https://econpapers.repec.org/paper/hdrpapers/hdrp-2009-44.htm. DaVanzo, J. (1978). “Does unemployment affect migration? - Evidence from micro data”. In: Review of Economics and Statistics 60. issn: 0034-6535. — (1983). 
“Repeat Migration in the United States: Who Moves Back and Who Moves On?” In: The Review of Economics and Statistics 65.4, pp. 552–59. De Aardbol. Magazijn van Hedendaagsche Land- en Volkenkunde. Derde deel: De Nederlanden (1841). Amsterdam: J. H. Laarman. Dekker, E. et al. (2000). Ach lieve tijd: West-Friesland. Zwolle: Waanders Uitgevers. isbn: 90- 400-1043-9. Doedens, A. and J. Houter (2015). Geschiedenis van de Wadden: de CANON van de Waddenei- landen. Zutphen: WalburgPers. isbn: 978-90-5730-429-3. Eiter, S. and K. Potthoff (2016). “Landscape changes in Norwegian mountains: Increased and decreased accessibility, and their driving forces”. In: Land Use Policy 54, pp. 235–245. issn: 0264-8377. doi: 10.1016/j.landusepol.2016.02.017. Ethington, P.J. (1997). “The intellectual construction of” Social Distance”: Toward a recovery of Georg Simmel’s social geometry”. In: Cybergeo: European journal of geography. doi: 10. 4000/cybergeo.227. Eurostat (2013). NUTS 2 regions in the Netherlands, 2010 and 2013. — (2018). Principles and Characteristics. Accessed: 2018-05-07. url: http://ec.europa.eu/ eurostat/web/nuts/principles-and-characteristics. Geist, Claudia and Patricia McManus (2012). “Different Reasons, Different Results: Implications of Migration by Gender and Family Status”. eng. In: Demography 49.1, pp. 197–217. issn: 0070-3370. doi: 10.1007/s13524-011-0074-8. George, G. and B. Rhodes (2017). “Is there a financial incentive to immigrate? Examining of the health worker salary gap between India and popular destination countries”. In: Human Resources for Health 15.1. issn: 1478-4491. doi: 10.1186/s12960-017-0249-5. Glick, Paul C. (1947). “The Family Cycle”. eng. In: American Sociological Review 12.2, pp. 164– 174. issn: 0003-1224. doi: 10.2307/2086982. van Goor, T.E. (1744). Beschryving der Stadt en Lande van Breda. ’s-Gravenhage: Jacobus van den Kieboom. Graham, D.J. (2007). “Agglomeration, Productivity and Transport Investment”. 
In: Journal of Transport Economics and Policy 41.3, pp. 317–343. issn: 00225258. doi: 10.2307/20054024.

33 Gramm, W.L. (1975). “Household Utility Maximization and the Working Wife”. In: The Amer- ican Economic Review 65.1, pp. 90–100. issn: 0002-8282. doi: 10.2307/1806398. Graves, P. E. (1979). “A life-cycle empirical analysis of migration and climate, by race”. eng. In: Journal of Urban Economics 6.2, pp. 135–147. issn: 0094-1190. doi: 10.1016/0377- 2217(92)90140-5. Graves, Ph.E. and P.D. Linneman (1979). “Household migration: Theoretical and empirical re- sults”. eng. In: Journal of Urban Economics 6.3, pp. 383–404. issn: 0094-1190. Greenwood, M.J. (1985). “Human migration: theory, models, and empirical studies”. In: Journal of Regional Science 25.4, pp. 521–44. issn: 0022-4146. doi: 10.1111/j.1467-9787.1985. tb00321.x. Grigg, D.B. (1977). “E. G. Ravenstein and the ”laws of migration””. In: Journal of Historical Geography 3.1, pp. 41–54. Jonkheer de Haan Hettema, M. (1840). Oud en nieuw Friesland of aardrijkskundige beschrijving van die provincie. : H. C. Schetsberg. — (1851). Het graafschap Staveren en dat van Westergo en Oostergo. Utrecht: Kemink en Zn. Haartsen, A. (2003). Het Land van Woerden. Woerden: Stichting Groene Hart. isbn: 978-90- 8069-422-4. Harts, J.J. and L. Hingtsman (1986). Verhuizingen op een rij: Een analyse van individuele verhuisgeschiedenissen. Utrecht: Elinkwijk. Hipp et al. (2012). “Immigrants and Social Distance”. In: The ANNALS of the American Academy of Political and Social Science 641.1, pp. 192–219. issn: 0002-7162. Huizenga, J. (1985). De Langstraat op de drempel van de twintigste eeuw. Uitgeverij Hecht BV. isbn: 90-7038-410-8. Huurdeman, P. and R. Josselet (1980). Waterland door de eeuwen heen. Hoogheemraadschap Waterland. de Jong, T.T. (2006). Katholiek leven in Noord-Nederland 1956-2006: vijftig jaar bisdom Gronin- gen. Hilversum: Uitgeverij Verloren. isbn: 978-90-6550-901-7. Kang, C. et al. (2015). “A Generalized Radiation Model for Human Mobility: Spatial Scale, Searching Direction and Trip Constraint”. eng. In: PLoS ONE 10.11. 
issn: 1932-6203. doi: 10.1371/journal.pone.0143500. Karp, R.M. (1972a). “Reducibility among Combinatorial Problems”. In: Complexity of Computer Computations: Proceedings of a symposium on the Complexity of Computer Computations, held March 20–22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, and sponsored by the Office of Naval Research, Mathematics Program, IBM World Trade Corporation, and the IBM Research Mathematical Sciences Department. Ed. by R.E. Miller, J.W. Thatcher, and J.D. Bohlinger. Boston, MA: Springer US, pp. 85–103. isbn: 978-1-4684-2001-2. doi: 10.1007/978-1-4684-2001-2_9. — (1972b). “Reducibility among combinatorial problems”. In: Complexity of computer compu- tations. Springer, pp. 85–103. Kok, J. (1790). Vaderlandsch woordenboek. XIII. Deel. Amsteldam: Johannes Allart. Kok, J. (2004). “Choices and constraints in the migration of families: The central Netherlands, 1850-1940”. In: History of the Family 9.2, pp. 137–158. issn: 1081602X. doi: 10.1016/j. hisfam.2004.01.002. Kokhuis, G.J.L. (1992). De geschiedenis van Salland. Oldenzaal: Twents-Gelderse Uitgeverij Witkam-De Bruyn. isbn: 90-6693-040-3. Kolmogorov, Andrey (1933). “Sulla determinazione empirica di una lgge di distribuzione”. In: Inst. Ital. Attuari, Giorn. 4, pp. 83–91. Kooiman, N. et al. (2016). “PBL/CBS Regionale bevolkings- en huishoudensprognose 2016–2040: sterke regionale verschillen”. In: Lansing, J.B., E. Mueller, and N. Barth (1964). Residential location and urban mobility. Survey Research Center, Institute for Social Research, The University of Michigan. Lantink, F.W. and J. Temminck (2017). Heerlijkheden in Holland. Hilversum: Verloren. isbn: 978-90-8704-644-6. Lucassen, J. (2000). “In Search of Work”. In: IISG (International Institute of Social History) Research Papers, nº 39.

MacQueen, J. (1967). “Some methods for classification and analysis of multivariate observations”. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. Berkeley, Calif.: University of California Press, pp. 281–297.
Manders, J.H. (1981). Het land tussen Maas en Waal. Zutphen: Uitgeverij Terra. isbn: 90-6255-096-7.
Massey, D.S. (2015). Migration: Motivations. Second Edi. Vol. 15. Elsevier, pp. 452–456. isbn: 978-00-8097-086-8. doi: 10.1016/B978-0-08-097086-8.32090-6.
Mincer, J. (1978). “Family Migration Decisions”. eng. In: Journal of Political Economy 86.5, pp. 749–773. issn: 0022-3808.
Moch, L.P. (1992). Moving Europeans : migration in Western Europe since 1650. eng. Interdisciplinary studies in history. Bloomington, Indiana: Indiana University Press. isbn: 02-5333-859-X.
Nagurney, A., J. Pan, and L. Zhao (1992). “Human migration networks”. In: European Journal of Operational Research 59.2, pp. 262–274. issn: 0377-2217. doi: 10.1016/0377-2217(92)90140-5.
Nakosteen, R.A. and M. Zimmer (1980). “Migration and Income: The Question of Self-Selection”. In: Southern Economic Journal 46. issn: 0038-4038.
National Bureau of Statistics of China (2017). China Statistical Yearbook 2017. Accessed: 2018-06-18. url: http://www.stats.gov.cn/tjsj/ndsj/2017/indexeh.htm.
Nelder, J.A. and R.W.M. Wedderburn (1972). “Generalized Linear Models”. In: Journal of the Royal Statistical Society. Series A (General) 135.3, pp. 370–384. issn: 0035-9238. doi: 10.2307/2344614.
Nijhoff, I.A. (1846). Mededeelingen betreffende het voormalig ambtman, rigter- en dijkgraafschap van Over-Betuwe. Arnhem: Is. An. Nijhoff en zoon.
Obeng-Odoom, F. (2011). “Urbanity, urbanism, and urbanisation in Africa : editorial”. eng. In: 3.1, pp. 275–7. issn: 20421478.
van Ollefen, L. (1793). De Nederlandsche stad- en dorp-beschrijver, Deel I: ’t eiland van Dordrecht, de Hoeksche waard, de Zwijndrechtse waard, de Riederwaard, en ’t land van IJsselmonde. Amsterdam: H. A. Banse.
van Ollefen, L. (1795). De Nederlandsche stad- en dorp-beschrijver, Deel III: Amstelland, Weesperkerspel, Gooiland, de Loosdrecht enz. Amsterdam: H. A. Banse.
Peters, J.P.M. (1984). “De migratie naar Tilburg (1860-1870) en de Amerikaanse Secessieoorlog”. In: Noordbrabants Historisch Jaarboek 1984, pp. 143–177.
Phibbs, Ciaran S. and Harold S. Luft (1995). “Correlation of Travel Time on Roads versus Straight Line Distance”. In: Medical Care Research and Review 52.4, pp. 532–542. doi: 10.1177/107755879505200406.
Piovani, D. et al. (2018). “Measuring Accessibility using Gravity and Radiation Models”. In: url: https://arxiv.org/abs/1802.06421.
Poortman, J. (1943). Drente: Een handboek voor het kennen van het Drentsche leven in voorbije eeuwen. Meppel: J. A. Boom & Zn.
Ravenstein, E. G. (1876). The birthplaces of the people and the laws of migration. London.
Ravenstein, E. G. (1885). The laws of migration. London.
Ravenstein, E. G. (1889). The laws of migration. 2nd paper. London.
Ren, Y. et al. (2014). “Predicting commuter flows in spatial networks using a radiation model based on temporal ranges”. In: Nature Communications 5. issn: 2041-1723. doi: 10.1038/ncomms6347.
Robinson, C. and B. Dilkina (2017). “A Machine Learning Approach to Modeling Human Migration”. In: CoRR abs/1711.05462. doi: 10.1145/3209811.3209868.
Roth, D.L. (2007). Ene stille waerheyt van swaren dingen: historische opstellen betreffende de Zeeuwse geschiedenis en haar Hollandse en Vlaamse context (1245-1305). Delft: Eburon. isbn: 978-90-5972-153-1.
Ryan, Louise (2011). “Migrants’ social networks and weak ties: accessing resources and constructing relationships post-migration”. In: The Sociological Review 59.4, pp. 707–724. doi: 10.1111/j.1467-954X.2011.02030.x.
Sandell, S.H. (1977). “Women and the economics of family migration”. In: The review of economics and statistics 59.4, pp. 406–414. issn: 0034-6535.

Sanders, J.G.M, W.A. van Ham, and J. Vriens (1996). Noord-Brabant tijdens de Republiek der Verenigde Nederlanden, 1572-1795. Hilversum: Uitgeverij Verloren. isbn: 90-6550-532-6.
van Schevichaven, J. (1846). Geschiedkundige plaatsbeschrijving van het Rijk van Nijmegen. Nijmegen: J. F. Thieme.
Scholten, J.A. (1850). Statistieke opgave en beschrijving van den Alblasserwaard en de Vijf Heeren Landen. Rotterdam: Van Baalen.
Simini, F. et al. (2012). “A universal model for mobility and migration patterns”. In: Nature 484.7392. issn: 0028-0836. doi: 10.1038/nature10856.
Sjaastad, Larry A. (1962). “The Costs and Returns of Human Migration”. eng. In: Journal of Political Economy 70.5, Part 2, pp. 80–93. issn: 0022-3808. doi: 10.1086/258726.
Baron Sloet, L.A.J.W. (1852-1855). Bijdragen tot de kennis van Gelderland. Arnhem: Is. An. Nijhoff en zoon.
Smith, A. (1776). An inquiry into the nature and causes of the wealth of nations. London: printed for W. Strahan; and T. Cadell.
Staten-Generaal (1971). Wet tot instelling van een gemeente Dronten. Accessed: 2018-06-29. url: http://wetten.overheid.nl/BWBR0002754/1985-07-05.
Staten-Generaal (1979). Wet instelling gemeente Lelystad. Accessed: 2018-06-29. url: http://wetten.overheid.nl/BWBR0003250/1996-01-01.
Staten-Generaal (1983). Wet tot instelling gemeenten Almere en Zeewolde. Accessed: 2018-06-29. url: http://wetten.overheid.nl/BWBR0003603/1996-01-01.
Steigenga-Kouwe, S.E. (1948). Zeeuws-Vlaanderen. Leiden: E. J. Brill.
Stuart, M. (1820). Jaarboeken van het Koningrijk der Nederlanden. Amsterdam: E. Maaskamp.
Tamis, F. (2007). : Verkenning van een oud landschap in Oost-Groningen. Leeuwarden: Uitgeverij Noordboek. isbn: 978-90-330-0553-4.
Tegenwoordige staat der Vereenigde Nederlanden. Derde deel. Vervattende de Beschryving der Provincie Gelderland (1777). Amsterdam: Isaak Tirion.
Tiebout, Ch.M. (1956). “A pure theory of local expenditures”. In: Journal of political economy 64.5, pp. 416–424.
Ubaghs, G.C. (1858). Korte schets der geschiedenis van het Land van Valkenburg. Leuven: Vanlinthout en Cie.
Vis, D. (1948). De Zaanstreek: een beschrijving van het Zaansch volksleven in zijn historische ontwikkeling. Burgersdijk & Niermans.
Voerman, Jan Franciscus (2001). “Het migratiepatroon rond Hoogezand-Sappemeer, en ”. In: Verstedelijking en migratie in het Oost-Groningse veengebied 1800-1940, pp. 336–391.
Voerman, J.F. (2001). Verstedelijking en migratie in het Oost-Groningse veengebied 1800-1940. Assen: Koninklijke van Gorcum.
Vree, Jasper and Dick Kuiper (2007). “De eenwording van protestants-christelijk Nederland per rail 1839-1939”. In: Het liep op rolletjes. Ed. by Dick Kuiper and Jasper Vree. Zoetermeer: Meinema. Chap. 2, pp. 9–27. isbn: 978-90-2114-167-1.
Vriend, E. (2012). Het nieuwe land. Amsterdam: Balans. isbn: 978-94-6003-605-7.
Wagenaar, J. (1740). Hedendaagsche historie of tegenwoordige staat van alle volkeren; XIIde deel; Vervolgende de Beschryving der Vereenigde Nederlanden, en vervattende byzonderlyk die der Generaliteits Landen, Staats Brabant, Staats Land van Overmaaze, Staats Vlaanderen en Staats Opper-Gelderland met den Staat der Bezetting in de Barriere-Plaatsen enz. Amsterdam: Isaak Tirion.
Weber, A.F. (1899). The growth of cities in the nineteenth century : a study in statistics. eng. Studies in history, economics and public law 11. New York: Macmillan.
van Wijk Roelandszoon, J. (1842). Algemeen aardrijkskundig woordenboek. Vierde stuk. P-Z. Amsterdam: C. L. Schreijer en zoon.
Witkamp, P.H. (1877). Aardrijkskundig woordenboek van Nederland. Tiel: D. Mijs.
Yang, Y. et al. (2014). “Limits of Predictability in Commuting Flows in the Absence of Data for Calibration”. eng. In: Scientific Reports 4. issn: 2045-2322. doi: 10.1038/srep05662.
Yezer, A.M.J. and L. Thurston (1976). “Migration Patterns and Income Change: Implications for the Human Capital Approach to Migration”. In: Southern Economic Journal 42.4, pp. 693–702. issn: 0038-4038. doi: 10.2307/1056262.

Zipf, G.K. (1946). “The P1 P2 D Hypothesis: On the Intercity Movement of Persons”. In: American Sociological Review 11.6, pp. 677–686. issn: 0003-1224. doi: 10.2307/2087063.

Appendices


APPENDIX A Influences on migration decisions

The quality and depth of migration research depend on the accuracy with which human relocations are registered. This principle is reflected in the development of migration theory. The earliest works on migration were mostly based on observations by the author or on available statistical snapshots. As time series data became available, researchers started to develop more detailed theories, and migration models started to appear. More recently, even more detailed migration information has become available for countries with a good administrative system: micro data at the individual or household level. As a result, more recent migration theories focus on the differences between individuals and individual households.

To understand the findings that have been made on migration so far, we review the different factors that are often described as important in human migration. This approach allows us to review the developments made in migration theory without becoming too repetitive or confusing.

A.1 Considerations in migration decisions

There are many different factors that play a role in the migration of an individual: the change in economic benefits, the change in the availability of amenities, any personal social connections an individual might already have somewhere, the change in social distance to other people, and the travel distance and information distance between both locations. Other factors are more personal, such as the social connections one has and what someone’s household looks like. All these factors, along with the policies that are in place and the disasters that might occur, determine the human migration movements that take place. The way in which these factors are weighed when deciding where to migrate is still debated. What has become clear, however, is that this consideration is a short-term decision rather than a long-term strategy (J. Kok, 2004).

A.2 Economic benefits

According to some of the earliest researchers on migration, like Smith and Ravenstein, the major migration motives were economic (Smith, 1776; Ravenstein, 1885; Ravenstein, 1889). Even with more knowledge on the importance of other factors, economic causes are still named today as an important migration motive by both researchers and migrants themselves (Blauw and Pastor, 1980; George and Rhodes, 2017). Thoughts about the ways in which the economy causes migration potential have evolved over time into different theories.

Classic economic theorists first argued that market disequilibrium caused migration (Greenwood, 1985). Smith (1776) found that the spatial dispersion of commodities is smaller than that of wages, which means that an individual’s purchasing power would rise upon migrating. In classic economic theory it is assumed that each individual is perfectly mobile, has no significant migration costs and always strives for optimisation of purchasing power. While the first two assumptions have been refuted, this last assumption is substantiated with data and still at the very heart of migration theory (Geist and McManus, 2012; Massey, 2015; George and Rhodes, 2017).

Smith (1776) already suspected that migrants can experience difficulties in relocating, but did not incorporate these suspicions in his theory. More recent theories have incorporated them by adding a migration cost (Sjaastad, 1962; Nagurney, Pan, and Zhao, 1992). This cost can reflect not only the economic costs of migration, but also the psychological costs. Theories that include such a migration cost are often human capital theories, in which the migration decision becomes an investment decision into one’s future human capital (Becker, 1975).
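
The investment framing can be written as a net-present-value condition. The following is a minimal sketch of that common textbook formulation, not an equation taken from the cited works. All symbols are notation assumed here for illustration: $B_t^{d}$ and $B_t^{o}$ are the expected benefits at the destination and the origin in year $t$, $r$ is a discount rate, $T$ is the remaining time horizon, and $C$ is the total (economic plus psychological) migration cost.

```latex
\sum_{t=1}^{T} \frac{B_t^{d} - B_t^{o}}{(1+r)^{t}} - C > 0
```

Under this reading, a higher migration cost or a shorter time horizon makes migration less attractive, all else being equal.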

Even though the idea that individuals always strive for optimisation of their purchasing power is still important today, minor adjustments have been made to this theory. It seems that this economic optimisation is mainly an optimisation of economic security. Various researchers have found that families and individuals who are unemployed are more likely to move (Lansing, Mueller, and Barth, 1964; DaVanzo, 1978; Peters, 1984), as are people who do not earn enough to keep their family alive (P. Bouman and W. Bouman, 1967). When the population growth in an area does not keep pace with its economic development, people tend to move away (Peters, 1984; Lucassen, 2000). This theory is further supported by J. Kok (2004), who shows that the urban middle classes and people who inherited farms had little reason to move: by taking over the family business they had already become economically secure.

This does not imply that economically motivated migration decisions are all relatively easy to predict. Some jobs simply require frequent relocation. Historically, railway employees and army personnel were often relocated, but as government agencies and large companies expanded, more people started to migrate because of their career (J. Kok, 2004; Lucassen, 2000; Moch, 1992). Precise information on such career paths is often hard to come by, which makes it hard to include these migrations in most models.

When industries evolve at a certain location, the demand for workers with certain types of knowledge changes as well. According to Peters (1984), employers have two options to meet this demand: either they attract workers from outside, or they train people already living at the location. As the latter option frees up the jobs these people used to work in, both options are likely to create an inflow of people. Since it is not always clear how jobs are filled and the demands of companies are often unknown, predicting what type of workers are needed is difficult.

A.3 Availability of amenities

Besides the availability of work, the availability of other resources can influence the migration decision as well. Resources that attract people to a certain location are called quality of life factors. These quality of life factors are usually desirable goods that are not universally available, commonly referred to as amenities.

Such amenities come in all shapes and sizes. Rivers are amenities, just as good parking facilities and a democratic society are (Bodvarsson and Berg, 2013; Greenwood, 1985). Amenities are often split into two categories: natural amenities and public amenities. Most natural amenities are difficult to change and shape, whereas humans do have an influence on the public amenities in an area. What applies to all amenities is that each is some sort of non-traded good or idea that is difficult to change or relocate. Whilst one can adapt to a certain local situation by buying tradable goods, one would have to move to fulfil one’s needs for certain non-traded goods (P. Graves and Linneman, 1979).

Natural amenities of a location can range from its topological benefits to its climatological or environmental benefits. When a location is placed alongside a shore or river, it has topological benefits: goods become easier to transport. Living near a river also guarantees a constant supply of fresh water, a climatological benefit people who live in the desert would not have. These topological and climatological benefits aside, Greenwood (1985) also argues that the environment near the shores would attract people, whereas a heavily polluted area would not.

There is an even larger range of public amenities available, for these amenities include both the available public goods and the way a society is shaped (Greenwood, 1985). Public goods are, for example, public services such as education, police, and medical care, but also beaches, parks, and the availability of entertainment and housing. When talking about the structure of a society, persecution risks, democracy and cultural acceptance are important (Bodvarsson and Berg, 2013). According to Tiebout (1956), a good availability of such public amenities can also contribute to a better running economy, as people will be able to perform their jobs more efficiently. This could in turn attract even more people. This idea has become known as the Tiebout Hypothesis.

The demand for certain amenities is different for different populations. For example, P.E. Graves (1979) found that natural amenity variables are important in explaining age- and race-specific net population migration figures. A correlation between age and the need for public amenities might also exist: when children are born, good education opportunities become more important, and as the real wages in a family rise, people might want to dine in more expensive restaurants (Greenwood, 1985).

A.4 Travel distance

The geographical distance between two locations has always been seen as an important factor in the migration decision. Literature suggests that people tend to move to locations that are relatively close-by (Ravenstein, 1885; Ravenstein, 1889; Grigg, 1977; Peters, 1984).

A previous study by Phibbs and Luft (1995) shows that the geographical distance and travel distance between two locations are heavily correlated. But this does not mean both distances are always similar. Ravenstein (1876) already argued that citizens of a certain area are less migratory when travel times are high and the area is more remote and secluded. The travel time between two locations can be larger because geographical features such as mountains or rivers slow traffic down, or because only poor road connections are available (Eiter and Potthoff, 2016).

The travel time between two locations can be decreased by building new infrastructure. When this is done, the number of people that migrate between two locations is likely to increase. This effective increase in urban density can then lead to more economic productivity (Graham, 2007; Castaing Gachassin, 2013). Additionally, individuals gain more freedom in optimising other needs, because they no longer have to live close to work.

A.5 Information distance

The travel distance between two locations is not the only distance that is important in migration. Another factor that is closely related to the travel distance is the information distance between two locations. When the information distance between two locations is large, it takes longer for information to be transmitted, and information is more likely to be inaccurate. This information distance can become smaller when the travel time between two locations decreases (Vree and Kuiper, 2007), or when faster communication channels are built. In each case more contact between residents of both locations takes place, which means that there are more opportunities for information to travel between both locations along one of those connections (P. Bouman and W. Bouman, 1967; J. Kok, 2004).

Contact between humans is essential in spreading information. When individuals or a group of people move somewhere, a flow of information back to the origin is established (Greenwood, 1985). Through this contact, people can migrate over longer distances. Once a migrant arrives, the people who migrated before can help this migrant in integrating into the economic, cultural and political systems present at the chosen destination (Bauer and Zimmermann, 1997; Massey, 2015). When better information is available, there is less risk involved in migrating.

Because every individual has different personal connections, the information distance between two locations can differ from person to person. Mincer (1978) showed that family ties are important in a person’s migration choice. Other authors such as Bauer and Zimmermann (1997) and J. Kok (2004) have also stressed the importance of migrant networks that link migrants, former migrants and non-migrants together through friendship and kinship. The phenomenon that people like to move to places where they already know other people is called chain migration (J. F. Voerman, 2001).

Once people have migrated to a certain municipality, they slowly assimilate to this new society and become less dependent on their original networks (Bauer and Zimmermann, 1997). This means that people who migrated longer ago tend to have fewer ties to their community of origin and transmit less information.

Apart from having different personal connections, individuals also have different abilities in acquiring information. Better educated individuals have a greater ability to collect and process information, which means they are able to make a better migration decision (Bauer and Zimmermann, 1997; J. F. Voerman, 2001). On the other hand, very young or less educated household heads are less informed about the opportunities they have, or misinterpret the available information. This results in failed movements, and families returning soon after their initial move (DaVanzo, 1983).

A reasonably large information distance between two locations can also create a lag in the migration decision of people (Greenwood, 1985). Because there is no fully accurate and up-to-date information available on the situation at a certain location, people have to make a migration decision based on old information, hence creating a lag. As a result, a repeat move is more likely to occur when the information distance between two locations is larger (Yezer and Thurston, 1976; DaVanzo, 1983): the decision made on old information did not turn out right.

A.6 Social distance

Every individual is different. A person lives in certain surroundings, holds a certain faith, practises a certain occupation and can have certain political beliefs. Based on all these different variables, a person forms an identity. In general, people want to live close to people that hold the same identity. The differences between two different identities can be described as their social distance (Ethington, 1997).

Once a person wants to migrate, the social distance between their own identity and the identities of the people living at a possible destination is important (Ravenstein, 1876; Ravenstein, 1885). As the social distance between two locations becomes larger, it becomes less likely for an individual to move between those locations (Ryan, 2011). A Catholic would not feel at home in a city where everybody is Protestant, and a farmer living in the countryside would not feel at home in a metropolis.

Nevertheless, urbanisation is a process happening in rural areas all over the world (Obeng-Odoom, 2011; National Bureau of Statistics of China, 2017; Kooiman et al., 2016). Although this seems to conflict with the concept of social distance, this is not necessarily the case. In most cases the social distance is bridged in several smaller migration steps. A person would first migrate from a farm towards a village, from that village towards a smaller city, and from that smaller city towards a larger city (Weber, 1899; J. Kok, 2004). Once enough people with a similar background live in a larger city, the social distance between that city and the countryside is lowered. As a result, it becomes easier to skip these intermediate migration steps and migrate to a larger city at once.

We could thus state that existing social ties are very important in determining where someone will migrate to. On the other hand, these social ties can also be used in determining who is not likely to migrate at all. When people have lived in the same community for their whole lives, and sometimes even for generations, strong ties with that community are developed. These strong ties can contribute to a feeling of dissimilarity to other communities, and make it more difficult to migrate (Peters, 1984). For example, Boekestijn (1961) discovered that people who did not participate in any local association were 10 times more likely to migrate. In other words: when there are fewer ties to a community, and one has migrated before, it becomes easier to migrate once again (J. Kok, 2004).

A.7 Household optimisation

At first, migration theory was focused on the optimisation of the human capital of individuals, rather than the optimisation of the human capital of households (Greenwood, 1985). This did not matter much, as households often relied on one income and one career. In today’s society this earlier model focused on individuals no longer seems to fit. Families in which both partners have a career have a lower probability of migrating, because the family is more tied to one place (Gramm, 1975; Sandell, 1977; Harts and Hingtsman, 1986). The same can be said about the ties that children create; when children start to go to school or work, they create ties and migration chances diminish (Mincer, 1978; Harts and Hingtsman, 1986; J. Kok, 2004).

The theory behind this household optimisation strategy is that the average human capital of the household members should increase for a migration to take place, because the household is the smallest decision unit (Mincer, 1978). In most cases household members will want to move together, even though several people within one household might have different careers, friends, or identities. It should however be noted that this assumption is not always correct. When there are problems in the local economy, individual household members might go and work in another area or country, sending remittances to their family at home (Massey, 2015).

A.8 Family-cycle considerations

It has long been observed that younger people are more migratory than older people (Ravenstein, 1876). Over time this apparent effect has been made more specific by other authors. According to several authors, individuals who had not yet married and individuals who had just started a family were the ones that were more likely to migrate (Peters, 1984; Boekestijn, 1961; P. Graves and Linneman, 1979; J. Kok, 2004). It is argued that these individuals migrate more often because changes in their life keep coming rapidly: they leave school, enter the job market, try to find a suitable spouse and establish a household (Greenwood, 1985; Lucassen, 2000).

As the family grows older, it becomes less likely to migrate (Sandell, 1977; Mincer, 1978; Nakosteen and Zimmer, 1980). As the family settles down, savings grow, and people tend to focus more and more on stability and long-term plans (Lucassen, 2000). As we have seen before, this settling can also be explained through the new ties that are formed by the children. It is not only the children who have little to gain from migrating: according to Bauer and Zimmermann (1997), older people are also expected to have a smaller lifetime gain in human capital from moving.

All these processes are part of the family cycle (Glick, 1947). The family cycle phase currently experienced by a household affects the importance of certain variables in the migration decision. This means that these family cycle considerations are not a single factor in migration, but rather a way of understanding the change in the importance of other factors over time (J. Kok, 2004).

A.9 Policies & disasters

Although it is difficult to draw general conclusions about the effects of policies and disasters on the migration decision, these effects should not be underestimated. Both can have a significant impact in only a short time frame. In the event of sudden political instability or the establishment of protectionist measures, the decision parameters mentioned earlier change quickly, which can in turn cause totally different migration flows to appear (Lucassen, 2000).

When more urgent events such as wars or natural disasters occur, people seem to react without taking the time to gather information and fully optimise the decision on the initial migration destination. This means that the exact impact of such an event on the migration network is difficult to predict. In the long run such a major event can affect the economic benefits of living in a region and its resource availability, and create larger travel distances (Massey, 2015). This allows a short, sudden event to have a long-lasting impact on the migration network.

APPENDIX B Specification of identity regions

Instead of using identity areas specified by the government, it is also possible to construct more meaningful identity areas through extensive literature research. As human identity is of interest to many, books have been written about the culture in commonly accepted identity areas, which can then be used to find which places belong to a particular identity area. For every place, the municipality in which it is currently located was looked up, which eventually led to the creation of the table below. In some cases a municipality consists of parts of multiple regions. In these cases it was determined in which part of the municipality most of the population resided, after which the municipality was added to the region associated with this part.¹
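
The majority rule for municipalities that span several regions can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the table: the function name `assign_municipalities`, the tuple layout, and the place names and population figures in `example` are all hypothetical.

```python
from collections import defaultdict


def assign_municipalities(places):
    """Map each municipality to the region that holds most of its population.

    `places` is an iterable of (place, municipality, region, population)
    tuples. The layout and the population figures are hypothetical inputs.
    """
    # Sum the population per region within every municipality.
    totals = defaultdict(lambda: defaultdict(int))
    for _place, municipality, region, population in places:
        totals[municipality][region] += population
    # Pick, per municipality, the region with the largest population share.
    return {
        municipality: max(regions, key=regions.get)
        for municipality, regions in totals.items()
    }


# Hypothetical example: Municipality X lies partly in two regions.
example = [
    ("Place A", "Municipality X", "Region 1", 12000),
    ("Place B", "Municipality X", "Region 2", 3000),
    ("Place C", "Municipality Y", "Region 2", 8000),
]
assignment = assign_municipalities(example)
```

In this sketch a tie between two regions is resolved in favour of the region encountered first; the text does not state how ties were handled.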

Some municipalities are areas on their own and therefore difficult to classify in any other way. These municipalities are Dordrecht (van Ollefen, 1793; van der Aa, 1840a), Gouda (van der Aa, 1843), ’s-Gravenhage (van der Aa, 1843), and Steenwijkerland. To be able to calculate their ICM values, these municipalities are added to the nearest area.
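
Adding a stand-alone municipality to the nearest area could look like the sketch below. The text does not specify how "nearest" was measured, so the use of straight-line distance between centroids is an assumption, as are the function name `nearest_region` and the coordinates in the example.

```python
import math


def nearest_region(point, region_centroids):
    """Return the name of the region whose centroid is closest to `point`.

    Assumes planar (x, y) coordinates in a common unit, e.g. kilometres,
    and straight-line distance; both are assumptions made for illustration.
    """
    return min(
        region_centroids,
        key=lambda name: math.dist(point, region_centroids[name]),
    )


# Hypothetical centroids, not real coordinates of any Dutch region.
centroids = {"Region 1": (10.0, 20.0), "Region 2": (40.0, 25.0)}
chosen = nearest_region((15.0, 22.0), centroids)
```

A distance between borders rather than centroids, or a travel-time measure, would be an equally plausible choice and could give a different assignment.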

Table B.1: Dutch municipalities that existed in 2016, split into 70 different identity regions.

Area name Municipalities Further reading

Alblasserwaard Alblasserdam, Giessenlanden, (van der Aa, 1839; Gorinchem, Scholten, 1850) Hardinxveld-Giessendam, Molenwaard, Papendrecht, Sliedrecht, Zederik Amstelland Amstelveen, Amsterdam, (van Ollefen, 1795; Diemen, Ouder-Amstel, van der Aa, 1839) Uithoorn Baljuwschap Hillegom, Lisse, (van der Aa, Noordwijkerhout Noordwijkerhout, Teylingen 1846a; Lantink and Temminck, 2017)

1The historical boundaries within the province of do not always match with contemporary boundaries. To overcome this problem, some historical areas have been joined together.

IX Area name Municipalities Further reading

Baronie van Breda Alphen-Chaam, Baarle-Nassau, (van Goor, 1744; Breda, Dongen, Drimmelen, van der Aa, 1840b) Etten-Leur, Geertruidenberg2, Gilze en Rijen, Oosterhout, Zundert Betuwe Buren, Culemborg, (van der Aa, Geldermalsen, Lingewaal, 1840b; Witkamp, Neder-Betuwe, Neerijnen, Tiel 1877) Beveland Borsele, Goes, Kapelle, (De Aardbol. Noord-Beveland, Reimerswaal Magazijn van Hedendaagsche Land- en Volkenkunde. Derde deel: De Nederlanden 1841; van der Aa, 1840b) Bommelerwaard Maasdriel, Zaltbommel (van der Aa, 1840b; Witkamp, 1877) Delfland Delft, Maassluis, (Busching, 1773; Midden-Delfland, van der Aa, 1840a) Pijnacker-Nootdorp, Rijswijk, Vlaardingen, Westland Eemland Amersfoort, Baarn, Bunschoten, (van Bemmel, Eemnes, Leusden, Renswoude, 1760; van der Aa, Soest, Woudenberg 1843) Fivelingo Appingedam, Delfzijl, (van Aitzema, Loppersum, Slochteren, Ten 1664; van der Aa, Boer 1843) Flevopolder Almere, Dronten, Lelystad, (Staten-Generaal, Zeewolde 1971; Staten-Generaal, 1979; Staten-Generaal, 1983) Gooiland Blaricum, Gooise Meren, (van Ollefen, 1795; Hilversum, Huizen, Laren Backer, 1838; (Noord-Holland), Weesp3 van der Aa, 1843) Gorecht Groningen, Haren, (van der Aa, 1843; Hoogezand-Sappemeer J. Voerman, 2001)

2Both Drimmelen and Geertruidenberg did not originally belong to the Baronie van Breda and were part of South-Holland, but both municipalities are currently located in Noord-Brabant and often interact with the municipality of Oosterhout. 3Weesp is not technically part of the Gooiland, but it was a part of the bailiwick of Stad en Lande and could therefore be included in this area (van der Aa, 1847a).

X Area name Municipalities Further reading

Graafschap Aalten, Berkelland, (Tegenwoordige Zutphen4 Bronckhorst, Doetinchem, staat der Doesburg, Lochem, Vereenigde Montferland, Oost Gelre, Oude Nederlanden. IJsselstreek, Winterswijk, Derde deel. Zutphen Vervattende de Beschryving der Provincie Gelderland 1777; van der Aa, 1851) Heerlijkheid De Wolden, Hoogeveen, Meppel, (van der Aa, 1840a; Echten, Westerveld van der Aa, 1843; Heerlijkheid van der Aa, 1846b; Ruinen & Poortman, 1943) Dieverdingspel Hoeksche Waard Binnenmaas, Cromstrijen, (van der Aa, 1844; Korendijk, Oud-Beijerland, van Ollefen, 1793) Strijen Hunsingo Bedum, De Marne, Eemsmond, (van Aitzema, Winsum 1664; van der Aa, 1844) Kempen Bergeijk, Best, Bladel, Eersel, (Wagenaar, 1740; Eindhoven, Oirschot, Reusel-De van der Aa, 1845) Mierden, Valkenswaard, Veldhoven, Waalre Kennemerland Aalsmeer, Alkmaar, Bergen (van der Aa, (NH), Bloemendaal, Castricum, 1840b; van der Aa, Den Helder;Haarlem, 1845; de Cock, Haarlemmerliede en 1980) Spaarnwoude, Haarlemmermeer, Heemskerk, Heemstede, Heerhugowaard, Heiloo, Langedijk, Oostzaan, Schagen5, Uitgeest Krimpenerwaard Krimpenerwaard, Krimpen aan (van der Aa, 1845; den IJssel van der Aa, 1847b) Kwartier van Boxtel, Goirle, Haaren, (van der Aa, 1846a; Oisterwijk Hilvarenbeek, Loon op Zand, Sanders, Ham, and Oisterwijk, Sint-Michielsgestel, Vriens, 1996) Tilburg, Vught Land van Altena Aalburg, Werkendam, (van der Aa, 1839; Woudrichem Sanders, Ham, and Vriens, 1996) Land van Beverwijk, Velsen, Zandvoort (van der Aa, Brederode 1840b; Baardman, 1965)

4Known today as the “Achterhoek”. 5Den Helder and Schagen do not historically belong to the Kennemerland, but do have strong ties with Alkmaar (van der Aa, 1839).

XI Area name Municipalities Further reading

Land van Cuijk Boxmeer, Cuijk, Grave, Mill en (Bachiene, 1777; Sint Hubert, Sint Anthonis van der Aa, 1840a) Land van Eijsden-Margraten, (Wagenaar, 1740; ’s-Hertogenrade Gulpen-Wittem, Kerkrade, van der Aa, 1844) Landgraaf, Maastricht6 Simpelveld, Vaals Land van Maas en Druten, West Maas en Waal (van der Aa, 1846c; Waal Manders, 1981) Land van Boekel, Landerd, Uden (Bachiene, 1777; Ravenstein van der Aa, 1846b) Land van Beek (Limburg), Brunssum, (van der Aa, 1848a; Valkenburg Heerlen, Meerssen, Nuth, Ubaghs, 1858) Onderbanken, Schinnen, Sittard-Geleen, Stein (L.), Valkenburg aan de Geul, Voerendaal Land van Voorne Brielle, Goeree-Overflakkee, (van der Aa, en Putten Hellevoetsluis, Nissewaard, 1848a; Roth, 2007) Westvoorne Land van Woerden Bodegraven-Reeuwijk, (van der Aa, Wijdemeren, Woerden 1848b; Haartsen, 2003) Langstraat Heusden, Waalwijk (van der Aa, 1846c; Huizenga, 1985) Liemers Duiven, Montferland, (Stuart, 1820; Rijnwaarden, Westervoort, Baron Sloet, Zevenaar 1852-1855) Lopikerwaard Lopik, Montfoort, Oudewater, (van der Aa, 1846c; IJsselstein van der Aa, 1847b) Maasland Bernheze, ’s-Hertogenbosch, Oss (van der Aa, 1846c; Sanders, Ham, and Vriens, 1996) Markiezaat Bergen Bergen op Zoom, Halderberge, (van der Aa, op Zoom Moerdijk, Roosendaal, Rucphen, 1840b) Steenbergen, Woensdrecht Middel-Veluwe Apeldoorn, Epe, Voorst (van Wijk Roelandszoon, 1842; van der Aa, 1848a) Neder-Veluwe Barneveld, Ede, Nijkerk, (van Wijk Putten, Scherpenzeel Roelandszoon, 1842; van der Aa, 1848a)

6Even though Maastricht has not been a part of the Land van ’s-Hertogenrade, this is the best area to include Maastricht to, as the places located within this area and Maastricht shared the same church organisations (van der Aa, 1846c).

XII Area name Municipalities Further reading

Nederkwartier De Ronde Venen, Stichtse (Bachiene, 1775; Vecht, Utrecht7 van der Aa, 1846a) Noorden- en Aa en Hunze, Assen, (van der Aa, 1846c; Middenveld, Midden-Drenthe, , van der Aa, 1846a; Oostermoer en Tynaarlo van der Aa, 1846b; Rolderdingspel Poortman, 1943) Noordoostpolder Noordoostpolder, Urk (de Jong, 2006; Vriend, 2012) , Oldambt, , (van Aitzema, Veendam 1664; van der Aa, 1846a) Oostergo , , (van der Aa, Dongeradeel, Ferwerderadiel, 1846a; Jonkheer de Kollumerland en Haan Hettema, Nieuwkruisland, Leeuwarden, 1851) Leeuwarderadeel, , Opper-Gelre Beesel, Bergen (L), (van der Aa, 1843; Echt-Susteren, Gennep, Horst Berkvens, Venner, aan de Maas, Leudal, and Spijkerboer, Maasgouw, Mook en Middelaar, 1996) Nederweert, Peel en Maas, Roerdalen, Roermond, Venlo, Venray, Weert Overkwartier De Bilt, Bunnik, Houten, (Bachiene, 1775; Nieuwegein, Rhenen, Utrechtse van der Aa, 1846a) Heuvelrug, Veenendaal, Wijk bij Duurstede, Zeist Over-Betuwe Lingewaard, Overbetuwe (van der Aa, 1840b; Nijhoff, 1846) Over-Veluwe Elburg, Ermelo, Harderwijk, (van Wijk Hattem, Heerde, Nunspeet, Roelandszoon, Oldebroek 1842; van der Aa, 1848a) Peelland Asten, Cranendonck, Deurne, (van der Aa, Geldrop-Mierlo, Gemert-Bakel, 1846b; Sanders, Heeze-Leende, Helmond, Ham, and Vriens, Laarbeek, “Nuenen, Gerwen en 1996) Nederwetten”, Schijndel, Sint-Oedenrode, Someren, Son en Breugel, Veghel Purmer Edam-Volendam, Purmerend ( J. Kok, 1790; van der Aa, 1846b)

⁷ Utrecht was not a part of the Nederkwartier, yet the Nederkwartier was later incorporated in the arrondissement of Utrecht (van der Aa, 1848a).

Area name | Municipalities | Further reading

Rijk van Nijmegen | Beuningen, Berg en Dal, Heumen, Nijmegen, Wijchen | (van Schevichaven, 1846; van der Aa, 1846a)
Rijnland | Alphen aan den Rijn, Kaag en Braassem, Katwijk, Leiden, Leiderdorp, Leidschendam-Voorburg, Nieuwkoop, Noordwijk, Oegstgeest, Voorschoten, Waddinxveen, Wassenaar, Zoetermeer, Zoeterwoude | (Busching, 1773; van der Aa, 1840b; van der Aa, 1846b)
Salland | Dalfsen, Deventer, Hardenberg, Hellendoorn, Kampen, Olst-Wijhe, Ommen, Raalte, Staphorst, Zwartewaterland, Zwolle | (van der Aa, 1851; Kokhuis, 1992)
Schieland | Capelle aan den IJssel, Lansingerland, Rotterdam, Schiedam, Zuidplas | (Busching, 1773; van der Aa, 1847a)
Stellingwerven | , | (van der Aa, 1847a; Witkamp, 1877)
Twente | Almelo, Borne, Dinkelland, Enschede, Haaksbergen, Hellendoorn, Hengelo (Overijssel), Hof van Twente, Losser, Oldenzaal, Rijssen-Holten, Tubbergen, Twenterand, Wierden | (van der Aa, 1848a; Witkamp, 1877)
Veluwezoom | Arnhem, Brummen, Renkum, Rheden, Rozendaal, Wageningen | (Tegenwoordige staat der Vereenigde Nederlanden. Derde deel. Vervattende de Beschryving der Provincie Gelderland 1777; van der Aa, 1848a)
Vijfherenlanden | Leerdam, Vianen | (van der Aa, 1848a; Scholten, 1850)
Wadden | , Schiermonnikoog, , Texel, Vlieland | (van der Aa, 1848b; Doedens and Houter, 2015)
Walcheren | Middelburg, Veere, Vlissingen | (van der Aa, 1848b; Roth, 2007)
Waterland | Beemster, Landsmeer, Waterland | (van der Aa, 1848b; Huurdeman and Josselet, 1980)

Area name | Municipalities | Further reading

West-Friesland | Drechterland, Enkhuizen, Hollands Kroon, Hoorn, Koggenland, Medemblik, Opmeer, Stede Broec | (van der Aa, 1848b; Dekker et al., 2000)
Westergo | het Bildt, Franekeradeel, Harlingen, Littenseradiel, Menameradiel, Súdwest-Fryslân | (van der Aa, 1848b; Jonkheer de Haan Hettema, 1851)
Westerkwartier | Grootegast, Leek, , Zuidhorn | (van Aitzema, 1664; van der Aa, 1848b)
Westerwolde | , , | (van der Aa, 1848b; Tamis, 2007)
IJsselmonde | Albrandswaard, Barendrecht, Ridderkerk | (Bachiene, 1773; van der Aa, 1845)
Zaanstreek | Oostzaan, Wormerland, Zaanstad | (van der Aa, 1851; Vis, 1948)
Zeeland beoosten Schelde | Schouwen-Duiveland, Tholen | (van der Aa, 1851; Roth, 2007)
Zeeuws-Vlaanderen | Hulst, Sluis, Terneuzen | (van der Aa, 1848a; Steigenga-Kouwe, 1948)
Zevenwouden | , , Opsterland | (Jonkheer de Haan Hettema, 1840; van der Aa, 1851)
Zuidenveld | Borger-Odoorn, Coevorden, Emmen | (van der Aa, 1851; Poortman, 1943)
Zwijndrechtse waard | Hendrik-Ido-Ambacht, Zwijndrecht | (van Ollefen, 1793; van der Aa, 1851)

APPENDIX C Tactics to increase the average ICM value

In certain cases it can be interesting to find the set of identity regions that maximises the ICM value, and thus to reverse engineer the identity regions from the ICM values that candidate regions produce. A comparison between this optimised set of identity regions and the prespecified identity regions could provide insight into the quality of the prespecified identity regions. This could then be used as a tool to pinpoint areas within the Netherlands that should be examined again.

The task of assigning 390 different municipalities to several identity regions can be framed as a clique problem. In this case the municipalities are the nodes, and the weight of each edge is the total number of migrants that moved between the two municipalities, corrected for both distance and population size. Under the assumption that there is indeed an identity effect present in our data, we could then try to detect cliques: clusters of municipalities in which the migration numbers are higher than expected.

Because the clique problem has been proven to be NP-complete (Karp, 1972a), finding this optimal set of regions is a challenge. Nevertheless, we will discuss the different techniques that could be used to increase the average ICM value. This might eventually lead us to the globally optimal ICM value.

C.1 Using network metrics

A first approach would be to exploit the network features that are present within the migration network. We could search for cliques or connected components in the migration network. To be able to do this, the weakest links would first have to be removed, because in the full human migration network all municipalities are connected in one giant component.

In choosing those weakest links we should take into account the number of citizens living in both locations, and could take into account the geographical distance as well. Without taking distance into account, we would rely on the fact that municipalities that are located closer together are more likely to share the same identity. On the other hand, this also means that it will be more difficult to form larger identity regions: the number of migrants towards a municipality on the other side of the actual identity region might be lower than the number towards a closer municipality outside of the identity region. Taking distance into account would counter this problem, but would create a new problem as well. When distance is taken into account, municipalities can obtain high connectivity values with municipalities located very far away, simply because someone moved to that municipality while the chances of travelling such a large distance are very slim.

In both cases, there will still be outlier connections present within the network. A group of pensioners might have moved to a quiet spot on the other side of the country, or new immigrants might have been moved to another immigration camp. Through the presence of these outlier connections all the different parts of the country are still connected in one giant component, which makes the idea of using connected components as identity regions unfeasible. As it is not guaranteed that all connections within identity areas are stronger than the connections that cross their borders, it is likely to be impossible to remove all outlier connections without accidentally removing some connections between two municipalities located within the same identity area.
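The effect of a single outlier connection can be illustrated with a small sketch. The municipality names and edge weights below are made up for illustration, and the weights stand in for migration numbers that have already been corrected for population size (and optionally distance); this is not the implementation used in this thesis.

```python
def connected_components(nodes, edges, threshold):
    """Group nodes after dropping every edge whose weight is below threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent pointers to the representative of n's component.
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for (a, b), weight in edges.items():
        if weight >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical corrected migration weights; A* and B* are two "true" regions.
nodes = ["A1", "A2", "A3", "B1", "B2", "B3"]
edges = {
    ("A1", "A2"): 5.0, ("A2", "A3"): 2.0, ("A1", "A3"): 4.0,
    ("B1", "B2"): 6.0, ("B2", "B3"): 2.5, ("B1", "B3"): 3.0,
    ("A3", "B1"): 3.5,  # outlier connection between the two regions
}

low = connected_components(nodes, edges, threshold=2.0)
high = connected_components(nodes, edges, threshold=4.0)
print(low)
print(high)
```

In this toy network the threshold that is high enough to cut the outlier connection also cuts the weaker connections inside region B, so one of its municipalities is split off.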

Trying to find cliques within this migration network is not feasible either, because the clique decision problem is proven to be NP-complete (Karp, 1972b). It would therefore be better to look for more efficient ways to optimise the ICM value.

C.2 Similarity of migration behaviour

Another approach would be to use the similarity in migration patterns originating from certain municipalities. We could argue that if the patterns of two municipalities are somewhat similar, these municipalities might have the same relative identities, and should thus be assigned to the same region.

A good way of detecting such municipalities that have a similar migration behaviour would be to use a k-means clustering algorithm. Each of the municipalities would be assigned a vector that contains all migration data, which is then used in this algorithm. To be able to make the migration data comparable, the migration data is corrected for the population sizes and distance using Equation 2.3.
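A minimal sketch of this clustering step is given below. It assumes that the correction of Equation 2.3 has already been applied, so every row is the corrected migration vector of one municipality; the numbers are invented for illustration. The deterministic farthest-first initialisation is our own choice to keep the example reproducible, and is not necessarily the initialisation used for the results in this thesis.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centres(vectors, k):
    """Start at the first vector, then repeatedly add the vector farthest from all centres."""
    centres = [list(vectors[0])]
    while len(centres) < k:
        farthest = max(
            vectors, key=lambda v: min(squared_distance(v, c) for c in centres)
        )
        centres.append(list(farthest))
    return centres


def kmeans(vectors, k, iterations=100):
    """Lloyd's algorithm; returns one cluster label per input vector."""
    centres = farthest_first_centres(vectors, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centres[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centre if a cluster became empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical corrected migration vectors of six municipalities
# towards the same four destinations.
vectors = [
    [3.0, 2.8, 0.2, 0.1],
    [2.9, 3.1, 0.1, 0.3],
    [3.2, 2.7, 0.3, 0.2],
    [0.2, 0.1, 3.0, 2.9],
    [0.1, 0.3, 2.8, 3.1],
    [0.3, 0.2, 3.1, 2.7],
]
labels = kmeans(vectors, k=2)
print(labels)
```

Municipalities that receive the same label would be placed in the same candidate region.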

Even though creating clusters using this data could create regions with higher average ICM values than the fully randomised regions did, these “optimal” k-means generated solutions have a much lower ICM value than the predefined regions. We would want the k-means generated regions to have an average ICM value that is equal to the ICM value of the predefined regions.

The k-means region creation method does not seem to work. This can be explained by the fact that this algorithm could have difficulties in handling smaller regions. When seventy regions are created using this algorithm, only a small part of the municipalities will eventually end up in the same region. Initially this might not be seen as a problem, because the municipalities that should share the same regional identity all share consistently higher values. The problem however lies in the different values that the municipalities sharing a regional identity have for the municipalities outside of that regional identity area. When those values differ too much, the sheer number of other municipalities will cause some of the municipalities that should be part of a certain regional identity to be placed in other regions: distant migration figures become too important.

C.3 Reassigning municipalities

Instead of starting from scratch, we could also start with the predefined regions. It would make sense to assume that regions that have been carefully put together and are well researched should be part of a quite good identity region configuration.

For each municipality, one would have to evaluate what the effect on the other ICM values would be if that municipality were relocated to a neighbouring region. If that effect is positive, the municipality could be relocated. By assigning a certain probability to that relocation, we could make sure that no deadlocks appear.

A problem with that algorithm is that only a small part of the possible region configurations can be explored. Better region configurations that can only be reached by accepting negative changes of the average ICM value cannot be found.
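The reassignment procedure described in this section can be sketched as follows. The objective `total_score` is only a hypothetical stand-in for the average ICM value (it sums made-up pairwise affinities within regions), and the names `neighbours`, `affinity` and `move_probability` are assumptions introduced for this example.

```python
import random


def total_score(assignment, affinity):
    """Stand-in objective: sum of affinities between municipalities sharing a region."""
    return sum(
        w for (a, b), w in affinity.items() if assignment[a] == assignment[b]
    )


def reassign(assignment, neighbours, affinity,
             move_probability=0.8, sweeps=20, seed=1):
    rng = random.Random(seed)
    assignment = dict(assignment)
    for _ in range(sweeps):
        for m in sorted(assignment):
            # For simplicity the full score is recomputed here; only the
            # affected municipalities would have to be re-evaluated.
            current = total_score(assignment, affinity)
            # Candidate regions are the regions of adjacent municipalities.
            candidates = {assignment[n] for n in neighbours[m]} - {assignment[m]}
            for region in sorted(candidates):
                trial = dict(assignment)
                trial[m] = region
                improves = total_score(trial, affinity) > current
                # Improving moves are only carried out with a certain probability.
                if improves and rng.random() < move_probability:
                    assignment = trial
                    break
    return assignment


# Hypothetical data: four municipalities on a line, two natural pairs.
affinity = {("a", "b"): 5.0, ("b", "c"): -1.0, ("c", "d"): 5.0}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
initial = {"a": "R1", "b": "R1", "c": "R1", "d": "R2"}

result = reassign(initial, neighbours, affinity)
print(result, total_score(result, affinity))
```

Because only improving moves are accepted, the score never decreases, which is exactly why this procedure can get stuck in a configuration that is locally but not globally optimal.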

C.4 Simulated annealing

This problem could be solved using the simulated annealing algorithm. Within this algorithm positive and negative changes of the average ICM value can both be accepted. The acceptance rates are controlled by the temperature of the annealing process. When the temperature is high, negative changes are more likely to be accepted. The temperature of the process slowly decreases over time. Once the process has cooled down, the algorithm ends at a region configuration that has a locally optimal ICM value. By repeating this process, we could eventually find a global optimum.
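The acceptance behaviour described above is commonly implemented with the Metropolis rule. The sketch below assumes a maximisation problem, a geometric cooling schedule and a toy objective that stands in for the average ICM value; the parameters `t_start`, `t_end` and `cooling` and the functions `score` and `propose` are illustrative assumptions, not the settings used in this thesis. Unlike the description above, the sketch returns the best configuration seen rather than the final one.

```python
import math
import random


def accept(delta, temperature, rng):
    """Metropolis rule: always accept improvements, sometimes accept deteriorations."""
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)


def anneal(state, score, propose, t_start=5.0, t_end=0.01, cooling=0.95, seed=0):
    rng = random.Random(seed)
    best = state
    temperature = t_start
    while temperature > t_end:
        candidate = propose(state, rng)
        if accept(score(candidate) - score(state), temperature, rng):
            state = candidate
            if score(state) > score(best):
                best = state
        temperature *= cooling  # the process slowly cools down
    return best


# Toy objective: made-up affinities between four municipalities (indices 0..3).
affinity = {(0, 1): 5.0, (1, 2): -1.0, (2, 3): 5.0}


def score(state):
    return sum(w for (a, b), w in affinity.items() if state[a] == state[b])


def propose(state, rng):
    # Move one randomly chosen municipality to a randomly chosen region.
    new_state = list(state)
    new_state[rng.randrange(len(state))] = rng.choice(["R1", "R2"])
    return tuple(new_state)


initial = ("R1", "R2", "R1", "R2")
best = anneal(initial, score, propose)
print(best, score(best))
```

At a high temperature `math.exp(delta / temperature)` is close to one even for negative changes, so almost every move is accepted; as the temperature drops, the rule approaches the purely greedy reassignment algorithm.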

To ensure that the simulated annealing algorithm can actually reach an optimal ICM configuration, the initial configuration should already be locally clustered. Without such a predefined locally clustered configuration, it would take some time before a similar configuration is reached by just reapplying the optimisation algorithm on a fully randomised set of regions.

But even with these predefined region configurations the simulated annealing algorithm is still expensive to run. The average ICM value for the entire country would have to be computed at each time step, as it is used in the function that evaluates the effect of the last change made.

C.5 Discussion

Even though there seem to be several possibilities to optimise the average ICM value of all regions, and thus create better identity regions, two of them could definitely not be used. Network metrics could not be applied, and neither could the clustering technique based on the similarity of migration behaviour.

The basic algorithm of reassigning municipalities and the simulated annealing algorithm should both work. The difference between the two algorithms is that the first would only have to calculate the changed ICM values of the affected municipalities when a municipality is relocated, whilst the simulated annealing algorithm would also have to recalculate the average ICM value of all municipalities. On top of that, the latter algorithm requires more administration.

To get the best of both algorithms, we could use the predefined locally clustered configurations as starting points in the reassigning algorithm instead of the predefined identity regions. By doing so, we would reduce the risk of not being able to find the global optimum, whilst the costs of doing so are lowered.

Finding the global optimum would still be a challenge. This can be illustrated by a simple example, in which 390 different municipalities are placed in seventy different regions. This can be done in many different configurations, in which regions can contain different numbers of municipalities. When thirty of the regions contain five municipalities and forty of the regions contain six, there are $\frac{390!}{5!^{30} \, 6!^{40}} \approx 1.467 \cdot 10^{666}$ unique combinations of municipalities available. The total number of available configurations is even higher.
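The order of magnitude of this count can be checked with exact integer arithmetic. The short sketch below evaluates the multinomial coefficient for thirty regions of five and forty regions of six municipalities; the variable names are ours.

```python
from math import factorial

municipalities = 390
regions_of_five, regions_of_six = 30, 40
# The region sizes must add up to the number of municipalities.
assert 5 * regions_of_five + 6 * regions_of_six == municipalities

# Multinomial coefficient: ways to distribute 390 distinct municipalities
# over 30 distinct regions of size five and 40 distinct regions of size six.
count = factorial(municipalities) // (
    factorial(5) ** regions_of_five * factorial(6) ** regions_of_six
)

digits = str(count)
exponent = len(digits) - 1          # power of ten in scientific notation
mantissa = int(digits[:4]) / 1000   # leading digits, truncated
print(f"{mantissa} * 10^{exponent}")
```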

APPENDIX D Geographical distributions of ICM values for different sets of regions

When random regions are generated or municipalities are randomly clustered, it is evident that the generated regions will be different each time. Because the optimisation algorithm does not always directly assign municipalities to the region that optimises the ICM value best, these optimised regions can eventually differ as well.

To be able to fully comprehend what the outcomes of all algorithms look like, four different maps were generated to compare with each initial configuration. The maps were generated using municipal boundary data acquired from the Basisregistratie Kadaster (2016). When these maps are examined in more detail, certain patterns can be distinguished.

Within the randomly generated spatially clustered regions we see a lot more municipalities with negative ICM values than in the regions we defined. Comparing the twelve randomised clustered regions in Figure D.5 with the NUTS 2 regions in Figure D.1, we see such negative values throughout the country. The same effect is also found when the forty randomised clusters shown in Figure D.9 are compared to the NUTS 3 regions in Figure D.2, and when the seventy randomised clusters shown in Figure D.13 are compared to the regions in Figure D.3.

After optimising the randomly generated spatially clustered regions some municipalities with negative ICM values still remain. A comparison between the randomly generated spatially clustered regions and their optimised counterparts shown in Figures D.6, D.10 and D.14 also shows that many local optimisations take place: the ICM values of a lot of municipalities with slightly lower positive ICM values are increased.

A comparison of these optimised randomly clustered identity regions with the optimised predefined identity regions shows us that the predefined identity regions can be optimised better. In Figures D.7, D.11 and D.15 we can still see municipalities with negative ICM values, but the ICM values of the other municipalities have increased.

When there are only a few randomly generated regions, about half of the municipalities have a slightly negative ICM value, whereas the other municipalities have a slightly positive ICM value. This results in an average ICM value of about zero. As seen in Figure D.4, no other clear patterns are visible. When the number of random identity regions is increased, the number of municipalities with a positive ICM value decreases. This process is visible in Figure D.8 and Figure D.12. As the number of regions increases, the chances of being assigned to a region with the right municipalities become slimmer, which means that more municipalities have a negative ICM value. On the other hand, when municipalities that should belong together end up in the same region, their ICM value will be much higher when the number of regions is increased. After all, there are only a few other municipalities located within the region that could lower the ICM value. This means that even though the number of municipalities with a positive ICM value decreases as the number of regions is increased, the average ICM value stays the same.

Figure D.1: The ICM values calculated for each municipality when the Netherlands are split into the NUTS 2 regions. All ICM values are positive. The ICM values in the southern part of Limburg, Zeeland, and the Northern parts of Friesland and Groningen are all larger than the ICM values in other parts of the country. The average ICM value for municipalities located within these twelve regions is 20.91, the median ICM value is 12.35. Municipal boundary data used in this map is acquired from the Basisregistratie Kadaster (2016).

Figure D.2: The ICM values calculated for each municipality when the Netherlands are split into the NUTS 3 regions. There are municipalities scattered throughout the country with relatively high ICM values. The average ICM value for municipalities located within these forty regions is 59.57, the median ICM value is 33.03. The lower ICM values are clustered around the centre of the country. Municipal boundary data used in this map is acquired from the Basisregistratie Kadaster (2016).

Figure D.3: The ICM values calculated for each municipality when the Netherlands are split into the seventy regions specified by literature. The average ICM value for municipalities located within these regions is 73.91, the median ICM value is 42.34. Most ICM values are positive, but clusters of lower ICM values are found in North-Holland and Utrecht. The ICM values of Texel, Vlieland and the Wijdemeren municipalities are even negative. Municipal boundary data used in this map is acquired from the Basisregistratie Kadaster (2016).

Figure D.4: The ICM values for each municipality in four different scenarios in which the Netherlands are split into twelve random regions, disregarding any distance. In each of these scenarios about half of the municipalities have a negative ICM value, and half of the municipalities have a positive ICM value.

Figure D.5: The ICM values for each municipality in four different scenarios in which the Netherlands are split into twelve randomly spatially clustered regions. When the ICM values in these scenarios are compared to the ICM values created by the original regions, it becomes apparent that the ICM values of some municipalities become negative in the randomly generated spatially clustered regions, whereas all ICM values in the original regions were positive.

Figure D.6: The ICM values for each municipality in four different scenarios in which the Netherlands are split into twelve randomly spatially clustered regions, and then further optimised. As this optimisation technique is based on chance, different optima are found. When the generated ICM values are compared to the ICM values of the original regions, we see that the ICM values are not distributed in a similar way. Whereas the variance in the ICM values in the original regions is very low, various high ICM values and negative ICM values occur in the randomly generated spatially clustered regions.

Figure D.7: The ICM values for each municipality in four different scenarios in which the NUTS 2 regions are further optimised. Since this optimisation technique is partially based on chance, different optima are found. When the generated ICM values are compared to the ICM values of the original regions, we find that the ICM values are not distributed in a similar way. Whereas the variance in the ICM values in the original regions is very low, various high ICM values and some negative ICM values occur in the randomly generated spatially clustered regions. When compared to the ICM values of the non-optimised randomly spatially clustered regions, we do however find that the number of negative ICM values has decreased.

Figure D.8: The ICM values for each municipality in four different scenarios in which the Netherlands are split into forty random regions, disregarding any distance. In each of these scenarios most municipalities have a negative ICM value. The other 15% of the regions have slightly positive ICM values. In each of the four situations, we find that there are extremely positive and extremely negative ICM values.

Figure D.9: The ICM values for each municipality in four different scenarios in which the Netherlands are split into forty randomly spatially clustered regions. As opposed to the municipalities in the original NUTS 3 regions, municipalities within these randomly spatially clustered regions have more negative ICM values.

Figure D.10: The ICM values for each municipality in four different scenarios in which the Netherlands are split into forty randomly spatially clustered regions, and then further optimised. As this optimisation technique is based on chance, different optima are found. Most municipalities have positive ICM values, except for two cases in which the municipalities on the Frisian islands had negative ICM values, and one case in which a municipality in Friesland had a negative ICM value.

Figure D.11: The ICM values for each municipality in four different scenarios in which the NUTS 3 regions are further optimised. As this optimisation technique is based on chance, different optima are found. In all four scenarios almost all municipalities have positive ICM values, except for the municipalities of Texel and Schiermonnikoog. When the ICM values of the optimised regions are compared to the ICM values of the municipalities located in the original regions, it becomes clear that ICM values of municipalities located all over the country are increased.

Figure D.12: The ICM values for each municipality in four different scenarios in which the Netherlands are split into seventy random regions, disregarding any distance. In each of these scenarios most municipalities have a negative ICM value. The other 10% of the regions have slightly positive ICM values. In each of the four situations, we find that there are extremely positive and extremely negative ICM values.

Figure D.13: The ICM values for each municipality in four different scenarios in which the Netherlands are split into seventy randomly spatially clustered regions. When the ICM values in these scenarios are compared to the ICM values of the original regions, it becomes apparent that the ICM values in a lot of municipalities are actually higher than they were before. On the other hand, more municipalities then have negative ICM values. This pattern could be explained by the fact that municipalities in certain parts of the Netherlands are larger than in others. When the regional centres are spread in an equal way over the country by using the k-means algorithm, this means that municipalities that should belong to the same larger identity region are less likely to be assigned to the same region.

Figure D.14: The ICM values for each municipality in four different scenarios in which the Netherlands are split into seventy randomly spatially clustered regions, and then further optimised. As this optimisation technique is based on chance, different optima are found. Even though most ICM values are positive, each of the four different scenarios contains at least one municipality with a negative ICM value. On the Frisian islands, and in one municipality in Zeeland these negative ICM values appear more than once. When the ICM values of the optimised regions are compared to the ICM values of the original randomly spatially clustered regions, we find that the number of municipalities with negative ICM values has decreased.

Figure D.15: The ICM values for each municipality in four different scenarios in which the seventy regions specified by literature are further optimised. As this optimisation technique is based on chance, different optima are found. In three of the four scenarios all municipalities have positive ICM values; in one scenario three municipalities have negative ICM values. When the ICM values of the optimised regions are compared to the ICM values of the municipalities located in the original regions, it becomes clear that ICM values of municipalities located all over the country are increased.
