Ethno-Linguistic Diversity and Urban Agglomeration

LSE Research Online URL for this paper: http://eprints.lse.ac.uk/104513/
Version: Accepted Version

Article: Eberle, Ulrich, Henderson, J. Vernon, Rohner, Dominic and Schmidheiny, Kurt (2020) Ethno-linguistic diversity and urban agglomeration. Proceedings of the National Academy of Sciences of the United States of America. ISSN 1091-6490 (In Press)

Reuse: Items deposited in LSE Research Online are protected by copyright, with all rights reserved unless indicated otherwise. They may be downloaded and/or printed for private study, or other acts as permitted by national copyright laws. The publisher or other rights holders may allow further reproduction and re-use of the full text version. This is indicated by the licence information on the LSE Research Online record for the item.

[email protected]
https://eprints.lse.ac.uk/

Ulrich J. Eberle (a, b, 1, 2), J. Vernon Henderson (a, 1, 2), Dominic Rohner (b, 1, 2), and Kurt Schmidheiny (c, 1, 2)

(a) London School of Economics, Centre for Economic Performance, Houghton Street, London WC2A 2AE, UK
(b) University of Lausanne, Department of Economics, Internef, 1015 Lausanne, Switzerland
(c) University of Basel, Faculty of Business and Economics, Peter Merian-Weg 6, 4002 Basel, Switzerland

(1) U.J.E., J.V.H., D.R. and K.S. contributed equally to this work.
(2) To whom correspondence should be addressed. E-mail: [email protected], [email protected], [email protected], [email protected].

This manuscript was compiled on May 14, 2020

www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX PNAS | May 14, 2020 | vol. XXX | no. XX | 1–8

This article shows that higher ethno-linguistic diversity is associated with a greater risk of social tensions and conflict, which in turn is a dispersion force lowering urbanization and the incentives to move to big cities. We construct a novel worldwide data set at a fine-grained level on urban settlement patterns and ethno-linguistic population composition. For 3,540 provinces of 170 countries, we find that increased ethno-linguistic fractionalization and polarization are associated with lower urbanization and an increased role for secondary cities relative to the primate city of a province. These striking associations are quantitatively important and robust to various changes in variables and specifications. We find that democratic institutions affect the impact of ethno-linguistic diversity on urbanization patterns.

Ethno-Linguistic Diversity | Fractionalization | Polarization | Urbanization | Urban Agglomeration | Primacy | Conflict | Democracy

Significance Statement

Urbanization and agglomeration of economic activity are key drivers of economic development. Many factors underlying city sizes and locations continue to be well studied. However, a key factor has so far been generally ignored: the role of the ethno-linguistic composition of local populations. We address this gap, drawing on a novel, very detailed dataset on local urban agglomeration and ethno-linguistic diversity. We find that, in multi-ethnic areas, social tensions arise more easily, discouraging the move to bigger cities. Ethno-linguistically diverse regions feature less urbanization and agglomeration, with potentially profound economic consequences.

The conflict literature has found that ethnic diversity within a region can induce tensions and raise the potential for conflict (1–3). Existing game-theoretic models of spatial distributions of ethnic groups and social tensions (4) predict that, in the presence of tensions between groups, conflicts are more costly when bigger numbers of members of different groups live at close range. To avoid such conflict costs caused by inter-group hostility, members of ethnic groups have an incentive to remain dispersed in the countryside as opposed to moving to cities to live in close quarters. Further, when they do urbanize, instead of agglomerating into one giant regional "melting pot" megapolis, they may spread over smaller cities.

This paper presents what is, to the best of our knowledge, the first global empirical investigation of the nexus between ethno-linguistic diversity and major patterns of where people live within countries. We show that initial ethnic diversity reduces urban agglomeration. This has important consequences as policies which inhibit urbanization and urban concentration can strongly restrict economic growth (5, 6). Yet, economists have largely ignored the role of ethno-linguistic cleavages when studying agglomeration benefits, urbanization and development, the size distribution of cities, and policies which impact concentration (7–14).

Many anecdotal examples of the impact of ethno-linguistic diversity on urbanization patterns may come to mind. One example is the archetypical bilingual city of Montreal which has stagnated in size since the 1960s, while nearby predominantly English-speaking cities like Toronto or French-speaking cities like Quebec-Ville have typically grown by at least 50% over the same time period (15). As a more structured example we pick the two Indian states with the highest degree of ethno-linguistic diversity in India as measured by fractionalization, a common measure of diversity in the literature which we define later. These states, Nagaland and Himachal Pradesh, are also in the top 3% of degree of diversity by provinces worldwide and Nagaland is at the center of India's well known on-going conflict in its Northeast. These highly fractionalized states rank in the top 6% and 3%, respectively, of provinces worldwide in incidence of conflict for 1975–2015 (defined below). In terms of the resulting urban concentration, we develop two measures below: share of the population that is urbanized, and primacy (fraction of the urban population in the biggest city in the province). These two Indian states both rank in the bottom 30% worldwide of provinces in terms of urban share and in the bottom 1% in terms of primacy share. In other words, their high degree of ethnic fractionalization and conflict is closely associated with people staying in the countryside and avoiding agglomerating into one main city by spreading urban population across cities.

To comprehensively assess these relationships, we created a novel, fine-grained data set of geographical population distribution and language use. For 233 countries around the world, these data allow us to compute indices of urban concentration in the year 2015, as well as ethno-linguistic diversity at the province level in 1975. Provinces are the first-level administrative boundaries within countries such as U.S. states or German Bundesländer (see the SI Appendix, p. 5 for details). We identify the effects of ethno-linguistic diversity on urban concentration from within-country variation in urban concentration at the provincial level for 3,540 provinces in the 170 countries with more than one province, controlling for the 1975 levels of the variables of interest. Drawing on data of the Global Human Settlement Layer (GHSL) and the GHS Settlement Model (GHS-SMOD, (16)) on geo-localised population and urban boundaries, we first establish a data set at the 1 km grid level, which distinguishes between city cores, dense towns, semi-dense towns, suburbs and rural areas for 2015. The GHS project for the first time defines areas such as cities, based solely on population and population density measures consistently across the world, with no regard to local administrative borders and to census bureau qualitative views on what defines urban areas and cities. This consistency in definition across and within countries is an important feature of our contribution.

Fig. 1. Global Map of Ethno-linguistic Fractionalization at the Province Level. Fractionalization is calculated at language tree level 15. See text for data sources and construction. [Map legend: fractionalization (level 15), shaded in bins of width 0.1 from 0.0 to 1.0.]

In this paper, we first match the grid cells with fine-grained language information, drawing on the World Language Mapping System (WLMS) data capturing the traditional languages (as defined by Ethnologue (18)) present in the early 1990s. Ethnologue contains the number of speakers of all languages in a given country and WLMS maps the information of the Ethnologue into the geographic location of ethno-linguistic groups. All details of the data construction are relegated to the "Data and Methods" Section below.

[...] control through country fixed effects for unobservable confounding country characteristics (like national governance) which also influence the urban structure.

Next, using fractionalization as a measure of ethno-linguistic diversity, we graph three motivating sets of associations. Figure 2 displays the association between a conflict measure and ethno-linguistic fractionalization, as well as between the two urban concentration measures and ethno-linguistic fractionalization, for all provinces across the world.

In panel A of Figure 2 we show with a non-linear regression that ethnic fractionalization correlates positively with the count of conflict incidents in each province from 1975 to 2015 (based on data from "Geographical Research on War, Unified Platform", GrowUP (21)), as postulated at the beginning of the article. This is in line with our premise that ethnic diversity may go hand in hand with heightened ethnic tensions and conflict. As argued above, this risk of unrest may be a dispersion force,
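To make the province-level measures discussed above concrete, the following is a minimal Python sketch, not the authors' implementation. The urban share and primacy functions follow the verbal definitions given in the text (share of the population that is urbanized, and fraction of the urban population in the biggest city). The fractionalization function assumes the standard form used in the literature, one minus the sum of squared group shares; the paper defines its own index in a later section, which may differ in detail. All function names and the example numbers are hypothetical.

```python
from typing import Mapping, Sequence


def fractionalization(group_pops: Mapping[str, float]) -> float:
    """Assumed standard index: 1 minus the sum of squared group shares."""
    total = sum(group_pops.values())
    if total <= 0:
        raise ValueError("total population must be positive")
    return 1.0 - sum((pop / total) ** 2 for pop in group_pops.values())


def urban_share(city_pops: Sequence[float], rural_pop: float) -> float:
    """Share of a province's population that lives in urban areas."""
    urban = sum(city_pops)
    total = urban + rural_pop
    if total <= 0:
        raise ValueError("total population must be positive")
    return urban / total


def primacy(city_pops: Sequence[float]) -> float:
    """Fraction of the urban population living in the largest city."""
    urban = sum(city_pops)
    if urban <= 0:
        raise ValueError("urban population must be positive")
    return max(city_pops) / urban


# Hypothetical province: three language groups, three cities, and a rural population.
groups = {"A": 500.0, "B": 300.0, "C": 200.0}
cities = [300.0, 100.0, 100.0]
rural = 500.0

frac = fractionalization(groups)
share = urban_share(cities, rural)
prim = primacy(cities)
```

For this made-up province, fractionalization is about 0.62, the urban share is 0.5, and primacy is 0.6. A province with the same urban share but cities of equal size would have a primacy of one third, which is the kind of dispersed urban structure the text associates with high diversity.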
