From Surnames to Linguistic and Genetic Diversity: Five Centuries of Internal Migrations in Spain
JASs Reports
Journal of Anthropological Sciences Vol. 95 (2017), pp. 249-267
doi: 10.4436/jass.89003
e-pub ahead of print, doi: 10.4436/jass.95020

Roberto Rodríguez-Díaz (1,2), María José Blanco-Villegas (1) & Franz Manni (2,1)
1) Área de Antropología Física, Departamento de Biología Animal, Facultad de Farmacia, Campus Miguel de Unamuno. 37007, Salamanca, Spain
2) National Museum of Natural History, Musée de l’Homme. 17, Place du Trocadéro, 75116, Paris, France
e-mail: [email protected]

Summary - In a previous study concerning 33,753 single Spanish surnames (considered as tokens) occurring 51,419,788 times, we have shown that the present-day geography of surname variability in Spain still corresponds to the political geography of the country at the end of the Middle Ages. Here we reprocess the same database, by clustering surnames with Self-Organizing Maps (SOMs) according to their geographic distribution, to identify the monophyletic surnames showing a geo-historical origin in one of the 47 provinces of continental Spain. There are 25,714 of them, and they occur 12,348,109 times, meaning that about 75% of the Spanish population bears a surname that had a polyphyletic origin. From monophyletic surnames we compute migration matrices accounting for the internal migrations that have taken place over the last five centuries, since Spanish surnames started to be patrilineally inherited. The mono/polyphyletic classification we obtain fits ancient census data and is compatible with published molecular diversity of the Y-chromosomes associated with selected Spanish surnames. Monophyletic surnames indicate that i) the provinces exhibiting a higher percentage of autochthonous surnames are also ii) those from which emigration corresponds to a local isolation-by-distance model of diffusion and iii) those that attracted a lower number of immigrants.
These are also the provinces where languages other than Castilian are spoken. We suggest that demographic stability explains linguistic resilience, as people prefer to move to areas in which the linguistic variety is more similar to their own. So far the reciprocal influence of migration and language has been investigated at local scales; here we outline how to investigate it at national scales and for time-depths of centuries.

Keywords - Spain, Surnames, Languages, Self-Organizing Maps, Migrations, Census data, Y-chromosome.

JASs is published by the Istituto Italiano di Antropologia www.isita-org.com

Introduction

Background
Family names carry social and economic information that has earned them a place in several interdisciplinary approaches to human history. Historians, linguists, and geographers can play as active a role as biologists in surname studies and population analyses. Today, in an age of global migration (Castles & Miller, 2009), the distribution of surnames remains far from random and has the potential to allow an intermediate level of access to the recent past and to smaller geographical scales, both of which are difficult to obtain otherwise. By providing evidence for migration phenomena in different periods and controlling for them, it is possible to delineate past genetic isolates and population structures that have been modified or have disappeared altogether (Boattini et al., 2010; Rodriguez-Diaz & Blanco-Villegas, 2010; Boattini et al., 2012; Darlu et al., 2012).

The large expansion of the available surname datasets, both in time and space, has led to the development of new methods and analytical tools. Among them, and now widely used, are the automatic geographic representations of surname diversity, which plot the variations in frequency of a given family name or of a set of family names sharing some phonetic or grammatical features (see the contributions of Bloothooft and Dräger reported in the collective article by Darlu et al., 2012; Bloothooft & Darlu, 2013). Some recent statistical methods are also becoming established, such as Bayesian approaches to infer the origins of migrants (Darlu et al., 2012), Self-Organizing Maps (SOMs) to automatically identify surnames having the same geographical origin (Manni et al., 2005; Boattini et al., 2012), or approaches to identify ethno-cultural groups (Mateos, 2011) by retrieving forenames associated with a given set of surnames and checking to which other surnames they are linked, iteratively, until an optimum is achieved (Mateos & Tucker, 2008). In all cases, the classification is empirical, not based on scientific or ethnologic background, and makes it possible to identify clusters of linked individuals corresponding to several isolated subgroups existing in the population. All the applications listed point to the same endeavour: the depiction of contemporary and past migrations and the assessment of multiculturalism and assimilation phenomena. The results are of direct interest to social anthropologists and population geneticists.

Migrations inferred from surnames
Aside from special cases, it is often difficult to depict the internal historical migrations that occurred within a given region; surnames can be of help when no alternative documentation is available. They make it possible to identify the direction of migrations that took place in, say, a European country over the last four or five centuries but, unlike historical registers, they do not say when the migrations took place. They could have happened at any time between the advent of fixed surnames and the last generation, that is, over a time span of about five centuries (Spanish surnames became fixed starting in the 16th century).

The way we account here for the intensity of the internal migrations is quite simple. It consists in coding the surnames listed in the database of current Spanish residents as vectors whose components correspond to the relative frequency of each surname in the provinces under consideration. Then all the vectors (surnames) are classified into a discrete number of clusters by using Kohonen maps (Kohonen, 1982, 1984; Kaski, 1997) or other similar methods, so that each cluster corresponds to a group of surnames having a similar geographic distribution over the country: frequent in some provinces and not in others. Finally, such groups are plotted over a geographic map to see if there are visible peaks of frequency corresponding to a single province.

If one assumes that the province where the relative frequency of each surname is the highest corresponds to the geo-historical origin, it is possible to measure migrations because the diffusion centre of each family name is known, as well as its present-day distribution. In some cases, the peak of frequency is geographically ambiguous because it corresponds to two (or more) provinces. Such ambiguities are related to the fact that many surnames, spelled in the same way, independently became the name of unrelated families located in different areas: they are called polyphyletic surnames. Only the surnames with a clear origin in one province, that is, the monophyletic surnames, are used to assess migrations (see Manni et al., 2005; Boattini et al., 2012 for details about the procedure). Migration patterns can be summarized in two migration matrices, one for the aggregate immigration and one for the aggregate emigration processes that took place over the last five centuries. From them the provinces can be classified into four categories: 1) Isolated provinces (low emigration, low immigration); 2) Corridor provinces (high emigration, high immigration); 3) Unattractive provinces (high emigration, low immigration); 4) Attractive provinces (low emigration, high immigration).

Migration range
Concerning migration distances, Boattini et al. (2012) classified them as short-, medium- and long-range, and we do the same here. It is reasonable to think that the medium- and long-range movements took place in more recent times, when the mechanization of transportation and industrialization led to massive displacement of the population, which progressively abandoned rural life. By contrast, other provinces are characterized by very local emigration distances directed to neighbouring areas; they correspond to processes that took place within a more traditional frame of displacement, probably when people used to move by their own means, progressively diffusing. Long or short, the aggregation of migration distances, surname after surname, contributes to the establishment of coherent migration routes. Aggregated data show (Boattini et al., 2012) that neighbouring provinces can be quite different in the number and the provenance of the immigrants they attracted, but also concerning the directions of the emi-

is a time when surnames became transmitted in a fixed way and when Castilian linguistic varieties gained prestige and spread out, ultimately leading to the Castilianization of family names by large-scale surname change favouring a limited set of prototypical Castilian- and Leónese-sounding surnames (a process that lasted until 1870) and explaining why Spain has a lower number of surnames than other European countries (Kremer, 1992, 1996, 2001, 2003).

In this report we reprocess the surname corpus of Rodriguez-Diaz et al. (2015) with a double purpose. The first aim, as we said, consists in the identification of monophyletic surnames and in the description of the internal migrations that took place in Spain since the beginning of fixed