<<

doi 10.4436/JASS.92004 JASs Invited Reviews Journal of Anthropological Sciences Vol. 92 (2014), pp. 99-117

Analysing as geographic data

James Cheshire

UCL Centre for Advanced Spatial Analysis, UCL, Gower Street, London, WC1E 6BT, U.K. e-mail: [email protected]

Summary - With most research undertaken within the fields of anthropology and population genetics, geographers have overlooked surnames as a credible data source. In addition to providing a review of recent developments in surname analysis, this paper highlights areas where geographers can make important contributions to advancing surname research, both in terms of its quality and also its applications. The review discusses the emerging applications for surname research, not least in the mining of online data, and ends by suggesting three research themes to ensure the building momentum of surname research continues to grow across disciplines.

Keywords – Surnames, Geography, Regions, Geneology.

Introduction This review will highlight areas where geog- raphers have much to contribute to surname , or surnames, provide almost research, in addition to indicating a broadening every culture with a ubiquitous method for dis- range of applications that may help to increase tinguishing between familial groups. Yet the rou- awareness of the value of surname data. It will tine use of surnames has meant that their cultural demonstrate the growing importance of surnames and geographical significance is often taken for as a source of spatial data and argue for their granted. We frequently make judgments - either increased utilisation both within and beyond consciously or subconsciously - about a ’s geography. This review therefore differs from ancestry or origin based on their surname. It is previous literature [see Colantonio et al. (2003) obvious to those in the UK, for example, that sur- and Darlu et al. (2012)] that has traditionally names such as “ h”, “Jones” and “Macleod” focused on the use of surnames in population are English, Welsh and Scottish in origin, respec- genetics. It will begin by outlining the utility of tively. Placing surnames within a regional con- surnames as a source of geographic population text becomes somewhat more specialist yet many data before covering recent developments in their people would, for example, relate the surname analysis at both the individual and regional level. “Cheshire” to the county of the same in The final sections of the review are concerned . Even within cities surnames unique to with improving existing methodologies in the a particular ethnic or cultural group cluster in context of geography and also working towards particular areas, reflecting the underlying popula- a set of research themes to reflect the increasing tion distribution. From such anecdotes alone, it range of surname research applications. is clear that surnames can contain spatial infor- mation on a variety of scales – from city neigh- bourhoods through to continents - relating to the Surnames and population geography probable origins, and areas of residence, for many of their bearers. The purpose of this review is to Surnames and geography dynamically inter- go beyond such anecdotal conjecture to demon- sect through the potential of genealogical data to strate the ways in which researchers are able to enrich geographic research with regard to migra- generate insights into the geography of surnames. tion patterns, and through the construction

the JASs is published by the Istituto Italiano di Antropologia www.isita-org.com 100 Surnames as geographic data

of personal around ethnic origins or an individual) based on its surname composition, ancestral homes (Otterstrom & Bunker (2012). the inclusion of female surnames is less problem- Surname did not occur simultaneously atic because they could either have adopted a sur- in all places, and surnaming conventions have name from a local male, or still hold a surname always been a product of both cultural (includ- inherited from their . Such conclusions are ing linguistic) and legislative processes. Such also true at a European level, and certainly in rural processes are systematic but not geographically areas, with many matrimonial migrations being uniform, resulting in spatial structuring of sur- less than a few kilometres (Manni et al., 2008). name distributions that may subsequently be In some contexts, such as population genet- obscured by population movements. In Britain ics, it makes to exclude females from fine for example, surnames were developed as a means scale studies of particular surnames (see Bowden to distinguish individuals (particularly men) et al., 2007) but at the more aggregate scale from one another at a when there were very their impact is less clear. Winney et al. (2011), few forenames. Such distinctions were essential for example, found only a minor effect after the for administrative purposes and surnames could exclusion of females from their sample design be passed from one generation to the next to suggesting that the phenomenon of women mar- ensure a record of lineage. This inheritability rying locally continues in many areas of the UK. is extremely important because it creates a - More pragmatically, it is often challenging to sys- maintaining, enduring and culturally significant tematically remove females from large surname surname geography. Patterns of form databases without the gender of each bearer being the basis for genealogical research and enable the recorded. In several papers, such as Longley et al. creation of familial lineages that are often trace- (2011), it was considered best to treat surnames able to the point in time when the surname was with male and female bearers in the same way. first formalized. The extent to which surnames Aside from the uncertainty inherent in assigning offer an unbroken line to the geographic past will genders to individuals based on their forenames therefore correspond to the different time hori- (which are more likely to be available), a strength zons of surname adoption. of this is the use of the most complete population A further consideration in the use of sur- data available which could be compromised with names for population research is the fact that they poor gender-based sampling. In some coun- are largely transferred along male lines and passed tries, such as the Czech Republic (see Novotny to women after . This point is especially & Cheshire, 2012), female derivations of a name relevant in the context of sociological studies are easily identified and can therefore be system- concerned with making broader generalisations atically removed if required. about population mobility. In many countries The following two sections will outline recent the majority of women change their surname developments in the geographic analysis of sur- when they marry, thus adopting the same cultural names. Developments discussed in the first seek marker as their partner (see Valetas, 2001). This to create a geographic classification of individual creates minor problems in the classification of surnames by identifying their point of origin areas (according to their surname compositions) or their areas of highest concentration; those in but more significant problems in the classifica- the second seek to identify regions with similar tion of individuals (according to their surname surname compositions. Almost all recent studies lineage). The surname may represent an entirely have benefited from the increasing availability of different culture to the one the woman was born comprehensive digital databases and the com- into, and of course, cannot be used to apply the putational power to process them: compare, for assumption that women who share the same sur- example, Otterstrom & Bunker’s (2012) access name are more likely to be related than those who to 800 million surnames with Lasker’s (1985) do not. When classifying a region (as opposed to access to a few thousand. J. Cheshire 101

The classification of individual a particular forename. This method is surpris- surnames ingly efficient because the heavily skewed nature of forename and surname distributions ensures Many early studies into the geography of relatively few names can cover a large proportion individual surnames were little more than visual of the population (Tucker, 2001). interpretations of point distributions (or counts) Expanding this principle, Mateos et al. of surname occurrences (see Guppy, 1890). In (2011) have constructed a large naming network the past few decades more sophisticated sta- based on surname and forename pairs. This tistical approaches have been applied to offer a approach reveals about the names range of metrics such as a surname’s geographic because certain pairs appear more commonly concentration, its flows and probable area of within ethnic groups and therefore become clus- origin. All methods rely on the fact that sur- tered together in network space. These so-called names are not geographically random – the vast cultural-ethnic-linguistic (CEL) clusters can then majority exhibit some kind of spatial pattern- be assigned labels to tie them to a specific region ing. For example, surnames derived from place- or cultural group (Mateos et al., 2011). An exam- names within a 50km radius of Manchester and ple output from this is provided in Figure 1. This Birmingham in Great Britain occur at 145% work marks a shift away from name dictionaries of the expected frequency, reducing to 124% and towards automated classification with clus- 50-99km away and 82% of expected over tering. For a full review of approaches to 150km away (Kaplan & Lasker, 1983). These name-based ethnicity classification see Mateos statistics offer important insights into the spatial (2007). As is outlined in the applications section, dynamics of surnames and demonstrate the such approaches have a wide range of contempo- that frequencies often follow standard models of rary applications from classifying patients admit- distance decay. There are now several websites, ted to hospital (see Petersen et al., 2011) to users for example worldnames.publicprofiler.org, that of social media (Chang et al., 2010). enable the simple production of surname maps The implicit assumption in name-based eth- alongside some basic descriptive statistics, offer- nicity classification research is that there is a sin- ing specialists and non-specialists alike the facil- gle area of origin for that surname. This may be ity to undertake similar analysis of the surnames true at the global level, but at the sub-national of interest to them. level surnames have more complex spatial dis- One of the biggest challenges in this area tributions with many exhibiting multiple points remains the creation of a consistent approach of origin and altered distributions. To capture to the inductive analysis of the millions of sur- this, a different approach has been developed by names in circulation. Until relatively recently Novotny & Cheshire (2012) who apply the Dice researchers have selected surnames based on coefficient; a comparative measure that builds manual empirical research, rather than objec- a network of surnames based on the degree of tively selecting them utilising a range of consist- similarity in their spatial distributions. The pro- ent metrics. To address these limitations, several cess is computationally intensive since it pro- entirely automated approaches to the geographic duces an adjacency matrix with a column/ row classification of individual surnames have been for every surname in the population. It is there- proposed. The first, outlined by Tucker (2005), fore less easily scaled to very large datasets of the does not require geographic information; it sim- kind processed by Mateos et al. (2011). In the ply uses a reference list of “diagnostic forenames” case of the Czech Republic (the country used in that have been manually classified into ethnic the study) over 200 million surname proximity groups. These can then be used as a template observations were calculated to create the “Czech for the classification of an entire names register surname space”. Nevertheless, the method was based on the frequency of surnames assigned to able to identify clusters of surnames that shared

www.isita-org.com 102 Surnames as geographic data

Fig. 1 - A sample of the Auckland surnames network. The graph shows the highly structured out- come of naming practices in a city with high rates of immigration from all over the world. The giant component in the centre of the graph has been classified with fastcommunity algorithm into 22 clusters, each depicted by a different node colour. Four subgraphs are magnified to show the tightly knit internal structure of some CEL communities. (A) is classified as part of the giant component (and is South Asian/Indian), the others are Tongan (B), Samoan and other Pacific Islanders (C), and Eastern European (particularly Dalmatian: D). The last three are disconnected from the network’s giant component. Taken from Mateos et al.,2011, doi:10.1371/journal.pone.0022943.g002. The col- our version of this figure is available at the JASs website. J. Cheshire 103

Fig. 2 - Some of the 400 maps displayed in Boattini et al. (2012). In the two maps at the top the two clusters of surnames are most frequent in the province of Bari. For Polyphyletic surnames (those with multiple areas of origin) it is less clear-cut. (C) concerns regional polyphyletic sur- names (Piedmont); whereas (D) shows macroregional polyphyletic surnames (all southern besides Sardinia). Regional polyphyletic surnames are quite frequent in Italy and point to long- lasting regional socio-cultural identities and dialects. S is the number of surnames clustered in the SOM cell and N is the total number of individuals bearing one of the S surnames. Taken from Boattini et al., 2012, p. 254). common origins or cultural characteristics, in maintain geographic relations through their use addition to providing a basis for the creation of of self-organising maps (SOMs) on Dutch and surname regions (a topic explored below). The Italian surnames respectively. The Manni et al. paper also highlights the impact of large-scale (2005) study identified areas of highest concentra- enforced migrations on the geographic structure tion and probable origin for 9000 of the most pop- of surnames as well as their surprising resilience ular surnames in the Netherlands. This approach in many cases despite such population changes. was further developed by Boattini et al. (2012) in The approach of Novotny & Cheshire (2012) the context of Italian surnames and is proposed abstracted surnames into a network space whereas as a general method to unravel population struc- Manni et al. (2005) and Boattini et al. (2012) tures. In this study the origins of 49,117 different

www.isita-org.com 104 Surnames as geographic data

Fig. 3 - The varying spatial extents of 8 British surnames between 1881(black) and 2001 (grey) as defined by the methodology of Cheshire and Longley (2012). In line with known population changes, there is a general trend towards more geographically extensive areas. surnames were identified and, crucially, validated but it does offer the potential for more accurate against proven supervised methods of determin- insights into the origin of surnames than some ing surname origins. The geographic information of those previously discussed. The Bayesian captured by the SOM methodology can be easily approach can also differentiate between male stored and queried in a database by researchers and female migrations through the use of mar- interested in constraining the spatial extent of their riage registers and has the potential for gathering study or for excluding certain types of surname, information about the genetic diversity of a par- such as those with multiple origins. Figure 2 offers ticular area (Degioanni & Darlu, 2001). It seems an example of Boattini et al.’s (2012) outputs. well adapted to the analysis of hundreds, or even Darlu et al. (2011) (and in other papers) have several thousand surnames, but the computa- developed an approach to calculate the “proba- tional power required for the iterative Bayesian bility of geographic origin” of surnames. Whilst approach combined with the need for data from it is often presumed that the area of a surname’s multiple time periods may make it impractical to highest concentration is its likely place of ori- deploy on large population registers. gin, Darlu and colleagues calculate the weighted The aforementioned research is reliant on mean probability of geographic origin and feed pre-determined and discrete spatial units (such as it into a Bayesian formula, which is iteratively administrative regions or prefectures). Discrete recalculated until a convergence criterion is met conceptualisations of geography are those bound (see Degioanni & Darlu, 2001; Chareille & to a particular set of underlying spatial units Darlu, (2010). This method is limited in that and can only show transitions at the borders of it requires data from two or more time periods, such units. These are the most common form of J. Cheshire 105

representation of surname data, and population surnames, conceptualise geographic space as data more broadly, and are therefore more widely either continuous or discrete. Sokal et al. (1992) used in analysis. The criteria for the creation of take a continuous approach to produce fre- such units is often administrative and therefore quency surfaces of 100 surnames that are then to change over time, making it challeng- combined to find common boundaries (dramatic ing to undertake temporal analysis. changes in the frequency distribution). This The work presented by Cheshire & Longley study attempted direct comparisons between (2012) overcomes some of the above challenges genetics and surnames by suggesting that abrupt by combining a surface approach with the practi- surname boundaries were the result of barriers to cal advantages of discrete spatial units. Cheshire population movement and mixing. Interestingly, & Longley (2012) achieve this through the use the researchers found no such relationship, of kernel density estimation (KDE) to produce instead suggesting that the boundaries were the a surname density surface, or heat map. KDE product of historical factors related to the origin transfers the input data (at a fine spatial scale) of surnames (Sokal et al., 1992). Nonetheless, onto a regular grid of density estimates that can stronger associations between surname transi- then be compared over time. To simplify the out- tions and physical barriers to gene flow have puts, and facilitate comparison, a contour line is since been found in other studies beyond Britain drawn around the areas of highest surname con- through the use of barrier algorithms, such as centration [see Cheshire & Longley (2012) for Monmonier’s algorithm (see Manni et al., 2004). full methodology]. Figure 3 shows the extent of The most compelling is the implementation of 8 surnames in Great Britain in 1881 and 2001 the algorithm in the Province of Italy, delineated by their respective areas of highest where a number of the identified barriers closely concentration. It is clear that the majority of match topographic features known to restrict surnames in Britain have remained relatively population movement. The approach, however, static in terms of their highest concentrations has only been published in a few studies from according to this measure and that their spatial the same authors (Manni & Barrai, 2001; Manni extents have also changed little. There are many et al., 2004, 2008) and would therefore benefit more metrics that can be applied to surnames as from further research. a result of this methodology and they provide The use of discrete spatial units to create uni- a useful means to partition large surname data- form surname regions is a common approach. bases according to a range of pre-defined criteria. Almost all studies follow a process of induc- It is clear that the identification of surname tive generalisation to produce surname regions origins, either geographical or cultural, can have through the creation of a dissimilarity matrix. important applications in a broad range of fields. This is based on the quantitative comparison They are, however, limited to the study of sur- of the observed mix of surnames in spatial units names on an individual basis and therefore can- used. One of the most popular methods for cre- not offer comprehensive insights into the regional ating such a matrix was pioneered by Lasker dur- contexts in which the surnames, and their bear- ing the 1970s and 1980s (Lasker, 1985). Lasker ers, exist. The next section therefore addresses and colleagues selected surnames from telephone the ways in which surname distributions can be directories or marriage records to undertake aggregated to reveal regional geographies. studies of both regional [for example in Henley- on-Thames (Fox & Lasker, 1983) or Oxfordshire (Lasker, 1999)] and national level [see Mascie- Surname regions & Lasker (1990) a Lasker (1985)]. Other popular methods exist, not least the Nei Distance Studies that create regional geographies from (see for example Manni et al., 2008), that serve surnames, like those interested in individual to indicate the degree of similarity (or )

www.isita-org.com 106 Surnames as geographic data

Fig. 4 - Plots of the first two dimensions from PCA performed on a Lasker Distance matrix of Japanese municipalities. Approximate regions have been labeled as they appear in the cloud of points. The main plot (A) excludes the Ryukyu Islands that, as the inset (B) shows, exhibit significant variation from the majority of municipalities in Japan. Taken from Cheshire et al., 2013, p. 10). in the surname composition between geographic based on surname dissimilarity. Figure 4, taken areas. It is not uncommon to calculate a range of from Cheshire et al.’s (2013) study of Japan measures as each are sensitive to different aspects shows how PCA can reveal a great deal about the of the surname distribution, such as the skew of relationship between surname composition and the underlying population and the importance physical barriers. Results from cluster analysis of small numbers of rare names in particular spa- can be mapped, as each spatial unit is assigned a tial units (see Manni et al., 2008). grouping to maximise within-region similarities Surname dissimilarity matrices of the sort and between- region differences. generated by the above measures can become These methods have been successfully used extremely large given the number of pairwise at the broad European level (Scapoli et al., 2007; comparisons required and therefore need sum- Cheshire et al., 2011) through to the small- mary measures to reveal key patterns. Hierarchical island level (Branco & Mota-Vieira, 2004). The clustering algorithms, such as Wards, or stochas- bulk of this research has been undertaken at the tic approaches, such as K-Means, are popular in national-level by one group of authors. Examples this context, as are data reduction techniques of their European papers include: Austria (Barrai such as principal components analysis (PCA) and et al., 2000); Switzerland (Rodriguez-Larralde multidimensional scaling (MDS). Results from et al., 1998a); (Rodriguez-Larralde PCA and MDS can be plotted to give an impres- et al., 1998b); Italy (Manni & Barrai, 2001); sion of how well the geographic configuration (Rodriguez-Larralde et al., 2003); of the spatial units matches their configuration (Barrai et al., 2004); the Netherlands (Manni et J. Cheshire 107

al., 2006); and (Scapoli et al., 2005). All 16 European countries - 8 million surnames from studies demonstrate the spatial structure of sur- 152 million individuals - with data drawn from names in the countries studied, with fewer com- a range of sources including national population monalities between populations the further away registers and telephone directories. The resulting they were. regionalization, shown in Figure 5, shows 14 key Such differences are to be largely surname regions in inductively generated indicative of the different linguistic and cultural from the Lasker Distance calculation and subse- histories within or between the countries stud- quent consensus clustering. ied. This has been confirmed, on the European The use of consensus clustering (see Monti et level at least, by a number of studies. In France al., 2003) has been used in a number of studies (Scapoli et al., 2005) and Belgium (Barrai et al., within the genetics community. It offers a range 2004) the dialect transitions closely match those of metrics that help to determine both the opti- of surnames, meaning that measures of each can mal cluster solution, in terms of the number of be used interchangeably. In the Netherlands, clusters or regions, and the clustering method however, Manni et al. (2008) found no statisti- selected. These are important issues within the cally significant relationship between surnames classification literature as different methods pro- and dialect. This finding is interesting given the duce different results and the point at which the dialects of the country, and suggests that other differences between groups cease to be significant factors can influence the surnames chosen. In the is context dependent. What may be optimal in case of the Netherlands, Manni et al. (2008) cite an image classification sense will not, for exam- religion as a possible explanation, with surname ple, be the same for population datasets because transitions occurring along the border between clustering algorithms can be blind to a range of Protestant and Roman Catholic areas. The cultural factors identified in more qualitative extent to which such differences (between sur- research. The promise of consensus clustering, as names and language) are visible depends on the discussed by Cheshire et al. (2011), is that it can scale and linguistic diversity of the populations offer the researcher a range of outcomes along- studied. If establishing broad surname regions in side a series of metrics that can be used to inform Europe, as Scapoli et al. (2007) have done, it is the final classification. clear that language is the key determinant. Classifications of surnames both on an indi- The Scapoli et al. (2007) study uses data from vidual basis, and on a regional basis, as outlined 8 European countries to create a continental- above, have been key drivers in the geographic level surname regionalisation. It selected data for analysis of surnames to date. Since much of this 2094 towns and cities grouped into 125 spatial work is situated in the population genetics litera- units. Clear regionalisation patterns in surname ture, more can be done to fully exploit surnames compositions emerged, closely matching the as a source of geographic data. The remainder of national borders for the eight countries included, this paper will outline possible methodological with exceptions showing the geo-historical dis- advances before discussing the expansion of sur- tribution of languages. Whilst being extensive, names research for a wider range of applications. both geographically and in terms of the number of surnames sampled, the work is still limited by its partial sampling of “representative” locations. Enhancing current approaches to The study by Cheshire et al. (2011) (see Figure 5) geographic surname analysis offers a couple of advances on previous work. The first relates to the volume and geographic extent This next section seeks to highlight the of the surnames analysed, whilst the second is merits of treating surnames as a source of geo- methodological, through the use of consensus graphic data and to outline potential improve- clustering to create the regions. The paper covers ments that will be of benefit to many of the

www.isita-org.com 108 Surnames as geographic data

studies undertaken to date. It is clear that sur- work of Mateos et al. (2011) to create a global name research has undergone a transition from database of surname cultural-ethnic-linguistic relatively data poor to extremely data rich and (CEL) groups would, for example, benefit from this, as Darlu et al. (2012) also note, will require the inclusion of some kind of spatial validation the development and application of new statisti- or comparison. This would serve to further dis- cal methods to better handle sampling and lem- aggregate the bearers of a given surname based matisation (the grouping of related surnames). on their population history. For example in the In addition, this abundance of data offers the USA the surname “Lee” is relatively common chance to address some of the challenges associ- and classified as “British” but many of its bearers ated with geographically- referenced data. on the Pacific Coast are of Asian origin. Given that more recent migrant groups have different geographic distributions from early settlers (in Refinements to the geographic the case of the USA) or native groups, a spa- analysis of surnames tial clustering approach, such as that taken by Novotny and Cheshire (2012) or Degioanni & Two broad approaches to the geographic Darlu (2001) would identify these and could analysis of surnames were discussed above. As yet, inform the CEL classification. One could there- there has been no large-scale attempt to feed the fore imagine the inclusion of “recent” or “estab- outputs from the individual surname studies into lished” migrant groups in the list of categories. those seeking broad regional geographies. Such a A further application of the kinds of geo- combination could be of significant value in the graphic surname analyses described in this exclusion of polyphyletic surnames, for example. review is to sample design because they reveal, Even if the capability exists to do so, the analy- and therefore help to exclude (or include), sis of a complete population register is not always members of a population who live in areas sub- desirable. If surnames can be selected for study ject to large-scale changes. Such areas are most based on a number of relevant attributes this may likely to be urban and have, in the past, been lead to improved results in such contexts. The excluded based on arbitrary criteria – typically a ability to process individual surnames automati- distance threshold – in studies seeking to iden- cally, in the way that Boattini et al. (2012), for tify the genetic characteristics of a native popu- example, have demonstrated with SOMs, offers lation (see Winney et al., 2011). Urban areas, the chance to generate such attributes en masse however, are not uniform in either their shape and will enable an informed sampling procedure or exposure to the currents of migration over based on a scalable classification methodology. time. It therefore follows that a more sophisti- The exclusion of polyphyletic surnames, for cated approach to their identification, at least in example, is commonplace in genetic studies but terms of population structure, stands to improve not in all studies of surname regionalisation. This existing practices. Surnames offer a means to do may largely be due to the labour intensive way in this so long as the geographic units used in their which polyphyletic surnames have been identi- analysis are sufficiently small to enable urban fied through the study of each surname in turn. areas to be meaningfully subdivided. As Figure 6 To create a representative regional geography, shows, this was the case in Longley et al. (2011) many thousands (if not millions) of surnames who were able to identify areas of similarity should be used, rendering manual approaches between urban areas due to their diverse popula- impractical. In some cases, therefore, the impact tions comprising a large number of international of polyphyly is intentionally overlooked. migrants. Such areas can be easily excluded Surname databases can also be further refined from further iterations of a sampling or region- for specific applications based on a combination alization strategy if necessary in to reduce the of the approaches outlined in this review. The impacts of such groups. J. Cheshire 109

Fig. 5 - Maps showing the spatial distributions of each of the 14 cluster allocations (A) and their respective robustness values (B). Higher robustness values represent a better result. On the left hand plot each cluster has been assigned a unique pattern. A full-color version can be found at spa- tialanalysis.co.uk/surnames. Taken from Cheshire et al., 2011, p.586.

Integrating spatial concepts partitioning of the data before the analysis may therefore mask culturally distinct populations, Given the relative lack of interest in surnames for example if they comprise relatively small from geographers (Zelinsky, 1997) few studies towns grouped together, whilst increasing the have given serious consideration to a fundamen- impact of relatively uniform areas if they are par- tal issue with spatial data - the Modifiable Areal titioned into a large number of geographic units. Unit Problem (MAUP) (see Openshaw, 1984). This is illustrated in Figure 7. The MAUP refers to the way in which the input Address-level surname data (especially from geographic units of any analysis can impact the sources such as digital telephone directories and outcome of that analysis, especially in relation government databases) are now easily obtainable so to the strength of correlation between variables. there is a reduced need to aggregate surname counts The significance of the geographic units used to large administrative areas. Whilst in many cases in surname studies cannot be overstated for high levels of granularity are unnecessary (and may this reason. In many cases the input geographic in fact be detrimental due to small numbers), a units are utilised, often because they are the only range of spatial units should be explored in any geographic referencing information available or analysis to assess their impact on the results. This have been created for administrative convenience was the approach taken by both Longley et al. (such as municipalities and government districts) (2011) in their regionalisation of Great Britain and and are therefore not subject to the same cultural also Novotny & Cheshire (2012) in their analysis influences that surnames have been. The spatial of Czech surnames. In both studies comparisons

www.isita-org.com 110 Surnames as geographic data

Fig. 6 - Maps taken from Longley et al., 2011 (p. 512) illustrating the similarity in surname composi- tion between 9 urban areas of Great Britain. were made between fine scale and broader scale used in the study of population data; something spatial units. Such comparisons serve to remove that surnames are well-placed to do with their clear speculation about the impact of geographic units links to past populations. Given the relative ease on the results and can confirm the presence of with which such comparisons can now be made, distinct surname regions resulting from legitimate one could imagine such work as a routine precursor transitions in population. to any in-depth large-scale analysis of population In addition, a methodology for assessing the data, provided the data are available. cultural significance of administrative geographies based on surnames has been proposed by Cheshire et al. (2013) using a comprehensive address level Surnames as geodemographic database. They compare Japan’s surname regions to indicators its current administrative geography and also its sys- tem of historic prefectures. Such work gives insights Beyond the technicalities of spatial units, into the cultural significance of the geographic units geography and its sub-disciplines can offer J. Cheshire 111

further insights into population structure. Geodemographics, for example, is the analysis of people by where they live (Sleight, 1997) and is premised on the idea that the location of an individual, or group of similar individuals, pro- vides useful demographic information ( et al., 2005). Geodemographics in this broad sense is relevant for a range of reasons. For example, the work of Winney et al. (2011) can be thought of as producing a geodemographic classification based on what is revealed by the spatial distri- butions of people’s surnames. This is one of a number of studies to explore how surnames can be used, for example, to improve sample designs in population genetics through targeting specific areas of surnames. This work, in principle, is lit- tle different from a retailer or health care pro- vider that uses geodemographics to target spe- cific population groups based on socio-economic characteristics. It can also be used as a basis to Fig. 7 - A simple demonstration of the impact of attach relevant genetic attributes to conventional differing administrative units on the geographic social surveys. partitioning of data. If the dashed boundaries Previous surname research has sought to rep- are used, all settlements (A, B, C and D) are treated separately in any pairwise comparisons resent similarities or differences between con- or other analysis, whilst the solid boundaries tiguous regions, largely in the context of clinal group A with B and C with D. The solid bounda- (or gradual) changes in the distributions punctu- ries may therefore smooth any variation if, for example, C and D were settlements with differ- ated by the occasional abrupt transition. There ent naming conventions whilst analysis using has been relatively little interest in the apparently the dashed boundaries will detect it. close linkages, in terms of surname composi- tions, between regions that are geographically distant. Only one study, Longley et al. (2007), has sought to establish any causality and impacts reflects the large number of Scottish migrants of these linkages. Such regions are often quite and their descendants who live in the town. small and therefore likely to have been missed Corby’s link to Scotland would be overlooked if by the relatively coarse granularity of previous there is the presumption that geographic prox- research. The idea that similarity/ difference is imity is the only driver of social similarity. more than the product of geographical distance The link between surnames and demographic has long been recognised in geodemographics characteristics more broadly is worthy of further and represents important exceptions to the idea research. Some, such as Clark (2013), believe that populations in close geographical proximity they can be a measure of social mobility in spe- are likely to be more similar than those further cific cases, but on a more general level surnames apart (based on the often quoted “Tobler’s First can be usefully applied in a range of contexts, Law of Geography” (Tobler, 1970). In Longley not least healthcare. Petersen et al. (2011) were, et al. (2011), for example, the town of Corby in for example, able to determine the utilisation central England appears to have more in com- of healthcare services by different ethnic groups mon with Scotland in terms of its surname com- as identified by their surnames. The authors position than with its neighbouring areas. This concede that surname classification may not

www.isita-org.com 112 Surnames as geographic data

differentiate between recent and settled migrants user-contributed surnames and other pieces of (or those from later generations) - such a dis- family history information (see Otterstrom 2008 tinction may be important if there are different for more information). behaviours between these groups - but the extent The explosion in online social networking to which this is the case is unclear (Petersen et al. websites creates a further means to discover gene- 2011). Within the healthcare domain there are alogical connections between distant relatives many studies that use surname lists to identify with similar geographic histories (Otterstrom & patients who are at risk from particular illnesses Bunker, 2012; Timothy & Guelke, 2008). Whilst more prevalent in certain ethnic groups (see for not all the data generated from websites such as example Nasseri, 2007; Fiscella & Fremont, Facebook and Twitter are easily accessible, com- 2006). The approaches taken in these studies panion applications are able to, with the users are more simplistic than those described above consent, collect data. In addition, many com- and so healthcare presents a research domain that panies are keen to gather as many demographic could stand to benefit a great deal from the geo- indicators about their users as possible and sur- graphic analysis of surnames. names represent an enticing way to gauge ethnic Conceiving of surnames as geodemographic and cultural groupings. et al. (2010) have, indicators facilitates a link to a broader range for example, derived the ethnic composition of of research applications. Healthcare, discussed Facebook’s users using a simple name-based eth- above, is one of just one domain that could ben- nicity classification, whilst Ambekar et al. (2009) efit from the wider utilization of surnames and achieved a basic ethnicity classification from their associated metrics. The final two substan- Wikipedia data. It is only a matter of time before tive sections of this paper touch on a few more similar methodologies are applied to other social research areas where surname data could offer media platforms, such as Twitter. significant benefits. Surnames can also be used to gauge the repre- sentativeness of social media data of the popula- tion at large. One could, for example, perform Expanding applications simple comparisons between the distribution of surnames in a particular area from social media Technological innovations are facilitating and the distributions from an official population access to new datasets that would benefit from register to reveal selectivity among certain cul- the kinds of analysis honed over the many dec- tural groups. In addition, the linkages between ades of surname research. At the very least, stud- individuals on such sites could serve as a useful ies of surname histories conducted by academics validation of whether those sharing surnames or genealogists should be integrated into large- with similar geographic distributions are more scale databases and used to offer depth to the likely to be in contact. Finally, the creation of an breadth provided by inductive approaches. This online network of relatively alike (in the context is something that commercial com- of surnames at least), individuals would make panies will have and researchers could negoti- the recruitment for any genetic sampling study ate access to. The information carefully collated much more straightforward as the information by enthusiasts may, for example, validate some about future events could be efficiently targeted of the automated approaches discussed above and disseminated. since there is a critical mass of individuals and The ethics of such work are yet to be fully groups keen to improve surname datasets. One considered and likely to be subject to concerns excellent example of the depth that volunteered similar to those leveled at research into popula- information can offer is the Church of Jesus tion relatedness more generally. Concerns have Christ of Latter Day Saints’ FamilySearch (fami- been raised about the use of surnames and lysearch.org) website that hosts over 800 million population genetics research for DNA databases J. Cheshire 113

compiled by governments and their security ser- and a synthesis of current advances to tease out vices. It would be unhelpful if the automated different population episodes across space and analysis of surnames from new forms of interac- time. Despite Darlu et al.’s (2012) call for an tion, such as social media platforms, were to be interdisciplinary approach, these appear quite perceived in this way as some kind of “biopo- specific and can still be confined to the domains litical informational gathering” (Nash, 2005, p. of anthropology and population genetics. The 457). In practice, however, it appears that the further research proposed here therefore seeks enthusiasm for discovering personal ancestry to augment those listed above through three key outweighs concerns about this practice (and the themes: further development of new methodo- tools used to do it) so the importance of social logical approaches; more attention to non-West- media in this domain is likely to grow. ern countries and those with less stable popula- The innovative classification of surnames has tion histories; an embracing and expanding of also been successfully applied to a number of new applications for surname data. more established datasets. In practice it is often With regard to the first, many of the meth- not possible with conventional questionnaires odological approaches applied to surname data to get a highly detailed sense of the ethnic/cul- are concerned with the identification of spatial tural composition of an area (Aspinall, 2009). patterns or pairwise comparisons between geo- The development of automated methods to dis- graphic areas. Such concerns are not unique to cern surname origins offers the potential to cre- surname datasets, with fields such as semantics, ate highly disaggregate classifications of names , economics and regional science having (and therefore their bearers) without the need much to offer. Novotny & Cheshire (2013) for for questionnaires or other contextual data. The example used the Dice and Jaccard coefficients results will not have the same level of accuracy, from economics to build a surname network, but they offer a marked improvement in repre- and Longley et al. (2011) used a number of senting smaller population groups. These can also innovative visualisation techniques, such as word be compared to, and validated against, more in- clouds, to demonstrate the surname composi- depth social surveys to get a sense of the accuracy tions of particular regions. In addition, method- of any name-based classification and to determine ologies exist to create optimal configurations of any systematic errors in the methodology. As has geographic units at multiple scales (see for exam- been discussed above, the inclusion of spatial ple Openshaw & Rao, 1995) and these may be information about the distribution of surnames of use if surname analysis is to move beyond the at the sub-national level may serve to improve often rigidly defined administrative units it cur- name-based ethnicity classification further and rently relies on. Sensitivity analysis of the results address some of its limitations related to cultural with different geographic units and explorations groups sharing similar naming preferences. into their cultural significance (see Cheshire et al., 2013) will offer interesting insights and a fresh perspective on the results of much of the Future trends analysis conducted to date. Arguably, the big- gest limitation of the growing number of large In the concluding part of their review Darlu datasets available to surname research is their et al. (2012) identify six future research trends relatively low signal to noise ratio in comparison that align well with what has been discussed here. to carefully compiled (but much less extensive) The six areas include: the determination of the datasets. The integration of multiple techniques most probable geographical, temporal and cul- (as discussed in this review) offers the chance tural origin of surnames; the identification of to address this through the iterative refinement common surname lineages; the identification of of large-scale databases utilising informed data barriers to cultural and population interaction; mining and surname classification.

www.isita-org.com 114 Surnames as geographic data

The second theme - to move away from through the generations…and in well-defined Western countries and those with stable popu- communities, than in the accumulation of sur- lation histories - seeks to widen the appeal of names on a wider geographical scale” (p. 172). surnames as general demographic indicators. It is hoped that this review demonstrates that Western countries provide the focus for much whilst this may be a dominant direction for sur- of the research outlined in this review and so names research it need not be the only one. The the move east or towards developing countries final theme - an encouragement to move beyond offers the potential for gaining fresh insights traditional applications - reflects this. Interest in into the population structure of such countries. surname research is set to grow beyond what can It is acknowledged that there has been surname be seen as its current core disciplines. There is research in many of these regions, not least in much to offer fields such as social media analysis, Japan (see Cheshire et al., 2013; Takemitsu, 1998) volunteered surname data and social surveys. In and several countries in South America (see for addition, the challenges surrounding “big data” example Barrai et al., 2012; Rodriguez-Larralde et are something many surname researchers are well al., 2011; Dipierri et al., 2011), but the potential placed to address, being experienced in exten- remains to gain the depth of insight achieved in sive data mining and classification methodolo- the literature pertaining to European surnames. gies. In short, there are benefits to learning new In addition, there have been few studies on the approaches from a range of disciplines, but this significance of surnames in places where there should not be a one-sided relationship; many of has been substantial population change either the lessons learnt from surname analysis can offer through forced migration, such as in the Czech insights to other disciplines. Republic (Novotny & Cheshire, 2012), or though the gradual ebb and flow of population change. In parts of the world with large migrant popula- Conclusion tions the focus tends to be on excluding them in that the native population can be studied This review paper has sought to demonstrate (see for example & Usaqúen, 2012). The the importance of the analysis of surnames as reasons for this stem from the interest in surnames a geographic dataset. It has discussed a range as indicators of genetic structure, but as surname of studies that, through their surname analysis, databases become more widely available they can have generated new insights into population be used as more general social indicators. Longley structure. It is hoped that the wealth of research et al. (2007), for example, used surnames to iden- presented here inspires further work in order that tify migrants who moved from one part of Great surnames become a more widely used source of Britain (Cornwall) to another (Middlesbrough) data both within the geography community and and numerous other studies have identified areas beyond. There is potential for integration of the with relatively large migrant communities. From two key research strands that focus on individual a social science perspective, many of these areas surname distributions and regional characteris- are the most dynamic and interesting. As with tics respectively. It is also clear that, aside from the healthcare examples discussed above, surname methodological developments, the breadth and classifications can provide additional attributes to depth of surname research is set to increase and augment standard social datasets and thus offer a become more interdisciplinary as new popu- fresh layer of insight. It is up to existing surname lation datasets from the likes of social media researchers to demonstrate this and promote the become available and require the insights that wider use of surnames in the social sciences. surnames provide. Surname research is therefore Darlu et al. (2012) also see the future of sur- in a very strong position with many new avenues name studies as lying more in the “rich infor- to pursue, not least by geographers and anthro- mation provided by the set of data preserved pologists, over the coming years. J. Cheshire 115

Info on the web Carrieri A. & Scapoli C. 2012. Surnames in Chile: a study of the population of Chile through http://worldnames.publicprofiler.org/ isonymy. Am. J. Phys. Anthropol., 147: 380-388. Boattini A., Lisa A., Fiorani O., Zei G., Pettener Map the distribution of a surname across D. & Manni F. 2012. General method to un- approximately 300 million people in 26 ravel ancient populaton structures through sur- countries of the world. names, final validation on Italian data. Hum. Biol., 84: 235-270. http://gbnames.publicprofiler.org/ Bowden G.R., Balaresque P., King T.E., Map both historic (1881) and contemporary Z., Lee A.C., Pergl-Wilson G., Hurley E., surname distributions for Great Britain. S.J., Waite P., Jesch J., Jones A.L., M.G., Harding S.E. & Jobling M.A. https://familysearch.org 2008 Excavating past population structures by One of the largest repositories of surname data surname-based sampling: the genetic leagcy of available, this website is a useful starting point the Vikings in Northwest England. Mol. Biol. for detailed surname studies. Evol., 25: 301-309. Branco C. & Mota-Vieira L. 2004. Population http://www.peopleofthebritishisles.org/ structure of São Miguel Island, Azores: a sur- This project is one of the largest and most name study. Hum. Biol., 75: 929-939. detailed studies into the genetics’ of the Chang J., Rosenn I., Backstrom L. & Marlow C. population of the . 2010. ePluribus: ethnicity on social networks. Proceedings of the Fourth International AAAI Conference of Weblogs and Social Media, pp. 18-25. References Chareille P. & Darlu P. 2010. Anthroponymie et migration: quelques outlies d’analyse et leur Alonso L. & Usaqúen W. 2012. Y-Chromosome application à l’étude des déplacements dans and surname analysis of the native islanders of les domaines des Saint-Germain-des-Prés au San Andrés and Providencia (Colombia). Homo IXe siécle. In Bourin M. & Martinez Sopena (in press). P. (eds): Anthroponymie et migrations dans Ambekar A., Ward C., Mohammed J., Male S. & la Chrétienté Médiévale, pp. 41-73. Casa de Skiena S. 2009. Name ethnicity classification Velàzquez Madrid, Spain (Collection de la Casa from open sources. Proceedings of the 15th ACM de Velàzquez 116). SIGKDD International Conference on Cheshire J. A., Longley P. A., Yano K. & Nakaya Discovery and Data Mining, pp. 49-58. T. 2013. Japanese surname regions. Papers in Aspinall P. J. 2009. The future of ethnicity clas- Regional Science, Doi: 10.111/pirs.12002. sifications. Journal of Economic and Migration Cheshire, J. A. and Longley, P. 2012. Identifying Studies, 35: 1417-1435. spatial concentrations of surnames. International Barrai I., Rodriguez-Larralde A., Mamolini E., Journal of GIS. 26: 309-325. Manni F. & Scapoli C. 2000. Elements of the Cheshire J. A., Mateos P. & Longley P.A. 2011. surname structure of Austria. Ann. Hum. Biol., Delineating Europe’s cultural regions: popula- 27: 607-622. tion structure and surname clustering. Hum. Barrai, I. Rodriguez-Larralde, A. Manni, F. Biol., 83: 573-598. Ruggerio, V. Tartari, D. Scapoli, C. 2004. Clark G. 2013. Surnames and social mobil- Isolation by language and distance in Belgium. ity. American Economic Association Annual Ann. Hum. Genet., 68, 1: 1-16. Meeting Papers. http://www.aeaweb.org/ Barrai I., Rodriguez-Larralde A., Dipierri J., Alfaro aea/2013conference/program/meetingpa- E., Acevedo N., Mamolini E., Sandri M., pers.php (accessed April 2013).

www.isita-org.com 116 Surnames as geographic data

Colantonio S., Lasker G., Kaplan B. & Fuster V. spatial analysis of surnames. Geoforum, 42: 2003. Use of surname models in human popu- 506-516. lation biology: a review of recent developments. Longley P., Webber R. & Lloyd D. 2007. The Hum. Biol., 75: 785-787. quantitative analysis of family names: historic Darlu P., Bloothooft G., Boattini A. & Brouwer L. migration and the present day neighborhood 2012. The family name as socio-cultural feature structure of Middlesbrough, United Kingdom. and genetic metaphor: from concepts to meth- Ann. Assoc. Am. Geogr., 97: 31-48. ods. Hum. Biol., 84: 169-214. Manni F. & Barrai. I. 2001. Genetic structures Darlu P., Brunet G. & Barbero D. 2011. Spatial and linguistic boundaries in Italy: a microre- and temporal analyses of surname distributions gional approach. Hum. Biol., 73: 335-347. to estimate mobility and changes in historical Manni F., Heeringa W., Toupance B. & Nerbonne demography: the example of Savoy (France) J. 2008. Do surname differences mirror dialect from the eighteenth to the twentieth century. In variation? Hum. Biol., 80: 41-64. Gutmann M.P., Deane G.D., Merchant E.R. & Manni F., Guerard E. & Heyer, E. 2004. Sylvester K. (eds): Navigating Time and Space Geographic patterns of (genetic, morphologic, in Population Studies, International Studies in linguistic) variation: how barriers can be de- Population 9. Springer, New York. tected by using Monmonier’s algorithm. Hum. Degioanni A. & Darlu P. 2001. A bayesian approach Biol., 76: 173-190. to infer geographical origins of migrants through Manni F., Heeringa W. & Nerbonne J. 2006. To surnames. Ann. Hum. Biol., 28: 537-545. what extent are surnames words? Comparing Diperri J., Rodriguez-Larrralde A., Alfaro E., geographic patterns of surname and dialect vari- Scapoli C., Mamolini E., Salvatorelli G., ation in the Netherlands. Literary and Linguistic Caramori G., De Lorenzi S., Sandri M., Computing, 21: 507-528. Carrieri A. & Barrai I. 2011. A study of the Mascie-Taylor C. & Lasker G. 1990. The distribu- population of Paraguay through isonymy. Ann. tion of surnames in England and Wales: a model Hum. Genet., 75: 678-687. for genetic distribution. Man, 25: 521-530. Fox W. & Lasker G. 1983. The distribution of Mateos P. 2007. A review of name-based ethnicity surname frequencies. Int. Stat. Rev., 51: 81-87. classification methods and their potential in pop- Fiscella K. & Fremont A. 2006. Use of geocoding ulation studies. Popul. Space Place, 13: 243-273. and surname analysis to estimate race and eth- Mateos P., Longley P.A. & O’Sullivan D. 2011. nicity. Health Services Research, 41: 1482-1500. Ethnicity and population structure in person- Guppy H. 1890. Homes of family names in Britain. al naming networks. PLoSONE, 6: e22943. Harrison and , London. doi:10.1371/journal.pone.0022943 Harris R., Sleight P. & Webber R. 2005. Monti S., Tamayo P., Mesirov J. & Golub T. 2003. Geodemographics, GIS and neighbourhood target- Consensus clustering: a resampling based meth- ing. Wiley and Sons, Chichester. od for class discovery and visualisation of gene Kaplan B. & Lasker G. 1983. The present distri- expression microarray data. Machine Learning, bution of some English surnames derived from 52: 91-118. place names. Hum. Biol., 55: 243-250. Nash C. 2005. Geographies of relatedness. Trans. Lasker G. 1985. Surnames and genetic structure. Inst. Br. Geogr., 30: 449-462 Cambridge University Press, Cambridge. Nasseri K. 2007. Construction and validation of Lasker G. 1999. The hierarchical structure of an a list of common Middle Easter surnames for urban town, Kidlington, Oxfordshire examined epidemiological research. Cancer Detect. Prev., by the coefficient of relationship by isonymy. J. 31: 424-429 Biosoc. Sci., 31: 279-284. Novotny J. & Cheshire J. A. The surname Longley P., Cheshire J. & Mateos P. 2011. Creating space of the Czech Republic: examining a regional geography of Britain through the population structure by network analysis of J. Cheshire 117

spatial co-occurrence of surnames. PloSONE, 7: subcontinental populations through isonymy. e48568. Doi: 10.1371/ journal.pone.0048568 Theor. Popul. Biol., 71: 37-48 Openshaw S. 1984. The modifiable areal unit Sleight P. 1997. Targeting customers: how to use geo- problem. CATMOG 38, GeoBooks, Norwich. deomographic and lifestyle data in your business. Openshaw S. & Rao L. 1995. Algorithms for NTC Publications, Henley on Thames. reengineering 1991 census geography. Environ. Sokal R., Harding R., Lasker G. & Mascie-Taylor Plan. A, 27:425-446. C. 1992. A Spatial Analysis of 100 surnames in Otterstrom S. 2008. Genealogy as religious ritual: England and Wales. Ann. Hum. Biol., 19: 445-476. the doctrine and practice of family history in Takemitsu M. 1998. Surnames and Japanese. the Church of Jesus Christ of Latter Day Saints. Bungeishunju, Tokyo. In Timothy G. & Guelke J. (eds): Geography Timothy D. J. & Guelke J. K. (eds) 2008. and genealogy: locating personal pasts, pp. 137- Geography and genealogy: locating personal pasts. 151. Burlington VT, Ashgate. Ashgate, Aldershort, UK. Otterstrom S. & Bunker, B 2012. Genealogy, Tobler W. 1970. A computer movie simulat- migration, and the intertwined geographies of ing urban growth in the Detroit region. Econ. personal pasts. Ann. Assoc. Am. Geogr., Doi: Geogr., 46: 234-241. 10.1080/00045608.2012.700607. Tucker D. K. 2001. Distribution of forenames, Petersen J., Longley P., Gibin M., Mateos P. & surnames and forename-surname pairs in the Atkinson P. 2011. Names-based classification United States. Names, 49: 69-96. of accident and emergency department users. Tucker D. K. 2005. The cultural-ethnic-language Health Place, 17: 1162-1169. group technique as used in the Dictionary of Rodriguez-Larralde A., Scapoli C., Beretta American Family Names (DAFN). Onomastica M., Nesti C., Mamolini E. & Barrai I. Canadiana, 87: 71-84. 1998a. Isonymy and the genetic structure of Webber R. & Longley P. 2003. Geodemographic Switzerland. II. Isolation by distance. Ann. Analysis of Similarity and Proximity: Their Roles in Hum. Biol., 6: 533-540. the Understanding of the Geography of Need. In Rodriguez-Larralde A., Barrai I., Nesti C., Longley P. & Batty M. (eds): Advanced Spatial Analysis: Mamolini E. & Scapoli C. 1998b. Isonymy and The CASA Book of GIS 233. ESRI Press, Redlands. isolation by distance in Germany. Hum. Biol., Winney B., Boumertit A., Day T., Davison D., Echeta 70: 1041-1056. C., Evseeva I., Hutnik K., Leslie S., Nicodemus Rodriguez-Larralde A., Gonzales-Martin A., K., Royrvik E. C., Tonks S., Yang ., Cheshire Scapoli C. & Barrai I. 2003. The names of J., Longley P., Mateos P., Groom A., Relton Spain: a study of the isonymy structure of C., Bishop D. T., Black K., Northwood E., Parkinson Spain. Am. J. Phys. Anthropol., 121: 280-292. L., Frayling T. M., Steele A., Sampson J. R., King Rodriguez-Larralde A., Dipierri J., Gomez E.A., T., Dixon R., Middleton D., Jennings B., Bowden Scapoli C., Mamolini E., Salvatorelli G., R., Donnelly P. & Bodmer W. 2012. People of the De Lorenzi S., Carrieri A. & Barrai I 2011. British Isles: preliminary analysis of genotypes and Surnames in Bolivia: a study of the popula- surnames in a UK control population. Eur. J. Hum. tion of Bolivia through isonymy. Am. J. Phys. Genet., 20: 203–210Valetas M. F. 2001. The sur- Anthropol., 144: 177-184. name of married women in the . Scapoli C., Goebl H., Mamolini E., Rodriguez- Bulletin Mensuel D’Inforamation De L’Intstitut Larralde A. & Barrai I. 2005. Surnames and National D’Etudes Demographiques, 367. dialects in France: population structure and Zelinsky W. 1997. Along the frontiers of name cultural evolution. J. Theor. Biol., 237: 75-86. geography. Prof. Geogr., 49: 465-466 Scapoli C., Mamolini E., Carrieri A., Rodriguez- Larralde A. & Barrai I. 2007. Surnames in : a comparison of the Editor, Giovanni Destro Bisol

www.isita-org.com