<<

Downloaded by guest on October 1, 2021 www.pnas.org/cgi/doi/10.1073/pnas.1703513114 systems academic effect. in was law after the antinepotism hired researchers an by the when supported for 2010, results is different effects obtain we other tests that our to fact that opposed claim as The in trend. nepotism nepotism declining detect of a with system, the academic Italian detect the detect we Finally, can . we By in , region. couples married native their and in maiden work well-mixed, contrasting to geographically tend is academics system Italian academic we whereas insti- US randomizations, public the simple top Through that States. show at United working the Italy, those in in tutions academics and all France, three of from analyze names researchers last we the Here, faculty containing posts. datasets with academic large nepotism, for of relatives suspicion their Ital- the of hiring raised distribution has academics last migra- ian the human relat- analyzing and recently, genetic theory More for 2017) neutral tions. proxy 1, of March as studies review used pioneering for in been (received edness 2017 have 1, names June last approved biology, and CA, In Berkeley, , of University Wachter, W. Kenneth by Edited a Grilli Jacopo systems and academic imbalance, across gender nepotism mobility, of analysis name Last fti ok h itiuino atnmsi tla academics Italian spirit in the names to last of Closer to distribution (6). shown the been success work, has this occupational of names of last predictive and associ- be first the associated ethnic-specific recently, of been of More advent have ation (5). the haplotypes With names Y-chromosome (4). last with migrations methods, that human molecular advantage of allow- modern study 3), the (2, the with alleles for data, neutral ing well-approximate (large) pro- would name) of names same source last the cheap with a (1). people vided of parents) study (occurrences his the (2), the isonymies (like genetics” of estimate population cousins man to “poor’s first the England dubbed by Soon in who names Charles), of last of prevalence of (son distribution Darwin the George used with starting biology, aca- in for relatives their hire professors posts?). (do demic fields?), nepotism certain gen- even in raised?), and underrepresented and women born (are were researchers imbalance they (are der where used immigration region be the and can geo- in mobility and employed and in obtain a study, patterns to at easy unveil of working are to field data professors rank, These of their location. names graphic with of along list institution a given data: of form simple systems. between vary also and resources within and both duties, widely freedom Salaries, more faculty. their have choosing reg- universities in special American to contrast, subject servants, In are civil ulations. procedures hiring are their many professors therefore, In example, and practices. and for goals countries, their European in distinct world quite the however, the around are, and systems academic departments ranks), into (for academic scholars similarities ubiquitous of many organization the the Despite example, globe. the across researchers S and eateto clg vlto,Uiest fCiao hcg,I 60637; IL Chicago, Chicago, of University Evolution, & Ecology of Department h s fls ae safr fdt a oghistory long a has data of form a as names last of use The very a using systems academic in differences examine we Here, . . . c ihshlrypbiain n ofrne connecting conferences and endeavor, worldwide publications a been scholarly has science with inception, its ince otwsenIsiueo ope ytm,Nrhetr nvriy vntn L60208 IL Evanston, University, Northwestern Systems, Complex on Institute Northwestern Sttrs rsianmn,nmn uatenemus. nuda nomina nomine, pristina rosa [S]tat a,1 | n tfn Allesina Stefano and isonomy | edrimbalance gender met Eco, Umberto a,b,c,1,2 | nepotism h aeo h Rose the of Name The b 1073/pnas.1703513114/-/DCSupplemental hsatcecnan uprigifrainoln at online information supporting contains article This publicly are results 2 the generate to needed 1 code the at GitHub and on available data The deposition: Data Submission. Direct PNAS a is article This interest. of conflict no paper. declare the authors wrote The S.A. and data; analyzed S.A. and J.G. specific preponderantly target eth- to given tend of backgrounds scientific researchers nic/cultural specific and of immigration typical the that which are in fields—meaning names in Moreover, last immigration. city certain of States, the influence United even strong a or well- with geographically region is mixed, system the American the of work—whereas typical they are last have that researchers level—many names local the at mostly researchers antinepo- of of presence effects the the Italy. detect probe in to and legislation tism us France countries. allow in between data couples and (Italy) academic and the time of within in features both names Special last variability research- of geographic evolution at the the datasets working track These to academics working States. us United and allow the currently years in France, institutions researchers different public in intensive 2015), four CNRS in and the academics at 2010, Italian 2005, all (2000, size: and quality is longevity whether test to (11). (9) even inbreeding mobility to and social a related (10)] of in disparities studies used health in be example, could or [for approach contexts same of the Although academia, variety system. on academic is randomizations each focus of our specific aspects introducing different probe by that and comparison nepo- academic tional of for relatives suspicion recruit the professors raising which positions. regions, in 8): and (7, hires, hires names tistic fields last nepotistic of certain of scarcity significant hypothesis in a the highlighted have test studies to these used been has uhrcnrbtos ..adSA eindrsac;JG n ..promdresearch; performed S.A. and J.G. research; designed S.A. and J.G. contributions: Author optto nttt,Uiest fCiao hcg,I 60637; IL Chicago, Chicago, of University Institute, Computation owocrepnec hudb drse.Eal [email protected]. Email: addressed. be should correspondence who work. To this to equally contributed S.A. and J.G. ht nIay h lgeo eoimpoesr iigtheir hiring nepotism—professors declining. of slowly relatives—is plague the show Italy, we Finally, in sciences. that, the in imbalance highlight gender and of US States patterns United the effect the in the whereas immigration detect pro- region, field-specific can of system, native We well-mixed. their Italian geographically in the is system work in to that, tend show fessors we and States, France, Italy, in United academics the on data around collecting systems By world. academic one the in distributions, patterns name distinctive However, last highlight data. analyzing can by of source that, meager show list we a a here like sequencing, seem high-throughput might and names Data of Big of age the In Significance eut hwta h tla cdmcsse ed oattract to tends system academic Italian the that show Results unprecedented of datasets three in names last analyze We interna- an presenting by results these on expand we Here, github.com/StefanoAllesina/namepairs. . www.pnas.org/lookup/suppl/doi:10. NSEryEdition Early PNAS | f6 of 1

STATISTICS SOCIAL SCIENCES areas of research. Using the distribution of first names, we show In the second randomization (by city), we shuffle the last strong gender imbalance in science, technology, engineering, and names of academics within each city. That is, for each depart- mathematics (STEM) disciplines in all systems. Finally, we show ment, we assign researchers at random from those working in that nepotism is present (but declining) in Italy. the same city. As such, names that are common at the city level but rare nationwide (reflecting, for example, geographic, linguis- Results tic, or cultural barriers) will be sampled with high probability, Data. We collected four datasets for the Italian academic sys- increasing the expected number of IPs. tem, including the names of all professors holding permanent (or In the third randomization (by field), last names are shuffled since they were introduced in 2010, temporary “tenure-track”) within field. This procedure allows us to test the existence of positions along with their institution, academic field (area; 14 field-specific names (for instance, as a consequence of immigra- coarse-grained fields), rank (which we coarse-grained into assis- tion targeting a specific field). For example, a recent National tant, associate, and full professor), and gender. We enriched the Science Foundation survey (12) found that, of 5.2 million immi- data by adding a city and region to each record. The number of grant scientists and engineers in the United States, 57% were professors is 52,004 for year 2000, 60,288 for 2005, 58,692 for born in Asia and that immigrants targeted disproportionately 2010, and 54,102 for 2015. computer science, mathematics, and engineering. For France, we collected the names, unit, and region for all Randomizing by nation, we find that, in all systems, at least a of the researchers affiliated with CNRS (Chercheurs CNRS) or few sectors (Fig. 1) and regions (Fig. 2) have a significant excess working at a mixed CNRS–university research unit (Chercheurs of IPs (with stronger deviations in Italy and France). non-CNRS). Each unit is associated with a scientific field and This excess of IPs could be caused by region-specific distri- a location. Whenever available, we stored the self-reported butions of last names, in which case the difference between maiden names. The database contains 44,860 researchers. local and national distributions would drive the results. Ran- For the United States, we collected from state records the domizing by city, we observe a large drop in the ratio between names of professors at selected R1 institutions (research univer- observed and expected IPs in Italy and France (i.e., blue vs. red sities—highest research activity according to the Carnegie Clas- bars in Fig. 1), meaning that, in these systems, the excess of sification of Institutions of Higher Education). We collected data IPs for many fields and regions is likely due to the geographic on 38 institutions, privileging the states in which more than one R1 operate. Because the data do not contain a disciplinary field, we associated professors with a discipline using the Scopus Italy, 2015 database. We were able to successfully match 36,308 professors in this way. 4 Details on data collection and processing are reported in SI 3 Appendix. The data are publicly available. 2 1 Isonymous Pairs. Each researcher is associated with an institution 0 Observed/Expected Agr Bio and field. Two researchers with the same last name working at Geo Law Med Ped Soc Econ Hum Math Phys Chem the same institution and in the same field form an isonymous CivEng IndEng pair (IP). As a shorthand, we define the “department” d as the France CNRS using maiden names, 2016 set of all researchers working in a certain field at a given institu- 2 tion. For each last name i, nid measures how many researchers with that name work in department d. The number of IPs in a 1 P nid  given department is pd = i 2 . For example, if in depart- ment d, we find three researchers whose last name is Hopper 0 fo ys Observed/Expected and four called Pollock, we have that pd = 3 + 6 = 9 IPs. This Cell Eng Env Geo In Hum Math Ph Chem Genet Neuro measure can be interpreted as the number of edges connect- HE Phys ing researchers with the same name in a network where the US public research institutions, 2016 nodes are the researchers working in the same department (SI 3 Appendix, Fig. S1), and it has excellent statistical properties com- pared with other quantities (SI Appendix, Fig. S2). 2 Given that each department belongs to a geographic region 1 and a discipline, we can sum the number of IPs by region (pr = P P 0 pd ) or field (pf = pd ). Using randomizations, we d∈r d∈f Observed/Expected Agr Bio hys Geo Med Ped Soc Econ Hum Math P Chem probe whether the observed pr (or pf ) is significantly different IndEng from what we would expect at random. randomization nation city field Three Randomizations. p p For each dataset, we calculate r and f Fig. 1. Ratio between observed and expected numbers of IPs for each for each region and field. We then repeatedly randomize the data academic system and field. Different colors stand for three randomiza- in three different ways, each time recording the values of pr and tions explained in the text; saturated colors mark fields in which the pf for the randomized data. In this way, we obtain an approxi- probability of finding a higher or equal number of IPs by chance is ≤ mate P value measuring the probability of finding a number of 0.05 per number of fields (i.e., significant after applying a Bonferroni cor- IPs greater than or equal to what was observed empirically in a rection for multiple hypothesis testing). Agr, agriculture; Bio, biological sci- given region or field. Importantly, each randomization provides ences; Cell, cell and molecular biology; Chem, chemistry and pharmaceutical us with a different angle to probe the data, unveiling distinctive sciences; CivEng, civil engineering and architecture; Econ, economics and patterns of mobility and immigration. statistics; Eng, engineering; Env, environmental sciences; Genet, genetics; 106 Geo, geology and Earth sciences; HE Phys, high-energy physics; Hum, philol- In the first randomization (by nation), we simply shuffle ogy, literature, archeology; IndEng, industrial, electronic, and electric engi- times the last names in the database, each time tracking pr and neering; Info, information and communications sciences; Law, law; Math, pf . This randomization tells us whether the empirical data con- mathematics and computer science; Med, medical sciences; Neuro, neuro- tain more IPs at the regional or field level than we would expect science; Ped, pedagogy, psychology, history, philosophy; Phys, physics and when resampling all academics at random. astrophysics; Soc, social and political sciences.

2 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1703513114 Grilli and Allesina Downloaded by guest on October 1, 2021 Downloaded by guest on October 1, 2021 h rnhdtst hnvrpoie,w sdself-reported used we provided, whenever dataset, For French names. last the distinct have datasets, our marry—in they Couples. distributed Academic evenly or scarce very mean- either level, fields. national among is the immigration at those that field as ing results by same randomizations the France, less about and yield is Italy is in immigration effect that, the which Note sociology), reversed. in and mathemat- fields medicine, for for (pedagogy, IPs preponderant whereas expected physics, and and observed ics between decline large ratio a the observe the we and in field, chemistry by in Randomizing 20th geology. in the 47th sociology, only humanities, Smith, but in agriculture humanities. soci- names and in three in medicine, top name name the common common among is most most however, 115th 41st the the and only geol- chemistry ology but agriculture, in physics in common common and most most ogy, 3rd the the our is and in mathematics Zhang example, and name For the excess immigration. dataset, the analyzing US for found explanation be the could that IPs suggests States United the in of typical are that names city. last or no state are different a there much one: not national is names the last There- from regional of France. and distribution for Italy regional accounting to the results. compared fore, system, effect significant little academic very yields US has States names the United in the that, in Note state sig- No test (Provence-Alpes-C Alpes). regions France two and in Sicilia), nificantly and significantly Puglia, test test (Campania, regions fields Italy Three three in States. and United France, Italy, the in in in significantly significantly fields test for fields results two whereas significant second The no S3 ). yields Fig. Appendix, randomization is (SI distance distribu- names geographic name last and hypothesis of between tions this similarity Italy, pool the For plotting by national one). confirmed local is the the than (i.e., diverse more names much last of distribution expected than IPs (i.e., of random numbers higher at significantly for stand colors Saturated 2. Fig. rliadAllesina and Grilli h atta hsc n ahmtc il infiatresults significant yield mathematics and physics that fact The h aernoiain si i.1btsmigIsb region. by IPs summing but 1 Fig. in as randomizations same The nation nation nation France CNRSusingmaidennames,2016 US public research institutions,2016 P value nIay oe eptermie aewhen name maiden their keep women Italy, In ≤ 5prnme fregions). of number per 0.05 Italy, 2015 city city city t ’zradRh and d’Azur ote ˆ field field field one- ˆ ncranfils hr r oeculssaigtesm first same S7 and 4 the Fig. (Fig. Appendix, sharing expected than SI couples department same more that, the in are shows working names there names last fields, of certain instead in names first for Imbalance. randomization are Gender and they Names when First ties genuine sharing highlight couples that present. can married meaning results, method for significant highly our accounting enriched produces Thus, name significantly same 3). become the (Fig. regions that finding many IPs analysis, Having and in the name. rerun fields husband’s we all the way, the now this contain be in to not data the assumed does modified is name it would married name, name, the maiden when maiden Magritte); as (e.g., we yield name Duchamp names, husband’s the last listing obtain double-barrel Magritte-Duchamp, to of name maiden case assume the to in “subtract” them name: “force” differ- can husband’s list We women their names. 2,933 married work dataset, and couples the maiden married In ent many department. that same fact the the in detect can we whether the on have couples can- married we effect reported, an results. of not much are customary how names was measure maiden not name that one’s and degrees changing recently their until that advanced United retaining given holding the are However, In (13). women women directly. frequently, names—especially more more maiden compare ones and to Italian more analysis the the States, with for fille) results jeune the de (nom names maiden oy n ilg) nawy h feti iia ota fimmi- immigrants. of of that role to the similar playing is women effect with the but way, gration a peda- In (humanities, biology). represented and more ratio gogy, are the scarce women increases where are and those women physics) in field where and by fields engineering industrial randomizing in (e.g., whereas ratio the effect, lowers little ran- accordingly, considerably has that, city Note 4). by (Fig. propor- field domizing the each vs. plot- for IPs women shown expected of women and as tion observed (15): areas between explanation ratio scientific the simple certain ting very in a underrepresented however, are has, (14), ies n optrsine er,nuocec;Py,pyisadastrophysics. and physics Phys, mathematics arche- neuroscience; Neuro, Math, literature, science; sciences; computer philology, communications and Hum, and information Earth physics; Info, and high-energy ology; geology Phys, Geo, HE genetics; engineer- Genet, sciences; Eng, sciences; sciences; environmental pharmaceutical Env, molec- and and ing; cell chemistry Cell, Chem, married testing. biology; by hypothesis significant ular multiple caused mark for is colors accounted Saturated results once department. results the same in the in difference working large couples The names. maiden of 3. Fig. Observed/Expected ecn oee,eprmn ihteFec aae osee to dataset French the with experiment however, can, We 0 1 2 3 4

Cell h aea nFg.1ad2btuigmridnmsinstead names married using but 2 and 1 Figs. in as same The nation Chem France CNRSusingmarriednames,2016 France CNRSusingmarriednames,2016 Eng randomization .Ti at sdt rtcz rvosstud- previous criticize to used fact, This ). Env

Genet

city Geo ainct field city nation

HE Phys of types same the Repeating

Hum NSEryEdition Early PNAS

Info

field Math

Neuro |

f6 of 3 Phys

STATISTICS SOCIAL SCIENCES Italy (first names), 2015 category is assigned to all of the researchers (e.g., academic rank, gender, hired, or retired). Second, IPs for a certain combination 1.5 of categories are computed (e.g., IPs of the type male–female or 1.0 retired–not retired). Third, the categories are repeatedly scram- 0.5 bled within each department to estimate a P value. 0.0 For example, if the excess number of IPs was caused by nepo-

Observed/Expected aw tism, we would expect many of the pairs of isonyms within the Agr Bio Geo L Med Ped Soc Econ Hum Math Phys Chem CivEng IndEng same department to have different ranks because of the age dif- ference between fathers and children. We thus measure the num- randomization nation city field ber of IPs of the kind full professor↔not full professor and com- pute the probability of observing a higher or equal number of IPs of this kind when shuffling the ranks within departments. In all four Italian datasets, we find a significant excess of IPs of this type (P value <0.01 for all years, computed out of 104 randomizations). Similarly, given that last names are inherited by line of father, in the case of nepotistic hires, we would expect an excess of male↔male IPs (or equivalently, fewer IPs involving a woman). Measuring the number of IPs of this kind, we find that, in all cases, the number of male↔male IPs is higher than expected by chance, with 2 years yielding significant results (2005: P value < 0.01; 2010: P value < 0.03) and two differences that are not sig- Fig. 4. (Upper) The same as in Fig. 1 but using first names instead of last nificant (2000: P value = 0.13; 2015: P value = 0.07). names. (Lower) Ratio between observed and expected number of IPs vs. pro- If nepotistic hires were orchestrated by senior faculty mem- portion of women for all years (national randomization). Some of the fields bers, we would expect retirees to be more likely to share names are highlighted for reference. Saturated colors mark significant results once with the remaining faculty than expected by chance. Take two accounted for multiple hypothesis testing. Agr, agriculture; Bio, biological consecutive periods (for example, 2000 and 2005). Some names sciences; Chem, chemistry and pharmaceutical sciences; CivEng, civil engi- neering and architecture; Econ, economics and statistics; Geo, geology and appear in the 2000 database but do not appear in the 2005 Earth sciences; Hum, philology, literature, archeology; IndEng, industrial, database: these faculty members have retired or exited the sys- electronic, and electric engineering; Law, law; Math, mathematics and com- tem in the meantime—we mark these as “retired.” All of those puter science; Med, medical sciences; Ped, pedagogy, psychology, history, who did not retire are marked as “remained.” Measuring the philosophy; Phys, physics and astrophysics; Soc, social and political sciences. number of IPs of the type retired↔remained and computing the probability of observing a larger or equal number of IPs of this kind when shuffling the labels retired/remained within One caveat on the analysis of first names is that, contrary to each department, we see that, in all years, the number of IPs last names, first names can fluctuate widely from year to year, of this type is significantly higher than expected (2000 and 2005: sometimes following specific events (16). For example, in SI P value < 0.01; 2010: P value < 0.02). Appendix, Fig. S6, we show that the frequency of newborns Similarly, we can find new hires for the years 2005–2015 and named Francesco (the most common first name among Italian test whether new hires are more or less likely to share names boys born in the last decade) increased of about 40% after the with the professors already in the system. This test is interest- election of Pope Francis. Because of these idiosyncratic trends, ing, because a “natural experiment” was carried out during these researchers of the same age would be more likely to share first years: the Italian law 240 of 2010, which reformed the university names than those of different ages—a problem that is absent in system, included a provision (article 18) preventing departments the study of last names. from hiring relatives of their faculty, with the explicit intent of curbing nepotism. Our results show that the effects of this law Time Evolution. For the Italian system, we have collected four can be detected in the data. Measuring the number of IPs of snapshots between 2000 and 2015 in intervals of 5 years. We can, the kind hired↔already present, we find that, in 2005 and 2010, therefore, repeat the randomizations for all datasets and track the evolution of the system in time. Earlier years yield a higher number of significant results, with one-half of the fields testing

significantly (randomization by city) in 2000 and 2005; there were 2.0 2000 2005

five significant fields in 2010 and only two significant fields in 2010 1.5 2015 2015 (Fig. 5). The results by region follow a similar pattern (SI Appendix, Figs. S8 and S9). 1.0 0.5

Is Italian Academia Nepotistic? As shown above, the geographic 0.0 distribution of last names as well as field-specific immigration Observed/Expected Agr Bio hys Geo Law Med Ped Soc Econ Hum Math P can greatly affect the number of IPs within departments. In Italy, Chem CivEng IndEng even when accounting for these factors, we do observe significant results. Previous studies (7, 8) have suggested that the excess IPs Fig. 5. Evolution of the ratio between observed and expected number of observed in Italian academia could be caused by nepotistic hires, IPs in Italy between 2000 and 2015. Saturated colors mark significant results with fathers hiring their children and siblings for academic posts once accounted for multiple hypothesis testing. Agr, agriculture; Bio, bio- (mothers hiring their children would be undetectable, because logical sciences; Chem, chemistry and pharmaceutical sciences; CivEng, civil engineering and architecture; Econ, economics and statistics; Geo, geology they would have different last names). Although proving this and Earth sciences; Hum, philology, literature, archeology; IndEng, indus- hypothesis would require access to data on actual family ties, trial, electronic, and electric engineering; Law, law; Math, mathematics and which are not available, in this section, we present four statis- computer science; Med, medical sciences; Ped, pedagogy, psychology, his- tical tests probing whether our results are compatible with the tory, philosophy; Phys, physics and astrophysics; Soc, social and political hypothesis of nepotism. All tests have the same structure. First, a sciences.

4 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1703513114 Grilli and Allesina Downloaded by guest on October 1, 2021 Downloaded by guest on October 1, 2021 rliadAllesina and Grilli data. shed the can in methods patterns simple subtle extremely on light even wanted that we showing that probe, angle we to each and Importantly, for randomization systems. research specific academic a of produced in field differences randomizations their investigate elementary to with used along list information—and professors data—a geographic of of source meager names ostensibly of an taken have we Here, Discussion (Kolmogorov– of expected 2005 distribution 0. was the and test: what 2005 2010, Smirnov and from and 2000 different new 2005 between significantly the hired between faculty under hired the hired those For between those last differences 6). and find (Fig. the we 2010 law city, scrambling before esti- each hired in after accurate hires researchers more value new have all this of (to names recomputing period given and a mates) for members ulty Details in random. are at is model city the the name from of names that whose pick mean to values tend faculty low departments whereas department, disproportionately the in hire present already to likelihood maximum tend the ments max- of the values find to Large of want We estimate department). likelihood the imum excluding city the in j and department, the model, where this Under name population. picking general of probability probability the with from faculty; pick their of they relatives the the among quantify pick depart- to a Suppose attempt model. ment we statistical simple hires, a using nepotistic phenomenon of hypothesis the Nepotism. for Model by A expected than system the to likely in less (P already chance are those 2015 with and names 2010 share between by hired expected members than faculty smaller significantly (2005: not chance is value observed the of values elevated such have should departments ne Klooo–mro test: (Kolmogorov–Smirnov distributions ences the effect), in was of law antinepotism new the (when yield departments the the for of example, 10% For that period. find 5-year we a 2005, a in of randomiz- and hires distribution by 2000 new the repeatedly between all show obtained hires of lines names are last solid lines the The dashed department ing city. the the the whereas in of data, present rest the already the names to the opposed from as hires new sampling of department, each 6. Fig. α ntegnrlpplto ta eetmtda h frequency the as estimated we (that population general the in 16, ˆ

Computing P(α≥x) α ˆ ≥ 0.01 0.05 0.10 0.25 0.50 1.00 ,weesi h admztos efidta ny24 fthe of 2.4% only that find we randomizations, the in whereas 0.1, r uldcoe oehr ilignninfiatdiffer- nonsignificant yielding together, closer pulled are P d π uuaiedsrbto ftemxmmlklho estimates likelihood maximum the of distribution Cumulative . . . . 0.4 0.3 0.2 0.1 0.0 j value (d a odcd nanwhr:wt probability with hire: new a on decide to has ) au 0.04). = value stepooto fpoesr ihname with professors of proportion the is ≤ α ˆ 2005 P α ˆ o l eatet hthrdmr hn1 fac- 10 than more hired that departments all for .Frtehrsbten21 n 2015 and 2010 between hires the For 0.001). π au .9 2010: 0.29; = value stemxmmlklho siaeo h probability the of estimate likelihood maximum the is j ( aeil n Methods. and Materials c ) D stepooto fpoesr ihname with professors of proportion the is ie htorrslsaecnitn with consistent are results our that Given n 0 = . . . . 0.4 0.3 0.2 0.1 0.0 j .19, α ol be would o ahdepartment each for P 2010 D x n value 0 = P q j au .3,btthe but 0.13), = value .087, = α. ˆ ≤ α ˆ απ 2010 0.001; enta depart- that mean . . . . 0.4 0.3 0.2 0.1 0.0 j P ( d ) au 0.083). = value (1 + d 2015 − n year. and j they α, α)π nthe in 1 D For α. ˆ − α ˆ n j ( c α, = is ) , aaadtecd eddt eeaeterslsaepbil vial at The available publicly name. are last results each the generate for to identifier needed github.com/StefanoAllesina/namepairs. numeric code the a and using data by anonymized in were detailed as organized and quality, Data. Methods and Materials advancements. career cannot partner’s partner other one the that for although so responsible partners), be place, especially in domestic are for to provisions extended antinepotism need couples often welcome the hires; institutions without US (spousal many system can Indeed, academic one measures. harsh that fair show States, a different, United the very build and however, system, are, Italian practices the where of those to close quite the representation. down gender slowing equal hus- guests, to their unpaid leading as process as department worked same women the many in because bands, hired (17): be States ” not United “vanishing could the the spouses in of laws phenomenon antinepotism the first century, created the 20th bath- in the example, the of For half hires. with effects, spousal side out targeting negative when baby have especially can the laws antinepotism throwing uni- Second, be the water. disbanding would by pro- system nepotism their of of versity problem fraction large the a Solving lost fessors. have (−18.9%) (Siena, humanities the faculty of their level of and rence, the Toscana one-quarter at Appendix): decade. lost (SI examined institutions last Liguria when single the or worse during fields, even loss regions, declin- look overall steadily numbers 10% been staggering has The academia a Italian with of 2000 ing, size between faculty the of 2005, number the and More- in hires. increase new large than a names after that last over, showed the share we to First, likely retirements: caveats. more of are two because retirees with largely declin- is news, be IPs good to in as seems decrease greeted phenomenon be the should that fact ing the fields and laws and tism regions specific that show 5 and results. the 2 drive 1, Figs. in had izations departments of majority vast an measur- had likely when example, and For region-specific anal- departments. ing of and our handful field- Importantly, a is by 2010–2015. driven nepotism antinepotism period that an the of shows effects for ysis the effect detect testi- can in as we nepotism law that of fact the hypothesis by additional the fied our with of consistent results are The analysis last departments. of within excess an sharing display name academics Italian names, last of tribution gender strong and highlights fields. names pedagogy STEM first in in of imbalance IPs analysis The of fields. husbands’ excess other their the States, takes all family explaining United women system, married the names—possibly of Italian of in signal fraction the whereas unspecified in name, the an maiden that, detect their Note keep present. can women are methods they our when ties that shows tem engineering. prepon- specific and target to science with tend derantly heritages that certain associated fact of the researchers are and American immigration names field-specific ran- with certain consistent when fields, not field: but city by by domizing randomizing the when mathemat- by significantly and highlighted test physics is ics example, immigration for where, of randomizations, signal US strong Appendix, The Ameri- (SI The well-mixed S5). born. geographically Fig. were is they however, where system, work can to tend professors that h xmlso rne hc a iigpoeue htare that procedures hiring has which France, of examples The antinepo- of efficacy the of evidence system, Italian the For dis- field-specific and geographical for accounting when Even sys- French the for names maiden vs. married of analysis The showing S3 ), Fig. , Appendix (SI city by cluster names Italy, In α ˆ h aawr olce rmpbil vial ests hce for checked websites, available publicly from collected were data The o h ie n20,w on ht1%o departments of 10% that found we 2005, in hires the for 93;Genoa, −29.3%; α ˆ ≥ 0.1 w ol xet24 trno) hra the whereas random), at 2.4% expect would (we 14)and (−21.4%) geology and −24.3%), IAppendix SI α ˆ ≈ iial,terandom- the Similarly, 0. NSEryEdition Early PNAS fe olcin h data the collection, After . 02;Flo- −30.2%; | f6 of 5

STATISTICS SOCIAL SCIENCES Randomizations. In all datasets, for each researcher, we have information The maximum likelihood estimate αˆ can be found by maximizing this on first name, last name, institution, and field of study as well a geographic quantity. By taking the logarithm of the likelihood and neglecting the terms information (city and region). The department is obtained by combining independent of α, one obtains institution and field. For each department, we count the number of pairs X   d απ(d) + − α π(c) . of researchers with the same last name (IPs). We then sum the IPs by region mj log j (1 ) j [2] or field. The three randomizations are obtained by (i) randomizing all last j names (randomized by nation), (ii) randomizing last names within each city (d) We determined πj as the frequency of last name j in department d at (by city), and (iii) randomizing last names within each field (by field). the beginning of the period (e.g., in 2000 if the period 2000–2005 is consid- ered) and π(c) as the frequency in the city (removing the department) at the Modeling Nepotism. We consider a simple mixture model, in which the prob- j beginning of the period. One special case that needs to considered is that ability of choosing to hire a researcher with name j in the department d and (d) (c) (d) in which the name of the new hire is not present in the department or the city c is qj = απj + (1 − α)πj , where πj is the frequency of name j in (d) (c) city, in which case πj = πj = 0. In such cases, we postulate that the name is π(c) the department, and j is the frequency in the general population from present in the city at an unknown (low) frequency. This assumption is quite which the new researcher is sampled. The parameter α can be interpreted convenient, because for π(d) = 0, the exact value of π(c) does not impact the as the probability of a nepotistic hire. For instance, in a perfectly nonnepo- j j maximum likelihood estimate of α. In fact, the term log((1 − α)π(c)) appear- tistic system, α would be equal to zero, and all of the last names of new j (d) (c) hires are random samples from the general population. Note that, even if ing in Eq. 2 in the case of πj = 0 can be written as log(1 − α) + log πj , α = 0, it is possible to hire a person with a last name that is already present and therefore, the second additive term does not impact the maximum like- in the department. The parameter α quantifies, therefore, the probability lihood estimate. of a nepotistic hire using the excess of IPs. Note that, given the finiteness of the data, a maximum likelihood esti- For a given period (e.g., 2000–2005), we compile a list of all new hires for mate αˆ > 0 could be a consequence of fluctuations and not nepotistic hires. d each department, obtaining mj : the number of people hired in department To assess the importance of these fluctuations, we compared the maximum d with last name j. Under our model, the probability of observing a set of likelihood estimate of αˆ with the one obtained by randomizing the names new hires with last names {md} is given by the multinomial distribution of new hires within a department.

ACKNOWLEDGMENTS. We thank M. J. Michalska-Smith for comments. Data P d mj ! md were provided by Scopus.com. J.G. was supported by the Human Frontier d j Y  (d) j P({m }) = qj . [1] Science Program; S.A. was supported by National Science Foundation Grant Q md! j j j DEB 1148867.

1. Darwin GH (1875) Marriages between first cousins in England and their effects. J Stat 10. Elliott MN, et al. (2009) Using the census bureau’s list to improve estimates Soc Lond 38:153–184. of race/ethnicity and associated disparities. Health Serv Outcome Res Methodol 9: 2. Crow JF (1983) as markers of inbreeding and migration. Discuss Hum Biol 69–83. 55:383–397. 11. Montesanto A, Passarino G, Senatore A, Carotenuto L, De Benedictis G (2008) Spatial 3. Zei G, Guglielmino CR, Siri E, Moroni A, Cavalli-Sforza LL (1983) Surnames as neutral analysis and surname analysis: Complementary tools for shedding light on human alleles: Observations in Sardinia. Hum Biol 55:357–365. longevity patterns. Ann Hum Genet 72:253–260. 4. Piazza A, Rendine S, Zei G, Moroni A, Cavalli-Sforza LL (1987) Migration rates of 12. Lan F, Hale K, Rivers E (2015) Immigrants’ Growing Presence in the U.S. Science and human populations from surname distributions. Nature 329:714–716. Engineering Workforce: Education and Employment Characteristics in 2013 (NSF Info- 5. Jobling MA (2001) In the name of the father: Surnames and genetics. Trends Genet Briefs, Arlington, VA), pp 15–328. 17:353–357. 13. Goldin C, Shim M (2004) Making a name: Women’s surnames at and beyond. 6. Goldstein JR, Stecklov G (2016) From Patrick to John F.: Ethnic names and occupational J Econ Perspect 18:143–160. success in the last Era of mass migration. Am Sociol Rev 81:85–106. 14. Ferlazzo F, Sdoia S (2012) Measuring nepotism through shared last names: Are we 7. Allesina S (2011) Measuring nepotism through shared last names: The case of Italian really moving from opinions to facts? PLoS One 7:e43574. academia. PLoS One 6:e21160. 15. Allesina S (2012) Measuring nepotism through shared last names: Response to 8. Durante R, Labartino G, Perotti R (2011) Academic Dynasties: Decentralization, Civic Ferlazzo and Sdoia. arXiv:1208.5792. Capital and Familism in Italian Universities. Working paper 17572 (National Bureau 16. Kessler DA, Maruvka YE, Ouren J, Shnerb NM (2012) You name it–how memory and of Economic Research, Cambridge, MA). Available at www.nber.org/papers/w17572. delay govern first name dynamics. PLoS One 7:e38790. Accessed May 22, 2017. 17. Lykknes A, Opitz DL, Van Tiggelen B, eds (2012) For Better or for Worse? Col- 9. Clark G, Cummins N, Hao Y, Vidal DD (2015) Surnames: A new source for the history laborative Couples in the Sciences (Springer Science & Business Media, Basel), of social mobility. Explor Econ Hist 55:3–24. Vol 44.

6 of 6 | www.pnas.org/cgi/doi/10.1073/pnas.1703513114 Grilli and Allesina Downloaded by guest on October 1, 2021