Normalization Trends of Authors' Names in Scientific Publications

Aliaga, Francisco M. & Correa, Ana D. (2011). Normalization trends of authors' names in scientific publications. RELIEVE, v. 17, n. 1, art. 0, p. 1-10. http://www.uv.es/RELIEVE/v17n1/RELIEVEv17n1_0.htm

e-Journal of Educational Research, Assessment and Evaluation / Revista ELectrónica de Investigación y EValuación Educativa

[Tendencias en la normalización de nombres de autores en publicaciones científicas]

by Aliaga, Francisco M. ([email protected]) and Correa, Ana D. ([email protected])

Abstract

We analyze current normalization trends of authors' names. We review the advantages of using a single bibliographic name, as well as the most common naming recommendations, so that each author can choose which to follow.

Keywords

Academic journals; Internationality; Academic publishing; Scientific communication; Impact; Citations; Author.

Over the last few years we have seen a substantial increase in the internationalization of academic production. This is particularly true in the Spanish sphere, and specifically for education journals (Aliaga & Suárez-Rodríguez, 2007; Hernández Pina & Maquilón, 2010). Many factors contribute to this, including:

• The regulations for evaluating scientific research staff (ANECA, CNEAI, etc. in Spain, or institutions such as CONACYT in Mexico, CAICYT in Argentina or Colciencias in Colombia, to name a few in the Latin American sphere).

• Various initiatives and guidelines that favor the improvement and professionalization of scientific publishing, with practical help such as Román's texts (1981) or those of Delgado López-Cózar, Ruiz Pérez and Jiménez Contreras (2007), the latter within the ARCE project (Support for Spanish Scientific Journals) of the Spanish Foundation for Science and Technology (FECYT).

• The development of new tools for evaluating quality and/or impact, such as Latindex or In-RECS.

• The birth and growth of virtual journal libraries and directories, all geared to letting the reader consult a large number of journals simultaneously (projects as interesting as Redalyc, e-revistas, DOAJ, etc.).

• The negotiations of FECYT on the unified management of relations with ISI-Thomson-Reuters regarding its databases.

In this context, journals, even those of predominantly regional interest (such as Latin American ones), are gaining much greater access to international databases (Rodríguez Yunta, 2010). On the one hand, they are being incorporated at an accelerated pace into the Web of Knowledge databases (ISI-Thomson-Reuters), which have traditionally ignored Social Sciences output from non-Anglo-Saxon cultural areas. The gradual commercialization of the Institute for Scientific Information, created by Garfield, together with the growing complexity of the market (with the emergence of competing initiatives), has substantially changed the commercial policy of that institution, and therefore its scientific policy. On the other hand, some of the emerging alternatives (SCOPUS, Google Scholar) are being progressively adopted; these cover a noticeably larger number of journals than their competitor, Web of Knowledge. These and other international databases, most of them of Anglo-Saxon origin, are key to spreading knowledge and thus to promoting the journals and authors they index. They are therefore essential for increasing the measured impact of scientific activity.

However, the relative inexperience of many academics in this new international context causes problems that affect the visibility of their work in these databases. The difficulties caused by cultural differences must also be taken into account: differences between the creators of these databases (as noted, fundamentally Anglo-Saxon) and the diverse national contexts into which the databases are expanding. The result is a large number of mismatches and errors that affect the quality and effectiveness of coding and information retrieval.

One of the principal sources of these errors is the particular form that authors' names take in certain countries. As is well known, Anglo-Saxon countries (and many other European countries) predominantly use the father's surname to name the child, along with one or, in some cases, several given names. Although this system produces overlaps (and therefore identity confusion), it is relatively simple to use, because the surname or family name is easily identified in any text: one only has to look for the last word of the name. However, the system runs into problems with the custom, now slightly in decline, whereby a wife adopts her husband's surname (generally losing her own), which complicates the identification of authors (Goldin & Shim, 2004).

Other cultures use specific but different forms of identification, which bring complications not considered (or not resolved) by the system described above, which we have called, for simplicity, Anglo-Saxon. In Korea, for example, 22% of citizens have the surname Kim (as their only surname). It is therefore not unusual to find identical names shared by hundreds or even thousands of people (according to 2004 data from the National Statistical Office of the Republic of Korea). In the case of China, even prestigious journals such as Nature have warned of the serious identification problems resulting from the current naming system: 85% of the Chinese population (about 1,100 million people) share the 129 most common surnames, which gives some idea of the huge number of overlaps produced (cf. Nature, 2008). Added to these problems are, among others, the variations between name-surname and surname-name order and the syllabic separation (or not) of names, according to different customs. Related problems occur in regions such as the south of India (where no surname is used and the father's given name takes its place) or the Slavic countries (where children use a patronymic, formed from the father's given name plus an ending).

In the Hispanic or Iberian naming system (which also includes the Portuguese-speaking cultures), a person in most cases inherits a surname from both the father and the mother, in addition to having one or, more frequently, several given names. This system causes frequent confusion when authors are entered into databases created with an Anglo-Saxon mentality. The last name in the sequence is taken as the identifying surname, which in our case is not the surname normally used but the secondary one. It is therefore common for the same text to appear to have different authors depending on whether we look for them in 'Hispanic' or 'Anglo-Saxon' databases, since the former use the first surname to identify authors and the latter use the second.

To gauge the importance of this topic, we bear in mind various studies (Costas & Bordons, 2007; Ruiz-Pérez, López-Cózar & Jiménez-Contreras, 2002) [...] affects the authors' visibility and identifiability, which matters greatly for the dissemination and recognition of their work. Properly establishing the author's name (in the past called the 'pen name') is a relevant decision for authors. The topic is important enough that the Spanish Foundation for Science and Technology (FECYT) commissioned the EC3 group (responsible, for example, for In-RECS) to prepare a manual (FECYT, 2007) of "Recommendations for help in the standardization of authors' names and institutions in scientific publications". These recommendations apply to all authors who use Hispanic or Iberian names, especially in Latin American countries. In RELIEVE, we choose to support the recommendations of the FECYT manual, which favor proper indexing and a greater impact for our publication and its authors. In the end, however, we will always respect the authors' wishes.

To begin with, one initial observation, which would not need stressing if the problem did not arise so often: the aim is a system in which the full and exact name as it appears in the civil registers is not required. Researchers can choose between one or both surnames (as is done in everyday life, according to how the author uses their name), and likewise between one given name or several. The purpose is to identify authors properly, not to follow the birth certificate to the letter.
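
The indexing mismatch described above for Hispanic names can be illustrated with a small sketch. It is only an illustration under stated assumptions: the author name is invented, the two key functions are hypothetical stand-ins for how an 'Anglo-Saxon' database (last word of the name) and a 'Hispanic' database (first surname) might pick the identifying surname, and the hyphenated form is included as one possible single bibliographic name, not as a rule stated in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Author:
    """Hypothetical author record; the name used below is invented."""
    given_names: tuple[str, ...]
    surnames: tuple[str, ...]  # paternal surname first, then maternal

    def full_name(self, hyphenate: bool = False) -> str:
        # Join the surnames with a space (civil form) or a hyphen
        # (an assumed bibliographic form, for illustration only).
        joiner = "-" if hyphenate else " "
        return " ".join(self.given_names) + " " + joiner.join(self.surnames)


def last_word_key(full_name: str) -> str:
    """Assumed 'Anglo-Saxon' rule: the last word of the name is the surname."""
    return full_name.split()[-1]


def first_surname_key(author: Author) -> str:
    """Assumed 'Hispanic' rule: the first (paternal) surname identifies the author."""
    return author.surnames[0]


author = Author(("María", "Elena"), ("García", "López"))
plain = author.full_name()
hyphenated = author.full_name(hyphenate=True)

print(last_word_key(plain))       # López
print(first_surname_key(author))  # García
print(last_word_key(hyphenated))  # García-López
```

With the unhyphenated form, the two rules file the same person under different surnames; with the hyphenated form, the last-word rule keeps both surnames together as a single token.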