LISTENING/Journal of Communication Ethics, Religion, and Culture Vol. 51 Winter 2016

Journal of Communication Ethics, Religion, and Culture

Editor ...... Janie M. Harden Fritz, Duquesne University

Production Editor ...... Craig T. Maier, Duquesne University

Assistant Editor ...... Joshua D. Hill, Duquesne University

Consulting Editors ...... Mark McVann, F.S.C., Saint Mary’s College of California; Marilyn Nissim-Sabat, Lewis University; Thomas E. Wren, Loyola University Chicago

Assistant Production Editors ...... Matthew Mancino, Duquesne University; Joshua D. Hill, Duquesne University; Justin N. Bonanno, Duquesne University

The views expressed in the articles in Listening/Journal of Communication Ethics, Religion, and Culture remain those of the authors. Their publication does not constitute an endorsement, explicit or otherwise, by the editors.

Listening is published three times a year, in Winter, Spring, and Fall. All correspondence (including subscriptions) should be sent to the Editor, Listening/Journal of Communication Ethics, Religion, and Culture, Department of Communication & Rhetorical Studies, Duquesne University, 600 Forbes Avenue, Pittsburgh, PA, 15282. Tel: (412) 396-6558. Subscription rates (prices include postage):

• Individuals (USA, Canada, and International): $35 per year.
• Libraries and Institutions (USA, Canada, and International): $75 per year.

© 2016 Listening, Incorporated [Non-profit journal].

Microfilm: Complete volumes of Listening/Journal of Communication Ethics, Religion, and Culture are available on microfilm. Address inquiries to: University Microfilm International, 300 North Zeeb Road, Ann Arbor, Michigan 48106.

ISSN 0024-4414

THE ETHICS OF LANGUAGE IDENTIFICATION AND ISO 639

J. Albert Bickford

The ISO 639 standard, and especially its Part 3 (ISO 639-3), has become a widely used tool for language identification. In a standard this important, there is potential for ethical challenges, and some controversy about it has indeed developed. The purpose of this paper is to evaluate the principal ethical challenges surrounding ISO 639 and to suggest ways that it can be applied ethically so as to maximize its benefits for all while minimizing potential harm.

LANGUAGE IDENTIFICATION

In order to understand the ISO 639 standard, one must first understand the broader issue of language identification, which has long been recognized as thorny and often contentious. The question language identification addresses is deceptively simple: What linguistic varieties should be considered separate languages rather than dialects of some larger language? On a technical linguistic level, answering this question is challenging, and it is further complicated by the fact that language is often a marker of identity and is tied to issues of power, prestige, and resource allocation.

The meaning of “dialect” relevant here is not the popular and somewhat pejorative conception of dialects as less prestigious or even substandard compared to languages. Rather, as linguists use the terms, a dialect is simply one variety of a language, so that one can speak of, for example, the Midwestern American dialect of the English language or the Quebecois dialect of French. Still, as is well known, there are many cases that are not easily classified as dialects or as separate languages. For example, Arabic and Chinese are for some purposes each considered single languages, but both vary considerably across their broad geographic ranges, even to the point of different varieties being mutually unintelligible. Thus, at least their spoken dialects can, under some circumstances, be considered separate languages. To cope with this ambiguity, linguists sometimes talk about varieties, lects, or languoids,1 terms that are intended to be noncommittal about whether their referents are dialects or languages.

The decision as to whether to call a particular linguistic variety a dialect or a language may be based on different factors. Some are essentially linguistic, such as the degree of lexical or grammatical similarity or mutual intelligibility. Others are non-linguistic, such as published literature, ethnic identity, or political boundaries. This is the basis of the quip, first cited by Max Weinreich, that “a language is a dialect with an army and a navy.”2 Because different people may regard different factors as significant, and because many of these factors are gradient with no clear dividing line between “same” and “different,” there is much room for alternate groupings in specific cases, especially regarding minority languages without a strong literary or educational tradition.

There are practical implications to calling something a language rather than a dialect. Governments and NGOs may allocate resources or develop policies based on particular lists of identified languages. Legal decisions may depend on whether a variety has been identified as a language. Educational policies may be built around recognized languages, so that speakers of varieties not recognized as languages do not have equal access to education. Decisions in this area can therefore have far-reaching repercussions, leading to significant ethical concerns.

In addition, once the decision has been made to regard a particular set of language varieties as constituting one language, there is a further issue in language identification, also often controversial: what to call the language. Because many languages have more than one name, and some names are used for more than one (often unrelated) language, it is often not a trivial matter to know what name(s) to use. Further, since speakers of a language may object to a name for a variety of reasons, the name used for a specific language also has ethical import.

Language identification, then, involves two main factors: (1) which dialects group together to form a single language, and (2) what names or other labels to use when referring to it.

ISO 639-3

The long-standing issue of language identification has in recent years collided with practical concerns arising from information technology, particularly the need for metadata to locate resources in or about particular languages (cataloguing) and to properly process examples of those languages (information processing). To address this need, a set of international standards has been developed by the International Organization for Standardization (ISO), known as ISO 639, “Codes for the representation of names of languages.” ISO 639 associates language names with codes that can serve as unambiguous and concise references to languages, similar in some ways to airport codes. This addresses the problems associated with the existence of multiple names for some languages and multiple languages for some names. It also provides a compact way to refer to languages in databases and to exchange data about them. Although its primary purpose is to create precise identifiers that refer to languages (the second factor in language identification), it cannot effectively do so without making clear what language varieties the identifiers refer to. As a result, it is involved in both of the thorny aspects of language identification discussed above.

ISO 639 consists of four main parts, containing distinct but interrelated and overlapping sets of codes. Part 3, “Alpha-3 code for comprehensive coverage of languages,” is of greatest interest here because it attempts to cover as complete a set of languages as possible (7865 codes as of October 2015).3 The earlier Parts 1 & 2, which encompass far fewer languages, are also relevant because of their interdependence with Part 3.4

Part 3, or ISO 639-3, was based originally on the 3-letter language codes used in SIL International’s publication Ethnologue, with some adjustments.5 As such, it inherited the operational definition of language that Ethnologue used. That is, two related varieties are considered the same language based primarily on inherent mutual intelligibility: the ability of the speakers of the two varieties to understand each other without needing to learn the other variety. In marginal cases, some adjustments are possible based on other factors, particularly the existence of a common literature or a strong ethnolinguistic identity (either common or separate). The standard makes no claim as to whether this definition is the best one possible, acknowledging that others may not feel it is appropriate, but it was “thought to best fit the intended range of applications for this standard,”6 namely, information processing and retrieval. The standard also makes no claim that its list of “languages” is in any way a definitive or authoritative statement about what languages exist in the world. It is simply a coding scheme for referencing one particular way of grouping linguistic varieties into languages; other groupings are possible and legitimate.

ISO 639 has become pervasive in the digital world. It is widely used in Internet communications to indicate which language is used on a web page or other interface,7 as well as in operating systems such as Apple’s OS X, Microsoft Windows, and various flavors of Unix/Linux.8 This is important for a wide range of language processing tasks such as indexing, sorting, spell-checking, and translating.9 Librarians and archivists also use ISO 639 for cataloguing in databases and for information retrieval.10

Because of ISO 639-3’s ambitious scope, encompassing all languages of the world, the list needs constant updating as knowledge of languages improves. There is thus an active system, administered by SIL as the standard’s registration authority, for processing changes to the standard: additions, retirements, updates to associated informative content, and so on. Anyone is invited to participate in the change request process, either by submitting proposals for changes or by commenting on other proposals.

I should note at this point that I am a member of SIL International and part of the Ethnologue editorial staff, with particular focus on signed languages. I am not part of the ISO 639-3 office, which is separate from Ethnologue, although I have submitted and commented on a number of change requests and made recommendations on how to process sign languages generally. The opinions and recommendations expressed in this paper are my own and do not represent any official position or statement on the part of SIL or ISO.

WHO SHOULD ADMINISTER THE STANDARD

When ISO 639-3 was first proposed about ten years ago, there was a flurry of opposition to its reliance on SIL International as the registration authority responsible for administering the standard.11 A variety of reasons were expressed, but the ones with strongest ethical import centered on SIL’s religious activities (in particular, assisting language communities with Bible translation) alongside its scientific and language development activities. Some seemed to question whether what they regarded as essentially a religious organization could be trusted with the responsibility of language classification, a position that on its own seems more a matter of prejudice than substance. With more nuance, though, some understandably objected to having to provide information to SIL (to propose changes to the standard) that could further SIL’s religious goals and activities, which they fundamentally disagreed with. This was distasteful to some even though such information, if provided to some other organization acting as the registration authority, would become public information and thus available to SIL anyway. There were also concerns as to whether SIL’s particular organizational goals would color its decisions, although a similar concern could probably be raised about any organization that administered the standard, faith-based or not.

Since then, much of this furor has died down. One factor may be the development of an alternative listing of dialects, languages, and families, the Glottolog,12 with its own list of identifiers, providing an alternative for those who disagreed with the definition of languages in 639-3 or with its association with SIL. And, as Martin Haspelmath points out,13 having alternate listings of languages with different approaches is valuable, particularly since there is no one “right” way of grouping dialects into languages.

Another factor diminishing the controversy seems to be that several change requests were adopted by ISO 639-3 which ran counter to Ethnologue’s previous practice, in particular a major reorganization of the standard’s treatment of Mayan languages, a change that was in fact not supported by some people within SIL.14 This demonstrated that SIL’s 639-3 office is responsible first to ISO and to the world community that uses the 639-3 standard, not to Ethnologue or any other part of SIL. SIL follows the rules and practices laid down by ISO and the ISO 639 Joint Advisory Committee, and when valid arguments are made that the current code set is inconsistent with the standard’s own rules, the codes need to change. There have, in fact, been hundreds of changes since the standard was adopted, many of which could be regarded as corrections to previous mistakes made by Ethnologue. Change requests and the rationale for decisions about them are publicly available on the ISO 639-3 website, providing a level of transparency that was not present with the older Ethnologue codes. Ethnologue has bound itself to respect the decisions of ISO 639-3 as a basis for its listing of the world’s languages.

At the same time, decisions about change requests are still made by SIL internally. From the standpoint of accountability and credibility, it would be helpful if decisions about change requests were made by a broader group of people, something like the editorial board of a journal that would include experts not otherwise associated with SIL, whose names would be publicly available.

LINGUISTIC AND CULTURAL DIFFERENCES

A standard for language identification necessarily has stakeholders in every language of the world, and thus faces special cross-cultural and cross-linguistic challenges.

One practical area of concern is that the registration authority can only process requests that are in a language it can understand or obtain a translation for. In one case, the ISO 639-3 registrar, Melinda Lyons, received a proposal from India that was in non-native English and was essentially unintelligible. The proposer did not seem to understand Lyons’s questions. Lyons was unfortunately never able to establish adequate communication with the proposer to get a coherent proposal.15 Thus, a limitation of the standard is that not all people around the world have equal access to the change process. At the same time, access to any standard is going to face such obstacles. The only alternative would be not to have a standard at all, but this would lose all the benefits that come with a standard.

One thing that can help this situation is to make information about ISO 639-3 available in languages other than English. For example, I have written an explanation about the standard and the change process, with particular reference to signed languages; my explanation is available in English, Spanish, French, and ASL.16 More generally, the world community of users of the standard certainly has a responsibility to help language communities understand it and how they can influence its development. I would recommend, though, that the registration authority develop explanatory materials and forms in multiple world languages to encourage submissions from a wider range of people.

Similarly, cultural differences can interfere whenever the change process operates according to Western cultural norms that may be quite different from the expectations of people from other cultural backgrounds. For example, Lyons noted that in some Asian contexts, mere acceptance of a proposal by the registration authority may create a strong expectation that the proposal will eventually be approved, something that may not happen. Again, such misunderstandings are probably inevitable, but they can be reduced by providing more information about the standard in accessible ways to as wide an audience as is reasonably possible.

UNIQUENESS OF CODES

Some ethical challenges arise from the very nature of the standard. ISO 639-3 is constructed so that, with one class of exceptions, the languages identified by the standard do not overlap. That is, a given language variety has a unique ISO 639-3 code that refers to it; it is classified as only one language. The exceptions are codes that denote macrolanguages, which are groups of other languages in ISO 639-3 that for some purposes are treated as if they were all one language. For example, Arabic [ara] is classified as a macrolanguage that includes about thirty different varieties of Arabic, each with its own ISO 639-3 code. The concept of macrolanguages was introduced in order to maintain compatibility with ISO 639-2, which contains codes for single languages that correspond to multiple languages in ISO 639-3 according to the criterion of mutual intelligibility. In these cases, there are two codes that could refer to the same variety, one for the macrolanguage and one for the individual language. Aside from this mechanism, there is no provision for alternative ways of grouping different varieties into languages.

The reason for this uniqueness requirement stems from the purpose for having such a standard: to avoid the problems that arise from having multiple names for the same language. For example, the codes are often used in metadata to identify the language in a resource. If multiple language codes were applicable to a particular language variety, searching for resources in that variety would be complicated, since there would be no way to know which code had been used to tag a particular resource. Users would need to search over every code that could possibly have been used in order to reliably find the resource, complicating the process of resource discovery.
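The search problem described above can be made concrete with a minimal sketch. It assumes a hypothetical catalogue of (title, code) pairs and a deliberately tiny macrolanguage table; the three member codes shown for [ara] are only an illustrative subset of the roughly thirty the standard lists, and the function names are invented for this example rather than taken from any ISO 639 tooling.

```python
# Illustrative subset only: the macrolanguage [ara] covers about thirty codes.
MACROLANGUAGE_MEMBERS = {
    "ara": {"arb", "arz", "ary"},
}

def codes_to_search(code):
    """Return every code a resource in `code` might plausibly be tagged with."""
    codes = {code}
    # A query for a macrolanguage should also match its individual languages.
    codes |= MACROLANGUAGE_MEMBERS.get(code, set())
    # A query for an individual language should also match its macrolanguage.
    for macro, members in MACROLANGUAGE_MEMBERS.items():
        if code in members:
            codes.add(macro)
    return codes

def find_resources(catalog, code):
    """Return titles whose language tag matches any code worth searching."""
    wanted = codes_to_search(code)
    return [title for title, tag in catalog if tag in wanted]

# Hypothetical catalogue entries, invented for illustration.
CATALOG = [
    ("Egyptian Arabic phrasebook", "arz"),
    ("General Arabic grammar", "ara"),
    ("Catalan dictionary", "cat"),
]
```

Because the macrolanguage mechanism is the only sanctioned overlap, the set of codes to search stays small and predictable. If arbitrary alternative groupings were allowed, a table like `MACROLANGUAGE_MEMBERS` would have to capture every competing grouping before a search could be trusted.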
Unfortunately, languages and dialects are not that simple, and there are sometimes alternate ways of grouping dialects into languages, particularly when different criteria are used for different purposes. For example, the Yolŋu Matha languages in northern Australia form a complex of dialects associated with approximately 30 clans. These varieties are largely mutually intelligible, but are generally grouped into about half a dozen different “languages,” although in a variety of ways. Currently, there are ten ISO 639-3 codes for the group, but some represent clans while others represent higher-level groupings. A series of change requests have been submitted since 2013 in an attempt to tidy up the situation, most of which were rejected because they did not fit the requirements of the standard. Further proposals are being considered as of this writing.17

For the standard to work effectively for cataloguing, one particular way of grouping varieties into languages must be chosen without representing the alternatives, even if doing so in some sense privileges that grouping over all others. Sometimes there is strong consensus supporting one particular approach to language identity, as in the Scandinavian languages, whose various dialects are largely mutually intelligible but are considered to be separate languages on sociolinguistic grounds, as they are associated with distinct nations and bodies of literature.18 However, when consensus does not exist, as sometimes happens with the smaller, politically less powerful languages for which ISO 639-3 was created, there is often nothing in the standard to indicate this lack of consensus.

As Cysouw and Good19 point out, ISO 639-3 tries to find a “useful balance between capturing diversity and facilitating interoperation.” Even if a user disagrees with the meaning of a particular code, the fact that its meaning is well defined and stable makes it possible to use the code for finding materials in and about the language. When it comes to metadata, the requirements of standardization and interoperability need to be favored over concerns to accommodate varying ways of grouping dialects into languages. To put it bluntly: in some situations, it is more important to be consistent than to be right. The Joint Advisory Committee of ISO 639 has taken the position that its standard is one such situation.

Indeed, if users want an identifier that is based on some other grouping of dialects, there is a way to obtain one using a language tag maintained by the Internet Engineering Task Force (IETF). For example, the Valencian dialect of Catalan does not have its own code in ISO 639, but is rather covered by the codes for Catalan ([ca] in 639-1 and [cat] in 639-3). The Valencian dialect can, however, be referred to specifically with the IETF language tag [ca-valencia]. The process of registering new tags is simpler than obtaining new ISO 639-3 codes, and there is no requirement that there be only one identifier per language. There are also special codes associated with the Glottolog.20 Having multiple identification systems available helps to underscore the fact that ISO 639-3 is not a definitive list, but simply one possible way of tagging languages.

A related structural requirement in ISO 639-3 is that language codes remain stable over time. Morey, Post, and Friedman21 regard this as problematic because the codes, though technically arbitrary, are often mnemonic and, unfortunately, sometimes based on names that speakers of those languages object to. However, once a code is established in ISO 639, the Joint Advisory Committee has insisted that it not change unless the denotation also changes scope, as when one language is split in two, requiring two new codes and retirement of the old one. Again, this is an outgrowth of the principle that language codes for information processing must be unique and remain stable over time.

Quite understandably, language communities may object to a code that is based on an unacceptable name. Morey, Post, and Friedman cite one such instance in which ISO 639-3 rejected the request from the Galo community of Arunachal Pradesh, India, to change the code for their language from [adl] to something that did not imply that it was part of the Adi language. On the other hand, ISO 639-3 did approve their request to change the reference name from “Adi, Galo” to simply “Galo”22 in line with its policy of avoiding pejorative or otherwise objectionable names.23

In other words, there is a difference between a name and a code, with different ethical implications. For names, it is important to avoid usages that are offensive or otherwise objectionable to the people they apply to, in this case, users of a language. For codes in a metadata standard, what is needed is uniqueness, consistency, and stability in order to have a well-functioning system that benefits everyone. It would be helpful if the ISO 639 website could include an explicit statement to this effect, clarifying its purpose.

ISO 639’s approach to balancing the benefits of standardization with respect for speakers’ sensitivities is one which not everyone will agree with. However, the approach is principled rather than capricious. Despite the problems with some past decisions that have led to the current form of the standard, indigenous communities do benefit from having a robust, effective system of language codes that facilitates their ability to find resources in and about their languages.
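The relationship between the IETF tag [ca-valencia] and the ISO 639 codes for Catalan, discussed earlier in this section, can be illustrated with a rough sketch. It handles only the simple shape language[-script][-region][-variant...] and is not a full implementation of the IETF tag syntax; the function names and the one-entry conversion table are assumptions made for this example.

```python
# One-entry table taken from the Catalan example in the text; not a full mapping.
ISO_639_1_TO_3 = {"ca": "cat"}

def split_language_tag(tag):
    """Split a simple IETF-style tag into its subtags (simplified sketch)."""
    subtags = tag.lower().split("-")
    result = {"language": subtags[0], "script": None, "region": None, "variants": []}
    for sub in subtags[1:]:
        if len(sub) == 4 and sub.isalpha():
            result["script"] = sub          # e.g. "latn"
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            result["region"] = sub          # e.g. "es"
        elif 5 <= len(sub) <= 8 or (len(sub) == 4 and sub[0].isdigit()):
            result["variants"].append(sub)  # e.g. "valencia"
    return result

def catalogue_code(tag):
    """Return the 3-letter code under which a tagged resource would be catalogued."""
    language = split_language_tag(tag)["language"]
    return ISO_639_1_TO_3.get(language, language)
```

Under these assumptions, `catalogue_code("ca-valencia")` yields `"cat"`: the variant subtag records the finer distinction, while the resource remains discoverable under the single stable code for Catalan.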

LANGUAGE NAMES

In order to make clear what each code refers to, ISO 639-3 lists one or more language names with each code. No attempt is made to include all names that may be associated with a code; that task is more appropriate for encyclopedic materials such as the Ethnologue or Wikipedia. Still, in many cases it is helpful to list more than one language name to make clear what language is designated.

One name is designated as primary, as the code’s “reference name.” Additional names can be associated with a code, but the reference name may still easily be perceived as the “better” or “more privileged” name. Indeed, on the form used to request a new code, one name is requested to be designated as a “preferred name,” and presumably this eventually becomes the reference name in most cases.

If two different communities using the same language have different names for it, the choice of the final reference name may appear to favor one community over the other. In northern Australia, the ISO code [dhg] denotes a Yolngu language known under (at least) two different names: Dhangu (its current reference name) and Djangu (an additional name), a difference stemming from the pronunciation difference between two clans in the community.24 As of this writing, it has been proposed that the reference name be changed to the somewhat clumsy Dhangu-Djangu in order not to give the appearance of favoring either clan.25 This should be unnecessary. I see no reason to maintain a distinction between reference and additional names; there should simply be a list of names associated with a code, with no preference given to any of them.

RELYING ON ISO 639 -3 FOR OTHER PURPOSES The ISO 639 standard is not intended to define decisively what consti - tutes a language to the exclusion of all other definitions, as noted above. Rather, the goal is to provide one unambiguous referencing scheme for use in information processing, with no claims that it is the only one possible. In prac - tice, however, the high profile of the standard has led non-linguists to look to ISO codes as “proxies” 26 for linguistic expertise when they need an answer to the question “Which language varieties are languages rather than dialects of another language?” For example, the Wikimedia Foundation currently requires an ISO 639 code before it will set up a new project, such as a new Wikipedia, in that lan - guage. It states: “If there is no valid ISO 639 code, you must obtain one. The Wikimedia Foundation does not seek to develop new linguistic entities; there must be an extensive body of works in that language. The information that dis - tinguishes this language from another must be sufficient to convince standards organizations to create an ISO 639 code.” 27 This position is certainly under - standable—demonstrating the foundation’s recognition that it does not have the expertise to make decisions about language identification. However, the question remains whether ISO 639 -3 is a reliable basis for making decisions that go beyond simply clarifying which language is being dis - cussed or represented in a document. This must be determined separately in each case rather than simply assumed. In the case of Wikimedia, relying on ISO 639 may not cause great harm to a language community, but other uses may have greater negative impact. People have raised concerns that govern - ments and NGOs may rely on ISO 639 in inappropriate ways to guide resource allocation, if that reliance is coupled with a naïve assumption that there is a sin - gle, easy answer to questions of language identity. 
28 These problematic resource allocations may include decisions about what languages to use in schools and textbooks, what languages to make available in publications about health and agriculture, and which language speakers are entitled to interpreters in legal proceedings. How serious are these concerns? As part of language planning, countries often draw up their own authorita - tive lists of languages. It remains to be seen how much weight ISO 639 -3 will have in such decisions, but I suspect its impact will be negligible. China, for

Ethics of Language Identification and ISO 639 /29 example, is not likely to expand its list of 56 recognized ethnic groups to include ethnicities corresponding to all 300 or so languages currently listed in the stan - dard for China. For decades, the Mexican government maintained an official list of 52 languages. In 2008, it expanded the list to include 364 based on its own extensive research, actually more than the number listed in ISO 639 -3 codes for Mexico. 29 Thus, it appears that in cases where countries are allocat - ing substantial resources to language communities, they are going to make their own determinations about what is a language instead of relying on ISO’s deci - sions. I am unaware of any instances where a country has relied uncritically on ISO 639 -3 for resource allocation. Private organizations may be more likely to misuse ISO 639 -3 in this way since they may have fewer resources for making their own language category decisions. Suppose that a non-governmental organization, recognizing its inabil - ity to translate and publish a particular set of informational materials in all lan - guages of a region, sets a threshold of, say, 50,000 speakers as a minimum size for a language group that it will attempt to serve. In such a case, a decision in the standard about whether to combine two varieties as one language may determine whether those speakers get access to the information at all. Together, the two groups may be above the population threshold, but not if kept separate. Similarly, an organization may require a language to have an ISO 639 -3 code before it provides any funding, thus effectively denying resources to a community that has been unaware of the standard or which has chosen not to request a code. Still, if the standard is to work effectively as a way of referring to languages, such factors should not be considered in developing the standard. 
Rather, pri - vate organizations need to be educated about how such uses of the standard are inappropriate. Indeed, SIL’s own experience has been that its activities in language devel - opment have not been constrained by ISO 639 -3 (or the Ethnologue identifi - cations of languages that preceded it). Indeed, resource allocation decisions are often made before any decisions are reached about language codes, and at any rate are made by parts of the organization separate both from the ISO 639 -3 office and from Ethnologue . There are dozens of cases where SIL has helped produce multiple sets of materials in different dialects that share the same ISO code, or in languages that have no code at all. These are local decisions based on locally-available information, not on ISO 639 -3. Another area for possible misuse of the standard is in legal proceedings. In a blog posting on Ethnologue’s website, Paul Lewis 30 reports two cases involv - ing immigration and asylum applications. In them, the lack of an entry for a lan - guage in Ethnologue was used as the basis for arguing that the applicant’s claimed language did not exist and that therefore the application was falsified. Since Ethnologue only lists languages that have ISO 639 -3 codes, this means that an omission in ISO 639 -3 will be reflected in Ethnologue . If judges and

lawyers assume that Ethnologue and ISO 639-3 are complete listings of all languages in the world, they may deny legal protection to people who should have it. Lewis’s blog post was specifically directed against such a misuse.

Language and dialect are often important markers of ethnic identity. Therefore, ethnic groups may look to ISO 639-3 as a way of validating their existence and distinctiveness.³¹ In cases where two ethnic groups use mutually intelligible language varieties, this becomes problematic because the standard is intended to identify languages, not ethnicities. Therefore, proposals based solely on ethnic identity that are not reflected in a clearly distinct language are most likely going to be rejected, as in the Yolŋu Matha case mentioned previously. Although sociolinguistic factors such as ethnolinguistic identity can enter into decisions about language codes, they do so only in cases of marginal intelligibility.

Granted, in some societies, ethnic or clan identity may be far more important than “language.” Still, the purpose of ISO 639 is to facilitate information processing, not to serve as an identifier for ethnicity or any other culturally important category. When it is important to refer to those categorizations, different means should be used, such as IETF language tags (when appropriate) or ordinary names drawn from indigenous languages themselves.

In summary, then, in order to use the standard appropriately, it is important to remember that the standard is not intended to be the only or exclusive way of referring to language varieties, but simply one useful way of doing so for information processing and retrieval. Decisions about resource allocation to language communities should not be made based merely on the presence or absence of a code in the standard.³² Again, it would be helpful if the standard’s website could include a statement clarifying its purpose and explicitly identifying inappropriate uses of the standard.
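For software that consults a code table, the caution above can be made concrete. The following is a minimal sketch in Python, not anything published by the ISO 639-3 registration authority: the table excerpt `CODE_TABLE`, the function names `lookup_language` and `describe`, and the `Status` values are all hypothetical and chosen only for illustration. It shows a lookup that reports a missing entry as "not listed" and never as evidence that a language does not exist.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    LISTED = "listed"
    NOT_LISTED = "not listed"  # deliberately not "does not exist"


@dataclass(frozen=True)
class LookupResult:
    query: str
    status: Status
    code: Optional[str] = None


# Hypothetical three-entry excerpt for illustration only; a real application
# would load the full, current code tables instead of hard-coding them.
CODE_TABLE = {
    "english": "eng",
    "spanish": "spa",
    "swedish": "swe",
}


def lookup_language(name: str) -> LookupResult:
    """Look up a language name; absence from the table is reported as such."""
    code = CODE_TABLE.get(name.strip().lower())
    if code is None:
        return LookupResult(query=name, status=Status.NOT_LISTED)
    return LookupResult(query=name, status=Status.LISTED, code=code)


def describe(result: LookupResult) -> str:
    """Render a result without drawing conclusions from a missing entry."""
    if result.status is Status.LISTED:
        return f"{result.query}: listed under code '{result.code}'"
    return (f"{result.query}: no entry found; this says nothing about "
            "whether the language exists")
```

The design choice here is that the result type has no way to express "nonexistent": a caller can learn only that a name is or is not listed, which matches the limited claim the standard itself can support.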

LANGUAGE IDENTIFICATION AS A COMMUNITY DISCUSSION

Ultimately, language identities are social constructs (“political decisions” in Kamusella’s³³ terms). They arise out of the interaction between the speakers of the language and the outsiders who need to refer to the language. ISO 639-3 provides one forum in which such discussions can take place, through its change process, and thus provides a valuable service to the world community of languages.

It would be naïve to claim that ISO 639 has no role to play in determining whether a particular language variety is a language or a dialect. People will continue looking to the standard as documentation of language identities, even if they recognize that it is not the only way of grouping dialects into languages, and some people will try to use it for purposes other than those for which it is intended.

For discussions in the standard to proceed ethically, it is important that all relevant parties be involved in decisions about language codes and the names

associated with them so that changes to the standard can be made with full awareness of their potential impact. The most obvious parties to be involved are speakers of the languages themselves. Ideally, change requests should be presented by representatives of language communities. In practice, this has happened in only a few cases that I am aware of.³⁴ Most proposals seem to be made by outsiders whose relationship to the community is not clear from the change request forms. Even when a request originates within a community, it is not always clear whether it comes from an individual only or whether it reflects a broader consensus.

How to effectively involve language communities in decisions about codes, however, is a distinct challenge. ISO 639-3 cannot reasonably be expected to contact the affected language community about each change request that it receives. In most cases, ISO workers would not even know how to do so. What individuals or groups within the community should be consulted? Who can speak for the community as a whole? What if people disagree? How should we proceed when hundreds of thousands of people are involved?

Clearly, then, this responsibility must rest on those who submit requests: they should consult with appropriate speakers of the language to evaluate whether the proposed change is going to have negative repercussions for their community. Because of the great variety of situations, no strict rules can be proposed as to how to do this. But it would seem appropriate for ISO 639-3 to add a recommendation to this effect in its guidelines and change request forms,³⁵ to alert proposers to their responsibility to include the language community itself.

CONCLUDING REMARKS

In summary, then, the ISO 639 standard is one useful tool available to the world community for locating information in and about the languages of the world and for efficiently processing them in electronic resources. As a side benefit, the standard can be helpful in addressing broader questions of language identification, but it should not be relied on as a definitive statement of what languages exist in the world, a question that will always have different answers depending on the purposes for which it is asked. The most important ethical principle to follow, then, is not to expect the standard to do things that it is not intended to do.

I have recommended making some revisions to materials associated with the standard to maximize its benefits and minimize potential harm:

• Accountability and credibility would be improved by involving people from outside SIL International in deciding change requests.
• Official information about the standard should be available in world languages other than English.
• There should be a clear statement that codes are not names; though sometimes mnemonic, they are arbitrary, and no conclusions should be drawn from the choice of letters in a code.

• No distinction should be made between reference names and additional names associated with a code; all names should have equal status.
• Clear statements should be made about what the standard is and is not intended to do, and about inappropriate uses of it.
• People making change requests should be encouraged to actively involve language communities themselves in the change request process.
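Two of these recommendations (that codes are not names, and that all names should have equal status) have a direct counterpart in how a database might store the standard's data. The following is a minimal sketch in Python under stated assumptions: the `LanguageEntry` type, the `find_by_name` function, and the two sample entries are hypothetical illustrations, not the data model actually used by ISO 639-3 or Ethnologue. The code is treated as an opaque key, and names are held in an unordered set so that none is privileged as the "reference" name.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class LanguageEntry:
    code: str              # opaque identifier; its letters carry no meaning
    names: FrozenSet[str]  # unordered set: no name outranks another


def find_by_name(entries: Iterable[LanguageEntry],
                 name: str) -> Optional[LanguageEntry]:
    """Return the entry having `name` among its names, ignoring case."""
    wanted = name.casefold()
    for entry in entries:
        if any(candidate.casefold() == wanted for candidate in entry.names):
            return entry
    return None


# Hypothetical sample data for illustration only.
ENTRIES = [
    LanguageEntry("spa", frozenset({"Spanish", "Castilian", "español"})),
    LanguageEntry("swe", frozenset({"Swedish", "svenska"})),
]
```

In this sketch a search for "Castilian" and a search for "Spanish" reach the same entry by the same route, and nothing in the structure invites a reader to infer a preferred name from the spelling of the code.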

NOTES

1 Jeff Good and Calvin Hendryx-Parker, “Modeling Contested Categorization in Linguistic Databases,” in Proceedings of the EMELD ’06 Workshop on Digital Language Documentation: Tools and Standards: The State of the Art (2006 E-MELD Workshop on Digital Language Documentation, Lansing, Michigan, 2006), 5, http://www.linguistlist.org/emeld/workshop/2006/papers/GoodHendryxParker-Modelling.pdf; Michael Cysouw and Jeff Good, “Languoid, Doculect and Glossonym: Formalizing the Notion ‘Language’,” Language Documentation and Conservation 7 (2013): 331–59, http://hdl.handle.net/10125/4606.
2 Henry Hitchings, The Language Wars: A History of Proper English (New York: Farrar, Straus and Giroux, 2011), 20; Tomasz Kamusella, “The Global Regime of Language Recognition,” International Journal of the Sociology of Language 2012, no. 218 (2012): 59–86.
3 International Organization for Standardization, ISO 639-3:2007: Codes for the Representation of Names of Languages, Part 3: Alpha-3 Code for Comprehensive Coverage of Languages (Geneva: International Organization for Standardization, 2007), Registration authority: http://www.sil.org/iso639-3/.
4 Part 5 is concerned with groupings of languages such as language families and generally does not pose major ethical concerns because language communities are usually not impacted by how their language is categorized in larger schemes.
5 The codes were originally derived from the 14th edition (Barbara F. Grimes, Ethnologue: Languages of the World, 14th Edition (Dallas, Texas: SIL International, 2000)) and then used in draft form in the 15th edition (Raymond G. Gordon, Ethnologue: Languages of the World, 15th Edition (Dallas, Texas: SIL International, 2005), http://archive.ethnologue.com/15/). About 10% of the codes in Ethnologue 14 needed to be changed to avoid conflicts with codes already used in ISO 639-2, as explained to me by Gary Simons in personal communication. Typographically, the ISO 639-3 codes are lower case, whereas earlier Ethnologue codes were upper case. Codes were also added for some extinct and constructed languages which are not listed in Ethnologue; these are maintained by The Linguist List (http://linguistlist.org/forms/langs/find-a-language-or-family.cfm). For these, however, there are fewer, if any, ethical concerns, so I will ignore them in this paper.
6 ISO 639-3/RA, “Scope of Denotation for Language Identifiers” (Dallas: SIL International, 2015), http://www-01.sil.org/iso639-3/scope.asp.
7 A. Phillips and M. Davis, “Tags for Identifying Languages” (Internet Engineering Task Force, 2009), https://tools.ietf.org/html/bcp47; Kamusella, “The Global Regime of Language Recognition,” 80.
8 Andrew Smith, “Over 7,000 Languages, Just 1 Windows,” Windows Experience Blog, 5 February 2014, http://blogs.windows.com/windowsexperience/2014/02/05/over-7000-languages-just-1-windows/; The Consortium, “Unicode CLDR Project” (, 2015), http://cldr.unicode.org/.
9 Peter Constable and Gary Simons, “Language Identification and IT: Addressing Problems of Linguistic Diversity on a Global Scale,” SIL Electronic Working Papers 2000, no. 2000–2001 (2000), http://www.sil.org/silewp/2000/001/SILEWP2000-001.html.
10 OLAC, “OLAC Language Extension,” Open Languages Archiving Community, February 22, 2008, http://www.language-archives.org/REC/language.html; OCLC, “041 (R),” Online Computer Library Center, 2015, https://www.oclc.org/bibformats/en/0xx/041.html; “Search by Language,” The Linguist List, 2015, http://linguistlist.org/forms/langs/find-a-language-or-family.cfm.
11 See, for example, arguments advanced by Patience Epps and others in “In Opposition to Adopting Ethnologue’s Language Codes for ISO 639-3,” SSILA Bulletin, no. 246 (2006): 2–3, https://web.archive.org/web/20140520221057/http://linguistlist.org/ssila/Bulletins/Archive/bull246.pdf.
12 Harald Hammarström et al., eds., Glottolog 2.6 (Jena: Max Planck Institute for the Science of Human History, 2015), http://glottolog.org/.
13 Martin Haspelmath, “Can Language Identity Be Standardized? On Morey Et Al.’s Critique of ISO 639-3,” Diversity Linguistics Comment, 4 December 2013, http://dlc.hypotheses.org/610.
14 See change requests 2008-048 through 2008-063 at http://www-01.sil.org/iso639-3/chg_requests.asp?order=CR_Number&chg_status=2008.
15 Melinda Lyons, personal conversation, n.d.

16 These are available on Google Drive, although they represent an earlier stage in my thinking about ISO 639-3 and do not in all respects match what I have written in the current paper: https://drive.google.com/folderview?id=0B8JMRC0yhNk5YUV0dU1kZDc5YTQ&usp=sharing.
17 Cathy Bow, “Shoehorning Yolngu Language Names into the ISO 639-3 Standard,” paper presented at the Australian Linguistic Society Conference, Newcastle, Dec 10-12, 2014, http://www.cdu.edu.au/laal/wp-content/uploads/2015/09/Yolngu_ISO_codes_ALS_2014.pdf; ISO 639-3/RA, “Change Request Documentation for: 2015-063” (Dallas: SIL International, 2015), http://www-01.sil.org/iso639-3/chg_detail.asp?id=2015-063&lang=dhg.
18 Heinz Kloss, “‘Abstand Languages’ and ‘Ausbau Languages’,” Anthropological Linguistics 9, no. 7 (1967): 29–41, http://www.jstor.org/stable/30029461.
19 Cysouw and Good, “Languoid, Doculect and Glossonym: Formalizing the Notion ‘Language’.”
20 Hammarström et al., Glottolog 2.6.
21 Stephen Morey, Mark W. Post, and Victor A. Friedman, “The Language Codes of ISO 639: A Premature, Ultimately Unobtainable, and Possibly Damaging Standardization,” Presentation at PARADISEC 2013, Research, records and responsibility: Ten years of the Pacific and Regional Archive for Digital Sources in Endangered Cultures, http://hdl.handle.net/2123/9838.
22 ISO 639-3/RA, “Change Request Documentation for: 2006-087” (Dallas: SIL International, 2006), http://www-01.sil.org/iso639-3/chg_detail.asp?id=2006-087.
23 ISO 639-3/RA, “ISO 639-3 Downloads” (Dallas: SIL International, 2015), http://www-01.sil.org/iso639-3/download.asp.
24 Bow, “Shoehorning Yolngu Language Names into the ISO 639-3 Standard.”
25 ISO 639-3/RA, “Change Request Documentation for: 2015-063.”
26 Lise M. Dobrin and Jeff Good, “Practical Language Development: Whose Mission?,” Language 85, no. 3 (2009): 626.
27 Wikimedia, “Language Proposal Policy” (Wikimedia Foundation, 2015), https://meta.wikimedia.org/wiki/Language_proposal_policy, accessed 2015-11-04; Dobrin and Good, “Practical Language Development,” 626. This principle does not necessarily apply to all Wikipedias currently in existence.
28 Cysouw and Good, “Languoid, Doculect and Glossonym,” 335; Morey, Post, and Friedman, “The Language Codes of ISO 639.”
29 INALI, “Catálogo de las lenguas indígenas nacionales: Variantes lingüísticas de México con sus autodenominaciones y referencias geoestadísticas,” Diario oficial (Mexico, D.F.: Instituto Nacional de Lenguas Indígenas, 14 January 2008), http://www.inali.gob.mx/pdf/CLIN_completo.pdf.
30 Paul Lewis, “How NOT to Use the Ethnologue,” Ethnoblog, 1 October 2014, http://www.ethnologue.com/ethnoblog/m-paul-lewis/how-not-use-ethnologue.
31 For example, see the comments by Henrik Rosenkvist on a current request for assigning a separate code to what the Swedish government regards as a dialect of Swedish, especially the last two paragraphs: http://www-01.sil.org/iso639-3/cr_files/CurrentComments/2015-046_comment02.pdf.
32 Of course, factors that were considered in making decisions about code change requests may well be relevant to other decisions, and so ISO 639-3’s website documenting those factors may be a helpful resource.
33 Kamusella, “The Global Regime of Language Recognition.”
34 For example, Valencian Sign Language was added to the standard at the request of a member of the Valencian Deaf Community (http://www-01.sil.org/iso639-3/chg_detail.asp?id=2006-087), and Inuit Sign Language was added at the request of outside researchers who had discussed the matter with local community members and had their full support (http://www-01.sil.org/iso639-3/chg_detail.asp?id=2014-060).
35 ISO 639-3/RA, “ISO 639-3 Change Management” (Dallas: SIL International, 2015), http://www-01.sil.org/iso639-3/changes.asp.
