Modeling Color Terminology Across Thousands of Languages

Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, and David Yarowsky
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
{arya,wswu,amueller,billwatson,yarowsky}@jhu.edu

Abstract

There is an extensive history of scholarship into what constitutes a “basic” color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Collectively, the 14 empirically grounded computational linguistic metrics we design, as well as their aggregation, correlate strongly with both the Berlin and Kay basic/secondary color term partition (g = 0.96) and their hypothesized universal acquisition sequence. The measures and results provide further empirical evidence from computational linguistics in support of their claims, as well as additional nuance: they suggest treating the partition as a spectrum instead of a dichotomy.

Language     Color      Literal gloss
Welsh        brown
Italian      marrone
Persian      قهوه ای    coffee + of
Cantonese    [...]      coffee + color

Table 1: Examples of terms representing brown, arising from four processes: borrowing (Welsh; from English), null affixing (Italian), derivational affixing (Persian), and compounding (Cantonese).

1 Introduction

How many colors are in the rainbow? An infinite number, but each language divides up perceptual space into a finite number of categories by giving names to colors. The seminal work on color categories, by Berlin and Kay (1969, hereafter B&K), characterizes a universal evolutionary sequence for languages’ core colors (their basic color terms) and their corresponding categories, at each stage refining the partition of color space.

A handful of criteria define basic color terms, including abstractness, monomorphemicity, and not being subsumed by a broader basic term. (See §2 for the complete list.) These criteria are accused of biasing analyses of color systems, especially in non-Western societies (Wierzbicka, 2006). To mitigate this bias, a pan-lingual approach to analyzing color systems may reveal general (“universal”) trends more reliably than smaller datasets. While data are hard to find in the long tail of languages, we still aim to consider more than ever before: 2491 languages and dialects.[1] We leverage natural language processing tools to operationalize longstanding literature on language universals.

We provide a three-pronged investigation of the classic criteria for basic color terms, examining the degree to which color terms are abstract (§5), monomorphemic/monolexemic (§6), and salient (§7). Our operationalization of the B&K criteria shows that individual features do not reflect the basic/non-basic divide. Nor is this divide binary, as B&K suggest: we show that abstractness, monomorphemicity, and even salience do not cleanly divide colors.

Nonetheless, by treating basicness as a spectrum and aggregating these features (like human-judged concreteness, frequency of compounding, and word length) into basicness scores (§8), we can largely distinguish between basic and non-basic colors (validating our measures), and our scores recreate the historical sequence of color acquisition in language. The sequence is in no way directly encoded in the criteria for basic color terms; as such, recreating it is a separate and novel empirical discovery.

[1] To this end, we present a large cross-lingual, type-level database of translations of basic and secondary color terms across 2491 languages (§3).

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 2241–2250, Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics

2 Color Terminology

Not all languages have the same number of color words; for instance, a single Korean color word (pureu-n) applies to both grass and sky, an unusual concept for native English speakers. Similarly, Russian distinguishes between two families of what English speakers call “blue”: the lighter goluboy and the darker siniy. Reaction time experiments show the cognitive importance of these categories (Gilbert et al., 2006; Winawer et al., 2007), and the existence of a named category both aids (Brown and Lenneberg, 1954) and guides (Bae et al., 2015; Cibelli et al., 2016) color judgment and memory.

Color terms may be concrete (i.e., derived from a real-world referent like “blood” or “sky”) or abstract. Diachronic processes can weaken the link between a concrete term and its referent, until a new cohort of speakers believes the term to be abstract. Indeed, this process explains the development of English color words (Casson, 1994). In addition to metonymy with named things, the words may be borrowed, compounded, or inherited from an ancestor language.

While industrialized societies’ languages possess a wealth of color words (Hardin, 2014), only a handful are considered basic color terms; the remainder are secondary. A basic color term (BCT) must satisfy four obligatory criteria (B&K):
1. It must be monolexemic (and monomorphemic). “Light blue” and “blue-green” each contain two lexemes and do not qualify.
2. It may not possess any color hypernyms (superordinate color terms). (E.g., “[...]” has the hypernym “purple”.)
3. It may not be limited in application to a narrow class of objects. “Blond(e)” may only be applied to a handful of referents like hair, wood, and beer, for example.
4. It must be psychologically salient. This implies that the color term has a stable range of reference across speakers and has an entry in the lexemic inventory of most (if not all) native speakers’ respective idiolects.
Additional criteria are introduced in cases of doubt (Kay and McDaniel, 1978), though these are subjectively applied (Crawford, 1982). Among these: (5) a BCT is not the name of an object that characteristically has a particular color; in other words, the color must be abstract, and not grounded in some concrete object (which rules out colors like [...] and [...]). Additionally, (6) recent foreign loan words “are suspect”, and (7) if the lexemic status of the word is difficult to judge, then multimorphemic words are also “suspect”.

In addition to this definition, B&K surveyed speakers of 20 languages in the San Francisco Bay Area, plus a sweeping examination of the literature, to find a sequence to the emergence of color words in language. Cultures with two color words universally used them to distinguish light and warm colors from dark and cool ones; the third color was universally red, and the sequence continued until matching the set of eleven colors represented by English basic color terms. We present their partial ordering in Figure 1, though later authors have proposed alterations (Heider, 1972; Kay, 1975).

{white, black} < red < {green, yellow} < blue < brown < {purple, pink, orange, gray}

Figure 1: The diachronic sequence of color acquisition (Berlin and Kay, 1969).

We are not the first to assess the notion of a basic color term. Crawford (1982) gives a point-by-point rebuttal on pragmatic grounds: the criteria are hard for a field worker to assess, and many introduce subjectivity that will bias data collection. Lucy (1997) argues that the definition provides more of a post-hoc screening tool for when the “denotational net” of elicitation has captured too many terms, as opposed to a morphosyntactically informed approach (e.g., Conklin, 1955). Finally, Wierzbicka (2006) argues that other societies may not share the Western conception of hue-based color terms, making the application of the concept inappropriate. In addition to these postulatory objections, a vast literature of similarity judgments, reaction times, and other human measures debates the question from a cognitive perspective (Heider, 1972; Jameson, 2005; Roberson et al., 2005, 2008; Goldstein et al., 2009; Loreto et al., 2012; Persaud and Hemmer, 2014, inter alia).

By contrast, we examine the conditions empirically, broadly, and automatically on a massively multilingual scale (versus manually and theoretically). Our evidence for assessing B&K’s criteria of abstractness, monomorphemicity, and salience comes from a multilingual dragnet of color terms.

3 Data

We investigate the three aspects of our theory assessment (abstractness, monomorphemicity, and salience) through multilingual dictionaries. We additionally leverage English corpora to explore abstractness and salience. We use these to construct a dataset of color senses and translations, with scores along numerous axes. As a final resource to investigate salience, we use a global elicitation of color terms from pre-industrialized societies.

In English, the basic color terms are red, orange, yellow, green, blue, purple, brown, pink, black, white, and grey. These align to the eleven basic color categories identified by Berlin and Kay (1969). In addition to these eleven, we consider a list of 92 second-tier color terms identified by Casson (1994). These were elicited from 30 speakers over several days to ensure salience, then filtered by a dictionary to keep only conventional (rather than novel) color descriptors. We omit 12 of these which do not appear in the datasets we employ.

Translation may act as a useful resource for disambiguating word senses (Diab and Resnik, 2002). By translating English BCTs into other languages, we can find their basic color terms. Then, back-translating to English, we obtain a list of the potential senses of a given color term. We draw translations of color terms from two large type-level dictionary resources, PanLex (Baldwin et al., 2010; Kamholz et al., 2014) and Wiktionary, which together provide color word translations for 2491 languages or dialects.

For each of the 11 basic color concepts and 80 secondary terms $e$ in English, we translate it into every available foreign language $\ell$ by dictionary lookup to get a set of non-English color words $F_\ell$. We then back-translate each term into English (again by lookup) to get a set

\[ E_\ell(e) = \bigcup_{f \in F_\ell} \{\, e' \mid \mathrm{trans}_{\ell \to \mathrm{En}}(f) = e' \,\} \]

of possible round-trip translations through the language $\ell$. The final dataset contains tuples of the form (English color word, foreign language, foreign word, English back-translation).

4 Summary of Experiments

We perform a series of experiments to explore the abstractness, monomorphemicity, and salience of color terms across thousands of languages.[2] In addition to qualitative observations, our experiments evaluate these qualities through several metrics, illuminating flaws in the definition of “basic color term”. When averaging the 14 features together, the implied total ordering is suggestive of the original B&K sequence.

Goodman and Kruskal’s gamma. We measure correlation between basicness and our features with Goodman and Kruskal’s gamma (Goodman and Kruskal, 1954, 1959, 1963, 1972), which is well suited for comparing binary variables to ordinal ones. It is a pair-counting measure which ignores tied values. We compute it by maximum likelihood estimation, giving an expression:

\[ g = \frac{N_s - N_d}{N_s + N_d}, \tag{1} \]

where $N_s$ is the number of color pairs for which basicness and a feature agree in their ranking; $N_d$ is the number of pairs ranked in opposite orders. Arbitrarily, we represent basicness as 1 and non-basicness as 0. We will be concerned only with the magnitude, not the direction, of correlation. A score of ±1 thus indicates perfect correlation with basicness.

A remark on noise. Each of our measures is imperfect; it possesses a bias. Combining the results from many weak indicators gives robustness to the noise in any given measure. Foreshadowing future discussion, we see this supported empirically by the effect of aggregation on Goodman and Kruskal’s gamma.

[2] We give our data and implementations at https://github.com/aryamccarthy/basic-color-terms.

5 Abstractness

The basic color terms in English did not start as abstract concepts. For instance, “orange” and “pink” were originally derived from the concrete color of a fruit (Citrus sinensis) and flower (Dianthus plumarius) respectively, and earlier the English color term “black” had its origin in a word for soot, but the abstract color senses of these words grew in relative popularity; many supplanted the original definitions as the more common word sense (Casson, 1994). It could be that many “basic” color terms emerged metonymically from concrete, real-world referents, as in English.

5.1 Concreteness judgments

As a first pass, we directly look up the concreteness for each color word on our list, in a dataset of 40,000 English lemmas’ concreteness (Brysbaert et al., 2014) as rated on a Likert scale by Amazon Mechanical Turk workers in the United States. As expected, we find that concreteness negatively correlates with basicness (g = 0.58). Many non-basic colors are less concrete than basic terms; for instance, “[...]” is the least concrete color (3.41), and most words are less concrete than “orange”.

5.2 A hologeistic perspective

A word may have multiple senses, which we hope to capture by taking a pan-lingual, or hologeistic, perspective, getting at the concept itself rather than any surface form. To do this, we find the human-judged concreteness of each of our back-translations, then average these for each color term, weighted by the number of languages for which a word is a back-translation. This balances between a single frequent sense and multiple infrequent senses.

Performing this averaging over languages and senses magnifies our correlation to g = 0.62; clearly, exploiting the diversity of senses is beneficial. Still, there is no clear separation between basic and secondary color terms. Concreteness or abstractness thus provides incomplete evidence of basicness.

Category   Back-translation   Freq.   Conc.   OED POS
Black      black              1065    3.76    Adj
           dark               467     4.29    Adj
           dirty              162     4.23    Adj
White      white              4423    4.52    Adj
           bright             163     3.92    Adj
           clear              163     3.55    Adj
Red        red                3053    4.24    Adj
           reddish            320     3.42    Adj → Adj
           [...]              286     4.00    Adj
Green      green              3786    4.07    Adj
           unripe             468     3.31    Adj → Adj
           blue               242     3.76    Adj
Yellow     yellow             1306    4.30    Adj
           saffrony           133     -       N → Adj
           luteous            94      -       F.W. → Adj
Blue       blue               1324    3.76    Adj
           gloomy             361     2.52    Adj
           grim               293     1.82    Adj
Brown      brown              671     4.48    Adj
           pink               314     3.93    Adj
           brownish           106     3.24    Adj → Adj
Purple     purple             932     4.04    N
           [...]              324     4.48    N
           purpure            148     -       N
Pink       pink               121     3.93    Adj
           rose               13      4.90    N
           carnation          10      3.93    N
Orange     orange             1257    4.66    N
           orange tree        89      -       N
           mandarin           87      3.67    N
Gray       gray               775     3.46    Adj
           grey               570     4.11    Adj
           greyish            155     3.99    Adj

Table 2: Top back-translations and their concreteness for each of the 11 basic color categories. Extended in the supplementary material. Part of speech is first-listed from the Oxford English Dictionary (Simpson et al., 1989).

5.3 Part of speech as a proxy for concreteness

Adjectives are, on average, perceived as more abstract than nouns (Darley et al., 1959). We affirm this finding: in the Brysbaert et al. judgments, nouns are less abstract than adjectives on average (3.53 versus 2.50). Because of this, we are comfortable using color words’ part of speech as a coarse hint of their abstractness.

We collect part of speech annotations from two sources: the Google Books Ngram Corpus, containing about 4% of all books ever printed (Michel et al., 2011), and the Penn Treebank (Marcus et al., 1993). The former is machine-annotated for part of speech (and thus noisier); the latter is annotated by linguists. For each corpus, we compute the ratio of adjectival relative to nominal part of speech, as well as total frequency.[3]

[3] These features work double duty: they shed insight into both the concreteness and the salience as descriptive concepts of each color word. It can lead to spurious ordering, though, if not tempered by other measures; “gold” and “flesh” are both frequent because of their material senses, so raw frequency is not an indicator of basicness.

6 Morphology

In this section, we ask whether there are affixes that are highly correlated with color; these can be either general derivational affixes or sequences specific to color terms, as in Table 1 and Table 4. The presence of subword structure in basic concepts’ translations would show that they violate the monomorphemicity criterion.

Although the B&K criteria demand monolexemic and monomorphemic words, color terms in many languages are formed by some derivational process from a concrete term. To discover the components in an unsupervised fashion, as is necessary for languages in the long tail of linguistic resource availability, we look for constituent morphemes and constituent lexemes by segmentation and compound detection, respectively, on each foreign language of our dataset, then use these to score colors’ basicness.

6.1 Affix discovery

Segmentation and affix discovery are a challenge for the low-resource languages in our study. To give signal to the model, we leverage our other metrics. We compute the percentage of the time that a color word’s translation occurs with a suffix that is strongly associated with one of the 10 highest-ranking colors on the basicness scale, according to the aggregation we mention in §4 and detail in §8. In other words, words that associate with a typical color affix in that language tend to be colors.

This is not a test for basicness, per se, but rather for being a likely color, so it supplements the part of speech measures. But it does diminish the rank of almost all words with senses/translations that are not primarily colors. In addition, this measure has the highest correlation of any of ours with basicness (g = 0.92), though this is not surprising, as it was computed by bootstrapping from the other results, which already correlate well in aggregate.

As another tack, we identify likely color-related and general derivational affixes by unsupervised morphological segmentation. We define a probability distribution over segmentations. Let $S = s_1 s_2 \ldots s_m$ segment the word $W = \mathrm{BOS}\, w_1 w_2 \ldots w_n\, \mathrm{EOS}$. We seek to find the optimal $S^*$. To do this, we decode a model

\[ S^* = \arg\max_S \Pr(S) = \arg\max_S \prod_{s_i \in S} \Pr(s_i) \]

using the Viterbi algorithm, where the individual segment probabilities are maximum a posteriori estimates under a Dirichlet prior (α = 0.01). The model, inspired by Ge et al. (1999), is similar to other unigram segmentation models (Creutz and Lagus, 2005; Kudo, 2018). We then search for these affixes across the terms recorded in that language, to determine whether the affix is broadly derivational or specific to color terms. Select results are given in Table 3.

Language   Affixes
nci        -tic† (40%), tla- (27%), xox- (27%)
aqc        -тут* (36%)
dbu        -Ngó (29%)
mpj        -lpa* (26%), -ku (15%)
ciw        -aan-† (23%), -zo* (23%)

Table 3: Common example affixes. * indicates a general derivational affix. † indicates a color-specific morpheme. Bare affixes are neither. Included for each morpheme is the percentage of color words in our dataset in the given language which contain the morpheme. ISO language codes are given in Appendix A.

Consider the -tic suffix in Nahuatl (nci). This morpheme semantically denotes “color”; one combines real-world referents with this suffix to obtain colors, e.g., chichiltic (chili-color). This appears even with black and white, the first basic color terms in B&K’s ordering, implying that Nahuatl lacks basic color terms under the B&K definition.

We also see derivational morphemes, which are applied to words from a given part-of-speech class to convert them to another class, e.g., “тут” in Archi (aqc) in Table 3, which is a fused morpheme denoting adjectivalization and marking Archi’s fourth gender. This morpheme appears in Archi’s terms for [...]. As with Nahuatl, this implies that Archi lacks basic color terms, according to a strict interpretation of the criteria.

Method               Example
concrete + color     cmn: [...] (orange) + [...] (color)
color + der. affix   spa: anaranja (orange) + do
Adj. + color         deu: dunkel (dark) + rot (red)

Table 4: Discovered concatenation strategies pooled across languages, with representative examples from Mandarin, Spanish, and German.

6.2 Compound detection

To particularly identify color names which are formed by compounding of words, we extend a model for compound discovery to identify terms which were produced compositionally. This lets us ask two questions about the BCT definition: (1) Across languages, are there “basic” color terms that are not monolexemic, and (2) Are “basic” terms less likely to be compounds? The answer to both, we find, is “yes”.

Wu and Yarowsky (2018) propose a multilingual compound analysis and generation method that only requires a readily available multilingual dictionary. They first extract potential compounds by splitting any word into three substrings corresponding to a left component, glue, and right component. These compounds are used to construct compound “recipes”. For example, they discovered that the concept of ‘hospital’ is frequently represented across a variety of languages as a compound of ‘sick’/‘disease’ and ‘house’/‘home’ in their respective language. They use these recipes to score their initial list of potential compounds, filter out low-scoring, unlikely compounds, and perform a second pass of recipe construction, resulting in a higher-quality compound dataset. Compound analysis is performed in a similar manner as recipe construction. Compound generation takes into account language-specific knowledge of which components and glues are common in that language.

Wu and Yarowsky (2018) consider only single- or zero-character glue between the two components. By contrast, we allow glues of arbitrary length, exhaustively searching through all segmentations into three parts. This increases the algorithmic complexity by Θ(K), where K is the length of the word. Searching through our PanLex and Wiktionary foreign translations of only our basic color terms, we find several examples of compounded words. Some examples are given in Table 4. The frequency of a color being expressed by compounding, though, turns out to be a weak indicator of basicness (g = 0.35).

6.3 An aside on borrowings

One of B&K’s lower-tier criteria for color terms was that borrowings were “suspect”. Here we examine how often color terms are borrowed, as well as other avenues for color construction.

In addition to translations and definitions, Wiktionary provides etymologies for many languages. These relations have been parsed and extracted as EtymDB (Sagot, 2017).

We report the aggregation of EtymDB’s parsed etymologies in Table 5; these are broken down by color in the supplementary material. Basic colors are more often created by borrowing, suffixing, and compounding than secondary colors; nevertheless, this should be taken with a grain of salt: the annotations for secondary colors are less complete, so while we use these scores, we do not take the criterion of borrowings as a prong of our criticism.

Process         Basic   Secondary
Inheritance     1161    2356
Derivation      82      183
Cognate         303     483
Borrowing       18      84
None of these   42566   65969

Table 5: Sources of foreign color words. The colors are not mutually exclusive.

7 Salience

B&K assessed salience by a color’s tendency to appear earlier when asking speakers to list their language’s colors. More general assessments are outlined by Hays et al. (1972): word length, frequency of use, ethnographic frequency, and correlation of vocabulary size with cultural complexity. While the final one of these is beyond our scope, we present simple experiments to test the other three, based on our translation dataset.

7.1 Word length

Durbin (1972) supposed, based on Zipf’s Law of Abbreviation (Zipf, 1932, 1949), that word length would decrease for more salient, broadly used color terms. We test this across over 2400 languages by computing the mean length of all translations for each English color word, regardless of script.

With the exception of grey, the first six colors of the B&K sequence (black, white, red, yellow, green, and blue) have lower average lengths than the subsequent five. This supports Durbin’s two-phase theory of basic color terms. Still, beyond this handful, there is no clear separation between basic and non-basic colors. There is only a moderate correlation (g = 0.41) with basicness.

7.2 Frequency: Usage and ethnography

Neither English corpora nor multilingual dictionaries give a complete picture of the world’s languages; after all, only 3 to 4 thousand of them have writing systems (Lewis et al., 2015). To augment our analyses, we consider the grounded speaker elicitation performed in the World Color Survey (WCS; Cook et al., 2005). In it, 2568 speakers from 110 pre-industrialized societies gave their color naming judgments about colored chips on a stimulus palette (see Figure 2) which evenly varied hue and [...]. Field workers then transcribed the utterances and applied the B&K criteria to ascertain which colors were basic. With the WCS, we can discover the synchronic homogeneity of a term’s use for the labeling task.

Do all speakers agree that a given term should be used?
We examine the consensus of use among words elicited in the World Color Survey. For each elicited color word in the language, we count the number of speakers who used it. Indeed, the distribution is varied. In Figure 3, we give a histogram of the total number of color terms elicited from each language, as well as the consensus for each individual color. We see a lexical core and periphery for most languages. Most color terms are unique, and languages may have up to 79 terms given among their speakers. It is natural for a speaker to use an unexpected word, still believing the color to fit in a more basic category (as with ‘[...]’ and ‘blue’ in English), but it is surprising that these words would not be sifted out by field linguists’ “denotational net”.

Figure 2: Top: The stimulus palette used to collect color namings by Berlin and Kay (1969) and later the World Color Survey. Colors vary vertically in [...] and horizontally in hue. All are fully saturated. Bottom: The extensions of the six colors identified by one speaker of the Iduna language, spoken in Papua New Guinea.

Figure 3: The relative frequencies of each language’s color term in the World Color Survey. The height of the colored columns forms a histogram, showing the total number of elicited words. Redder color indicates broader consensus: the word was elicited by a greater fraction of the language’s speakers.

Do all speakers have the same number of colors? We look at the variability of sizes of each speaker’s inventory for a language, rather than the consensus on each term. The standard deviation of inventory sizes from the speakers of each language shows notable variation, especially when the mean inventory size exceeds 6.

Heterogeneity in within-language color inventories is unsurprising; as Kay (1975) noted, younger members of numerous language communities possess a broader inventory of basic color terms, following Berlin and Kay’s evolutionary sequence of inventories. Nevertheless, basic color terms are defined on a per-language level; they ignore this synchronic variation. This raises questions for future work: Are there categories salient only to young speakers, not the old? What would be a population’s (or language’s) basic color terms?

8 Aggregation of Features

We have operationalized some B&K criteria for basic color terms. Independently, each fails to match up with the known set of basic color terms. Now, we aggregate our independent scores to create a robust measure of basicness from the weak measures. We do not cherry-pick our measures; some that we include actually harm the ordering. Instead, we operationalize each criterion, then see what shakes out. We take an unweighted average of normalized scores for the same reason. This makes the results more evocative; they come purely from our operationalization, rather than targeted tuning.[4]

Score                        Defined   Basic   Sequence
Word concreteness            §5.1      0.578   0.554
Translation concreteness     §5.2      0.616   0.583
Ngram frequency              §5.3      0.897   0.896
Ngram %ADJ                   §5.3      0.806   0.780
Penn TB %ADJ                 §5.3      0.707   0.705
Count of compounding         §6.2      0.874   0.858
Frequency of compounding     §6.2      0.354   0.383
Affix presence               §6.1      0.924   0.892
Borrowing etym.              §6.3      0.254   0.231
Cognate etym.                §6.3      0.797   0.804
Derivation etym.             §6.3      0.830   0.805
- Suffix derivation          §6.3      0.877   0.869
Inheritance etym.            §6.3      0.765   0.774
Word length                  §7.1      0.414   0.427
Aggregate                    §8        0.964   0.962

Table 6: Goodman and Kruskal’s gamma rank correlation between each measure and both basicness and the diachronic color sequence. Features not thought to indicate basicness (as discussed) have been negated. As ordering the sequence is more nuanced than sifting basic from non-basic, one might expect consistently lower correlations. However, the sequence gamma can have larger magnitude if, despite poor sifting, the order of the basic terms is correct.

Using the aggregated measure produces the highest correlation with both basicness and the order of the B&K sequence; see Table 6. We recover the first six colors in the evolutionary sequence from Figure 1: white and black, red, green and yellow, and blue! An extended ordering is given in Table 7; it suggests that the most primary of the non-basic terms are gold, scarlet, crimson, and beige.

Color      Rank   B&K    Agg. score
white*     1      1–2    1.00
black*     2      1–2    0.97
red*       3      3      0.92
green*     4      4–5    0.83
yellow*    5      4–5    0.80
blue*      6      6      0.79
gray*      7      8–11   0.73
gold       8             0.67
brown*     9      7      0.66
pink*      10     8–11   0.64
scarlet    11            0.64
purple*    12     8–11   0.62
crimson    13            0.60
beige      14            0.58
[...]      15            0.55
blond      16            0.54
tan        17            0.52
[...]      18            0.52
flesh      19            0.51
[...]      20            0.48

Table 7: Total ordering of colors according to our aggregate score, with conventionally basic colors marked *. Complete results are given in the supplementary material. This includes the position of the missing basic color, orange.

Some further inquiry is possible. We see that orange is the 24th ranked color out of 91. This is a stark separation from the ten other basic color terms; beyond this, it has the highest concreteness of the basic terms.

[4] Nonetheless, we characterize the contribution of individual features in Appendix B.

9 Discussion

While the notion of a basic color term has been widely used, its validity has been taken as given. With NLP techniques in the broadest multilingual survey by far, we add to the literature investigating the definition of basic color terms. The ability to produce color templates shows that monomorphemicity is an unreasonable criterion. The concreteness of many color words’ back-translations violates abstractness. Finally, the heterogeneity of color naming data contends with the salience requirement. None of the traditional criteria for basic color terms holds up robustly. Despite this, when taken in aggregate (and operationalized as we do), they suggest the traditional sequence of color terms and a coarse division between basic and non-basic colors.

As color terms are often decomposable, we can turn the decomposition on its head to generate missing color words. We have shown cross-lingual patterns of word formation that future work can exploit, giving plausible entries in a bilingual dictionary (Wu and Yarowsky, 2018): without the word for “hospital”, one can convey the concept by “sick”+“house”; likewise, without the word for “gray”, one can use “ash”+DERIVATIONAL AFFIX. Future work will investigate generation and validation of unseen color terms.

Finally, given that the divide between basic and secondary color terms is so blurred, future computational models of these should employ models of graded membership, such as fuzzy set theory.

10 Conclusion

This paper has investigated the universal basic color term theories of Berlin and Kay (1969) and others. It provides empirically grounded computational linguistic metrics with evidence from 2491 languages, harnessing multiple on-line resources of varying quality. We have shown that although the obligatory criteria do not in fact cleanly separate basic from non-basic colors, our features’ aggregation correlates strongly with the Berlin and Kay basic/secondary color term partition (g = 0.96). The aggregation also largely predicts the Berlin and Kay hypothesized universal acquisition sequence, which is in no way directly entailed by the basicness criteria. Thus, we provide further empirical evidence from computational linguistics in support of the B&K claims, while also providing additional nuance and perspective thereon.
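The aggregation referred to here, an unweighted average of normalized scores, might be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the text does not say which normalization is used, so min-max scaling is assumed; the function names and the toy feature values are invented; and the `negate` argument stands in for the negation of features not thought to indicate basicness.

```python
def min_max(values):
    """Rescale values to [0, 1] (assumed normalization; the paper does not specify one)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def aggregate(features, negate=()):
    """Unweighted mean of normalized feature scores per color.

    `features` maps a feature name to a dict of color -> raw score.
    Features listed in `negate` are sign-flipped before normalization.
    """
    colors = sorted(next(iter(features.values())))
    totals = {color: 0.0 for color in colors}
    for name, by_color in features.items():
        raw = [by_color[color] for color in colors]
        if name in negate:
            raw = [-v for v in raw]
        for color, value in zip(colors, min_max(raw)):
            totals[color] += value
    return {color: totals[color] / len(features) for color in colors}


# Toy scores, invented for illustration only.
features = {
    "word_length": {"red": 3, "blue": 4, "crimson": 7},
    "frequency": {"red": 100, "blue": 80, "crimson": 10},
}
scores = aggregate(features, negate={"word_length"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Leaving the average unweighted mirrors the stated intent of avoiding targeted tuning; any weighting scheme would be an additional design decision.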

Acknowledgments

We thank several members of the Johns Hopkins University Center for Language and Speech Processing, as well as Elyse Wilson, for early discussions that shaped the manuscript. Keehoon Trevor Lee prepared part of the data for Table 2 as an undergraduate research assistant. We thank Ryan Cotterell for producing the lower part of Figure 2 and Dorothy Hu for presentation suggestions on Figure 3.

References

Gi-Yeul Bae, Maria Olkkonen, Sarah R. Allred, and Jonathan I. Flombaum. 2015. Why some colors appear more memorable than others: A model combining categories and particulars in color working memory. Journal of Experimental Psychology: General, 144(4):744.

Timothy Baldwin, Jonathan Pool, and Susan Colowick. 2010. PanLex and LEXTRACT: Translating all words of all languages of the world. In Coling 2010: Demonstrations, pages 37–40. Coling 2010 Organizing Committee.

Brent Berlin and Paul Kay. 1969. Basic color terms: Their universality and evolution. University of California Press.

Roger W. Brown and Eric H. Lenneberg. 1954. A study in language and cognition. The Journal of Abnormal and Social Psychology, 49(3):454.

Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3):904–911.

Ronald W. Casson. 1994. Russett, rose, and raspberry: The development of English secondary color terms. Journal of Linguistic Anthropology, 4(1):5–22.

Emily Cibelli, Yang Xu, Joseph L. Austerweil, Thomas L. Griffiths, and Terry Regier. 2016. The Sapir–Whorf hypothesis and probabilistic inference: Evidence from the domain of color. PLoS ONE, 11(7):e0158725.

Harold C. Conklin. 1955. Hanunóo color categories. Southwestern Journal of Anthropology, 11(4):339–344.

Richard S. Cook, Paul Kay, and Terry Regier. 2005. The World Color Survey database. In Handbook of categorization in cognitive science, pages 223–241. Elsevier.

T.D. Crawford. 1982. Defining “basic color term”. Anthropological Linguistics, pages 338–343.

Mathias Creutz and Krista Lagus. 2005. Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Helsinki University of Technology, Helsinki.

Frederic L. Darley, Dorothy Sherman, and Gerald M. Siegel. 1959. Scaling of abstraction level of single words. Journal of Speech and Hearing Research, 2(2):161–167.

Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.

Marshall Durbin. 1972. Review of basic color terms. Semiotica, 6(3):257–278.

Xianping Ge, Wanda Pratt, and Padhraic Smyth. 1999. Discovering Chinese words from unsegmented text (poster abstract). In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’99, pages 271–272, New York, NY, USA. ACM.

Aubrey L. Gilbert, Terry Regier, Paul Kay, and Richard B. Ivry. 2006. Whorf hypothesis is supported in the right visual field but not the left. Proceedings of the National Academy of Sciences, 103(2):489–494.

Julie Goldstein, Jules Davidoff, and Debi Roberson. 2009. Knowing color terms enhances recognition: Further evidence from English and Himba. Journal of Experimental Child Psychology, 102(2):219–238.

Leo A. Goodman and William H. Kruskal. 1954. Measures of association for cross classifications. Journal of the American Statistical Association, 49(268):732–764.

Leo A. Goodman and William H. Kruskal. 1959. Measures of association for cross classifications. II: Further discussion and references. Journal of the American Statistical Association, 54(285):123–163.

Leo A. Goodman and William H. Kruskal. 1963. Measures of association for cross classifications III: Approximate sampling theory. Journal of the American Statistical Association, 58(302):310–364.

Leo A. Goodman and William H. Kruskal. 1972. Measures of association for cross classifications, IV: Simplification of asymptotic variances. Journal of the American Statistical Association, 67(338):415–421.

Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. 2002. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1):389–422.

C.L. Hardin. 2014. Berlin and Kay theory. Encyclopedia of Color Science and Technology, pages 1–4.

David G. Hays, Enid Margolis, Raoul Naroll, and Dale Revere Perkins. 1972. Color term salience. American Anthropologist, 74(5):1107–1121.

Eleanor R. Heider. 1972. Universals in color naming and memory. Journal of Experimental Psychology, 93(1):10.

Kimberly A. Jameson. 2005. Culture and cognition: What is universal about the representation of color experience? Journal of Cognition and Culture, 5(3):293–348.

David Kamholz, Jonathan Pool, and Susan Colowick. 2014. PanLex: Building a resource for panlingual lexical translation. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014). European Language Resources Association (ELRA).

Paul Kay. 1975. Synchronic variability and diachronic change in basic color terms. Language in Society, 4(3):257–270.

Paul Kay and Chad K. McDaniel. 1978. The linguistic significance of the meanings of basic color terms. Language, pages 610–646.

Taku Kudo. 2018. Subword regularization: Improving neural network translation models with multiple subword candidates. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 66–75. Association for Computational Linguistics.

M. Paul Lewis, Gary F. Simons, and Charles D. Fennig, editors. 2015. Ethnologue: Languages of the world, eighteenth edition. SIL International, Dallas.

Vittorio Loreto, Animesh Mukherjee, and Francesca Tria. 2012. On the origin of the hierarchy of color names. Proceedings of the National Academy of Sciences, 109(18):6819–6824.

John A. Lucy. 1997. The linguistics of “color”. In Color categories in thought and language, chapter 15, pages 320–346. Cambridge University Press.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2011. Quantitative analysis of culture using millions of digitized books. Science, 331(6014):176–182.

Kimele Persaud and Pernille Hemmer. 2014. The influence of knowledge and expectations for color on episodic memory. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 36.

Debi Roberson, Jules Davidoff, Ian R.L. Davies, and Laura R. Shapiro. 2005. Color categories: Evidence for the cultural relativity hypothesis. Cognitive Psychology, 50(4):378–411.

Debi Roberson, Hyensou Pak, and J. Richard Hanley. 2008. Categorical perception of colour in the left and right visual field is verbally mediated: Evidence from Korean. Cognition, 107(2):752–762.

Benoît Sagot. 2017. Extracting an Etymological Database from Wiktionary. In Electronic Lexicography in the 21st century (eLex 2017), pages 716–728, Leiden, Netherlands.

John Simpson, Edmund S.C. Weiner, et al. 1989. Oxford English dictionary online. Oxford: Clarendon Press. Retrieved March, 6.

Anna Wierzbicka. 2006. The semantics of color. Progress in Color Studies. Amsterdam/Philadelphia: John Benjamins Publishing Company, pages 1–24.

Jonathan Winawer, Nathan Witthoft, Michael C. Frank, Lisa Wu, Alex R. Wade, and Lera Boroditsky. 2007. Russian blues reveal effects of language on color discrimination. Proceedings of the National Academy of Sciences of the United States of America, 104(19):7780–7785.

Winston Wu and David Yarowsky. 2018. Massively translingual compound analysis and translation discovery. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

G. K. Zipf. 1932. Selected studies of the principle of relative frequency in language. Harvard Univ. Press, Oxford, England.

George Kingsley Zipf. 1949. Human behavior and the principle of least effort. Addison-Wesley Press, Oxford, England.
