Evolutionary Order of Basic Color Term Acquisition Not Recapitulated by English or Somali Observers in Non-Lexical Hierarchical Sorting Task

Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in

the Graduate School of The Ohio State University

By

Aimee Violette

Graduate Program in Vision Science

The Ohio State University

2019

Thesis Committee

Delwin T. Lindsey, PhD, Advisor

Angela M. Brown, PhD

Andy Hartwick, PhD


Copyrighted by

Aimee Violette

2019


Abstract

The connection between language and color has long been examined through studies of color naming. It is well established that speakers of different languages have different numbers of basic color terms and that additional color terms are acquired in a predictable order. Berlin & Kay (1969) argued that color lexicons evolve over time, and that much of the diversity observed in the color lexicons of languages of pre-industrial cultures occurs because these languages are at different stages of a highly constrained evolutionary sequence. The present study tests and extends a study by Boster (1986), who employed a non-lexical binary sorting task in which English speakers sequentially divided a palette of 8 colors into 2, 3, …, 7, 8 piles. Boster claimed that the resulting progression of color sorting patterns mimicked the patterns of color term evolution proposed by Kay and McDaniel (1978). This claim suggests that a particular non-lexical representation of color in humans guides color term evolution and that this representation can be examined by color sorting. The purpose of the experiments described in this thesis was to test Boster’s claims.

In Study I, I re-analyzed Boster’s results and found that, while general trends in the data were consistent with his hypothesis, no individual subject followed the expected order of color sorting. In Study II, English-speaking subjects used an iPad to sort a palette of 30 simulated Munsell color samples that were far more diverse in hue, saturation, and lightness than the palette used by Boster. Study III repeated the iPad sorting task using a palette of 25 chromatic samples that spanned the color circle but were similar in saturation and lightness. Study III also explored cross-cultural differences in the mental representation of color by comparing color sorting in English- and Somali-speaking subjects.

The results of Studies II and III revealed that subjects did not create successive color categories that closely followed the patterns of color term evolution proposed by Kay and McDaniel (1978), although individuals did follow common and principled patterns. Sorting strategies varied among same-language subjects and differed across languages. Sorting strategies also differed between sample color sets, suggesting task dependency. The results of my research do not support Boster’s view that color sorting taps a mental representation that guides the evolution of basic color categories as described by Kay & McDaniel.


Dedication

This thesis is dedicated to my parents, Sharon and Dave Violette, who have believed in me and ceaselessly supported my academic pursuits from day one. The Reader Rabbit computer games they purchased for me in preschool really set a strong foundation. Carolyn Chakuroff was also an integral part of this experience. West coast “writing workshops”, children’s cooking competition shows, and herb shopping made a stressful project into a surprisingly fun process that I will remember happily. My final dedication is to Jake Sander who kept me human and thankfully knew to solve my seemingly-insurmountable stress with a pile of blankets.


Acknowledgments

This thesis would not have been possible without Del Lindsey and Angela Brown’s passion to teach me about this whole new area of science. Furthermore, the colorful figures within this thesis would not be possible without Del’s Mathematica expertise. Thank you to Labs of Life at COSI and Care Point East for allowing us to recruit subjects.


Vita

2011………………………………. Pewaukee High School

2014………………………………. Biology, The University of Minnesota Twin Cities

2015 to present……………………. Doctor of Optometry student, College of Optometry, The Ohio State University

2015 to present……………………. Candidate for Master of Science in Vision Science, College of Optometry, The Ohio State University

Fields of Study

Major Field: Vision Science


Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures
Introduction
  Early study of language and perception
  The rise of Universalism
  Boster (1986) Study
  The World Color Survey and beyond
  New Interpretations of the Evolutionary Model
  Quantitative WCS Analysis
  Color Naming Motifs
  Color Term Evolution
  Hadza Color Naming
  Project Overview
Study I: Retrospective Analysis of Boster (1986)
  Methods
    Review of Boster’s Methods
    Analysis of Individuals’ Sorting Data
  Results
    Prevalence of sorting patterns
    Similar to Expected Trajectory
    Lightness-based sorting strategy
    Other observed sorting strategies
Study II: English-speaker 30-Color Sort
  Methods
    Subjects
    Apparatus
    Procedure
    Stimuli
    Statistical Analysis
  Results
    Overall patterns
    Individuals’ sorting patterns
Study III: 25-Color Sort
  Methods
    Subjects
    Apparatus
    Procedure
    Stimuli
    Statistical Analysis
    Control experiment
  English-speaker Results
    Overall patterns of color category formation
    Sorting pattern prevalence
    Individual subject sorting examples
  Somali Results
    Overall patterns of color category formation
    Sorting pattern prevalence
    Individual subject sorting examples
  Control results
Discussion
  Within-language variability and K&McD
  Cognitive Strategy in n-sorting
  Cross-language variability and Universalism
  Hadza
Bibliography
Appendix A. Calibration Tables
Appendix B. Subject information sheets


List of Tables

Table 1. RGB primary color calibrations

Table 2. 30-color palette calibrations

Table 3. 25-color palette calibrations

List of Figures

Figure 1. Basic Color Term evolutionary order

Figure 2. Dani mili/mola consensus map

Figure 3. Seven-stage model of color term evolution

Figure 4. WCS color sample palette

Figure 5. Color category trajectories of WCS

Figure 6. The Main Line of color term development

Figure 7. WCS color centroid correspondence to English

Figure 8. WCS focal colors and corresponding English focal colors

Figure 9. WCS color naming motifs

Figure 10. Paired responses to color stimuli

Figure 11. Munsell samples from Boster (1986) non-lexical binary sort

Figure 12. Data coding scheme in Boster (1986)

Figure 13. Reproducing subjects’ sorting piles from Boster (1986) coded data

Figure 14. Prevalence of sorting strategies in Boster (1986)

Figure 15. Individuals’ sorting patterns most similar to expected pattern

Figure 16. Lightness-based sorters from Boster (1986)

Figure 17. Other Boster subject sorting strategies

Figure 18. iPad game display during sorting procedure

Figure 19. Thirty sample color palette used in Study II

Figure 20. Results of cluster analysis on 30-color data

Figure 21. Representative examples of subjects adopting achromatic sorting strategy

Figure 22. Representative examples of subjects who adopted a hue-based sorting pattern

Figure 23. iPad display showing 25-color palette

Figure 24. 25-color sort sample palette

Figure 25. Results of cluster analysis of US 25-color sorting categories

Figure 26. Diversity of US subject color sorting data

Figure 27. English-speaking subjects’ sorts closest to K&McD’s predicted

Figure 28. English-speaking subjects adopting lightness-based strategy

Figure 29. Cluster analysis of Somali 25-color sorting categories

Figure 30. Diversity of Somali color sorting data

Figure 31. Somali subject adopting lightness-based strategy

Figure 32. Somali subjects example of “wildcard” sort

Figure 33. Control group vs experimental group 25-color sort

Figure 34. Color sorting pattern predicted by K&McD (1978)

Figure 35. Phase plots comparing control and main experiment sort pattern

Figure 36. Pile size histograms

Introduction

At its most basic level, the language of a culture serves to break down human experience into systems of semantic categories. Words function as labels to effectively communicate experience to others within the same culture and linguistic community.

Cross-language variations can be immense, from the array of nouns and adjectives available to describe different phenomena, to the details of different verb tenses required to fully convey a message. In attempts to explain why this diversity exists, two main hypotheses regarding the intersection of thought and language have arisen: Linguistic Relativity and Universalism.

Linguistic Relativity, also commonly referred to as the Sapir-Whorf hypothesis, posits that variations in language structures develop arbitrarily and influence perception of the world for those who speak them. Along this line of thought, speakers in cultures that use directional terms such as ‘north’ and ‘south’ in place of egocentric terms such as ‘left’ and ‘right’ can effortlessly point in any cardinal direction because cardinal directions are central to the speaker’s spatial perception of the world (Boroditsky 2003). Speakers who rely on egocentric terms may need to rack their brains to orient themselves in this same way. In this view, the nuances of language semantics influence the speaker’s view of the world. Proponents of Universalism, on the other hand, also believe that culture influences semantic representations of the world but, unlike the Relativists, believe that the number of possible semantic variations is limited by mental representations that guide lexicon development. Universalists greatly downplay the role of language in determining an individual’s world view. For well over a century and a half (Berlin & Kay 1969), color lexicons have been major points of interest in the study of cross-cultural differences. This is because color is easily quantifiable and, despite the fact that the color spectrum is continuous, it has been partitioned into lexical categories by nearly all cultures studied thus far. Moreover, studies have revealed striking diversity in how humans lexically partition colors seen in their environment.

Early study of language and perception

When color was first being studied as an intersection of language and perception, it was believed that the development of the physiological mechanisms used to perceive color was the main factor influencing the diversity of color lexicons. Gladstone (1858), a scholar of Homeric texts, noted a lack of certainty and consistency in the use of color terms by the Ancient Greeks he studied. Terms revealing a clear distinction between lightness and darkness were consistently deployed within these texts, but terms describing hue were not used in accurate or consistent ways. He interpreted this less developed color lexicon as evidence of a less developed ability to perceive color, attributing it to physiological differences in the visual system due to human sensory evolution since the time of the Ancient Greeks. Geiger (1880) expanded on this idea and was the first to demonstrate a sequence of color term acquisition over the course of a language’s history through analysis of ancient texts. He proposed that the evolution of visual system physiology over thousands of years accounted for the cultural adoption of more color terms from the Homeric lexicon to modern languages. In this theory, Geiger held that the visual system, after being sensitized to differences in light and dark, was sensitized to long-wavelength hues such as red, followed by sequentially shorter wavelengths such as blue. This explanation was used to rationalize the early acquisition of a term for red in ancient languages, followed by lexical additions corresponding to yellow, green, and blue.

Dissenters argued that the paucity of well-established color terms in ancient languages and some modern languages did not result from a lack of perceptual ability. While Homeric writing did not utilize abstract color terms to describe objects in a literal sense, Grant Allen (1879) noted that ‘secondary color terms’ were regularly used to represent subtle details of the visual world with poetic intent. For example, one may be described as ‘green in the face’ if they look fearful. Similarly, in calling a tree a ‘red oak’ one is not claiming it is overtly red, so much as appreciating the fine distinction from the color of the ‘black oak’. Allen suggested that Gladstone had misconstrued the inconsistent usages of different color words, and that the ancient languages did in fact have an established set of color terms which they used to describe fine distinctions between objects. Hugo Magnus (1880), however, was the first to recognize and argue that perceptual ability and color lexicon complexity need not go hand in hand: philological evolution could exist independent of physiological evolution. He employed a color discrimination test for speakers of different languages around the world and found that no tribe lacked perception of any ‘main colors’, while lexicons varied considerably.

Linguistic Relativity, or the Sapir-Whorf hypothesis, was the next theory to rise to prominence. Two main hypotheses lay at the heart of Linguistic Relativity: (I) structural differences in languages will be largely paralleled by non-linguistic cognitive differences between speakers of two languages, and (II) the structure of one’s native language influences that individual’s world view. Some added to these main tenets of Linguistic Relativity to hypothesize a third: (III) language systems vary without constraint (Brown 1976). Because II was difficult to test empirically, most hypothesis testing involved I and III, and most frequently involved color.

Brown and Lenneberg (1954) argued that testing hypothesis I within the domain of color involves evaluating the correlation between a linguistic cognitive variable of a color (‘codability’ or ‘communication accuracy’) and a nonlinguistic cognitive variable of that color (‘memorability’) within a given language. They therefore determined the ‘codability’ of a color sample from a combination of subject agreement in naming the color chip, the length of the name given, and response latency in naming the color. The subject’s ability to remember and identify the color was then tested. Brown and Lenneberg found that codability and memorability were indeed highly correlated, a result that they claimed was consistent with Linguistic Relativity. Lantz and Stefflre (1964) performed a similar experiment that focused on subjects’ ability to accurately communicate about a color, rather than the color’s codability. This study found a correlation between subjects’ ability to accurately communicate about color and their ability to remember and recognize that color.

Those examining hypothesis III sought to describe and compare the semantics of different color lexicons to determine if they did, indeed, vary without constraint. In his study of the Native American color lexicon, Ray (1952) concluded that “there is no such thing as natural division of the spectrum. Each culture has taken the continuum and has divided it upon a basis which is quite arbitrary.”

The rise of Universalism

The linguistic relativity thesis, which generally prevailed at this point, was challenged by Berlin and Kay (1969; B&K), whose skepticism was founded on the observation that color terms translate too well between pairs of unrelated languages for divisions to be completely arbitrary. They studied the color naming diversity among 98 different languages, specifically looking to describe the “basic color terms” (BCT) used in each.

Of these languages, B&K collected data for 20, and studied color term usage in texts such as dictionaries for 78. BCTs were required to meet certain criteria. Terms had to be monolexemic, meaning that their meaning could not be predicted from the meanings of their parts, which disqualified terms such as ‘blue-green’ or ‘lemon-colored’. Terms could not describe a range of colors encompassed completely within any other term, eliminating terms such as ‘scarlet’, which falls squarely within the ‘red’ category. Their use could not be restricted to describing a narrow range of objects (like ‘blonde’), and they needed to be psychologically salient for all speakers of the language.

For 20 (of the 98) languages examined, color naming data were collected experimentally from native speakers, many of whom also spoke English. First, the basic color terms of the speaker’s language were directly elicited from the subject. Second, the subject was shown a color chart of 329 Munsell color chips spanning the hue and lightness dimensions of color space and asked to locate in this chart the best example(s) of each BCT they named. Finally, subjects were asked to map out the boundaries of each named basic color category on a color chart. For the remaining 78 languages, the color lexicon’s basic terms were investigated through examination of the literature on color terms, including sources such as dictionaries.

Rather than finding unconstrained variations between color lexicons, as predicted by linguistic relativity, Berlin & Kay uncovered cross-cultural patterns that supported the theory of Universalism through two pivotal findings: 1) A universal inventory of eleven basic color categories exists, namely white, black, red, green, yellow, blue, brown, purple, pink, orange, and grey. Different languages may encode different numbers of these categories. 2) A language with fewer than eleven basic color categories is limited in which universal categories it may encode, revealing a relatively fixed sequence of evolutionary stages that occur as a language’s BCT inventory expands. For example, a language must contain categories corresponding to black and white if it is to contain a term for red. Furthermore, it must contain black, white, and red categories to contain a term for either yellow or green, and so forth. This sequence described the order in which languages adopt new basic color terms as their color lexicons grow, and was therefore interpreted as a sequence of evolutionary stages (Figure 1).


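\[
\begin{bmatrix} \text{white} \\ \text{black} \end{bmatrix}
\rightarrow [\text{red}] \rightarrow
\begin{matrix} [\text{green}] \rightarrow [\text{yellow}] \\ [\text{yellow}] \rightarrow [\text{green}] \end{matrix}
\rightarrow [\text{blue}] \rightarrow [\text{brown}] \rightarrow
\begin{bmatrix} \text{purple} \\ \text{pink} \\ \text{orange} \\ \text{grey} \end{bmatrix}
\]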

Figure 1. Basic Color Term evolutionary order. First, all languages have basic color terms for white and black. The next BCT acquired will be red, followed by green then yellow, or yellow then green. Next comes blue, then brown, then purple, pink, orange, and grey in no defined order. (Berlin and Kay 1969)
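The implicational constraints summarized in Figure 1 can be illustrated with a short check. The Python sketch below is illustrative only: the name `is_consistent_with_bk` and the encoding of the sequence as ordered tiers are assumptions introduced here, not part of Berlin and Kay's analysis.

```python
# Tiers of the Berlin & Kay (1969) sequence as described in Figure 1.
# Terms within a tier may be acquired in any order; a term from a later
# tier requires every term from all earlier tiers.
BK_TIERS = [
    {"white", "black"},
    {"red"},
    {"green", "yellow"},
    {"blue"},
    {"brown"},
    {"purple", "pink", "orange", "grey"},
]


def is_consistent_with_bk(terms):
    """Return True if this set of basic color terms fits the sequence."""
    terms = set(terms)
    unknown = terms - set().union(*BK_TIERS)
    if unknown:
        raise ValueError(f"not among the eleven basic categories: {unknown}")
    # Every language is described as having both white and black.
    if not BK_TIERS[0] <= terms:
        return False
    earlier_complete = True
    for tier in BK_TIERS:
        present = terms & tier
        # A term may appear only if all earlier tiers are fully encoded.
        if present and not earlier_complete:
            return False
        earlier_complete = earlier_complete and present == tier
    return True


print(is_consistent_with_bk({"white", "black", "red", "yellow"}))
print(is_consistent_with_bk({"white", "black", "green"}))
```

Under this encoding, a lexicon with white, black, red, and yellow is accepted, while one with white, black, and green (but no red) is rejected.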

B&K’s findings regarding the universality and evolution of basic color terms rejected the arbitrariness of linguistic relativity and legitimized Universalism in color research. To this day, this research and the studies following it have provided some of the most compelling arguments for the case of linguistic universalism in the realm of color.

A study that followed soon after by Heider and Olivier (1972) clarified the category boundaries and focal colors of Stage I languages, and corrected some misconceptions of B&K. B&K described Stage I languages as containing two words for color, Black and White, with a lightness-based divide between categories. Heider and Olivier studied Dani, a Stage I language spoken by highlanders in New Guinea, which uses the terms ‘mili’ and ‘mola’ to describe the black and white categories, respectively. They found that what B&K portrayed as a lightness-based division between ‘Black’ and ‘White’ for Stage I languages proved, upon more rigorous examination of the Dani, to be a division between ‘warm-light’ and ‘cool-dark’ (Figure 2). Interestingly, rather than a white being the consistent focal color chosen for the ‘mola’ or ‘White’ category, 69% of informants placed the focus of the category at a red (Heider 1972). This finding also reinforced that the two BCT categories of Stage I languages are not contingent upon a dichotomy of light and dark, but upon the multifactorial sensation of ‘warm-light’ vs. ‘cool-dark’.


Figure 2. Dani mili/mola consensus map. a) Color space map of 330 WCS Munsell samples (see text for description of WCS). b) Contour plot of color naming consensus, displayed in steps of 0.1. That is, the whitest area encloses colors called mola by all subjects; the outer boundary of the next grey level bounds the Munsell colors named mola with consensus >= 0.9, then >= 0.8, etc. See main text for more details. (Heider and Olivier 1972).

Recognizing that BCT development was not about the acquisition of new foci but rather about the differentiation of color categories, Kay and McDaniel (1978; K&McD) defined the series of successive divisions of color space corresponding to B&K’s evolutionary order. K&McD produced the seven-stage model of color term evolution (K&McD 1978), which acknowledges the shifting partitions of color space as new basic terms arise within a language (Figure 3). Central to this model was the recognition of the crucial role played by Hering’s elemental color sensations in the lexical partition of color and the establishment of ‘fuzzy sets’ in color category membership. Within the seven-stage model, every color category at each BCT evolutionary stage is described with respect to the fundamental response systems of the human visual system: the six Hering elemental color sensations.

Hering described the six elemental color sensations (red, green, yellow, blue, white, and black) as neurally graded responses within the retina which interact to create the perception of color. An interaction between different photoreceptors and ganglion cells creates two color opponent channels (red-green and blue-yellow) and a black-white channel. According to Hering, the hue appearance of a color depends on the stimulation of one unitary color sensation corresponding to red, green, yellow, or blue, or on a combination of two non-opponent sensations. Examples of non-opponent combinations include orange (red-and-yellow) and purple (blue-and-red). Antagonistic combinations such as green-and-red and blue-and-yellow are not possible within this theory of color appearance: any hue containing a red element cannot contain a green element, nor can a hue containing a blue element contain a yellow element. Empirical studies and quantitative models in which observers judged whether a given color was reddish vs. greenish and bluish vs. yellowish ushered Hering’s theory of color appearance into the modern era (Hurvich & Jameson 1957; Krantz 1975). K&McD judged these Hering elemental sensations to be especially perceptually salient; hence, they proposed that these sensations guide the early stages of color term evolution.


The Hering colors were applied in this model through their unique hue identity, as a fuzzy union, or as a fuzzy intersection of two elemental sensations. Fuzzy sets such as these recognize degrees of membership in different categories rather than forcing discrete, all-or-none categorical memberships. Analyzing B&K’s data on individuals’ chosen focal colors and boundaries for each basic color category, McDaniel (1972) found that degrees of category membership for one focal color eventually reached zero at the focal color of adjacent color categories. Basic color categories have boundaries containing both focal member and non-focal member colors. Non-focal colors are not contained solely in one category, but lie within the boundaries of adjacent color categories simultaneously. In the case of K&McD’s model, ‘fuzzy union’ categories (denoted by ‘or’ in Figure 3) precede the naming of each individual elemental color and contain multiple elemental colors within their boundaries. Consequently, these ‘fuzzy unions’ contain multiple potential focal points, and areas of poor categorical membership are less frequent. This was witnessed in the case of the Dani color categories mili and mola (Heider and Olivier 1972). ‘Fuzzy intersection’ categories (denoted by ‘+’ in Figure 3), on the other hand, are formed after each elemental color is named. They arise from the low-consensus border areas where a given hue has an equally poor degree of membership in two adjacent color categories. Once a new basic color term fills this niche, the same decline in consensus will occur outward from the focal hue.
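The difference between a fuzzy union and a fuzzy intersection can be made concrete with the standard fuzzy-set operators, in which the union is the pointwise maximum and the intersection the pointwise minimum of the membership functions. The Python sketch below is a minimal illustration under that assumption; the sample names and membership values are invented, and K&McD's exact formulation of derived categories may differ.

```python
# Hypothetical membership degrees (0 to 1) of three color samples in two
# elemental categories. The numbers are invented for illustration only.
red = {"red_chip": 1.0, "orange_chip": 0.5, "yellow_chip": 0.0}
yellow = {"red_chip": 0.0, "orange_chip": 0.5, "yellow_chip": 1.0}


def fuzzy_union(a, b):
    """Standard fuzzy union: pointwise maximum of two membership functions."""
    return {sample: max(a[sample], b[sample]) for sample in a}


def fuzzy_intersection(a, b):
    """Standard fuzzy intersection: pointwise minimum of two membership functions."""
    return {sample: min(a[sample], b[sample]) for sample in a}


# Composite category ('red or yellow'): full membership at both elemental
# foci, so the category has more than one potential focal point.
red_or_yellow = fuzzy_union(red, yellow)

# Intersection of red and yellow: membership is highest in the border
# region between the two elemental categories (the orange sample here).
red_and_yellow = fuzzy_intersection(red, yellow)

print(red_or_yellow)
print(red_and_yellow)
```

With these invented values, the union assigns full membership to both the red and the yellow sample, while the intersection is nonzero only for the orange sample.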


Figure 3. Seven-stage model of color term evolution. The distribution of Hering primaries between color categories at different B&K stages of lexical evolution is also shown. A fuzzy union contains multiple primaries and is indicated by an ‘or’, while a fuzzy intersection is a combination of primaries which results in a new BCT hue, and is denoted by a ‘+’. (from Kay and McDaniel 1978).


Boster (1986) Study

With these universal patterns discovered and characterized by B&K and K&McD, James Boster’s 1986 work probed for the mechanisms that might make this cross-cultural consistency in lexical categories possible. Boster hypothesized that the trajectory of color term evolution in world languages was possible because of a universal set of cognitive strategies used by members of language communities for categorizing universal perceptual responses to color. Boster believed that these cognitive strategies would be mirrored in a sequential color sorting procedure, in which subjects divide colors first into two piles, then three piles, then four piles, etc. I will refer to these divisions as n-sorts, where n = 1...N is the sort level and N is the total number of samples in the stimulus set. Color sorting using colored stimuli is said to be a “non-lexical” color categorization task because subjects are asked to categorize colors according to their similarities to one another without actually naming the category to which the piles of color samples belong. Indeed, modern English speakers probably do not have simple words to name many of the color categories they create when asked to produce an n-sort, especially when n is less than five. Thus, Boster proposed that color sorting can be used to tap cognitive processes that may have been present in humans long before their respective color lexicons developed to their present forms.

Boster’s choice of a sorting task to test for universal cognitive strategies in color lexicon evolution was based on three assumptions: (1) between-language variations in similarity-based color sorting tasks would mirror within-language variations; (2) individuals’ internal color hierarchies would match the successive divisions made by entire communities; and (3) individuals would treat color foci in similar ways, no matter the developmental stage of their color lexicon. According to this view, subjects’ sequential sorts should recapitulate the evolutionary order in which BCTs are acquired, as proposed by Kay and colleagues.

In his study, Boster employed both lexical and non-lexical binary sort procedures involving 8 colors: white, black, red, orange, yellow, green, blue, and purple.

In the lexical sort, subjects sorted color names without seeing any colored samples, while in the non-lexical sort, subjects partitioned a set of eight Munsell color samples into different numbers of piles. The non-lexical sort was designed to isolate the perceptual similarities of colors, while minimizing or eliminating the influence of confounding ideas associated with color names. The lexical and non-lexical sorting tasks were performed by separate groups of subjects.

In the non-lexical sort procedure, which is the focus of the present study (in Boster’s study, the name sorting task produced the same results), subjects were asked to perform a sequential binary n-sort on eight color samples selected from the Munsell Book of Color. These colors were selected as good examples of black, white, red, orange, yellow, green, blue, and purple and can be viewed below in Study I, Figure 11. Boster’s subjects performed a 2-sort first; i.e., they sorted the eight colors into two “piles” (or groups) based on the perceived similarity of the colors to one another. Next, one of the piles (chosen at the subject’s discretion) was divided into two piles based on perceived color similarities and the requirement that the number of piles must increase by one. This resulted in a 3-sort. This procedure, creating an n-sort by dividing an existing pile of colors into two new piles, was repeated until every color had been isolated in its own category (an 8-sort).
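The constraint that each n-sort is obtained by splitting exactly one existing pile in two can be stated as a simple check. The Python sketch below is illustrative only: the function name `is_valid_binary_sort`, the representation of a sort as a list of sets, and the four-color example are assumptions made here and are not taken from Boster's materials.

```python
def is_valid_binary_sort(sorts, samples):
    """Check that each sort splits exactly one pile of the previous sort in two.

    sorts[0] is the 2-sort, sorts[1] the 3-sort, and so on. Each sort is a
    list of piles, and each pile is a set of sample names.
    """
    samples = set(samples)
    previous = [samples]  # the undivided stimulus set
    for partition in sorts:
        piles = [set(p) for p in partition]
        # Piles must be non-empty, disjoint, and cover every sample.
        if any(not p for p in piles):
            return False
        if sum(len(p) for p in piles) != len(samples):
            return False
        if set().union(*piles) != samples:
            return False
        # Exactly one old pile disappears and is replaced by two new piles.
        new = [p for p in piles if p not in previous]
        split = [p for p in previous if p not in piles]
        if len(new) != 2 or len(split) != 1 or (new[0] | new[1]) != split[0]:
            return False
        previous = piles
    return True


# Hypothetical four-color example (not Boster's data).
colors = ["white", "black", "red", "blue"]
good = [
    [{"white", "red"}, {"black", "blue"}],        # 2-sort
    [{"white"}, {"red"}, {"black", "blue"}],      # 3-sort
    [{"white"}, {"red"}, {"black"}, {"blue"}],    # 4-sort
]
bad = [
    [{"white", "red"}, {"black", "blue"}],
    [{"white", "black"}, {"red"}, {"blue"}],      # regroups instead of splitting
]
print(is_valid_binary_sort(good, colors))
print(is_valid_binary_sort(bad, colors))
```

The second sequence fails because its 3-sort moves samples between piles rather than dividing one of the two existing piles.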

In order to compare subjects’ n-sorts with predictions based on K&McD, Boster created an 8 x 8 similarity matrix from his data and a second matrix from the K&McD model predictions. Each (i, j) entry in these matrices corresponded to the number of sort levels (1…8) until the color sample specified by the i-th row of the matrix was assigned to a pile that differed from that of the color specified by the j-th matrix column. Thus, each color has a similarity value of 8.0 when compared to itself (diagonal entries in the matrix; i = j), because it always stayed in the same pile as itself throughout the sorting procedure. On the other hand, colors that were separated during the first sort (i.e., a 2-sort) were assigned similarities of 1.0. More generally, colors that remained in the same pile until the n-th sort were assigned similarities of n. Boster then performed a correlation analysis to determine how well the model matrix based on K&McD accounted for observers’ average similarities among colors, as represented by the similarity matrix derived from the sorting data.
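The similarity coding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Boster's implementation: the function names, the four-color example, and the choice to correlate only the off-diagonal upper-triangle entries with a Pearson coefficient are all assumptions made here, since the text does not specify those details.

```python
def similarity_matrix(sorts, samples):
    """Similarity = index of the sort at which two samples first separate.

    sorts[0] is the first sort (the 2-sort), so samples separated there get
    1.0. Diagonal entries keep the value N, the number of samples.
    """
    n = len(samples)
    index = {s: i for i, s in enumerate(samples)}
    sim = [[float(n)] * n for _ in range(n)]
    for level, partition in enumerate(sorts, start=1):
        pile_of = {s: k for k, pile in enumerate(partition) for s in pile}
        for a in samples:
            for b in samples:
                i, j = index[a], index[b]
                # Record only the first level at which the pair is split;
                # a complete sort has at most n - 1 levels, so the value n
                # safely marks "not yet separated".
                if pile_of[a] != pile_of[b] and sim[i][j] == n:
                    sim[i][j] = float(level)
    return sim


def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical four-color example (not Boster's data).
colors = ["white", "black", "red", "blue"]
sorts = [
    [{"white", "red"}, {"black", "blue"}],
    [{"white"}, {"red"}, {"black", "blue"}],
    [{"white"}, {"red"}, {"black"}, {"blue"}],
]
data = similarity_matrix(sorts, colors)
print(data)
```

In this example white and red separate at the second sort and so receive a similarity of 2.0, whereas white and black separate at the first sort and receive 1.0. A model matrix built the same way from a predicted sort sequence could then be compared with `pearson(upper_triangle(data), upper_triangle(model))`.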

Ultimately, Boster found that subjects’ overall sorting of colors agreed well with K&McD’s model predictions. Crucially, in Boster’s analysis, as summarized above, individual variations in color sorting were treated as noisy perturbations around the more important central tendency of correspondence between data and model similarity matrices. From these results, Boster concluded that, if faced with the appropriate task, individuals will recapitulate the successive color category partitions made by entire cultures. Details of the binary sort procedure and data analysis are elaborated upon in the Methods section of Study I.

Color sorting has been used to investigate other aspects of categorization linked to the study of color perception and language. For example, Bonnardel (2006) used constrained- and free-sorting protocols to study implicit color categories in red/green color deficient subjects. Roberson et al. (2005) used a free-sorting paradigm to test for universal tendencies in color categorization in a diverse group of industrialized and preindustrial cultures. However, to my knowledge, Boster (1986) is the only previous study to employ a color sorting paradigm to formally test a model of color term evolution.

The World Color Survey and beyond

Since the time of Boster’s study, much more has been learned about color term evolution, which suggests that the original formulation by B&K and K&McD is probably overly simplistic. Almost all progress on this front is due to a massive data set of color naming in preindustrial societies called the World Color Survey (WCS). The WCS data were collected by Kay, Berlin, Maffi, Merrifield and Cook (who published a WCS monograph in 2009) in collaboration with the Summer Institute of Linguistics, beginning in 1976 and continuing throughout the late 1970s. The WCS was designed to address a number of weaknesses in B&K’s study, such as the inclusion of too few experimental languages, too few informants per language, and poor representation of diverse languages in low-technology cultures with unwritten languages. It was also of concern that all of B&K’s BCT informants were bilingual and spoke English in addition to the experimental language, and that all informants lived in the San Francisco area rather than in the homeland of the experimental language. To address these concerns, the WCS attained a mean sample size of 24 speakers per language for 110 unwritten languages from 45 different language families, rather than relying on data from one speaker to characterize the entire culture’s BCT. Additionally, the WCS subjects representing each language were native speakers of the mostly unwritten languages and spoke no English. Testing was performed by fieldworkers (typically, Christian missionaries) in each language’s homeland rather than in the United States. These modifications addressed concerns about the influence of English on the bilingual subjects of B&K.

B&K were also criticized for their color naming protocol. In that study, B&K first derived a list of BCTs from informants and then asked them to map the boundaries of color categories corresponding to the BCTs. Critics argued that this methodology lacked objectivity. To improve upon the color-mapping protocol, the WCS protocol called for subjects to provide a monolexemic name for each of 330 Munsell color samples in the WCS test chart (Figure 4). The examiner then assessed a language’s BCTs based on the color naming data for all the speakers of that language, in addition to eliciting “focal colors”, or best-representative color samples, for each color term. Beyond eliciting color terms in a more natural way and allowing more objective analysis, procedural aspects of the WCS methodology allowed for a more exhaustive characterization of how different cultures linguistically divide color space. The new data collected in the WCS allowed for new interpretations of the evolutionary model, and new statistical analyses of old principles.

Figure 4. WCS color sample palette. 330 Munsell color samples varying in dimensions of hue, lightness, and saturation were used in the WCS.

New Interpretations of the Evolutionary Model

Refining the evolutionary model beyond the ordering of color term acquisition (B&K 1969) or descriptions of the fuzzy sets comprising these categories (K&McD 1978), Kay, Berlin, Maffi, and Merrifield (1997) processed WCS data to model all the possible trajectories that languages followed throughout the first five stages of color term evolution, as each elemental color gains its own BCT (Figure 5). This model did not focus solely on the acquisition of new BCTs, but also considered the successive divisions of color space to which new BCT acquisition corresponded. Kay et al. observed five different trajectories leading up to the Stage V language, though they did not occur in the data with equal frequency. These trajectories varied in their divisions at Stages III and IV on the way to Stage V.


Figure 5. Color category trajectories of WCS. Categorization of Hering primary colors as observed in WCS results. KBMM observed three ways that component primary colors of ‘fuzzy sets’ were divided among languages with four or five basic color terms. The trajectory following IIIBk/G/Bu and IVG/Bu corresponded to K&McD’s predicted trajectory. (from KBMM 1999)


Furthermore, Kay and Maffi (K&M; 1999) described the Main Line of Basic Color Term Evolution (Figure 6), noting that 91 of 110 (83%) languages surveyed by the WCS followed the same pattern of color category divisions from Stage I to Stage V. Additionally, K&M described an ordered application of four different principles that could account for the subsequent color space divisions consistent with the Main Line model.

Figure 6. The Main Line of color term development. Each of the color categories enclosed in brackets references a particular stage according to Kay and Maffi. The most common trajectory present within WCS data represents 91 languages, or 83% of the WCS languages. (from Kay and Maffi 1999).

These principles were derived from tendencies they noted in linguistic behavior and color interpretation. The foremost principle to influence categorical divisions was Partition, which describes the tendency of languages to exhaustively name color space regardless of the number of names used. The Partition principle can be noted in other examples of systematic naming besides color, such as anatomical designations, kin relations, seasons, and days of the week. As cultural shifts occur and the salience of colors increases within a society, color space will continue to be partitioned into increasingly specific categories to serve greater communicative functions. In application to the Main Line, Partition simply guides the formation of a new category within color space.

The next three principles are based on color appearance. They guide how new partitions are formed based on the comparative salience of different color features; specifically, hue opponency and the opposing achromatic sensations of black and white. The first of these principles to guide color term development is the distinction between (1) Black and White [Bk&W], the second relies on the distinction widely made between (2) Warm and Cool [Wa&C], and the third involves the saliency of (3) Red [Red]. When applying these principles to account for the Main Line of color term evolution, Partition is applied at each step along the way to form a new category, followed by a color appearance-based principle, which together guide the nature of that partitioning.

In Stage I, [Bk&W] dictates that black and white are placed in separate categories, and [Wa&C] dictates that red and yellow will be in a separate category from green and blue. Yellow is an inherently light color and may not be recognized as yellow if the lightness is low, causing an association between warm colors and white, and conversely between dark and cool colors. When moving to a Stage II language, principles (1) [Bk&W] and (2) [Wa&C] do not give preference to a division in either category, while (3) [Red] influences R/Y to split from W. Stage III(G/Bu) is primarily influenced by principle (1), causing Bk to separate from G/Bu, as W has already become its own category. When Stage III(G/Bu) evolves into Stage IV(G/Bu), principle (1) does not apply and (2) does not help prioritize divisions in either category, so principle (3) causes the distinction of R from Y. For the final evolution to a Stage V color lexicon, a general application of Partition is the only principle necessary to distinguish G and Bu from their composite category.

While Kay & Maffi’s analysis of the WCS data seems to confirm B&K’s original view that the trajectory of color term evolution is highly constrained, 19 of the 110 languages examined (17%) nonetheless do not fit the “Main Line” evolutionary trajectory. Moreover, the Kay & Maffi analysis is largely subjective, in that quantitative criteria were not applied to the WCS data, and subsequent quantitative analyses of these data based on cluster analysis (see discussion below) reveal a much more complex picture of color term evolution than the one promoted by Kay & Maffi.

Quantitative WCS Analysis

One analysis of WCS data (Kay and Regier 2003) compared the centroids of color categories across industrialized and non-industrialized languages. Color centroids were determined by averaging the color space coordinates of Munsell samples named with a given term, then finding the Munsell sample closest to the average coordinates. This analysis determined that, across all WCS languages, there is greater clustering of color centroids within the color diagram than would be expected by chance. Additionally, this analysis found a greater-than-chance similarity between the color category centroids of non-industrialized languages (WCS data) and the centroids of speakers from industrialized societies (B&K data). English color terms fell very near the peaks of each WCS centroid (Figure 7), with the exception of an extra centroid at the intersection of green and blue. This intersection represents a composite color category which is consistent with a color naming pattern that many WCS speakers utilize. Despite the difference in the number of color terms present in different languages, Kay and Regier’s analysis revealed a cross-cultural tendency for named color categories to cluster at certain locations in color space. These locations also correspond closely to English centroids.
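The centroid computation described above (average the coordinates of the samples named with a term, then snap to the nearest sample) can be sketched as follows. This is a minimal illustration and not Kay and Regier’s code: the function name `category_centroid`, the chip labels, and the coordinates are hypothetical, and the use of Euclidean distance in a three-dimensional color space is an assumption.

```python
from math import dist

def category_centroid(chip_coords, named_chips):
    """Return the chip whose coordinates lie closest to the mean
    coordinates of the chips that were named with a given term."""
    n = len(named_chips)
    dims = len(next(iter(chip_coords.values())))
    # Step 1: average the color space coordinates of the named chips.
    mean = tuple(
        sum(chip_coords[c][d] for c in named_chips) / n for d in range(dims)
    )
    # Step 2: pick the chip in the whole palette nearest to that average.
    return min(chip_coords, key=lambda c: dist(chip_coords[c], mean))

# Hypothetical palette: chip label -> assumed 3-D color coordinates.
chip_coords = {
    "A": (50.0, 60.0, 20.0),
    "B": (55.0, 50.0, 30.0),
    "C": (52.0, 56.0, 24.0),
    "D": (80.0, -10.0, 70.0),
}
# Suppose one informant used the same term for chips A, B, and C.
centroid = category_centroid(chip_coords, ["A", "B", "C"])
```

Snapping to the nearest palette sample means the centroid is always a real chip, so centroids from different speakers and languages can be tallied on the same chart.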

Figure 7. WCS color centroid correspondence to English. WCS color centroid clusters show the distribution of individuals’ centroids. Outer rings contain 100 centroids, and each inset ring represents an increase of 100 centroids in that region. English color term centroids fall near these peaks. (Kay and Regier 2003)

A similar study involving focal colors of WCS languages rather than centroids provided similar evidence (Figure 8). Focal colors were selected by subjects as the color sample most representative of a given color term. Non-industrialized language foci for Red, Yellow, Green, Blue, White, and Black clustered near focal colors selected by English-speaking subjects. Furthermore, focal colors clustered even more tightly than centroids, supporting the importance of universal color foci as the source of universal color naming tendencies (Regier, Kay, Cook 2005). This put emphasis on the focal colors of categories, rather than the boundaries between them, as their defining characteristic.

Figure 8. WCS focal colors and corresponding English focal colors. WCS subjects indicated best example colors. These data were used to construct focal color clusters. Outer boundaries of clusters indicate an area of 100 subjects’ best example colors, and each inset contour contains 100 more. Contour plots corresponded well to English focal color locations. (Regier, Kay, and Cook 2005)

While the analysis of focal colors and color category centroids supports the original B&K view of a “universal inventory” of BCTs, it sheds little light on the fundamental question of how color lexicons might evolve. A line of research by Lindsey and Brown (2006; 2009) was designed to address this issue. Looking beyond focal colors and centroids of color categories, Lindsey and Brown (2006) first devised a methodology for analyzing the patterns present in the WCS data. They analyzed color categories for chromatic colors, but not achromatic terms. This analysis continued to highlight similarities between WCS languages and English, revealing that average WCS chromatic color naming clusters glossed mostly to English patterns. Lindsey & Brown’s inventory of universal categories, however, diverged from the one suggested by Kay and colleagues: 1) ‘yellow’ and ‘orange’ did not exist as separate categories in the WCS, and 2) in addition to ‘blue’ and ‘green’ categories, analysis revealed a non-English category ‘green and blue’, known as ‘grue’ in the literature. Concordance analyses determined that, among WCS language speakers, color space corresponding to the Hering primaries had especially high concordance across languages, and that boundary regions had much lower concordance. Concordance was especially low at the boundary between ‘Warm’ and ‘Cool’, suggesting that Kay and Maffi’s principle (2) [Wa&C] applies to the WCS data. These findings, like K&McD’s findings, supported the special salience of the Hering primaries as focal colors, except for ‘blue’, and suggested a fundamental cognitive distinction across the warm/cool divide which defines Stage I languages such as Dani.

Color Naming Motifs

Armed with a quantitatively-derived universal glossary for cross-language comparisons of WCS color lexicons, Lindsey & Brown (2009) performed a second cluster analysis, in which they compared the color naming systems of all the WCS informants (Figure 9). Lindsey & Brown found that the color lexicons of speakers across the world could be classified according to three to six universal color naming systems, which they called ‘motifs’, rather than Stages. Whereas in Kay and colleagues’ view, each Stage is characterized by a more or less fixed subset of basic color categories, each motif in Lindsey and Brown’s formulation is organized around a dominant color naming trait, even though informants assigned to a particular motif may otherwise show considerable variability in color naming. The differences in motifs primarily center on the lexical treatment of dark/cool colors. The ‘Dark’ motif is indiscriminate between cool and dark colors, much like a B&K Stage II language, whereas the ‘Grey’ motif involves a separate term for ‘black’ which lexically divides greys and cools from the darkest darks. The ‘Grey’ motif roughly emulates a Kay and Maffi Stage IIIa language. In another common motif, ‘Grue’, speakers do not distinguish between ‘blue’ and ‘green’ (B&K Stage IV, or Kay & Maffi Stage IVa), and those under the ‘GBP’ motif have distinct terms for ‘blue’, ‘green’, and ‘purple’, which corresponds roughly to a B&K Stage VI language.

Interestingly, the same motifs could be found in completely unrelated languages from completely different parts of the world, while multiple motifs frequently appeared within one population that spoke the same language. The presence of the same motifs in very distinct languages supports the universality of the forces influencing color lexicon development. Because of each motif’s correlation to a different evolutionary stage of color naming, it is logical to suppose that the presence of multiple motifs within a language may be a signal of an actively evolving color lexicon, as Lindsey & Brown (2009) suggested. Because many WCS languages are in early stages of color term evolution and have room to develop new color terms, it is no surprise that there would be great variability within the motifs of these languages.


Figure 9. WCS color naming motifs. a) Dark motif subjects did not distinguish between Black, Grey, or cool colors in their naming. Corresponds to a B&K Stage II language. b) Grey motif distinguishes Black from Grey, but groups cool colors with greys. Corresponds to a Stage IIIa language. c) Grue motif does not distinguish green from blue. Corresponds to a Stage IV language. d) GBP motif has a purple category. Corresponds to a Stage VI language. Multiple motifs may be present in the same language. (Lindsey and Brown 2009)

Crucially, Lindsey & Brown’s (2009) analyses revealed tremendous variability in color naming systems within as well as across languages. The within-language variability was totally at odds with Kay & colleagues’ view that color term evolution within a language community proceeds in an orderly fashion from one high-consensus, monolithic color naming system to another. Moreover, the motifs analysis also revealed that the diversity in informants’ color naming could not be considered as “noise” (i.e., random variations in naming around a single central color naming system). Instead, within-language diversity mirrored across-language variability in that, in both cases, informants’ color naming systems could be classified according to a fixed set of universal patterns of color naming called “motifs”.

Color Term Evolution

How, then, do color naming systems evolve? According to Kay & colleagues, color lexicons follow a constrained trajectory of color term acquisition that is characterized by transitions from one stable high-consensus lexical state to the next. The impetus for change, according to Kay & colleagues, is technological advances within a culture that create a greater need to distinguish lexically among human artifacts of different colors that are byproducts of technology. Lindsey & Brown view the process of color term evolution as analogous to biological evolution. That is, at any given time, many color idiolects (individual color lexicons) are represented within the language community. What changes as color lexicons evolve is the relative prevalence of these different idiolects. In this view, the color lexicon within a language community is always somewhat diverse, and color term evolution is messy. Like Kay & colleagues, Lindsey & Brown cite technological advance as the primary impetus for color term evolution, but Lindsey & Brown emphasize that its influences are highly dependent upon cultural context.

Lindsey, Brown & Isse (2016) went on to classify Somali color naming motifs through a prospective study. As in the retrospective WCS analysis, the motifs within this culture varied primarily in their treatment of cool/dark colors. Somali subjects showed diverse color naming behaviors. Each of the ‘Blue-green’ (or ‘BGP’), ‘Grue’, ‘Grey’, and ‘Dark’ motifs was represented by at least one speaker within the sample population, although ‘Grey’ and ‘Dark’ glossed into a combined ‘Neutral’ motif. Some investigators have proposed that this less discriminate treatment of cool colors (specifically blues) within cultures that live near the equator (such as the Somali) may be due to acquired Type III color vision defects caused by increased UV-B exposure to the lens of the eye earlier in life (Bornstein, 1973; Lindsey & Brown, 2002; Ratliff, 1976). Lindsey, Brown & Isse compared D-15 and F-100 color vision test scores to motif membership on a group and individual level. The Somali group showed an overall depression in color vision test scores compared to those of their English-speaking counterparts, and the Somali speakers as a group had a higher incidence of Grue/Grey/Dark motifs. However, an individual’s color vision test score was not related to their motif usage. Color vision deficiencies might account for differences in motif-usage frequency on a group level, but not on an individual basis.

Hadza Color Naming

One prediction of Kay & colleagues’ perspective on color term evolution is that the color lexicon present at any given stage should be stable and high-consensus. Moreover, though not previously mentioned here, they also argue that these color lexicons are complete; that is, they can be used within the culture to communicate about any color. Lindsey & Brown (2015) tested these ideas by studying color naming among the Hadza, a group of nomadic hunter-gatherers living a subsistence lifestyle in a remote region of Tanzania. Anthropologists generally regard the Hadza as the best living example of how humans probably lived prior to the developments of agriculture and animal husbandry (Marlowe 2010). Thus, one would expect the Hadza color lexicon to be near the earliest stage of color term evolution.

This study utilized a palette of 23 colors selected from the WCS palette and a protocol similar to that of the WCS, with one key distinction: “don’t know” (DK) was allowed as a response. Lindsey et al. also used the same palette to test groups of Somali and English speakers. These languages are representative of “moderately” and “highly” evolved color lexicons, respectively. Consensus between speakers of the same language was judged through pairwise comparisons of the responses of two speakers. Consensus was quantified by the fraction of informant pairs who both used the same term to describe a sample, used different terms to describe a sample, or used at least one DK within the pair to describe the sample (Figure 10).
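The pairwise consensus measure described above can be illustrated with a short sketch. It is not the original analysis code: the function name `pair_outcome_fractions`, the `DK` marker string, and the example responses are hypothetical, and the rule that a pair counts as "DK" when either informant responded DK follows the verbal description in the text.

```python
from itertools import combinations

DK = "DK"  # assumed marker for a "don't know" response

def pair_outcome_fractions(responses):
    """Classify every pair of informants' responses to one color sample
    and return the fraction of pairs in each of the three outcomes."""
    if len(responses) < 2:
        raise ValueError("at least two informants are needed to form a pair")
    counts = {"same": 0, "different": 0, "dk": 0}
    for a, b in combinations(responses, 2):
        if a == DK or b == DK:
            counts["dk"] += 1         # at least one DK within the pair
        elif a == b:
            counts["same"] += 1       # both used the same non-DK term
        else:
            counts["different"] += 1  # two different non-DK terms
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

# Hypothetical responses of four informants to a single sample.
fractions = pair_outcome_fractions(["red", "red", "DK", "pink"])
```

With four informants there are six pairs; here one pair agrees, two pairs disagree, and three pairs involve a DK response, so the three fractions sum to one for each sample.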

Lindsey et al. found that the Hadza named only White, Red, and Black with perfect consensus, while there was only 20% consensus (at most) for other terms used by the Hadza subjects. Overall, DK was the most common response. Some of the Hadzane terms of lower consensus were loanwords, which are terms adopted from other languages without great modification. Somali speakers exhibited 100% consensus only with their naming of ‘White’, but they exhibited greater than 40% consensus for nine other stimuli. In contrast, English speakers named ten stimuli with 100% consensus. The rate of DK responses was greatest for Hadza speakers (56.6%), followed by Somali (12.8%) and English speakers (0.56%). A striking feature of the Hadza data was that although their color naming was very low in consensus (many different words for the same colors and very high prevalence of DK), a statistical analysis of patterns of Hadza color naming revealed that the low-consensus color terms grouped test colors into categories that often (>75%) fell wholly within the universal set of color categories found in the WCS. Thus, even though Hadza color naming subjectively appeared haphazard and idiosyncratic when viewed from the perspective of consensus within the Hadzane community, the Hadza usage of their color terms was not haphazard and idiosyncratic when viewed from the perspective of universal patterns of lexical grouping. Indeed, while no individual Hadza informant had terms for all the WCS categories, the complete list of these categories was represented in the collective responses of the informants.

This snapshot of color term use in diverse color lexicons demonstrated the process of basic color term acquisition: upon initial introduction, color terms will have low consensus and gradually gain consensus within a language through use by early adopters. The driving force in the introduction and widespread use and consensus of new color terms is an increased need to communicate properties of objects more clearly, essentially whittling down the breadth of the “don’t know” response.


Figure 10. Paired responses to color stimuli. Blue indicates both subjects used the same non-DK term. Green indicates subjects used different non-DK terms. Orange indicates at least one DK between the pair. Hadzane speakers used DK to describe non-WBR terms at a rate (56.6%) greater than Somali (12.8%) or English (0.56%) speaking subjects. (from Lindsey & Brown 2015)


Project Overview

B&K’s order of basic color term evolution is ubiquitous in cultures across the world, and basic color terms across diverse languages have been determined to correspond to consistent portions of color space. Given this evidence for universality of color lexicon development, one may expect that individuals share an underlying perception of color space that guides the way that cultures acquire color terms. Boster’s 1986 study addressed this concept and claimed that subjects did, in fact, create successive color space divisions that recapitulate the sequence of color space divisions made by entire cultures (described by K&McD) as their languages grow to include more basic color terms. The purpose of this project was to further explore Boster’s findings from three different angles: a retrospective analysis of Boster’s individual subjects’ sorting patterns (Study I); a prospective color sorting experiment studying the effect of a multidimensionally variable color palette (Study II); and a prospective experiment studying cross-cultural variations in English and Somali color sorting using an expanded version of Boster’s sample palette, which, like Boster’s, varied primarily in hue (Study III).

If individuals of different cultural and linguistic backgrounds do in fact share an underlying perception of color that guides basic color term evolution, I predicted:

1) Subjects performing a perceptually based sorting task will create successive color categories in an order that is consistent with K&McD’s seven-stage model;

2) Color sorting will be performed with similar hue-based strategies corresponding to BCT evolution, regardless of palette;


3) Subjects of different cultures will perform color sorting tasks similarly, regardless of current color lexical stage – any cross-cultural variations will be reflected by variation within the same culture.


Study I: Retrospective Analysis of Boster (1986)

Boster examined the overall trends present within his sorting data and concluded that subjects sort colors in a way that is consistent with K&McD’s model of color term evolution. Following Boster’s original experiment, research has described the use of multiple color naming motifs within the same language (Lindsey & Brown 2009) so it is reasonable to expect that there may be lawful differences in color sorting present within a culture as well. Boster’s approach inherently treated individual sorting variations as noise, whereas examining individual differences may tell a more complete story. In this retrospective analysis of Boster’s sorting data, I focused on individual differences in sorting patterns.

Methods

Review of Boster’s Methods

The 21 subjects in Boster’s non-verbal binary sort task were asked to divide a set of eight Munsell color samples (Figure 11) into groups based on the colors’ perceptual similarity to one another. Once the colors were separated into two groups of most perceptually similar colors, subjects were asked to divide one of the existing groups into two groups of most-similar colors, for a total of three groups. This division of color groupings was continued until all 8 colors were coded into their own ‘category’. When the task was completed, the data for each subject could be represented as a hierarchy, and this hierarchy was recorded in shorthand as a ‘tree-string’ for each subject (Figure 12).

Boster analyzed the overall trends within the data by computing similarity matrices between the colors based on subjects’ hierarchical sorting divisions and the tree-strings they produced. These matrices quantified the perceived similarity of a given pair of colors based on the proximity of colors in an individual’s tree-string. The similarity matrices for each set of colors were averaged across all subjects to determine the aggregate measure of similarity between the colors. Boster then compared these similarity matrices to one generated from K&McD’s evolutionary model through a series of statistical analyses.
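As a rough illustration of how a similarity matrix might be derived from tree-strings and averaged across subjects, consider the sketch below. The encoding is an assumption made for this example: each tree-string is represented as an ordered list of samples plus a list of cut numbers, where the cut number between two neighboring samples is the sort step at which they were divided. The similarity of two colors is taken to be the step at which they were first separated (a later split means greater similarity). Boster’s exact similarity measure may differ, and the function names and data are hypothetical.

```python
from itertools import combinations

def split_step(samples, cuts, a, b):
    """Sort step at which samples a and b first landed in different piles.
    cuts[k] is the step number of the division between samples[k] and
    samples[k + 1]; the earliest cut lying between a and b separates them."""
    i, j = sorted((samples.index(a), samples.index(b)))
    return min(cuts[i:j])

def mean_similarity(subjects):
    """Average the per-subject split steps for every pair of colors."""
    colors = sorted(subjects[0][0])
    result = {}
    for a, b in combinations(colors, 2):
        steps = [split_step(samples, cuts, a, b) for samples, cuts in subjects]
        result[(a, b)] = sum(steps) / len(steps)
    return result

# Two hypothetical subjects sorting a reduced four-color palette.
subjects = [
    (["black", "blue", "red", "white"], [2, 1, 3]),
    (["white", "red", "blue", "black"], [1, 2, 3]),
]
similarity = mean_similarity(subjects)
```

Averaging in this way yields a single aggregate value per color pair, which is exactly the step at which individual differences between the two hypothetical subjects are folded into one number.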


Figure 11. Munsell samples from Boster (1986) non-lexical binary sort. a) The samples, as ordered above, were N1.5 (black), N9.5 (white), 10PB4/10 (purple), 5PB4/12 (blue), 5G5/10 (green), 5Y8/14 (yellow), 10R6/14 (orange), and 5R4/14 (red). b) Locations of best-example Munsell equivalents.


Figure 12. Data coding scheme in Boster (1986). Hierarchical sort trees (a) were encoded as tree-strings (b) by Boster. Number ‘1’ indicates where the samples were divided for the first sort, and so forth. The larger the number between two samples, the longer a subject kept them in the same group.

Analysis of Individuals’ Sorting Data

I created visual representations of individuals’ sorting piles at each sort level. To create these diagrams, I reverse-engineered individuals’ sorting hierarchies from the individual subjects’ tree-string data published by Boster, and then re-coded the hierarchy data to demonstrate category membership at each sort level (Figure 13). I compared these individuals’ sorting piles to those predicted by K&McD, and identified other patterns which emerged under Boster’s binary non-lexical sort. Individual sorting data were analyzed with and without the secondary colors (purple and orange) in order to see the full complexity of individuals’ color sorting as well as to judge more directly the prevalence of K&McD adherence at each sort level.
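The recoding from a tree-string to sorting piles at each sort level can be sketched as follows. This is an illustrative reconstruction rather than the procedure actually used: it assumes a tree-string can be represented as an ordered list of samples with a cut number between each neighboring pair (the sort step at which that division was made), and the function name `piles_at_level` and the example tree-string are hypothetical, not data from any of Boster’s subjects.

```python
def piles_at_level(samples, cuts, n_piles):
    """Return the piles present once the subject has made n_piles piles.
    A boundary exists at this level if its cut was made during one of
    the first (n_piles - 1) divisions."""
    piles, current = [], [samples[0]]
    for sample, cut in zip(samples[1:], cuts):
        if cut < n_piles:
            piles.append(current)   # boundary already drawn: close the pile
            current = [sample]
        else:
            current.append(sample)  # not yet divided: same pile
    piles.append(current)
    return piles

# Hypothetical tree-string for the eight-sample palette.
samples = ["black", "blue", "purple", "green", "yellow", "orange", "red", "white"]
cuts = [3, 7, 5, 1, 6, 4, 2]

two_sort = piles_at_level(samples, cuts, 2)
three_sort = piles_at_level(samples, cuts, 3)
all_levels = {n: piles_at_level(samples, cuts, n) for n in range(2, 9)}
```

In this made-up example the 2-sort separates the dark/cool samples from the warm/light samples, and the 3-sort then splits white from the warm chromatic samples; listing the piles for every level from 2 to 8 gives the per-level category membership used to draw diagrams of the kind shown in Figure 13.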


Figure 13. Reproducing subjects’ sorting piles from Boster (1986) coded data. Tree-string data (a) was reverse-engineered into a sorting hierarchy (b), re-coded to demonstrate sorting piles (c), and used to visually represent each sorting pile for each subject (d). This figure is an example of the sorting order predicted by K&McD (1978). Colored discs exhibit the color samples composing a given color category.


Results

Prevalence of sorting patterns

For a direct comparison of subject sorting data to the K&McD-predicted order, I excluded secondary colors (purple and orange) from this portion of the analysis. As displayed below in Figure 14, 14/21 (66.7%) of Boster’s subjects followed the predicted 2-sort pattern. Of these 14 subjects, only one subject followed the predicted 3-sort pattern (7.1%). Common deviations in 3-sorts included an unpredicted early division of black from the cool colors (5/14 subjects; 35.7%) and a division of the warm/white category which kept yellow with white rather than with its fellow chromatic colors (7/14 subjects; 50.0%). Both of these common patterns deviated from K&McD’s predicted order in that they did not show white breaking off from the warm/light 2-sort category to leave a category of chromatic warm samples.

When performing the 4-sort, subjects who deviated from the predicted order during the 3-sort in specific ways had the potential to re-align with the predicted trajectory. While no subjects’ 4-sorts matched the predicted Stage IIIa, three of the subjects who followed the predicted 2-sort then followed the predicted trajectory for Stage IIIb through to Stage V (3/14; 21.4%). The most common 4-sort overall was similar to the predicted pattern but maintained the combined yellow and white group from the deviating 3-sort (‘r/yw split’; 50.0%). Three subjects exhibited 4-sorts in which green emerged as a category much earlier than predicted (‘early green’; 21.4%).


Figure 14. Prevalence of sorting strategies in Boster (1986). Secondary colors (purple and orange) not included in analysis in order to evaluate adherence to K&McD predictions of color term evolution. Representative examples of each observed sorting strategy are shown. Note diversity in subjects’ sorting strategies.


Similar to Expected Trajectory

Many of Boster’s subjects performed the 2-sort as anticipated, but Subject 3 (Figure 15) was the only individual who performed the 3-sort as predicted by K&McD. Disregarding secondary colors, the remainder of Subject 3’s sorts followed K&McD’s predicted order. When secondary colors are taken into account, an early emergence of purple prior to the separation of blue and green deviated from the predicted order.

Subject 7, by contrast, performed an early black/cool split for the 3-sort, deviating from K&McD’s predictions at this step. Following the 3-sort, Subject 7 resumed the Stage IIIb predicted pattern and continued to perform predicted sorts through to their final round. After White and Black separated as their own categories, the warm category divided. This was followed by the division of the cool category, and then by the secondary colors peeling off from their neighboring primaries. The comparison between Subject 3 and Subject 7 highlights the ways that constraints may affect a subject’s potential to follow B&K’s predicted order. For example, even in the case of orderly sorters, if purple is placed in a sorting category with black, the treatment of cool colors is interrupted, which encourages the early emergence of purple.


Figure 15. Individuals’ sorting patterns most similar to expected pattern. Examples from Boster (1986). Subject 3 deviated from K&McD’s predicted order only in early emergence of a purple category. Subject 7 resumes expected trajectory after deviation in the 3-sort.

Lightness-based sorting strategy

In addition to hue, the color samples used in this experiment varied in lightness. Lightness, for these purposes, was classified based on the value of each Munsell sample. The lighter half of the color samples included White, Yellow, Orange, and Green. The darker half was Black, Red, Purple, and Blue (Figure 16 a, b). This boundary did not respect the warm/cool divide as predicted by K&McD, placing Green with warmer colors and Red with cooler colors. Of all 21 subjects, four appeared to base their 2-sort criteria on the dimension of lightness (Figure 14). Some of these subjects appeared to have based further n-sorts on this criterion as well (Figure 16 c).
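The lighter/darker division described above can be checked directly from the Munsell values listed in the Figure 11 caption (the value is the number before the slash, or the number after "N" for the neutral samples). The short sketch below is only an illustration of that ranking; the variable names are arbitrary.

```python
# Munsell value of each sample, read from the notations in Figure 11:
# N1.5, N9.5, 10PB4/10, 5PB4/12, 5G5/10, 5Y8/14, 10R6/14, 5R4/14.
munsell_value = {
    "black": 1.5,
    "white": 9.5,
    "purple": 4,
    "blue": 4,
    "green": 5,
    "yellow": 8,
    "orange": 6,
    "red": 4,
}

# Rank the samples from lightest to darkest and split the list in half.
ranked = sorted(munsell_value, key=munsell_value.get, reverse=True)
lighter_half = set(ranked[:4])
darker_half = set(ranked[4:])
```

Because green (value 5) is lighter than red, blue, and purple (all value 4), the split falls cleanly between them, which is why a lightness-based 2-sort groups green with the warm samples and red with the cool ones.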


Subject 10’s 2-sort did not include Green in the ‘light’ category, drawing the line for the ‘light’ category at a higher cut-off. Next, this subject refined their lightness-based divisions by separating white from the light chromatic samples for the 3-sort, and separating black from the dark chromatic samples for the 4-sort. Importantly, these divisions occurred before Red split off from the ‘cool’ colors. The combination of Red with ‘cool’ colors is very unlikely within a hue-based sort, but the samples of these colors all had similar lightness. Prioritizing the division of colors with more marked differences in lightness over the separation of Red from ‘cool’ colors of similar darkness emphasized the salience of lightness for this individual.

Subject 15 also used a lightness-based sorting strategy. For the 2-sort, they divided the samples into equal piles. Their 3-sort separated the lightest light samples (white, yellow) from the darker light samples (green and orange). For the 4-sort, the category with the next biggest difference (between yellow and white) was divided. However, it is almost as if subject 15 was fixated on dividing one pile and forgot about the ‘dark’ 2-sort category.

Figure 16. Lightness-based sorters from Boster (1986). a) circles: approximate locations of Boster samples on WCS color chart. Note lightness differences in color stimuli, and dividing line between lightest and darkest halves; b) pie chart representation of Boster samples, showing division between lightest and darkest samples; c) subjects 10 and 15 show evidence of lightness-based color sorting, overriding predicted hue-based division of colors.

Other observed sorting strategies

Subjects 13 and 9 both produced 2-sort categories which placed white, yellow, orange, and purple in one category and black, red, green, and blue in another (Figure 17). These categories did not respect a warm/cool division or a completely lightness-based divide, but separated colors based on ‘boldness’. ‘Boldness’-based category membership took into account the status of the color sample as a Hering primary or secondary color, and lightness of the sample. The ‘pale’ category – white, yellow, orange, and purple – contained the secondary colors from the sample palette, and the lightest of the primary colors. The ‘bold’ category – black, red, green, and blue – contained darker primary colors only. As these subjects continued dividing categories they produced unexpected combinations of colors, including a purple/orange category which persisted to the 7-sort, and a red/blue category which persisted to the 6-sort (subject 13). Subject 9 produced the same orange/purple category, though it only persisted until the 5-sort. The continued association of the secondary colors suggested some subjects may perceive a salient difference between the Hering primary and secondary colors.

In another example of an unexpected 2-sort, Subject 6 placed the neutral samples together in both the 2- and 3-sort, prioritizing the achromacy of the samples above their value. The remainder of their trajectory was similar to the sorting patterns of other subjects, creating a rare example of an unexpected first sort which did not induce a series of unusual results downstream. Looking at Boster’s results on an individual level, it is clear that there was great variation in the sorting behaviors between individuals within the same language and culture.

Figure 17. Other Boster subject sorting strategies. Subjects 13 and 9 made color divisions based on “boldness” of the color samples (further explained in text), and subject 6 utilized a partially achromatic category.

Study II: English-speaker 30-Color Sort

The constraints of the binary sort procedure used in Boster’s experiment complicated the interpretation of the individual data. One questionable sorting decision early in the task could result in less than ideal sort patterns in subsequent binary divisions, and it is impossible to know which unusual individual variations may have been a result of this effect. A binary sort procedure also implies rigid color category boundaries as successive divisions occur, when it is widely observed that the adoption of new BCTs involves the re-distribution of adjacent category boundaries. To minimize artifacts which likely resulted from heavy constraints imposed by a sequential binary sort protocol, and to more closely mirror the natural divisions of color space which are free to occur during lexical evolution, this experiment used a non-binary sort procedure: each n-sort was independent of all the others. Additionally, the number of color samples was increased to 30 in order to explore color space divisions more precisely. Finally, with the exception of the black, white, and yellow samples, Boster’s palette consisted of only five Munsell colors that varied in hue but were approximately equal in saturation and value. In Experiment II, the test palette contained many more samples, which varied not only in hue, but also in lightness and chroma (saturation). Also, unlike Boster’s protocol, I tested only non-lexical sorts. A final difference between Boster’s and the present protocol is that subjects in Experiment II did not exhaustively sort the 30-color palette; they were asked only to produce 2-, 3-, 4-, 5- and 6-sorts.
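The difference between the two protocols can be stated as a property of the resulting partitions. Under a sequential binary protocol, every (n+1)-sort is obtained by splitting exactly one pile of the n-sort, whereas independent n-sorts need not be nested in this way. The following Python sketch checks that property. The function name, the color labels, and the pile contents are hypothetical illustrations, not data from Boster (1986) or from this thesis.

```python
def is_binary_refinement(coarse, fine):
    """True if `fine` results from splitting exactly one pile of `coarse` in two.

    Both arguments are lists of non-empty, non-overlapping piles (sets of labels).
    """
    coarse = [frozenset(p) for p in coarse]
    fine = [frozenset(p) for p in fine]
    if len(fine) != len(coarse) + 1:
        return False
    # Every pile of the finer sort must lie entirely inside one coarser pile.
    if not all(any(f <= c for c in coarse) for f in fine):
        return False
    # Both sorts must cover the same set of samples.
    return set().union(*coarse) == set().union(*fine)


# Hypothetical 2-sort and two candidate 3-sorts of an 8-color palette.
two_sort = [{"white", "yellow", "orange", "red"},
            {"black", "green", "blue", "purple"}]
three_nested = [{"white", "yellow", "orange"}, {"red"},
                {"black", "green", "blue", "purple"}]
three_free = [{"white", "yellow"}, {"orange", "red", "purple"},
              {"black", "green", "blue"}]

nested_ok = is_binary_refinement(two_sort, three_nested)
free_ok = is_binary_refinement(two_sort, three_free)
```

In this example the second 3-sort moves purple into a pile with orange and red, a re-distribution of category boundaries that a binary protocol cannot produce but an independent n-sort allows.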

Methods

Subjects

English-speaking subjects (n=90; 59 females), aged 7-69, were recruited at The Ohio State University College of Optometry and through Labs of Life at COSI science museum and research center in Columbus, OH. Subjects were recruited as volunteers interested in contributing to science. All protocols used in this and subsequent experiments were approved by the Ohio State University biomedical research IRB and were in accordance with Declaration of Helsinki principles.

Apparatus

The color sorting game was presented to subjects using the Consort application on an iPad Air (Figure 17). Using a portable device which could display a large range of colors with consistent display quality was important to the protocol of this experiment, due to the variety of locations where subjects were tested. iPads were calibrated using a model PR-670 spectroradiometer (Photo Research, Syracuse, NY), and the gamut proved to comply with sRGB standards (Table 1). The color stimuli within the game were calibrated across multiple machines, and demonstrated consistency in display across iPads (Table 2).

Procedure

Before the sorting task began, age and gender demographic information was collected from subjects. From a randomized 5x6 display of the stimulus colors on a grey background (Figure 18), subjects were asked to sort the color stimuli into two separate boxes based on the following prompt:

“Sort all of the colors into two categories based on how similar they look to one another. Colors that are similar in some way, but different from the others, should be put in the same pile. The only rules are that 1) all colors need to be placed in one of the piles and 2) every pile must have at least one color sample in it. 3) Do not assume that the two piles must contain equal numbers of colors.”

Subjects then pulled colors across the screen into one of the boxes, and could adjust the contents of each box until they were content with the groupings they had produced. Throughout the task, it was emphasized that there was no correct or incorrect answer. Once every color was assigned to a group and the subject declared their sort final, the results were saved and a new randomized color grid appeared, this time with one more box than the previous task had offered. Subjects were asked to apply the same sorting prompt for 3, 4, 5, and then 6 categories.
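The two hard rules in the prompt (every sample placed, no empty pile) amount to requiring that each n-sort be a partition of the palette. A minimal sketch of such a check is given below. It is not the ConSort application's code; the function name and the sample identifiers are hypothetical.

```python
def validate_sort(palette, piles):
    """Return a list of rule violations for one n-sort (empty list means valid)."""
    problems = []
    placed = [sample for pile in piles for sample in pile]
    if any(len(pile) == 0 for pile in piles):
        problems.append("empty pile")
    if len(placed) != len(set(placed)):
        problems.append("sample placed twice")
    missing = set(palette) - set(placed)
    if missing:
        problems.append("unsorted samples: " + ", ".join(sorted(missing)))
    return problems


# 30 hypothetical sample identifiers standing in for the stimulus colors.
palette = ["c%02d" % i for i in range(1, 31)]

# Piles of unequal size (4 and 26 samples) satisfy the rules.
unequal_two_sort = [palette[:4], palette[4:]]
# An empty second pile with 26 samples left unsorted violates both rules.
incomplete_sort = [palette[:4], []]

ok_problems = validate_sort(palette, unequal_two_sort)
bad_problems = validate_sort(palette, incomplete_sort)
```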

Figure 18. iPad game display during sorting procedure. For the 2-sort above, subjects dragged every color sample into one of the two boxes, and each box was required to contain at least one color sample. When this task was complete, samples reset and a third box was added. This was repeated with 3, 4, 5, and 6 boxes.

Stimuli

This set of 30 color stimuli was selected to be representative of variations in WCS hue and lightness. That is, not only was each color category represented, but it was represented with a number of samples proportionate to the amount of the color space diagram that color category encompasses (Figure 19).

[Figure 19 image. Panel a): WCS chart with sample locations. Panel b): CIE chromaticity diagram, both axes running from 0.0 to 0.8, spectrum locus labeled from 450 to 645 nm.]

Figure 19. Thirty sample color palette used in Study II. a) Closest matches of 30 samples with WCS Munsell samples. Samples chosen to represent each basic lexical color category proportionally, b) samples shown in CIE chromaticity space spanned the majority of iPad gamut (grey triangle).

Statistical Analysis

The sorting patterns present within the sample population across all n-sorts were determined using a spectral clustering algorithm (Ng et al., 2002). This is an advanced clustering technique that avoids many of the limitations inherent in more popular clustering techniques, such as k-means, that have been used in prior research in color and language (e.g., Lindsey & Brown, 2006). In particular, the k-means clustering algorithm often fails when clusters of data do not correspond to convex regions. Spectral clustering readily handles these situations.
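The contrast between the two techniques can be illustrated on synthetic data. The sketch below follows the general steps of the Ng et al. (2002) algorithm (Gaussian affinity, normalized affinity matrix, leading eigenvectors, row normalization, k-means on the embedded rows) and applies it to two concentric rings, a standard non-convex case. It is not the analysis code used in this thesis: the toy data, the kernel width `sigma`, the deterministic k-means initialization, and all function names are assumptions made for illustration, and the way sorting data were converted to affinities is not reproduced here.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain k-means with a deterministic farthest-first initialization."""
    centers = [points[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(points, k, sigma):
    """Spectral clustering in the style of Ng et al. (2002)."""
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    affinity = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(normalized)      # eigenvalues in ascending order
    top = vecs[:, -k:]                        # k leading eigenvectors
    top = top / np.linalg.norm(top, axis=1, keepdims=True)
    return kmeans(top, k)


def same_partition(a, b):
    """True if two label vectors describe the same grouping, ignoring label names."""
    pairs = set(zip(a.tolist(), b.tolist()))
    return len(pairs) == len(set(a.tolist())) == len(set(b.tolist()))


# Toy data (assumption): two concentric rings of 40 points each.
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
points = np.vstack([1.0 * ring, 4.0 * ring])
truth = np.repeat([0, 1], 40)

kmeans_labels = kmeans(points, k=2)
spectral_labels = spectral_clustering(points, k=2, sigma=0.5)
```

Because k-means separates clusters with a straight boundary, it cannot isolate the inner ring from the outer one, whereas the spectral embedding maps each ring to its own point before k-means is applied.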

Results

Overall patterns

Spectral clustering revealed the prevalence of 11 different color categories across all the different n-sorts and thus provided a way to see how the overall structure of the n-sorts changed as the number of color categories (piles) increased from two to six. The prevalence of a category type and the consensus within it are represented by the lightness of the color within Figure 20. The pattern lightnesses at a given sort level have been normalized to the maximum consensus obtained for any color sample at that sort level (i.e., normalization by column in Figure 20).
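The column normalization described above can be written out as a short sketch. The data layout, the category names, and the consensus values are hypothetical placeholders chosen only to show the arithmetic; they are not values from Figure 20.

```python
def normalize_by_sort_level(consensus):
    """Scale consensus so that the largest value at each sort level equals 1.

    `consensus` maps sort level -> category name -> list of per-sample consensus.
    """
    normalized = {}
    for level, categories in consensus.items():
        peak = max(max(values) for values in categories.values())
        normalized[level] = {
            name: [value / peak for value in values]
            for name, values in categories.items()
        }
    return normalized


# Hypothetical consensus values for two samples at two sort levels.
consensus = {
    2: {"Warm": [0.2, 0.5], "Light": [0.25, 0.1]},
    3: {"Red": [0.3, 0.6]},
}
normalized = normalize_by_sort_level(consensus)
```

With this scaling, brightness can be compared across categories within one sort level, but not across sort levels, since each column is divided by its own maximum.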

In addition to hue-based color categories seen in Figure 20, note the surprisingly high prevalence of sorting patterns consistent with lightness-based sorting criteria: ‘Light’, ‘Dark’, ‘Achro-1’ and ‘achro-2’. These were especially prevalent during 2-sorts and gradually decreased in prevalence as subjects were instructed to sort the color samples into a greater number of piles. As the numbers of sorting categories increased, so did the prevalence of hue-based categories. In the following plots of 30-color sort data, achromatic samples (black, white, grey) are not represented. Categories labeled as ‘light’ or ‘dark’ are chromatic categories, partitioned based on the lightness or darkness of the samples, but do not include the achromatic samples.

Figure 20. Results of cluster analysis on 30-color data. Each panel shows consensus (brightness in each panel) among 90 subjects in their respective n-sorts (columns). Names at left indicate roughly the cluster category identity. All cluster categories (besides brown) were represented roughly equally as early as 2-sort level. All increased in prevalence and became more distinct as n-sorts progressed. Note, achromatic and grue categories decreased in signal strength as sort level progressed.

Individuals’ sorting patterns

When viewing the sorts at an individual subject level, it is clear that some individuals followed an achromatic sorting pattern while some followed a warm/cool strategy from the initial 2-sort. Interestingly, the warm/cool strategy often closely resembled the Dani mili/mola division. The less constrained sorting protocol allowed for individuals to switch from a lightness- to hue-based sort strategy at any point. As evidenced in Figure 21, individuals switched strategies after any number of sorts, or stuck with the lightness-based strategy from start to finish. Sorting that began as lightness-based most commonly transitioned to a hue-based strategy as the subject progressed through the experiment; however, on rare occasions subjects following a hue-based sort strategy switched to lightness-based sorting.

The surprising prevalence of lightness-based sorts is revealed only because the palette of color samples used in this experiment varied widely with respect to hue, saturation, and lightness, allowing subjects to consider all these perceptual dimensions of color as color category criteria. This diversity of color samples more closely mimics the natural diversity of color that languages would have adopted words to name. Boster’s palette did not contain this wide variation in lightness/saturation.

Subjects who maintained a hue-based strategy throughout each round of sorting did not follow a uniform sorting pattern, and multiple hue-based sort patterns were present along the way (Figure 22). At the 3-sort level, subjects varied even more in their color space divisions. A few followed the order expected by K&McD and divided the ‘warm’ category into its ‘red’ and ‘yellow’ components, while a greater number divided ‘cool’ into its ‘green’ and ‘blue’ components first. Some subjects even adopted a partially achromatic pattern, maintaining the ‘warm’ category and dividing ‘cool’ based on lightness rather than hue. Two subjects (1, 4) employed a 3-sort which re-distributed colors in such a way that a purple-centered category evolved early. Even at the 2-sort level where most subjects followed a similar warm/cool division, some drew an unexpected line between these categories. For example, S 64 grouped yellows with greens and reds with purples, forming much different categorical boundaries. Clearly, there was not even a majority of chromatic-only sorters following an order predictable by K&McD.

Figure 21. Representative examples of subjects adopting achromatic sorting strategy. Final use of achromatic sorting strategy was determined for each subject. Subjects may have sorted using a hue-based strategy at some point prior to the final n-sort. It was common to maintain a hue-based strategy, lightness-based strategy, or switch from one to another as sorts progressed.

Figure 22. Representative examples of subjects who adopted a hue-based sorting pattern. Among the 23 subjects who adopted a hue-based strategy, substantial sort pattern variability was present. Few performed the n-sort in a way that was consistent with K&McD.

Study III: 25-Color Sort

In a final experiment, I reduced color sample variations along the dimensions of chroma and Munsell value to better isolate how people group color space based on hue. The amount of diversity in the 30-color palette samples caused multiple interesting sorting strategies to arise, but made it difficult to compare data to Boster’s findings and evaluate the treatment of hue by subjects. Therefore, I used a 25-chromatic sample palette in Study III, which is much larger than Boster’s palette (6 chromatic samples), but somewhat simpler than the 30-color palette which evoked such a variety of strategies. The larger chromatic palette allowed me to better explore individual differences in the boundaries between sorting categories. To further explore the question of cross-cultural factors in color space division within this study, the sorting task was performed by both English- and Somali-speaking subjects. This allowed me to test the other two limbs of Boster’s prediction: that within-culture variations in color space division will mirror cross-cultural variations, and individuals of different color lexicon developmental stages will treat color foci the same way.

Somali was selected for this comparison as it is historically a Stage II color lexicon, and a large enclave of Somali immigrants lives in Columbus, OH. Through traditional methods of analysis, Maffi (1990) determined that modern Somali has 6 basic color terms – black (madow), white (cadaan), red (huruud, casaan), yellow (cawl, huruud), green (cagaar), and blue (buluug) – though the relatively rapid transitions which the Somali language is undergoing can be appreciated through the variety of color naming motifs among Somali speakers (Brown, Isse, and Lindsey 2009). The within-language diversity of Somali, as evidenced by its wealth of color naming motifs (Lindsey & Brown 2016), makes it a dynamic language to study when investigating how the differences between sorting task results for individuals within the same language compare to the differences between speakers of different languages.

Finally, in order to better understand the factors that might contribute to diversity in n-sorts, I ran a control experiment in which English speakers were told precisely how to produce each n-sort. The instructions were designed to produce n-sorts that recapitulated the pattern of color term evolution predicted by K&McD. This experimental protocol therefore controlled for individual differences in cognitive strategy used to produce the n-sorts, and any inter-subject differences in the results could be attributed solely to individual differences in color perception.

Methods

Subjects

English-speaking experimental subjects were selected and recruited by the same methods as described for Study II, with some subjects additionally recruited from personnel within Fry Hall at The Ohio State University. Control subjects were recruited from lab personnel. For Study III, 55 English-speaking subjects (27 females) were tested. English-speaking subjects ranged in age from 6 to 83. Somali subjects (n=23; 14 females), aged 19-70, were recruited at the CarePoint East refugee clinic, also located in Columbus, OH. Somali subjects were approached by an interpreter who explained an overview of the sorting game to them. They were compensated with a $10 Kroger gift card in exchange for their participation. Somali subjects completed a brief questionnaire in addition to the first name, gender, and age elicited from English-speaking subjects. This questionnaire included questions regarding age, gender, languages spoken, occupation, highest level of education, birthplace, other countries of occupancy, and approximate date of immigration (Appendix B).

Apparatus

The same iPad and ConSort application were used as described in Study II.

Procedure

The same procedure as described in Study II was performed using 25 colors arranged in a randomized 5x5 array (Figure 23).

Figure 23. iPad display showing 25-color palette. a. Beginning of a 2-sort trial. Randomized 5x5 arrangement of 25-color test palette colors, and the two “bins”; b. two-sort was complete when the subject pulled every color sample over into one of the boxes. When this task was complete and saved, samples reset and a third box was added. This was repeated with 3, 4, 5, and 6 boxes.

Stimuli

The 25 colors utilized in the second set of stimuli were designed to be approximately perceptually equal Munsell steps around an iso-chroma hue circle, including strong examples of all focal colors. While most of the samples fell on a chroma 8 circle, the saturation of the red region needed to be increased to chroma 10 to 12 for the hue to appear recognizably red. Upon analysis there was no significant emergence of categories that adhered strictly to this ‘bump’ in chroma, suggesting this manipulation did not bias the results. I also introduced a gradual increase in luminance of color samples in the yellow-orange portion of the palette in order to produce good yellows and oranges. Colors with these chromaticities that were isoluminant with the other palette colors looked brown.

Figure 24. 25-color sort sample palette. a) circles: closest matches of 25 samples with WCS Munsell samples; b) colored wedges approximating colors selected as roughly equal perceptual steps around a hue circle; c) filled red circles: sample chromaticities in relation to constant Chroma 8 contour (blue curve). See text for further details.

Statistical Analysis

A spectral clustering analysis was applied to data from this experiment, as described in the methods for Study II.

Control experiment

The procedure described above was performed by ten English-speaking subjects who were given explicit directions about how to perform each n-sort in a way consistent with K&McD’s seven-stage model. The category guidelines and corresponding K&McD stages were as follows:

“The 2-sort corresponds to Stage I, so there should be a warm/cool division:

(red-purple) + red + orange + yellow // green + blue + (blue-purple)

The 3-sort corresponds to Stages IIIb and IV. It should have a red category, and categories containing the remaining warm and cool color samples:

(red-purple) + red + (red-orange) // (yellow-orange) + yellow // green + blue + (blue-purple)

The 4-sort corresponds to Stage V and should have well-defined categories for each Hering primary, with secondaries split across neighboring primaries:

(red-purple) + red + (red-orange) // (yellow-orange) + yellow // green // blue + (blue-purple)

In the 5-sort corresponding to Stage VII, either orange or purple gets its own category and the other is still split between neighboring primaries:

red + (red-orange) // (yellow-orange) + yellow // green // blue // purple

OR: (red-purple) + red // orange // yellow // green // blue + (blue-purple)

In the 6-sort: all primary and secondaries have own category:

red // orange // yellow // green // blue // purple”

Subjects were instructed to select colors from the palette, one category at a time. Thus, for example, in the 3-sort condition, subjects were first asked to select and place into a pile all the colors that appeared red-purple, red, or red-orange. Then, they were asked to select from the remaining colors in the palette those that were orange, yellow-orange or yellow, and place these into a second pile, etc. After all colors had been placed in one of the piles appropriate for a given sort level, subjects were asked to make any needed adjustments to their n-sort before moving on to the next sort level. The 5-sort procedure was performed two times by each subject to capture both the orange-first and purple-first trajectories possible at this sort level.

English-speaker Results

The overall sorting patterns revealed by the spectral clustering analysis and individuals’ sorting variations were both examined. The prevalence of different sorting patterns at each n-sort among the individual subjects and the full sorting data of each subject were studied to best understand how subjects performed relative to K&McD’s expected results produced in the control procedure (Study III, Control Results).

Overall patterns of color category formation

At the 2-sort level, two main sorting patterns were revealed by cluster analysis (Figure 25a). The sorting patterns which this section refers to are groupings of color categories which roughly span all the color samples, without overlap of high-consensus portions of categories. The number of individuals who clearly fell into these common patterns was judged by examiner inspection, and the prevalence of different individual patterns is further detailed in the following section.

Pattern 2a adhered closely to the expected warm/cool division with high consensus until the colors at the category boundaries. When the 25 color samples were plotted relative to the Stage I Dani mili/mola division (Figure 25b), pattern 2a exhibited a similar division to Dani use of mili and mola. Areas of high consensus within the category and low consensus along boundary colors both corresponded well with the color naming behaviors of a Stage I language.

Pattern 2b did not follow the expected warm/cool division and there was less consensus on where category boundaries lay. Subjects comprising this pattern placed greens with traditionally warm colors such as yellow and orange, and often included reds with traditionally cool colors such as purple and blue. This pattern was also observed in Boster’s subjects (Figure 16). Overlaid on the mili/mola diagram, consensus for pattern 2b was greater at mili/mola category boundaries than within the categories themselves. Though stimuli were designed for minimal variability in lightness, the centroids of the two 2b categories lay at the low and high ends of lightness for the sample set. Despite the presence of no more than minimal lightness variation, subjects contributing to pattern 2b may have been sorting along the dimension of lightness in addition to hue, overriding the warm/cool division with loose light/dark boundaries.

At the 3-sort level, the most common sorting pattern, 3a, included a red-centered category which included red-oranges and red-purples, a yellow-green category which corresponded to the lightest samples, and a blue-centric category which corresponded to many darker samples. This sorting pattern corresponded to a B&K Stage II language when considering the lightness component – there was a corresponding category for ‘red’, ‘black’ (or dark) and ‘white’ (or light). The combination of yellows and greens within a category did not agree with K&McD’s expected order of category divisions, which anticipates greens remaining with cool colors.

Results strayed further from the expected pattern as the number of sorting piles increased to four. The most common 4-sort pattern, pattern 4, included categories centered on blue, yellow-green, orange, and purple. The red category prominent in pattern 3a was split apart between the orange and purple categories within pattern 4. This early emergence of Hering secondary color categories was unanticipated, as was the disappearance of a Red category. At the 5-sort level, all but one color category should have emerged, and K&McD predicted it would be a Hering secondary – either orange or purple. However, at this point the most common composite category remaining was yellow-green, while yellow remained a fairly low-consensus category. A similar association between yellow and green remained in the 6-sort. As a whole, composite categories became less common and individual color categories gained greater consensus as the number of categories increased. However, the prevalence of a yellow category increased much later than expected due to its degree of association with green, while orange and purple categories emerged earlier than predicted.

Figure 25. Results of cluster analysis of US 25-color sorting categories. a) Cluster analysis revealed ten common color sorting patterns; b) magnified views of panels outlined in panel c; c) for comparison with classic warm/cool classification, background shows prevalence of mili/mola (warm/cool) Dani designations from Heider & Olivier (1972). See text for further details.

Sorting pattern prevalence

Figure 26 demonstrates the most common sorting categories and interesting minority sorting patterns as determined by examiner inspection and judgement of individuals’ sorting data. Not all individuals who were counted exhibited this exact color sorting pattern – individual variation was present even within these categories. There was deviation from the expected pattern and variation within same-culture subjects at all stages.

On the 2-sort level, 18/55 subjects (32.7%) appeared to sort in a way which was consistent with pattern 2a, and 21 (38.2%) sorted consistent with 2b. There was a greater amount of variation in the boundaries for pattern 2b sorters, as predicted by the lower consensus boundaries visible. Some subjects extended the cool category further into the red-purples, and some extended the warm category into the greens. Six subjects (10.9%) followed pattern 2c, which was only significant in the Somali subject population by cluster analysis (Figure 29). This sorting pattern is discussed further in the Somali Results section. Five subjects performed the 2-sort in a way which suggested lightness may have been used as a sorting criterion in addition to hue. This sorting strategy is discussed further in the individual subject sorting examples section below.
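As a check on the arithmetic, the prevalence of each 2-sort pattern can be recomputed from the subject counts given in this paragraph and the total of 55 English-speaking subjects. The sketch below uses only those counts; the variable names and the label for the lightness-influenced group are my own, and the remainder is simply the number of subjects not itemized in the text.

```python
N_SUBJECTS = 55

# Subject counts per 2-sort pattern, as reported in the paragraph above.
two_sort_counts = {"2a": 18, "2b": 21, "2c": 6, "lightness + hue": 5}


def percentages(counts, n_subjects):
    """Percentage of subjects per pattern, rounded to one decimal place."""
    return {name: round(100.0 * n / n_subjects, 1) for name, n in counts.items()}


shares = percentages(two_sort_counts, N_SUBJECTS)

# Subjects whose 2-sort is not itemized under any of the four groups.
not_itemized = N_SUBJECTS - sum(two_sort_counts.values())
```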

For the 3-sort, pattern 3a was performed by 29/55 subjects (52.7%). The dark/light/red (dk/lt/R) strategy also corresponded well to K&McD’s expected 3-sort, but with a more restricted ‘red’ and more expanded ‘light’ category (n=7; 12.7%). Another similar sorting pattern, BG/YO/RP (n=7; 12.7%), placed greens within the ‘cool’ category, but its ‘red’ category was more centered on purple than good examples of red. Otherwise, this would be another suitable interpretation of the predicted 3-sort within K&McD’s criteria. The PB/G/YOR (n=5; 9.1%) sorting pattern diverged most distinctly, exhibiting the early emergence of a green category.

More than half of subjects (n=29; 52.7%) followed pattern 4 (B/GY/O/P) in the 4-sort. The presence of secondary color groups (orange and purple) before primary color groups deviated from the prediction. The BG/Y/O/P sorting pattern (n=8; 14.5%) varied from pattern 4 in its treatment of green, but was still inconsistent with the prediction. One would expect something like the B/G/YO/R (n=8; 14.5%) or PB/G/Y/OR (n=7; 12.7%) sort because primaries were treated as the focal member of the groups while secondary colors were split between neighboring categories.

There was a large amount of subtle diversity in the 5-sort patterns, as any small change essentially shifted a pattern from one species to another. Though this limitation was present at each level of the n-sort to some extent when using this method of analysis, it was best exemplified in the 5-sort. One would expect ‘split P’ or ‘split O’ at this stage – every color besides one of the secondaries would have its own category. Only 13 subjects (23.6%) performed one of these sorting patterns. Interestingly, the most common 5-sort pattern (n=18; 32.7%) involved splitting red samples between more focally ‘orange’ and ‘purple’ categories. This would not be expected, given the proven salience and early establishment of a red BCT.

By the 6-sort, most subjects (n=41; 74.5%) created six categories consistent with the lexical categories of red, orange, yellow, green, blue, and purple. Subjects who did not perform the 6-sort as expected varied in their deviations from the expected pattern. Often, there would be two sorting categories for one lexical category (e.g., two different blue piles), throwing off the expected border of categories or forcing the subject to split a lexical color category across two neighboring groups.

Figure 26. Diversity of US subject color sorting data. The minority, idiosyncratic sorting patterns are not shown. Numbers in red indicate numbers of subjects exhibiting a particular sorting pattern. Numbers were determined subjectively by inspection of sorting data. Individual variation was present even within these categories.

Individual subject sorting examples

Close to Expected

Subjects 49 and 36 are examples of individuals who performed the color sorting task most similarly to K&McD’s prediction from beginning to end (Figure 27). Subject 49 did not perform the 2-sort as expected, but the 3-sort emphasized red as a salient new category. Proceeding from there, the 4-sort gave each Hering primary its own category, and the 5-sort split purple between its neighboring categories. Subject 36 performed the 2-sort more in line with the predicted warm/cool division. Their 3-sort contained an extended red category that expanded into purples and oranges, but the other categories corresponded to the darkest and lightest colors, following the logic of red/light/dark categories of the predicted 3-sort. The 4- and 5-sort were both performed as expected. Though each of them had a redder purple category and a red category that included only the redder orange hues, both subjects 49 and 36 made final sorts whose categories corresponded to each of the six basic colors.

Figure 27. English-speaking subjects’ sorts closest to K&McD’s predicted. Some subjects, such as S 49 and S 36, followed the predicted K&McD trajectory with only minor deviations in strategy.

Lightness-based strategies

Subjects 11 and 31 were examples of subjects who performed early n-sorts with a lightness-based strategy (Figure 28). While lightness varied minimally, it was apparent that there was some variability in that dimension when the closest corresponding Munsell samples were mapped on a WCS color space diagram. The more distinctly ‘light’ category of samples expanded from yellow-oranges to greens. Subject 11’s 2-sort made a very strictly light yellow category. Their 3-sort still placed darker red and blue colors in one category – an unpredicted association between warm and cool colors – while medium-light greens were separated from the lighter yellows and oranges. Subject 31 almost perfectly followed the light vs dark division for their 2-sort, and then even more distinctly gained an orange-yellow-green light category within the 3-sort. This was interesting, as it indicated that some subjects may have been very sensitive to this slight lightness difference in samples.

Figure 28. English-speaking subjects adopting lightness-based strategy. a) circles: closest matches of 25 samples with WCS Munsell samples. Horizontal line indicates subjective division between light and dark samples. b) pie chart showing lightness-based division of color palette c) Representative examples of lightness-based sorting strategies.


Somali Results

Overall patterns of color category formation

Cluster analysis of Somali sorting data revealed differences in sorting patterns beginning at the 2-sort level (Figure 29). While one of their sorting patterns was roughly equivalent to English-speaking subjects’ pattern 2b, another prominent sorting pattern (2c) included one category spanning blue, green and yellow while the other category contained orange, red, and purples. This was in clear violation of the warm/cool 2-sort division, as BGY/ORP was not expected. As the experiment proceeded to the 3-sort, the orange, red, and purple category persisted, almost unchanged, while the other category dissolved into a blue category and a yellow category, with greens split between them. At this point, a stronger association between green and blue would be expected than between yellow and green. However, it was somewhat consistent with a divide between the lightest and darkest hues.

The 4-sort pattern 4 observed in the population of English speaking subjects was also a common sorting motif for Somali speaking subjects, as evidenced by the presence of an orange, purple, yellow-green, and blue category. However, there was a significant presence of yellow, green, and red categories at this sort level as well, suggesting a greater proportion of B/G/Y/R sorting occurring.

By the 5-sort, every one of the six lexical colors except red had an established category. A yellow-green category also persisted at this point in the sort. The 6-sort looked very similar to the 5-sort, with the absence of a ‘red’ category persisting in the overall data. This was a clear departure from the English and expected sorts, as not all of the six basic color terms in this sample set received their own distinct category.

Figure 29. Cluster analysis of Somali 25-color sorting categories. Figure format identical to Figure 25. See that figure caption for further details.


Sorting pattern prevalence

The most common 2-sort pattern, followed by 10/23 Somali subjects (43.5%), was pattern 2b (Figure 30), which they shared with English speakers. There were also minor variations in the boundaries of color categories present in the Somali subjects, but they tended more towards the extended warm category in which greens were regularly grouped with red, orange, and yellows. Pattern 2c was significant in the Somali population, but not the English population. Seven of the 23 subjects (30.4%) categorized yellow, green, and blue together, and orange, red, and purple together. There was also one subject who performed the 2-sort consistent with the “lightness” pattern previously described (4.3%). Two subjects (8.7%) placed purples with lighter colors such as yellows and greens – an unlikely pairing which has been observed in naming tasks prior to this study (Lindsey and Brown 2016). This will henceforth be referred to as the “wildcard sorting pattern”, following the terminology adopted by Bimler (2011), who also observed similar color naming patterns in other languages.

Motif 3b was followed by 11 (47.8%) subjects during the 3-sort. Another sorting pattern also maintained the orange, red, and purple category from pattern 2a while placing the green with blue instead of yellow (BG/Y/ORP), and it was followed by 5 subjects (21.7%). At the 4-sort level, 12 (52.2%) subjects followed pattern 4 (B/GY/O/P), while five subjects (21.7%) contributed to the predicted B/G/Y/R sorting pattern. In both the 3- and 4-sort, the “wildcard” remained a minority sorting pattern.

Similar to English-speaking subjects, Somali subjects demonstrated a diverse set of 5-sort patterns. Most commonly, a YG group persisted at this level (n=8; 34.8%), and many subjects also split red between its neighboring purple and orange categories (n=6; 26.1%). At this point, it was expected that either orange or purple would have poorly established categories. Only two subjects (8.7%) exhibited the splitting of orange, and two exhibited the splitting of purple. Moving into the 6-sort, only 13 subjects (56.5%) established good categories for the six BCTs. Of the sorts with poorly established color categories, the most commonly neglected color category was red (n=6; 26.1%), much like it was for English-speaking subjects.


Figure 30. Diversity of Somali color sorting data. The minority, idiosyncratic patterns are not shown. See Figure 26 caption for further details.


Individual subject sorting examples

Lightness-based strategies

Subject 16’s 2-sort followed the lightness divide in the color samples, giving a clear example of a lightness-based sort (Figure 31). Their 3-sort largely maintained a light/dark divide while incorporating red as the third category.

Figure 31. Somali subject adopting lightness-based strategy. See Figure 28 caption for further details.


Wildcard

Subject 4 used a wildcard sorting pattern in their 2-sort, while subject 21 only had a wildcard category present in their 3-sort (Figure 32). Often, “wildcard” subjects performed like subject 21, in that they produced color categories containing two non-contiguous groups of color samples in multiple n-sorts. If the combination of yellows and purples was intentional, the reason for the perceived similarity might have been similar to the reasons given for a ‘boldness’ based sort among Boster’s non-binary sort subjects (Study I Results- Individuals’ sorting patterns). However, it is more likely that these categories occur because subjects don’t know what to do with purple samples and happen to place them in a pile with yellow samples where they stand out as non-contiguous, rather than blue or red categories where they would blend in.
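The notion of a category made of non-contiguous groups of samples can be illustrated with a minimal sketch. It assumes, purely for illustration, that the 25 palette samples are indexed 0 to 24 in order around the color circle; the function names and the toy categories are hypothetical and are not taken from the analysis used in this thesis.

```python
def count_circular_runs(members, palette_size):
    """Count maximal runs of adjacent positions occupied by `members` on a ring.

    Assumption: samples are indexed 0..palette_size-1 around the color circle.
    """
    occupied = set(members)
    if not occupied:
        return 0
    if len(occupied) == palette_size:
        return 1
    # A run starts wherever a member's preceding neighbor is not a member.
    return sum(1 for i in occupied if (i - 1) % palette_size not in occupied)


def is_contiguous(members, palette_size):
    """A category is contiguous if it forms exactly one run on the ring."""
    return count_circular_runs(members, palette_size) == 1


# Toy categories (hypothetical sample indices, not real data).
wraps_around = [23, 24, 0, 1]      # one run that crosses the 24 -> 0 boundary
wildcard_like = [3, 4, 5, 18, 19]  # two separate runs on the circle

print(is_contiguous(wraps_around, 25))
print(count_circular_runs(wildcard_like, 25))
```

Under these assumptions, a pile with more than one run is what the text calls a category containing non-contiguous groups of samples.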


Figure 32. Examples of Somali subjects’ “wildcard” sorts. Subject 4 used a “wildcard” pattern in the 2-sort; subject 21 used a “wildcard” pattern in the 3-sort. This pattern is further described in the text.


Control results

The results of the control experiment are shown in Figure 33, below. For comparison, the English (middle panel) and Somali (right panel) results are also shown. Their correspondence to K&McD is displayed in Figure 34. Notably, unlike the English and Somali cluster results, the control panel (left) shows no supernumerary sorting patterns for any of the sort levels. As expected, there are only two distinct patterns observed for the 2-sort, three patterns for the 3-sort, and so forth. Thus, the results of the control experiment strongly suggest that the supernumerary categories observed in the English and Somali data shown above are not due to individual differences in color perception, but are due to differences in the cognitive strategies adopted by the English and Somali subjects when sorting the test palette colors.

In a similar vein, the results also show that some of the unexpected sorting patterns observed in both the English and Somali cluster analysis, but not predicted by K&McD, are not due to our particular choice of stimuli. Of particular note is the “green-yellow” category observed in Row 4 of Figure 33’s middle and right panels. The results obtained from control subjects, shown in Row 4 of Figure 33’s left panel, correspond to the evolutionary sequence for the yellow category as proposed by K&McD. These results are completely at odds with the “green-yellow” category derived from the data in the main experiment.


Figure 33. Control group vs experimental group 25-color sort. Results of clustering of respective data sets. Left: ten cluster patterns obtained from control experiment. Middle panel: results from English data set from main experiment. Right panel: results from Somali-speaking groups in the main experiment. Note, Somali subject results are displayed in a different order than they were in Figure 29. Column headings correspond to 2-, 3-, 4-, 5-, and 6-sorts, respectively.


Figure 34. Color sorting pattern predicted by K&McD (1978). Correspondence of K&McD color category predictions and control group sorting representation of these divisions.


These comparisons between main and control experiment results are summarized in the phase plots shown in Figure 35 for English-speaking (left panel) and Somali (right panel) subjects. Each phase plot compares the results from the control experiment with the corresponding result obtained by cluster analysis, as described above. The arrows in the phase plots indicate the results of Pearson correlation analysis. Each English- or Somali-speaker consensus plot was compared to the control experiment data at the same level. Arrow orientation (rightward arrow = 0 degrees phase shift) indicates by how much the control pattern had to be rotated counterclockwise to produce the maximum correlation with the corresponding pattern obtained from the results of the main experiments. Arrow length indicates the correlation magnitude (0…1.0). Correlation magnitude was determined by calculating the Pearson product moment correlation coefficient. Green plots flag phase shifts that exceeded 30 degrees (two sample positions) clockwise or counterclockwise around the color circle used to represent the 25-color palette employed in this experiment. Red patterns indicate supernumerary patterns or patterns for which the Pearson correlation coefficient was less than 0.80.
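The phase-shift comparison described above can be sketched as follows. This is a minimal illustration rather than the analysis code used for Figure 35: it assumes that each pattern is a list of 25 numbers (one per palette sample, ordered around the color circle), and that increasing index corresponds to the counterclockwise direction. The function names and the toy pattern are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant pattern
    return sxy / math.sqrt(sxx * syy)


def rotate(pattern, k):
    """Circularly shift `pattern` so the value at index i moves to (i + k) % n."""
    n = len(pattern)
    k %= n
    return list(pattern[-k:]) + list(pattern[:-k])


def best_phase_shift(control, observed):
    """Return (phase in degrees, r) for the rotation of `control` that best matches `observed`.

    Phase is reported in the range (-180, 180]; positive values are
    counterclockwise under the indexing assumption stated above.
    """
    n = len(control)
    best_k, best_r = 0, -2.0
    for k in range(n):
        r = pearson(rotate(control, k), observed)
        if r > best_r:
            best_k, best_r = k, r
    degrees = best_k * 360.0 / n
    if degrees > 180.0:
        degrees -= 360.0
    return degrees, best_r


# Toy example: the observed pattern is the control pattern moved by two positions.
control = [1.0, 1.0, 1.0] + [0.0] * 22
observed = rotate(control, 2)
phase_deg, r = best_phase_shift(control, observed)
print(phase_deg, r)
```

With 25 samples, one sample position corresponds to 360/25 = 14.4 degrees, so a shift of two positions is 28.8 degrees, close to the 30-degree criterion mentioned in the text. A plot would then be flagged when the absolute phase exceeds that criterion or when r falls below 0.80.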


Figure 35. Phase plots comparing control and main experiment sort pattern. Left panel: English speakers. Right panel: Somali speakers. Sort-level indicated by column heading. Consult text for further details.


These phase plots do a good job of summarizing major trends found in the English and Somali 25-sample sort data. First, they illustrate just how frequently supernumerary sort categories occur in both English-speaking and Somali groups. Second, they also illustrate the presence of the “green-yellow” category (Row 4) in both groups that substitutes for the “yellow” pattern of color term evolution predicted by K&McD. Third, they show that, by the 6-sort, cluster patterns in both English- and Somali-speaking groups tend to conform more closely with K&McD predictions. Fourth, they show that patterns obtained from the English-speaking group tend to deviate less frequently from K&McD than do patterns from Somali speakers (14 vs. 25 phase plots color coded red or green). And finally, these plots illustrate the absence of a good representation of “red” in the Somali data. That category (Row 6) is entirely absent from the data, and although the Somali patterns are more “orange”-like, they nonetheless deviate from “orange” patterns observed in the English control experiment.

The control experiment also allowed me to investigate another issue concerning subjects’ sorting strategies. I noticed when collecting data for the main experiment that US and Somali subjects tended to sort the palette colors into approximately equal pile sizes, even though they were instructed not to do so unless absolutely necessary. An analysis of pile size from the English and Somali sorts verified this informal observation. The left and right panels of Figure 36 are pile-size histograms for the two subject groups, respectively; that is, plots of the number of times (ordinate) piles for a given n-sort (accumulated across all piles produced by all test subjects in the test group) contained a particular number of samples drawn from the color palette (abscissa). Thus, for example, the 2-sort histograms are based on pile sizes tabulated from a sample of 112 2-sort piles (56 subjects x 2 piles per subject); 3-sort histograms are based on 168 piles (56 x 3), etc. Separate plots in each panel are shown for each n-sort.
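The tabulation behind a pile-size histogram can be sketched in a few lines. The data layout (each subject's n-sort stored as a list of piles, each pile a list of sample indices), the function names, and the toy data below are assumptions made for illustration only; they are not the thesis data or analysis code.

```python
from collections import Counter


def pile_size_histogram(sorts):
    """Count how often each pile size occurs, pooled over all subjects' sorts.

    `sorts` is a list with one entry per subject; each entry is a list of
    piles, and each pile is a list of sample indices (assumed layout).
    """
    return Counter(len(pile) for sort in sorts for pile in sort)


def equal_pile_size(palette_size, n_piles):
    """Pile size expected if a subject made all piles the same size."""
    return palette_size / n_piles


# Toy 2-sorts of a hypothetical 6-sample palette by three subjects.
toy_two_sorts = [
    [[0, 1, 2], [3, 4, 5]],
    [[0, 1], [2, 3, 4, 5]],
    [[0, 4, 5], [1, 2, 3]],
]

histogram = pile_size_histogram(toy_two_sorts)
print(sorted(histogram.items()))
print(equal_pile_size(6, 2))
```

The total number of piles entering each histogram is the number of subjects times n, which is how the counts of 112 and 168 piles quoted above arise for the 2- and 3-sorts. For the 25-sample palette, the equal-pile-size prediction for an n-sort would be 25/n.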

Modal pile size closely corresponds with the equal-pile-size predictions (note the correspondences between vertical dashed lines and the histogram peaks in each panel) for both English- and Somali-speakers across all sort levels. Is this due to a common bias in cognitive strategy while sorting, or is it an accidental property of the choice of colors in the test palette? Figure 36c shows pile-size histograms obtained in the control experiment. For all but the 3-sort, control subjects’ pile sizes tend to peak around a value corresponding to the average, as they do in the results from the main experiment. However, note that pile-size frequency tends to be trimodal in the case of control subject 3-sorts. These results thus indicate some degree of subject bias toward making the piles about the same size (at least for the 3-sort), although the shoulder on the left-hand side of the English 3-sort histogram suggests that the effect may be modest in size for this subject group.


Figure 36. Pile size histograms. a. English-speaking subjects. b. Somali-speaking subjects. c. English-speaking control subjects. See text for details.


Discussion

The main goal of this thesis work was to test an interlocking pair of hypotheses that connect a color sorting paradigm devised by Boster (1986) and a model of color term evolution proposed by Kay & McDaniel (K&McD, 1978). Boster’s study was designed to test: first, that color sorting can be used to access universal cognitive mechanisms that have played a role in the evolution of color lexicons; and second, that these cognitive mechanisms have promoted an evolutionary sequence of color term acquisition that is predicted by K&McD’s model of color term evolution.

This thesis re-examined these two hypotheses in three different ways. In Study I, I looked beyond group trends in Boster’s data and focused on an analysis of individual differences in sorting among his 23 subjects. The results of my analysis of Boster’s data show that:

(1) Most of Boster’s subjects’ sorting patterns actually deviated substantially from those predicted by K&McD’s model. In particular, the warm/cool sort pattern expected at the 2-sort level was observed in only about 67% of subjects; other subjects adopted other strategies when dividing the palette colors into two piles, and these patterns deviated substantially from the K&McD model predictions.

In Study II, the first of two prospective studies conducted for my thesis, I introduced a larger palette of color samples that varied not only in hue but also significantly in lightness and saturation. This choice of test palette was designed to examine sorting behavior using an ecologically more valid set of colors; that is, a diverse array of colors that better represent the diversity of color in the natural environment than the palette used by Boster. K&McD’s theory of color term evolution is based on the assumption that hue is the most salient feature driving color term evolution. With the exception of black and white samples, Boster’s palette of test colors varied almost exclusively in hue. How might subjects’ sorting behavior be affected by adding variation in saturation and lightness to the test palette? Would subjects’ sorts still be dominated by the salience of hue, as predicted by K&McD? I found that:

(2) With multiple color dimensions upon which to base their n-sorts, subjects often used lightness/saturation rather than hue. In many cases, individual subjects switched from lightness/saturation- to hue-based sorting strategies as they proceeded through n-sorts, or vice-versa.

(3) There was essentially no sequential addition of color categories in the group data as the number of sorting bins increased. All the major color categories observed at the later sort levels were represented at the 2-sort level by inter-subject variability. Achromatic and grue categories decreased in strength as n-sorts proceeded, while other chromatic categories became more distinct.

In Study III, I examined cross-cultural variations in color sorting using a color palette which varied primarily in hue. This palette was chosen to more closely resemble the one used by Boster, but introduced more variation in hue than his and thus served as a more critical test of the link between color sorting and cognitive strategies in color term evolution. Crucially, if color sorting accesses universal cognitive strategies that underlie color term evolution, then English- and Somali-speaking subjects should sort colors in the same ways. In Study III, I found that:

(4) Unlike my 30-sample results, the 25-sample palette produced n-sort categories that did change substantially in composition as sort level increased from 2 to 6. The progression nonetheless was quite diverse across subjects within both English- and Somali-speaking language groups. In general, sort patterns deviated from those predicted by K&McD.

(5) Cross-cultural differences were discovered in the sorting patterns of US and Somali subjects. Some of these differences corresponded to differences in the languages of origin. For example, on cluster analysis, the Somali category best corresponding to ‘red’ contains orange hues as well, much like their term for ‘red’ does.

(6) Pile size analysis revealed that Somali and US subjects were slightly biased toward creating equally sized piles. This doesn’t invalidate the data, as equal-pile-sizes cannot alone explain both inter- and intra-subject variability in their n-sorts.

(7) Phase diagrams showed that there was no one-size-fits-all strategy for sorting; there were large deviations from the predicted sorting pattern and variation between subjects in all testing scenarios.

Within-language variability and K&McD

Study I results are clearly at odds with Boster’s claims. While major trends are evident in his results, there is nonetheless considerable variability in color sorting across individual subjects. Importantly, this variability is not random, but tends to fall into a few distinct groups, as described in “Study I Results- Prevalence of sorting patterns” (Figure 14). This suggests that Boster’s subjects developed a strategy early in their sorting that is based on the most salient features in the color palette. In Boster’s case, hue is the most salient feature, but that is only because of Boster’s choice of palette colors. However, it should be noted that at the 2-sort level, strategies that prioritize “boldness” (Munsell Chroma) or lightness of colors as a sorting dimension are also evident, even though Boster’s palette colors varied only slightly in these color dimensions.

As pointed out in the Introduction, the binary n-sort protocol adopted by Boster meant that subjects were always required to divide an existing pile into two at each sort level, meaning that the sorts subsequent to level n depended upon what the subject had done up to that level. Therefore, a good deal of the subject variability may have been due to Boster’s protocol. I discuss this further below.
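The dependence built into a binary protocol can be made concrete with a minimal sketch. It checks whether an (n+1)-sort could have been produced by dividing exactly one pile of an n-sort into two, which is the constraint a binary protocol imposes and which independent n-sorts need not satisfy. The representation (each sort as a list of non-empty piles that together partition the same set of sample indices) and the function names are illustrative assumptions.

```python
def is_refinement(finer, coarser):
    """True if every pile in `finer` lies entirely inside one pile of `coarser`."""
    coarse_sets = [set(pile) for pile in coarser]
    return all(any(set(pile) <= c for c in coarse_sets) for pile in finer)


def is_binary_split(finer, coarser):
    """True if `finer` results from dividing exactly one pile of `coarser` in two.

    Assumes both sorts are partitions of the same samples with no empty piles.
    """
    return len(finer) == len(coarser) + 1 and is_refinement(finer, coarser)


# Toy sorts of a hypothetical 8-sample palette.
two_sort = [[0, 1, 2, 3], [4, 5, 6, 7]]
three_sort_nested = [[0, 1], [2, 3], [4, 5, 6, 7]]  # splits the first pile
three_sort_free = [[0, 1, 4], [2, 3], [5, 6, 7]]    # regroups across piles

print(is_binary_split(three_sort_nested, two_sort))
print(is_binary_split(three_sort_free, two_sort))
```

Under a binary protocol every successive sort passes this check by construction, so an early choice constrains all later piles. When each n-sort starts from the full palette, a subject is free to regroup samples across earlier pile boundaries, as in the second toy example.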

Subject variability in Study II was vast due to applications of different cognitive strategies, as addressed below. However, even subjects who adopted a similar strategy and based sorts on hue throughout the procedure exhibited variability, and there were very few examples of adherence to the predictions of color term evolution suggested by K&McD’s model (Figure 22).

Results of Study III also revealed variability in individuals’ sorting strategies. As was true for Boster’s subjects, the variability was not completely random. As described in “Study III Results- Sorting pattern prevalence” (Figure 26), most subjects’ n-sort patterns were consistent with one of four to five strategies, one of which was driven by the salience of lightness, which by design varied minimally across the 25-sample color palette used in Study III.

Substantial deviations from K&McD were also evident in the results from Study III, for both US and Somali subjects, even though this study employed a color palette that, like Boster’s, varied principally in hue. The extent of this variation is clearly evident when experimental subject data are compared to those obtained in a control experiment (Figure 33). That is, one might want to argue that individual differences in sorting behavior are due to a combination of differences in both cognitive strategy and color perception. The instructions in the control experiment in Study III controlled for cognitive strategy by explicitly telling subjects how to sort the samples. Under the reasonable assumption that all subjects understood the instructions, individual differences in the sorts should be governed almost completely by differences in color perception. Indeed, very little variation was found in the resulting color sorting patterns.

It is not easy to make direct comparisons in inter-subject variability between the present thesis studies and Boster’s study because of the differences in sorting protocol. As mentioned previously, sorting patterns at the nth sort level in Boster’s study were dependent upon all earlier n-sort levels, while in the present study all n-sorts were independent of sorts at levels less than n. However, it seems unlikely that both inter-observer variability in sorting and deviations from predictions of K&McD in Boster’s data are solely a consequence of his sorting protocol.


Cognitive Strategy in n-sorting

Study II was designed to explicitly test the salience of hue in guiding sorting behavior. Berlin & Kay and their colleagues have long argued that this is the dominant cue guiding the evolution of color terms. If this is so, and if sorting provides a window into cognitive strategy that might have guided color term evolution, then we would expect subjects to ignore the substantial lightness/saturation cues in the 30-color palette. However, as shown in Figure 21, this is not the case. The majority of subjects adopted a lightness-based sorting strategy at some point in the sorting protocol. If the increased perceptual salience of certain hues guides the order of BCT evolution, one would expect overall emergence of sorting categories in Study II group data to resemble color naming order. However, the uniform emergence of chromatic sorting categories within the Study II data (Figure 20) did not suggest increased salience for any hue.

In Study III, when major variability in color dimensions was limited to hue, we see evidence of a hue-based color-categorical evolution (Figure 25) that was not present in Study II. However, in addition to the salience of hue, there appears to also be a bias toward equal-sized piles. This was seen most clearly in the 3-sorts. In the other cases, equal pile size is not an issue, and we still observed deviations from the K&McD predictions, suggesting that the equal-pile-size bias alone could not account for my data.

Cross-language variability and Universalism

A crucial claim of K&McD and others is the existence of a universal cognitive framework that guides color term evolution, and Boster argued that n-sorts were a way of exploring this strategy. If this were true, then we would expect subjects from different cultures to express the same cognitive strategy in their n-sorts. As shown in Figure 33 above, this is not the case. Substantial differences between English-speaking and Somali-speaking subjects’ color sorting were apparent beginning at the 2-sort level. Major differences were also apparent even at the 6-sort, as Somali subjects failed to create a ‘red’ category that matched the English sorting category. In K&McD and Boster’s logic, this suggests a different cognitive framework for color categorization between US and Somali subjects.

Cross-language and within-language diversity in color naming have been explored previously by L&B (2009) in their study of WCS motifs. Multiple naming strategies were often present at one time within a language, and these differences were also reflected in the diversity of motifs across different languages. This highlighted the importance of diversity as a component within an evolving language, much as diversity among conspecifics is a central tenet of biological evolutionary theory. The results of the present studies show that, to some extent, individual differences in sorting parallel those found in color naming, and that some of the variability observed among English-speaking subjects was also observed in the sorting patterns of Somali-speaking subjects. For example, a minority of US subjects performed the 2-sort in a way that was consistent with a majority Somali sorting pattern (pattern 2c). Moreover, recall that Somali subjects, unlike English subjects, do not reveal distinct orange and red sorting categories in their 6-sorts. These cross-cultural differences in sorting behavior parallel differences in color naming in the two languages. This suggests that, at least for sorts at higher sort levels, sorting behavior is influenced by the language a group of subjects speaks, and is not, therefore, an entirely non-lexical task.

To summarize the discussion to this point: the results of the present study suggest that English-speaker color sorting (from 2- to 6-pile sorts) reveals some common universal color naming categories – e.g., warm, cool, red, green, blue, grue – but also reveals some uncommon lexical categories – e.g., light and dark and yellow-green. The n-sort patterns depend critically upon the choice of color palette that subjects are asked to sort. Moreover, there are substantial individual differences in sorting strategies among subjects speaking the same language, regardless of color palette. Subjects may even change sorting strategy when progressing from a 2- to 6-sort. Also, contrary to Boster’s original conclusions, the results of the present study conform only very loosely to predictions based on K&McD’s highly constrained model of universal color term evolution. Instead, subjects’ n-sorts are diverse in ways that sometimes mirror within- and cross-cultural differences in color naming observed in the World Color Survey. However, n-sorts were observed in the present study that are not generally observed in the World Color Survey.

How can one make sense of these disparate results?

Hadza

I believe the key to understanding the present color sorting results can be found in the color naming behavior of the Hadza (Lindsey et al., 2015). Recall that the Hadza are nomadic Tanzanian hunter-gatherers possessing a very rudimentary color naming system: other than naming good examples of red, black and white color samples, individual patterns of Hadza color naming are extremely diverse and subjects often cannot name standard color samples. However, in aggregate – that is, when color naming patterns are pooled across an entire community of subjects – Hadzane is surprisingly English-like in structure.

The observation that Hadza color naming is extremely diverse across individuals, but surprisingly structured in aggregate, mirrors the results I observed in my studies of color sorting. I would like to suggest that in the case of the Hadza, color naming is guided to some degree by the universality of color perception. However, the diversity in color naming occurs because color communication within the Hadza community is extremely limited: they have little reason to communicate about color on a day-to-day basis. Thus individual Hadza have very little practice classifying colors in their environment and would have little prior experience naming colors they may never have seen before. Also, what classifications Hadza may have done in their daily lives would not have been as abstract as those involving the color naming of painted cardboard color samples.

I argue here that the same explanation may account for the patterns of color sorting observed in the present study, especially the diverse results found within as well as between US and Somali cultures. It is highly unlikely that any of my subjects had any prior experience in constrained color sorting. Therefore, given the novelty of the non-lexical task, subjects were likely to adopt different strategies in producing their n-sorts. Crucially, the n-sorts were not random, but were guided by perceived similarities in hue, lightness and/or saturation and to some degree by their prior knowledge of color categories named in their own language. In US and Somali color sorting, as in Hadza color naming, subjects were asked to create novel color categories, based in part on mental representations of color that are personal and not shared within (or across) the language community.

In order to better understand this explanation, imagine someone tasting fine wines for the first time who is asked to describe the wines they are drinking. They may be able to easily classify a wine superficially as red or white, and then they may comment on its relative sweetness or bitterness. However, identifying and appreciating traits familiar to the palate of the sommelier would likely be next to impossible, because the wine-tasting neophyte would have neither the experience in wine tasting, nor the lexical categories with which to classify and communicate their taste sensations. The neophyte would probably find it difficult to articulate subtleties in wine tastes, such as their herbaceous flavors, or fruitiness or fine distinctions in tannin profiles, even though they might be able to readily discriminate two wines as different from one another in taste. Without a previously acquired framework for these other features of the wine, picking out similarities and distinctions between wines in the formal way that a sommelier might is an exercise in futility. Grouping wines beyond ‘red’ and ‘white’ would cause uncertainty, and inexperienced wine tasters might end up grasping for categorization strategies that seem silly or incomplete, or simply incorrect to an experienced wine-drinker with a more refined palate.

Much like a new wine-drinker forced to act as a sommelier, Hadza were exposed to complex, unfamiliar stimuli and an unfamiliar cognitive task which they didn’t have an established mental framework to communicate about. Similarly, asking a layperson, regardless of their culture of origin, to perform a color sorting task for which they have little mental framework causes uncertainty and reliance upon color knowledge that is likely not to be a part of the shared experience of the language community and therefore to some degree idiosyncratic. In the absence of a “don’t know” option, my color sorting subjects faced uncertainty by picking a sorting strategy and applying it for a round of the sorting experiment. These strategies varied from the expected hue-based sorting, to lightness-based sorting, to simply creating equally-sized groups of contiguous colors. In the case of 30-color sort subjects, the uncertainty was sometimes visible from n-sort to n-sort as subjects switched between distinct strategies (e.g., lightness-based to hue-based) throughout the procedure. Somali subjects performed the sorting task in unexpected ways (“wildcard” sort and other non-contiguous groupings of colors) at a greater rate than English-speaking subjects. This may be explained in part by cross-cultural differences in the mental representation of color, but an additional explanation may be that Somalis have less “color experience” than American English speakers related to the classification of colors in their daily lives.


Bibliography

Allen, G. (1892). The colour-sense: its origin and development. An essay in comparative psychology. 2nd edition. London: Kegan Paul, Trench, Trübner.

Berlin, B., & Kay, P. (1969). Basic Color Terms: Their Universality and Evolution. Berkeley & Los Angeles: University of California Press.

Bimler, D. (2011). Universal trends and specific deviations. In Biggam, C. P., Hough, C. A., Kay, C. J., & Simmons, D. R., editors, New Directions in Colour Studies. Amsterdam, The Netherlands: John Benjamins Publishing Co.

Bornstein, M. H. (1973). Color vision and color naming: A psychophysiological hypothesis of cultural difference. Psychological Bulletin, 80, pp. 257–285.

Boster, J. (1986). Can individuals recapitulate the evolutionary development of color lexicons? Ethnology, 25(1), pp. 61–74.

Brown, A. M., Isse, A., & Lindsey, D. T. (2016). The color lexicon of the Somali language. Journal of Vision, 16(5), pp. 1–23.

Brown, R. (1976). Reference in memorial tribute to Eric Lenneberg. Cognition, 4(2), pp. 125–153.

Brown, R., & Lenneberg, E. (1954). A study in language and cognition. Journal of Abnormal and Social Psychology, 49, pp. 454–462.

Cook, R., Kay, P., & Regier, T. (2005). The World Color Survey Database: History and Use. In Cohen, H. and Lefebvre, C., editors, Handbook of Categorisation in the Cognitive Sciences. Elsevier.

Geiger, L. (1880). History and development of the Human Race. London: Trübner and Company.

Gladstone, W. E. (1858). Studies on Homer and the Homeric Age. Oxford: Oxford University Press.

Heider, E. R., & Olivier, D. C. (1972). The structure of the color space in naming and memory for two languages. Cognitive Psychology, 3(2), pp. 337–354.

Hurvich, L. M., & Jameson, D. (1957). An opponent-process theory of color vision. Psychological Review, 64(6, pt 1), pp. 384–404.

Kay, P., & McDaniel, C. (1978). The Linguistic Significance of the Meanings of Basic Color Terms. Language, 54(3), pp. 610–646.

Kay, P., Berlin, B., Maffi, L., & Merrifield, W. (1997). Color naming across languages. In Hardin, C. L. & Maffi, L., editors, Color categories in thought and language, pp. 21–56. Cambridge University Press, Cambridge, England.

Kay, P., Berlin, B., Maffi, L., Merrifield, W. R., & Cook, R. (2009). The World Color Survey. Stanford, CA: CSLI Publications.

Kay, P., & Maffi, L. (1999). Color appearance and emergence and evolution of basic color lexicons. American Anthropologist, 101, pp. 743–760.

Kay, P., & Regier, T. (2003). Resolving the question of color naming universals. Proceedings of the National Academy of Sciences, 100, pp. 9085–9089.

Krantz, D. H. (1975). Color measurement and color theory: II. Opponent-colors theory. Journal of Mathematical Psychology, 12(3), pp. 304–327.

Lantz, D., & Stefflre, V. (1964). Language and cognition revisited. The Journal of Abnormal and Social Psychology, 69(5), pp. 472–481.

Lindsey, D. T., & Brown, A. M. (2002). Color naming and the phototoxic effects of sunlight on the eye. Psychological Science, 13, pp. 506–512.

Lindsey, D. T., & Brown, A. M. (2006). Universality of color names. PNAS, 103(44), pp. 16608–16613.

Lindsey, D. T., & Brown, A. M. (2009). World Color Survey color naming reveals universal motifs and their within-language diversity. PNAS, 106(47), pp. 19785–19790.

Lindsey, D. T., Brown, A. M., Brainard, D. H., & Apicella, C. L. (2015). Hunter-gatherer color naming provides new insight into the evolution of color terms. Current Biology, 25, pp. 2441–2446.

Magnus, H. (1880). Untersuchungen über den Farbensinn der Naturvölker. Jena: Fraher.

Marlowe, F. (2010). The Hadza: Hunter-gatherers of Tanzania. University of California Press.

McDaniel, C. K. (1972). Hue perception and hue naming. Unpublished BA thesis, Harvard University, Cambridge, MA.

Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2, pp. 849–856.

Ray, V. F. (1952). Techniques and problems in the study of human color perception. Southwestern Journal of Anthropology, 8, pp. 251–259.

Regier, T., Kay, P., & Cook, R. S. (2005). Focal colors are universal after all. Proceedings of the National Academy of Sciences, 102, pp. 8386–8391.

Appendix A. Calibration Tables

RGB Primary    x        y
red            0.643    0.333
green          0.304    0.604
blue           0.152    0.059

Table 1. RGB primary color calibrations.
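
The primaries in Table 1 define the triangle of chromaticities that the display can reproduce. As a minimal sketch (not a procedure described in this thesis), the Python snippet below checks whether a target (x, y) chromaticity falls inside that triangle. The helper names `in_gamut` and `_cross` are hypothetical, and the check ignores luminance limits.

```python
# Chromaticities of the display primaries, copied from Table 1.
PRIMARIES = {
    "red": (0.643, 0.333),
    "green": (0.304, 0.604),
    "blue": (0.152, 0.059),
}


def _cross(a, b, p):
    """Z-component of the cross product (b - a) x (p - a)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def in_gamut(x, y, primaries=PRIMARIES):
    """Return True if (x, y) lies inside or on the primaries' triangle.

    Hypothetical helper: a point is inside when it falls on the same
    side of all three edges. Luminance limits are not considered.
    """
    r, g, b = primaries["red"], primaries["green"], primaries["blue"]
    p = (x, y)
    signs = [_cross(r, g, p), _cross(g, b, p), _cross(b, r, p)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


if __name__ == "__main__":
    print(in_gamut(0.3135, 0.334))  # chromaticity of the N6 sample
    print(in_gamut(0.70, 0.30))     # a point beyond the red primary
```

A same-side test was chosen here because it needs no matrix inversion; a barycentric-coordinate test would serve equally well.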

105

Approx Munsell    x          y          Y (cd/m^2)
5R 4/14           0.627      0.3385     29.47
5G 3/10           0.294      0.538      17.41
5BP 4/12          0.2145     0.18       35.92
10B 7/8           0.251      0.275      125.7
N9.5              0.312      0.3305     344.9
7.5BG 5/8         0.2295     0.335      85.25
7.5BG 7/6         0.236      0.345      191
5RP 4/10          0.457      0.2355     29.84
7.5R 8/6          0.4025     0.348      180.5
5YR 6/12          0.538      0.4165     127.9
5YR 8/6           0.4205     0.3825     188.3
5G 7/8            0.253      0.419      198
5Y 8/14           0.4495     0.4875     239.8
7.5GY 8/8         0.32825    0.4845     193.5
2.5B 3/6          0.219      0.2885     27.36
5G 5/10           0.2645     0.4495     39.46
7.5Y 9/6          0.4025     0.46005    291.2
5YR 9/2           0.369      0.3835     266.2
N6                0.3135     0.334      82.37
5GY 8/10          0.3735     0.483      155.4
N1.5              0.2895     0.289      1.446
7.5RP 7/8         0.372      0.29       128.4
10BP 7/6          0.2725     0.2435     135.6
2.5P 4/10         0.2625     0.1485     30.98
10Y 4/6           0.416      0.5075     60.53
10B 5/12          0.1955     0.215      53.87
2.5Y 5/8          0.4635     0.4685     49.79
7.5R 2/8          0.5775     0.3465     10.52
2.5Y 3/4          0.4635     0.4325     13.27
10B 3/10          0.1825     0.1625     17.23

Table 2. 30-color palette calibrations.
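
The calibration tables in this appendix give each color as a CIE chromaticity (x, y) together with a luminance Y. Readers who need tristimulus values can recover them with the standard relations X = xY/y and Z = (1 − x − y)Y/y. The Python sketch below applies that conversion; the function name `xyY_to_XYZ` is illustrative and is not part of the software used in this thesis.

```python
def xyY_to_XYZ(x, y, Y):
    """Convert CIE xyY to CIE XYZ tristimulus values.

    x, y : chromaticity coordinates
    Y    : luminance (here in cd/m^2, as in the calibration tables)
    """
    if y <= 0:
        raise ValueError("y must be positive to convert xyY to XYZ")
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z


if __name__ == "__main__":
    # The N6 sample from the 30-color palette.
    X, Y, Z = xyY_to_XYZ(0.3135, 0.334, 82.37)
    print(f"X={X:.2f}, Y={Y:.2f}, Z={Z:.2f}")
```

Because X, Y, and Z share the units of Y, the result stays in cd/m^2 and can be normalized afterwards if relative values are preferred.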

Approximate Munsell    x         y         Y (cd/m^2)
2.5R 5/14              0.5154    0.3108    31.2
5R 5/14                0.5527    0.3431    36.67
7.5R 6/12              0.5559    0.3618    41.62
10R 6/14               0.5419    0.381     47.78
2.5Y 6/16              0.5211    0.3989    54.84
5YR 7/14               0.5027    0.4208    63.72
5YR 8/8                0.4729    0.4458    79.38
10YR 8/16              0.4487    0.4748    96.55
5Y 8/12                0.4295    0.4949    110.3
7.5Y 8/12              0.4244    0.5183    100
2.5GY 8/12             0.3989    0.5158    82.95
7.5GY 7/12             0.3743    0.5116    71.75
10GY 7/10              0.3446    0.4998    61.85
5G 7/8                 0.2961    0.4602    55.04
5BG 7/8                0.2536    0.3861    47.21
10BG 6/8               0.2255    0.3019    41.48
5B 6/8                 0.2121    0.2557    37.14
7.5B 5/8               0.2032    0.2245    33.14
2.5BP 5/12             0.2089    0.2009    31.22
5BP 5/12               0.2239    0.2033    29.96
7.5BP 6/8              0.2473    0.2055    30.13
5P 6/8                 0.2829    0.2145    29.71
7.5P 6/10              0.3399    0.2377    30.35
10P 5/12               0.3909    0.2567    30.42
5RP 5/12               0.4479    0.2766    30.56

Table 3. 25-color palette calibrations.

Appendix B. Subject information sheets

Demographic data sheet

Age (if under age 89): ______ Check here if elderly (89 or older): ______
Gender: ______
Race: ______
Ethnic group: ______

Languages spoken
Native language: ______
Other languages spoken in the home: ______
Other languages spoken before age 12: ______

Occupation
If Somali: Occupation before immigration ______
All participants: Occupation now ______

Highest level of education ______

Countries of residence, with ages/dates
Country: ______ Ages: ______ Dates: ______
Country: ______ Ages: ______ Dates: ______

Hometowns
If U.S.:
Hometown #1 ______ ages _____ to _____
Hometown #2 ______ ages _____ to _____
Hometown #3 ______ ages _____ to _____

If Somali or Bhutanese Nepali, see maps.

Location of participant hometowns in Somalia: please mark on map, and indicate ages.
Hometown #1 ______ ages _____ to _____
Hometown #2 ______ ages _____ to _____
Hometown #3 ______ ages _____ to _____
Hometown #4 ______ ages _____ to _____

Location of participant hometowns in Bhutan, Nepal, or India: please mark on map, and indicate ages.
Hometown #1 ______ ages _____ to _____
Hometown #2 ______ ages _____ to _____
Hometown #3 ______ ages _____ to _____
Hometown #4 ______ ages _____ to _____
