ABSTRACT

The research reported in this dissertation encompassed the investigation of the sources of variation in visual judgment of color differences in textiles and the consequent disparities in color differences calculated from instrumental color measurement. In a paired comparison method, 46 color pairs, including color inconstant samples, were assessed four times by a panel of 59 observers under two daylight simulators, filtered tungsten and F7. Observer variation was assessed with performance factor analysis for observer accuracy and repeatability and with Kappa statistics for observer reliability. Accuracy and repeatability were low compared to previous studies, and assessment of inter- and intra-observer reliability showed poor agreement in visual judgments within and between observers. While observers were found to be the most significant source of variation, observer accuracy and repeatability were significantly higher for samples viewed under the filtered tungsten simulator than under F7, leading to the conclusion that the choice of simulator affects visual judgments.

Visual results were also used to test the effectiveness of four color difference formulae: CIELAB, CMC, CIE94, and CIEDE2000. Using performance factor analysis with confidence intervals formed by a bootstrap method, instead of the simple ranking used in other research, no statistical difference was found in the performance of any color difference formula under either simulator. However, the combination of CIEDE2000 and filtered tungsten consistently showed less variation. The use of constant and inconstant samples did not affect the results, although this may be due to the large color differences of the inconstant samples. Even though prediction of visual judgments by the color difference equations was low, the importance of instrumental color difference evaluation is supported by its stability, in contrast to the variability demonstrated by observers.


Dedicated to my grandmother, Maggie M. Kuhn


ACKNOWLEDGMENTS

I wish to thank my adviser, Kathryn A. Jakes, without whose support, encouragement, guidance, and attention to detail this dissertation would not have been completed. Thank you, Dr. Jakes, for always pushing me a little further.

I am enormously grateful to Rolf Kuehni, whose ideas and suggestions led this research in a new direction. His understanding of color and instrumental color evaluation is without equal. His consistent willingness to answer questions is greatly appreciated. I also wish to thank David Hinks, in conjunction with Rolf Kuehni, for supplying the color sample set used in this research.

I am indebted to George Weckenbrock of Divtech Equipment for the loan of the F7 simulator, the spectroradiometer, and the GretagMacbeth software. This research would not have been possible without this equipment. Datacolor International must also be thanked for their gift of software.

I wish to thank Louise Friend for her help in running experiments and I would like to thank the 59 observers who gave up hours of their time to participate.


I thank The Ohio State University Statistical Consulting Service, Cheryl Dingus, Jeni Squiric, and especially Dongmei Li for all their advice and hard work in preparing the statistical analyses.

I would also like to thank John Pryba of the Office of Technology and Enhanced Learning of the College of Human Ecology for his help with various computer issues.

This research was supported by the College of Human Ecology Dorothy D. Scott Fellowship, the Ohio State University Alumni Grants for Graduate Research and Scholarship, and the Department of Consumer Sciences Lucy R. Sibley Research Award.


VITA

December 23, 1975………………………Born – Madison, Ohio

August 1999……………………………….Dual B.S., Economics, Fashion Merchandising

1999-2004…………………………………Graduate Teaching Associate, The Ohio State University

October 2002-January 2003……………..Color Research Consultant, Limited, Inc.

2004-2005…………………………………Graduate Research Fellow, The Ohio State University

PUBLICATIONS

Research Publications

Cunningham, P., Mangine, H., & Reilly, A. (2005). and Fashion in the 1980s. In P.A. Cunningham & L. Welters (Eds.), Twentieth Century American Fashion (pp. 209-228). London: Berg.

Mangine, H.N., Jakes, K.A., & Noel, C.J. (2005). A preliminary comparison of CIE color differences to textile color acceptability using average observers. Color Research and Application, 30(4), 1-7.

FIELDS OF STUDY

Major Field: Textile science, color technology, human ecology

Minor Field: Statistics


TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters

1. Introduction
   1.1 Problem statement
   1.2 Justification

2. Review of literature
   2.1 Understanding color measurement
      2.1.1 Color theory
         2.1.1.1 Light
         2.1.1.2 Object
         2.1.1.3 Observer
      2.1.2 Color space
         2.1.2.1 Standard illuminants
         2.1.2.2 Standard observers
         2.1.2.3 Calculating tristimulus values
         2.1.2.4 …
         2.1.2.5 Ellipses
      2.1.3 Colorimetric spaces
         2.1.3.1 Problems with CIELAB
      2.1.4 Color measurement
         2.1.4.1 Visual measurement
         2.1.4.2 Instrumental measurement
      2.1.5 Color difference
         2.1.5.1 CIELAB
      2.1.6 Alternative color difference equations
         2.1.6.1 CMC
         2.1.6.2 CIE94
         2.1.6.3 CIEDE2000
      2.1.7 Non-uniformity in color difference studies
   2.2 Variability in daylight
   2.3 Variability of samples
   2.4 Variability of observers
   2.5 Research questions

3. Methodology
   3.1 Summary
   3.2 Light sources
   3.3 Color pairs
      3.3.1 Properties
      3.3.2 Mounting
   3.4 Color measurement
   3.5 Color difference calculations
   3.6 Psychophysical experiment
      3.6.1 Pilot test
      3.6.2 Observers
      3.6.3 Experimental assembly
      3.6.4 Procedure: Paired comparison
   3.7 Data analysis
      3.7.1 Visual results
      3.7.2 Instrumental results
      3.7.3 Performance factor/3 analysis
      3.7.4 Assessment of observer accuracy
      3.7.5 Assessment of observer repeatability
      3.7.6 Assessment of inter- and intra-observer reliability
      3.7.7 Assessment of observer drift
      3.7.8 Comparing simulators
      3.7.9 Performance of color difference formulae
      3.7.10 Assessment of quality of simulators

4. Results and discussion
   4.1 Observer performance
      4.1.1 Observer accuracy
      4.1.2 Observer repeatability
      4.1.3 Drift of observer results
      4.1.4 Inter- and intra-observer reliability
   4.2 D65 simulator performance
      4.2.1 Comparing simulators
      4.2.2 Performance of formulae
      4.2.3 Assessment of quality of simulators
   4.3 Implications of results

5. Conclusions
   5.1 Research Question 1
      5.1.1 Summary of results
   5.2 Research Question 2
      5.2.1 Summary of results
   5.3 Research Question 3
      5.3.1 Summary of results
   5.4 General conclusions
   5.5 Further research

References

Appendix A

LIST OF TABLES

Table 1.1 Color difference equation performance factors for each dataset

Table 2.1 Wavelengths of visible light corresponding to each hue

Table 2.2 MI rating system

Table 3.1 SPD of each simulator at 5 nm intervals and at 10 nm intervals

Table 3.2 Relative SPD for each simulator

Table 3.3 CIELAB ∆E's, average ∆E, and categorical rating of each simulator for color pairs used for MIvis ratings

Table 3.4 CMF's and normalized CMF's for the 10º observer from 380 nm to 700 nm at 10 nm intervals

Table 3.5 Relative SPD's of filtered tungsten and F7 multiplied by normalized CMF's and the resulting tristimulus values

Table 3.6 ∆E for CIELAB, CMC(2:1), CIE94(2:1:1), and CIEDE2000(2:1:1) for each of the 46 color pairs under each light source

Table 4.1 PF/3 results for observer accuracy and repeatability

Table 4.2 ANOVA of date of experiment, time, and simulator on observer accuracy

Table 4.3 Kappa coefficients for inter-observer reliability

Table 4.4 Kappa coefficients for intra-observer reliability

Table 4.5 PF/3's of visual results between simulators

Table 4.6 Color differences of standard neutral reference pair under each simulator

Table 4.7 Observer indications of color difference of standard neutral difference pair under filtered tungsten relative to a color difference of 1 under F7

Table 4.8 Performance of color difference equations (PF/3's of ∆V vs. ∆E(sim.))

Table 4.9 PF/3's for performance of simulators

LIST OF FIGURES

Figure 2.1 Diagram

Figure 3.1 Diagram of sample mounting

Figure 3.2 Diagram of experimental assembly

Figure 4.1 Time series plot of observer accuracy vs. experiment number for filtered tungsten

Figure 4.2 Time series plot of observer accuracy vs. experiment number for F7

Figure A.1 Instructions to observers

Figure A.2 Consent form

CHAPTER 1

INTRODUCTION

A significant problem facing the textile industry, particularly in the increasingly global marketplace, is the accurate evaluation of color and color difference. Not only is the industry faced with attempting to match dyed colors on different fabrics intended for the same garment; it must also reproducibly prepare colored fabrics that match the designer's specification and that appear equally similar under all conditions, as evaluated by a population of consumers with widely ranging color perception. In order to remove the subjectivity of human evaluation, instrumental measurement of color has been proposed and adopted by the industry.

The sensation of color is produced by the interaction of an observer, light, and a colored object. Changing any part of this triad changes the perceived color. In industry, it is important to ascertain the difference between colors in an effort to control color. Humans have difficulty making reliable judgments of quantitative color differences, so an instrumental method of measuring color difference was devised. In instrumental measurement, the light and observer have been standardized and mathematical equations devised that quantitatively measure color difference.

In instrumental color measurement, a significant problem facing the textile industry is the creation of a color difference equation that is universally accurate. That is, the equation must accurately and reliably indicate whether two colors match for the average observer under any given set of conditions. Such a goal is quite possibly unattainable (Wright, 1959), and thus far color difference equations have attained only approximately 65% accuracy in predicting color matches as reported by the average observer. In other words, the equations explain only 65% of the variation of average perceived small color differences, i.e., color differences less than a color difference value, or CIELAB ∆E*, of 6. The most recently released color difference equation, CIEDE2000, which is only valid for color differences of ∆E* less than 4, has as yet not shown an improvement over CMC(2:1) for textile samples. Unless a significant improvement is gained, i.e., the benefits of switching outweigh the costs, such an equation will not be readily used (Kuehni, 2003b).

The variation in accuracy of the color difference equations is also problematic. In a comparison of today's most used color difference equations to a number of widely used datasets, the accuracy of each equation varies, resulting in performance factors ranging from 19 to 46 (Table 1.1) (Kuehni, 2003b).
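The ∆E* thresholds above refer to the basic CIELAB color difference, the Euclidean distance between two colors in L*, a*, b* coordinates. A minimal sketch of that calculation (the sample values below are hypothetical, chosen only to illustrate a near-match pair):

```python
import math

def delta_e_cielab(lab1, lab2):
    """Basic CIELAB color difference: Euclidean distance in L*, a*, b*."""
    dL = lab1[0] - lab2[0]
    da = lab1[1] - lab2[1]
    db = lab1[2] - lab2[2]
    return math.sqrt(dL**2 + da**2 + db**2)

# Two hypothetical near-match textile samples (L*, a*, b*)
standard = (52.0, 10.0, -4.0)
batch = (50.5, 11.0, -3.0)
print(round(delta_e_cielab(standard, batch), 2))  # about 2.06
```

A pair with ∆E* well under the thresholds discussed above would typically be treated as a small, perceptually relevant difference.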


              Color difference formula
Dataset       CMC     CIE94   CIEDE2000
RIT-DuPont    28.1    19.4    19.0
Witt          46.3    41.6    38.4
BFD           39.7    42.7    34.1

Table 1.1: Color difference equation performance factors for each dataset.

What causes this variability? Not all causes are readily evident, but accuracy can change depending on observers; thus one possible cause of the variation is random differences in observer panels. In fact, for an individual observer, the accuracy of any of the equations in predicting the matches observed is probably closer to 50% and at most 65% (Kuehni, 2003b). In addition, there is little information on the reliability of observers.

A second possible cause for variability in the accuracy of a color difference formula is the effect of the conditions of the experiment. Accuracy may be high for a very specific set of conditions or mediocre for a broader set of conditions. Research reported in the literature is based on experimental conditions that vary considerably. Data that are used and combined in research, and that are considered reliable, are based on different experimental conditions (Kuehni, 2004b). Experimental conditions vary in terms of physical viewing parameters and methods of evaluation. However, as new data are collected, no consideration is given to the effect of different experimental conditions (Kuehni, 2003b), even though changing one aspect of the experimental conditions can lead to largely different results (Kuehni, 2004b). Perception of color difference can be affected by physical viewing parameters such as the separation between samples, background, size of samples, type of substrate, luminance, and size of the color difference (Guan & Luo, 1999b). Perception is also affected by different methods of evaluating color difference, such as absolute methods, paired comparison, and gray scale, and by biases of observers that may be based on culture or experience (Kuehni, 2003b).

With the plethora of methods used to gain visual results, we are left with inconsistent information as to which variables are important. If we are unsure how these variables will affect visual results, it is necessary to conduct new research with more constraints placed on experimental conditions. If variables such as light source, substrate, and method of evaluation are standardized, then variability of the experiments will stem only from observers (Kuehni, 2004b). However, to standardize experimental conditions, the conditions that produce valid and reliable results must first be determined. Thus, research investigating the accuracy of a given color difference formula under changes in experimental conditions is needed to find which parameter changes result in significant loss of accuracy. Once the changes in experimental conditions that cause no loss of accuracy are determined, the reliability of those conditions may be assessed. To validate these results, replication is necessary. Therefore, before any results from such research can be useful, they will need to be replicated within a study as well as by different researchers using comparable testing conditions (Kuehni, 2003b).

According to prominent color scientist Rolf Kuehni (2003b), to find which conditions are best for reliable results, such experimentation should begin with samples produced from different colorants in color constant and color inconstant formulations. These samples should be evaluated three to five times in a fixed position against a reference pair under different daylight sources by 40 or more observers (Kuehni, 2003b).
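Judgments against a reference pair of this kind can be summarized as the fraction of assessments in which the test pair was called more different than the reference pair. The following is only an illustrative sketch of that tally, not the dissertation's actual analysis (which uses PF/3 and Kappa statistics); the trial data are hypothetical:

```python
from collections import Counter

def visual_difference(judgments):
    """For one color pair assessed repeatedly against a reference pair,
    return the fraction of judgments calling the test pair's difference
    the larger one (simple paired-comparison summary)."""
    counts = Counter(judgments)
    return counts["test"] / len(judgments)

# Hypothetical record: which pair looked more different on each assessment
trial = ["test", "test", "ref", "test", "ref", "test", "test", "ref"]
print(visual_difference(trial))  # 5 of 8 judgments favored the test pair
```

Repeating such tallies across observers and sessions is what allows inter- and intra-observer agreement to be quantified.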

1.1 Problem Statement

The problem addressed in this research is the standardization of conditions for color matching, approached by assessing the effect of different conditions on the variability of results. Three specific objectives may be stated:

(1) To investigate the inter-observer and intra-observer variability of color difference judgments.

(2) To identify which daylight simulator, filtered tungsten or fluorescent, produces more reliable results and best agreement with the CIELAB, CMC, CIE94, and CIEDE2000 color difference equations, while controlling experimental conditions for a group of observers and a set of substrates.

(3) To assess the variability in results, and agreement with the CIELAB, CMC, CIE94, and CIEDE2000 color difference equations, caused by color constant and color inconstant samples, while controlling experimental conditions for a group of observers.

1.2 Justification

Color matching can be performed both visually and instrumentally. While instrumental measurement may not completely accurately simulate decisions made by human observers, it serves the purpose of an objective moderator in color matching decisions, which is needed due to the high subjectivity of observers (Kuehni, 2003b). Even as new color difference equations have been recommended, the calculated accuracy of any equation remains low. Companies may be unwilling to adopt new equations if little increase in accuracy is gained and, consequently, profitability is not increased. Therefore it is necessary to determine whether the accuracy of a formula can be improved.

This research contributes to the improvement of the evaluation of color difference. By controlling factors in the experimental design, the effects of light source, sample color constancy, and between- and within-observer variability were assessed. Results of this research could influence the textile industry in the choice of light boxes as well as in the use and choice of color difference equation.


CHAPTER 2

REVIEW OF LITERATURE

2.1 Understanding Color Measurement

2.1.1 Color Theory. Color can be defined as a sensation caused by the effect of light on the human eye and the result of such an effect on the observer. In essence, it exists only in the viewer's mind (Billmeyer & Saltzman, 1981), and thus color is a subjective perception (Aspland, 1993). For the color of any object to be perceived, three elements must be present: a source of light, an object, and an observer (Christment, 1998).

2.1.1.1 Light. Without light, there can be no sensation of color. Moreover, the color of objects may change with the incident light (Berger-Schunn, 1994). The scientific understanding of color began with Isaac Newton, who demonstrated with prisms that light is made of a spectrum of color (Hunter, 1975). Visible light is part of the electromagnetic spectrum, a continuous band of radiation. The radiation differs in the number of oscillations in a measured unit of length (Berger-Schunn, 1994). The electromagnetic spectrum consists of wavelengths anywhere from 10^-5 nm to a few miles (Aspland, 1993).

Visible light is the part of the electromagnetic spectrum located between ultraviolet and infrared radiation (Aspland, 1993). For textile purposes, visible light is the electromagnetic radiation extending in wavelength from 400 nm to 700 nm, and different sections of this spectrum approximately correspond to different hues, as shown in Table 2.1 (Berger-Schunn, 1994).

Hue       nm
Violet    400-430
Blue      430-480
Green     480-560
Yellow    560-590
Orange    590-620
Red       620-700

Table 2.1: Wavelengths of visible light corresponding to each hue (Berger-Schunn, 1994).
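The wavelength bands cited from Berger-Schunn can be expressed as a simple lookup. Note that the hue names other than violet are reconstructed from the conventional assignments for these ranges, so treat the labels as illustrative:

```python
# Hue bands per the cited Berger-Schunn ranges; names other than
# violet are the conventional assignments for these intervals.
HUE_BANDS = [
    (400, 430, "violet"),
    (430, 480, "blue"),
    (480, 560, "green"),
    (560, 590, "yellow"),
    (590, 620, "orange"),
    (620, 700, "red"),
]

def hue_name(wavelength_nm):
    """Return the approximate hue for a visible wavelength, or None
    outside the 400-700 nm range used for textile purposes."""
    for lo, hi, name in HUE_BANDS:
        if lo <= wavelength_nm <= hi:
            return name
    return None

print(hue_name(550))  # falls in the green band
```

Boundary wavelengths (e.g., 430 nm) match the first band listed, reflecting the approximate nature of the correspondence.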

Light sources emit radiation that differs at each wavelength (Berger-Schunn, 1994); thus each has radiant energy that can be distributed in many ways across the visible portion of the spectrum. When power is graphed as a function of wavelength, a spectral power distribution (SPD) for a light source is created. Daylight itself can have many different SPD's, depending on weather conditions. For example, the relative power of typical north sky daylight peaks around 450 nm and decreases as wavelength increases (Billmeyer & Saltzman, 1981). When dealing with textiles, a relatively small number of light distributions are of interest in comparison to all that are possible (Aspland, 1993). These include average daylight, fluorescent light, and incandescent light. For color matching specifically, light sources as well as illuminants have been standardized. A light source is an actual physical entity with a spectral power distribution that can be determined, whereas an illuminant is a set of numbers defining the SPD of a light standard that may or may not have a corresponding light source. Many illuminants are used for which modern technology has not been able to make a spectrally exact matching source (Billmeyer & Saltzman, 1981).
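Measured sources and tabulated illuminants are commonly compared via relative SPDs. A minimal sketch of the normalization, assuming the common CIE convention of scaling so the power at 560 nm equals 100 (the measurement values below are hypothetical):

```python
def relative_spd(spd, anchor_nm=560):
    """Scale a measured SPD so the power at the anchor wavelength is 100.
    CIE relative SPD tables are conventionally normalized at 560 nm."""
    scale = 100.0 / spd[anchor_nm]
    return {wl: p * scale for wl, p in spd.items()}

# Hypothetical coarse measurement (nm -> radiant power, arbitrary units)
measured = {400: 0.8, 480: 1.1, 560: 1.25, 640: 1.0, 700: 0.9}
rel = relative_spd(measured)
print(rel[560])  # 100.0 by construction
```

Because only the shape of the curve matters for comparing distributions, the absolute units of the measurement cancel out.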

2.1.1.2 Object. The second element necessary for perceiving color is the object. For apparel industry purposes, the object is a textile containing colorants. Colorants are dyes and pigments that absorb light. Different colorants absorb different wavelengths of light to different degrees. Wavelengths that are not absorbed by the colorant are reflected or scattered, and this remaining radiation produces the sensation of color (Berger-Schunn, 1994). The spectral ability to reflect light can be described by a reflectance curve, which shows, at each wavelength, the fraction of light that is reflected from the sample (Billmeyer & Saltzman, 1981). Recording the percentage of light reflected at each wavelength thus yields the reflectance curve. The reflectance curve is like a fingerprint for color, but in fact the relationship between reflectance and perceived color is very complex. Reflectance is also measured in comparison to a white standard, and the ratio of light reflected from the object to light reflected from the white standard is called the reflectance factor (Berger-Schunn, 1994). The lower the average overall reflectance, the darker the color, and the higher the average overall reflectance, the lighter the color (Aspland, 1993).
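The reflectance factor described above is a per-wavelength ratio against the white standard, and the average level of the resulting curve tracks lightness. A minimal sketch with hypothetical detector readings:

```python
def reflectance_factor(sample_counts, white_counts):
    """Per-wavelength ratio of light reflected by the sample to light
    reflected by a white standard (the reflectance factor)."""
    return {wl: sample_counts[wl] / white_counts[wl] for wl in sample_counts}

# Hypothetical detector readings at a few wavelengths (nm -> signal)
white = {450: 980.0, 550: 1000.0, 650: 990.0}
sample = {450: 196.0, 550: 450.0, 650: 693.0}

curve = reflectance_factor(sample, white)
avg = sum(curve.values()) / len(curve)
print(round(avg, 3))  # lower averages correspond to darker colors
```

A real instrument samples far more wavelengths (e.g., every 5 or 10 nm across 400-700 nm), but the ratio calculation is the same.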

2.1.1.3 Observer. The last and most important element necessary for perceiving color is the observer. In a human observer, the eye in conjunction with the brain perceives color. In some sense, the eye acts like a camera. The lens of the human eye forms an image on the retina. The retina is light sensitive and contains rod cells and cone cells. The rods allow the eye to see in dim light and support vision without color. The cones, centered in the fovea where rods are not present, are especially important to color vision. There are three types of cones. Each type responds to wavelengths of light in different ways; thus each has a different spectral response curve. The signals from the cones interact, and the result is sent to the brain to produce the sensation of color. However, each person has slightly different spectral response curves (Billmeyer & Saltzman, 1981). Also, the eye is not equally sensitive to light at all wavelengths. A luminosity curve can be graphed to show how the eye responds to spectral intensity. The eye sees more than light and dark luminosity; it can "see" colors, and the brain can arrange them in some sort of three-dimensional configuration, because the three types of cones receive light radiation in a spectrally different manner (Hunter, 1975).


The accepted theory, called opponent color theory, is that the three types of cone receptors and the rod receptors produce three signals consisting of three opponent pairs: white/black, red/green, and yellow/blue (Christment, 1998). E. Hering first proposed this theory, and a mathematical expression was formulated by H.L.F. von Helmholtz in 1896. The four basic hues are represented by what Hering called the unique hues: unique red is a red that is neither yellowish nor bluish, and comparable statements can be made for the other three unique hues. The three receptors in the cones make signals that are converted in the cortex into one or two of the four kinds of hue experience and a brightness experience. In other words, color is seen as having lightness or darkness, redness or greenness, and yellowness or blueness. According to the theory, one or two of the four chromatic experiences can be mixed together and, with various amounts of white and/or black, produce color experiences in different hues, different lightness, and different chromatic intensity, or chroma. From the hue experiences we see hues, and from the lightness experience we see light. However, chroma, or intensity, is also a dimension of color, but it does not have its own receptor. Instead it is seen from the relative magnitudes of the hue and lightness experiences (Kuehni, 1998b).
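The opponent recoding can be illustrated schematically: three cone-like signals are combined into an achromatic channel and two difference channels. This is only an illustration of the signal arithmetic, not a physiological model; the combination weights are arbitrary:

```python
def opponent_signals(l, m, s):
    """Schematic opponent recoding of three cone-like signals into an
    achromatic (white/black) channel plus red/green and yellow/blue
    differences.  Illustrative only -- weights are arbitrary, and this
    is not a physiological model."""
    achromatic = l + m             # lightness-like sum
    red_green = l - m              # > 0 reddish, < 0 greenish
    yellow_blue = (l + m) / 2 - s  # > 0 yellowish, < 0 bluish
    return achromatic, red_green, yellow_blue

print(opponent_signals(0.7, 0.5, 0.2))
```

The point of the sketch is structural: hue channels are differences of receptor signals, while lightness is a sum, matching the opponent-pair description above.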

Opponent color theory first assumes that humans have the same hue mechanism for both lightness and object colors. Second, opponent color theory assumes that hue is visually independent of both lightness and chroma, a concept which is often distorted, as will be discussed later in this paper. Lastly, a specific neurological path supporting this theory has so far not been proven, though some support does come from recordings of retinal cells (Kuehni, 1998b). It is an important experimental fact, though, that individuals reliably pick color samples as representing for them unique hues that vary widely in spectral properties (Kuehni, 2004a).

Although each person has a different spectral response curve, and different selections of samples to represent unique hues, 96% of the population is estimated to have normal color vision. The remaining 4% are considered as having impaired color vision (Berger-Schunn, 1994). Tests can be run to determine an observer's color vision, including the Farnsworth-Munsell 100 Hue Test (Vanderhoeven, 1992) and the Pickford Nicolson anomaloscope (McDonald, 1980). In total, a human observer can differentiate anywhere from 2.28 million (Pointer & Attridge, 1998) to 7 million (Aspland, 1993) to 10 million different shades of color (Judd & Wyszecki, 1975).

2.1.2 Color Space. Based on the assumption that a human observer can arrange colors into a three-dimensional configuration, a number of color spaces have been created in the past century. The color spaces presently used in industry are the CIE color spaces from the International Commission on Illumination. The first CIE space was created in 1931 on the basic premise that color is a stimulus perceived from the light, object, observer triad (Billmeyer & Saltzman, 1981) and on both tristimulus color theory and opponent color theory (Kuehni, 1999c). It was created specifically for measuring color stimuli and is not represented by a set of actual physical samples. The CIE has standardized the sources, observers, and measurement (Billmeyer & Saltzman, 1981).

2.1.2.1 Standard illuminants. The first CIE standard illuminants included A (incandescent light), B (simulated noon sunlight), and C (simulated overcast sky daylight), which served the purpose until the introduction of fluorescent whitening agents made necessary the use of the ultraviolet portion of the spectrum. In 1965, more illuminants were added based on new studies of natural daylight. Most important to textiles was the addition of D65, with a correlated color temperature of 6500 K, which most closely represents average daylight. A, B, and C all have corresponding sources, whereas D65 does not (Billmeyer & Saltzman, 1981).

2.1.2.2 Standard observers. The color vision of the CIE Standard Observer represents the average color vision of a human possessing normal color vision (Billmeyer & Saltzman, 1981). The first standard observer is the 2º Standard Observer. It was based on the theory that the cones, as receptors of color in the eye, are centered in the fovea. For samples to be focused on the fovea they must appear in a viewing field of less than 4º; the standard of 2º was chosen, which is equivalent to looking at a shirt button from a distance of 10 inches. When these experiments were started in 1920, the concept of additive mixing had already been introduced, whereby, by adding the three primary colored lights of red, blue, and green together, nearly all other colors can be created. The amount of each light required to match a color is related to the sensitivity of the receptors in the eye (Berger-Schunn, 1994). Thus, one observer at a time was shown a monochromatic spectral light on a screen. He or she then had to match that light by combining selected red, blue, and green lights at varying intensities. The amount of each light used was recorded as the values r for red, g for green, and b for blue for that wavelength. This was repeated for each wavelength of the visible spectrum, and the results were then averaged for all observers to obtain r, g, and b over the visible spectrum (Billmeyer & Saltzman, 1981).

The color sensation created by some wavelengths could not be matched with the red, green, and blue lights chosen, so matches were achieved by adding one of the three primary lights to the light to be matched. This is tantamount to having a negative value of r, g, or b in the match. Since it was important for the CIE to eliminate negative numbers, the r, g, and b were mathematically transformed to amounts of imaginary primaries. The primaries themselves are usually denoted with bold X, Y, and Z to distinguish them from the tristimulus values, while x̄, ȳ, and z̄ are the spectral functions that describe the primaries (Berger-Schunn, 1994). In relationship to opponent color theory, the color matching functions denote changes in human color perception: changes in x̄ relate mainly to changes in red/green, changes in z̄ relate to changes in yellow/blue, and changes in ȳ relate to changes in lightness (Kuehni, 1999c). When these values are graphed as a function of wavelength, the result is a spectral response curve; each primary has its own. These three curves are referred to as color matching functions (Aspland, 1993) and were derived first by Maxwell and by Helmholtz's assistant König in the 19th century. The values used for the 1931 CIE 2º observer were determined by Wright and Guild (Kuehni, 1998b). Since these experiments were time consuming, only 17 observers were used, and their results were averaged to attain the 2º Standard Observer (Berger-Schunn, 1994).
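The move from measured r, g, b to the imaginary primaries is a linear change of basis: each set of color matching values is multiplied by a fixed 3x3 matrix. A generic sketch of that operation; the matrix coefficients below are placeholders for illustration, not the actual CIE transformation values:

```python
def transform(rgb, matrix):
    """Apply a 3x3 linear transform to a triple of color matching
    values.  The CIE conversion from r, g, b to the imaginary
    primaries has this form; the matrix here is a placeholder."""
    return tuple(sum(row[i] * rgb[i] for i in range(3)) for row in matrix)

# Placeholder coefficients, chosen only so all outputs are non-negative
M = [[0.490, 0.310, 0.200],
     [0.177, 0.813, 0.011],
     [0.000, 0.010, 0.990]]

print(transform((0.2, 0.5, 0.3), M))
```

With a suitably chosen matrix, every visible stimulus maps to non-negative values, which is exactly why the CIE adopted imaginary primaries.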

However, comparing colors with only a 2º field of vision is impractical. The fovea has different sensitivity than the rest of the eye, so when the 2º Standard Observer was used with larger samples, which are more practical to look at, the visual match did not correlate well with the calculated color differences. Thus, the 10º Standard Observer was created, which covers the entire fovea plus some of the surrounding retina. It was recommended by the CIE in 1964 and today is the more commonly used of the two (Berger-Schunn, 1994).

The color vision of each person is different from this Standard

Observer; the Standard Observer is an average and should be used

as such. Researchers still question whether or not the Standard Observer should be revised, since each person differs from the mean, and a few


researchers have pursued this course of action, including North and Fairchild,

and Stiles, all of whom achieved results similar to the CIE recommendations (Color

Forum, 1993). More important, though, is the fact that the colors we perceive

for a given object do not only depend on the color matching functions but on

additional, as yet not fully understood, operations of the brain.

2.1.2.3 Calculating tristimulus values. Now that standard illuminants and observers have been defined, the only part of the triad that remains is the light reflected from the object. This is measured in the form of a reflectance curve (measuring this curve will be discussed later). From the triad combination, tristimulus coordinates are calculated so that each sample is assigned a position in CIE color space. In other words, the coordinates X,

Y, and Z describe color stimuli. To obtain these values, the amount of light that falls on the eye is first determined from the spectral power distribution

(S(λ)), hereafter SPD, of the predetermined illuminant; part of that light is

then reflected from the sample, and the reflectance curve (R(λ)) is attained. For

each wavelength the product of the spectral power distribution and the

reflectance curve is multiplied by the color matching function for that

coordinate. For example, if determining the value for X, the S(λ) is multiplied

by R(λ) and then multiplied by ¯x (λ). This is done for each wavelength then

summed to get the value X. This process is repeated for Y and Z with use of


the corresponding color matching function (Berger-Schunn, 1994; Color

Forum, 1993), as seen in the following equations (, 2004):

Tristimulus Values

X = ∑S(λ)R(λ)x̄(λ)

Y = ∑S(λ)R(λ)ȳ(λ)

Z = ∑S(λ)R(λ)z̄(λ)
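The summations above can be sketched in code. All numeric values below are contrived toy data, not real CIE tables, and the normalizing constant k (chosen so that a perfect white gives Y = 100) is standard published practice that goes beyond the bare summations shown above:

```python
# A sketch of the tristimulus summation. The SPD, reflectance curve,
# and color matching functions are assumed to be sampled at the same
# wavelengths. Toy four-wavelength data, not real CIE tables.

def tristimulus(spd, refl, cmf):
    # Sum S(lambda) * R(lambda) * cmf(lambda) over all wavelengths.
    return sum(s * r * c for s, r, c in zip(spd, refl, cmf))

S = [0.9, 1.0, 1.1, 1.0]      # illuminant SPD, S(lambda)
R = [0.2, 0.5, 0.7, 0.3]      # sample reflectance, R(lambda)
xbar = [0.1, 0.3, 0.8, 0.2]   # toy stand-ins for the three
ybar = [0.0, 0.5, 0.9, 0.1]   # color matching functions
zbar = [0.7, 0.4, 0.1, 0.0]

# Normalizing constant so a perfect white (R = 1 everywhere) gives Y = 100.
k = 100.0 / sum(s * y for s, y in zip(S, ybar))

X = k * tristimulus(S, R, xbar)
Y = k * tristimulus(S, R, ybar)
Z = k * tristimulus(S, R, zbar)
```

The same three lines of arithmetic are repeated with x̄, ȳ, and z̄, which is exactly the repetition the text describes for X, Y, and Z.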

The tristimulus value Y has special importance. It actually is a measure of the lightness in the sample (Berger-Schunn, 1994). If Y=100 then it is perfect white, and 100% of light is reflected at all wavelengths. This is the maximum value for Y for non-fluorescent samples (Billmeyer & Saltzman,

1981). Remember that these values can change depending upon which illuminant and observer are used (Berger-Schunn, 1994). Once these values are calculated, the X,Y,Z coordinates can be plotted in color space (Aspland,

1993).

Unlike the Munsell system, which is based on physical samples arranged in perceptually equal steps of lightness, hue, and chroma, the

CIE system is based neither on physical samples nor on equal steps of perception. Its intent is only to determine whether two colors match for the standard


observer. It does not tell what colors look like or how they differ (Billmeyer &

Saltzman, 1981). Colors match when tristimulus values are the same for two samples (Berger-Schunn, 1994), but only under the conditions for which that set of tristimulus values was determined (MacAdam, 1985). This type of color

matching is referred to as basic colorimetry (Color Forum, 1993).

A three dimensional space is difficult to visualize with X, Y, and Z. It is more practical to translate these coordinates into a two-dimensional color map. These new coordinates are called chromaticity coordinates, and they describe the hue and chromaticness of a color along with its lightness. In

CIE, these chromaticity coordinates are x,y,z (note that these do not have a bar over them as in the color matching functions) (MacAdam, 1985):

Chromaticity Coordinates

x = X / (X + Y + Z)

y = Y / (X + Y + Z)

z = Z / (X + Y + Z)

If x, y, and z are added together, the result will be 1.0; thus if two values are

present, the third can always be calculated (MacAdam, 1985). This is the

basic premise of the two-dimensional color map because only two dimensions


can be graphed into a plane. Usually only x and y are plotted and the result is called the CIE chromaticity diagram. The chromaticity coordinates of the spectral colors define the boundaries of this graph. The points making up the

outline of the diagram are called the spectrum locus, which is shaped like a

horseshoe (Figure 2.1).

Figure 2.1: Chromaticity Diagram.

The chromaticity diagram can be changed to a three dimensional

space by adding a Y (tristimulus value) axis rising up from the coordinates for

the standard illuminant which has been predetermined. However, as Y


increases, the horseshoe gets smaller. With this transformation, the appearance of a color still cannot be inferred (Billmeyer & Saltzman, 1981).

As MacAdam (1985) has shown, chromaticity coordinates can be used to explain more when comparing two samples. Consider two samples, one where X=15.50, Y=24.19, and Z=22.64, and the other where, X’=25.26,

Y’=39.42, and Z’=36.89. Since these samples do not have the same tristimulus values, they do not match. If they are converted to chromaticity coordinates, the first sample has Y=24.19, x=.2487, and y=.3881. The second sample has Y’=39.42, x’=.2487, and y’=.3881. We can see that the two samples have the same chromaticity, but differing luminance. This was not evident in the tristimulus values (MacAdam, 1985).
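MacAdam's example can be verified directly from the chromaticity definitions, dividing each tristimulus value by the sum of all three:

```python
# MacAdam's example worked numerically: chromaticity coordinates are
# each tristimulus value divided by X + Y + Z.

def chromaticity(X, Y, Z):
    total = X + Y + Z
    return X / total, Y / total, Z / total

x1, y1, z1 = chromaticity(15.50, 24.19, 22.64)
x2, y2, z2 = chromaticity(25.26, 39.42, 36.89)

# Both samples come out at (x, y) = (.2487, .3881) to four decimals:
# the same chromaticity, but different luminance Y.
```

The third coordinate z is redundant, since x + y + z = 1 by construction.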

2.1.2.4 Metamerism. In the case above, the differing tristimulus values show that the samples had different spectral reflectance curves. The reverse can also happen: two samples with different reflectance curves can yield identical tristimulus values, resulting in metameric objects (Berger-Schunn,

1994). This can be simplified by defining metamerism as samples that have different reflectance curves, but match under at least one illuminant with at least one observer. The color difference of metameric pairs at illuminant/observer conditions other than the conditions in which they match is


called the metamerism index, which requires a statement of conditions when

the pairs match and a statement of conditions for which the index was calculated (Berger-Schunn, 1994).
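The definition above can be illustrated with a toy calculation: two reflectance curves that differ at every wavelength yet integrate to the same tristimulus values under one illuminant, but not under another. All numbers here are contrived four-wavelength toy data, chosen by hand for illustration:

```python
# A toy numeric illustration of metamerism, using contrived data.

def tristimulus(spd, refl, cmf):
    return sum(s * r * c for s, r, c in zip(spd, refl, cmf))

xbar = [1.0, 0.0, 1.0, 0.0]   # toy color matching functions
ybar = [0.0, 1.0, 1.0, 0.0]
zbar = [0.0, 0.0, 1.0, 1.0]

R1 = [0.50, 0.50, 0.50, 0.50]  # two reflectance curves that differ
R2 = [0.75, 0.75, 0.25, 0.75]  # at every wavelength

S_match = [1.0, 1.0, 1.0, 1.0]  # illuminant under which the pair matches
S_other = [2.0, 1.0, 1.0, 1.0]  # a second illuminant

XYZ1 = [tristimulus(S_match, R1, c) for c in (xbar, ybar, zbar)]
XYZ2 = [tristimulus(S_match, R2, c) for c in (xbar, ybar, zbar)]
# XYZ1 == XYZ2: a metameric match under the first illuminant.

X1 = tristimulus(S_other, R1, xbar)
X2 = tristimulus(S_other, R2, xbar)
# X1 != X2: the match breaks under the second illuminant, which is the
# situation the metamerism index quantifies.
```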

2.1.2.5 Ellipses. Another irregularity in color space is that

numerous ellipses define the threshold of color difference (Hunter, 1975). With

any given x, y color center, an ellipse exists around it. Any point within the

contours of the ellipse visually appears the same color as the color center of

the ellipse (Melgosa, 2000). Thus, colors can sometimes appear to match

even if they do not have the same tristimulus values. They are not an exact

match technically speaking, but to the human eye, no difference is noticeable.

Thus, through experimentation, a threshold of color difference can be attained

(Hunter, 1975). However, color space is three dimensional, therefore ellipsoids exist which are simply three-dimensional ellipses (Alman, Berns,

Snyder, & Larsen, 1989).

Within a color space, color tolerances vary depending upon the location of the color. Ellipses summarize these inconsistent tolerances. The ellipses are not equal in size and the color tolerance changes with the direction of the color difference (Alman et al., 1989). For example, the human eye can see changes that plot as small distances in the violet region of the spectrum but it is relatively insensitive to changes that plot as much larger distances in the green region of the chromaticity diagram (Hunter, 1975). The shape and the size of ellipses are also affected by observing conditions. For


example, as the visual field size increases, the size of the ellipses decreases and

the perceiver can see smaller color changes (Wyszecki & Fielder, 1971).

2.1.3 Colorimetric spaces. Since CIE color space is not based on

equal visual perception nor does it have the ability to identify the appearance

of color or the amount of difference between two samples, transformations

have been made that convert the tristimulus values to other, more useful,

color spaces. The transformations have been called colorimetric spaces

because they allow for color difference measurement. Many attempts have been made and used, but today only a few are of particular importance

(Billmeyer & Saltzman, 1981).

Transformation can be either linear or non-linear. Non-linear transformations relate more to textiles than do linear ones. Non-linear transformations began by trying to place CIE color space in terms of opponent color theory. In essence, the eye codes color into three signals; light or dark, red or green, and blue or yellow (Billmeyer & Saltzman, 1981).

In 1942, Hunter used this theory to create the Hunter L, a, b color space. L is the lightness of the sample and goes from 0 (black) to 100 (white). Next, “a” is redness (+) or greenness (-) and “b” is yellowness (+) or blueness (-). The L, a, b coordinates can then be plotted in a more uniform color space

(Hunter, 1942).

Based on Hunter’s approach, Adams and Nickerson created what came to be known as ANLAB. The principal difference between Hunter’s


approach and ANLAB lies in the transformation applied to X, Y ,and Z to

obtain the L, a, b values. Hunter used square root functions while Adams and

Nickerson used cube root functions. In 1976, ANLAB was recommended

by the International Commission on Illumination (CIE) and became known as

CIELAB. CIELAB has two sets of coordinates. Since it also uses opponent

color theory, like Hunter’s, asterisks mark the coordinates. Instead of L, a, b,

CIELAB uses L*, a*, b* in the first set of equations (Billmeyer & Saltzman,

1981):

CIELAB

L* = 116(Y/Yo)^(1/3) − 16    for Y/Yo > 0.008856

L* = 903.3(Y/Yo)    for Y/Yo ≤ 0.008856

a* = 500[(X/Xo)^(1/3) − (Y/Yo)^(1/3)]

b* = 200[(Y/Yo)^(1/3) − (Z/Zo)^(1/3)]

(note: Xo, Yo, and Zo are the tristimulus values for the

illuminant)


In essence these formulae take the cube roots of the tristimulus

values, subtract them, and then weight them to obtain numbers representing the signals of the chromatic experiences. It was experimentally determined that differences in the cube root of luminance correspond approximately to differences in perceived lightness (Kuehni, 2005).
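The transformation can be sketched as a single function. The default white point used here (D65, 2º observer) is an illustrative assumption; Xo, Yo, Zo should be the tristimulus values of whatever illuminant is in use. Folding the low-ratio branch into one helper, and applying it to the X and Z ratios as well, is a common implementation convention rather than something stated in the text:

```python
# A sketch of the CIELAB transformation above.

def cielab(X, Y, Z, Xo=95.047, Yo=100.0, Zo=108.883):
    # Xo, Yo, Zo: tristimulus values of the illuminant (D65/2-degree
    # white point used as an illustrative default).
    def f(t):
        # Cube root above the 0.008856 threshold; the linear branch
        # below it reproduces L* = 903.3 * (Y/Yo).
        return t ** (1.0 / 3.0) if t > 0.008856 else (903.3 * t + 16.0) / 116.0
    fx, fy, fz = f(X / Xo), f(Y / Yo), f(Z / Zo)
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b
```

Feeding the white point itself through the function returns L* = 100 with a* = b* = 0, as expected for a perfect white.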

CIELAB also has equations that change the coordinates L*, a*, b* into

L*, C*, h (note that “h” has no asterisk). L* is lightness, C* is chroma, and h is the hue angle. Some people find it easier to visualize a color with this terminology

(Christment, 1998). The a* and b* plane can be described by C*, which is the distance from the origin where a* and b* equal zero. The “h” is an angular distance in degrees measured counterclockwise from the line that represents positive a* (Aspland, 1993). If the hue angle is close to 0º or 180º, a* corresponds to chroma and b* corresponds to hue. If the hue angle is close to 90º or 270º then b* corresponds to chroma and a* corresponds to hue. If the angles are not close to the major axes then a* and b* are of similar magnitude and neither is directly related to hue or chroma (Aspland

& Jarvis, 1986). The parameters of lightness, chroma, and hue are defined in the following equation (Christment, 1998):


CIELAB L*, C*, h

L* = 116(Y/Yo)^(1/3) − 16    for Y/Yo > 0.008856

L* = 903.3(Y/Yo)    for Y/Yo ≤ 0.008856

C* = (a*^2 + b*^2)^(1/2)

h = arctan(b*/a*), expressed as an angle in degrees
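The conversion from rectangular a*, b* to C* and h described above is a short calculation; `atan2` handles all four quadrants so the hue angle lands in the 0º–360º range:

```python
import math

def lab_to_lch(L, a, b):
    # C* is the distance from the a* = b* = 0 origin; h is the angle in
    # degrees measured counterclockwise from the positive a* axis.
    C = math.hypot(a, b)
    h = math.degrees(math.atan2(b, a)) % 360.0
    return L, C, h
```

For a hypothetical color with a* = 3 and b* = 4 this gives C* = 5 and h ≈ 53.1º, in the red-yellow quadrant.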

2.1.3.1 Problems with CIELAB. The main problem with CIELAB is that it is a much too simplistic model to describe the complexities of color vision. It is a Euclidean model used to describe non-Euclidean behavior of the visual system (Kuehni, 2003a). In the CIELAB colorimetric space, the use of

Fechner’s law creates a “bulking out” in the color matching functions (Kuehni,

1998b), and the cube root function negatively affects Munsell hue placement

leading to chroma errors on the yellow and blue axis of the color space

(Kuehni, 1999d). Another issue that arises is that hue differences are distorted around the hue circle (Kuehni, 1999a). The CIELAB colorimetric space is cylindrical and in this shape, as chroma increases the difference between two hues increases and consequently it appears that new hues are created. For example, imagine a cross section of the color space which would look like a circle. If two hues are chosen, at 90° and at 100° with a

Munsell chroma of 2, these colors would be represented by two points close

to the center of the circle and close to one another. However, when the


Munsell chroma is increased to 14, then the points are closer to the outer edge of the circle and much further apart from each other. When Euclidian geometry is applied as in the color difference equations to be discussed later, the distance between the hues at the higher chroma is much larger than the distance between the hues at the lower chroma and it appears that more hues should be present at the higher chroma (Kuehni, 1999d). In fact, more recent research indicates that new hues are created between two given constant hues as chroma increases, but the hue lines differ non-linearly such that fewer hues are created than expected in CIELAB (Kuehni, 2003a). Even in an opponent diagram where one thinks in terms of L*, a*, and b*, the spacing is not uniform with regard to Munsell, suggesting that the conversion of the tristimulus values is inaccurate (Kuehni, 1998b). Lastly, changing sizes

of ellipsoids may indicate the irregularity of CIELAB colorimetric space (Alman

et al., 1989). Recently it has been found that luminance level affects

discrimination: color discrimination of the average observer improves at

higher luminance, thereby reducing the size of the ellipses (Pridmore &

Melgosa, 2005). According to the CIE, though, CIELAB colorimetric space is

uniform (Christment, 1998).

2.1.4 Color Measurement. What has been discussed so far are the

necessary components for measuring color stimuli instrumentally. Numbers

are assigned to what we see as a color so that we can communicate what we

see to others. However, color is also measured visually. Regardless of


whether color is measured visually or instrumentally, two principles must be

followed: examination and assessment. The examination involves the triad of

light source, object, and observer. The assessment is a statement of whether

or not there is a difference between two samples. If there is a difference, it

must be stated in terms that others who are involved can understand, whether

it be qualitative or quantitative. The last step of assessment is to evaluate the

difference in color and decide whether or not it is an acceptable color difference. Standard conditions must be agreed upon before color measurement can begin (Billmeyer & Saltzman, 1981).

2.1.4.1 Visual measurement. Color measurement can be done

in two different ways, the first is visual assessment, which has been

standardized by ASTM. Visual measurement is the technique in which the observer views a standard and a sample at the same time under a standard light source. The human eye cannot estimate the size of a color

difference, but it can tell whether or not two samples match, which is

important for textile dyeing because the human eye is the organ that sees the

fabric. So far this procedure has been “unsurpassed” (Billmeyer & Saltzman,

1981).

When measuring samples visually, it is important to use two or three

standard light sources that are agreed upon and used by both the buyer and

the seller. As discussed previously, color can vary under different sources

and even under different observers with normal color vision (Billmeyer &


Saltzman, 1981). If a test is repeated using the same observer and the same conditions, the observer may have different responses. The observer appears to be affected by outside events such as time (Wyszecki & Fielder,

1971). Observers could be screened beforehand in order to get the least amount of variability, but this biases the results. The average of a panel of observers should be used instead (Billmeyer & Saltzman, 1981).

In visual observation, samples should always be placed side by side on the same plane. The nearer the samples are to each other, the more sensitive the eye is to small color differences. Also the samples should be switched side to side and examined again to avoid bias (Hunter, 1975).

Lastly, the samples must be appropriate (Billmeyer & Saltzman, 1981). In other words, it is necessary to have the same size area and pattern (Hunter,

1975). Visual results will vary least if a neutral gray background is used. In a study of parametric effects of small color differences, Guan and Luo reported that the separation between the samples has no effect on visual judgment of color difference (Guan & Luo, 1998b).

2.1.4.2 Instrumental measurement. Instrumental color

measurement began to be widely used in the early 1900s (Billmeyer &

Hammond, 1990). Today, it is increasingly important, especially in textiles

where visual measurement has been relied upon thus far (McDonald, 1990).

In visual measurement, the eye is sensitive to qualitative differences in color,

whereas instruments are sensitive to quantitative differences. Once we have


the instrument, which acts as both light source and detector, and the sample, the light reflected from the sample can be measured at each wavelength.

Most instruments used for textiles pass polychromatic light through a monochromator before it reaches the sample, so that monochromatic light is used to illuminate the sample. Monochromatic light refers to light containing one wavelength of the spectrum, and the instrument scans through the spectrum to measure the spectral reflectance curve. Other instruments illuminate the sample with whole light and pass the reflected light through the monochromator to produce the curve. The curve contains the information needed to calculate the color for any source and observer. This process is called spectrophotometry (Billmeyer & Saltzman, 1981).

Modern spectrophotometers are linked to computers, which store all the standard illuminant and observer information. The computer is used to compute the tristimulus values and chromaticity coordinates (Hunter, 1975).

The nature of the spectral power distribution of a source in a spectrophotometer is relatively unimportant for measuring reflectance and for calculating the tristimulus values as long as the samples are not fluorescent.

The reason is that the reflectance function is independent of the spectral power distribution (SPD) of the light source, again only for non-fluorescent samples. The only requirements are that the source is stable and that it produces enough power at each wavelength of the visible spectrum

(Billmeyer & Saltzman, 1981).


A spectrophotometer is a very precise instrument. However, an instrument can only be as accurate as the procedures and standards by which it was calibrated. Most spectrophotometers are standardized but differences in construction and programming can change results. It is best to measure the standard and the sample consecutively (Billmeyer & Saltzman,

1981).

2.1.5 Color difference. Although a human observer can judge two materials as matching or not, color perceived by each observer is subjective and there is variation in opinion. Color difference equations help reconcile the variations in opinions with objectivity (Billmeyer & Saltzman, 1981).

Companies want to rely more on objective instruments and less on subjective humans, thus they only visually inspect the color matches that have failed according to color difference equations (McDonald, 1988). Ideally, calculation of color difference would give numbers that have the same result as visual judgment, but this may be an unattainable goal (Wright, 1959).

Since 1931, many color difference equations have been published. In

1963, approximately 20 color difference formulae were in use (McDonald,

1990) but by 1976, only 13 color difference formulae remained in use

(McDonald, 1988). During these years, the use of many equations caused confusion since they frequently did not agree with each other. More importantly, each of the equations exhibited low performance (Billmeyer &

Saltzman, 1981). Performance of equations is based on how well they


correlate with visual judgments, which is assessed by comparing a large number of colors (McDonald, 1980).

2.1.5.1 CIELAB. In 1976, the CIE recommended both the use

of a colorimetric space, CIELAB, which they proclaimed to be uniform, and

equations for calculating color differences within this space. The

Pythagorean theorem is used to measure the distance between two points in

a Euclidean space (Christment, 1998):

CIELAB (L*, a*, b*)

∆Ea*b* = [(∆L*)^2 + (∆a*)^2 + (∆b*)^2]^(1/2)

The color difference can also be calculated in terms of lightness,

chroma, and hue. The L* and C* differences are already quantifiable distances, but the metric hue

angle (h) is not, thus the change in H* must be calculated first (Billmeyer &

Saltzman, 1981). In color difference equations, this form is simply called

CIELCH (Aspland, 1993; Berger-Schunn, 1994):

CIELCH

∆Ea*b* = [(∆H*)^2 + (∆L*)^2 + (∆C*)^2]^(1/2)

∆H* = C*·∆h·(π/180°)
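Both forms of the 1976 difference can be sketched directly. For small differences the two give similar totals; the values used below are hypothetical, purely to show the mechanics:

```python
import math

def delta_e_lab(dL, da, db):
    # CIELAB form: Euclidean distance in L*, a*, b*.
    return math.sqrt(dL**2 + da**2 + db**2)

def delta_e_lch(dL, dC, C, dh_deg):
    # CIELCH form: the hue-angle difference (degrees) is converted into
    # a distance-like term by the C* * (pi/180) factor.
    dH = C * dh_deg * math.pi / 180.0
    return math.sqrt(dL**2 + dC**2 + dH**2)
```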


This definition of ∆H* is recommended for use with small color differences. However, the concept of small color differences is not clearly defined, and neither is C*, although most agree that C* here is the geometric mean of the standard’s and the sample’s chromas, not an arithmetic mean.

Also, this definition of hue is dependent on chroma whereas hue is believed

to be independent of chroma. Due to the confusion inherent in this equation

an alternate definition of ∆H* has been used (Huntsman, 1989):

∆H* = [(∆Ea*b*)^2 − (∆L*)^2 − (∆C*)^2]^(1/2)

However, this equation depends on the prior calculation of color difference in CIELAB colorimetric space using a* and b* and the resulting ∆H* has components of both hue and chroma. Including both hue and chroma in the definition of ∆H* exaggerates small hue differences at high chromas.

Through research, Huntsman has suggested the following equation as a more

precise definition of ∆H* which is independent of chroma (Huntsman, 1989):

∆H* = ∆h·(π/180°)

Meanwhile, another researcher has suggested an alternative definition

of H* (Seve, 1991):


∆H* = 2(C*1·C*2)^(1/2)·sin(∆h/2)

If it were clearer how to calculate ∆H*, then CIELCH would be more

commercially useful (Seve, 1991).
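The competing definitions of ∆H* discussed above can be compared side by side. In this sketch ∆h is a hue-angle difference in degrees and C*1, C*2 are the chromas of the two samples; the example numbers are hypothetical:

```python
import math

def dH_residual(dE_ab, dL, dC):
    # ∆H* backed out of the CIELAB difference (the criticized form).
    return math.sqrt(dE_ab**2 - dL**2 - dC**2)

def dH_angle_only(dh_deg):
    # Huntsman's chroma-independent proposal.
    return dh_deg * math.pi / 180.0

def dH_seve(C1, C2, dh_deg):
    # Seve's form, using the geometric mean of the two chromas.
    return 2.0 * math.sqrt(C1 * C2) * math.sin(math.radians(dh_deg) / 2.0)
```

For small ∆h, Sève's form is close to C*·∆h·(π/180°) evaluated at the geometric-mean chroma, while the angle-only form deliberately drops the chroma factor.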

For CIELAB and CIELCH color difference equations, a ∆E* around 1 is

considered to be an acceptable color match (Billmeyer & Saltzman, 1981).

One must remember ellipses, though. Many different numerical tolerances apply in different regions of color space (McDonald, 1980, McDonald, 1988).

For a given visual difference the ∆Ea*b* between colors with high chroma can

be seven times larger than for neutral grays (McDonald, 1988). A small

difference is important for dark grays, but a larger difference is

relatively unimportant for highly saturated colors (Aspland, 1993).

In 1993, most textile businesses used either CIELAB or CIELCH color difference equations (Aspland, 1993). However, these color difference formulae do not correlate well with visual judgments when tested with samples having small color differences similar to the type that are seen in textile color matching (Kuehni, 1999d). For example, when using CIELCH,

McDonald has found that, in visual judgments of two sets of samples with the same numerical difference between the standard and the sample, the color differences were not judged to be equal visually (McDonald, 1990).

Poor correlation may stem from the possibilities that Munsell is not visually uniform, color matching functions are incorrect, the opponent theory is


incorrect, and the use of cube roots is incorrect (Kuehni, 1999d). In actuality, the CIELAB formulae do not represent the Munsell system well (Kuehni,

1998a; Kuehni, 1999b; Kuehni, 1999d).

Upon recommendation of the CIELAB color difference formulae in

1976, the CIE acknowledged that the formulae were imperfect in representing visual judgments (Color Forum, 1991). The CIE also indicated that weighting factors were necessary for color difference calculations dependent upon the application (Melgosa, 2000). However, the main goal of the CIE at that time was to reduce confusion in the area of color matching since so many other formulae were being used (Color Forum, 1991). This subject is further discussed in Kuehni’s 1990 study (Kuehni, 1990).

2.1.6 Alternative color difference equations. As research in color

measurement has continued, other color difference equations have been

proposed, including CMC, BFD, and CIE94 that make use of such weighting

factors. These equations were specifically engineered for color measurement

of small color differences. In other words, they adjust the local inaccuracies

of the CIELAB system (Kuehni, 1998b; Kuehni, 1999d). These equations

have a common structure (Melgosa, 2000):

∆E* = [(∆L*/(kL·WL))^2 + (∆C*/(kC·WC))^2 + (∆H*/(kH·WH))^2]^(1/2)


Thus ∆L*, ∆C*, ∆H* are computed and weighted by two types of

variables, W and k. W represents weighting functions whose intent is to improve CIELAB’s visual uniformity by matching the ellipsoid shape of the unit

difference contour. The k on the other hand represents parametric factors

that offset the influence of conditions on perceived color differences

(Melgosa, 2000).
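The common structure described above can be written as one generic function; specific formulae then differ only in how the weighting functions W and parametric factors k are chosen:

```python
import math

def weighted_delta_e(dL, dC, dH, WL, WC, WH, kL=1.0, kC=1.0, kH=1.0):
    # Each component difference is divided by its weighting function W
    # and parametric factor k before the Euclidean combination.
    return math.sqrt((dL / (kL * WL)) ** 2
                     + (dC / (kC * WC)) ** 2
                     + (dH / (kH * WH)) ** 2)
```

With all W and k equal to 1 this collapses to the plain CIELAB-style Euclidean distance; CMC and CIE94 are obtained by substituting their particular weighting functions and factors.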

2.1.6.1 CMC. It was found that CIELAB calculations

underweight the importance of lightness (Morely, Munn, & Billmeyer, 1975)

and have poor correlation with visual measurement because of the non-

uniformity of color space (CMC, 1998). Roderick McDonald had found

that CIELAB tolerances were over-predicted.

Consequently, he added a hue-angle dependent correction factor to the CIELAB

color difference equation to create JPC79. Later, JPC79 was refined and

named CMC after the British Color Measurement Committee (Qiao, Berns,

Reniff, & Montag, 1998). CMC adds a simple modification to CIELAB. It

adjusts for the incorrect calculation of hue differences in CIELAB, the CIELAB

hue scale, and adaptation effects due to the lightness and chroma of the

surround (Kuehni, 1999c).

Specifically, CMC was developed because a hue difference correction

function was needed (Kuehni, 1999a; Qiao et al., 1998) for small color

difference calculation (Kuehni, 1999c). This equation adjusts the hue scale

analytically. In essence, a hue scale adjustment factor (HSAF) is derived for


the hue angle of the standard and then it is applied to the CIELAB color difference equation which is then divided by the appropriate factor. The

HSAFs are optimized around the hue circle as determined through different sets of small color difference data. These adjustment factors have uniform application regardless of the chroma of the samples (Kuehni, 1999b).

Chroma and lightness scaling were also in need of adjustment as compared with Munsell data (Kuehni, 1999c). CMC allows the use of single number tolerances (McDonald, 1988) that can be used for all changes in color regardless of area of color space, whereas individual tolerances need to be set for each color in CIELAB (Vanderhoeven, 1992).

CMC also allows for adjustment of tolerance ellipsoids and recognizes that ellipsoids around standards are different in size depending on where a standard is in color space. These ellipsoids are scaled so that L*, C*, and h are the same size as they are visually perceived (McDonald, 1990). CMC color difference is defined by the following equation (Berger-Schunn, 1994):


CMC

∆ECMC = [(∆L*/(l·SL))^2 + (∆C*/(c·SC))^2 + (∆H*/SH)^2]^(1/2)

where SL is a function of L*,

SC is a function of C*,

SH is a function of h and C*,

and l and c are correction factors
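The dissertation describes SL, SC, and SH only qualitatively; the sketch below fills them in with the published CMC(l:c) weighting functions so the structure can actually be run. Those constants come from the CMC specification, not from this text, so treat this as an illustrative implementation under that assumption:

```python
import math

def cmc_weights(L, C, h_deg):
    # Published CMC(l:c) weighting functions, evaluated at the
    # standard's L*, C*, and hue angle h (in degrees).
    SL = 0.511 if L < 16.0 else 0.040975 * L / (1.0 + 0.01765 * L)
    SC = 0.0638 * C / (1.0 + 0.0131 * C) + 0.638
    F = math.sqrt(C ** 4 / (C ** 4 + 1900.0))
    if 164.0 <= h_deg <= 345.0:
        T = 0.56 + abs(0.2 * math.cos(math.radians(h_deg + 168.0)))
    else:
        T = 0.36 + abs(0.4 * math.cos(math.radians(h_deg + 35.0)))
    SH = SC * (F * T + 1.0 - F)
    return SL, SC, SH

def delta_e_cmc(dL, dC, dH, L, C, h_deg, l=2.0, c=1.0):
    # l = 2, c = 1 gives CMC(2:1), the textile convention noted above.
    SL, SC, SH = cmc_weights(L, C, h_deg)
    return math.sqrt((dL / (l * SL)) ** 2
                     + (dC / (c * SC)) ** 2
                     + (dH / SH) ** 2)
```

Note how SH is built from SC, making the hue tolerance chroma dependent, and how T carries the hue-angle dependence the text describes.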

The use of the ellipsoidal semi-axes (lSL, cSC and SH) makes CMC

appropriate for many colors (CMC, 1998). They define the size of the

tolerance ellipsoid around the standard (Heggie, Wardman & Luo, 1996). SC adjusts chroma (Kuehni, 1999c). The use of SC is necessary because the chroma differences of colors with high chroma are less distinguishable than those with lower chroma (Kuehni, 1999d). SH is the hue difference

adjustment factor. This factor is both hue angle dependent and chroma

dependent. The chroma dependency adjusts hue differences so that they

coordinate with the chroma differences that have been adjusted by SC. The hue angle dependent portion of SH changes hue differences so that they

display better agreement with the Coats acceptability data. SL adjusts

lightness scaling in order to make ellipsoids spherical and the same in size

throughout color space (Kuehni, 1999c). Lightness requires scaling because

equally sized lightness differences are not equally noticed, so the CMC


equation is adjusted in recognition of the fact that when lightness is higher, differences are less noticeable (Kuehni, 1999b). In practice, SL does not work

as one would expect by only decreasing the importance of lightness in

samples high in lightness. Instead it just reduces the importance of lightness

altogether (Kuehni, 1999d).

The l and c are correction factors that can be chosen so that the

calculated numbers correspond to optimal values for particular color matching

conditions and samples. In textile samples, the impact of lightness differences is less

than in flat samples; thus, an l of 2 is used for textile color difference

measurement, i.e., CMC(2:1).

In 1989, the CMC equation was included in the AATCC test manual

(Aspland, 1993). Until 1994 (Heggie et al., 1996), CMC was considered the

best equation to use (Berger-Schunn, 1994; Vanderhoeven, 1992; McDonald,

1988). More precisely, CMC(2:1) was shown to have the best correlation with

visual judgments (Berger-Schunn, 1994). Some researchers have even concluded that the CMC equation is more reliable than the human eye in perceptibility of color difference (McDonald, 1988). In 1989, one in five companies surveyed said that they used a formula with better agreement than

CIELAB, most of whom used CMC (Melgosa, 2000). Today, CMC is an

International Organization for Standardization (ISO) standard and is the most widely

used formula in the textile sector.


CMC has been shown to have problems, though. For example, when measuring samples with large color differences, the weighting factors are drastically different depending upon which sample is chosen as the standard and consequently, it is not supposed to be used for large differences (Guan &

Luo, 1999a). Also, the CMC equation is considered practical for use with many colors. When only a limited range of colors, such as those used for

Army uniforms, are being considered, the CMC equation is less useful and a color specific formula should be used. If the color difference between samples is due to changes in hue and chroma, and if the CMC equation passes the match, an observer also will. If the color difference is in lightness, an observer cannot tell the differences as easily, so even if the CMC equation indicates “no match”, an observer will pass it. One study showed that CMC disagrees with visual assessment in 23.9% of observations (Vanderhoeven,

1992). Another study has found that the coefficient of variation between average visual judgments and calculated differences is approximately 50%

(Kuehni, 1999d). However, CMC does perform better than CIELAB which was its goal (Alman et al., 1989). As with other color difference calculations,

CMC is based on a particular set of observational data. However, the data on which it was based has gaps as well as unexplained irregularities (Kuehni,

1990). Also, Kuehni has found that CMC only adjusts for the incorrect calculation of hue differences by CIELAB and does not correct the other problems associated with CIELAB colorimetric space (Kuehni, 1999c).


2.1.6.2 CIE94. After the 1976 CIE recommendations, the CIE

Technical Committee 1-29 was established to improve CIELAB for better

prediction of perceived color differences in conditions common to those used

in industry (Melgosa, 2000; Color Forum, 1991). The committee’s work led

to a development in CIE based color difference. The CIE94 formula was

recommended by the CIE in 1994 along with reference conditions for which it should be performed (Christment, 1998). CIE94 is similar to CMC but it has a

constant value of unity for SL. It addresses the fact that a given chroma

difference is less visible at high chroma than at lower chroma (Kuehni,

1999b). It also uses simple linear functions of chroma in order to obtain

values for SC and SH (Heggie et al., 1996). The following equation is the

CIE94 color difference equation (Christment, 1998):

∆E*94 = [(∆L*/(kL SL))² + (∆C*/(kC SC))² + (∆H*/(kH SH))²]^(1/2)

where SL = 1,

SC = 1 + 0.045C*,

and SH = 1 + 0.015C*

In the textile industry it has been proposed that kL=2, kC=1 and kH=1

(Heggie et al., 1996). However, others have proposed that kL should be 1.5

(Guan & Luo, 1998b).
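The formula above can be sketched in Python. This is an illustrative implementation, not a reference one; it uses the standard's chroma C* in the SC and SH weights, as described, and leaves the parametric factors kL, kC, and kH as arguments so the textile values (e.g., kL = 2) can be supplied.

```python
import math

def delta_e_94(lab_standard, lab_sample, kL=1.0, kC=1.0, kH=1.0):
    """CIE94 color difference between a standard and a sample in CIELAB."""
    L1, a1, b1 = lab_standard
    L2, a2, b2 = lab_sample
    C1 = math.hypot(a1, b1)          # chroma of the standard
    C2 = math.hypot(a2, b2)          # chroma of the sample
    dL = L1 - L2
    dC = C1 - C2
    # dH*^2 is recovered from the CIELAB components: da^2 + db^2 - dC^2
    da, db = a1 - a2, b1 - b2
    dH2 = max(da * da + db * db - dC * dC, 0.0)
    # Weighting functions, computed from the standard's chroma C1
    SL = 1.0
    SC = 1.0 + 0.045 * C1
    SH = 1.0 + 0.015 * C1
    return math.sqrt((dL / (kL * SL)) ** 2
                     + (dC / (kC * SC)) ** 2
                     + dH2 / (kH * SH) ** 2)
```

For a neutral pair differing only in lightness the weights all reduce to one, so the result equals the CIELAB lightness difference.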


The biggest differences between CMC and CIE94 are the treatment of lightness and the hue weighting factor. Unlike CMC, CIE94 does not weight the importance of lightness, since SL = 1, whereas in CMC, SL is an increasing function of lightness (Melgosa, 2000). The SH of CIE94 has similarities to the SH of CMC. In CMC this adjustment factor for hue is dependent upon hue angle as well as chroma. In CIE94, the adjustment factor is dependent upon chroma and is an increasing function, like CMC's. In CIE94, though, SH is not

dependent on hue (Melgosa, 2000; Kuehni, 1999c), because data based on small color differences used to create CIE94 did not show a need for the hue difference correction (Kuehni, 1999a).

An issue with the weighting functions is that, in the CMC equation, their values differ depending upon which color is designated as the standard and which as the sample. Therefore, Guan and Luo recommend that the geometric mean of the C* of the standard and the C* of the sample be used when calculating SC. Similarly, the CIE suggests that the geometric mean of the chromas be used when calculating SC for CIE94 (Guan & Luo, 1999a).
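The symmetric weighting recommended above can be sketched as follows; the function name is hypothetical.

```python
import math

def sc_symmetric(c_standard, c_sample):
    """SC computed from the geometric mean of the two chromas, so the
    result does not depend on which color is called the standard."""
    return 1 + 0.045 * math.sqrt(c_standard * c_sample)
```

Swapping the two arguments leaves the weight unchanged, which is the point of the recommendation.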

Research in 1996 found that CIE94 has better performance than any

other published equation (Heggie et al., 1996). For large color differences,

CIE94 has also been shown to perform the best (Guan & Luo, 1999a). For example, sample pairs that differ only in chroma and have a constant color difference of one CIE94 unit vary from 1.1 to 1.6 CIELAB units (Melgosa,


2000). However, the AATCC test manual states that CIE94 shows no

improvement over CMC (CMC, 1998). Other researchers have also found that CMC and CIE94 perform equally well, suggesting that the hue dependency of the hue weighting factor in CMC is unnecessary (Qiao et al., 1998). A new committee, CIE TC 1-47, was subsequently formed to improve the performance of CIE94 (Melgosa, 2000). This committee has completed its work with the development of CIEDE2000.

2.1.6.3 CIEDE2000. Since the CIE considered CIE94 an insufficient color difference equation, it developed another equation, CIEDE2000, the fifth recommendation of the CIE since 1964 (Kuehni, 2002). Another reason for developing a new equation is that the hue difference correction function in the CMC equation was considered by some not to be accurate, and a new one was developed for CIEDE2000. The equation is fitted to the COM dataset, a compilation of four datasets: RIT-DuPont, Witt, Leeds, and BFD-P. Three of these datasets use glossy paint samples, while BFD-P uses various substrates. They also use varying methods of evaluation, including gray scale comparison and paired comparison (Luo, Cui & Rigg, 2001). The equation uses five correction factors: one for lightness, one for chroma, one for hue, a factor RT for the interaction between hue and chroma, which improves color difference predictions in the blue region, and a factor for rescaling a* (Cui et al., 2002).
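The full CIEDE2000 formula is lengthy; the Python sketch below follows the published CIE definition (not an equation printed in this dissertation) and shows where the five correction factors enter. It is illustrative and has not been checked against the CIE reference data.

```python
import math

def delta_e_2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 color difference between two CIELAB colors."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    C1, C2 = math.hypot(a1, b1), math.hypot(a2, b2)
    Cbar = 0.5 * (C1 + C2)
    # a* rescaling factor G (correction near the neutral axis)
    G = 0.5 * (1 - math.sqrt(Cbar ** 7 / (Cbar ** 7 + 25 ** 7)))
    a1p, a2p = (1 + G) * a1, (1 + G) * a2
    C1p, C2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360
    h2p = math.degrees(math.atan2(b2, a2p)) % 360
    dLp, dCp = L2 - L1, C2p - C1p
    # Hue-angle difference, wrapped into [-180, 180]
    if C1p * C2p == 0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(C1p * C2p) * math.sin(math.radians(dhp / 2))
    Lbp, Cbp = 0.5 * (L1 + L2), 0.5 * (C1p + C2p)
    # Mean hue angle
    if C1p * C2p == 0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hbp = 0.5 * (h1p + h2p)
    elif h1p + h2p < 360:
        hbp = 0.5 * (h1p + h2p + 360)
    else:
        hbp = 0.5 * (h1p + h2p - 360)
    T = (1 - 0.17 * math.cos(math.radians(hbp - 30))
         + 0.24 * math.cos(math.radians(2 * hbp))
         + 0.32 * math.cos(math.radians(3 * hbp + 6))
         - 0.20 * math.cos(math.radians(4 * hbp - 63)))
    # Lightness, chroma, and hue weighting functions
    SL = 1 + 0.015 * (Lbp - 50) ** 2 / math.sqrt(20 + (Lbp - 50) ** 2)
    SC = 1 + 0.045 * Cbp
    SH = 1 + 0.015 * Cbp * T
    # Rotation term RT: hue/chroma interaction in the blue region
    dtheta = 30 * math.exp(-(((hbp - 275) / 25) ** 2))
    RC = 2 * math.sqrt(Cbp ** 7 / (Cbp ** 7 + 25 ** 7))
    RT = -math.sin(math.radians(2 * dtheta)) * RC
    return math.sqrt((dLp / (kL * SL)) ** 2 + (dCp / (kC * SC)) ** 2
                     + (dHp / (kH * SH)) ** 2
                     + RT * (dCp / (kC * SC)) * (dHp / (kH * SH)))
```

Unlike CMC, the result is symmetric in its two arguments because every term is built from means of the pair rather than from the standard alone.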


Some claim CIEDE2000 is the best equation available so far, as it offers significant improvements over CMC and CIE94 (Kuehni, 2002). In experiments, CIEDE2000 has been shown to perform better than the other color difference equations (Xu, Yaguchi & Shioiri, 2002; Cui, Luo, Rigg & Li, 2001; Luo et al., 2001). However, other research has found that equations like CMC and CIE94 perform equally well or better than CIEDE2000 (Luo et al., 2001). While factors have been introduced into this equation to adjust hue differences depending on hue angle, its applicability to all levels of chroma and lightness has been questioned. The reliability of the datasets is questioned too, since different experimental conditions, such as light sources and substrates, are used in each (Kuehni, 2002).

2.1.7 Non-Uniformity in Color Difference Studies. There is still

disagreement in the evaluation of small color differences. Kuehni suggests

that too little attention is given to color difference problems (Kuehni, 1990).

Others complain that colorimetry is used improperly or that people have unrealistic expectations regarding color difference measurement (Color Forum, 1993). A number of factors contribute to this disagreement, including the color samples in question (Kuehni, 1990). Samples for each study differ depending on the intended use of the study. For example, the samples may be paint, plastics, textiles, or other colored objects (Alman et al., 1989). The presentation of the sample also varies. In each study the size differs, the separation between the standard and the sample differs, and the


background on which the sample is mounted differs (Alman et al., 1989; Guan

& Luo, 1998b). Not only do the physical size and mounting change between studies; the colors change as well. Each study employed a different magnitude of color difference. For example, some have only small color differences, while others have a random sample of color differences; some concentrate on only a few colors, while others employ the entire hue circle. Color can be changed through hue, lightness, and chroma, and these vary differently in each study (Alman et al., 1989). Lastly, each

equation is designed to fit a specific experimental dataset and each equation

uses a different dataset. Studies have shown that visual results differ least

when a neutral gray background is used while the amount of separation

between samples does not affect the results (Guan & Luo, 1998b).

Another important factor in color evaluation is the observers themselves. Observation can be affected by mood, time of day, fatigue, and age (Perez, Hita, del Barco & Nieves, 1999). Most studies use a small sample size, since so much time and effort is required in making color judgments. Because the sample size is small, differences in individuals’ judgments are likely to be

significant. Tasks given to observers also differ between each experiment.

Some observers may be asked if they perceive a color difference, while

others may be asked if they accept the two samples as a color match. The


methods of obtaining the results differ, too. For example, an experiment can use paired comparisons, magnitude estimation, or categorical scaling such as the gray scale, and more (Alman et al., 1989; Guan & Luo, 1998b).

2.2 Variability of Daylight

As discussed in Section 2.1 of this literature review, light is one of the three components that create the sensation of color, and color changes with incident light. Lighting is very important in quality control of colors since it affects color

perception. Lights with similar color temperature, but different spectral power

distributions will show degrees of metamerism (Thiry, 2004). To ensure that

two colors match, industry colorists view the samples under a number of light

sources. Since garments are generally worn indoors and out, such light sources include fluorescent light that is used in most offices, incandescent

light that is often used in the home, and most importantly daylight. Daylight

actually reveals the “most accurate and unbiased rendition of color” (Thiry,

2004, p. 34).

While daylight imparts true color, natural daylight is difficult to classify

since it constantly changes with the weather. At any given time, daylight varies greatly from location to location, and daylight at 1:00 P.M. can differ greatly from daylight at 1:05 P.M. at the same location, thereby resulting in changes in perception of color. Therefore, it is unrealistic

to evaluate color matches under natural daylight and an artificial standard for

daylight was developed by the CIE in the 1960s (Xu, Luo & Rigg, 2003a).


To develop this standard, natural daylight was measured at three locations: Teddington, UK; Washington, DC; and Ottawa, Canada. The spectral power distribution (SPD) was measured every 10 nm at these locations and then averaged across them. Next, the SPD was interpolated to every 5 nm. Finally, a mathematical procedure was used to approximate the SPD at several correlated color temperatures (CCTs). The CIE then recommended standard daylight illuminants with CCTs of 5000K, 5500K, 6500K, and 7500K. The daylight illuminant preferred for textiles, paints, and plastics is the one with a CCT of 6500K, known as D65. D65 represents average daylight across visible light and a small part of the ultraviolet; that is, it represents daylight over the 300 nm to 700 nm range of the electromagnetic spectrum (Xu et al., 2003a).
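The averaging and interpolation steps can be sketched as follows. The wavelength window and SPD values are invented for illustration; they are not the CIE measurements, which covered a much wider range.

```python
# Hypothetical 10 nm SPD readings from the three sites (illustrative values)
wavelengths_10 = [400, 410, 420, 430, 440]
teddington = [82.8, 91.5, 93.4, 86.7, 104.9]
washington = [80.1, 89.9, 95.0, 88.2, 103.5]
ottawa     = [84.0, 92.3, 92.1, 85.9, 105.8]

# Step 1: average the SPDs across the three locations at each wavelength.
mean_spd = [(t + w + o) / 3 for t, w, o in zip(teddington, washington, ottawa)]

# Step 2: linear interpolation onto a 5 nm grid. Each new 5 nm point falls
# exactly halfway between two 10 nm samples, so it is the mean of its
# neighbors; existing 10 nm points are carried over unchanged.
wavelengths_5 = list(range(400, 441, 5))
spd_5 = []
for wl in wavelengths_5:
    i = (wl - 400) // 10
    if wl % 10 == 0:
        spd_5.append(mean_spd[i])
    else:
        spd_5.append((mean_spd[i] + mean_spd[i + 1]) / 2)
```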

What has been defined by the CIE as a standard illuminant for average daylight is only a spectral power distribution (SPD), not an actual source. So far, it has been impossible to make an actual light source that accurately represents D65, instead sources have only been made to simulate the standard illuminant (Xu et al., 2003a). The CIE has not recommended the

SPD of any daylight simulator as the illuminant of choice (Lam & Xin, 2002).

In CIE Publication No. 51, the CIE recommends that the quality of a source be judged by the CIE Colour Rendering Index (CRI) and the Metamerism Index (MI) (Xu et al., 2003a), both of which are discussed further in this review.


Existing daylight simulators have been classified into three types: high-pressure, short-arc xenon lamps known as Xe-sources; filtered tungsten-iodine or tungsten-halogen lamps known as T-sources; and fluorescent lamps known as F-sources. This last type is not to be confused with the standard fluorescent light used in offices, which has a largely different SPD and CCT. There are 12 different F-sources that

CIE has further broken down into three groups. The standard group contains

F1-F6, the broadband group contains F7-F9, and the tri-band group contains

F10-F12. Of these 12 different fluorescent lamps, the CIE has recommended

F7 of the broadband group as the best F-source to simulate D65 (Xu et al.,

2003a).

Of the different sources, F- and T-sources are the most used in the textile industry. T-sources were specifically designed to simulate the SPD of D65 and are the closest in SPD to D65, but they are low in energy efficiency and high in cost (Xu et al., 2003a). On the other hand, F-sources have high energy efficiency and are relatively low in cost, but they differ greatly from D65 in SPD. Tri-band fluorescent sources are still used in industry, even though researchers have concluded that they are not qualified to be D65 simulators and the CIE has recommended F7 as the best F-source. Tri-band sources are best in terms of efficiency (Thiry, 2004), but they have energy concentrated around three wavelengths and provide the most discrepancy in


terms of SPD to the D65 standard illuminant (Xu et al., 2003a). Of the D65 simulators, they reveal inconstancy and metamerism most strongly (Thiry,

2004).

The inability to create an accurate D65 simulator has led to problems in color matching, since visual color matching is affected by different SPDs (Lam, Xin & Sin, 2001). Pairs that match under daylight do not necessarily

match under a daylight simulator and vice-versa. Ideally, this should not

occur (McCamy, 1999). Also, since a variety of daylight simulators exist,

each having a different SPD, visual results under two different simulators can

be different (Xu et al., 2003a). This lack of standardization can lead to

illuminant metamerism. Many color matches can be conditional. For

example, two colors may match with instrumental evaluation using D65, but

they may not match with visual evaluation using a simulator. If another

simulator is used, then more uncertainty of a color match is added. To make

matters worse, there is also variability between light booths that use the same

source for daylight (Thiry, 2004), including differences in surround, size, and SPD due to lamp age. Therefore, judgments of colorists can create confusion in

matching (Lam & Xin, 2002). In order to reduce confusion, it is necessary to

understand how the choice of simulator affects color matching. Also, since

none of the D65 simulators accurately correspond to the SPD of D65, the


actual SPD of the simulator needs to be used in color difference calculations

to minimize the variability between instrumental and visual data (Kuehni,

2003; Xu et al., 2003a; Lam & Xin, 2002).

Since daylight allows us to perceive the most accurate interpretation of color (Thiry, 2004), high-quality daylight simulation is needed for quality control purposes. Colored objects must be consistent; thus, sources used to evaluate them in industrial settings must be standardized. Efforts have been made toward this goal, as American (ASTM), British (BSI), and international (ISO) standards have been set forth to specify the quality of daylight simulators. Of

course, additional issues would arise in the color difference perceived as the

consumer purchases and uses the product. However, the main purpose of

these standards is not to choose the “best” source for everyone to use, but to

maintain standard viewing conditions. These standards are all based on the

standard daylight illuminants of the CIE and provide specifications for

simulators of these illuminants, defining the acceptable range of characteristics of the light sources such as the chromaticity coordinates, CCT, Colour Rendering Index (CRI), and Metamerism Index (MI) (Xu et al., 2003a).

The Colour Rendering Index or CRI was first recommended by the CIE

in 1965. It was revised in 1995 (Xu et al., 2003a). Its purpose is to indicate

how similar a given color would appear under the standard illuminant and

simulated light (Xu, Luo & Rigg, 2003b). This index involves instrumentally

calculating color difference of 14 test color samples. The color difference is


calculated between the color of the sample using the SPD of D65 and the color of the sample using the SPD of the simulator, taking chromatic adaptation into account. The values are then converted to an index through a linear transformation for each sample. Eight of the samples are from the Munsell color atlas. They cover the hue circle at a similar lightness with a moderate chroma. The index averaged over these 8 samples is the general CRI, denoted Ra. Ra defines the color rendering properties of a light source. The maximum Ra is 100, which would indicate that the source is a perfect match for the illuminant (Xu et al., 2003a).
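A sketch of how the general index is assembled. The ΔE values are invented for illustration; the linear transformation Ri = 100 − 4.6·ΔE is the one specified in CIE Publication 13.3 (where ΔE is computed in the U*V*W* space), a detail the text above does not spell out.

```python
# Illustrative color differences for the eight Munsell test samples under a
# hypothetical simulator (values invented; CIE 13.3 computes them in U*V*W*).
delta_e = [1.2, 0.8, 1.5, 2.1, 0.9, 1.1, 1.7, 1.3]

# CIE 13.3 linear transformation: each special index is Ri = 100 - 4.6 * dE,
# so a zero color difference yields a perfect score of 100.
special_indices = [100 - 4.6 * de for de in delta_e]

# The general CRI Ra is the arithmetic mean of the eight special indices.
ra = sum(special_indices) / len(special_indices)
```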

The other six colors used in the CRI are one color each representing skin and foliage, plus a highly saturated red, yellow, green, and blue. The

purpose of these samples is to define color rendering properties of a light

source under extreme conditions. This index is denoted as Ri (Xu et al.,

2003a).

In general, an Ra above 95 and an Ri above 85 indicate that the source has good color rendering properties. The ISO 3664 standard states that a good quality D65 simulator should have an Ra greater than or equal to 90 and an Ri greater than or equal to 80 (Xu et al., 2003a). However, critics claim that the CRI is antiquated (Thiry, 2004) and is not a good indicator of the quality of a light source for use in visual assessment (Thiry, 2004; Lam & Xin, 2002), and it is not used in the American standard (Xu et al., 2003a).


In contrast, both the international and American standards define the

range for the Metamerism Index (MI) of a quality simulator (Xu et al., 2003a).

While the CRI only involves the changing appearance of one color under

different SPDs, the MI assesses the change in color difference of color pairs under the standard illuminant and the simulator (Xu et al., 2003b). This index was first recommended by the CIE in 1981 and later revised in 1999. The samples for the MI include 8 virtual metameric color pairs (Xu et al., 2003a; Xu et al., 2003b). No physical colors exist; the colors are defined by reflectance factors (McCamy, 1999). Five of the 8 color pairs are metameric in the visible region of the electromagnetic spectrum, while the other 3 are metameric in the ultraviolet region. Each pair has a ∆E* close to 0 for D65. Color difference is then calculated using the SPD of the simulator (Xu et al., 2003a). Under a good quality daylight simulator, each color pair should have very small color differences (McCamy, 1999). The average color difference is calculated for each of these two regions to yield the MIvis for the visible region and the MIuv for the

ultraviolet region (Xu et al., 2003a; McCamy, 1999). The average color

difference is then indexed based on the following rating system in Table 2.1

(Xu et al., 2003b):


Average ∆E*      Index category
< 0.25           A
0.25 – 0.50      B
0.50 – 1.00      C
1.00 – 2.00      D
> 2.00           E

Table 2.1: MI rating system.

An “A” rating indicates that both the simulator and the illuminant reveal

metamerism similarly, while an “E” rating means that the two reveal very

different effects of metamerism. If a simulator has a “B” rating for MIvis and a “C” rating for MIuv, it is denoted with an MI of “BC” (Xu et al., 2003b).
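The lookup implied by the rating table, and the combination of the two regional ratings into a single MI, can be sketched as follows. Boundary handling is an assumption, since the table leaves shared endpoints ambiguous.

```python
def mi_category(avg_delta_e):
    """Map an average color difference to the MI letter rating.
    Boundary values are assumed to fall in the lower category."""
    if avg_delta_e < 0.25:
        return "A"
    if avg_delta_e <= 0.50:
        return "B"
    if avg_delta_e <= 1.00:
        return "C"
    if avg_delta_e <= 2.00:
        return "D"
    return "E"

# A simulator rated "B" for the visible region and "C" for the
# ultraviolet region is reported with a combined MI of "BC".
mi = mi_category(0.32) + mi_category(0.80)
```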

The ISO 3664 standard claims that a good quality daylight simulator

should have an MIvis above a “C” rating, and the MIuv should at least be less than 4 ∆E* units. The American standard, as given by ASTM D1729-89

Standard for Visual Evaluation of Opaque Materials for Daylight, designates a

“BC” rating as the minimum rating for a quality daylight simulator. This

standard has also been adopted by the American Association of Textile

Chemists and Colorists (AATCC) (Xu et al., 2003a). The current

recommendation is probably not the last to be proposed by the CIE. The

color pairs used in the MI are only in the first and fourth quadrants of the CIELAB color space, providing what is thought to be insufficient coverage of

realizable colors (Xu et al., 2003b). In addition, McCamy has recently

proposed new samples for the index (McCamy, 1999).


However, studies have found that even though a simulator may meet these standards, it does not necessarily mean that it is a quality simulator or that it can be compared to other simulators that also meet the specifications.

A study by Xu, Luo, and Rigg (2003) assessed 15 different daylight simulators. The luminance levels of these simulators ranged from 263 to 985 cd/m², with most between 400 and 600. This variation is small compared to the variation in illumination of natural daylight. CCTs spanned a range of 800K, but all were within 500K of D65. Chromaticity was within CIE tolerance, and Ra was above 90 for all simulators except the tri-band, which was 88. The T-sources received A ratings for MIvis, while some F7 lamps received B ratings, and the

tri-band lamp was rated as E (Xu et al., 2003a). When results from each

simulator were compared, tri-band lamps were found to have the worst

agreement, while an F7 lamp had the best agreement with other simulators.

Overall, though, with the exception of the tri-band lamp, results were similar for each combination of simulators, implying that the variation in color difference evaluation of metameric color pairs due to the use of different light sources is not as significant as other sources of variation in color difference evaluations, such as observer accuracy and repeatability (Xu et al., 2003b).

Another study by Lam and Xin (2002) found visual results were different

depending upon the MIvis of the simulator and that simulators that had similar


quality as defined by the MIvis agreed better. Similarly, visual results for

simulators with MIvis of “A” and those with a “B” rating did not significantly

differ when metameric color pairs were evaluated (Lam & Xin, 2002).

2.3 Variability of Samples

Previous research has concentrated on the use of metameric samples.

Metameric pairs are color pairs that match under one light source, but not

under another due to differing reflectance curves. Consequently, it is likely

that different daylight simulators would reveal the effects of metamerism.

However, a single color, instead of a color pair as in the case of metamerism,

can look very different under different light sources. Color constancy refers to

the color change of a single substrate under different illumination. When

color appears perceptually the same in different illumination, it is said to be

color constant. Otherwise, if the color changes, it is said to be color inconstant

(Luo, 2004).

Color constancy stems from the phenomenon of color adaptation.

Humans are equipped with an adaptation mechanism for natural objects. Our

perception of color of a natural object will adjust for varying daylight

illumination from 5000K to 20,000K. This adaptation is often not present for

objects that are artificially colored or for artificial light sources, presumably because artificial products were not in existence when the human color vision system evolved (Kuehni, 2003b).


Color constancy can in turn affect color matching of pairs of objects.

Two samples can appear very different under different light sources due to color inconstancy of one of the objects as well as the degree of color difference and metamerism between the objects (Kuehni, 2003b). Colorists have even come to realize that in many cases it is more important to have color constant specimens than to minimize metamerism (Luo, 2004). Often color constancy is not even a problem (Luo et al., 2003), but if a standard, upon which all subsequent samples are based, is not color constant, it is more difficult to match, since the sample will need the same degree of inconstancy (Luo, 2004). Similarly, if the standard is constant, samples will need the same degree of constancy. Without similar constancy between standard and sample, companies will get a series of non-constant and perhaps non-metameric garments (Luo et al., 2003). This can lead to consumer complaints (Luo, 2004). For example, if a customer buys a dress that appears olive green in the store but a different color in daylight, she may be unhappy.

Similar to metamerism, color constancy can vary by degrees.

That is, the color change displays varying levels of severity upon changing light sources (Luo et al., 2003). In practice, the severity of color inconstancy is difficult to ascertain. Visual measurement of color constancy involves switching between light sources, but if the observer waits the proper amount of time for adaptation to occur, it is often not possible to remember the original color. Therefore, a color constancy index has been created based on


instrumental measurement (Luo, 2004). However, a color constancy index needs to correlate with visual assessments, so the problem is the same as, or worse than, with color difference evaluations. At this time, though, results of extensive field testing have not been published.

To calculate a color inconstancy index, the tristimulus values (X, Y, and Z) are calculated for each illuminant, where X, Y, and Z are the values for illuminant 1 and Xr, Yr, and Zr are the values for illuminant 2. The difference in color is an illuminant colorimetric shift: the change in X, Y, and Z caused by a change in color of the illuminant. Once we have fully

adapted to the new light, the appearance of a color will change again. This is

known as the adaptive color shift. The corresponding color under the second

illuminant, Xc, Yc, and Zc, is calculated through a chromatic adaptation

transform or CAT applied to the X, Y, and Z from the first illuminant. This

corresponding color in the second illuminant will have the same color

appearance as under the first illuminant. The color difference between the measured color (Xr, Yr, Zr) and the corresponding color (Xc, Yc, Zc) is the color inconstancy index. A value of zero would indicate total color constancy (Luo et al., 2003).
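The procedure can be sketched with a von Kries-type transform. The CAT02 matrix below is the published one, but the sketch makes simplifying assumptions: complete adaptation is assumed (CMCCON02 models incomplete adaptation), and plain CIELAB ∆E*ab stands in for the CMC-type difference equation the index actually specifies. Function names are hypothetical.

```python
import numpy as np

# Published CAT02 matrix: maps XYZ into the sharpened cone-like space in
# which the von Kries gain scaling is applied.
M_CAT02 = np.array([[ 0.7328, 0.4296, -0.1624],
                    [-0.7036, 1.6975,  0.0061],
                    [ 0.0030, 0.0136,  0.9834]])

def corresponding_color(xyz, white_1, white_2):
    """Adaptive color shift under complete adaptation: the XYZ (Xc, Yc, Zc)
    under illuminant 2 that should look the same as xyz did under
    illuminant 1."""
    gain = (M_CAT02 @ np.asarray(white_2, float)) / (M_CAT02 @ np.asarray(white_1, float))
    return np.linalg.inv(M_CAT02) @ (gain * (M_CAT02 @ np.asarray(xyz, float)))

def delta_e_lab(xyz_a, xyz_b, white):
    """Plain CIELAB dE*ab between two XYZ colors referred to one white
    point; used here only to keep the sketch short."""
    def to_lab(xyz):
        def f(t):
            d = 6.0 / 29.0
            return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4.0 / 29.0
        fx, fy, fz = (f(c / w) for c, w in zip(xyz, white))
        return np.array([116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)])
    return float(np.linalg.norm(to_lab(xyz_a) - to_lab(xyz_b)))

def inconstancy_index(xyz_ill1, xyz_measured_ill2, white_1, white_2):
    """Difference between the measured color under illuminant 2 (Xr, Yr, Zr)
    and the corresponding color (Xc, Yc, Zc); zero means fully constant."""
    xyz_c = corresponding_color(xyz_ill1, white_1, white_2)
    return delta_e_lab(xyz_measured_ill2, xyz_c, white_2)
```

A sample that tracks its illuminant perfectly, such as the white point itself, yields an index of zero, since the corresponding color coincides with the measured one.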

The current revised color inconstancy index is CMCCON02. It was

revised from CMCCON97 as it uses a different chromatic adaptation

transform or CAT (Luo et al., 2003). CMCCON97 was proposed in 1997 by

the Society of Dyers and Colourists (SDC). It used the CMCCAT97 chromatic


adaptation transform and an appropriate color difference equation such as

CMC (Luo, 2004). In 2000, CMCCAT97 was modified to CMCCAT02, which incorporated all available experimental data and is a simplified version of CMCCAT97 (Luo et al., 2003). This was then modified to CAT02, which is based on all but one of the previous datasets (Luo et al., 2003; Luo, 2004).

CMCCON02 uses the CAT02 chromatic adaptation transform and is the current recommendation of the CIE (Luo et al., 2003; Luo, 2004) and is being incorporated into ISO Standard 105 (Luo et al., 2003).

2.4 Variability of Observers

The most unreliable aspect of color matching is observer variability

which is further amplified by factors such as different light sources and color

inconstancy (Kuehni, 2004b). Worse yet, this observer variability cannot be controlled. Our perception of any given color cannot be exactly measured, since it is produced by a sensory system that is thus far unexplained. Even observers who have been tested and have normal color

vision do not all see color the same way. Each perception is unique to that

observer. This experience is private as well. We do not know what anyone

else experiences (Kuehni, 2004b).

An example of this uniqueness can be seen in the variability of

responses when individuals are asked to choose samples that represent unique hues for them (Kuehni, 2004b). According to Hering, there are four

unique hues: red, yellow, blue and green (Kuehni, 2004a). Unique red is one


that is not yellowish, or bluish, but only red (Kuehni, 2004a; Kuehni, 2004b).

This unique hue can have many different levels of lightness and chroma

though. A large number of experiments have been performed under a variety

of conditions to determine the unique hue choices (Kuehni, 2004a). Overall,

observers have chosen colors from two-thirds of a complete Munsell hue

circle with 40 chips as unique hues. This indicates that there are large

differences from person to person; upon further investigation, the relative differences between the choices for the four hues also vary (Kuehni, 2004b). Of the four colors, green has the largest variability

(Kuehni, 2003b). Choices for a green cover 65% of the range of choices for all colors, but most observers repeat their choices with a high degree of reliability. How these hues relate to color perception is unexplained. We do not know how the brain generates the perceptions. Thus far, no model has been able to explain the variation. Models based on individual differences in color matching functions have failed. It is not yet evident how this affects color difference perception of individuals (Kuehni, 2004a).

Researchers assume there is a direct relationship between our color vision properties and color matching functions (Kuehni, 2003b). It was therefore thought that individual differences in color vision are related to an individual’s color matching functions (Kuehni, 2004b), since it is known that each person varies in their measured x̄, ȳ, and z̄, each with a different magnitude from the average observer (Kuehni, 2003b). Using color matching functions as a predictor of color vision is not viable, however, since measuring x̄, ȳ, and z̄ only takes the output of the cone receptors into consideration, when color perception comprises more than cone responses (Kuehni, 2004b). In a

simpler relationship, if we had a standard light source and a given set of color

matching functions, our color perception would only be affected by the

reflectance of the substrate. Unfortunately, this is not the case. This simple

relationship only works for one observer at one time. Changes in mood and

even time of day change our color perception (Kuehni, 2003b). It is very likely

that in the case of color inconstant samples, observer variability is even larger than in simple cases of color difference (Kuehni, 2004b). Such shifts

lead to inherent variability in defining a method of instrumentally measuring

color that replicates perceived color (Kuehni, 2003b).

Human observers are notoriously variable. One individual observer

shows disagreement with himself/herself in deciding the size of color

difference between two samples and in the acceptability of a color match

(Kuehni, 2003b). Part of this variability stems from individuals being non-neutral in their color judging. More is at work than perception alone. Our judgment is affected by both conscious and unconscious factors (Kuehni, 2004b). Some

people, regardless of perceptions are just more willing to indicate a color

match. Others are pressured by outside circumstances. If it is Friday, and

there is a deadline, the colorist may have to approve matches he/she would

otherwise reject.


So far, in color research, little is known about such types of variation

within individual observers. While many experiments are replicated by

observers to ensure reliability of the panel, most reports do not publish this

sort of variance data. The few that include such information indicate that there is as much as 30% variation, or state that variation is “large”. Overall reported results indicate a significant amount of individual variability. Experts believe that variation is about 30% for observer panels larger than 30

(Kuehni, 2003b). It is unknown how many observers are needed in order to obtain a stable average (Kuehni, 2004b).

With such issues in observer variability, is objective color measurement even possible? The lack of reliable theory and the number of variables have caused the development of color difference equations to be based on data fitting (Kuehni, 2003b). That is, sample pairs are judged by a panel of observers, and mathematical techniques are used to create equations that fit the visual results. However, if these data are flawed to begin with, then the formulae fitted to them are incorrect too. A formula can only be as good as the visual data used (Kuehni, 2004b). Instead of taking such problems into consideration, researchers seem to be moving further out of focus: in the development of CIEDE2000, the formula was fitted to a variety of datasets, each with different substrates, different conditions, and different methods of measuring color judgments.


2.5 Research Questions

Given the multitude of problems in color difference evaluation and

unpredictable sources of variation, it is not surprising to see that instrumental

color difference equations do not accurately predict the evaluation of average

observers. Thus the goal of the research reported in this dissertation is to

closely control specific parameters of the experimental conditions while using the

same set of observers multiple times. The variation represented by the observer

panel may then be quantified, and the contributors to this variation separated

from other components. In this way the effects of two different light sources

(filtered tungsten and F7) and of color constant and inconstant samples on the

accuracy of four different color difference equations can be examined.

The research questions proposed therefore were:

(1) What is the variation within and between observers for each color

difference judgment when controlling for light and color constancy?

(2) For each color difference equation, is there a difference in

accuracy of prediction of color difference equations compared to

that of a pool of observers when examining samples under two

different light sources?

(3) For each color difference equation, is there a difference in

accuracy of prediction of color difference when comparing color

constant versus inconstant samples?


CHAPTER 3

METHODOLOGY

3.1 Summary

This research is a within-subjects laboratory experiment involving three manipulated independent variables (light source, color constancy, and

repetition of experiment) and two measured dependent variables (visual

color difference judgments and instrumental color difference measurements).

In short, observers were asked to make judgments of the relative size of color

difference of both constant and inconstant color pairs in a paired comparison

method under two different daylight simulators. This experiment was

repeated four times, creating 16 treatment combinations. Color difference

equations were calculated and compared to visual results with Performance

Factor/3 analysis. With similar analysis, visual results were used to assess

the variability of observers, light sources and samples.

3.2 Light Sources

Two D65 simulators from GretagMacBeth were used in this research.

The first was a GretagMacBeth Spectralight II Light Booth with a Munsell gray

N7 interior and a filtered tungsten source for D65. The second was a


GretagMacBeth Judge Light Booth with a Munsell gray N7 interior and an F7

broad band source to represent D65. The SPD (spectral power distribution) of each booth was evaluated using a GretagMacBeth LightSpex spectroradiometer placed directly under the center of the light source in each booth. After a warm-up time of 5 minutes and with all other light sources extinguished, as suggested by Xu et al. (2003b), five measurements were taken using the auto-measurement mode, and an average SPD was calculated. The wavelength interval of the instrument is 5nm, ranging from

360 to 750nm. For calculation purposes the SPD’s were reduced to 10nm intervals, ranging from 380 to 700nm (Table 3.1).


At 5 nm intervals (380 to 580 nm; continued below):

λ     Filtered Tungsten   F7
380   0.828    0.353
385   1.055    0.375
390   1.308    0.404
395   1.572    0.487
400   1.836    0.883
405   2.077    1.146
410   2.267    0.901
415   2.396    0.834
420   2.488    0.996
425   2.578    1.261
430   2.655    2.49
435   2.714    4.039
440   2.764    3.234
445   2.797    2.275
450   2.811    2.385
455   2.815    2.511
460   2.815    2.584
465   2.809    2.606
470   2.789    2.587
475   2.763    2.539
480   2.739    2.476
485   2.719    2.412
490   2.701    2.359
495   2.679    2.313
500   2.654    2.283
505   2.623    2.279
510   2.581    2.298
515   2.539    2.332
520   2.502    2.362
525   2.459    2.37
530   2.424    2.348
535   2.424    2.317
540   2.473    2.68
545   2.552    3.488
550   2.64     2.961
555   2.728    2.13
560   2.789    2.037
565   2.765    2.008
570   2.616    2.036
575   2.481    2.232
580   2.397    2.248

At 10 nm intervals (380 to 700 nm):

λ     Filtered Tungsten   F7
380   0.828    0.353
390   1.311    0.565
400   1.830    1.019
410   2.252    0.891
420   2.488    1.502
430   2.651    3.451
440   2.760    2.542
450   2.809    2.498
460   2.814    2.596
470   2.788    2.535
480   2.740    2.415
490   2.700    2.317
500   2.653    2.285
510   2.581    2.331
520   2.501    2.363
530   2.433    2.416
540   2.481    3.154
550   2.640    2.315
560   2.768    2.022
570   2.620    2.187
580   2.384    2.042
590   2.185    1.893
600   2.169    1.859
610   2.196    1.885
620   2.193    2.053
630   2.112    2.004
640   1.983    2.087
650   1.869    2.697
660   1.819    2.158
670   1.847    1.227
680   1.939    0.925
690   2.054    0.753
700   2.145    0.673

Continued

Table 3.1: SPD of each simulator at 5 nm intervals and at 10 nm intervals.


Table 3.1 continued

At 5 nm intervals (585 to 700 nm):

λ     Filtered Tungsten   F7
585   2.259    2.003
590   2.164    1.914
595   2.151    1.893
600   2.169    1.873
605   2.187    1.857
610   2.196    1.848
615   2.205    1.867
620   2.201    1.957
625   2.166    2.072
630   2.115    2.11
635   2.051    2.002
640   1.982    1.902
645   1.918    2.024
650   1.865    2.397
655   1.829    2.801
660   1.814    2.787
665   1.819    2.159
670   1.843    1.526
675   1.884    1.179
680   1.937    1.024
685   1.996    0.921
690   2.056    0.833
695   2.108    0.753
700   2.145    0.673

The filtered tungsten D65 simulator had been in use 150 hours before

experimenting began, while the F7 source had 0 hours of use. Each booth

was assessed before and after experimenting, a period of 3 months and

approximately 80 hours of use. To ensure stability of the light sources,

Pearson correlation coefficients were calculated between the first and second

readings. For the filtered tungsten source, r=.98, while for F7, r=.87. Since

the light bulbs in the F7 source had not been used previously, it is not


surprising that the SPD’s are not as highly correlated, since aging is inversely exponential for such light bulbs. Note that the second SPD readings were not used in any other calculations in this research.
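The stability check described above can be sketched directly. The following is a minimal illustration (the function name and list inputs are illustrative, not part of the original analysis) of the Pearson correlation coefficient computed between two SPD readings of the same lamp:

```python
import math

def pearson_r(a, b):
    # Pearson correlation between two paired lists of SPD readings,
    # e.g. the readings taken before and after the experiment.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

A perfectly stable lamp (second reading proportional to the first) would give r = 1, matching the near-unity value observed for the aged filtered tungsten source.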

As per CIE 15.2 recommendations, the original SPD for each simulator was normalized. Table 3.2 shows the normalized SPD for each simulator.

These relative SPD’s were also graphed in Figure 3.1 to show the differences between the simulators.


λ     Filtered Tungsten (normalized)   F7 (normalized)
380   0.335391   0.156622
390   0.530934   0.250795
400   0.741363   0.452119
410   0.912096   0.395437
420   1.007589   0.66642
430   1.073614   1.530948
440   1.117867   1.127967
450   1.137614   1.108223
460   1.139639   1.151705
470   1.129108   1.124862
480   1.109867   1.071397
490   1.093665   1.028026
500   1.074425   1.013718
510   1.045463   1.034238
520   1.012855   1.048214
530   0.985412   1.07173
540   1.004754   1.399505
550   1.069361   1.026917
560   1.121108   0.897249
570   1.061058   0.970347
580   0.965463   0.906012
590   0.884856   0.840013
600   0.878578   0.824706
610   0.889514   0.836242
620   0.8884     0.910782
630   0.855388   0.889152
640   0.803337   0.925867
650   0.75716    1.196406
660   0.736806   0.957369
670   0.748249   0.544406
680   0.785211   0.410301
690   0.831995   0.334098
700   0.868856   0.298602

Table 3.2: Relative SPD for each simulator.


[Figure: relative SPD (y-axis, 0 to 2) versus wavelength λ in nm (x-axis, 380 to 700) for the filtered tungsten and F7 simulators.]

Figure 3.1: Graphical representation of relative SPD’s for each simulator.

Additionally, to assess the quality of the D65 simulators, the MIvis

(metamerism index, visible range) was calculated by determining the CIELAB ∆E of five color pairs defined by reflectance data, as outlined in CIE Publication 51

(Table 3.3). For the filtered tungsten source, the average ∆E was 0.23, rating this source as ‘A’. For the F7 source, the average ∆E was 0.58, rating it as

‘C’. According to ASTM and AATCC standards, ‘B’ is the minimum rating for a quality simulator. This suggests that the simulation of D65 by this F7 lamp is unacceptable in industrial applications. Similar ratings were found by Xu et al. (2003b), where four F7 broad band simulators were assessed with two receiving ‘B’ ratings and two receiving ‘C’ ratings.


MIvis pair    Filtered Tungsten ∆E (CIELAB)   F7 ∆E (CIELAB)
1             0.206557                        0.73237
2             0.106919                        0.643663
3             0.123465                        0.457704
4             0.330356                        0.729863
5             0.3813                          0.312503
Average ∆E    0.229719                        0.575221
Rating        "A"                             "C"

Table 3.3: CIELAB ∆E’s, average ∆E, and categorical rating of each simulator for color pairs used for MIvis ratings.

3.3 Color Pairs

3.3.1 Properties. Color pairs for this research were made available

from dyeings conducted at DyStar L.P. and shared by Professors Rolf Kuehni

and David Hinks of North Carolina State University. All samples are made

with disperse dyes on polyester knit. The color set consists of six color

centers, purple, blue, green, dark olive, light olive, and brown, that

account for the color standards. For each standard there are 6 color constant samples, except blue, which has 5, for a total of 35 constant samples. Additionally, color inconstant samples were included: 3 for green, 2 for dark olive, and 6 for blue. This totals 11 inconstant samples.

Overall, including both constant and inconstant colors, 46 samples were used.

3.3.2 Mounting. Each of the 6 standards and 46 samples were

mounted in the same fashion. Due to the size and opacity of the available


dyeings, each fabric was cut to 3” by 3.5” and mounted, unfolded, to a neutral gray mat board (L*=65.59, a*=0.83, b*=3.37) with double-stick tape. The mat board was a 3” by 4” rectangle, and the sample was mounted so that a one inch handle of uncovered board remained on one side and a ½ inch excess of fabric on the other side was folded over the back, as shown in Figure

3.2. In this way, 3” x 3” samples were viewed in a manner similar to other research

(Guan & Luo, 1999b; Xu et al., 2003b). No writing appeared on the front of

the card, and the samples were numbered 1 to 46 on the back.

[Figure: diagram of a mounted sample on its mat board.]

Figure 3.2: Diagram of sample mounting.


3.4 Color Measurement

After mounting, samples were measured using a DataColor

International Spectraflash SF-600 spectrophotometer, a monochromatic

abridged spectrophotometer with ColorTools V3.0 software and with the

measurement wavelength ranging from 380nm to 700nm at 10 nm intervals.

For standardization, before each use and after every 8 hours the spectrophotometer was calibrated according to AATCC procedures. All measurements were taken at four places on the sample with a large aperture

plate (20 mm diameter area of view) with specular and UV included (Lam &

Xin, 2002; Instrumental, 1999). The four readings were then averaged

(Instrumental, 1999).

To ensure non-contamination of the samples, measurements were taken before, during, and after experimenting and color difference values

(CIELAB) under the D65 standard illuminant, were calculated between each

individual measurement and the average of all three measurements (Xu et al.,

2003b). The mean ∆E for all samples was 0.21 with a maximum of 0.77.

3.5 Color Difference Calculations

Color differences were calculated in accordance with ASTM E308

(Standard Practice, 2002). Color matching functions from 380nm to 700 nm at 10nm intervals for the 10º observer as found in E308 were normalized, such that the sum over all wavelengths totaled 100 for x, y and z (Table 3.4).
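The normalization just described, scaling each color matching function so that it sums to 100 across all sampled wavelengths, can be sketched as follows (a minimal illustration; the function name is not from the original text):

```python
def normalize_cmfs(xbar, ybar, zbar):
    # Scale each CMF so its sum over the sampled wavelengths equals 100,
    # as described for Table 3.4 (10-degree observer, 380-700 nm, 10 nm steps).
    def scale(cmf):
        k = 100.0 / sum(cmf)
        return [k * v for v in cmf]
    return scale(xbar), scale(ybar), scale(zbar)
```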


λ     x bar 10º  y bar 10º  z bar 10º  x bar 10º norm.  y bar 10º norm.  z bar 10º norm.
380   0.0002     0          0.0007     0.001716         0                0.006001
390   0.0024     0.0003     0.0105     0.020591         0.002572         0.090017
400   0.0191     0.002      0.086      0.163867         0.017149         0.73728
410   0.0847     0.0088     0.3894     0.726677         0.075454         3.338334
420   0.2045     0.0214     0.9725     1.754491         0.183489         8.337263
430   0.3147     0.0387     1.5535     2.699943         0.331824         13.31819
440   0.3837     0.0621     1.9673     3.291923         0.532462         16.8657
450   0.3707     0.0895     1.9948     3.180391         0.767397         17.10146
460   0.3023     0.1282     1.7454     2.593559         1.099221         14.96335
470   0.1956     0.1852     1.3176     1.678134         1.587955         11.29581
480   0.0805     0.2536     0.7721     0.690643         2.174435         6.619229
490   0.0162     0.3391     0.4153     0.138987         2.907535         3.560375
500   0.0038     0.4608     0.2185     0.032602         3.951024         1.873205
510   0.0375     0.6067     0.112      0.321728         5.20201          0.960178
520   0.1177     0.7618     0.0607     1.009798         6.531879         0.520382
530   0.2365     0.8752     0.0305     2.029033         7.504201         0.261477
540   0.3768     0.962      0.0137     3.232725         8.248448         0.11745
550   0.5298     0.9918     0.004      4.545377         8.503961         0.034292
560   0.7052     0.9973     0          6.050207         8.55112          0
570   0.8787     0.9555     0          7.538736         8.192715         0
580   1.0142     0.8689     0          8.701247         7.450183         0
590   1.1185     0.7774     0          9.596081         6.665638         0
600   1.124      0.6583     0          9.643268         5.644442         0
610   1.0305     0.528      0          8.841092         4.527215         0
620   0.8563     0.3981     0          7.346557         3.413417         0
630   0.6475     0.2853     0          5.555174         2.446239         0
640   0.4316     0.1798     0          3.702878         1.541654         0
650   0.2683     0.1076     0          2.301858         0.922591         0
660   0.1526     0.0603     0          1.309219         0.517029         0
670   0.0813     0.0318     0          0.697507         0.272662         0
680   0.0409     0.0159     0          0.350898         0.136331         0
690   0.0199     0.0077     0          0.17073          0.066022         0
700   0.0096     0.0037     0          0.082362         0.031725         0

SUM   11.6558    11.6628    11.6645    100              100              100

Table 3.4: CMF’s and normalized CMF’s for the 10º observer from 380nm to 700nm at 10nm intervals.


The normalized CMF’s (color matching functions) were then multiplied by the relative spectral power distributions as calculated previously for the

filtered tungsten and F7 sources respectively. The relative SPD for D65 was

found in ASTM E308. These results for filtered tungsten and F7 are shown in

Table 3.5, along with the tristimulus values of the sources that result from the

sum across the wavelengths. (Note that the original SPD’s were normalized

such that the tristimulus value Y of the light source totals 100). Such results

for D65 can be found in ASTM E308 and any color calculation software

package.
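The calculation described in this section can be sketched in a few lines. The following is a minimal illustration (function name is illustrative), weighting the source SPD by each normalized CMF, summing across wavelengths, and scaling so the source's Y equals 100 as noted above:

```python
def source_tristimulus(spd, xbar, ybar, zbar):
    # spd: relative SPD of the source; xbar, ybar, zbar: normalized CMFs,
    # all sampled at the same wavelengths (here 380-700 nm, 10 nm steps).
    X = sum(s * x for s, x in zip(spd, xbar))
    Y = sum(s * y for s, y in zip(spd, ybar))
    Z = sum(s * z for s, z in zip(spd, zbar))
    k = 100.0 / Y  # normalize so the tristimulus value Y of the source is 100
    return k * X, k * Y, k * Z
```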


λ     x norm. × FilTung  y norm. × FilTung  z norm. × FilTung  x norm. × F7  y norm. × F7  z norm. × F7
380   0.000575           0                  0.002013           0.000269      0             0.00094
390   0.010932           0.001366           0.047793           0.005164      0.000645      0.022576
400   0.121485           0.012713           0.546592           0.074087      0.007753      0.333338
410   0.662799           0.068821           3.044882           0.287355      0.029837      1.320102
420   1.767807           0.184882           8.400538           1.169229      0.122281      5.556121
430   2.898698           0.356251           14.2986            4.133472      0.508006      20.38945
440   3.679934           0.595222           18.85362           3.713182      0.6006        19.02396
450   3.618058           0.873002           19.45486           3.524583      0.850447      18.95224
460   2.955722           1.252716           17.05282           2.987014      1.265978      17.23336
470   1.894795           1.792972           12.75419           1.887669      1.786229      12.70622
480   0.766522           2.413335           7.346467           0.739953      2.329683      7.091823
490   0.152005           3.179869           3.893858           0.142882      2.989023      3.66016
500   0.035028           4.245077           2.012617           0.033049      4.005222      1.898901
510   0.336355           5.438507           1.003831           0.332744      5.380117      0.993053
520   1.022779           6.615848           0.527072           1.058484      6.846809      0.545472
530   1.999434           7.394733           0.257663           2.174575      8.042476      0.280233
540   3.248094           8.287662           0.118009           4.524215      11.54374      0.164372
550   4.86065            9.093807           0.036671           4.667726      8.732865      0.035215
560   6.782934           9.586728           0                  5.428544      7.672486      0
570   7.999033           8.692942           0                  7.31519       7.949776      0
580   8.400734           7.192877           0                  7.883436      6.749957      0
590   8.49115            5.898129           0                  8.060837      5.599225      0
600   8.472358           4.95908            0                  7.952863      4.655006      0
610   7.864277           4.027022           0                  7.393293      3.785848      0
620   6.526683           3.032481           0                  6.69111       3.108878      0
630   4.751828           2.092483           0                  4.939394      2.175078      0
640   2.97466            1.238468           0                  3.428373      1.427367      0
650   1.742876           0.69855            0                  2.753958      1.103794      0
660   0.964641           0.38095            0                  1.253406      0.494987      0
670   0.521909           0.204019           0                  0.379727      0.148439      0
680   0.275529           0.107049           0                  0.143974      0.055937      0
690   0.142047           0.05493            0                  0.057041      0.022058      0
700   0.071561           0.027564           0                  0.024594      0.009473      0

SUM   96.01389           100.0001           109.6521           95.16139      100           110.2075
      (= X)              (= Y)              (= Z)              (= X)         (= Y)         (= Z)

Table 3.5: Relative SPD’s of filtered tungsten and F7 multiplied by normalized CMF’s and the resulting tristimulus values.


The tristimulus values for each light source for each sample were then

calculated using the reflectance factors obtained during the first color

measurement. These tristimulus values were transformed to L*, a*, and b*.

CIELAB, CMC(2:1), and CIE94(2:1:1) color differences were then calculated using ProPalette 5.2 from GretagMacBeth (Table 3.6). CIEDE2000(2:1:1) was calculated using a spreadsheet created by Sharma, Wu, and Dalal (2005) (Table 3.6).
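The transformation from tristimulus values to L*, a*, and b* mentioned above follows the standard CIE 1976 formula. A minimal sketch (the reference white values Xn, Yn, Zn are passed in, since each source has its own tristimulus values):

```python
def xyz_to_lab(X, Y, Z, Xn, Yn, Zn):
    # Standard CIELAB transform: cube root above (6/29)^3,
    # linear segment below, per CIE 15.
    def f(t):
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3 * d * d) + 4.0 / 29.0
    fx, fy, fz = f(X / Xn), f(Y / Yn), f(Z / Zn)
    L = 116 * fy - 16
    a = 500 * (fx - fy)
    b = 200 * (fy - fz)
    return L, a, b
```

For a sample whose tristimulus values equal the reference white, the transform returns L* = 100 with a* = b* = 0, a useful sanity check.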


Filtered Tungsten

Constancy  Color        Sample  DECIE76  CMC(2:1)  CIE94(2:1:1)  CIEDE2000(2:1:1)
C          Purple       1       3.652    1.696     1.450         1.277
C          Purple       2       6.505    3.132     2.630         1.662
C          Purple       3       4.289    2.326     2.069         1.604
C          Purple       4       1.546    0.773     0.663         0.455
C          Purple       5       2.888    1.607     1.471         1.303
C          Purple       6       1.281    0.597     0.498         0.337
C          Dark Green   7       3.228    2.157     1.884         2.301
C          Dark Green   8       2.104    0.381     1.203         1.347
C          Dark Green   9       2.057    1.658     1.517         1.430
C          Dark Green   10      1.495    1.261     1.159         1.100
C          Dark Green   11      1.634    0.760     0.853         0.770
C          Dark Green   12      3.621    2.663     2.524         2.302
C          Dark Olive   13      3.061    1.967     1.714         1.836
C          Dark Olive   14      2.744    1.865     1.639         1.998
C          Dark Olive   15      1.647    1.374     1.237         1.419
C          Dark Olive   16      2.154    1.842     1.664         1.945
C          Dark Olive   17      2.989    1.890     1.589         1.364
C          Dark Olive   18      2.014    1.256     1.064         1.029
C          Light Olive  19      1.869    1.001     0.878         0.856
C          Light Olive  20      3.014    1.590     1.376         1.449
C          Light Olive  21      1.737    1.212     1.239         1.168
C          Light Olive  22      2.168    1.324     1.275         1.197
C          Light Olive  23      1.582    0.635     0.785         0.611
C          Light Olive  24      1.658    0.654     0.828         0.645
C          Blue         25      5.684    3.851     3.914         3.509
C          Blue         26      7.428    4.303     4.053         3.551
C          Blue         27      3.554    2.222     2.215         1.956
C          Blue         28      6.896    4.599     4.578         3.958
C          Blue         29      5.812    3.680     3.769         3.305
C          Brown        30      2.717    1.243     1.046         1.023
C          Brown        31      4.629    2.198     1.808         1.892
C          Brown        32      1.832    2.056     1.190         1.396
C          Brown        33      3.231    3.599     2.090         2.434
C          Brown        34      1.791    0.888     0.832         0.775
C          Brown        35      1.410    0.700     0.680         0.631
I          Dark Green   36      4.089    3.323     3.101         2.897
I          Dark Green   37      5.099    3.770     3.570         3.335
I          Dark Green   38      3.102    2.170     1.871         1.996
I          Dark Olive   39      2.552    1.571     1.321         1.203
I          Dark Olive   40      3.180    1.985     1.707         1.705
I          Blue         41      1.791    1.113     1.078         0.9699
I          Blue         42      4.306    2.917     2.923         2.7270
I          Blue         43      5.724    3.251     3.008         2.6943
I          Blue         44      3.176    1.671     1.495         1.4120
I          Blue         45      4.653    3.079     3.132         2.7653
I          Blue         46      3.327    2.154     2.170         2.0300
Continued

Table 3.6: ∆E for CIELAB, CMC(2:1), CIE94(2:1:1), and CIEDE2000(2:1:1) for each of the 46 color pairs under each light source.


Table 3.6 continued

F7

Constancy  Color        Sample  DECIE76  CMC(2:1)  CIE94(2:1:1)  CIEDE2000(2:1:1)
C          Purple       1       3.751    1.749     1.493         1.251
C          Purple       2       6.696    3.260     2.734         1.668
C          Purple       3       4.341    2.366     2.099         1.620
C          Purple       4       1.604    0.811     0.693         0.462
C          Purple       5       2.894    1.607     1.473         1.305
C          Purple       6       1.314    0.619     0.516         0.333
C          Dark Green   7       3.333    2.129     1.868         2.241
C          Dark Green   8       2.163    1.357     1.186         1.301
C          Dark Green   9       2.098    1.619     1.511         1.424
C          Dark Green   10      1.512    1.222     1.148         1.090
C          Dark Green   11      1.647    0.757     0.854         0.768
C          Dark Green   12      3.653    2.588     2.501         2.287
C          Dark Olive   13      3.179    1.959     1.708         1.787
C          Dark Olive   14      2.849    1.826     1.611         1.908
C          Dark Olive   15      1.695    1.330     1.230         1.352
C          Dark Olive   16      2.180    1.785     1.662         1.858
C          Dark Olive   17      2.983    1.882     1.593         1.356
C          Dark Olive   18      2.078    1.269     1.076         1.025
C          Light Olive  19      1.922    1.006     0.886         0.859
C          Light Olive  20      3.119    1.605     1.388         1.447
C          Light Olive  21      1.737    1.175     1.220         1.129
C          Light Olive  22      2.176    1.304     1.275         1.185
C          Light Olive  23      1.599    0.640     0.791         0.614
C          Light Olive  24      1.671    0.658     0.832         0.648
C          Blue         25      5.421    3.602     3.696         3.311
C          Blue         26      7.162    4.111     3.909         3.433
C          Blue         27      3.220    1.984     1.991         1.764
C          Blue         28      6.643    4.349     4.378         3.793
C          Blue         29      5.515    3.447     3.567         3.131
C          Brown        30      2.690    1.240     1.042         1.020
C          Brown        31      4.553    2.134     1.773         1.852
C          Brown        32      1.869    2.128     1.219         1.431
C          Brown        33      3.299    3.746     2.149         2.502
C          Brown        34      1.783    0.887     0.829         0.772
C          Brown        35      1.415    0.706     0.683         0.633
I          Dark Green   36      3.663    2.875     2.744         2.576
I          Dark Green   37      4.532    3.236     3.147         2.930
I          Dark Green   38      2.507    1.597     1.424         1.514
I          Dark Olive   39      2.603    1.708     1.477         1.330
I          Dark Olive   40      2.966    1.873     1.672         1.507
I          Blue         41      1.491    0.889     0.858         0.780
I          Blue         42      4.027    2.642     2.662         2.481
I          Blue         43      5.463    3.052     2.839         2.554
I          Blue         44      3.116    1.567     1.388         1.330
I          Blue         45      4.371    2.797     2.886         2.544
I          Blue         46      2.991    1.872     1.888         1.770
Continued


Table 3.6 continued

D65

Constancy  Color        Sample  DECIE76  CMC(2:1)  CIE94(2:1:1)  CIEDE2000(2:1:1)
C          Purple       1       3.558    1.693     1.465         1.260
C          Purple       2       6.378    3.200     2.688         1.575
C          Purple       3       4.279    2.372     2.099         1.602
C          Purple       4       1.541    0.800     0.684         0.449
C          Purple       5       2.903    1.632     1.486         1.329
C          Purple       6       1.283    0.626     0.524         0.321
C          Dark Green   7       1.610    1.015     0.889         1.035
C          Dark Green   8       2.217    1.384     1.214         1.321
C          Dark Green   9       2.061    1.559     1.454         1.364
C          Dark Green   10      1.463    1.168     1.099         1.036
C          Dark Green   11      1.629    0.741     0.842         0.749
C          Dark Green   12      3.632    2.527     2.447         2.225
C          Dark Olive   13      3.011    1.888     1.647         1.744
C          Dark Olive   14      2.562    1.750     1.554         1.883
C          Dark Olive   15      1.733    1.400     1.276         1.440
C          Dark Olive   16      2.308    1.933     1.772         2.049
C          Dark Olive   17      2.978    1.861     1.564         1.333
C          Dark Olive   18      2.107    1.303     1.106         1.070
C          Light Olive  19      1.832    0.954     0.839         0.815
C          Light Olive  20      3.055    1.561     1.346         1.407
C          Light Olive  21      1.776    1.198     1.244         1.156
C          Light Olive  22      2.152    1.281     1.250         1.165
C          Light Olive  23      1.579    0.630     0.781         0.605
C          Light Olive  24      1.681    0.665     0.837         0.654
C          Blue         25      4.18     2.734     2.798         2.560
C          Blue         26      6.03     3.296     3.051         2.751
C          Blue         27      1.81     1.089     1.077         0.972
C          Blue         28      5.32     3.427     3.488         3.054
C          Blue         29      3.98     2.470     2.578         2.287
C          Brown        30      2.721    1.241     1.042         1.019
C          Brown        31      4.590    2.122     1.766         1.843
C          Brown        32      1.794    2.012     1.163         1.365
C          Brown        33      3.133    3.492     2.024         2.359
C          Brown        34      1.788    0.884     0.828         0.771
C          Brown        35      1.407    0.701     0.678         0.628
I          Dark Green   36      3.893    3.027     2.895         2.707
I          Dark Green   37      4.940    3.508     3.397         3.181
I          Dark Green   38      3.220    2.074     1.848         1.937
I          Dark Olive   39      2.821    1.724     1.451         1.321
I          Dark Olive   40      3.408    2.090     1.796         1.764
I          Blue         41      0.971    0.445     0.454         0.420
I          Blue         42      3.425    2.035     1.989         1.933
I          Blue         43      4.606    2.399     2.152         2.004
I          Blue         44      3.043    1.471     1.378         1.309
I          Blue         45      3.703    2.128     2.277         2.013
I          Blue         46      2.410    1.331     1.245         1.247


3.6 Psychophysical Experiment

3.6.1 Pilot Test. Prior to experimenting, a pilot test was run to

determine the usefulness of oral instructions, the efficiency of the

questionnaire, and the appropriate time to allocate for completion.

3.6.2 Observers. Human observers were involved in this stage of

research as approved by The Ohio State University Institutional Review

Board. The observer panel was a purposive sample of 59 students enrolled

in Textiles and Clothing 371, autumn 2004, at the Ohio State University and

each was given extra credit for the course and entered into a drawing in

which 3 observers were awarded $100 each. These observers did not have

previous experience in industrial color matching and had normal to superior

color vision as tested by the Farnsworth-Munsell 100 Hue Test, which tests

for defective color vision and low color discrimination abilities.

3.6.3 Experimental Assembly. In the light box, placed against a 45º

angle stand was a 12”x15” rectangle of the same neutral gray mat board used

for mounting. This set-up utilizes the 45º/0º geometry specified by Evaluation

Procedure 9 of the AATCC (Visual, 1999). Toward the top of the mat board was the neutral standard difference pair, made of two adjacently placed 3” square Munsell grays: N4.5 (L*=46.78, a*=-1.06, b*=-0.15) and N4.75 (L*=49.48, a*=-1.60, b*=-0.22), which represent a color difference approximately equal to the average perceptual difference of the color pairs used in this experiment. There was no separation between the


samples. The color pairs were presented 2” below the neutral difference pair.

The color pairs were the same size as the neutral difference pair and adhered to the mat board with Velcro. The standard and sample were placed up against one another so that there was no space between them. This experimental assembly is diagrammed in Figure 3.3. The assembly was placed within each light box so that the viewing distance from the observer was approximately 700 ± 150 mm (Visual, 1999).

[Figure: the neutral difference pair positioned above the color pair.]

Figure 3.3: Diagram of experimental assembly.


3.6.4 Procedure: Paired comparison. Each observer was brought into

the lab four times, with at least 72 hours between evaluations. The first time

the observer was given a briefing on the purpose of the experiment and

instructions on how to carry out their part. They were also given a consent

form to be signed, indicating their willingness to participate in the experiment.

Instructions and consent form are shown in Appendix A, Figures A.1 and A.2

respectively.

A coin flip decided in which of the two light boxes the observer first viewed the samples. He or she was then seated in front of that light box in a

chair that was adjusted so that the samples were viewed at a 90º angle

(Visual, 1999). The room was darkened and the observer was given two minutes to adapt to the light source (Xu et al., 2003b). Each color pair was then presented in arbitrary order, and the observer was asked to judge which pair, the color pair or the neutral difference pair, exhibited the larger color difference

(Guan & Luo, 1999b). Each judgment was recorded by the researcher on a coding sheet (Appendix A, Figure A.3). Once all 46 color pairs had been judged, the procedure was repeated with the other daylight simulator. Each

experiment time took approximately 30 minutes.

3.7 Data Analysis

3.7.1 Visual results. Visual results or ∆V were obtained for each daylight simulator, for each repetition of the experiment, and for constant and

inconstant color pairs. To calculate ∆V, the visual probability for each color


pair, Pi,d,c,t was calculated first. The visual probability is the ratio of

observations judging the sample pair as having a larger color difference than

the standard neutral pair (Guan & Luo, 1999b):

\[ P_{i,d,c,t} = \frac{S_{i,d,c,t}}{N_{i,d,c,t}} \]

Where,

i = color pair 1 to 46

d = 1 (filtered tungsten) to 2 (F7)

c = 1 (color constant) to 2 (color inconstant)

t = 1 to 4 (repetition of experiment)

Si is the number of observations judging that the ith color

pair has a larger color difference than the reference pair,

And Ni is the number of observations for the ith color pair.

The visual color difference or, ∆V, for each color pair under each condition

was calculated as follows (Guan & Luo, 1999a, Guan & Luo, 1999b):

\[ \Delta V_{i,d,c,t} = \alpha - \log_e\!\left(\frac{1}{P_{i,d,c,t}} - 1\right) \]


In the previous equation, α was chosen such that all ∆V values would be

positive in order to ease subsequent calculations; the value α = 3 was used.
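The two equations above can be combined into a single calculation. A minimal sketch (the function name is illustrative), taking the visual probability P and returning ∆V with α = 3 as in the text:

```python
import math

def delta_v(p, alpha=3.0):
    # Delta-V from the proportion p of observations judging the color pair
    # larger than the neutral reference pair; p must lie strictly in (0, 1),
    # since the logit is undefined at the endpoints (handled in Section 3.7.4).
    return alpha - math.log(1.0 / p - 1.0)
```

At p = 0.5 (color pair and reference judged equal half the time), ∆V = α, and larger proportions give larger ∆V.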

3.7.2 Instrumental results. Instrumental results are given in terms of

∆Ei,d,c for each of the four color difference equations: CIELAB, CMC, CIE94,

CIEDE2000. This is calculated for each color pair, i = 1 to 74, for each light

source, d = 1 for tungsten, 2 for F7, and 3 for D65, and for color constancy, c

= 1 for color constant or 2 for color inconstant. These results can be seen in

Table 3.6.

3.7.3 Performance Factor/3 Analysis. In this study, it was necessary to

make comparisons between 2 sets of data, for example comparing visual

results from the 2 different daylight simulators or comparing visual results

from one simulator vs. CMC ∆E of that simulator. To enable comparisons of

the results of this research with other research studies, one statistical

measure needs to be applied. PF/3 (Performance Factor), devised by Luo

and Rigg (1987), is a single value measure that eases understanding and

comparison of color research data (Guan & Luo, 1999b). A PF/3 value of 30

means that between the two datasets there is 30% disagreement or variation

(Luo et al., 2001). Therefore a lower PF/3 value indicates better agreement.

If there is perfect agreement, PF/3 would be 0 (Xu et al., 2003b). So far,

researchers have not been able to statistically calculate a difference in PF/3’s.

Instead, they have been limited to ranking the PF/3’s to indicate higher or


lower variability. However, through work with the Statistical Consulting

Service at The Ohio State University, a statistical method, known as the

bootstrap method, was used to calculate 95% confidence intervals for PF/3’s

so that statistical differences can be determined. The bootstrap is a

computer-based method for assigning measures of accuracy to statistical

estimates. The core idea is to create replications of a statistic by randomly

sampling from the data set with replacement. For this research, the original data

were resampled 5000 times, and the lower and upper quantiles bounding 95%

of the replications were taken as the confidence limits for the PF/3 values.
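The bootstrap procedure just described can be sketched as follows. This is a minimal percentile-bootstrap illustration mirroring the 5000-resample, 95%-bound setup in the text; the function names and the use of a fixed seed are illustrative, not from the original analysis:

```python
import random

def bootstrap_ci(data, statistic, n_boot=5000, seed=0):
    # Resample `data` with replacement, recompute `statistic` on each
    # resample, and return the 2.5% and 97.5% empirical quantiles
    # as the 95% confidence bounds.
    rng = random.Random(seed)
    reps = sorted(statistic([rng.choice(data) for _ in data])
                  for _ in range(n_boot))
    return reps[int(0.025 * n_boot)], reps[int(0.975 * n_boot) - 1]
```

In this study the statistic of interest would be a PF/3 value computed on resampled (∆E, ∆V) pairs; two PF/3's are then judged different when their intervals do not overlap.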

Calculation of PF/3 combines 3 statistical measures. The first of these

measures is the coefficient of variation (CV). The coefficient of variation

measures the deviation from a linear relationship between the two datasets.

The smaller the CV, the more correlated the two datasets are (Xu et al.,

2003b). The CV is calculated as follows (Kim, Cho & Kim, 2001; Xu et al.,

2003b):


\[ CV = \frac{\sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(\Delta E_i - f\,\Delta V_i\right)^2}}{\overline{\Delta E}} \times 100 \]

where

\[ f = \frac{\sum_{i=1}^{N} \Delta E_i \,\Delta V_i}{\sum_{i=1}^{N} \left(\Delta V_i\right)^2}, \qquad \overline{\Delta E} = \frac{1}{N}\sum_{i=1}^{N} \Delta E_i \]

and, where ∆E and ∆V are any two sets of data.

The second statistical measure is the gamma factor (γ) which measures “the proportional relationship” between the two datasets (Xu et al.,

2003b) and avoids having the units of ∆E or ∆V affect the result. Gamma is calculated as follows (Kim et al., 2001; Xu et al., 2003b):

\[ \log_e(\gamma) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\log_e\!\left(\frac{\Delta E_i}{\Delta V_i}\right) - \overline{\log_e\!\left(\frac{\Delta E_i}{\Delta V_i}\right)}\,\right]^2} \]


Finally, VAB is the variance between the two sets of data and is shown

in the following equations (Kim et al., 2001):

\[ V_{AB} = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{\left(\Delta E_i - F\,\Delta V_i\right)^2}{\Delta E_i \cdot F \cdot \Delta V_i}\right]^{1/2} \]

where

\[ F = \left[\frac{\sum_{i=1}^{N} \Delta E_i / \Delta V_i}{\sum_{i=1}^{N} \Delta V_i / \Delta E_i}\right]^{1/2} \]

The three measures are combined in the equation for PF/3 (Xu et al.,

2003b):

\[ PF/3 = \frac{100\left(\gamma - 1 + V_{AB} + CV/100\right)}{3} \]
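The three measures defined in this section combine into a single PF/3 value. A minimal sketch (the function name is illustrative; the CV term uses the root-mean-square form of the deviation):

```python
import math

def pf3(dE, dV):
    # PF/3 between two paired lists of positive values (e.g. instrumental
    # dE and visual dV), combining CV, gamma, and V_AB.
    N = len(dE)
    # CV: deviation from a proportional relationship dE = f * dV
    f = sum(e * v for e, v in zip(dE, dV)) / sum(v * v for v in dV)
    mean_e = sum(dE) / N
    cv = 100.0 * math.sqrt(sum((e - f * v) ** 2
                               for e, v in zip(dE, dV)) / N) / mean_e
    # gamma: spread of log(dE/dV) around its mean, exponentiated back
    logs = [math.log(e / v) for e, v in zip(dE, dV)]
    mlog = sum(logs) / N
    gamma = math.exp(math.sqrt(sum((l - mlog) ** 2 for l in logs) / N))
    # V_AB: relative deviation with scaling factor F
    F = math.sqrt(sum(e / v for e, v in zip(dE, dV)) /
                  sum(v / e for e, v in zip(dE, dV)))
    vab = math.sqrt(sum((e - F * v) ** 2 / (e * F * v)
                        for e, v in zip(dE, dV)) / N)
    return 100.0 * (gamma - 1.0 + vab + cv / 100.0) / 3.0
```

When the two datasets agree perfectly up to a constant scale factor, CV = 0, γ = 1, and V_AB = 0, so PF/3 = 0, matching the interpretation given above.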

3.7.4 Assessment of observer accuracy. Visual results were

calculated for each light source and for each individual under each

observation time. For example, if during experiment 1, observer 1 indicated

that the color pair showed a larger difference than the reference pair, then

P=1/1. During the calculation of ∆V this value would make the logarithm undefined. Therefore, a correction was needed to ensure definable values.


The ∆V equation was rewritten by replacing Pi with Si/Ni:

\[ \Delta V = \alpha - \log_e\!\left(\frac{1 - S_i/N_i}{S_i/N_i}\right) \]

This equation can be simplified as:

\[ \Delta V = \alpha - \log_e\!\left(\frac{N_i - S_i}{S_i}\right) \]

A device employed by statisticians to eliminate undefined values is to add the value 0.5 to both the numerator and to the denominator of the logit function. This results in the equation:

\[ \Delta V = \alpha - \log_e\!\left(\frac{N_i - S_i + 0.5}{S_i + 0.5}\right) \]
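This corrected form can be sketched directly; the function name is illustrative, and the sketch assumes the 0.5 is added to the numerator and denominator of the (N − S)/S ratio, as the text describes for the logit:

```python
import math

def delta_v_single(S, N, alpha=3.0):
    # Empirical-logit form of Delta-V: adding 0.5 to numerator and
    # denominator keeps the logarithm defined even when S == 0 or S == N
    # (e.g. a single observer, N = 1, always or never judging "larger").
    return alpha - math.log((N - S + 0.5) / (S + 0.5))
```

With S = N/2 the correction cancels and ∆V = α, while the previously undefined endpoint cases now yield finite values.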

Each of the 4 ∆V’s for each light source was compared to the visual results of all observers for the specific experiment time using the PF/3 measure. The four PF/3 values for each observer for each light source were then averaged to characterize that observer’s accuracy (Xu et al., 2003b).


These results were used to determine the degree of variation of observers and were compared to previous research. To assess under which simulator observer accuracy is better, paired t-tests were run and α=.05 was chosen.

3.7.5 Assessment of observer repeatability. Visual results were

calculated for each observer across all 4 experiments. In this situation, visual

probability, as a count out of 4 observations, can range from 0/4 to 4/4, making it necessary to employ the

method discussed in section 3.7.4 to calculate visual results. These visual

results for each observer were then compared to visual results for all

observers across all experiments using the PF/3 measure (Xu et al., 2003b).

These results were also used to determine variation of observers and which

simulator observer repeatability is better through paired t-tests.

3.7.6 Assessment of inter- and intra-observer reliability. Since binary

ratings were used in the paired comparison procedure, Kappa coefficients

(also known as Kappa statistics) were used to calculate inter- and intra-observer reliability. Kappa coefficients range from -1 to 1, with values above 0 indicating better-than-chance agreement; they are often interpreted in the following way (Landis and Koch, 1977):

• Poor agreement = less than 0.20
• Fair agreement = 0.20 to 0.40
• Moderate agreement = 0.40 to 0.60
• Good agreement = 0.60 to 0.80
• Very good agreement = 0.80 to 1.00


Others interpret kappa coefficients differently (Cicchetti, 1994):

• Poor agreement = less than 0.40
• Fair agreement = 0.40 to 0.59
• Good agreement = 0.60 to 0.74
• Excellent agreement = 0.75 to 1.00
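For the binary judgments used here, the kappa coefficient can be sketched in a few lines. This is a minimal illustration of Cohen's kappa (function name illustrative); it assumes two 0/1 rating sequences over the same color pairs, such as one observer's judgments in two sessions:

```python
def cohen_kappa(r1, r2):
    # Cohen's kappa for two binary (0/1) rating sequences.
    # Assumes the chance agreement pe is strictly less than 1
    # (i.e. the ratings are not all one constant value in both sequences).
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n    # observed agreement
    p1, p2 = sum(r1) / n, sum(r2) / n               # marginal "1" rates
    pe = p1 * p2 + (1 - p1) * (1 - p2)              # chance agreement
    return (po - pe) / (1 - pe)
```

Identical sequences give kappa = 1, and perfectly inverted sequences give kappa = -1, illustrating the full range of the statistic.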

This method of analysis was used to assess variation within and between

observers and to determine the difference between simulators. The Kappa

statistics for intra-observer reliability for each simulator were compared using

a paired t-test. However, for inter-observer reliability, only 4 values were

calculated for each simulator, one for each experiment. Therefore a Wilcoxon signed rank test was used since it is the non-parametric equivalent of a paired t-test.

3.7.7 Assessment of observer drift. A time series analysis of PF/3 results for observer accuracy was employed to assess whether observer judgments changed over time. Time series plots and an ANOVA including the factors date, time of day, and simulator were also used to determine which factors influence observer accuracy.

3.7.8 Comparing simulators. Agreement between the visual results obtained under the two simulators was analyzed with both PF/3 and paired t-tests. PF/3’s were calculated between the visual results for each simulator, for all samples and for color constant and color inconstant samples separately. Paired t-tests were also run between the simulators for both types of samples (Lam & Xin, 2002). This analysis indicates the amount of variation between simulators.


3.7.9 Performance of color difference formulae. The performance of the color difference formulae was calculated with PF/3, comparing the ∆E for each color difference equation, calculated with each SPD, against the visual results from the first experiment for each observer. Using the bootstrap method, 95% confidence intervals were constructed for each PF/3. A color difference equation was judged significantly better than another only if no overlap existed between their confidence intervals.
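The percentile bootstrap used for these intervals can be sketched as follows: resample the per-observer values with replacement many times, recompute the statistic on each resample, and take the 2.5th and 97.5th percentiles. (The values below are invented, and the mean stands in for the PF/3 computation; the resampling logic is the same.)

```python
import random

def bootstrap_ci(values, stat=lambda v: sum(v) / len(v),
                 n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for a statistic of a sample."""
    rng = random.Random(seed)
    n = len(values)
    boots = sorted(stat([rng.choice(values) for _ in range(n)])
                   for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-observer PF/3 values for one color difference equation
pf3 = [51.1, 42.9, 46.2, 39.5, 55.0, 48.3, 44.7, 50.2]
lo, hi = bootstrap_ci(pf3)
# Two equations are declared different only if their intervals do not overlap
```

The overlap criterion is conservative: intervals that barely overlap can still correspond to a significant difference under a direct test, which is why no formula separates from the others in the analyses below.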

3.7.10 Assessment of quality of simulators. The quality of each simulator was assessed with three measures: MIvis; PF/3’s of visual results vs. color difference equations calculated with the standard reference illuminant; and PF/3’s calculated between ∆E’s calculated with the SPD of the simulator and ∆E’s calculated with the standard reference illuminant. 95% confidence intervals were calculated for each PF/3 to help determine which simulator performed better.


CHAPTER 4

RESULTS AND DISCUSSION

4.1 Observer Performance

4.1.1 Observer Accuracy. The PF/3’s for all 4 experiments for each observer were averaged and recorded in Table 4.1. These PF/3’s are quite large relative to those found by Xu et al. (2003b), whose PF/3’s ranged from 27 to 42 with an average of 33, whereas this research found PF/3’s ranging from 46.6 to 81.0 with an average of 64. This, however, does not necessarily imply that the observers were less accurate, since the differences between these studies include the method of visual assessment, the panel of observers, the type of samples, and the number of repetitions. When a process is repeated 4 times instead of 2, there is more room for variation. Differences in observer accuracy from other studies may also be due to the color centers used in this research, which were selected to represent areas of color space not well covered in previous studies.


The results reveal a significant difference in observer accuracy under the 2 simulators. A paired t-test of the two light boxes indicates that the average PF/3’s for observer accuracy under filtered tungsten (M=54.69, SD=3.31) are significantly different from those for F7 (M=73.22, SD=3.82), t(59)=42.96, p=0.000. In fact, average PF/3’s for observer accuracy are lower under filtered tungsten, implying that observers are more accurate when viewing color pairs under filtered tungsten than under F7.

4.1.2 Observer repeatability. Observer repeatability was calculated across the 4 experiments for each observer in terms of PF/3’s (Table 4.1). Similar to the results for observer accuracy, these results are much larger than those reported in other research. Xu et al.’s (2003b) results range from 20 to 32 with an average of 26, whereas PF/3’s for observer repeatability in this research range from 28.6 to 124.6 with an average of 64.8. Consequently, on average, observers did not repeat previous judgments 65% of the time. This variation is much larger than estimated by Kuehni (2003b) in a review of previous research. However, observers here are not necessarily less repeatable than those in previous studies: repeatability is expected to worsen (i.e., PF/3 to increase) when more repetitions are performed, and it is rare for color researchers to perform more than 2 repetitions of a color matching experiment. With the increased number of repetitions, this research indicates that observers are less accurate and less repeatable than previously thought. It is also interesting to note that observers with lower PF/3’s for repeatability tend to have lower PF/3’s for accuracy, indicating that observers who are more repeatable are also more accurate; however, this was not statistically assessed. This is contrary to the implications of Xu et al.’s (2003b) research, where no direct relationship between accuracy and repeatability was implied.

Like observer accuracy, observer repeatability calculations reflect a significant difference between simulators. A paired t-test of the two light boxes indicated that PF/3’s for observer repeatability under filtered tungsten (M=46.92, SD=8.66) are significantly different from those for F7 (M=89.80, SD=12.48), t(59)=28.20, p=0.000, and on average are smaller. Observers tend to have better repeatability when viewing samples under filtered tungsten.


           PF/3 (Observer Accuracy)          PF/3 (Observer Repeatability)
Observer   Filtered Tungsten  F7     Avg     Filtered Tungsten  F7      Avg
1          53.7               70.4   62.1    44.7               81.6    63.1
2          55.1               76.6   65.8    43.9               116.4   80.1
3          46.6               65.2   55.9    35.4               70.0    52.7
4          50.7               70.9   60.8    45.8               78.2    62.0
5          50.8               70.1   60.5    37.5               73.0    55.3
6          52.7               70.7   61.7    46.0               86.1    66.0
7          56.0               74.6   65.3    54.3               95.9    75.1
8          52.6               69.9   61.3    53.7               83.7    68.7
9          55.8               70.8   63.3    45.9               75.6    60.7
10         58.9               72.8   65.8    48.0               90.1    69.1
11         60.5               75.3   67.9    52.0               96.9    74.5
12         55.9               78.5   67.2    46.0               86.6    66.3
13         55.3               80.2   67.7    49.7               106.6   78.1
14         54.3               72.5   63.4    48.7               83.8    66.2
15         58.0               73.3   65.6    56.0               81.3    68.7
16         48.7               68.8   58.7    57.9               78.9    68.4
17         51.1               69.0   60.0    37.3               82.2    59.8
18         59.6               81.0   70.3    61.7               104.7   83.2
19         48.9               72.4   60.7    34.3               90.6    62.4
20         57.8               74.7   66.2    44.0               105.9   75.0
21         55.3               72.4   63.8    41.3               82.5    61.9
22         48.7               71.0   59.9    28.6               78.7    53.7
23         55.6               69.9   62.7    41.9               88.6    65.2
24         53.8               73.0   63.4    34.2               73.2    53.7
25         50.5               68.0   59.2    49.4               88.0    68.7
26         49.3               64.5   56.9    43.3               61.5    52.4
27         49.0               73.5   61.3    42.8               85.7    64.2
28         56.3               73.4   64.8    45.1               79.4    62.2
29         50.3               72.6   61.5    42.0               86.9    64.5
30         52.3               72.4   62.3    35.3               95.6    65.5
31         55.1               77.3   66.2    45.1               96.3    70.7
32         62.1               72.2   67.2    49.5               92.9    71.2
33         59.5               70.6   65.1    48.6               73.0    60.8
34         53.6               76.1   64.8    43.4               95.4    69.4
35         53.9               69.5   61.7    47.5               79.8    63.7
36         54.5               72.0   63.2    47.5               82.3    64.9
37         57.1               78.7   67.9    52.3               103.1   77.7
38         57.6               73.2   65.4    62.9               90.1    76.5
39         56.0               77.6   66.8    43.6               109.4   76.5
40         51.1               68.8   60.0    38.7               80.0    59.3
Continued

Table 4.1: PF/3 results for observer accuracy and repeatability.


Table 4.1 continued

           PF/3 (Observer Accuracy)          PF/3 (Observer Repeatability)
Observer   Filtered Tungsten  F7     Avg     Filtered Tungsten  F7      Avg
41         50.0               71.1   60.6    29.3               73.3    51.3
42         53.7               73.5   63.6    49.2               81.4    65.3
43         51.4               72.1   61.7    47.2               87.5    67.3
44         49.7               68.0   58.8    44.2               85.6    64.9
45         59.9               79.5   69.7    61.0               124.3   92.7
46         55.8               79.8   67.8    48.8               107.1   78.0
47         69.4               80.3   74.9    64.8               101.7   83.2
48         54.9               73.0   63.9    45.6               85.9    65.7
49         56.3               77.0   66.7    52.9               101.7   77.3
50         59.3               77.9   68.6    36.9               93.5    65.2
51         57.2               75.9   66.5    67.6               105.7   86.7
52         55.2               72.4   63.8    48.7               81.0    64.8
53         53.2               69.6   61.4    53.3               85.9    69.6
54         55.9               79.3   67.6    39.6               101.8   70.7
55         56.2               73.6   64.9    49.3               86.4    67.8
56         55.6               75.3   65.4    46.4               103.4   74.9
57         56.0               71.7   63.9    60.2               112.2   86.2
58         49.7               68.5   59.1    34.0               90.0    62.0
59         63.1               76.9   70.0    64.1               99.5    81.8
Avg        54.7               73.2   64.0    46.9               89.8    68.4
Min        46.6               64.5   55.9    28.6               61.5    51.3
Max        69.4               81.0   74.9    67.6               124.3   92.7

4.1.3 Drift of observer results. Drift of observer results was investigated in 2 ways: time series plots of observer accuracy for each experiment in succession, and analysis of variance in observer accuracy by date of experiment, time of day, simulator, and experiment number. Time series plots for each simulator, graphing observer accuracy against experiment number, are shown below (Figures 4.1-4.2). For the filtered tungsten light source, experiment number appears to be related to observer accuracy: as experiment number increases, PF/3’s for observer accuracy decrease. Figure 4.2 shows that under F7, observers are also more accurate in subsequent experiments. Under F7, the slope between experiments 1 and 3 is steeper than for filtered tungsten, and changes in accuracy level off by experiment 4 under each simulator. Since, for both simulators, PF/3’s for accuracy decrease over time, observers became more accurate with experience, and a training effect may be present.

[Figure: time series plot of observer accuracy (PF/3) vs. experiment number (1-4) for the filtered tungsten light box across the four experiments, one line per observer.]

Figure 4.1: Time series plot of observer accuracy vs. experiment number for filtered tungsten.


[Figure: time series plot of observer accuracy (PF/3) vs. experiment number (1-4) for the F7 light box across the four experiments, one line per observer.]

Figure 4.2: Time series plot of observer accuracy vs. experiment number for F7.

To verify the presence of a training effect, an ANOVA of observer accuracy was run with the following variables: date of experiment, time of day, simulator, and experiment number (Table 4.2). Neither date of experiment nor time of day has a significant effect on observer accuracy, though this may be attributed to their high degrees of freedom. On the other hand, both simulator and experiment number, each with low degrees of freedom, have significant effects (p=.0001) on observer accuracy. The significant effect of simulator is consistent with the paired t-test of observer accuracy by simulator: as already shown, observer accuracy is better when samples are viewed in the filtered tungsten simulator. The ANOVA results confirm the presence of a training effect, as indicated in Figures 4.1 and 4.2.

Effect       Num DF   Den DF   F         p-value
Date         34       359      1.24      .1716
Time         16       359      0.82      .6587
Simulator    1        359      1111.08   .0001
Experiment   3        359      184.63    .0001

Table 4.2: ANOVA of date of experiment, time, and simulator on observer accuracy.
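For intuition, the F test underlying this ANOVA can be illustrated with a one-way reduction of the model, grouping accuracy scores by experiment number alone; the F statistic is the ratio of between-group to within-group mean squares. (The scores below are invented; the study fit a multi-factor model.)

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    # between-group sum of squares: group means vs. grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # within-group sum of squares: observations vs. their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical PF/3 accuracy scores grouped by experiment number 1-4
exp_groups = [
    [83, 80, 85], [78, 75, 80], [52, 50, 54], [47, 45, 49],
]
f_stat = one_way_f(exp_groups)  # large F => experiment number matters
```

A large F, as in Table 4.2 for simulator and experiment number, means the between-group variation dwarfs the residual variation.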

4.1.4 Inter- and intra-observer reliability. Humans are notoriously inconsistent, and when they are used as part of an experimental procedure, the consistency of results may be questioned. This certainly holds true for color difference judgments, as color vision is not static over time. Reliability calculations are used to assess the degree to which different observers give consistent estimates of the same phenomenon. Inter-observer reliability relates to the issue of whether several observers would judge color difference similarly to each other when the color samples are viewed under the same conditions. Since measures were dichotomous, Kappa statistics were used to assess inter-observer reliability for each of the 4 experiments (Table 4.3). The maximum Kappa coefficient for either simulator is .33075. According to the scale discussed in Chapter 3, there is a poor level of agreement among observers. This is not a surprising result, since all persons differ in color vision. A Wilcoxon signed rank test for the difference between simulators does not show a difference in inter-observer reliability between the simulators, W(4)=2.0, p=.361.

             Filtered Tungsten                       F7
Experiment   Kappa Coeff.  Std. Error  z-value      Kappa Coeff.  Std. Error  z-value
1            .33075        .0036       92.79        .30316        .0036       85.05
2            .28356        .0036       79.55        .30316        .0036       85.05
3            .28356        .0036       79.55        .23206        .0036       65.10
4            .26581        .0036       74.57        .25887        .0036       72.63
p-value < .0001 in all cases

Table 4.3: Kappa coefficients for inter-observer reliability.

Another issue with the consistency of human observers is intra-observer reliability, i.e., whether an individual observer rates color difference similarly in repeated experiments. Since one goal of this research was to assess the variability of each individual observer, the experiment was repeated 4 times in order to gain a valid assessment of variability. Kappa coefficients were calculated for each individual observer across the 4 repetitions for each simulator (Table 4.4). In general, the Kappa coefficients show poor agreement within an observer. Results for filtered tungsten ranged from .029 (very poor) to .712 (good), while the range for F7 is smaller (.138 to .652). However, a paired t-test of the two simulators indicates that Kappa coefficients for intra-observer reliability under filtered tungsten (M=0.377, SD=0.135) are not significantly different from those for F7 (M=0.373, SD=0.119), t(59)=0.28, p=0.780. Consequently, intra-observer reliability is not improved by either simulator, and agreement within observers is poor overall. On average, the observers disagree with themselves 63% of the time, which is similar to the average PF/3 for observer repeatability (64.8). This result, too, is much larger than the 30% estimated by Kuehni (2003b).

Since observers disagree with themselves so often, perhaps small color differences do not even matter to the average consumer. In such a case, the precision of ∆E may not need to be so tight with regard to textile products.

Because color difference equations are created by fitting visual results that have a degree of variability comparable to that obtained in this research, the equations themselves may be unreliable. Since observers are so variable, it is unlikely that any color difference formula will achieve better agreement with average visual judgments than is already achieved.

On the other hand, it is possible that some observers are more accurate and repeatable, and there appears to be a training effect in which observers’ evaluations come closer to the ∆E’s calculated from instrumental measurements; the instrumental measurements are therefore likely to be valid. They reflect the color differences evaluated by more sophisticated, trained observers. Thus, if these measurements predict the judgments of trained observers, and trained observers are the most discriminating in color difference judgments, then color differences identified by instrumental measures should also be acceptable to the population at large, which has much less discriminating color judgment.


           Filtered Tungsten                      F7
Observer   Kappa Coeff.  Std. Error  z-value     Kappa Coeff.  Std. Error  z-value
1          .35439        .0610       5.89*       .36625        .0610       6.08*
2          .30269        .0610       5.03*       .41413        .0610       6.88*
3          .62851        .0610       10.44*      .60539        .0610       10.06*
4          .51970        .0610       8.62*       .3047         .0610       6.32*
5          .41130        .0610       8.63*       .63647        .0610       6.04*
6          .41781        .0610       6.94*       .44016        .0610       7.31*
7          .43240        .0610       7.18*       .28420        .0610       4.72*
8          .51810        .0610       8.61*       .50136        .0610       8.33*
9          .24931        .0610       4.14*       .18542        .0610       3.08**
10         .28707        .0610       4.77*       .13750        .0610       2.28**
11         .087302       .0610       1.45***     .22421        .0610       3.72*
12         .36204        .0610       6.01*       .18400        .0610       3.06**
13         .30517        .0610       5.07*       .27722        .0610       4.61*
14         .36976        .0610       6.14*       .43664        .0610       7.25*
15         .31086        .0610       5.16*       .28644        .0610       4.76*
16         .71163        .0610       11.82*      .40050        .0610       6.65*
17         .41587        .0610       6.91*       .51612        .0610       8.57*
18         .37391        .0610       6.21*       .26326        .0610       4.37*
19         .48232        .0610       8.01*       .35662        .0610       5.92*
20         .23178        .0610       3.85*       .31365        .0610       5.21*
21         .34572        .0610       5.74*       .40108        .0610       6.66*
22         .43034        .0610       7.15*       .35672        .0610       5.93*
23         .45336        .0610       7.53*       .53885        .0610       8.95*
24         .31852        .0610       5.29*       .19331        .0610       3.21**
25         .62686        .0610       10.41*      .65201        .0610       10.83*
26         .51194        .0610       8.50*       .39293        .0610       6.52*
27         .45449        .0610       7.55*       .28882        .0610       4.80*
28         .33049        .0610       5.49*       .28265        .0610       4.70*
29         .51770        .0610       8.60*       .49872        .0610       8.29*
30         .26984        .0610       4.48*       .45507        .0610       7.56*
31         .35132        .0610       5.84*       .23554        .0610       3.91*
32         .18841        .0610       3.13**      .50980        .0610       8.47*
33         .097258       .0610       1.62**      .29757        .0610       4.94*
34         .38399        .0610       6.38*       .26414        .0610       4.39*
35         .47045        .0610       7.82*       .46199        .0610       7.68*
36         .34884        .0610       5.80*       .46377        .0610       7.70*
37         .25612        .0610       4.25*       .29103        .0610       4.83*
38         .48031        .0610       7.98*       .36234        .0610       6.02*
39         .29429        .0610       4.89*       .43834        .0610       7.28*
40         .38841        .0610       6.45*       .44704        .0610       7.43*
Continued

Table 4.4: Kappa coefficients for intra-observer reliability.


Table 4.4 continued

           Filtered Tungsten                      F7
Observer   Kappa Coeff.  Std. Error  z-value     Kappa Coeff.  Std. Error  z-value
41         .40326        .0610       6.67*       .35672        .0610       5.93*
42         .46793        .0610       7.77*       .34720        .0610       5.77*
43         .54896        .0610       9.12*       .44484        .0610       7.39*
44         .50631        .0610       8.41*       .57854        .0610       9.61*
45         .33130        .0610       5.50*       .44449        .0610       7.38*
46         .39791        .0610       6.61*       .26209        .0610       4.35*
47         .029125       .0610       0.48****    .22456        .0610       3.73*
48         .39102        .0610       6.50*       .44692        .0610       4.42*
49         .29911        .0610       4.97*       .31908        .0610       5.30*
50         .15610        .0610       2.59**      .28726        .0610       4.77*
51         .52850        .0610       8.78*       .29757        .0610       7.94*
52         .45449        .0610       7.55*       .37811        .0610       6.28*
53         .49994        .0610       8.31*       .54007        .0610       8.97*
54         .13173        .0610       2.19**      .19298        .0610       3.21**
55         .35699        .0610       5.93*       .34301        .0610       5.70*
56         .34050        .0610       5.66*       .38268        .0610       6.36*
57         .46014        .0610       7.64*       .49598        .0610       8.24*
58         .47292        .0610       7.86*       .40203        .0610       6.68*
59         .17857        .0610       2.97**      .26766        .0610       4.45*
*p-value<.0001  **p-value<.05  ***p-value<.07  ****p-value<.3142

4.2 D65 Simulator Performance

4.2.1 Comparing simulators. Performance of the simulators was analyzed with both PF/3 values and paired t-tests. PF/3’s were calculated between the visual results obtained under each simulator (Table 4.5). For the full set of color samples, including both constant and inconstant colors, a PF/3 of 45.33 was found, indicating considerable variation between simulators. This is high compared to the results of Xu et al. (2003b), which range from 11 to 52 with an average of 21. Again, though, one must keep in mind that the current study differs in the method of comparison, panel of observers, and samples, leading to different PF/3’s. While this research indicates a larger amount of variation between simulators, this variation is smaller than that contributed by observers: the PF/3 of 45.33 between the visual results obtained under each simulator is approximately 20 units less than the average PF/3’s for observer accuracy and repeatability.

                     PF/3    95% lower CI   95% upper CI
All samples          45.33   11.58          73.56
Constant samples     51.00   11.51          83.46
Inconstant samples   15.42   6.22           22.46

Table 4.5: PF/3’s of visual results between simulators.

A paired t-test of the difference between visual results for the two simulators demonstrated that filtered tungsten (M=2.52, SD=1.42) is significantly different from F7 (M=3.24, SD=1.53), t(46)=-14.22, p=0.000. When testing only constant colors, filtered tungsten (M=2.34, SD=1.35) is significantly different from F7 (M=2.99, SD=1.40), t(35)=-14.69, p=0.000. The same result was found when testing only inconstant colors: filtered tungsten (M=3.10, SD=1.55) is significantly different from F7 (M=4.05, SD=1.71), t(11)=-6.62, p=0.000. These results demonstrate what has been shown in previous analyses: the light sources are different, and observers judge colors differently under each. PF/3 values also indicate that the difference between the visual observations obtained under the two simulators is large. Consequently, it can be inferred that the difference in simulators contributes to overall disagreement in color matching, and this difference is a major contributor to color difference variation. With only two simulators to compare, however, it is impossible to determine which is better with this analysis.

It must be noted, however, that the results from the comparison of simulators may not be dependable, due to an issue with the standard neutral difference pair that was discovered after experimentation began. Many observers noted that the color difference between the standard neutral pair seemed larger when viewed under F7 than under filtered tungsten. Further investigation, however, revealed that ∆E’s of the grays, calculated with the appropriate SPD, did not differ greatly (Table 4.6).

∆E                 Filtered Tungsten   F7     D65
CIELAB             2.74                2.76   2.76
CMC(2:1)           1.46                1.48   1.51
CIE94(2:1:1)       1.43                1.44   1.45
CIEDE2000(2:1:1)   1.51                1.53   1.53

Table 4.6: Color differences of standard neutral reference pair under each simulator.

Although the formulae indicate only small differences in ∆E of the gray pairs, an additional experiment was performed in which the simulators were placed side by side with a thin divider between them. The same standard difference pair was placed on 45º angle boards in each simulator. Eleven observers with average to superior color vision, as tested by the 100 Hue Test, were positioned with their foreheads at the same point on the divider so that the left eye could view only F7 and the right eye could view only filtered tungsten. Observers remained in this position for 5 minutes and were then told that the gray pair on the left (F7) had a color difference of 1. They were asked to indicate the color difference of the gray pair illuminated by filtered tungsten (Table 4.7). 73% of the observers perceived the color difference of the gray pair to be larger under F7. This further shows that color appears different under different D65 simulators and that color difference calculated by equations does not accurately predict visual observations.

Observer   Color difference under filtered tungsten
1          .75
2          .3
3          .5
4          .6
5          .7
6          1
7          1
8          2
9          .9
10         .5
11         .5

Table 4.7: Observer indications of color difference of standard neutral difference pair under filtered tungsten relative to a color difference of 1 under F7.


4.2.2 Performance of formulae. Four color difference formulae were tested: CIELAB, CMC(2:1), CIE94(2:1:1), and CIEDE2000(2:1:1). The performance of these formulae was analyzed using PF/3’s of visual results vs. the ∆E calculated with the SPD of the appropriate simulator. This was done for all experiments, combined and separated, for all samples, and for samples separated into constant and inconstant groups (Table 4.8).

Previous studies use a simple ranking of PF/3’s to determine the performance of color difference equations. When looking at all samples, CIEDE2000 performs best based on the ranking method, and the same result holds for each experiment. PF/3’s ranged from 35.8 to 99.6, indicating that agreement between visual observations and color difference equations is lower than found by Luo et al. (2001); in their comparison of color difference formulae against 5 major sets of visual results, PF/3’s ranged from 19.0 to 70.3 for the same equations compared in this research (Luo et al., 2001). For the data investigated here, though, no equation shows a significant difference over any other. After using a bootstrap method to derive the probability distribution of PF/3, the null hypothesis of no difference among color difference equations could not be rejected at the α=.05 level; the 95% confidence intervals overlap. Since there is no significant difference between equations, there is no justification for using a different equation than the one already employed. The confidence intervals for CIEDE2000 have only a small overlap with those for CIELAB, suggesting, though not statistically confirming, that CIEDE2000 tends to perform better than CIELAB.

Separating out constant and inconstant samples also does not statistically demonstrate an advantage of one color difference equation over another, as all confidence intervals overlap. It appears that PF/3’s for inconstant samples are smaller than those for constant samples, but this is not statistically significant. The inconstant samples display larger color differences than constant samples, thereby increasing agreement between observers and equations. Therefore, nothing can be concluded from this research about the variation in color difference studies caused by color inconstancy.

An interesting result of this analysis is that PF/3’s tend to decrease in later experiments. Since ∆E’s of the color samples do not change between experiments, this must be attributed to changes in visual judgments, indicating that observer judgments of color differences are more closely represented by color difference equations after observers gain experience in color matching. This result is similar to the analysis of observer drift, where observer accuracy increased with subsequent experiments (Figures 4.1 and 4.2).


All Samples
                   Filtered Tungsten                  F7
                   PF/3   95% lower CI  95% upper CI  PF/3   95% lower CI  95% upper CI

All experiments
CIELAB             51.1   40.2          60.0          99.6   42.3          147.7
CMC(2:1)           42.9   32.1          51.5          83.7   31.1          127.7
CIE94(2:1:1)       42.4   34.3          48.6          84.6   35.5          125.7
CIEDE2000(2:1:1)   35.7   27.4          42.0          73.3   32.7          107.5

Experiment 1
CIELAB             83.7   42.5          118.7         94.8   70.8          160.0
CMC(2:1)           73.5   36.3          105.1         81.1   55.5          144.2
CIE94(2:1:1)       71.3   36.5          100.7         85.5   59.9          147.2
CIEDE2000(2:1:1)   62.3   32.9          87.3          84.9   50.3          147.1

Experiment 2
CIELAB             80.7   39.6          116.0         58.9   48.9          71.3
CMC(2:1)           72.0   34.8          104.1         45.5   37.7          55.5
CIE94(2:1:1)       68.7   33.9          98.4          49.4   41.0          58.9
CIEDE2000(2:1:1)   58.6   29.8          83.1          45.8   36.3          54.5

Experiment 3
CIELAB             51.1   41.6          58.5          52.0   41.5          60.9
CMC(2:1)           37.8   30.6          43.7          39.2   31.1          45.9
CIE94(2:1:1)       43.6   35.9          49.6          42.2   33.6          49.3
CIEDE2000(2:1:1)   37.2   29.9          43.2          38.0   27.7          46.7

Experiment 4
CIELAB             46.9   37.2          75.4          53.4   42.1          62.4
CMC(2:1)           41.1   28.9          79.1          40.7   32.9          46.8
CIE94(2:1:1)       40.6   29.3          78.5          43.9   34.7          51.2
CIEDE2000(2:1:1)   35.8   23.8          76.1          39.2   30.5          46.4

Continued

Table 4.8: Performance of color difference equations (PF/3’s of ∆V vs. ∆E(sim.)).


Table 4.8 continued

Constant Samples
                   Filtered Tungsten                  F7
                   PF/3   95% lower CI  95% upper CI  PF/3    95% lower CI  95% upper CI

All experiments
CIELAB             54.9   41.4          65.4          110.7   41.6          168.5
CMC(2:1)           46.8   33.4          57.2          93.4    30.0          146.0
CIE94(2:1:1)       45.6   35.8          52.7          95.1    35.5          144.5
CIEDE2000(2:1:1)   37.5   27.7          45.1          81.4    32.2          122.3

Experiment 1
CIELAB             93.6   46.0          134.0         104.1   42.5          156.1
CMC(2:1)           83.4   39.7          120.6         89.3    28.5          146.1
CIE94(2:1:1)       80.0   39.1          114.2         95.1    36.3          152.7
CIEDE2000(2:1:1)   69.4   34.5          98.7          94.4    32.3          164.6

Experiment 2
CIELAB             91.2   41.4          131.7         62.3    50.1          76.3
CMC(2:1)           82.8   37.2          120.1         47.8    38.2          59.2
CIE94(2:1:1)       78.2   35.6          112.4         52.9    42.5          63.7
CIEDE2000(2:1:1)   66.2   31.1          94.4          48.3    36.2          58.4

Experiment 3
CIELAB             54.6   43.5          63.2          55.4    42.7          65.7
CMC(2:1)           40.0   31.9          46.4          41.7    32.1          49.5
CIE94(2:1:1)       46.8   38.0          53.3          45.6    35.5          53.6
CIEDE2000(2:1:1)   39.2   30.5          45.9          40.3    27.8          50.6

Experiment 4
CIELAB             49.5   39.5          57.1          53.9    40.2          65.0
CMC(2:1)           44.2   32.8          53.5          40.4    31.2          47.7
CIE94(2:1:1)       42.8   34.9          48.4          44.5    33.3          53.6
CIEDE2000(2:1:1)   37.2   27.5          44.9          38.5    27.6          47.4

Continued


Table 4.8 continued

Inconstant Samples
                   Filtered Tungsten                  F7
                   PF/3   95% lower CI  95% upper CI  PF/3   95% lower CI  95% upper CI

All experiments
CIELAB             37.7   21.0          48.9          44.4   26.1          56.3
CMC(2:1)           29.5   15.5          39.3          35.2   19.8          45.6
CIE94(2:1:1)       33.2   18.9          43.1          37.0   21.6          47.5
CIEDE2000(2:1:1)   30.6   16.2          40.8          36.0   21.7          45.7

Experiment 1
CIELAB             42.8   25.1          54.9          54.9   26.4          74.9
CMC(2:1)           34.6   18.3          46.2          45.7   18.2          65.4
CIE94(2:1:1)       38.3   21.0          50.6          46.8   19.1          66.6
CIEDE2000(2:1:1)   35.6   18.3          48.1          45.2   18.7          64.3

Experiment 2
CIELAB             35.7   20.8          45.7          45.6   28.9          56.1
CMC(2:1)           27.6   15.7          36.1          36.6   23.2          45.1
CIE94(2:1:1)       31.2   18.7          40.3          38.1   24.0          47.3
CIEDE2000(2:1:1)   28.4   15.6          37.8          37.7   24.8          45.9

Experiment 3
CIELAB             39.1   19.4          52.6          35.6   19.4          46.6
CMC(2:1)           30.8   14.9          42.0          26.9   13.6          36.2
CIE94(2:1:1)       34.1   18.6          44.8          29.1   16.1          38.0
CIEDE2000(2:1:1)   31.7   16.6          42.4          28.6   16.1          37.3

Experiment 4
CIELAB             37.4   -11.5         150.7         47.8   28.7          60.3
CMC(2:1)           29.9   13.6          123.4         38.4   22.8          48.8
CIE94(2:1:1)       34.2   3.8           133.7         40.7   25.2          50.8
CIEDE2000(2:1:1)   31.6   7.9           135.9         39.6   25.5          48.8


4.2.3 Assessment of quality of simulators. The assessment of the

quality of simulators was evaluated with 3 different measures. First the MIvis was determined as reported in Chapter 3 with the filtered tungsten and F7 simulators having a MIvis of .23 and .58 respectively. According to this

measure, filtered tungsten is shown to be a better light source.

The second measure used to assess the simulators was the comparison of PF/3’s of visual results and color difference equations calculated with the D65 illuminant. This measure is used since it can be assumed that the best agreement between observers and instrumental calculations using the standard illuminant would indicate a better simulator (Xu et al., 2003b). For this measure, 4 color difference equations were used, and results were calculated for the combination of all 4 experiments as well as for each individual experiment, and for all samples as well as for constant and inconstant samples.

If simple ranking of PF/3 values is used to determine the quality of simulators, then the filtered tungsten simulator is better, since its PF/3 values were smaller than those of F7. However, confidence intervals indicate that the PF/3’s are not statistically different from one another at the .05 level, except in a few instances where CIEDE2000 was significantly better than CIELAB. Overall, this measure does not indicate that either simulator is statistically better than the other. However, since much variation in industrial color evaluation is caused by using multiple simulators in visual judgments, variation would be reduced by using only one simulator. Therefore, this research indicates that of the two simulators tested, filtered tungsten should be used as a D65 simulator, since overall there is less variation with filtered tungsten.

The third measure used to assess the quality of the simulators is the comparison of PF/3’s of ∆E’s calculated with the SPD of the simulator and ∆E’s calculated with the D65 illuminant. This measure is used since the variability of the observers is removed; it is similar to MIvis except that it uses many more metameric samples (Xu et al., 2003b). The color samples used in this research all exhibit metamerism, since their reflectance curves all differ. As before, these PF/3’s were calculated for all samples as well as for constant and inconstant samples. Simple ranking alone indicates F7 as the better simulator; however, examination of the confidence intervals leads to the conclusion that there is no statistical difference between the PF/3 measures.

Overall, PF/3’s using visual results tend to be higher than those comparing ∆E’s. PF/3’s involving visual results show that there is at least 24% variation between observers and color difference formulae, and this result generally indicates better agreement with the color difference equations for visual judgments under filtered tungsten than for visual judgments under F7. At most, this variation is 99%. When comparing ∆E’s, variation is between 17% and 30%. Overall, it can be concluded that variations between light sources are overshadowed by the variation in visual judgments.

All Samples                                    Filtered Tungsten                 F7
                                               PF/3  95% lower  95% upper   PF/3  95% lower  95% upper

All experiments
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         46.4    33.8      57.0       96.7    36.9     146.6
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        33.8    22.5      43.0       81.5    24.4     128.8
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         34.1    24.2      42.3       81.3    27.7     126.1
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         24.6    18.4      29.4       68.0    24.8     104.3

Experiment 1
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         80.2    35.0     118.1       83.5    67.1     141.8
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        67.3    23.6     103.5       68.1    52.6     121.8
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         65.8    24.5     100.1       72.3    55.5     125.6
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         53.8    20.9      81.2       69.5    49.1     118.3

Experiment 2
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         78.3    33.5     116.6       54.2    43.8      67.9
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        66.7    23.7     103.1       39.8    31.9      51.3
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         65.0    23.9      99.7       42.6    33.9      53.9
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         52.1    19.0      80.1       38.2    29.4      47.2

Experiment 3
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         47.4    36.9      55.9       47.0    35.8      56.5
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        33.2    25.1      40.1       33.8    26.0      40.4
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         36.5    28.1      43.5       35.3    26.3      43.0
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         28.4    20.6      35.2       29.5    22.2      35.7

Experiment 4
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         41.2    30.5      73.8       49.3    37.1      59.3
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        30.3    17.7      74.6       35.9    27.3      42.9
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         30.6    17.7      77.0       37.8    27.1      46.7
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         23.8    10.1      75.0       32.0    22.9      39.7

Color calculations
  ∆E*ab (sim.) vs. ∆E*ab (CIE D65 illum.)      18.9    12.6      24.5       17.3    10.9      23.0
  ∆E*cmc (sim.) vs. ∆E*cmc (CIE D65 illum.)    29.7    15.7      42.0       19.9    13.4      25.6
  ∆E*94 (sim.) vs. ∆E*94 (CIE D65 illum.)      22.9    16.0      29.1       20.1    14.0      25.6
  ∆E*00 (sim.) vs. ∆E*00 (CIE D65 illum.)      21.8    14.6      28.2       20.1    13.3      25.9

Continued

Table 4.9: PF/3’s for performance of simulators.


Table 4.9 continued

Constant samples                               Filtered Tungsten                 F7
                                               PF/3  95% lower  95% upper   PF/3  95% lower  95% upper

All experiments
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         50.9    35.7      62.9      109.5    37.7     169.2
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        36.9    23.5      47.3       91.9    24.0     147.9
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         37.8    26.0      47.1       92.4    28.5     145.4
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         26.2    18.9      31.7       76.3    24.6     119.1

Experiment 1
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         90.7    39.6     133.8       93.3    44.1     133.1
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        76.4    27.6     117.2       76.0    31.4     117.2
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         74.9    28.6     113.6       81.4    35.6     126.0
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         60.7    23.8      91.5       78.0    35.8     131.6

Experiment 2
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         88.9    35.6     132.2       58.8    46.0      74.1
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        75.6    24.6     116.6       42.3    32.9      55.1
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         74.1    25.4     113.0       46.2    35.7      58.7
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         58.9    19.4      90.3       40.2    29.7      50.2

Experiment 3
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         47.4    36.9      55.9       51.1    37.5      62.2
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        33.2    25.1      40.1       35.2    26.0      42.7
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         36.5    28.1      43.5       37.9    27.1      46.6
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         28.4    20.6      35.2       30.1    21.7      37.1

Experiment 4
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         44.2    33.4      52.7       52.1    37.1      64.3
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        32.1    23.4      39.2       37.1    26.7      45.6
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         33.1    26.2      38.4       39.9    26.8      50.7
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         24.8    18.3      30.1       32.1    20.8      41.5

Color calculations
  ∆E*ab (sim.) vs. ∆E*ab (CIE D65 illum.)      18.9    10.3      26.0       16.9     8.2      24.0
  ∆E*cmc (sim.) vs. ∆E*cmc (CIE D65 illum.)    30.0    11.9      44.8       17.8     9.1      25.0
  ∆E*94 (sim.) vs. ∆E*94 (CIE D65 illum.)      20.7    11.8      28.1       18.2     9.6      25.3
  ∆E*00 (sim.) vs. ∆E*00 (CIE D65 illum.)      20.3    10.4      28.3       18.6     8.9      26.4

Continued


Table 4.9 continued

Inconstant samples                             Filtered Tungsten                 F7
                                               PF/3  95% lower  95% upper   PF/3  95% lower  95% upper

All experiments
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         28.1    12.9      38.8       34.4    22.4      42.2
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        19.0     7.7      27.3       25.4    14.9      33.3
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         19.0     9.6      25.8       25.9    16.7      32.8
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         18.0     9.2      24.5       26.3    15.7      34.4

Experiment 1
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         31.6    18.9      40.3       42.8    25.3      54.8
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        20.6    11.6      27.1       31.8    18.8      41.1
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         21.8    13.6      27.6       32.7    19.4      42.2
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         20.4    12.3      26.3       31.9    19.6      40.6

Experiment 2
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         27.5    11.9      38.7       37.7    25.7      45.0
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        20.1    10.0      27.6       29.7    15.8      39.6
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         20.1    11.9      26.1       29.8    17.6      38.5
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         18.8    10.9      24.7       30.9    15.6      41.9

Experiment 3
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         29.9    11.9      42.8       29.2    16.6      37.9
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        21.3     8.1      30.9       25.0    13.5      33.5
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         20.5     9.0      28.7       24.3    14.2      31.7
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         20.1     9.8      27.5       25.7    13.4      34.8

Experiment 4
  ∆V (sim.) vs. ∆E*ab (CIE D65 illum.)         27.9     3.9     132.4       38.0    22.7      48.1
  ∆V (sim.) vs. ∆E*cmc (CIE D65 illum.)        19.6     1.9     131.1       29.0    16.2      38.0
  ∆V (sim.) vs. ∆E*94 (CIE D65 illum.)         19.4   -10.5     149.1       29.6    17.7      38.0
  ∆V (sim.) vs. ∆E*00 (CIE D65 illum.)         18.9   -11.1     152.3       30.0    18.1      38.3

Color calculations
  ∆E*ab (sim.) vs. ∆E*ab (CIE D65 illum.)      18.8     9.6      25.3       18.2    11.1      23.0
  ∆E*cmc (sim.) vs. ∆E*cmc (CIE D65 illum.)    26.1    12.3      36.1       24.8    13.9      32.5
  ∆E*94 (sim.) vs. ∆E*94 (CIE D65 illum.)      26.8    14.3      35.5       24.7    14.9      31.4
  ∆E*00 (sim.) vs. ∆E*00 (CIE D65 illum.)      24.7    12.4      33.5       23.8    14.2      30.4


4.3 Implications of Results

The primary goal of instrumental color measurement is to achieve an accurate and reliable representation of visual color perception, yet “individuals are notorious for their variability in judging the magnitude of perceptual difference between two colored samples” (Kuehni, 2003, p. 164). In effect, we are trying to make a precise measure of a very fluid property. How do we improve such a measure? For over 80 years, color researchers have worked to refine the methods of evaluating color. To reduce variation in visual and instrumental color difference, researchers have standardized factors such as light source and viewing conditions; however, inter- and intra-observer variability is so large that it overwhelms other, more controllable factors such as light and substrate.

Unfortunately, there is very little that can be done to control observers. The color industry tries to limit variability by using trained observers who understand the company and the customer whom the company serves. That is, the colorist knows what level of color difference is acceptable to the target market. In this research, inter-observer reliability appeared to increase over time, but more importantly, results show that certain observers are more reliable than others. In further research, results for a group of highly reliable observers should be compared to those of a group of less reliable observers in order to better understand variability. In this research, there were no consequences, good or bad, for an observer who judged samples


as more or less different from the gray pair, but incorrect color judgments by industrial colorists can lead to negative consequences for the company and its customers. Without consistent judgments, the result may be mismatched colors in the apparel line, increased lead times, and loss of profit. The responsibility for accurate color evaluation is likely to affect reliability, as judgments might be made more carefully. Due to training and experience, industrial colorists, on average, may also be more reliable than the observers used in this research. It is possible that, even when conscious of the consequences of their decisions, some observers will be more consistent in their judgments than others who are equally well trained and experienced. If the color industry could identify this type of observer to perform technical color evaluations, the reliability of visual judgments might be increased. Therefore, not only should applicants for industrial colorist positions be pre-screened for their color discrimination abilities, but they should also be required to complete an experiment such as the one in this research to determine their reliability.
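Intra-observer reliability of the kind discussed here was assessed with Kappa statistics, which can be illustrated with a short sketch. The two rating sequences below are invented for illustration, not data from this study:

```python
# A minimal sketch of Cohen's kappa for intra-observer reliability: the same
# observer judges 10 hypothetical pairs in two sessions as "greater" (G) or
# "lesser" (L) than the reference gray pair. The ratings are invented.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two rating sequences, corrected for chance agreement."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    ca, cb = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: sum over categories of the product of marginal proportions.
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / n ** 2
    return (observed - expected) / (1 - expected)

session1 = list("GGLGLLGGLG")
session2 = list("GGLGGLGGLL")
kappa = cohens_kappa(session1, session2)
print(f"kappa = {kappa:.2f}")   # 1.0 = perfect agreement, 0 = chance level
```

With 8 of 10 matching judgments but chance agreement of 0.52, kappa comes out near 0.58, i.e., only moderate self-agreement despite apparently high raw agreement.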

Performance of color difference formulae should also be assessed with a group of reliable observers. Perhaps an improvement in observer agreement with calculated color differences would be gained.

In this research, observer accuracy and repeatability are low in comparison to other color difference studies, and agreement between average observers and calculated color differences is poor. This indicates


that the formulae do not represent these observers and samples very well.

However, this does not in any way lead to the conclusion that instrumental measurement is useless in practical applications. One unresolved issue lies in the fact that the colors used in this research were purposely chosen to represent areas of color space that have not been well investigated in other experiments. In addition, in this work, no distinction by color was made in the analysis of visual observation data. Much more work will need to be done to compare the results of these materials with those of other research investigations. In fact, this research supports the necessity of instrumental measurement, simply because observers are so variable. Instrumental measurement, on the other hand, is reliable: it performs consistently over time as long as the instrument is properly calibrated.

Not only does the instrument give reliable measurements, but results show that for both simulators, PF/3’s decreased as observer experience increased. This indicates that even though observers vary constantly, estimation of observer judgments by color difference equations improves overall, although the improved estimation remains poor.

Hence, the agreement between visual results and color difference equations improves in spite of the variability in visual judgments, and at least there is increased repeatability of relatively moderate color matching.


Another important use of instrumental measurement is in

communications and ease of product development in the global economy.

With numerical values for colors, it is easier to communicate a color and the

difference between colors to others. With the technology of virtual color samples, instrumental measurement not only facilitates communications

across the globe but shortens lead times in color development.

Since instrumental measurement is useful, variables can be controlled

in order to achieve maximum agreement. Choice of simulator has a large

effect on the variation between instrumental and visual color measurement.

Certainly, colors appear different under each simulator, as shown by the larger estimated difference of the standard neutral reference pair under F7. Standardization of simulators can contribute to the improvement of color difference evaluation. For research and practice, if simulators were standardized, the results from different simulators would not

be compared to one another and one source of variation would be eliminated.

Although it could not be statistically determined which simulator produced

better agreement between visual results and color difference equations,

PF/3’s for filtered tungsten are consistently smaller than those for F7. In

addition, observer accuracy and repeatability improved when viewing

samples under filtered tungsten illumination. For each repeated experiment,

filtered tungsten performs at least as consistently, if not better, in terms of accuracy,


performance of formulae, and quality of the simulator. Therefore, filtered

tungsten is a more desirable choice for the standard simulator of daylight

conditions.

A core problem lies in the fact that filtered tungsten and F7 are

different from one another in SPD and visual results, and they are both very

different from the standard reference illuminant D65. So far, it has proven to

be impossible to create a physical light source which accurately simulates the

standard D65. To simplify matters, it would be in the best interest of color

technology to re-define the standard reference illuminant to the SPD of the

filtered tungsten source. Of course, this does not address the issue of store illumination, which is where customers make most of their choices, or lighting

in a home or a restaurant, or even actual daylight. In real life, there are too

many variables to control. With respect to store illumination, if retailers were

willing to make an effort to control lighting situations and display their

products in a standard daylight, at least in critical areas where consumers

make their purchasing decisions such as the dressing room, display windows,

and high traffic areas (Thiry, 2004), then the standard reference illuminant

could be changed to the SPD of filtered tungsten. However, it may be easier

to have retailers use F7 to simulate daylight. In spite of the larger variation

with respect to color difference equations and visual results of this source, it is

at least a more economical and energy efficient daylight simulator, and

consequently, more appealing to retailers than filtered tungsten. Although in


this research F7 has not been shown to be a better simulator than filtered tungsten, its use would at least lead to some improvement in retail lighting conditions compared to the current range of SPD’s found in retail environments. Although clothing is ultimately worn under many light sources, for quality control purposes the use of a limited number of light sources would be better, according to Kuehni (Thiry, 2004). Therefore, to avoid metamerism, retailers should make an effort to reduce the number of sources by standardizing lighting. A counterargument could be raised that retailers should instead offer a broad range of light sources so that a consumer can match items under the many lighting situations that could occur in their use.

Such a plan could be expensive, and could also deter purchases, as consumers would observe metamerism in the materials.

While it is possible in experimentation to control which samples are color constant, this is not possible in practical color matching situations, where color inconstant samples abound. However, since this research focused on the lack of agreement between average visual evaluations and color difference calculations, no conclusions were reached concerning the effect of color constancy. Since color inconstancy indices for the samples and light sources used were not calculated, the inconstancy issue could not be properly addressed, although it should be pursued in further work. Also, inconstancy that may express itself mainly under lights not included in this research, such as illuminant A or a tri-band fluorescent, was not considered.


This is not to say that there is no such effect, but only that, given the light sources and the nature of the sample set used in this research, no effect was apparent. The sample set contained far too few inconstant samples, and each inconstant sample’s color difference from the standard is far too large, so there is little disagreement within and between observers. This further illustrates a problem with which all color researchers grapple: the extreme difficulty of obtaining the perfect set of color pairs. If this task were easier, the substrates used in the datasets upon which color difference equations are based would not vary so greatly.

This research questions the meaning of observer averages and the ability of formulae to predict them. If the inter-observer variability is small, averages are meaningful; if it is large, they are less so. Nevertheless, the important question is how well a formula predicts the average. If variability is large, then there is likely to be frequent disagreement between real observers and the calculated results, even if, on average, the formula has high accuracy.

For this data set, no formula predicts the average well with statistical significance. However, overall, CIEDE2000 consistently shows less variation with respect to visual results than the other formulae. In determining the best observers for industrial color matching, since variability is so large, it may be useful to rely on only the “best” observers, with whom average observers would agree in color matching decisions.


Even though this research was unable to statistically determine which color difference equation best represents the corresponding visual results, it does indicate a flaw in the manner in which researchers come to their conclusions.

In other research, PF/3’s are compared by simple ranking from best to worst, with no statistical support for claiming one is better than another. Can a difference of 1 or 2 PF/3 units represent an actual difference? In this research, the addition of bootstrap-derived confidence intervals shows that differences between calculated PF/3 values are often due to chance variation.
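The ranking-versus-confidence-interval point can be illustrated with a sketch. The component formulas below follow one commonly published formulation of PF/3 (after Guan & Luo, 1999b) and may differ in detail from the calculations actually used in this research; the ∆V and ∆E data are synthetic:

```python
# Sketch of PF/3 with a percentile-bootstrap confidence interval, on synthetic
# data. PF/3 combines gamma, Schultze's V_AB, and CV; the exact formulation
# below is one common variant, not necessarily the dissertation's own code.
import math, random

def pf3(dv, de):
    n = len(dv)
    # gamma: spread of log10(dE/dV) about its mean (1 = perfect proportionality)
    logs = [math.log10(e / v) for v, e in zip(dv, de)]
    m = sum(logs) / n
    gamma = 10 ** math.sqrt(sum((x - m) ** 2 for x in logs) / n)
    # Schultze's V_AB with its optimal scaling factor F
    F = math.sqrt(sum(e / v for v, e in zip(dv, de)) /
                  sum(v / e for v, e in zip(dv, de)))
    vab = math.sqrt(sum((e - F * v) ** 2 / (e * v) for v, e in zip(dv, de)) / n)
    # CV: rms residual about the least-squares scaling f, relative to mean(dE)
    f = sum(e * v for v, e in zip(dv, de)) / sum(v * v for v in dv)
    cv = 100 * math.sqrt(sum((e - f * v) ** 2 for v, e in zip(dv, de)) / n) \
         / (sum(de) / n)
    return 100 * ((gamma - 1) + vab + cv / 100) / 3

def bootstrap_ci(dv, de, reps=2000, alpha=0.05, seed=1):
    """Percentile bootstrap: resample pairs with replacement, recompute PF/3."""
    rng = random.Random(seed)
    n = len(dv)
    stats = []
    for _ in range(reps):
        picks = [rng.randrange(n) for _ in range(n)]
        stats.append(pf3([dv[i] for i in picks], [de[i] for i in picks]))
    stats.sort()
    return stats[int(reps * alpha / 2)], stats[int(reps * (1 - alpha / 2)) - 1]

# Synthetic "visual" differences and formula predictions with noise (46 pairs):
rng = random.Random(0)
dv = [rng.uniform(0.5, 4.0) for _ in range(46)]
de = [v * rng.uniform(0.7, 1.4) for v in dv]

value = pf3(dv, de)
lo, hi = bootstrap_ci(dv, de)
print(f"PF/3 = {value:.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

Two formulae whose PF/3 point values differ by a unit or two will typically have heavily overlapping bootstrap intervals, which is the basis for rejecting simple ranking as evidence.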

Either something is missing in the PF/3 measure, or color difference formulae, though modified through the addition of weighting functions, are not necessarily representing visual judgments any better. Given the constant variability of observers, it may be impossible for color difference equations to represent visual judgments with better than 60% accuracy.

As with all research, more research is needed. First and foremost, this research needs to be repeated in order to gain a better understanding of the significant factors and to confirm the findings.

Will the light boxes perform similarly? Will another panel of observers be as variable? These are questions that must be answered. Fortunately, research proposed at The Ohio State University will retest the same experiment with the filtered tungsten simulator.

Perhaps more insight will be gained when these results are analyzed in new ways. Another analysis that should be attempted is to separate samples


by color, which could reveal differences in observer or simulator variation for different colors or even for specific samples.

Similar research with another sample set would also be beneficial.

Such a set should include more color centers throughout color space, since the results of this research pertain only to the samples tested.

Also, more inconstant samples with threshold color differences would allow determination of the effect of inconstancy on variation, and colors covering more areas of color space may lead to new insights.

It is also necessary to test more D65 simulators from different manufacturers and with bulbs of different ages. This will give a closer look at how simulator differences change results, as well as how they impact practical color applications, since such simulators are widely used in industry.

As discussed earlier, future research should also include a comparison of highly reliable and highly unreliable observers, as well as a comparison of highly reliable observers’ visual judgments to color difference formulae, to determine whether variability decreases and whether agreement with color difference equations is in fact gained.


CHAPTER 5

CONCLUSIONS

The research reported in this dissertation encompassed the

investigation of the sources of variation in visual judgment of color differences

in textiles and the consequent disparities in color differences calculated from

instrumental color measurement. The research questions and findings related

to each question are stated in the following conclusions.

5.1 Research Question 1

What is the variation within and between observers for each color

difference judgment when controlling for light and color constancy?

5.1.1 Summary of results. The first source of variation examined was

that of a group of average observers as tested in a series of color difference

evaluations. It was found that, overall, observers yield highly variable results. In fact, the variation of observers is the single largest contributor to variation in color difference evaluation. Reliability was found to be low when comparing observers to one another, but even more striking is the low reliability found within individual observers. Not only do observers disagree with each other but, due to an observer’s ever-changing color vision and other physical and


psychological factors, individuals do not on average agree with their own

judgments approximately 65% of the time. However, particular observers

were found to have good agreement with themselves, while others had very

poor agreement with themselves. In other words, there was a large range of reliability scores.

Variation of observers was also analyzed with PF/3’s for observer

accuracy and repeatability. On average, PF/3 values indicate results similar to the reliability measures, and accuracy and repeatability are poorer than in previous studies. However, more repetitions were performed in this research, so this panel of observers is not necessarily more variable; rather, the true variability may not have been captured in other research where at most two repetitions were performed.

5.2 Research Question 2

For each color difference equation, is there a difference in accuracy of

prediction of color difference equations compared to that of a pool of

observers when examining samples under two different light sources?

5.2.1 Summary of results. Color difference equations were unable to

exactly predict average visual judgments approximately 50% of the time.

However, confidence intervals of PF/3’s indicate that, with 95% certainty, agreement between color difference equations and visual judgments ranged from 0% to 90%. This research also questioned whether prediction was better under different daylight simulators. Samples were viewed under filtered tungsten and F7


simulators and visual judgments were made. Through PF/3 analysis, visual results under neither simulator were significantly better predicted by color difference equations, but overall PF/3’s show less variability when the filtered tungsten simulator was used. The filtered tungsten and F7 simulators were found to be quite different from each other in PF/3 analysis and in a paired t-test of the visual results. While no statistical difference between PF/3’s of visual results and color difference equations for each simulator was found,

PF/3’s are consistently smaller when the filtered tungsten simulator was employed, indicating a reduction in variability. In addition, of those tested, no color difference equation was found to be a statistically significantly better predictor of visual results, but CIEDE2000 consistently ranked better in terms of PF/3’s.

Observer variation was also assessed for each simulator with PF/3 analysis for observer accuracy and repeatability and Kappa statistics for inter- and intra-observer reliability. No difference between simulators was found in inter- and intra-observer reliability, but PF/3’s indicate that observers tend to be more accurate and more consistent with themselves when viewing samples under filtered tungsten simulated daylight than under F7 simulated daylight. Because of this result, along with better agreement between visual results and color difference equations when samples are viewed under filtered tungsten, it is suggested that filtered tungsten should be the standard simulator in color difference judgments.


5.3 Research Question 3

For each color difference equation, is there a difference in accuracy of

prediction of color difference when comparing color constant versus

inconstant samples?

5.3.1 Summary of results. The variation in color matching caused by

color inconstancy was investigated. Color difference equations appear to

better represent visual judgments of inconstant colors, but confidence

intervals indicate that there is no difference in the prediction of color

difference equations between constant and inconstant samples. However,

with this research, no conclusion could be reached as to the variation in

prediction caused by color inconstancy due to the small number of color

inconstant samples used, most of which had large color differences.

5.4 General Conclusions

This research was conducted to determine some of the major contributors to the variation that impedes the ability of color difference equations to predict visual judgments. It was found that the choice of daylight simulator does not significantly affect prediction of visual judgments by color difference

equations, but overall variation was reduced when these judgments were

made in the filtered tungsten simulator. This research leads to the conclusion

that due to a reduction in variability between visual results and color

difference equations as well as significantly less variation in observer

accuracy and repeatability, filtered tungsten should be used as the standard


light source to simulate daylight. However, the contribution of using different daylight simulators to variation in color difference studies is smaller than the variation contributed by the observers. Observers were found to be widely variable, and since they cannot be controlled, it is unlikely that color difference equations will be able to achieve better prediction of visual judgments than has already been accomplished. However, the use of more reliable observers could at least improve consistency, especially if they are experienced in color difference evaluations, since this research showed that observer accuracy increased with experience. Nonetheless, these results do not imply that instrumental color evaluation is ineffective or unnecessary. A final implication of this research is that Munsell grays may not be color constant, as the color difference between the standard neutral reference pair appeared larger under the F7 simulator.

5.5 Further Research

In order to continue to explore some of the questions raised in this research, further work is suggested:

1) This experiment should be duplicated to ensure the validity of

the results and to determine if other panels of observers are as

variable.

2) Observers determined to be reliable should be used in a similar

study to determine if they are less variable and if prediction by

color difference equations is improved.


3) Color inconstancy indices should be calculated and inconstancy

should be determined for other light sources as well to better

investigate the effect of constancy on agreement between visual

results and calculated color differences.

4) A larger sample set needs to be developed, one that includes color centers across color space and more inconstant color samples with smaller color differences.

5) The results presented herein should be assessed with additional

statistical measures and colors should be separated to

determine if variation is different for different colors.

6) Another experiment should be performed using more daylight

simulators including F7 with aged bulbs and tri-band fluorescent

as well as other models of filtered tungsten and F7.


REFERENCES

Alman, D.H., Berns, R.S., Snyder, G.D. & Larsen, W.A. (1989). Performance testing of color-difference metrics using a color tolerance dataset. Color Research and Application, 14, 139-151.

Aspland, J.R. (1993). Chapter 15: Color, color measurement and control. Textile Chemist and Colorist, 25, 34-42.

Aspland, J.R. & Jarvis, J.P. (1986). Color tolerances, specification, and shade sorting. Textile Chemist and Colorist, 18, 27-29.

Berger-Schunn, A. (1994). Practical color measurement: A primer for the beginner, a reminder for the expert. New York: Wiley.

Berns, R.S. (1996). Industrial applications: Deriving instrumental tolerances from pass-fail and colorimetric data. Color Research and Application, 21, 459-472.

Billmeyer, F.W., Jr. & Hammond, H.K. (1990). ASTM standards on color-difference measurement. Color Research and Application, 15, 206-209.

Billmeyer, F.W., Jr. & Saltzman, M. (1981). Principles of color technology. New York: Wiley.

Chrisment, A. (1998). Color and colorimetry. Paris: Edition 3C Conseil.

Cicchetti, D.V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284-290.

Color Forum: CIE technical committee 1-29, industrial color-difference evaluation. (1991). Color Research and Application, 16, 219-220.


Color Forum: The CIE 1931 standard colorimetric observer: Mandatory retirement at age 65? (1993). Color Research and Application, 18, 129-136.

Colorimetry, 3rd edition. (2004). CIE 15:2004. Vienna: CIE Central Bureau.

CMC: Calculation of small color differences for acceptability. (1998). AATCC Test Method 173.

Cui, G., Luo, M.R., Rigg, B. & Li, W. (2001). Colour-difference evaluations using CRT colours. Part I: Data gathering and testing colour difference formulae. Color Research and Application, 26(5), 394-402.

Cui, G., Luo, M.R., Rigg, B., Roesler, G. & Witt, K. (2002). Uniform colour spaces based on the DIN99 colour-difference formula. Color Research and Application, 27(4), 282-290.

Guan, S., & Luo, M.R. (1999a). A colour-difference formula for assessing large colour differences. Color Research and Application, 24(5), 344-355.

Guan, S., & Luo, M.R. (1999b). Investigation of parametric effects using small colour differences. Color Research and Application, 24(5), 331-343.

Heggie, D., Wardman, R.H. & Luo, M.R. (1996). A comparison of the colour differences computed using the CIE94, CMC(l:c) and the BFD(l:c) formulae. Journal of the Society of Dyers and Colourists, 112, 264-269.

Hunter, R.S. (1942). Photoelectric tristimulus colorimetry with three filters. Journal of the Optical Society of America, 32, 509-538.

Hunter, R.S. (1975). The measurement of appearance. New York: Wiley.

Huntsman, J.R. (1989). A fallacy in the definition of ∆H*. Color Research and Application, 14, 41-43.

Instrumental color measurement. (1999). AATCC Evaluation Procedure 6. Research Triangle Park, NC: AATCC.

Judd, D.B. & Wyszecki, G. (1975). Color in business, science and industry. New York: Wiley.


Kim, D., Cho, E.K., & Kim, J.P. (2001). Evaluation of CIELAB-based colour-difference formula using a new dataset. Color Research and Application, 26(5), 369-375.

Kránicz, B. & Schanda, J. (2000). Reevaluation of daylight spectral distributions. Color Research and Application, 25(4), 250-259.

Kuehni, R.G. (1990). Industrial color difference: Progress and problems. Color Research and Application, 15, 261-265.

Kuehni, R.G. (1998a). The conundrum of supra-threshold hue differences. Color Research and Application, 23, 335-336.

Kuehni, R.G. (1998b). Hue uniformity and the CIELAB space and color difference formula. Color Research and Application, 23, 314-322.

Kuehni, R.G. (1999a). Calculation of CIELAB hue difference adjustment factors from an ideal hue circle. Color Research and Application, 24, 292-294.

Kuehni, R.G. (1999b). Hue scale adjustment derived from the Munsell system. Color Research and Application, 24, 33-37.

Kuehni, R.G. (1999c). Towards an improved uniform color space. Color Research and Application, 24, 253-265.

Kuehni, R.G. (1999d). Why CIELAB needs to be replaced for industrial color difference calculation. Textile Chemist and Colorist, 31, 11-15.

Kuehni, R.G. (2002). Communications and comments: CIEDE2000, milestone or final answer? Color Research and Application, 27(2), 126-127.

Kuehni, R.G. (2003a). Color space and its divisions. Hoboken, New Jersey: Wiley-Interscience.

Kuehni, R.G. (2003b). Colour difference formulas: Accurate enough for objective colour quality control of textiles? Coloration Technology, 119, 164-169.

Kuehni, R.G. (2004a). Communications and comments: Variability in unique hue selection: A surprising phenomenon. Color Research and Application, 29(2), 158-162.

Kuehni, R.G. (2004b, March). Visual color difference evaluations: Mind the conditions. Paper presented at the AATCC Color Science Symposium, Raleigh, NC.

Kuehni, R.G. (2005). Color: An introduction to practice and principles, 2nd ed. Hoboken, NJ: Wiley & Sons.

Lam, Y., & Xin, J.H. (2002). Evaluation of the quality of daylight simulators for visual assessment. Color Research and Application, 27(4), 243-251.

Lam, Y.M., Xin, J.H. & Sin, K.M. (2001). Study of the influence of various D65 simulators on color matching. Coloration Technology, 117, 251-256.

Landis, J.R. & Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.

Luo, M.R. (2004, March). Colour inconstancy: What does it really mean? Paper presented at the AATCC Color Science Symposium, Raleigh, NC.

Luo, M.R., Cui, G., & Rigg, B. (2001). The development of the CIE colour-difference formula: CIEDE2000. Color Research and Application, 26(5), 340-350.

Luo, M.R., Li, C.J., Hunt, R.W.G., Rigg, B., & Smith, K.J. (2003). CMC 2002 colour inconstancy index: CMCCON02. Coloration Technology, 119, 280-285.

Luo, M.R. & Rigg, B. (1987). BFD(L:c) colour-difference formula, Part II: Performance of the formula. Journal of the Society of Dyers and Colourists, 103, 126-132.

MacAdam, D.L. (1985). Color measurement: Theme and variations. Berlin: Springer-Verlag.

Melgosa, M. (2000). Testing CIELAB-based color-difference formulas. Color Research and Application, 25, 49-55.

McCamy, C.S. (1999). New metamers for assessing the visible spectra of daylight simulators and a method of evaluating them. Color Research and Application, 25(5), 322-330.

McDonald, R. (1980). Industrial pass/fail colour matching: Part 1: Preparation of visual color matching data. Journal of the Society of Dyers and Colourists, 96, 372-376.

McDonald, R. (1988). Acceptability and perceptibility decisions using the CMC color difference formula. Textile Chemist and Colorist, 20, 31-37.

McDonald, R. (1990). European practices and philosophy in industrial colour- difference evaluation. Color Research and Application, 15, 249-260.

Morley, D.I., Munn, R. & Billmeyer, F.W., Jr. (1975). Small and moderate color differences: The Morley data. Journal of the Society of Dyers and Colourists, 91, 229-242.

Perez, F., Hita, E., del Barco, L.J. & Nieves, J.L. (1999). Contribution to the experimental review of the colorimetric standard observer. Color Research and Application, 24, 377-388.

Pointer, M.R., & Attridge, G.G. (1998). The number of discernible colours. Color Research and Application, 23, 52-54.

Pridmore, R.W. & Melgosa, M. (2005). Effect of luminance of samples on color discrimination ellipses: Analysis and prediction of data. Color Research and Application, 30(3), 186-197.

Qiao, Y., Berns, R.S., Reniff, L. & Montag, E. (1998). Visual determination of hue suprathreshold color-difference tolerances. Color Research and Application, 23, 302-313.

Seve, R. (1991). New formula for the computation of CIE 1976 hue difference. Color Research and Application, 16, 217-218.

Sharma, G., Wu, W., & Dalal, E.N. (2005). The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations. Color Research and Application, 30(1), 21-30.

Standard practice for computing the colors of objects by using the CIE system. (2002). ASTM E 308-01. West Conshohocken, PA: ASTM International.

Thiry, M.C. (2004). Turn on the light: The importance of lighting for textiles in a retail environment. AATCC Review, 4, 33-38.

Vanderhoeven, R.E. (1992). Conversion of a visual to an instrumental color matching system: An exploratory approach. Textile Chemist and Colorist, 24, 19-25.

Visual assessment of color difference of textiles. (1999). AATCC Evaluation Procedure 9. Research Triangle Park, NC: AATCC.

Wright, W.D. (1959). Color standards in commerce and industry. Journal of the Optical Society of America, 49, 384-388.

Wyszecki, G. & Fielder, G.H. (1971). New color matching ellipses. Journal of the Optical Society of America, 61, 1135-1153.

Xu, H., Luo, M.R., & Rigg, B. (2003a). Evaluation of daylight simulators. Part 1: Colorimetric and spectral variations. Coloration Technology, 119, 59-69.

Xu, H., Luo, M.R., & Rigg, B. (2003b). Evaluation of daylight simulators. Part 2: Assessment of the quality of daylight simulators using actual metameric pairs. Coloration Technology, 119, 253-263.

Xu, H., Yaguchi, H. & Shioiri, S. (2002). Correlation between visual and colorimetric scales ranging from threshold to large color difference. Color Research and Application, 27(5), 349-359.

APPENDIX A

Variability in experimental color matching conditions: Effects of daylight simulators, color inconstancy, and observers

Oral instructions to subjects

First I would like to thank you for participating in this research that is being conducted by Heather Mangine under the supervision of Dr. Kathryn Jakes, Professor at The Ohio State University.

The purpose of this research is to compare the ability of the average observer to distinguish small color differences with the predictions of color difference equations. There is no risk involved in this experiment, and it may benefit the textile industry by helping to reduce the time spent on color matching of fabrics.

You will complete this experiment 4 separate times, each time under two different light sources. Check WebCT for T&C 371 for appointment times.

At each session, you will follow the same procedure:

1. You will be shown a set of 70 cards, each containing two color swatches.
2. Each card will be placed below the standard neutral pair in a viewing box, which will be turned on to standard daylight lighting conditions.
3. The lights in the room will then be turned off.
4. For each color pair, you will decide which has the larger color difference, the neutral pair or the colored pair.
5. After you have made a decision, you will inform me and I will record your decision on the questionnaire.
6. You will be shown the next color pair and will repeat the process.
7. Once decisions have been made for all 70 cards, you will repeat the entire process in the second viewing box.
8. In total, each session will take approximately 1 hour.
9. Once you are finished, please confirm your next appointment time.

Please remember that your participation in this research is voluntary. You may refuse to answer any question, and you may withdraw your consent at any time without penalty.

Do you have any questions? If you have any at a later date, please feel free to call Heather Mangine at 292-2108 or Dr. Jakes at 292-5518.

Now you will be given a consent form to read over and sign if you have decided to participate in the experiment.

Figure A.1: Instructions to observers.

CONSENT FOR PARTICIPATION IN RESEARCH

I consent to participating in research entitled “Variability in experimental color matching conditions: Effects of daylight simulators, color inconstancy, and observers.”

Dr. Kathryn Jakes, Heather Mangine, or a representative has explained the purpose of the study, the procedures to be followed, and the expected duration of my participation. Possible benefits of the study have been described, as have alternative procedures, if such procedures are applicable and available.

I acknowledge that I have had the opportunity to obtain additional information regarding the study and that any questions I have raised have been answered to my full satisfaction. Furthermore, I understand that I am free to withdraw consent at any time and to discontinue participation without prejudice to me.

Finally, I acknowledge that I have read and fully understand the consent form. I sign it freely and voluntarily.

Signed: ______ Date: ______ (participant)

Signed:______(Principal investigator or representative)

Witness:______

Figure A.2: Consent form.
