<<

The Pennsylvania State University

The Graduate School

College of Engineering

MODELING COLOR RENDITION AND COLOR DISCRIMINATION

WITH AVERAGE FIDELITY, AVERAGE GAMUT, AND GAMUT SHAPE

A Dissertation in

Architectural Engineering

by

Tony Esposito

© 2016 Tony Esposito

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

December 2016

The dissertation of Tony Esposito was reviewed and approved* by the following:

Kevin W. Houser, Professor of Architectural Engineering, Dissertation Adviser, Chair of Committee

Richard G. Mistrick, Associate Professor of Architectural Engineering, Department of Architectural Engineering Graduate Program Officer

Stephen Treado, Associate Professor of Architectural Engineering

James L. Rosenberger, Professor of Statistics

M. Kevin Parfitt, Interim Department Head of Architectural Engineering

* Signatures are on file with the Graduate School


ABSTRACT

Background

Color rendering and color discrimination are complex topics and have been the subject of many investigations. Only one color rendering metric, the CIE General Color Rendering Index (Ra), is widely accepted in the industry, despite its well-documented flaws, and numerous attempts by the International Commission on Illumination (CIE) to update or replace the metric have been largely unsuccessful. The Illuminating Engineering Society (IES) has recognized the industry need for more accurate and predictive color rendition measures and recently published TM-30-15, The IES Method for Evaluating Light Source Color Rendition. The technical memorandum (TM-30-15) outlines a two-metric system consisting of an average fidelity metric (Rf) and an average gamut metric (Rg), supplemented by a Color Vector Graphic (CVG) and a suite of other metrics. Rf and Rg were designed to utilize the same set of statistically selected color samples, reference source, and uniform color space such that a tradeoff between them can be explicitly demonstrated. The goal of this study was to explore these tradeoffs by modeling participant responses—various subjective ratings and FM100 test performance—under systematically varied light spectra.

Methodology

The IES TM-30-15 Rf-Rg space was partitioned into 12 bins whose centers were the targets for spectral optimization. The nominal target Rf values were 65, 75, 85, and 95. The nominal target Rg values were 80, 90, 100, 110, and 120. Two SPDs were designated at each Rf-Rg combination to have conceptually orthogonal gamut shapes: one CVG generally oriented in the direction of hue angle bin 1 (‘CB1’) and one generally oriented in the direction of hue angle bin 7 (‘CB7’). All spectra were created to be a metameric match to 3500 K blackbody radiation and calibrated to an illuminance of 600 lux. A single viewing booth was filled with 12 familiar objects with strong memory associations, chosen to span the hue circle as much as practically possible. Objects were chosen to fit nominally into 6 color groups—“red,” “orange,” “yellow,” “green,” “blue,” and “purple”—and were split into two categories: 1. Consumer Goods (6 objects); and 2. Natural Food (6 objects). Each of the 24 light spectra was evaluated by 20 participants. Experimentation was blocked such that 20 participants saw a randomly selected 12 of the 24 light spectra, and 20 different participants saw the other 12 light spectra. A total of 40 individuals participated in this experiment: 23 males and 17 females, with ages ranging from 20 to 41 years and a mean age of 26 years.

The independent variables for this experiment were Rf, Rg, and gamut shape (specified with the variable CB). The dependent variables were subjective ratings of naturalness, vividness, preference, and skin preference, and objective measures of color discrimination (error scores from the Farnsworth-Munsell 100 hue test).

Analysis

Subjective ratings

Using a combination of best subset and stepwise regression analyses, the best-fitting models for each of the subjective rating scales were determined. It was discovered that the CB variable (the nominal orientation of the CVG) did not provide the granularity needed for model prediction. The visual observation that most of the experimental CVGs can be approximated by an ellipse suggested that a best-fit ellipse approach may be suitable to quantify the shape of the CVG. A direct least-squares fitting of an ellipse is proposed to approximate the CVGs of the 24 experimental SPDs. The resulting best-fit ellipses can be defined by the length of their semi-major axis (a), the length of their semi-minor axis (b), and their angular rotation (ψ). The best-fit models for each of the subjective rating scales follow:
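To make the ellipse-fitting step concrete, below is a minimal sketch of a direct least-squares conic fit with an ellipse constraint (in the spirit of Fitzgibbon, Pilu, and Fisher) and the conversion of the fitted conic to (a, b, ψ). The implementation details, variable names, and the synthetic check are illustrative assumptions, not the exact procedure used in this work.

```python
import numpy as np

def fit_ellipse(points):
    """Direct least-squares conic fit with an ellipse constraint
    (after Fitzgibbon, Pilu & Fisher). `points` is an (N, 2) array,
    e.g. the 16 bin-averaged (a', b') CVG coordinates."""
    x, y = points[:, 0], points[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = D.T @ D
    C = np.zeros((6, 6))                 # constraint matrix: 4AC - B^2 = 1
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    w, v = np.linalg.eig(np.linalg.solve(S, C))
    # keep the (real) eigenvector that satisfies the ellipse constraint
    for i in range(6):
        vi = np.real(v[:, i])
        if np.isfinite(np.real(w[i])) and 4 * vi[0] * vi[2] - vi[1] ** 2 > 0:
            return vi
    raise ValueError("no elliptical solution found")

def ellipse_params(coef):
    """Convert conic coefficients (A, B, C, D, E, F) of
    Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 into semi-major axis a,
    semi-minor axis b, and rotation psi in degrees (0-180)."""
    A, B, Cc, Dd, E, F = coef
    M = np.array([[A, B / 2.0], [B / 2.0, Cc]])
    center = np.linalg.solve(-2.0 * M, np.array([Dd, E]))
    q0 = center @ M @ center + np.array([Dd, E]) @ center + F
    lam, vec = np.linalg.eigh(M)         # principal axes of the quadratic form
    axes = np.sqrt(-q0 / lam)            # semi-axis lengths
    i_maj = int(np.argmax(axes))
    psi = np.degrees(np.arctan2(vec[1, i_maj], vec[0, i_maj])) % 180.0
    return axes[i_maj], axes[1 - i_maj], psi

# sanity check on a synthetic ellipse with a = 1.2, b = 0.8, psi = 30 deg
t = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
r = np.radians(30.0)
rot = np.array([[np.cos(r), np.sin(r)], [-np.sin(r), np.cos(r)]])
pts = np.column_stack([1.2 * np.cos(t), 0.8 * np.sin(t)]) @ rot
# tiny noise keeps the scatter matrix well-conditioned for exact conic data
pts += np.random.default_rng(1).normal(scale=1e-3, size=pts.shape)
print(ellipse_params(fit_ellipse(pts)))  # ~ (1.2, 0.8, 30.0)
```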


NAT = 1.464 + 0.02674 Rf + 0.188 Rcs,h1 − 15.41 Rcs,h1² − 0.05305 ψ + 0.000602 Rf*ψ (r² = 0.92)

VIV = 3.332 + 4.594 Rcs,h16 (r² = 0.86)

LIKE = 1.629 + 0.02686 Rf + 3.423 Rcs,h16 − 10.01 Rcs,h16² − 0.04866 ψ + 0.000566 Rf*ψ (r² = 0.86)

SKIN = 0.128 + 3.758 b + 1.161 Rcs,h16 − 8.41 Rcs,h16² (r² = 0.85)
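For convenience, the sketch below simply evaluates the four fitted equations as printed above. The input conventions (Rcs,h values as fractional chroma shifts, ψ in degrees, b in CAM02-UCS units) are assumptions for illustration, not documented requirements of the models.

```python
def predict_ratings(rf, rcs_h1, rcs_h16, b, psi):
    """Evaluate the best-fit subjective-rating models reported above.

    rf      : IES TM-30-15 average fidelity index
    rcs_h1  : chroma shift in hue-angle bin 1 (assumed fractional, e.g. 0.05)
    rcs_h16 : chroma shift in hue-angle bin 16 (assumed fractional)
    b       : semi-minor axis of the best-fit CVG ellipse
    psi     : rotation of the best-fit CVG ellipse (assumed degrees)
    """
    nat = (1.464 + 0.02674 * rf + 0.188 * rcs_h1 - 15.41 * rcs_h1 ** 2
           - 0.05305 * psi + 0.000602 * rf * psi)
    viv = 3.332 + 4.594 * rcs_h16
    like = (1.629 + 0.02686 * rf + 3.423 * rcs_h16 - 10.01 * rcs_h16 ** 2
            - 0.04866 * psi + 0.000566 * rf * psi)
    skin = 0.128 + 3.758 * b + 1.161 * rcs_h16 - 8.41 * rcs_h16 ** 2
    return {"NAT": nat, "VIV": viv, "LIKE": like, "SKIN": skin}
```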

Overall, the best-fit models demonstrate strong predictive power for the subjective responses. Several of these terms take a form similar to the models recently published by Royer and others [2016]. Every best-fit model includes some parameter extracted from the color vector graphic (either Rcs,hi and/or a best-fit ellipse parameter) and a term tied to red rendering. The models presented here show marked consistency with the recent results of Royer and others.

Color discrimination

The standard scoring software for the FM100 hue test—which is often used when administering this test—assumes the order of colored caps to be their order under CIE Illuminant C (under which they were designed). Direct application of the standard scoring software therefore assumes that the test light source does not transpose caps relative to their order under Illuminant C. By directly applying the scoring software to an experiment which purposefully varies the illuminant, errors could be miscounted and the results distorted.

To decouple the error calculation from the illuminant, an adjusted Total Error Score (TESadj) was computed which compared participants’ responses to the correct order of colored caps under the experimental SPD (not their order under Illuminant C). Seventy percent of the experimental light sources (17 of the 24) exhibited at least one transposition relative to CIE Illuminant C. No single metric—of the main TM-30 metrics and best-fit ellipse parameters—has an r² higher than 0.57. Rg was a fairly poor predictor of TESadj (r² = 0.47), which is consistent with recent studies showing that average gamut indices are not strong indicators of color discrimination for LED light sources. The poor predictive power of the considered metrics prompted a post-hoc development of custom measures to predict TESadj based on two assumptions: (1) a light source which transposes the colored caps is more likely to cause difficulty discriminating between those caps (measured by the custom metric Rdt), and (2) a light source which compresses caps in color space will make it more difficult to discern between adjacent caps (i.e., a smaller average hue angle difference, Δh̄ᵢ). This custom approach to modeling the adjusted total error score demonstrates strong predictive ability (r² = 0.87) and is significantly stronger than any single metric or metric combination considered in this study.

Conclusions

Overall, the importance of the CVG and red rendering is apparent. It is evident that even a two-measure system of color rendition cannot fully encapsulate the complexity of color rendering, and this work confirms that the CVG is primary information. The proposed best-fit ellipse approach to the CVG provides the granularity necessary for model prediction and an objective measure to quantify its shape. Distilling the graphic into a few parameters can simplify its specification. The results of the present study, combined with past research, suggest that it is time for researchers to abandon average fidelity and average gamut measures for the prediction of the color discrimination ability of LED sources. The robust predictive power of the proposed light source transposition error score (Rdt) provides strong evidence that a more nuanced approach to predicting color discrimination is viable. With accurate methods to predict color discrimination ability, an ordinal-based rating scale should be developed for ease of specification.
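For concreteness, the following sketch illustrates the adjusted scoring idea, under the simplifying assumption that the movable FM100 caps form one circular hue sequence (the tray-by-tray administration of the real test is ignored). Scoring a participant's arrangement against the caps' correct order under the test SPD, rather than against the cap numbers, yields a TESadj-style score; the transposition counter is only a conceptual stand-in for the custom Rdt metric, whose exact formulation is not reproduced here.

```python
def total_error_score(arrangement, correct_order):
    """FM100-style Total Error Score against an arbitrary reference ordering:
    cap numbers for the standard score, or the caps' correct hue order under
    the test SPD for an adjusted score (TESadj)."""
    n = len(correct_order)
    rank = {cap: i for i, cap in enumerate(correct_order)}
    r = [rank[cap] for cap in arrangement]

    def dist(u, v):
        # hue order is circular, so rank distance wraps around
        d = abs(u - v)
        return min(d, n - d)

    # each cap's error: rank distance to its two neighbours in the row,
    # minus the ideal neighbour distance of 2; a perfect row scores 0
    return sum(dist(r[i], r[i - 1]) + dist(r[i], r[(i + 1) % n]) - 2
               for i in range(n))

def count_transpositions(order_test, order_c):
    """Count adjacent cap pairs whose order under the test SPD is reversed
    relative to their order under CIE Illuminant C (a conceptual proxy for
    the transposition idea behind Rdt, not its actual definition)."""
    pos_c = {cap: i for i, cap in enumerate(order_c)}
    return sum(1 for a, b in zip(order_test, order_test[1:])
               if pos_c[a] > pos_c[b])
```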


Table of Contents

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
1. INTRODUCTION
1.1 Problem Statement
1.2 Objective
1.3 General approach
1.4 Goals and hypotheses
1.4.1 Subjective Ratings
1.4.2 Color Discrimination
2. BACKGROUND
2.1 History of the CIE color rendering recommendations
2.2 IES TM-30-15 The IES Method for Evaluating Light Source Color Rendition
3. LITERATURE REVIEW
3.1 Metrics
3.1.1 Existing Color Rendering Metrics
3.1.2 Multi-metric systems
3.1.3 Ordinal Based Color Rendering Scales
3.1.4 Graphical Representation
3.2 Previous research
3.2.1 Subjective ratings—preference
3.2.2 Subjective ratings—naturalness
3.2.2 Color Discrimination
3.2.3 Royer and others [2016]
3.3 Figures
4. PILOT STUDY: VOCABULARY FOR DESCRIBING OBJECT
4.1 Purpose
4.2 Methods
4.2.1 Image Manipulation
4.2.2 Survey Design
4.2.3 Participants/Distribution
4.3 Results
4.4 Conclusion


4.5 Figures
5. METHODOLOGY
5.1 Apparatus
5.1.1 Light Booth
5.1.2 Equipment
5.1.3 Measurement Equipment
5.1.4 Object Selection
5.2 Lighting Conditions
5.2.1 Correlated Color Temperature (CCT)
5.2.2 Target Illuminance
5.2.3 Spectral Optimization
5.3 Participants
5.4 Variables
5.4.1 Independent Variables
5.4.2 Dependent Variables
5.5 Procedure
5.5.1 Blocking
5.5.2 Pre-experiment preparation
5.5.3 Calibration and measurement schedule
5.5.4 Experimental Trials
5.5.5 Closing Questionnaire
5.5.6 Compensation
5.5 Figures
5.6 Tables
6. RESULTS
6.1 OVERALL Subjective ratings
6.2 OBJECT SPECIFIC Subjective responses
6.3 Color Discrimination
6.2.1 Calculating error score
6.4 Closing Questionnaire
6.4.1 Object influence on OVERALL ratings—absolute ranking
6.4.2 Object influence on OVERALL ratings—relative ranking
6.4.3 Color Discrimination
6.5 Figures
6.6 Tables


7. ANALYSIS
7.1 Modeling the Color Vector Graphic
7.2 Model building
7.3 Subjective ratings—OVERALL
7.3.1 Naturalness
7.3.2 Vividness
7.3.3 Preference
7.3.4 Skin Preference
7.4 Subjective Ratings—OBJECT SPECIFIC
7.4.1 Naturalness
7.4.2 Vividness
7.4.3 Preference
7.5 Color discrimination
7.4.1 Modeling the adjusted TES—a direct approach
7.4.2 Modeling the adjusted TES—a segmented approach
7.4.3 Modeling the adjusted TES—a segmented approach with unifying model
7.6 Figures
7.7 Tables
8. DISCUSSION
8.1 Subjective ratings
8.1.1 Comparison to Royer and others [2016]
8.1.2 Is it all about the ?
8.1.3 The Color Vector Graphic is primary information
8.1.4 The top ranked light sources
8.2 Color discrimination
8.2.1 A new approach to predicting error score
8.2.2 Towards a categorization of light source color discrimination
8.3 Color rendition and luminous efficacy
8.4 Figures
9. CONCLUSION
REFERENCES
APPENDIX A: The block effect on OVERALL subjective rating scales
APPENDIX B: Spectra stability and uniformity
APPENDIX C: Color Vector Graphics
APPENDIX D: Object specific questionnaires


APPENDIX E: Mean OBJECT SPECIFIC ratings
APPENDIX F: FM-100 hue test, correct chip arrangement
APPENDIX G: CVG Best-fit ellipses
APPENDIX H: Best-fit model statistics


LIST OF FIGURES

Figure 3-1 Color Rendering, graphical representations
Figure 4-1 Pilot study—Image variations
Figure 4-2 Pilot study—Survey flow diagram
Figure 4-3 Pilot study—Response templates for each block
Figure 4-4 Pilot study—Participant demographics
Figure 4-5 Pilot study—Block 2 responses (saturation variations)
Figure 4-6 Pilot study—Block 2 responses (hue variations)
Figure 4-7 Pilot study—Block 3 responses (continuous rating scales)
Figure 4-8 Pilot study—exit question
Figure 5-1 Experimental Apparatus
Figure 5-2 Absolute spectral power distributions
Figure 5-3 Chromaticity coordinates (CIE CAM02, a’b’) of the experimental object set
Figure 5-4 Determining the experimental CCT
Figure 5-5 Spectral Power Distributions for the experimental spectra
Figure 5-6 Color Vector Graphics for experimental spectra
Figure 5-7 Identification numbers (ID) for the 24 experimental SPDs
Figure 5-8 OVERALL questionnaire
Figure 6-1 Summary Rf-Rg maps of mean OVERALL ratings
Figure 6-2 Summary Rf-Rg maps of average adjusted error score
Figure 6-3 Average adjusted Total Error Score (TESadj) as a function of Rf, Rg, and CB
Figure 6-4 FM100 error score (adjusted) as a function of FM100 error score (standard)
Figure 6-5 Closing questionnaire—object influences
Figure 6-6 Closing questionnaire—FM100 tray difficulty ranking
Figure 7-1 Best-fit ellipse parameters as a function of Rf and Rg
Figure 7-2 Independent variables as predictors of mean naturalness rating
Figure 7-3 Best-fit ellipse parameters as predictors of mean naturalness rating
Figure 7-4 Independent variables as predictors of mean vividness rating
Figure 7-5 Best-fit ellipse parameters as predictors of mean vividness rating
Figure 7-6 Independent variables as predictors of mean preference rating
Figure 7-7 Best-fit ellipse parameters as predictors of mean preference rating
Figure 7-8 Independent variables as predictors of mean skin preference rating
Figure 7-9 Best-fit ellipse parameters as predictors of mean skin preference rating
Figure 7-10 Summary of model performance—OVERALL subjective ratings
Figure 7-11 OVERALL rating versus COMPOSITE rating
Figure 7-12 Consumer products vs. Natural foods—OBJECT SPECIFIC subjective ratings
Figure 7-13 Independent variables as predictors of mean adjusted Total Error Score (TESadj)
Figure 7-14 Light source transposition error score (Rdt,i) (tray-specific and total)


Figure 8-1 Light source potential to color shift vs actual shifts
Figure 8-2 Top ranked SPDs for OVERALL ratings—by scale
Figure 8-3 Top ranked SPDs for OVERALL ratings—combined
Figure 8-4 Top 6 light sources—FM100, tray specific
Figure 8-5 Top 6 light sources—FM100, TES
Figure 8-6 Light source-specific error score (Rdt) is a strong predictor of TESadj
Figure 8-7 Rdt can be used to rank-order adjusted total error score (TESadj)
Figure 8-8 LER values for the experimental SPDs
Figure 8-9 LER as a function of SPD ID—Top 6 sources highlighted
Figure 8-10 Select color statistics for commercially available light sources
Figure 8-11 Rf/Rg values for 210 commercially available sources


LIST OF TABLES

Table 5-1 Characteristics of the 24 experimental light sources
Table 6-1 Summary of OVERALL ratings
Table 6-2 Summary of FM100 hue test adjusted error scores
Table 7-1 Summary of model-included terms—OVERALL subjective ratings
Table 7-2 Various metrics as predictors of adjusted error scores
Table 7-3 Summary of model-included terms—FM100 hue test
Table E-1 Mean OBJECT SPECIFIC ratings for the red and orange objects
Table E-2 Mean OBJECT SPECIFIC ratings for the yellow and green objects
Table E-3 Mean OBJECT SPECIFIC ratings for the blue and purple objects


ACKNOWLEDGEMENTS

I would like to thank the many people who have made this journey possible for me. It takes a village, and I am greatly indebted to those who have helped me along the way:

My advisor, Dr. Kevin Houser, whose insight, drive, and passion for our industry led me to consider graduate work in the first place. His academic guidance has helped cultivate my ability to think about light artfully, and his professional advice has helped me define the path of my career.

Dr. Richard Mistrick, for advising as part of my doctoral committee, and whose instruction has helped me develop a strong technical foundation in lighting. His dedication to teaching is admirable.

My other committee members, Dr. Stephen Treado and Dr. James Rosenberger, whose guidance has been an important part of my doctoral work.

Jeffrey Mundinger, who assisted in the 8-week data collection process for this study. His commitment made the completion of this work possible.

Moses Ling, professor of Architectural Engineering, who supported me very early in my academic career and who motivated me to push forward when my grades were too low for admittance into the Architectural Engineering program. His dedication to students is unparalleled and without him, I’m certain my path would have been much different.

All of the staff of the Architectural Engineering department whose assistance and support have made this process much more manageable. I want to specifically recognize Deb Sam, who has always righted me when I was off course. Her assistance has been crucial to my success.

My Penn State friends, Devon Saunders, Reinhardt Swart, Andrea Wilkerson, Craig Casey, Reza Sadeghi, Sarith Subramaniam, and Yamile Rodriguez for their friendship and support.

My family, who continuously support me in all that I do. A special dedication is given to my mother, Michelle Deniz, whose love is unwavering and whose support is unbreakable. Her stern emphasis on education throughout my life has unquestionably led me here.

My partner, Hallah Elbeleidy, who continuously enriches my life. Her patience, love, and care have been indispensable throughout this process. Her passion inspires me and her intellect keeps me grounded.


Light thinks that it travels faster than anything, but it is wrong. No matter how fast light travels, it finds the darkness has always got there first, and it is waiting for it. – Terry Pratchett, Reaper Man


1. INTRODUCTION
Color is not an intrinsic property of an object, but rather a human psychological interpretation of light. Quantifying such a perception is a challenging task and has been the subject of many studies. The proliferation of Light Emitting Diode (LED) technology has further complicated the undertaking by exposing the inability of current indices to correctly quantify highly-structured spectra (i.e. spectra with sharp peaks and valleys) [CIE 2007]. It is important that lighting quality, of which color rendition is only one facet, not be sacrificed in the race to reduce energy consumption. Accurate metrics are needed to communicate such quality, and there currently exists an industry-wide need for improved measures.

The current industry standard, the CIE General Color Rendering Index (Ra), has well-documented flaws. While many color rendering metrics (and several systems) have been proposed to update or supplant this metric, the two-metric system recently proposed by the Illuminating Engineering Society [IES 2015] has gained notable momentum. The system details an average fidelity metric (Rf), an average gamut metric (Rg), and a Color Vector Graphic (CVG). The system uses the latest advances in color science research and was proposed by the authors to be a standard framework for research and lighting practice.

1.1 Problem Statement
Rf and Rg were designed to utilize the same set of color samples, reference source, and uniform color space such that a tradeoff between them can be explicitly demonstrated. These tradeoffs have not been thoroughly explored in the literature, and as such, design recommendations, performance thresholds, or preference suggestions have not been determined.

1.2 Objective
The over-arching goal of this project is to develop an intuition and understanding of how lighting spectra of various color rendering characteristics (average fidelity, average gamut, and gamut shape) affect the visual appreciation of chromatic objects and color discrimination ability. It is hoped that the results can be used to inform a practical system of recommendations to assist professionals in making well-informed design decisions that encapsulate multiple dimensions of color rendering. The primary intent of this study is not to validate (or invalidate) the IES TM-30-15 method in general, but instead to utilize the IES method as a conceptual framework to quantify strategically varied light spectra.

1.3 General approach
The general approach of this project was to design spectra, using a specialty 16-channel LED fixture, with systematically varied color rendering abilities (average fidelity measured with Rf, average gamut measured with Rg, and gamut orientation measured with variations of the CVG), present them in a single lighted booth, and have participants provide evaluations of the scene. Results were analyzed within the conceptual framework of Fidelity (ratings of “naturalness”), Preference (ratings of “like,” “vivid,” and “skin preference”), and Discrimination (measured with the FM100 hue test). Strategic variations of Rf, Rg, and CVG allowed for specific hypothesis testing.


1.4 Goals and hypotheses

1.4.1 Subjective Ratings

1. As Rf increases, naturalness ratings will generally increase.
2. As Rg increases, preference (and skin preference) ratings will generally increase.
   a. Precedent suggests this trend will be non-linear and will plateau.
3. As Rg increases, vividness ratings will generally increase.
4. Generally, spectra with increased red saturation will be more preferred than spectra with increased green saturation.
5. Generally, spectra with increased red saturation will be more preferred for skin rendering than spectra with increased green saturation.
6. Generally, spectra with increased red saturation will be more vivid than spectra with increased green saturation.

1.4.2 Color Discrimination

1. As Rf increases, color discrimination will increase (error scores decrease).
2. As Rg increases, color discrimination will increase (error scores decrease).
   a. Based on the idea that as gamut area increases, colored chips are likely to be spaced further apart and thus easier to distinguish.
3. Gamut orientation will not have an effect on color discrimination.


2. BACKGROUND
2.1 History of the CIE color rendering recommendations
Currently only one color rendering metric is formally recognized—the CIE General Color Rendering Index (Ra) [CIE 1995]. It is governed by the International Commission on Illumination (CIE), an international society that…

“…is devoted to worldwide cooperation and the exchange of information on all matters relating to the science and art of light and lighting, colour and vision, photobiology and image technology.”1

As a professional organization governing the only universally recognized color rendering metric [CIE 1995], the CIE has

“…been accepted as representing the best authority on the subject [color rendering]...”1

The CIE proposed their first recommendation in 1948 [CIE 1948], which was based on the spectral bands work performed by Bouma in 1937, which evaluated the degree of difference—between two sources—of a large number of color samples [Bouma 1937].

In 1965, the CIE offered their 1st edition of the CIE Method of Measuring and Specifying Colour Rendering Properties of Light Sources, which concluded that it was “unlikely that a [spectral] band system… will provide a reliable method for colour rendering” and that “a Test Colour Method is regarded as the fundamental method of colour rendering appraisal” [CIE 1965, p. 11, 12]. The CIE subsequently offered a test-color method based on work performed by Nickerson and Jerome [CIE 1965; Nickerson 1965]. This method offered a single general color rendering rating (Ra)—based on 8 test color samples—for practical light sources whose chromaticity did not differ significantly from that of the reference source (due to limitations with the available chromatic adaptation transforms). The 2nd edition, published in 1974, offered several revisions [CIE 1974]:

1. Six test color samples for supplementary indices were added: four of high chroma (saturation), one skin tone, and one foliage
2. A von Kries type chromatic adaptation transform was included (widening the chromaticity tolerance between the test and reference illuminant)
3. Mathematical definitions of reference illuminants were included
4. Tolerances for reference illuminants were detailed
5. Slight changes were made to the calculation procedure

Since the publication of the second edition, no changes have been made to the technical recommendations of the CIE. A technical committee (TC) in the 1980s worked on amendments to the 1974 publication, but the TC was closed without offering recommendations due to disagreements between its members [CIE 1999]. Following the close of the TC, TC1-33 Colour Rendering was formed to continue working on the issue of revising the CIE method for measuring and specifying the color rendering properties of light sources. This committee also could not reach a final conclusion and closed without producing a recommendation. In 1995, the CIE published CIE 13.3 to fix some errors, but no changes were made to the technical recommendations [CIE 1995]. In 2002, CIE TC1-62 Colour Rendering of LED Light Sources was formed to explore the validity of the CIE color rendering index based on

1 http://www.cie.co.at/index.php/LEFTMENUE/About+us

a series of visual experiments. The committee concluded that “…the CIE CRI is generally not applicable to predict the color rendering rank order of a set of light sources when white LED light sources are involved” [CIE 2007]. It was suggested that a new color rendering index be used to supplement the CIE CRI, though no specific measures were recommended.

In 2006, the CIE established TC1-69 Color rendition by white light sources to “investigate new methods for assessing the color rendition properties by white-light sources used for illumination, including solid-state light sources, with the goal of recommending new assessment procedures.”2 Due to disagreements between members, the committee eventually disbanded without issuing any recommendations. Two committees were formed to follow TC1-69. TC1-90 Colour Fidelity Index was formed “to evaluate available indices based on colour fidelity for assessing the colour quality of white-light sources with a goal of recommending a single colour fidelity index for industrial use.” TC1-91 New Methods for Evaluating the Colour Quality of White-Light Sources was formed to “evaluate available new methods for evaluating the colour quality of white-light sources with a goal of recommending methods for industrial use” [CIE 2012]. TC1-90 will consider fidelity-based indices, whereas TC1-91 will examine the preference, harmony, memory, and color discrimination indices (excluding fidelity-based indices).3

Houser and others [2013] suggest that individual metrics—which are not reconceptualizations of the basic underlying dimensions of color rendering—will offer only marginal improvements over existing measures. This calls into question the pragmatism of asking the lighting industry to wait for the proceedings of these committees while there is an urgent need for improved color rendering measures.

2.2 IES TM-30-15 The IES Method for Evaluating Light Source Color Rendition
The Illuminating Engineering Society is a North American-based engineering society that is recognized as a “technical authority on illumination.”4 For over 100 years, the society’s objective has been to

“…communicate information on all aspects of good lighting practice to its members, to the lighting community, and to consumers, through a variety of programs, publications, and services.”2

Their mission is to

“…improve the lighted environment by bringing together those with lighting knowledge and by translating that knowledge into actions that benefit the public.”5

The IES has recognized the divide between industry needs—namely a more accurate and predictive system of color rendition—and what is currently available and easily interpretable. They have vocalized the need for improved measures of color rendition that better serve the lighting community and have formed a task group whose objective is to develop said improved measures [IES 2014]. The task group has recently proposed the Technical Memorandum TM-30-15 The IES Method for Evaluating Light Source Color Rendition [IES 2015] which details a two-metric system of color rendition consisting of an average fidelity index, an average gamut index, a suite of supplementary indices, and a Color Vector Graphic (CVG). Specific details for the main metrics are provided below:

2 http://div1.cie.co.at/?i_ca_id=549&pubid=239 3 http://www.color.org/events/colorimetry/Yaguchi-ICC_CIE_Workshop_2013_CRI.pdf 4 http://www.ies.org/about/what_is_iesna.cfm 5 http://www.ies.org/about/iesna_about_profile.cfm


• Rf: a test sample fidelity metric (a measure of closeness to a reference source) that follows a calculation procedure similar to that of the CIE General Color Rendering Index [CIE 1995] and accounts for many of its well-documented drawbacks [Davis and Ohno 2010; DiLaura and others 2010; Smet and others 2013]. Rf uses a more robust set of color samples than CRI, an updated chromatic adaptation formula and color space, RMS averaging, and a continuous reference illuminant. Its value ranges from 0 to 100, with a value of 100 indicating perfect replication—i.e. no color shift of the 99 Color Evaluation Samples (CES)—of the reference source. The reference source varies to match the Correlated Color Temperature (CCT) of the test light source, similar to other fidelity metrics.

• Rg: a gamut-based index (a measure of increased or decreased chroma relative to a reference source) that is calculated as 100 times the ratio of the average area spanned by the (a’,b’) coordinates of the 99 CES in the CIE CAM02 Uniform Color Space (UCS) under both the test and the reference source (a minimal computational sketch follows this list). It varies from traditional gamut-based indices in that it: 1) computes an average area for the CES by consolidating the samples into 16 bins (instead of using the largest area, which is spanned by the most saturated samples), and 2) uses a floating reference source with a CCT equal to that of the test source. The former is notable because an Rg of 100 does not indicate perfect replication of the reference source; instead, it indicates that, on average, the test source does not increase or decrease object saturation (in relation to the reference source). The latter is notable because Rg avoids the CCT bias of fixed-reference gamut indices, whereby the gamut area of CIE reference sources tends to increase with CCT [Thornton 1972]. The achievable range of Rg for practical “white” light sources depends upon the source’s value of Rf; the possible range of Rg decreases with increasing values of Rf.

• CVG: the Color Vector Graphic is a graphical representation of the average hue and chroma shifts of the 99 CES. The graphic is created by splitting the (a’b’) color plane into 16 equal bins of 22.5 degrees. The CES within each bin, based on their coordinates under the reference illuminant, are averaged to create 16 average hue and chroma shifts. These shifts are overlaid on the reference gamut (normalized into the shape of a circle) to show the average hue and chroma shifts of the 99 CES relative to the reference illuminant.

The proposed system represents the efforts of a diverse committee representing various sectors of the lighting industry: manufacturing, specification, and research. The proposed method is, overall, notably different from other color rendering metrics in several ways:

1. The method uses 99 color samples which were statistically down-selected from a collection of more than 100,000 (real) spectral reflectance samples. The set of samples has two distinct features: 1. It is representative of the larger set, and 2. It is spectrally uniform (i.e. not susceptible to selective optimization or spectral “gaming”). The small set of 99 samples is advantageous for calculation and manipulation and its spectral uniformity is statistically robust.

2. Calculating reference source spectra differs from the conventional method in that it uses a mix of daylight and blackbody radiation between 4500 K and 5000 K to prevent the discontinuity at 5000 K which occurs with the standard CIE reference illuminant calculation (see the sketch after this list).


3. Color shifts are calculated in the CIE CAM02 Uniform Color Space (UCS) based on the CIECAM02 color appearance model. These are the most recently standardized models, and CAM02-UCS is more perceptually uniform than the CIELAB color space, which it supplants. The chromatic adaptation transform intrinsic to CIECAM02 replaces traditional von Kries-type transforms.

4. The CIE 1931 2° color matching functions were retained for procedures standardized by the CIE. For calculation of the Rf and Rg metrics, the TM-30-15 method utilizes the CIE 10° color matching functions, which are more representative of a full field of view [CIE 2004].
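As a sketch of the blended reference calculation in item 2, the function below linearly mixes Planckian and CIE daylight spectra across the 4500 K to 5000 K transition. Here planckian_spd and cie_daylight_spd are assumed helper functions (not defined in this document), and the max-normalization before mixing is an illustrative choice rather than the normative TM-30 procedure.

```python
def reference_spd(cct, wavelengths, planckian_spd, cie_daylight_spd):
    """TM-30-15-style reference illuminant: blackbody at or below 4500 K,
    CIE daylight at or above 5000 K, and a linear blend in between so the
    reference changes continuously with CCT."""
    if cct <= 4500.0:
        return planckian_spd(cct, wavelengths)
    if cct >= 5000.0:
        return cie_daylight_spd(cct, wavelengths)
    w = (cct - 4500.0) / 500.0                  # 0 at 4500 K, 1 at 5000 K
    p = planckian_spd(cct, wavelengths)
    d = cie_daylight_spd(cct, wavelengths)
    # normalize before mixing so relative spectral shape, not absolute
    # power, is blended (illustrative; TM-30 normalizes photometrically)
    return (1.0 - w) * p / p.max() + w * d / d.max()
```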

Tradeoffs between Rf and Rg have not been explored, and design recommendations and performance thresholds have not been determined within this framework.


3. LITERATURE REVIEW
3.1 Metrics
3.1.1 Existing Color Rendering Metrics
There are many existing color rendering measures with varied calculation procedures, applicability, and originality. Many of the existing metrics are outlined below, separated by the psychophysical dimension of color rendering they intend to quantify. These dimensions are: fidelity [Nickerson and Jerome 1965], preference [Judd 1967], and discrimination [Thornton 1972].

Fidelity
Fidelity refers to a light source’s ability to render colors as they would appear under familiar reference illuminants, typically natural light sources. As such, high fidelity is often equated to the ability of a light source to render objects “naturally.” A series of proposed fidelity metrics is listed below:

1948 – CIE Spectral Bands Method [CIE 1948]
1965 – Former CIE Test Color Method [CIE 1965]
1986 – Pointer’s Index (PI) [Pointer 1986]
1995 – CIE Color Rendering Index (Ra, or “CRI” colloquially) [CIE 1995]
2008 – Full Spectrum Color Index (FSCI) [Rea 2004]
2009 – Color Fidelity Index (CFI) (also “Number of Rendered Colors” (Nr)) [Žukauskas 2008, 2009]
2010 – Hue Distortion Index (HDI) and Luminance Distortion Index (LDI) [Žukauskas 2009, 2010]
2010 – CQS 9.0 – Qf [Davis and Ohno, 2010]
2011 – Rank-Order Color Rendering Index (RCRI) [Bodrogi and others 2011]
2012 – Monte Carlo method [Whitehead and Mossman 2012]
2013 – CRI 2012 Color Rendering Index (CRI2012) [Smet and others 2013]

Of these, the CIE General Color Rendering Index stands alone as the only formally recognized color rendering metric, despite its application to only the fidelity dimension of color rendering.

Preference
Preference relates to the ability of a light source to render objects such that they appear pleasing (or preferred), and is typically associated with increased object saturation relative to a reference source. Preference assessments are typically evaluated through judgements of “colorfulness,” “attractiveness,” and “likeness,” and are often linked to “saturation” and “vividness.” A series of metrics related to preference are listed below:

Color Preference Index (CPI) [Thornton 1974]

Flattery Index (Rf) [Judd 1967]
Opponent-colors formulation [Worthey 1982]
Feeling of Contrast (FCI 94 and FCI 02) [Hashimoto 1994, 2007]
Harmony Rendering Index (HRI) [Szabo and others 2009]
Color Saturation Index (CSI), together with Color-Dulling Index (CDI) [Žukauskas and others 2009, 2010]

CQS 7.5 – Qp [Davis and Ohno 2010]


CQS 9.0 – Qg [Davis and Ohno 2010]
Memory Color Rendering Index (MCRI) [Smet and others 2010]

Discrimination
The color discrimination ability of a light source describes its capability to reveal a large number of colors such that an observer can distinguish between them when viewed simultaneously. Several proposed metrics for predicting color discrimination are listed below:

Color Discrimination Index (CDI) [Thornton 1972]
Farnsworth-Munsell Gamut (FMG) [Boyce 1977]
Color Rendering Capacity (CRC 84 and CRC 93) [Xu 1984, 1993]
Cone Surface Area (CSA) [Fotios 1997]
Gamut Area Index (GAI) [Rea 2008]
Categorical Color Rendering Index (CCRI) [Yaguchi and others 2013]

Due to their conflicting optimization criteria, it is not possible to simultaneously maximize all three aspects of color rendering for a light source [Houser and others 2004, 2013; Jerome 1972; Judd 1967; Thornton 1972a]. As an example, the average person tends to prefer objects which appear more highly saturated than they appear under natural light sources, which reduces a light source’s fidelity rating (by definition).

3.1.2 Multi-metric systems
It has been recognized for several decades that color rendering is multifaceted; the previous section outlined many attempts at quantifying these dimensions. The CIE CRI—despite quantifying only fidelity—performs satisfactorily for traditional light sources. The proliferation of Light Emitting Diodes (LEDs) has exposed weaknesses of the CIE Ra for quantifying highly-structured spectra [Narendran and Deng 2002; Sandor and Schanda 2006], sparking an exigent need for a new system of color rendition [CIE 2007; Davis and Ohno 2005; Ohno 2005]. Several multi-metric systems have been proposed to encapsulate the multidimensionality of color rendering. A chronology is outlined below:

Opponent-colors model [Worthey 1982]
Worthey’s approach to quantifying the color rendering ability of a light source is based on the opponent-colors theory put forth by Hering in 1878 [Hering and Hurvich 1964]. Opponent-colors theory states that human color perception is comprised of three (independent) perceptual channels: red-versus-green, yellow-versus-blue, and black-versus-white (achromatic). Worthey proposed two parameters, t̂ and d̂, which “…express an illuminant’s ability to realize red-green and blue-yellow contrasts of objects.”

CRI-CPI [Schanda 1985]
Schanda contends that the acceptability of a light source depends critically on the source’s ability to render human complexion in a favorable manner. Schanda notes the superior ability of the Color Preference Index (CPI) to quantify the preference of skin tones, an area where the CIE Ra falls short. Schanda does not propose a two-metric system; instead, he suggests a modified CIE Ra calculation procedure to include skin complexion preferences. The proposal is referred to as the combined color-rendering–color-preference index (CRI-CPI).


Pointer’s Index [Pointer 1986]
The work by Pointer is based on a color appearance model of color perception which produces parameters relating to hue, chroma, and lightness of a measured sample [Hunt 1982]. Using these parameters, Pointer calculates average hue, chroma, and lightness shifts for each of the four distinct hues (red, yellow, green, and blue) for a total of 12 indices (4 hues × 3 parameters). These parameters can be reduced to mean hue, lightness, and chroma indices, which can be further reduced to one overall index that is analogous to the CIE Ra. The method offers a total of 16 indices.

Reference-based and Volume/Gamut-based [Guo and Houser 2004]
Using several statistical analyses (including ranks, correlations, composite z-scores, and factor analysis), Guo and Houser investigated the relationship between 9 different color rendering indices common in the literature. The authors concluded that a single number cannot fully encapsulate the multidimensionality of color rendering and suggest that multiple measures should be used when making lighting design decisions. The authors recommended the use of a reference-based and a volume/gamut-based metric, using Pointer’s Index for supplemental information.

NICU Recommendations [Figueiro 2006]
Figueiro and others have proposed a series of recommended lighting standards for Newborn Intensive Care Units (NICU). Their recommendations are as follows: 1. a minimum CRI of 80, 2. a minimum FSCI of 55, and 3. a gamut area (GA) between 65 and 100. This recommendation adheres to Guo and Houser’s guidelines, namely that CRI is a reference-based metric and GA is a gamut (area) based metric. FSCI, which measures an SPD’s deviation from an equal-energy illuminant, is intended to be indicative of the degree to which a source contains radiation in all portions of the visible spectrum, and is assumed by the authors to provide meaningful information in addition to CRI and GA.

Class A designation [Freyssinier 2010; Rea 2008, 2010, 2012]
Rea and Freyssinier have produced a body of work investigating the validity of a two-metric system of color rendition that includes CRI and GAI. Several human factors experiments have shown the inability of either metric alone to correctly rank-order participant evaluations of lighted scenes, leading the authors to propose the use of both metrics for evaluating a light source. The culmination of the work between the pair has been synthesized into a series of recommendations they call the “Class A Color Designation for Light Sources.” The Class A designation requires: 1. CRI between 80 and 100, 2. GAI between 80 and 100, and 3. tint requirements based on work from the same authors [Rea and Freyssinier 2011].

CFI and CSI [Žukauskas 2010]
Žukauskas and others have proposed a statistical approach to color rendering which uses a large number of color samples and evaluates color shifts (between a test and reference source) in terms of noticeable differences [MacAdam 1981]. The authors have developed several metrics, but two in particular—the Color Fidelity Index (CFI) and the Color Saturation Index (CSI)—were suggested to be sufficient.

MCRI and GAI_Ra [Smet 2011]
Smet and others performed a meta-analysis of thirteen color metrics by “…calculating the average correlation of the metric predictions with the visual scaling of the perceived color quality obtained in several psychophysical studies.” Their results show that MCRI (a metric proposed by the authors) is significantly better than other metrics at predicting perceived appreciation (preference). The arithmetic mean of CRI and GAI (GAI_Ra = (CRI + GAI)/2) was shown to produce the best predictions of naturalness, though it lacks strong theoretical underpinnings.


Qa and Qg [Houser and others 2013]
Houser and others performed a multidimensional scaling analysis on 22 existing color rendering measures. The analysis revealed that the metrics cluster into three distinct neighborhoods relating to fidelity, preference, and discrimination. Further evidence supports the suggestion that a single number cannot fully encapsulate all dimensions of color rendering. The authors suggest that the most information can be conveyed using a reference-based metric (consistent with the concept of color fidelity) and a measure of relative gamut. Qa and Qg of the CQS [Davis and Ohno 2010] were offered as suggestions.

Rf and Rg [IES 2015]
In 2015 the IES released the specification of a two-metric system of color rendition [IES 2015] which contains an average fidelity metric (Rf), an average gamut metric (Rg), and a Color Vector Graphic (CVG). It is the latest system to be proposed and incorporates the most recent color science in the field. See section 2.2 IES TM-30-15 and David and others [2015] for specifics.

The outline of multi-metric systems above represents several different approaches that are worth noting:

1. The quantification of color perception through parameters that describe the biological mechanisms of human perception (trichromacy and opponent channels)
2. The use of color appearance models to describe correlates of hue, chroma, and lightness
3. Forming a recommendation system by imposing limits on (specific) existing measures (or supplementing existing metrics with others)
4. Designing multiple metrics with the same conceptual and computational backbone (i.e. the same color samples, color space, reference illuminants, etc.)

The IES TM-30-15 method, which is the main focus of the present study, falls into the latter category, though it isn’t alone. It is, however, the first (and only) two-metric system endorsed by a professional lighting organization.

3.1.3 Ordinal Based Color Rendering Scales
Ordinal rating scales have the potential to reduce complex color rendering information into a form that is intuitive and useful. Ordinal rating scales have been mentioned several times in the literature, but have not been substantially researched or developed. Several occurrences of ordinal-based scales in the literature are outlined below:

Rank-Order Color Rendering Index (RCRI) [Bodrogi 2011]
The RCRI is a numerical index constructed from ordinal ratings of illuminated environments. That is, this metric does not attempt to distill color rendering information into an ordinal scale; instead, it uses semantic ratings of visual environments to define a numerical rating scale. The five-step rating scale used to rate the color difference between two booths is: (1) excellent; (2) good; (3) acceptable; (4) not acceptable; (5) very bad. The metric itself was not any more predictive than the CIE CRI, but it does have the advantage of suggesting the number of “badly rendered” colors.

Pass/Fail
Whitehead and others have suggested that their method—which is based on the number of test colors that do not experience noticeable color shifts from the reference illuminant—could be distilled into a Pass/Fail distinction [Whitehead 2012]. The suggestion is intriguing, though no specific procedures are offered for calculating such a rating.

Rea and others have proposed a series of recommendations that constitute what they call the “Class A Color Designation for Light Sources” [Rea 2012]. The Class A designation requires: 1. CRI between 80 and 100, 2. GAI between 80 and 100, and 3. tint requirements based on work from the same authors [Freyssinier 2010, Rea 2011]. Though this method isn’t specifically promoted as a pass/fail system, it is predicated on the idea that light sources that meet the criteria are acceptable (Pass) and those that do not are unacceptable (Fail).

Categories
Houser and others [2013] performed a multidimensional scaling analysis on 22 measures of color rendition, ultimately recommending that the lighting community work to develop a two-metric system of color rendition. The authors suggest that a system be formulated such that color information “…can be simplified into grades, classes, or words that would be understood by the general public.” The suggestion to reduce color information into a simple scale would serve lighting end-users (homeowners, consumers, contractors, etc.). No method was suggested for achieving this.

Correlated color temperature (CCT)
Correlated color temperature describes the color appearance of a light source along a continuum; the practical lower limit of 2700 K is the color of the familiar incandescent lamp, and the upper limit exceeds 25000 K for some phases of daylight. Its relevance in this context is that—although it is a continuous scale—lighting manufacturers often market it as an ordinal decision between “Warm White / Cool White.” This is an attempt to convey color information to a general audience.
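As a trivial illustration of how a continuous quantity can be recast as an ordinal choice, the sketch below bins CCT into the familiar marketing labels; the thresholds are common rules of thumb assumed for illustration, not values drawn from this work or from any standard.

```python
def cct_label(cct_k: float) -> str:
    """Map a correlated color temperature (kelvin) to an ordinal
    marketing label; thresholds are illustrative rules of thumb."""
    if cct_k < 3500.0:
        return "Warm White"
    if cct_k <= 5000.0:
        return "Neutral White"
    return "Cool White"

print(cct_label(2700.0))  # Warm White
print(cct_label(6500.0))  # Cool White
```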

3.1.4 Graphical Representation
Graphical representations typically illustrate color shifts (of color samples) between a test and a reference source. They easily display a large amount of data and, for a knowledgeable user, can be very informative. Žukauskas and others developed Color Quality Charts (CQCs) (Figure 3-1, top left) to display the color rendition vectors (CRVs) of the 8,000 color samples used in their statistical approach to color rendering [Žukauskas and others 2009]. The color rendering icon proposed by Van der Burgt and others (Figure 3-1, top right) shows the average hue and saturation shifts for each of 36 hue segments [Van der Burgt and others 2010]. The Color Saturation Icon (Figure 3-1, bottom left) was developed in support of the Color Quality Scale developed by Davis and Ohno [Davis and Ohno 2010]. In addition to indicating color shifts, it provides an indication of relative gamut area, which has been shown to correlate with average human preference [Jost-Boissard and others 2014]. The IES TM-30-15 method for evaluating light source color rendition—by which this study is motivated—includes a Color Vector Graphic (CVG) (Figure 3-1, bottom right) indicating the average chromaticity shift of the CES within each of 16 hue-angle bins [IES 2015].

3.2 Previous research
3.2.1 Subjective ratings—preference
The CIE General Color Rendering Index was established in 1965 [CIE 1965], but it was immediately recognized that color rendering is multi-dimensional and that the CIE CRI may not correlate well with public preference. After the publication of the CIE CRI, DB Judd—among the authors of the CIE CRI—stated: “…the color-rendering index penalizes any departure from the true colors of objects produced by the light sources being appraised. If a light source of low color-rendering index was preferred for general lighting to one of higher color-rendering index, it must be that some of the distortions were such as to flatter the object…”

Based on the concept that some color shifts may be more flattering (i.e. preferred) than the true color—that is, than perfect replication of color between test and reference source—Judd proposed the flattery index (Rf) [Judd 1967]. The flattery index was based on a concept similar to that of the CIE CRI, except that “…the target colors will not be the true colors computed for the standard reference source, but instead will be the preferred colors of the test samples viewed under the standard reference source” (based on the work of Newhall and others 1957 and Sanders 1959). This work, which detailed a way to calculate an index based on established preferred color shifts of objects, became the foundation for color preference.

In 1972, Jerome attempted to validate the flattery index by designing a mixing experiment using a standard cool white and an experimental fluorescent lamp (“W-13”) [Jerome 1972]. The experimental lamp had approximately the same chromaticity as the cool white fluorescent and distorted the CIE CRI test color samples in the opposite direction of the cool white fluorescent lamp. In an experimental booth, approximately 50 participants were asked to create a mixture of the two lamps that produced the most desirable rendition of the objects. Jerome found that the preferred mixture of light contained 90% W-13, which suggests participants preferred vivid colors, and actually “…more vivid…than indicated for the flattery index.” While Jerome mentioned that some variation in the experimental apparatus might have distorted the results, the work nonetheless provided early validation of the flattery index, and thus of the idea that colored objects could be shifted to be more preferred than their appearance under the reference illuminant. In a separate experiment, Jerome used the same two fluorescent lamps (cool white and experimental W-13) in a mixing experiment and asked participants to adjust the mixture of the lamps to find the most desirable mixture for each of the 14 CIE CRI test color samples [Jerome 1973]. Jerome found, again, that the preferred mixture of lamps was comprised mostly of the W-13 lamp. Jerome concluded that “…there is a definite preference for the more vivid colors obtained with the W-13 lamps.”

Following the work of Judd and Jerome, Thornton introduced a preference index, called the Color Preference Index (CPI), based on the pattern of preferred colors established by Judd [Thornton 1974]. While the flattery index assigned weighting of shifts depending on object (hue), the CPI assigns equal weight to all objects. In an attempt to validate the CPI, Thornton designed 4 light sources—by varying the phosphor blends of linear fluorescent lamps—with different “prime-color” content [Thornton 1972] and varying values of CPI. Each lamp was placed in a booth and used to illuminate real food objects. All four booths were presented simultaneously, and 267 participants evaluated them one at a time. Participants were asked to rank the booths according to their preferred coloration of objects. The top ranked source had the highest CPI (a value of 120, higher than the reference illuminant) and the highest prime-color content. The results of this experiment showed that the CPI was a very strong predictor of how many participants ranked a source 1st, and provided further validation of the work performed by Judd.

In 2005, Ohno used the concept of gamut area—the computed area of the polygon enclosed in a specified color space by select test color samples—to explain the desirability of the neodymium incandescent lamp, which could not be predicted by the CIE CRI [Ohno 2005]. At the time of that publication, this source was gaining popularity despite having an Ra of 77 and an R9 of 15. Because the CIE CRI doesn’t account for the direction of color shift and penalizes all departures from the reference illuminant, the metric couldn’t capture the desirable increase of chroma in the red and green hues caused by this lamp. The correlation between color preference and higher saturation (increased chroma) has been demonstrated numerous times in recent years:


Narendran and Deng [2002] asked participants (n = 30) to evaluate their preference of various objects (and their skin tone) under an incandescent lamp and several LEDs. The authors found that the color-mixed LEDs (red-green-blue primaries) had the highest overall preference (for object and skin rendering) and that this could not be predicted by the CIE CRI. Among the most preferred SPDs, “RGB_High” had low average fidelity (Rf = 68), an average gamut of 101 (Rg = 101), and increased chroma in the red and red-orange hue bins (Rcs,h1 and Rcs,h2 > 0).6

Jost-Boissard and others [2009] evaluated ratings of attractiveness and naturalness (n = 40) for various color-mixed LEDs (and a halogen and linear fluorescent source). The results showed that the CIE CRI is a weak predictor of both ratings. The authors concluded that their white-green-red LED mix was considered by most to render the objects more attractive than the traditional sources. This LED also had the highest gamut area (measured with a modified GAI), with the highest chroma in the red and red-orange hues.

Smet and others [2010a] designed a custom apparatus to uniformly illuminate 9 familiar objects of various hue and chroma. They performed an experiment in which participants (n = 32) rated how similar the rendered color was to their memory of the color of that object. The results showed that “…the chromaticity of the highest rating tended to be shifted toward higher chroma in comparison with the chromaticity calculated under D65 illumination.”

Islam and others [2013] performed a study (n = 60) with 21 LED spectra and 3 fluorescent lamps of various CCT. Under different lighting conditions participants evaluated the naturalness of select objects, the colourfulness of the Macbeth Colour Checker Chart, and “...the visual conditions of the lighting booths.” The results showed that the observers preferred the SPDs “…under which the chroma and colourfulness values of the object colours were higher.” The authors mention that the CIE CRI is not a good indicator of preference and that chroma and colorfulness should be considered “as important factors” for the determination of the color preference of a light source.

Jost-Boissard and others [2014] performed a similar analysis to Jost-Boissard and others [2009] and evaluated 9 color-mixed SPDs at 3000 K (n = 45) and 4000 K (n = 36). Participants were asked to evaluate the colored objects in the visual scene (a triple-booth side-by-side apparatus) on scales of naturalness, attractiveness, and colorfulness. The authors state that “When observers were asked to make comparisons between LEDs and halogen light, they often said that with LEDs there was an increase in colour contrast and that is why the object appeared more attractive to them, which suggests that the attractiveness of fruits and vegetables is linked to the saturation of their colours.” The authors conclude that colorfulness and attractiveness are most correlated with gamut-based indices.

Wei and others [2014] performed an experiment (n = 52) that tested a typical blue-pumped LED and an experimental blue-pumped LED with diminished yellow emission (both at 3000 K). The authors found a statistically higher preference for objects under the yellow-diminished LED—including red, orange, green, and wood objects—which increased the chroma of objects of most hues. They also found a statistically higher preference for skin rendering under the yellow-diminished LED for Caucasian participants.
Wei and others [2014a] performed three separate experiments (n = 48) which evaluated perceptual responses to a standard blue-pumped LED (CIE CRI = 85) and a violet-pumped LED (CIE CRI = 97). The results show that red, white, and skin rendition were preferred under the violet-pumped LED.

6 These values were calculated by the present author by using a plot digitizer to record the SPD published in the original paper, which was then analyzed using the IES TM-30-15 Excel spreadsheet. Some error should be expected.


The authors conclude that the blue-pumped LED (with a lower CRI) “…rendered warm colours with a large colour error and a desaturating shift. These differences were easily perceived and the higher fidelity and saturation of the [violet-pumped LED] was valued.” They concluded the same analysis was valid for preferred skin rendering (excluding cultural factors).

Ohno and others [2015] performed color preference evaluations (n = 20) for various objects (fruits, vegetables, and skin tones) under 9 different lighting spectra with various saturation levels (at three different nominal CCTs). The results showed a preference for enhanced chroma, consistent across all objects at all CCTs.

Teunissen and others [2016] evaluated the attractiveness of object appearance over three separate experiments (n = 34): 1. Fresh food, 2. Packaged food, and 3. Skin tone. Seven different SPDs were used (tunable LEDs) with varying CIE CRI and average gamut values. The authors found that “…object appearance was rated as more attractive for light sources with larger colour gamut.” They also found that the most red-enhancing spectra were the most preferred.

Wei and others [2016] conducted two experiments (n = 40) with 12 spectra (6 with Rg = 110, 5 with Rg = 120, and a high-fidelity reference). The first experiment used a side-by-side booth apparatus which simulated a retail environment. The results show that all sources with enhanced gamut were preferred over the reference source and that there was no difference in preference between gamut shapes. The second experiment used a single viewing booth which mimicked a restaurant setting. The results show that both gamut area (Rg) and gamut shape have an effect on participant ratings, particularly for the “warm” colors.

This body of research provides strong evidence for the link between increased chroma (relative to the reference source) and higher preference ratings. Recent work, however, has shown that there is a limit to the increase in chroma that will be perceived as favorable. Ohno and others [2015] showed a plateau effect of saturation on preference ratings with marked consistency across several CCTs, a wide array of colored objects, and skin tones. Wei and Houser [2016a] found a similar effect in a pilot study using a two-metric system to characterize color preference. Another recent study by Royer and others [2016] shows the same plateau effect on preference and very strong predictive power using a proxy for red saturation (IES TM-30 chroma shift for hue angle bin 1, Rcs,h1). This study is detailed more thoroughly in section 3.2.4 Royer and others [2016].

Other researchers have shown that preference is related to both saturation and fidelity [Rea and Freyssinier 2010; Lin and others 2015; Wei and Houser 2016a]. A particular study of interest by Smet and others [2010] included a series of visual experiments which asked participants to evaluate an array of objects in terms of preference, fidelity, vividness, naturalness, and attractiveness. A factor analysis grouped the descriptors into 3 separate groups: vividness, preference/attractiveness, and fidelity/naturalness. The factor loadings for the preference/attractiveness group “…suggested that they were a combination of vividness and the fidelity/naturalness descriptors.” Smet and others [2011] performed a meta-analysis examining 9 previous psychophysical experiments which evaluated user ratings of appreciation (preference/attractiveness) and naturalness. The analysis showed that the fidelity metrics—including CIE Ra, CQS Qf, and Ra,cam02ucs, none of which were statistically different from one another—were the worst predictors of visual appreciation. These studies suggest that while fidelity may be an important factor in the visual appreciation of a light source, fidelity metrics alone have not been found to be predictive of preference.

3.2.2 Subjective ratings—naturalness
Because fidelity metrics are calculated using familiar reference illuminants (e.g. daylight, which is thought to be the ideal light source), fidelity is often considered to be a measure of how natural a light source will render objects. Several studies have found fidelity metrics to be a predictor of naturalness ratings:

Žukauskas and others [2012] performed a psychophysical experiment (n = 100) with a 4-channel LED light fixture illuminating various objects in a single-booth configuration. The authors found that the “…blends that render a highest number of colors with high fidelity have, on average, been attributed to ‘most natural’ lighting.”

Jost-Boissard and others [2015]—an experiment detailed briefly in the previous section—found that the more traditional sources (halogen and fluorescent) rendered objects as statistically more natural. The analysis revealed that fidelity metrics correlate best with judgements of naturalness, and the authors conclude that naturalness is best captured by a fidelity metric.

Teunissen and others [2016]—an experiment detailed briefly in the previous section—found “…a high correlation between fidelity indices and naturalness.” In conjunction with their finding on preference (i.e. that it can be best predicted with a gamut area index), the authors recommend the use of a two-metric system.

Some researchers have not found such a strong correlation between fidelity and naturalness. Smet and others [2010] found that naturalness ratings and fidelity ratings did not produce the same rank orders, and that the fidelity rank order was more similar to that of preference and attractiveness. Despite this discrepancy, their factor analysis indicated that naturalness is most closely related to fidelity, and the authors note that “…the results suggest that the terms fidelity and naturalness were not considered to be identical in the observers’ minds.” The recent work of Royer and others [2016] recorded participant evaluations of normalness for 26 light spectra with systematically varied average fidelity, average gamut, and gamut shape (minimized/maximized red saturation). The results show that CIE Ra and IES Rf (both fidelity metrics) poorly predict the normalness ratings (r² = 0.06 and 0.35, respectively). This study is detailed more thoroughly in section 3.2.4 Royer and others [2016].

3.2.3 Color Discrimination
In 1972, Thornton introduced the concept of color discrimination as an important aspect of color quality, and defined it as “the extent to which the illumination allows the observer to discriminate among a large variety of object colors simultaneously viewed” [Thornton 1972]. To quantify this ability, Thornton proposed the Color Discrimination Index (CDI), which is computed as the area enclosed (i.e. the gamut area) by the eight test color samples of the CIE CRI calculation [CIE 1995] in the CIE 1960 UCS, scaled so that CIE Illuminant C has a score of 100. Thornton proposed using this gamut area as a way to predict color discrimination, as a way to distinguish light sources with similar color rendering ability (i.e. similar CIE CRI), and as a way to compare color discrimination between sources of different CCTs. While there are several computational limitations to Thornton's proposal—a non-uniform color space and a limited number of color samples—his suggested link between gamut area and color discrimination has remained at the forefront of discussion over the past several decades.
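Gamut-area indices of this kind reduce to a polygon-area computation in the chosen chromaticity plane. The following is a minimal Python sketch, assuming the (u, v) coordinates of the eight CRI test color samples under the test source and under CIE Illuminant C have already been computed elsewhere; the function names are illustrative, not from any published implementation:

```python
import numpy as np

def gamut_area(points) -> float:
    """Shoelace formula: area of the polygon formed by chromaticity
    coordinates ordered around the hue circle (n x 2 array-like)."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def cdi(uv_test, uv_illuminant_c) -> float:
    """CDI-style index: gamut area of the eight CRI samples under the
    test source in the CIE 1960 UCS, scaled so Illuminant C scores 100."""
    return 100.0 * gamut_area(uv_test) / gamut_area(uv_illuminant_c)
```

The same polygon-area arithmetic underlies later gamut measures such as GAI and the IES TM-30-15 Rg; they differ mainly in the color space, the sample set, and the normalization.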

The body of work from Boyce [1976] and Boyce and Simmons [1977] explored the effects of SPD, illuminance, age, and participant experience on color discrimination (measured by the FM100 hue test [Farnsworth 1957]). These studies utilized commercially available sources (namely fluorescent and discharge lamps) at a time when the tri-band fluorescent was a novel light source. The results showed that the CIE CRI and CDI (referred to as “CIE Gamut Area”) are approximate predictors of FM100 error score and that above 300 lux, lamp type is a much more “…important factor in determining performance on a colour discrimination task than is illuminance.”7 Error scores for the three age groups tested—less than (or equal to) 30 years of age, greater than 30 but less than 50 years of age, and greater than (or equal to) 50 years of age—were statistically different, with the oldest group producing significantly higher mean error scores than the other two groups. Participant experience was not statistically significant. While these studies suggested the (moderate)8 strength of CRI and CDI in the prediction of color discrimination, the sources they tested were not representative of the highly structured SPDs common with LEDs today. In fact, the authors noted that the tri-band fluorescent lamp, likely the most structured source they tested, produced more errors “…than would be expected from its CRI and gamut area.”

Mahler and others [2007] performed a study of the effect of light source on color discrimination using 4 LED sources and a custom color discrimination test (which they call “Cercle 32”). The authors found that the LED sources produced significantly more errors than the control source (incandescent), particularly in the greenish-blue and purplish-red hues. The authors found that the CIE CRI correlated well with errors for the light sources tested (all of which were LED) and stressed that increasing the chroma of samples does not “…imply the improvement of colour discrimination.” They suggest that, in fact, the color discrimination ability of “…RGB LED illumination is reduced precisely for the falsely saturated colors.” The results of this study suggest that for (RGB) LED sources, it is not necessarily true that increased gamut results in increased color discrimination ability—as was originally suggested by Thornton.

Rea and Freyssinier [2008] performed a study which analyzed a total of 8 light sources, grouped into 2 nominal CCT categories—“warm white” (with a CCT range of 900 K) and “cool white” (with a CCT range of 1700 K)—at two separate illuminance levels (5 and 50 fc). The study consisted of three experiments: one which considered only the warm white sources; one with only the cool white sources; and one with a mixture of both. In terms of color discrimination, the authors conclude only that error scores are consistently lower at the higher illuminance (the lower of the two being well below the recommended illuminance for administering this test) and that GAI is a better, and more consistent, predictor than CRI. Overall, the authors endorse the use of GAI in addition to CRI and recommend a CRI criterion greater than 80 (CIE Ra > 80) and a GAI between 80 and 100 (80 < GAI < 100). The authors do not, however, address the fact that their recommendation eliminates all but 2 of their experimental light sources, nor do they justify why such a cutoff would be appropriate for color discrimination.

Royer and others [2011] investigated the effect of four specific sources on color discrimination—an RGB LED with peaks near Thornton's prime regions [Thornton 1971], two linear fluorescents, and an incandescent—all at a CCT of 2700 K. Thornton theorized that a light source with peaks in the prime regions (450, 530, and 610 nm) would have superior color discrimination ability. The RGB LED in this study had peaks at 452, 530, and 610 nm and resulted in statistically worse color discrimination ability (measured by the FM100 hue test) than the other 3 sources. Overall, the authors found that CRI, CDI, and FM Gamut all fail to predict (or correctly rank order) the four experimental SPDs, and conclude that gamut area measures “…are not accurate predictors of color discrimination capability when highly structured SPDs are included.”

7 Several caveats were listed. See Boyce and Simmons 1977.
8 The authors used descriptions such as “approximate predictors,” and in the 1976 study, Boyce concludes that neither CRI nor CDI were “…completely accurate predictors of the performance under different lamps.”

Both the Rea and others [2008] and Royer and others [2011] studies specifically mention that they did not consider the interaction between the SPD and the SRDs of the FM100 caps. The latter study specifically notes that a light source may transpose juxtaposed caps; such caps may be easier to distinguish, yet the resulting arrangement is recorded as an error.

A small-scale experiment by Wei [2011] examined color discrimination ability under two linear fluorescent lamps—the SPX (3000 K) and a Reveal® source—for participants under the age of 25. Wei found a significant difference for the red-green partial error score, no statistical difference for the blue-yellow partial error score, and no statistical difference for the total error score. The results could not be predicted by CRI, CDI, or FM Gamut area—because these metrics would predict different performance—and the author notes consistency with the results of Royer and others [2011]. A similar experiment by Wei and others [2012] used the same sources with participants older than 60 years of age. The authors found a significant effect between participants who did and did not have cataract surgery, but overall found a similar effect as the previous study by Wei [2011]. The authors found that none of CRI, CDI, FM Gamut, CQS, CQS-Qf, CQS-Qg, FCI 94, CRI-CAM02UCS, or HRI was able to characterize the color discrimination capability of the two sources. The authors did, however, find that the results could be explained by the opponent channel responses of the two sources and that “the expansion and shift of the gamut areas . . . were found to be able to provide some useful information about colour discrimination ability.”

Overall, the results of these studies suggest the need for a much more direct, nuanced approach to the quantification of the color discrimination capability of a light source.

3.2.4 Royer and others [2016]
The present study is similar enough to a recent article by Royer and others [2016] that a thorough review is warranted. The authors systematically varied average fidelity (Rf), average gamut (Rg), and gamut shape (by maximizing and minimizing the chroma shift in hue angle bin 1 (Rcs,h1) of the IES TM-30-15 color vector graphic). Participant ratings of normalness, saturation, and preference were recorded (n = 28) for 26 lighting spectra presented in a full-scale room with various chromatic (and achromatic) objects intended to represent a general (unidentifiable) lighting application.

The results showed that fidelity metrics were poor predictors of normalness ratings. Linear regression models for Rf, Ra, and Qf were all poor fits, with r² < 0.34 for all three. Results were similar for average gamut measures, where linear regression models for Rg, GAI, and Qg were all poor fits, with r² ≤ 0.32 for all three. A third-order polynomial was fit using the gamut metrics—based on the concept that infinitely increasing gamut should not be expected to produce normal conditions—which slightly increased the fit (r² = 0.54 for all three). Using the metrics available, the authors found a fairly strong fit to the data with Rf and a second-order polynomial fit of the chroma shift in hue angle bin 1 (Rcs,h1), which is nominally red.

The results show that saturation ratings are highly correlated with Rg (r² = 0.76) and very highly correlated with the chroma shift in hue angle bin 16 (r² = 0.95). Best subset analyses generally showed good correlation between saturation ratings and the nominally red and red-orange hue angle bins, all of which are highly correlated. Increasing the chroma in one of these bins almost always “…dictated increased saturation in the adjacent bins.” Nominally red rendering appears to be a strong indicator of saturation ratings.


The results show that Rf and Rcs,h1 are significant predictors of preference ratings (p < 0.001 for both). Rg is also a significant factor, which supports their hypothesis that preference will increase with increased gamut, but also suggests that Rg is not an ideal predictor of preference because the results show a strong tie to red rendering. The authors find that the best model—which is a very strong fit (r² = 0.94)—contains Rf and a second-order polynomial fit of Rcs,h16. The authors suggest that inclusion of the Rf term in this model serves to “…mitigate the effect of oversaturating.” In a closing questionnaire the authors asked participants to rank the top 3 color categories that most influenced their judgements. The color categories most frequently ranked in the top 3 were red (74%), orange (67%), and green (63%). While the importance of red rendering is apparent in this experiment, the authors appropriately conclude that the results do not necessarily suggest that participants were more sensitive to red hues; instead, they suggest participants were responding to the hues which had “…the greatest change in rendition when illuminated by this range of sources.”

Overall, the work by Royer and others is one of few studies in the literature that systematically varies average fidelity, average gamut, and gamut shape—for subjective ratings—over such a wide range of values. The current study also systematically varied average fidelity (Rf), average gamut (Rg), and gamut shape. The gamut shapes in this study generally exhibit more variation than those in the Royer study—likely due to the increased flexibility of the 16-channel fixture used—and stronger gamut contrasts were achieved. The investigation by Royer and others did not include observer ratings of skin preference or tests of color discrimination. As such, the current study can be considered among the first to collect ratings of skin preference and color discrimination ability under many light spectra which systematically vary in average fidelity, average gamut, and gamut shape, over a wide range of values.
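To make the form of these models concrete, the sketch below fits a rating to Rf plus a second-order polynomial of a chroma-shift term by ordinary least squares. The input arrays are hypothetical placeholders; this mirrors the structure of the models described above, not the authors' actual code or data.

```python
import numpy as np

def fit_preference_model(rf, rcs, rating):
    """Least-squares fit of: rating ~ b0 + b1*Rf + b2*Rcs + b3*Rcs**2,
    the general form of the preference models discussed above."""
    rf, rcs, rating = (np.asarray(v, dtype=float) for v in (rf, rcs, rating))
    X = np.column_stack([np.ones_like(rf), rf, rcs, rcs ** 2])
    coef, *_ = np.linalg.lstsq(X, rating, rcond=None)
    ss_res = np.sum((rating - X @ coef) ** 2)
    ss_tot = np.sum((rating - rating.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot  # coefficients and r-squared

# Hypothetical usage: coef, r2 = fit_preference_model(rf, rcs_h1, mean_pref)
```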


3.3 Figures

Figure 3-1 Color Rendering, graphical representations. This figure shows various graphical representations that have appeared in the literature. (Top left) Color Quality Chart [Žukauskas 2009]. (Top right) Color Rendering Icon [van der Burgt 2010]. (Bottom left) Color Saturation Icon [Davis 2010]. (Bottom right) Color Distortion Icon [IES 2015].


4. PILOT STUDY: VOCABULARY FOR DESCRIBING OBJECT COLORS
This experiment was reviewed and accepted by the Penn State University Institutional Review Board on August 4, 2015 (STUDY00003016).
4.1 Purpose
The main goal of this pilot study was to help determine the best anchor words for the continuous rating scales used in the main experiment. The aim was to minimize variation in the data due to misunderstandings of vocabulary and to use anchor words that best captured the visual color changes.

4.2 Methods
4.2.1 Image Manipulation
A base image was chosen which contains several real fruit objects of various colors (red, orange, yellow, green, blue, and purple) (Figure 4-1, center). A total of 8 variations of the image were created: two with increased saturation (S +35 and S +70), two with decreased saturation (S -35 and S -70), two hue-shifted towards green (H +15 and H +35), and two hue-shifted towards red (H -15 and H -35). Including the original image, participants saw a total of 9 different images (Figure 4-1).
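For context, comparable variations can be produced programmatically. The minimal sketch below uses Pillow and NumPy; the mapping from Photoshop's slider values to an enhancement factor and hue rotation is an assumed approximation for illustration, not the exact adjustments applied to the pilot images, and the file name is a hypothetical placeholder.

```python
import numpy as np
from PIL import Image, ImageEnhance

def adjust_saturation(img: Image.Image, slider: float) -> Image.Image:
    """Approximate a saturation slider in the range -100..100:
    +35 maps to factor 1.35, -70 maps to 0.30 (assumed mapping)."""
    return ImageEnhance.Color(img).enhance(1.0 + slider / 100.0)

def shift_hue(img: Image.Image, degrees: float) -> Image.Image:
    """Rotate the hue channel by the given angle; positive values move
    typical reds toward green, loosely matching the H +/- naming."""
    hsv = np.array(img.convert("HSV"), dtype=np.int16)
    hsv[..., 0] = (hsv[..., 0] + round(degrees / 360.0 * 256)) % 256
    return Image.fromarray(hsv.astype(np.uint8), mode="HSV").convert("RGB")

base = Image.open("base_fruit_image.png").convert("RGB")  # hypothetical file
variations = {f"S {s:+d}": adjust_saturation(base, s) for s in (35, 70, -35, -70)}
variations.update({f"H {h:+d}": shift_hue(base, h) for h in (15, 35, -15, -35)})
```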

4.2.2 Survey Design
The survey was split into 5 major sections: (1) The introduction, which provided background information related to the study and collected basic demographic information (age and gender); (2) Block 1, which asked for a single-descriptor, open-ended response for each image; (3) Block 2, which asked participants to select descriptors from a predefined list of words (select all that apply); (4) Block 3, which asked participants to provide ratings for each image along several rating scales; and (5) A series of closing questions (Figure 4-2 and Figure 4-3). Before each of the three blocks, the original image was shown for reference, which was then followed by the 9 images (original and 8 variations), presented in random order. The order of the images in each block was randomized for each participant.

4.2.3 Participants/Distribution
The survey was designed in Qualtrics and administered via email within the Department of Architectural Engineering at Penn State University. The survey was open for 10 days, during which time 27 complete responses were recorded. Incomplete surveys were not analyzed. One participant was under the age of 18 and their responses were discarded. Of the 26 remaining participants, 7 were male, 19 were female, and 57.7% (15 respondents) were between the ages of 18 and 21 (Figure 4-4). Eighty-five percent of respondents were students in the department.

4.3 Results
The results from Block 1 (open-ended response) did not provide substantial findings. The most repeated responses were for H +35 (where 20 people described the image as “green/greenish”), H -35 (where 14 people described the image as “purple/purplish”), and the original image (which was labeled “normal” 7 times).

The results of Block 2 (select all that apply from a presented list of words) are shown in Figure 4-5 and Figure 4-6. Saturation variations (compared to the original) are shown in Figure 4-5. The most frequent responses for the images with increased saturation (S +35 and S +70) were vibrant, vivid, and exaggerated. The most frequent responses for the images with decreased saturation (S -35 and S -70) were dull and desaturated. The original image was most frequently described as natural, normal, and appropriate. Hue variations (compared to the original) are shown in Figure 4-6. The most hue-shifted images (H -35 and H +35) were most commonly described as unnatural (followed by inaccurate, fake, and wrong). H -15, which is slightly hue-shifted towards red, was described as normal and natural at a high frequency.

The results of Block 3 (continuous scales) are shown in Figure 4-7. The images with increased saturation (S +35 and S +70) were both rated the most vivid, but lower on the natural scale. The images with decreased saturation (S -35 and S -70) were rated the least vivid. The images at the extremes of the hue and saturation scales (H +35, H -35, S +70, and S -70) were generally rated the least preferred and the least natural. The original image was rated the most natural, the most attractive, and the most preferred. The second most attractive and preferred image was H -15 (slightly hue-shifted towards red).

In the exit survey, participants were asked to name (as many as desired) the objects in the image that most strongly influenced their judgements. Figure 4-8 shows the number of times each object was mentioned. The most frequently cited objects were the watermelon, tomato, purple cabbage, and red pepper (respectively). Three of the four are nominally red, suggesting the importance of red rendering in participant evaluations.
4.4 Conclusion
The results suggest that when respondents could detect dramatic changes in the color of the objects in the image, which often occurred at the extreme hue-shifted and saturation conditions, they were likely to apply the descriptor unnatural. The original image was mostly described as natural and normal, and was rated with these descriptors at a higher frequency than any of the other image variations. As a result of this work, natural and unnatural were chosen as the anchor words for the naturalness scale.

The results show that respondents could detect differences in saturation, despite the images being shown in isolation, and the descriptors vivid, vibrant, dull, and desaturated were chosen at a high frequency to describe these changes. As a result, vivid and desaturated were chosen as the anchor words for the vividness scale.


4.5 Figures

Figure 4-1 Pilot study—Image variations. Eight total image variations were created from a base image (center), which was chosen because of the presence of natural fruit objects and a wide range of colors. Image variations were created in Adobe Photoshop and the image tags (i.e. S +35) refer to the photo adjustment applied in the software. Two images have increased saturation relative to the original (S +35 and S +70), two have decreased saturation (S -35 and S -70), two are hue-shifted towards green (H +15 and H +35), and two are hue-shifted towards red (H -15 and H -35). Each participant saw all 9 of these images in random order.


Figure 4-2 Pilot study—Survey flow diagram. The survey was split into 5 major sections. The first, Introduction, provided basic background information related to the study and collected simple demographic information, including age, gender, and profession. Block 1 presented all 9 images to the participant, in random order, and asked for a single open-ended descriptor of each image. Block 2 also presented all 9 images to the participant, in random order, but asked the participant to select all applicable descriptors from a predefined list of words. Block 3 also presented all 9 images to the participant, in random order, but asked for ratings along 4 continuous scales. Before each block, the original image was shown for reference. At the end, each participant was asked several closing questions.


Block 2 descriptor list (select all that apply): Distorted, Unnatural, Shifted, Inaccurate, False, Fake, Wrong, Off; Dull, Calm, Desaturated, Flat, Hazy, Grayed out, Subdued, Washed out; Natural, Normal, Accurate, Precise, Standard, True, Appropriate, Familiar; Vivid, Vibrant, Rich, Saturated, Cartoonish, Exaggerated, Enhanced, Brilliant.

Figure 4-3 Pilot study—Response templates for each block. (Top) Block 1: single-descriptor open-ended response. (Middle) Block 2: list of descriptors the participant was permitted to choose from (select all that apply). (Bottom) Block 3: continuous rating scales (participants could click within the range to record a response). Participants were also permitted to select the radio button on the right, which would indicate that they thought the descriptor was not applicable. No participant selected this option.



Figure 4-4 Pilot study—Participant demographics. Twenty-seven complete surveys were recorded. One participant was under the age of 18 and their responses were discarded. (Left) Gender. Seven participants identified as male and 19 identified as female. (Right) Age. The majority of participants were in the age range of 18 to 21 years (57.7%). All respondents had an affiliation with the Penn State Department of Architectural Engineering—the department email list was used for distribution of the survey—and 87% of the respondents were students.


Figure 4-5 Pilot study—Block 2 responses (saturation variations). The original image was mostly described as natural and appropriate. The descriptors dull and desaturated were most frequently used for the images with decreased saturation (S -35 and S -70). The descriptors vibrant and vivid were most frequently used for the images with increased saturation (S +35 and S +70).


Figure 4-6 Pilot study—Block 2 responses (hue variations). The original image was mostly described as natural and appropriate. The descriptors unnatural and inaccurate were most frequently used for H +15 and H +35 (hue-shifted towards green) and for H -35 (hue-shifted towards red). H -15 (slightly hue-shifted towards red) was rated exceptionally natural and normal.


Mean Block 3 ratings for each image (continuous scales, plotted 0–100):

Image      Natural   Vivid   Attractive   Preferred
Original      88       74        89           88
H +35         20       41        13           11
H +15         49       53        43           42
H -15         70       71        71           69
H -35         18       71        27           14
S -70         28        7        11            9
S -35         63       30        40           36
S +35         61       93        67           60
S +70         40       93        54           48

Figure 4-7 Pilot study—Block 3 responses (continuous rating scales). The original image was rated highly on all scales and was rated the most natural, attractive, and preferred of all the images. The images with increased saturation (S +35 and S +70) were rated the most vivid, but were rated low on the natural scale. The images with decreased saturation (S -35 and S -70) were rated low on the vivid and preference scales. The highly hue-shifted images (H -35 and H +35) were rated low on the natural, attractive, and preference scales.


Figure 4-8 Pilot study—Exit question. In the exit portion of the survey, participants were asked to mention which objects (as many as desired) most strongly influenced their judgements. The numbers indicate how many times each object was mentioned. The most frequently mentioned objects were the watermelon, tomato, purple cabbage, and red pepper, respectively. Three of the top four objects are nominally red. No object of another nominal color was mentioned with comparable frequency.


5. METHODOLOGY
This experiment was reviewed and accepted by the Penn State University Institutional Review Board on November 5, 2015 (STUDY00003519).
5.1 Apparatus
5.1.1 Light Booth
A viewing booth with nominal dimensions of 0.81 m (width) × 0.41 m (depth) × 1.04 m (height) was used, as shown in Figure 5-1 (left). The interior of the booth was painted with Behr Premium Ultra Paint and Primer One®, a matte white paint with a relatively flat reflectance distribution across the visible spectrum. A chin rest was mounted in the center of the opening of the booth so the viewing angle was consistent across all participants. A mirror was mounted on the back wall of the booth for skin evaluations. The room containing the booth was kept dark during all experimental trials. A small reading light, set to the side of the booth and never visible to participants, was used by the researcher for logistics.

5.1.2 Lighting Equipment
Lighting spectra were created using the TeleLumen Light Replicator (TELELUMEN, Saratoga, CA, USA), which is a 16-channel, spectrally tunable LED luminaire (Figure 5-2). The luminaire was controlled via software from a laptop connected to the fixture by an Ethernet cable. The fixture itself was placed at the top of the viewing booth and suspended over a circular aperture in the top surface of the booth. Dimming was performed using a combination of Rosco® diffusion filters of varying transmittance placed over the circular opening below the luminaire.9

5.1.3 Measurement Equipment
Spectral measurements (SPDs and SRDs) were taken with a calibrated PR-655 SpectraScan spectroradiometer (Photo Research Inc., Cary, NC, USA) and a diffuse reflectance standard (SRT-MS-100, ρ = 99%) (Labsphere North America, North Sutton, NH, USA). Illuminance measurements were taken with a Minolta T-10 illuminance meter (KONICA MINOLTA, Ramsey, NJ, USA) (Figure 5-1, right).

5.1.4 Object Selection
The goal of object selection was to reduce the number of objects, insofar as possible, so it would be practical to ask targeted questions about each object and to eliminate the guesswork in determining which objects influenced a participant's judgements. Twelve familiar objects with strong memory associations were chosen which span the hue circle (as much as practically possible) and fit nominally into the categories of “Red,” “Orange,” “Yellow,” “Green,” “Blue,” and “Purple”. Objects were split into two categories—1. Consumer Goods (Figure 5-1 left, back row) and 2. Real fruit (Figure 5-1 left, front row)—to represent both manufactured and natural objects. See Figure 5-3 for (a', b') chromaticity coordinates for all objects. Spectral measurements for the Consumer Goods were taken on a representative matte area of the object, which is also where participants were directed to focus their attention when making their judgements. Rf and Rg computed with the 99 CES of the IES TM-30-15 calculation procedure and with the experimental object set are very highly correlated (Pearson's r = 0.908 and 0.981, respectively; p < 0.001 for both).

9 Block 1 of experimentation used a mechanical iris for dimming. Partway through the experimental trials in Block 1 it was noticed that the iris caused visibly noticeable color differences between the left and right sides of the booth, likely due to unequal occlusion of the individual LED chips in the fixture. In subsequent blocks the dimming mechanism was changed to Rosco® diffusion filters, which helped tighten the color uniformity across the booth. An analysis of participant responses between Block 1 and Block 3 (a replication of Block 1) showed no statistical difference. See APPENDIX A: The block effect on OVERALL subjective rating scales for the full statistical analysis. See section 5.5.1 Blocking for a description of the blocking methods.

5.2 Lighting Conditions
5.2.1 Correlated Color Temperature (CCT)
Correlated color temperature was held constant in this experiment to eliminate (insofar as possible) its effect on the results. To choose a CCT, a MATLAB simulation was performed which computed 30 million random linear combinations of the 16 TeleLumen channels:

\( SPD(\lambda) = \sum_{k=1}^{16} r_k \, C_k(\lambda) \)   (Equation 5-1)

where \( r_k \) is a random number between 0 (0% output) and 1 (100% output), and \( C_k(\lambda) \) is the spectral power of channel k across the visible spectrum.

Of the 30 million SPDs generated this way, all spectra which fell within ±25 K of 4 common CCTs (3000, 3200, 3500, and 4000 K) and within ±0.02 Duv of the blackbody locus were retained. As CCT increases, more spectra are retained (Figure 5-4, left), indicating a general tendency for linear combinations of the TeleLumen channel spectra to favor higher CCTs. Ultimately, 3500 K was chosen as the best tradeoff between the ability of the TeleLumen to produce spectra and the prevalence of the CCT in lighting practice (i.e. 4000 K is generally less common in North American design practice than 3500 K).
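The structure of this simulation is straightforward to sketch. In the minimal Python version below, channels holds the 16 measured channel SPDs and cct_duv_fn stands in for an externally supplied colorimetric routine returning (CCT, Duv) for an SPD, for example an implementation of Ohno's method; that routine is assumed, not shown.

```python
import numpy as np

def screen_random_mixes(channels, cct_duv_fn, n_trials=30_000_000,
                        cct_targets=(3000, 3200, 3500, 4000),
                        cct_tol=25.0, duv_tol=0.02, rng=None):
    """Count random channel mixes (Equation 5-1) that land within
    cct_tol of each target CCT and within duv_tol of the blackbody locus.

    channels   : (16, n_wavelengths) array of channel spectral power
    cct_duv_fn : callable, SPD -> (CCT, Duv); assumed external
    """
    rng = rng or np.random.default_rng()
    counts = {t: 0 for t in cct_targets}
    for _ in range(n_trials):
        weights = rng.random(channels.shape[0])  # r_k in [0, 1]
        spd = weights @ channels                 # Equation 5-1
        cct, duv = cct_duv_fn(spd)
        if abs(duv) <= duv_tol:
            for t in cct_targets:
                if abs(cct - t) <= cct_tol:
                    counts[t] += 1
    return counts
```

At full scale this loop is expensive; in practice the weight generation would be vectorized and the colorimetric conversion batched.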

5.2.2 Target Illuminance
Without calibration, each of the 24 spectra had a different initial illuminance (measured on the booth floor). SPD 1 (65|120|1) had the lowest initial illuminance (approximately 690 lux) and was therefore the constraining spectrum for the experiment. An illuminance of 600 lux was chosen as the target, which is sufficiently above the threshold for photopic vision [IES 2011].

5.2.3 Spectral Optimization
The Rf-Rg space—defined as the triangular area in the two-dimensional Cartesian plane created by the IES TM-30-15 fidelity metric (Rf) and gamut metric (Rg) for approximately white light sources—was partitioned into 12 bins whose centers were the targets for spectral optimization. The nominal target Rf values were 65, 75, 85, and 95. The nominal target Rg values were 80, 90, 100, 110, and 120.

A custom Excel optimizer was used to design light spectra—by varying the weighting functions of 12 of the 16 TeleLumen channels10—to match the 12 target Rf-Rg combinations (Figure 5-5). Two SPDs were designated at each Rf-Rg combination to have conceptually orthogonal gamut shapes (Figure 5-6). This variable is referred to as CB and can take a value of 1 (i.e. the CVG generally oriented in the direction of hue angle bin 1) or 7 (i.e. the CVG generally oriented in the direction of hue angle bin 7).11 CB is a proxy for opposing red saturation and is intended to distinguish between SPDs with the same average fidelity and average gamut. The two opposing CVGs can be understood as a deliberate subset of the many CVGs that exist at any given Rf-Rg combination. See APPENDIX C: Color Vector Graphics for a full summary of CVGs. SPDs are numbered (for easy referencing) according to the ID numbers in Figure 5-7.

10 The two channels at either end of the visible spectrum had such a small effect on the composite SPD (and thus the visual experience) that they were removed from simulation to simplify the calibration process.
11 Orthogonality was not perfectly achieved at each Rf-Rg combination, so it is important to interpret this variable in terms of the intended conceptual differences when the light spectra were designed. CB7, for example, does not indicate specifically what is happening in hue angle bin 7, nor does it indicate what effect the light stimulus has in any other hue bin. Refer to the CVG graphics in Figure 5-6 for specific information.

5.3 Participants
Forty participants were recruited from various departments within the university, including Architectural Engineering, Architecture, Civil Engineering, Computer Science, Ecology, Education, Geography, and Industrial Engineering. None had any particular knowledge of architectural lighting. Participants were split 23 males and 17 females, with ages ranging from 20 to 41 years and a mean age of 26 years (25 for males and 27 for females). Participants came from 12 different countries and spoke several different languages. Seventy percent of the participants were graduate students or professionals. No participants had any abnormal vision deficiencies and none had any detectable form of color deficiency (Ishihara 24 Plate).

5.4 Variables
5.4.1 Independent Variables
The independent variables in this experiment are Rf, Rg, and CB. These three factors were systematically varied. All spectra were designed to be a metameric match to blackbody radiation at 3500 K and to provide an illuminance of 600 lux on the floor of the viewing booth. Actual measurements showed the following ranges:12 Rf ± 2.0, Rg ± 1.3, CCT ± 25 K, Duv ± 0.001, and E ± 1.3 fc.
5.4.2 Dependent Variables
The dependent variables in this experiment are OVERALL participant ratings along several scales: “preference” (dislike (0)–like (5)), “naturalness” (unnatural (0)–natural (5)), “vividness” (desaturated (0)–vivid (5)), and “skin preference” (dislike (0)–like (5)) (Figure 5-8). Scales were continuous, exactly 5.0 inches long, and each response was measured to a precision of 1/32 inch, with a value of 0 at the left anchor and 5 at the right anchor. Additionally, participants rated each object, under all 12 lighting conditions in their block, along the preference, naturalness, and vividness scales (see APPENDIX D: Object specific questionnaires). A total of 19,200 measurements (ratings) were made ((12 objects × 3 scales + 4 OVERALL scales) × 20 participants × 24 SPDs). Color discrimination was recorded for each SPD using error scores from the Farnsworth-Munsell 100 hue test.

5.5 Procedure
5.5.1 Blocking
Preliminary calibrations of the TeleLumen Light Replicator showed that spectra could not be switched—on, off, or between spectra—while holding the calibrated chromaticity within an acceptable tolerance. The solution was to present a single spectrum per day, because recalibration between scenes, which took up to 4 hours, was not feasible. With calibration time, measurement time, and 30 minutes to complete a session for a single participant, 10 experimental sessions could be completed in a single day.

To complete 480 experimental sessions (24 spectra × 20 participants per spectrum) while allowing 4 hours for calibration and completing 10 experimental trials per day, each of the 40 participants was asked to commit to 13 consecutive days of trials (1 day of pre-experiment preparation and 12 days of data collection). Consequently, data collection was split into 4 blocks: Block 1 was 12 randomly selected spectra (of the 24), 6 of which were CB1 and 6 CB7; Block 2 was the 12 spectra not evaluated in Block 1; Block 3 was a replication of Block 1; and Block 4 was a replication of Block 2. Spectra within each block were presented in a random order. Data collection took a total of 8 weeks to complete (2 weeks per block).

12 Ranges are shown for measurements taken across all days, at the beginning of experimental trials, at the center of the viewing booth floor (directly under the fixture). Thorough measurements were taken at the beginning, middle, and end of experimentation on all days at three locations within the booth: booth left, center, and booth right. For complete data see APPENDIX B: Spectra stability and uniformity.

5.5.2 Pre-experiment preparation
Upon arrival on the first day of experimentation each participant completed the informed consent form, a demographics survey, and the Ishihara 24 Plate test for color deficiency [Ishihara 1972]. After the paperwork was completed, participants were instructed to put on a black lab coat and sit with their chin in the chin rest of the apparatus. The researcher read orienting materials aloud about the purpose and procedure of the study. Participants were then shown 3 of the extreme lighting conditions to help bookend their responses: Condition 12 (Rf = 95 | Rg = 100 | CB = 1), Condition 1 (65|120|1), and Condition 17 (65|80|7).

Participants were then asked to close their eyes while the lighting scenario was switched to the first scene and the illuminance was calibrated (with a Wybron 87250 Eclipse IT Iris 1K Dowser mechanical iris). Participants adapted to the lighting environment for 2 minutes. Condition 12 was always shown first and described as “true, accurate, and natural.”

Participants were then asked to answer a series of questionnaires to familiarize themselves with the forms. The first questionnaire asked participants to rate their OVERALL impression of the colored objects in the lighted scenes along 4 scales: “likeness,” “naturalness,” “vividness,” and “skin preference” (Figure 5-8). Participants were told to consider their hands and face (in the mirror) when providing the rating of their skin. The second questionnaire was a packet of 6 pages asking the participant to provide OBJECT SPECIFIC ratings (2 objects and 6 scales per page) along the “likeness,” “naturalness,” and “vividness” scales. This process was repeated for the two remaining lighting scenarios; their order was counterbalanced across the experiment. Condition 1 was described as “saturated, vibrant, and vivid.” Condition 17 was described as “muted, dull, and desaturated.”

When introducing participants to the rating scales, naturalness and vividness were defined as:

“Naturalness is defined as existing in nature. In the context of this environment, naturalness refers to how similarly the object would look under daylight. That is, if objects appear as you would expect them to under daylight, they would be considered natural.” “A vivid color is defined as intensely deep or rich. In the context of this environment, vivid refers to objects that look richer and more colorful than they would look under daylight.” Each session took approximately 30 minutes.

5.5.3 Calibration and measurement schedule
One spectrum was presented per day and it was fully calibrated (chromaticity and illuminance) over the course of 4 hours. Calibration followed a four-step procedure: (1) The fixture was turned on in the morning and left for approximately 1 hour to stabilize; (2) After 1 hour, Rosco filters were added to achieve the target illuminance of 600 lux. The chromaticity was immediately recalibrated since the filters always caused a large shift in chromaticity (due to their differential transmittance across wavelengths); (3) The fixture was left for another hour to stabilize; and (4) After 1 hour the chromaticity was checked for stability and recalibrated as necessary. Shortly after calibration was completed, the first participant was invited inside and the researcher began the first experimental trial (which began at 12 p.m. on all days). To verify stability, measurements were also performed in the middle—after approximately 5 experimental sessions—and at the end of the experimental trials, every day. At each point in time (beginning, middle, and end) measurements were performed at 3 locations within the booth: booth left, center, and booth right. See APPENDIX B: Spectra stability and uniformity for all measurement data.

5.5.4 Experimental Trials
Upon entering the lab on days following the pre-experiment preparation session, participants were instructed to wear a black lab coat and proceed directly to the viewing booth. A two-minute timer was set to ensure substantial chromatic adaptation. After 2 minutes, the participant was given the first questionnaire (OVERALL ratings). Upon completion they were given the second questionnaire (OBJECT SPECIFIC ratings). After all questionnaires were checked to ensure no missing data, the participant was instructed to roll slightly to the side of the booth. The chin rest and objects were removed from the booth, and one tray of the Farnsworth-Munsell 100 hue test was administered. Participants were instructed to spend at least two minutes with the tray, though they were permitted to take as much additional time as needed. This process was repeated for all 4 trays in the hue test. The trays were administered in a random order. Completion of the 4 trays of the FM100 test concluded the participant's daily session. Each session took approximately 20–30 minutes.

Real fruit objects were replaced approximately every 3–4 days. Objects which visibly decayed more quickly than others—the blueberries and the onion, for example—were spot-replaced as necessary.

5.5.5 Closing Questionnaire
Each participant was asked to complete an exit survey on the last day of their experimental trials. All participants were asked to respond to the following prompts:

1. Please reflect upon how each object contributed to your overall judgements during this experiment (7-point Likert scale from “Had no influence” to “Very strong influence”).
2. Please order the 12 objects from most important (1) to least important (12).
3. Please explain your general strategy for the arrangement task.
4. Please order the trays from 1 (most difficult) to 4 (least difficult).13

Additionally, several other questions were asked, at the researcher's discretion, which sometimes varied across participants. These questions were intended to gain further insight into participants' answering strategies. Participants were not made aware that these questions varied.

5.5.6 Compensation
For participation in this experiment, each participant had the potential to earn $200. A compensation structure was implemented to encourage participants to complete all twelve of their experimental sessions and thereby avoid missing data. At the end of the first week, participants were automatically paid $50. At the end of the second week, participants were automatically paid an additional $50. If, at the end of the experiment, a participant had completed all 12 sessions, they were awarded another $100. All participants completed all 12 of their experimental sessions and there was no missing data.

13 A colored image of the 4 Farnsworth-Munsell 100 Hue Test trays accompanied this question.


5.6 Figures

Figure 5-1 Experimental Apparatus. (Left) Front view of the experimental apparatus showing the chin rest, fixed mirror, and object layout. (Right) Placement of measurement devices.


Figure 5-2 Absolute spectral power distributions for the 16 channels in the TeleLumen Light Replicator. The two channels at either end of the visible spectrum (shown dashed) had very little visual effect in the booth and thus were eliminated from spectral optimization to simplify calibration.


Figure 5-3 Chromaticity coordinates (CIE CAM02 UCS, a'-b') of the experimental object set. (Top left) Chromaticity coordinates for objects in the Consumer Goods category. (Top right) Chromaticity coordinates for objects in the Natural Foods category. (Bottom) The number of experimental objects in each quadrant of the a'-b' plane of the CIE CAM02 UCS. Quadrant I (hue angle bins 1, 2, 3, and 4) contains the most objects.



Figure 5-4 Determining the experimental CCT. A MATLAB simulation was performed to help determine the experimental CCT. The simulation calculated 30 million random linear combinations of the 16 TeleLumen channels and retained the SPDs which fell within ±0.02 Duv of the blackbody locus and ±25 K of 4 common nominal CCTs: 3000, 3200, 3500, and 4000 K. (Left) The number of SPDs retained at each nominal CCT. (Right) The Rf and Rg values for the retained SPDs, coded by nominal CCT. A CCT of 3500 K was chosen for the experimental SPDs because it is more common in design practice than 4000 K and the simulation suggested that a relatively large number of spectra could be produced at 3500 K.



Figure 5-5 Spectral power distributions for the experimental spectra. All SPDs are scaled to have a Y tristimulus value equal to 100 to facilitate meaningful visual comparison. SPDs shown in red are CB1 SPDs and those in green are CB7 SPDs. While the experimental spectra are not particularly discontinuous, the number of peaks and valleys is notable.



Figure 5-6 Color Vector Graphics for experimental spectra. Graphics are arranged according to their nominal Rf and Rg value. CVGs with a solid line are for the CVGs generally oriented in the direction of hue angle bin 1 (CB1). CVGs with a dashed line are for CVGs generally oriented in the direction of hue angle bin 7 (CB7). Note that the SPDs with the same nominal Rf and Rg share a single graphic (i.e. CB1 and CB7 SPDs are shown together).


Experimental SPD identification numbers:

            Rf = 65    Rf = 75    Rf = 85    Rf = 95
Rg = 120     1 | 13
Rg = 110     2 | 14     6 | 18     9 | 21
Rg = 100     3 | 15     7 | 19    10 | 22    12 | 24
Rg = 90      4 | 16     8 | 20    11 | 23
Rg = 80      5 | 17
(each cell: CB1 ID | CB7 ID)

Figure 5-7 Identification numbers (ID) for the 24 experimental SPDs. SPDs are numbered 1 through 12 for CB1 SPDs and 13 through 24 for CB7 SPDs. Summary information for these 24 SPDs—based on these ID numbers—can be seen in Table 5-1.


Figure 5-8 OVERALL questionnaire. The above questionnaire was used to measure participants' OVERALL impression of the colored objects. Each scale was exactly 5 inches long and responses were measured to a precision of 1/32 inch. Note that the scales in this image are not shown to scale. Participants were instructed to mark the scale with a vertical line to increase the precision of hand measurements.


5.7 Tables

Table 5-1 Characteristics of the 24 experimental light sources.

                                         IES TM-30-15      CIE          Ellipse Parameters
ID  CB  Block  LER*  CCT (K)    Duv      Rf   Rg  Rf,skin  Ra    R9     a      b      ψ
 1   1  1 (3)  196    3498    0.0000     66  120    75     58   -82    1.20   0.99   -15
 2   1  1 (3)  243    3503    0.0002     64  110    68     53   -90    1.20   0.92     7
 3   1  2 (4)  244    3502    0.0000     66  100    68     60   -64    1.18   0.86    15
 4   1  1 (3)  251    3503    0.0000     65   90    66     69   -16    1.15   0.80    22
 5   1  2 (4)  282    3496   -0.0002     65   81    80     75    64    1.07   0.77    33
 6   1  1 (3)  257    3502    0.0000     75  110    75     64   -53    1.16   0.95     5
 7   1  2 (4)  256    3499    0.0000     75   99    79     73     3    1.12   0.89    18
 8   1  1 (3)  257    3500    0.0000     76   91    78     84    38    1.08   0.86    27
 9   1  2 (4)  247    3502   -0.0001     85  109    84     79   -17    1.10   0.99    -4
10   1  1 (3)  247    3501    0.0001     85  100    84     83     3    1.07   0.93    17
11   1  2 (4)  253    3499    0.0000     86   90    91     88    91    1.01   0.89    41
12   1  2 (4)  298    3501    0.0000     96  100    96     97    98    1.01   0.99    27
13   7  2 (4)  268    3500   -0.0001     65  119    80     67   -16    1.22   0.97   -42
14   7  1 (3)  308    3503   -0.0001     66  109    67     83    50    1.19   0.91   -58
15   7  2 (4)  311    3501   -0.0001     65   99    71     70    17    1.18   0.84   -64
16   7  2 (4)  302    3499   -0.0002     66   90    66     67   -46    1.11   0.81   -72
17   7  1 (3)  341    3499   -0.0003     66   80    63     58  -139    1.05   0.76   -86
18   7  2 (4)  346    3503    0.0000     75  110    79     86    53    1.16   0.94   -58
19   7  1 (3)  300    3502    0.0000     75   99    75     77    23    1.12   0.88   -66
20   7  2 (4)  343    3502    0.0000     75   90    74     73   -78    1.06   0.84   -81
21   7  2 (4)  329    3496   -0.0001     86  109    92     88    94    1.10   0.99   -48
22   7  1 (3)  324    3503    0.0000     85   99    82     88    12    1.07   0.93   -70
23   7  1 (3)  324    3498    0.0000     83   91    82     81   -24    1.03   0.89   -88
24   7  1 (3)  306    3501    0.0000     95  101    96     97    79    1.02   0.99   -75

* Luminous Efficacy of Radiation. Calculated as the ratio of luminous flux to radiant flux. LER was calculated using the IES TM-30-15 worksheet [IES 2015a].


6. RESULTS
6.1 OVERALL Subjective ratings
Average OVERALL ratings—for the preference, naturalness, vividness, and skin preference scales—are mapped in Figure 6-1 as a function of Rf, Rg, and CB. Each number is an average of 20 participant ratings.

6.2 OBJECT SPECIFIC Subjective responses
Average OBJECT SPECIFIC ratings—for the preference, naturalness, and vividness scales—are summarized in Table E-1, Table E-2, and Table E-3 (APPENDIX E: Mean OBJECT SPECIFIC ratings). Again, each number is an average of 20 participant responses.

6.3 Color Discrimination
6.3.1 Calculating error score
The error scores for each tray can be computed using the Farnsworth-Munsell 100 Hue Test Scoring Software which accompanies the physical test. The score for any individual cap “is the sum of the difference between the number of that cap and the numbers of the caps adjacent to it” minus 2 [Farnsworth 1957]. For example, the arrangement 29-30-31-32 has an error score of zero, and 29-31-30-32 has an error score of 4. This is the standard scoring method and has been used by past researchers [Rea and others 2008; Royer and others 2013]. Because the FM-100 hue test is primarily a color discrimination test whose main purpose is to identify human color deficiency, the standard scoring software assumes the correct order of caps to be their order under a standard reference light—the CIE Standard Illuminant C, by design—which is fixed when testing a single participant's color vision. Directly applying scoring software that assumes a fixed reference illuminant to an experiment which purposefully varies the light source could miscount errors and distort the results. To decouple the error calculation from the standard illuminant, custom error-calculation software was created which compares participants' responses to the correct order of caps under the experimental SPD (APPENDIX F: FM-100 hue test, correct chip arrangement)—which very likely differs from their order under CIE Illuminant C—and calculates an adjusted total error score (TESadj); a computational sketch of both scoring procedures follows section 6.4.3 below. For example, if a participant arranged caps 29-31-30-32, and this was the correct order when compared to the order of caps under the SPD being tested, no error would be attributed to this participant. A map of the tray-specific adjusted error scores is shown in Figure 6-2. A map of the adjusted total error score (TESadj) is shown in Figure 6-3. The relationship between the standard FM100 error scores and the adjusted error scores is shown in Figure 6-4. Note that the error scores for tray A and tray D are perfectly correlated because the experimental SPDs never caused transpositions in these two trays.

6.4 Closing Questionnaire
6.4.1 Object influence on OVERALL ratings—absolute ranking
First, participants were asked to “Please reflect upon how each object contributed to your overall judgements during this experiment” (7-point Likert scale from “Had no influence” (1) to “Very strong influence” (7)). Figure 6-5 (top) shows the average response for each object. The top 3 most influential objects were P2 (red onion), O2 (orange), and R2 (red apple), all of which are nominally red/orange and are located in IES TM-30-15 hue angle bins 1, 3, and 2, respectively. This is consistent with the findings of Rea and Freyssinier [2010], Wei and others [2014, 2016], and Royer and others [2016]. In all but one of the pairs within a single color category (i.e. R1 vs. R2), the natural food object was rated higher than its consumer-good counterpart. This supports the claim by Wei and others [2016] that “natural objects were generally given higher importance.” The only exception is the “blue” category, where the blueberries were given a lower importance than the Pepsi can. This may be related to the blueberries' very low chroma, and thus the very small change in chroma over the course of the experiment. In an open-ended question about rating strategies, one participant commented that the blueberries “didn't seem to get more or less vivid”, while another commented that the blueberries “were the hardest to judge, by far.”

6.4.2 Object influence on OVERALL ratings—relative ranking
Second, participants were asked to “Please order the 12 objects from most important (1) to least important (12).” One person answered the question incorrectly and their response was excluded. Figure 6-5 (bottom) shows the number of times each object was listed in the top 3. The results are consistent with the absolute rankings, where the top three are the same objects in the same order (onion, orange, and red apple, respectively).

6.4.3 Color Discrimination
In the closing questionnaire, participants were asked to rank the 4 FM100 hue test trays from 1 (most difficult to perform) to 4 (least difficult). The results are shown in Figure 6-6. Tray B was most frequently ranked the hardest (Rank 1, 18 participants). Tray C follows Tray B as most difficult (Rank 1, 12 participants) and is most frequently ranked the second hardest tray (Rank 2, 13 participants). Tray A was most frequently ranked third (Rank 3, 15 participants) and Tray D was most frequently ranked fourth (Rank 4, 12 participants). The results suggest that participants found Tray B the hardest to complete, Tray C the second hardest, and Tray A and D the easiest (both approximately equal). This order of difficulty is predicted by the average number of transpositions each tray experienced across the experimental SPDs. That is, the caps in Tray B experienced the most transpositions (on average), Tray C experienced the second most, and the caps of Tray A and D were never transposed. These results suggest that participants may have experienced increased difficulty in correctly ordering chips in the trays that experienced transpositions.
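To make the two scoring procedures of Section 6.3.1 concrete, the following is a minimal sketch of both calculations in Python. It assumes the arrangement for a tray is supplied as a list of cap numbers that includes the fixed anchor caps at each end, and that the correct order of caps under the test SPD is available (APPENDIX F); the function name is illustrative and is not part of the commercial scoring software.

    def fm100_error_score(arrangement, correct_order=None):
        # Standard scoring [Farnsworth 1957]: each cap's score is the sum
        # of the absolute differences between its number and the numbers
        # of its two neighbors, minus 2. Passing the correct order of caps
        # under the test SPD re-ranks the caps before scoring, yielding
        # the adjusted score used to build TESadj.
        if correct_order is not None:
            rank = {cap: i for i, cap in enumerate(correct_order)}
            seq = [rank[cap] for cap in arrangement]
        else:
            seq = list(arrangement)
        total = 0
        for i in range(1, len(seq) - 1):  # end caps are fixed anchors
            total += abs(seq[i] - seq[i - 1]) + abs(seq[i] - seq[i + 1]) - 2
        return total

    # Worked example from Section 6.3.1: within ...28-29-31-30-32-33...,
    # transposing caps 30 and 31 yields an error score of 4.
    print(fm100_error_score([28, 29, 31, 30, 32, 33]))  # -> 4

If the same arrangement is passed together with a correct_order that also reads 28-29-31-30-32-33 under the test SPD, the function returns 0, mirroring the adjusted-score example above.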


6.5 Figures


Figure 6-1 Summary Rf-Rg maps of mean OVERALL ratings. (Top left) Mean OVERALL preference rating as a function of Rf, Rg, and CB (CB1 on the left, CB7 on the right). (Top right) Mean OVERALL naturalness rating as a function of Rf, Rg, and CB. (Bottom left) Mean OVERALL vividness rating as a function of Rf, Rg, and CB. (Bottom right) Mean OVERALL skin preference rating as a function of Rf, Rg, and CB. Each cell is an average of 20 participant ratings. Color is shown to highlight trends and is not linked to statistics. All four graphs are scaled to the same values (from 0 to 5).



Figure 6-2 Summary Rf-Rg maps of mean adjusted error score. (Top left) Average error score for tray A as a function of Rf, Rg, and CB (CB1 on the left, CB7 on the right). (Top right) Average error score for tray B as a function of Rf, Rg, and CB. (Bottom left) Average error score for tray C as a function of Rf, Rg, and CB. (Bottom right) Average error score for tray D as a function of Rf, Rg, and CB. Each cell is an average of 20 participant error scores. Color is shown to highlight trends and is not linked to statistics. All four maps share the same color scale.



Figure 6-3 Average adjusted Total Error Score (TESadj) as a function of Rf, Rg, and CB. The total error score is calculated as the sum of the individual tray-specific error scores. Each cell is an average of 20 participant ratings. Color is shown to highlight trends and is not linked to statistics.


[Figure 6-4 appears here as five scatter plots of adjusted versus standard FM100 error scores: Tray A (R² = 1), Tray B (R² = 0.8736), Tray C (R² = 0.9648), Tray D (R² = 1), and Total (R² = 0.841).]

Figure 6-4 FM100 error score (adjusted) as a function of FM100 error score (standard). The standard error score is calculated directly with the error calculation software provided with the physical FM-100 hue test. The adjusted error score was calculated with a custom error calculation tool which accounts for the correct order of chips under the test source. (Top left) Error scores for Tray A. (Top right) Error scores for Tray B. (Middle left) Error scores for Tray C. (Middle right) Error scores for Tray D. (Bottom) The Total Error Score, calculated as the sum of the tray-specific error scores. Note that the chips in tray A and tray D were never transposed under any of the 24 experimental light sources, and thus the standard and adjusted error scores are perfectly correlated.


[Figure 6-5 appears here as two bar charts: (top) average contribution rating per object; (bottom) number of times each object was ranked in the top 3. Objects: R1 Coca Cola can, R2 red apple, O1 Orange Crush can, O2 orange, Y1 mustard bottle, Y2 lemon, G1 Sprite bottle, G2 green apple, B1 Pepsi can, B2 blueberries, P1 Grape Crush can, P2 red onion.]

Figure 6-5 Closing questionnaire—object influences. (Top) Average response, per object, to the prompt “Please reflect upon how each object contributed to your overall judgements during this experiment (7-point Likert scale from ‘Had no influence’ (1) to ‘Very strong influence’ (7)).” The top three influential objects were the onion, orange, and red apple, respectively. (Bottom) The number of times each object was ranked in the top 3 when prompted to “Please order the 12 objects from most important (1) to least important (12).” The onion, orange, and red apple were rated as the most influential objects, respectively, for both questions.



Figure 6-6 Closing questionnaire—FM100 tray difficulty ranking. At the end of their 12 experimental sessions, participants were given a closing questionnaire. They were prompted to “Please order the trays from 1 (most difficult) to 4 (least difficult).” The results show that Tray B was rated the hardest to complete, Tray C the second hardest, and Tray A and D roughly third and fourth. This ranking is predicted by the average number of transpositions that each tray experienced under the experimental light sources. That is, Tray B experienced the most transposition of caps, Tray C the second most, and Tray A and D experienced no transpositions.


6.6 Tables

Table 6-1 Summary of OVERALL ratings. Summary statistics are shown for each of the four OVERALL scales (preference, naturalness, vividness, and skin preference). Each statistic is from the 20 responses under each SPD.

ID | Q1: Dislike - Like (Min Mean Max SD) | Q2: Unnatural - Natural (Min Mean Max SD) | Q3: Desaturated - Vivid (Min Mean Max SD) | Q4: Dislike - Like, Skin (Min Mean Max SD)
1 | 1.0 4.0 4.9 1.1 | 0.6 2.4 4.8 1.4 | 3.3 4.3 4.9 0.5 | 1.2 3.6 4.9 1.2
2 | 0.4 3.5 4.9 1.4 | 0.9 2.3 4.4 1.1 | 2.8 4.2 5.0 0.8 | 0.8 3.5 4.9 1.3
3 | 0.8 3.2 4.9 1.4 | 0.3 2.6 4.9 1.5 | 2.0 4.0 4.9 0.8 | 0.8 3.4 5.0 1.2
4 | 0.8 3.2 4.9 1.3 | 0.8 2.6 4.6 1.1 | 0.2 3.2 4.9 1.2 | 1.2 3.2 4.8 1.2
5 | 0.3 2.4 4.9 1.4 | 0.4 2.7 4.9 1.2 | 0.4 2.7 4.9 1.3 | 0.4 2.5 5.0 1.4
6 | 1.8 3.9 5.0 0.9 | 1.0 3.0 5.0 1.2 | 2.6 4.0 5.0 0.7 | 0.3 3.5 5.0 1.4
7 | 3.0 3.9 5.0 0.6 | 0.6 3.3 5.0 1.2 | 1.8 3.6 4.9 1.0 | 2.1 3.6 5.0 0.9
8 | 0.9 3.7 4.9 1.1 | 0.4 3.1 4.9 1.2 | 1.6 3.3 4.9 1.1 | 1.0 3.3 4.9 1.4
9 | 2.4 4.1 5.0 0.8 | 0.9 3.6 5.0 1.3 | 3.3 4.2 5.0 0.6 | 1.9 3.9 5.0 0.9
10 | 1.5 4.1 4.9 0.9 | 1.5 3.5 4.9 1.1 | 1.8 3.7 4.9 0.8 | 0.8 3.7 5.0 1.2
11 | 1.8 3.7 5.0 1.0 | 1.9 3.8 5.0 1.0 | 1.2 3.0 5.0 1.2 | 1.8 3.6 4.8 0.9
12 | 2.8 4.2 5.0 0.6 | 2.3 4.0 5.0 0.8 | 1.7 3.4 4.9 0.9 | 2.4 3.9 4.9 0.7
13 | 3.2 4.3 5.0 0.5 | 0.6 3.7 4.9 1.2 | 2.4 3.8 5.0 0.9 | 0.5 3.8 5.0 1.1
14 | 2.3 3.9 5.0 0.7 | 1.8 3.8 5.0 0.9 | 2.3 3.5 4.6 0.8 | 1.4 3.8 5.0 0.9
15 | 2.2 3.9 5.0 0.9 | 1.0 3.8 5.0 1.2 | 1.3 3.2 4.8 1.1 | 0.4 3.1 4.9 1.3
16 | 2.1 3.9 5.0 0.9 | 1.9 3.9 5.0 0.9 | 0.3 3.1 5.0 1.3 | 0.8 2.9 4.8 1.1
17 | 0.3 2.5 4.3 1.3 | 0.4 2.7 4.8 1.2 | 0.4 1.7 4.2 1.1 | 0.4 2.0 4.0 1.1
18 | 3.3 4.3 5.0 0.6 | 1.8 3.9 5.0 0.8 | 2.3 3.8 4.8 0.7 | 2.3 3.8 5.0 0.8
19 | 1.7 3.6 4.8 0.9 | 1.3 3.6 4.9 1.2 | 0.6 2.8 4.7 1.0 | 0.9 3.1 4.8 1.2
20 | 1.2 3.5 5.0 1.2 | 1.3 3.3 4.8 1.3 | 0.3 2.9 4.8 1.2 | 1.1 3.3 5.0 1.0
21 | 3.1 4.1 5.0 0.5 | 0.8 3.8 5.0 0.9 | 2.4 3.7 4.9 0.6 | 1.6 4.0 5.0 0.8
22 | 1.6 3.7 4.9 1.0 | 1.2 3.7 5.0 1.1 | 1.0 2.9 4.8 1.0 | 1.3 3.3 4.9 1.1
23 | 2.2 3.2 4.9 0.9 | 2.1 3.8 4.9 0.8 | 1.2 2.6 4.3 0.8 | 1.1 2.8 4.8 1.2
24 | 1.4 3.7 5.0 1.2 | 1.7 3.6 5.0 1.1 | 0.7 3.1 4.7 1.2 | 1.7 3.6 5.0 1.1


Table 6-2 Summary of FM100 hue test adjusted error scores. Each statistic is from the 20 responses under each SPD.

ID | Tray A (Min Mean Max SD) | Tray B (Min Mean Max SD) | Tray C (Min Mean Max SD) | Tray D (Min Mean Max SD) | TES (Min Mean Max SD)
1 | 0 2.8 12 3.7 | 24 35.4 48 6.4 | 0 9.8 24 7.6 | 0 8.8 20 6.9 | 32 56.8 96 14.3
2 | 0 5.0 36 8.3 | 8 18.8 40 7.5 | 0 10.0 24 7.0 | 0 4.4 16 4.7 | 12 38.2 84 19.1
3 | 0 6.6 28 7.5 | 4 14.2 36 8.5 | 0 12.0 44 10.2 | 0 3.4 20 5.2 | 12 36.2 80 22.7
4 | 0 7.2 24 6.8 | 0 11.6 24 6.1 | 8 22.0 44 9.8 | 0 2.2 8 3.5 | 20 43.0 76 17.3
5 | 0 7.6 36 7.9 | 0 4.2 16 4.6 | 8 22.2 60 14.1 | 0 5.2 20 6.0 | 12 39.2 112 24.3
6 | 0 3.0 16 4.7 | 4 17.2 36 8.4 | 0 11.0 28 7.7 | 0 2.4 12 3.5 | 4 33.6 64 16.3
7 | 0 4.4 12 3.4 | 4 11.6 24 6.7 | 0 9.8 24 8.0 | 0 4.2 12 4.4 | 4 30.0 68 17.9
8 | 0 6.4 24 6.8 | 4 9.8 16 4.6 | 4 19.8 36 8.5 | 0 2.2 8 2.4 | 8 38.2 68 15.4
9 | 0 4.2 16 3.8 | 12 25.8 36 7.9 | 0 10.8 48 12.3 | 0 4.6 24 5.5 | 20 45.4 100 22.6
10 | 0 3.8 20 5.6 | 8 17.8 36 7.2 | 0 13.8 28 9.1 | 0 3.6 8 3.4 | 8 39.0 76 17.7
11 | 0 6.8 28 8.1 | 0 4.4 16 5.0 | 0 12.8 32 8.0 | 0 5.8 24 7.7 | 0 29.8 96 23.6
12 | 0 3.0 12 3.9 | 0 9.6 24 7.2 | 0 7.0 28 8.5 | 0 1.6 16 3.8 | 4 21.2 80 19.6
13 | 0 2.8 20 5.0 | 32 47.2 64 8.6 | 0 7.2 28 7.1 | 0 4.6 20 6.9 | 40 61.8 112 18.2
14 | 0 2.4 20 4.8 | 20 31.4 52 7.4 | 4 10.4 28 5.7 | 0 6.0 16 4.9 | 32 50.2 76 12.8
15 | 0 3.8 20 6.0 | 16 31.0 52 8.0 | 0 6.2 24 5.3 | 0 6.2 32 7.2 | 32 47.2 88 17.0
16 | 0 3.4 36 8.5 | 8 19.0 52 10.0 | 0 6.8 24 6.6 | 0 5.6 24 6.3 | 12 34.8 132 27.0
17 | 0 5.6 56 12.5 | 0 6.8 28 6.8 | 0 8.2 24 7.5 | 0 5.8 16 4.8 | 8 26.4 84 21.0
18 | 0 2.4 12 3.3 | 12 23.8 36 6.6 | 0 7.0 32 8.0 | 0 3.2 28 6.4 | 20 36.4 84 15.8
19 | 0 3.0 12 3.9 | 8 19.6 36 8.2 | 0 5.2 16 6.2 | 0 4.6 16 4.9 | 8 32.4 68 15.1
20 | 0 1.8 12 3.0 | 0 9.2 20 5.5 | 0 7.0 24 6.7 | 0 3.2 24 6.0 | 0 21.2 68 14.4
21 | 0 2.6 12 4.2 | 16 25.0 32 5.3 | 0 5.8 24 6.2 | 0 2.8 20 4.7 | 20 36.2 68 14.2
22 | 0 3.6 12 3.6 | 0 10.8 20 5.5 | 0 10.2 20 5.7 | 0 4.2 16 4.4 | 4 28.8 52 13.1
23 | 0 4.8 20 5.1 | 4 14.6 36 7.7 | 4 11.2 24 5.9 | 0 6.2 24 5.3 | 12 36.8 80 16.7
24 | 0 2.8 12 3.7 | 0 7.6 20 5.0 | 0 7.6 28 6.9 | 0 2.2 12 3.0 | 0 20.2 44 12.5

Avg. (of means) | 4.2 | 17.8 | 10.6 | 4.3 | 36.8
Std. Dev. (of means) | 1.7 | 10.7 | 4.8 | 1.7 | 10.4


7. ANALYSIS

7.1 Modeling the Color Vector Graphic
The fundamental motivation for the design of the CB variable (the nominal orientation of the CVG) was to quantify opposing CVG shapes which were hypothesized to be perceptually different despite having the same average fidelity (Rf) and average gamut area (Rg). While the variable itself was useful for spectral optimization, the optimization did not lead to SPDs with CVGs that were always oriented along the bin 1 to 9 or the bin 7 to 15 axes (i.e., opposing axes in the CVG). Thus, the CB variable cannot distinguish SPDs with very different vector orientations and consequently does not provide the granularity necessary for model prediction. Additionally, because CB is a designed variable, it cannot be computed automatically for the prediction of a new SPD and must be assigned manually. For the speed and convenience of any future application of the specified models, it is desirable to replace the CB variable with a more precise, objective, automated measure.

The visual observation that most of the experimental CVGs can be approximated by an ellipse suggested that a best-fit ellipse approach may be a suitable way to quantify the shape of the CVG. A direct least-squares fitting of an ellipse [Fitzgibbon 1990] was used to approximate the CVGs of the 24 experimental SPDs (see Table 5-1 and APPENDIX G: CVG Best-fit ellipses).[14] The resulting best-fit ellipses can be defined by the length of their semi-major axis (a), the length of their semi-minor axis (b), and their rotation angle (ψ).[15] For the experimental SPDs considered, the rotation angle (ψ) has a range of 55.8° for CB1 SPDs and a range of 46.2° for CB7 SPDs, indicating that ellipse fitting is a more numerically granular method of describing the orientation of the CVG. Additionally, ellipse eccentricity (e)—a measure of the ‘out of roundness’ of the ellipse which ranges from 0.0 to 1.0—was calculated for each of the best-fit ellipses. An eccentricity of 0.00 is a circle and can only occur when Rf is equal to 100. See Figure 7-1 for best-fit ellipse parameters plotted as a function of Rf and Rg.
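As a minimal sketch of this procedure, assuming the CVG is available as a set of (x, y) vertex coordinates in the a′–b′ plane (e.g., the 16 hue-bin arrow tips) and substituting numpy for the referenced Matlab code, the direct least-squares fit and the extraction of a, b, ψ, and e might look as follows:

    import numpy as np

    def fit_ellipse(x, y):
        # Direct least-squares conic fit: minimize ||D v|| subject to the
        # ellipse constraint v' C v = 1, where the conic is
        # A x^2 + B xy + C y^2 + D x + E y + F = 0.
        D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
        S = D.T @ D
        C = np.zeros((6, 6))
        C[0, 2] = C[2, 0] = 2.0
        C[1, 1] = -1.0
        # Generalized eigenproblem S v = lambda C v; the ellipse solution
        # corresponds to the single positive, finite eigenvalue.
        w, V = np.linalg.eig(np.linalg.solve(S, C))
        w = np.real(w)
        w[~np.isfinite(w)] = -np.inf
        return np.real(V[:, np.argmax(w)])

    def ellipse_params(conic):
        A, B, C, D, E, F = conic
        if A + C < 0:  # normalize the sign of the conic coefficients
            A, B, C, D, E, F = -A, -B, -C, -D, -E, -F
        Q = np.array([[A, B / 2.0], [B / 2.0, C]])
        x0, y0 = np.linalg.solve(2.0 * Q, [-D, -E])          # ellipse center
        Fc = A*x0**2 + B*x0*y0 + C*y0**2 + D*x0 + E*y0 + F   # constant at center
        lam, vec = np.linalg.eigh(Q)   # ascending eigenvalues; Q is PD here
        a, b = np.sqrt(-Fc / lam[0]), np.sqrt(-Fc / lam[1])  # semi-axes, a >= b
        psi = np.degrees(np.arctan2(vec[1, 0], vec[0, 0]))   # major-axis angle
        psi = (psi + 90.0) % 180.0 - 90.0  # wrap to [-90, 90), per footnote 15
        e = np.sqrt(1.0 - (b / a) ** 2)    # eccentricity; 0.0 for a circle
        return a, b, psi, e

Applied to each of the 24 experimental CVGs, ellipse_params(fit_ellipse(x, y)) returns the a, b, ψ, and e values of the kind summarized in Figure 7-1.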

7.2 Model building
A principal goal of this work was to understand how human perceptual responses (rated along several subjective scales) relate to spectra with strategically varied average fidelity, average gamut, and gamut shape. To test the initial hypotheses, best-fit regression models were developed to define a relationship between the responses and a suite of predictors (including the IES TM-30-15 metrics Rf, Rg, and Rcs,hi, and the best-fit ellipse parameters a, b, ψ, and e). Best-fit models were developed for the mean responses for each SPD using a combination of ANOVA, regression, and best subset analyses. Because a large number of predictors were examined, great caution was exercised to avoid over-fitting the regression models.

Mallows Cp [Mallows 1973] was used to determine initial model candidates (lower is better, indicating that the model is relatively precise), and a combination of r² and r²pred [Neter and others 1983] was used to determine the best models. The predicted r² is a measure of how well a model predicts responses for new observations—which is desirable for any future use of these models—and provides a good indication of whether a model is fitting noise rather than the underlying trend in the data. Because a large suite of predictors was considered, r²pred was used to avoid artificially strong models which have high r² values (and appear strong) simply because they have a large number of predictors.

[14] Matlab code can be downloaded at: http://research.microsoft.com/en-us/um/people/awf/ellipse/ [Accessed: Aug 2016]
[15] The rotation angle (ψ) measures the angle between the major axis of the ellipse and the positive x-axis and ranges from –π/2 to π/2. Positive values are counterclockwise.


Therefore, the simplest model (with the fewest terms), which explains most of the variance (r²) and is particularly strong at predicting responses for new observations (high r²pred), was considered the best model. A summary of model terms is included in Table 7-1. See APPENDIX H: Best-fit model statistics for a statistical summary of the best-fit models.
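As a minimal sketch of these two selection statistics, assuming an ordinary least-squares fit whose design matrix X already contains an intercept column (numpy is the only dependency), predicted r² (via the PRESS statistic) and Mallows Cp might be computed as follows:

    import numpy as np

    def predicted_r2(X, y):
        # Predicted r^2 via PRESS: leave-one-out residuals follow from the
        # hat matrix H = X (X'X)^-1 X' without refitting the model n times.
        XtX_inv = np.linalg.inv(X.T @ X)
        resid = y - X @ (XtX_inv @ (X.T @ y))
        h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)  # leverages, diag(H)
        press = np.sum((resid / (1.0 - h)) ** 2)
        sst = np.sum((y - y.mean()) ** 2)
        return 1.0 - press / sst

    def mallows_cp(sse_p, p_terms, sigma2_full, n_obs):
        # Cp = SSE_p / sigma^2_full - n + 2p, where sigma^2_full is the
        # residual mean square of the full model and p counts the
        # candidate model's coefficients (including the intercept).
        return sse_p / sigma2_full - n_obs + 2.0 * p_terms

A candidate model whose Cp is close to its own p, and whose r²pred is close to its r², is unlikely to be fitting noise.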

7.3 Subjective ratings—OVERALL
7.3.1 Naturalness
Initial analyses of single predictors showed that Rf is a significant linear predictor of mean naturalness rating (r² = 0.30, p = 0.006) (Figure 7-2, top left). Rcs,h1 and Rcs,h2—the strongest Rcs,hi of the 16 hue angle bins—are also significant linear predictors (p = 0.046 and 0.000, respectively) but fit the data poorly (r² = 0.17 and 0.23, respectively). Rg is not a significant linear predictor of mean naturalness (r² = 0.09, p = 0.663). For all factors except Rf, a better fit can be achieved with each factor’s quadratic counterpart: (Rg)² (r² = 0.21, p = 0.031), (Rcs,h1)² (r² = 0.59, p = 0.000), and (Rcs,h2)² (r² = 0.63, p = 0.000). This suggests a quadratic relationship between mean naturalness and these predictors (Figure 7-2). Best-fit ellipse parameters a, b, and ψ are not statistically significant linear (or quadratic) predictors (p = 0.136, 0.120, and 0.079, respectively) (Figure 7-3). Best-fit ellipse eccentricity is a significant linear predictor of mean naturalness rating (p = 0.027) (Figure 7-3, top left).

While Rf is a significant predictor of mean naturalness and seems to corroborate the hypothesis that light sources with higher average fidelity will be perceived as more natural, the full relationship appears to be more complex. When Rg (linear or quadratic) is added to the model with Rf, neither provides additional benefit and both are statistically insignificant (p = 0.618 and 0.662, respectively). The ellipse rotation (ψ), however, provides additional benefit when added to a model with Rf (r² = 0.44, p-Rf = 0.003, p-ψ = 0.031). There is also a significant interaction effect between Rf and ψ (p = 0.011), which indicates that the trend between mean naturalness and Rf varies across the levels of ψ. This interaction can be clearly seen in Figure 7-2 (top left), where the trend between Rf and mean naturalness varies for the different levels of CB (here CB is acting as an ordinal substitute for ψ to simplify graphing).

While a model containing Rf and ψ is better than a model with either predictor alone, other factors can be shown to provide additional benefit when included in the model. A best subset approach was taken to determine the best model fit to mean naturalness. All major factors were included, as was the Rf*ψ interaction effect. The best model—with r² = 0.92, r²pred = 0.84, and the lowest Mallows Cp—includes Rf, Rcs,h1, (Rcs,h1)², ψ, and Rf*ψ.[16] All predictors are statistically significant with p < 0.000. The regression model is as follows:[17]

NATURALNESS = 1.464 + 0.02674 Rf + 0.188 Rcs,h1 − 15.41 (Rcs,h1)² − 0.05305 ψ + 0.000602 Rf*ψ (Equation 7-1)

Trends consistent with the recent work of Royer and others [2016] are the significance of Rf as a predictor of naturalness ratings, the trend for naturalness ratings to increase with increasing Rf, the inclusion of a proxy for red saturation (Rcs,h1 in this study), and the general form of the model (shown in bold above). Additionally, the influence of red rendering has been documented by past researchers [Ohno 2005, 2015; Wei and others 2014; Wei and others 2016a, 2016b].

[16] Rcs,h1 is not statistically significant (p = 0.612) in the model but was included to precede its own quadratic term in the regression equation to maintain a hierarchical model.
[17] While (Rcs,h2)² is stronger than (Rcs,h1)² directly (r² = 0.63 and 0.59, respectively), independent analysis showed that models using Rcs,h1 are stronger.

The SPD with the highest rated naturalness (SPD 12) also has the highest average fidelity (Rf = 95) and an Rcs,h1 of approximately 0%. Contrarily, the three lowest rated SPDs (1, 2, and 17) have the lowest fidelity (Rf = 65) and the extreme values of Rcs,h1 (+23%, +23%, and −31%, respectively) (Figure 7-2, top left and bottom right). These results suggest that, on average, naturalness ratings will increase with increasing average fidelity and decrease with departure from an Rcs,h1 of 0% (in either the positive or negative direction).

7.3.2 Vividness
Initial analyses of single predictors showed that Rf is not a significant linear predictor of mean vividness rating (r² = 0.002, p = 0.838). Rg and Rcs,h16 are both significant linear predictors (p = 0.000 for both) and both fit the data well (r² = 0.673 and 0.863, respectively). The quadratic counterparts for these two terms do not provide a better fit (p = 0.253 and 0.418, respectively) (Figure 7-4). Ellipse parameters a and b are significant linear predictors (p = 0.002 and 0.001, respectively) and ellipse rotation (ψ) is a statistically significant quadratic predictor (p = 0.000) (Figure 7-5).

The trend in Figure 7-4 (top right) supports the hypothesis that SPDs with increasing Rg and CVGs nominally oriented in the red direction (CB1) will be perceived as more vivid than CVGs nominally oriented in the green direction (CB7).[18] The importance of CVG orientation can be most explicitly demonstrated for the pairs of SPDs with an Rg of 100, where CB1 and CB7 are most strongly contrasted: (3, 15), (7, 19), and (10, 22). CB is statistically significant (p < 0.000) for all three of these pairs despite each of the SPDs in the pair having the same average fidelity and gamut (Figure 7-4, bottom). SPDs 12 and 24—whose CVGs are contrasted but are negligibly different because of their high fidelity value (Rf = 95)—are not statistically different (p = 0.115).[19]

For predicting vividness, the importance of the CVG orientation is evident (represented here by CB and modeled by ψ), and a robust model can be built when ψ is combined with Rg (r² = 0.85, r²pred = 0.79, p < 0.000 for both). A best subset analysis with main predictors (excluding any Rcs,hi) reveals a particularly strong model (r² = 0.89, r²pred = 0.82) which contains ellipse parameters ψ and b, and the quadratic counterpart of each (p < 0.046 for all). A simpler model, however, can be built directly with Rcs,h16 (p = 0.000), which has a comparable r² of 0.86 and a higher r²pred (0.83) (Figure 7-4, middle right). This model also has the lowest Mallows Cp when Rcs,h16 is included in the best subset analysis. Due to its increased simplicity and higher r²pred, it can be considered the better model. The regression equation is as follows:

VIVIDNESS = 3.332 + 4.594 Rcs,h16 (Equation 7-2)

SPD 1, which has the highest value of Rcs,h16, also has the highest mean vividness rating. SPD 17, which has the lowest Rcs,h16, also has the lowest mean vividness rating (Figure 7-4, top right and middle right). The strong connection between judgements of vividness and a proxy for red saturation—specifically Rcs,h16—is markedly consistent with the recent work of Royer and others [2016].

[18] CB1 is equivalent to −15.1 < ψ < 40.7 and CB7 is equivalent to −88.3 < ψ < −42.1.
[19] This trend holds true for ψ as a predictor (in place of CB) for these same pairs (p = 0.02, 0.017, 0.007, and 0.115, respectively).


7.3.3 Preference
Initial analyses of single predictors showed that Rf is not a significant linear predictor of mean preference rating (r² = 0.12, p = 0.100) (Figure 7-6, top left). Rg and Rcs,h15 are both significant linear predictors (r² = 0.54 and 0.59, respectively, and p = 0.000 for both), but their quadratic counterparts provide a better fit to the data (r² = 0.66 and 0.70, and p = 0.012 and 0.011, respectively) (Figure 7-6, top right and bottom right). The trend in Figure 7-6 (top right) supports the a priori hypothesis that preference ratings will increase with increased gamut, up to a point, and is consistent with past research which suggests that too much saturation is undesirable [Ohno and others 2015; Wei and others 2016; Royer and others 2016]. CB did not have a significant effect on preference ratings (p = 0.727), which does not provide any evidence to support the hypothesis that CB1 SPDs would be more preferred than CB7 SPDs (Figure 7-6, bottom left). The ellipse rotation (ψ), however, is a significant quadratic predictor of mean preference rating (r² = 0.38, p = 0.002), and the regression equation plateaus at −22.2°, which falls roughly on the boundary line between hue angle bins 15 and 16 (−22.5°) (Figure 7-7, top right). While this suggests that higher preference occurs with CVG best-fit ellipses oriented in the region of hue angle bin 16, the relationship is not strong enough to permit any direct conclusion, except that the full relationship is more complex.

The most robust regression model (r² = 0.86, r²pred = 0.70, and lowest Mallows Cp) can be built with a combination of Rf, a second-order polynomial fit for Rcs,h16, ψ, and an Rf*ψ interaction (all terms significant at p < 0.003). The interaction between Rf and ψ can be seen in Figure 7-6 (top left), where the trend between Rf and mean preference differs for the SPDs oriented in opposing directions (here CB is acting as an ordinal substitute for ψ to simplify graphing). The interaction, and the resulting regression equation, are similar to the naturalness model above. The regression model is as follows:

LIKE = 1.629 + 0.02686 Rf + 3.423 Rcs,h16 − 10.01 (Rcs,h16)² − 0.04866 ψ + 0.000566 Rf*ψ (Equation 7-3)

The bolded portion of the model is consistent, in form, with the preference model of Royer and others [2016]. The model suggests that increasing average fidelity and Rcs,h16 will tend to increase mean preference rating. The negative coefficient of (Rcs,h16)² accounts for the plateau effect between preference and saturation. The six SPDs with the highest mean preference rating (SPDs 13, 18, 12, 21, 10, and 9, respectively) have high average gamut (Rg ≥ 100), each with an Rcs,h16 ≥ 0%. The two of the six which did not have an Rg > 100 also had high fidelity (Rf ≥ 85), indicating a general trend for sources with high gamut and high fidelity, without desaturation in Rcs,h16, to be most preferred. The top rated source (SPD 13) had the second highest value of Rcs,h16 (18.6%), while SPD 1, with the highest value of Rcs,h16 (22.5%), was ranked 7th. Based on these rankings, increases in Rcs,h16 beyond this level can be expected to decrease mean preference rating.

7.3.4 Skin Preference
Initial analyses of single predictors showed that Rf is a significant linear predictor of mean skin preference rating (r² = 0.17, p = 0.048), as was Rg (r² = 0.57, p = 0.001) (Figure 7-8, top left and right). Rg’s quadratic counterpart provides a better fit (r² = 0.72, p = 0.001) and provides direct support for the hypothesis that mean skin preference ratings will increase as Rg increases, though the significant quadratic fit suggests a plateau effect similar to that for the mean preference ratings above. Additionally, the 6 sources with the highest mean skin preference rating have an Rg ≥ 100, where the only source with Rg = 100 has high average fidelity (Rf = 95). CB, alone, did not have a significant effect (p = 0.345) (Figure 7-8, middle left); thus there is no direct evidence to support the hypothesis that CB1 SPDs would have higher mean skin preference ratings than CB7 SPDs. The ellipse rotation (ψ), however, is a significant quadratic predictor (p = 0.001), and the regression equation peaks in IES TM-30-15 hue angle bin 16, which is the orientation of the best-fit ellipse for SPD 9, the source with the second highest mean skin preference rating. Semi-minor axis length (b) is a significant linear and quadratic predictor (r² = 0.69 and 0.75, p = 0.000 and 0.038, respectively), where sources with higher values of b (which signifies less desaturation in the minor axis direction) have higher mean skin preference ratings (see Figure 7-9).

Another notable predictor, Rcs,h16, has the highest linear and quadratic r² of any Rcs,hi (r² = 0.49 and 0.66, respectively), and both fits are statistically significant (p = 0.000 and 0.005, respectively) (Figure 7-8, middle right). The analysis produced two models which are comparable in strength (Table 7-1), both of which include Rcs,h16.[20] Despite the close performance of the two models, the model containing parameter b was chosen due to its simplicity and the high significance of all model terms (p < 0.015 for all), which does not hold true for the other model. The regression equation for the chosen model is as follows:

SKIN = 0.128 + 3.758 b + 1.161 Rcs,h16 − 8.41 (Rcs,h16)² (Equation 7-4)

The SPDs with the top rated mean skin preference (SPDs 21, 9, 12, 14, 13, and 18, respectively) have among the highest values for best-fit ellipse semi-minor axis length (b)—five of which have minor axes within quadrant 1 of the CVG (i.e., hue angle bins 1, 2, 3, and 4)—and Rcs,h16 values ≥ 0%. Additionally, 5 of the 6 SPDs have an Rg > 100, where the source with the lowest gamut (Rg = 100) has the highest average fidelity (Rf = 96). This is the same source whose semi-minor axis is not oriented within the first 4 hue angle bins (because its major axis is), though the major/minor axis information has little meaning for spectra with such high average fidelity values. Thus, the most preferred sources have less desaturation in hue angle bins 1 through 4, which is where the IES TM-30-15 ‘skin’ samples are generally located, and positive values of Rcs,h16. Additionally, five of these six SPDs were also included in the top 6 preferred SPDs. In fact, mean preference and mean skin preference are highly correlated (Pearson’s r = 0.834, p = 0.000), indicating that as mean skin preference increases, so does overall mean preference.

Rf,skin is a TM-30-15 complexion-specific fidelity index and is an average of the 2 sample-specific fidelity values (Rf,CESi) for skin complexion (CES 15 and 18) [DOE 2016]. While Rf,skin is a slightly stronger predictor of mean skin preference than Rf when compared directly (r² = 0.25, p = 0.013), models with Rf,skin always produce a (slightly) smaller r²pred and a larger Mallows Cp than those same models with Rf. Because of this, best subset and stepwise regression analyses consistently eliminate Rf,skin in favor of Rf. Though Rf,skin is a stronger predictor of mean skin preference than Rf when compared directly (r² = 0.25 and 0.17, respectively)—which may be useful for simple specification of light sources where skin rendering is an important design consideration—the above analysis suggests that Rf is the stronger metric for more precisely predicting skin preference when assisted by other metrics.

A summary of the best models can be seen in Table 7-1 and Figure 7-10.
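For convenience, Equations 7-1 through 7-4 can be collected into simple functions. This sketch assumes the units used throughout this chapter (Rcs,hi expressed as a fractional chroma shift, e.g., 0.186 for +18.6%; ψ in degrees; b as the best-fit ellipse semi-minor axis length); the function names are illustrative:

    def naturalness(Rf, Rcs_h1, psi):
        # Equation 7-1
        return (1.464 + 0.02674 * Rf + 0.188 * Rcs_h1 - 15.41 * Rcs_h1 ** 2
                - 0.05305 * psi + 0.000602 * Rf * psi)

    def vividness(Rcs_h16):
        # Equation 7-2
        return 3.332 + 4.594 * Rcs_h16

    def preference(Rf, Rcs_h16, psi):
        # Equation 7-3
        return (1.629 + 0.02686 * Rf + 3.423 * Rcs_h16 - 10.01 * Rcs_h16 ** 2
                - 0.04866 * psi + 0.000566 * Rf * psi)

    def skin_preference(b, Rcs_h16):
        # Equation 7-4
        return 0.128 + 3.758 * b + 1.161 * Rcs_h16 - 8.41 * Rcs_h16 ** 2

As a spot check, SPD 1 (Rcs,h16 = 22.5%) yields vividness(0.225) ≈ 4.37, in line with its measured mean vividness rating of 4.3 (Table 6-1).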

7.4 Subjective Ratings—OBJECT SPECIFIC
Ratings along the like, natural, and vivid scales were recorded for each object under each of the light settings. To test for consistency, the ratings under each light setting for all twelve objects were averaged (separately for each scale) and compared to the OVERALL responses for the same light setting. The correlation between the OVERALL response and the composite rating (made by averaging all 12 object-specific responses) shows strong consistency (Pearson’s r = 0.923, 0.870, and 0.959, respectively, and p < 0.000 for all) (Figure 7-11). The in-depth analysis below of the object-specific responses provides valuable insight into the rating strategies of participants. Results are detailed in Figure 7-12.

[20] Independent analyses with other strong Rcs,hi show that models with Rcs,h16 have the highest r²pred.

7.4.1 Naturalness
Four of the six color categories (Red, Yellow, Green, and Blue) showed a statistically significant difference in mean naturalness rating between the two object categories, with the Natural Foods category having the higher mean in all four cases. This intuitively suggests that the food objects were generally considered more natural than the manufactured Consumer Goods objects, and is consistent with several unsolicited comments from participants stating that it was difficult to give a naturalness rating to the consumer good objects. The two color categories that did not exhibit a statistical difference (Orange and Purple) contained the orange and the onion, which also had the highest average color shift (ΔEJab) across the 24 light settings (Figure 8-1, bottom). It is likely that the large color shifts of these two objects contributed to them not being perceived as more natural than their consumer product pairings in the experiment.

7.4.2 Vividness
Three pairs showed a statistically significant difference with the natural object having the higher mean (Orange, Yellow, and Purple), two pairs exhibited no statistical difference (Red and Green), and one pair exhibited a statistically significant difference with the consumer good having the higher mean (Blue). The rationale by color category follows:

Red: the Coca Cola can had a higher chroma, on average, than the red apple, although there was not a statistical difference between their mean vividness ratings. Both objects were highly chromatic, and it is possible that participants did not perceive large differences between the two.

Orange: the orange had a higher chroma than the Orange Crush can under all sources and the orange was rated statistically higher, on average.

Yellow: The lemon was rated statistically higher despite the mustard having a higher chroma under all sources. The mustard bottle was highly matte and it is likely that this led to it being rated as less vivid than the lemon.

Green: No statistical difference between the two objects is consistent with the average delta chroma of zero across all light settings.

Blue: The Pepsi can had a significantly higher chroma than the blueberries under all light sources (blueberries have a very low chroma), leading to the Pepsi can having a statistically higher mean vividness. This is the only pair in which the object in the Consumer Goods category has a statistically higher mean vividness, and could be the result of the large chroma difference between the two objects.

Purple: the onion had the largest average color shift (ΔEJab) among all objects and had a higher chroma than the Grape Crush can under all light sources. It is likely that these two factors led to the onion being perceived as statistically more vivid, on average.

7.4.3 Preference
In the exit survey, 12 participants explicitly mentioned that they tended to prefer objects that were more highly saturated, while 5 participants explicitly mentioned a link between increased preference and decreased naturalness. Four participants explicitly mentioned a plateau, where too much saturation was undesirable. Using the general concepts that higher vividness is preferred (up to a point) and that too much saturation is unnatural and/or undesirable, we can attempt to rationalize the preference results:

Red: there was no statistical difference in vividness between the Coca Cola can and the red apple, suggesting participants did not, on average, see a chromatic difference between these two (both highly chromatic) objects. This may explain the statistically insignificant difference between the preference ratings of these two objects.

Orange: The orange, on average, was rated statistically more vivid than the Orange Crush can, and it was also statistically more preferred than the Orange Crush can. This is consistent with the logic that increased vividness leads to increased preference.

Yellow: The lemon, on average, was rated statistically more vivid and more preferred than the mustard bottle. This is also consistent with the logic that increased vividness leads to increased preference. Additionally, because the lemon was also rated statistically more natural than the mustard bottle, we might infer that while the lemon was generally perceived as more vivid, it wasn’t distorted (on average) to the point of being perceived as unnatural.

Green: Because there was no statistical difference between the vividness of these objects, and the apple was statistically more natural than the Sprite bottle, we would expect the green apple to be statistically more preferred than the Sprite bottle. The opposite is true, in fact, and cannot be rationalized based on the logic presented above.

Blue: The link between increased vividness and increased preference is unambiguous with this comparison where the blueberries were statistically more natural than the Pepsi Can, but the Pepsi can was rated statistically more vivid and more preferred.

Purple: The onion was statistically more vivid than the Grape Crush can but there was no statistical difference between these two for naturalness or preference. Because of the large chroma shifts of the onion we might infer that the chroma of the onion generally passed the plateau whereby too much saturation is both unnatural and undesirable.

7.5 Color discrimination
7.5.1 Modeling the adjusted TES—a direct approach
An initial analysis of predictors shows that Rf is a significant linear predictor of mean TESadj (p = 0.005, r² = 0.301) and Rg is a significant quadratic predictor (p = 0.018, r² = 0.474) (Figure 7-13, top left and right). As Rf increases, mean TESadj decreases, which initially supports the hypothesis that increased average fidelity will increase color discrimination ability. The convex quadratic fit between Rg and TESadj is minimized at Rg = 92, and mean total error scores increase with departure from Rg = 92 in both directions. This does not support the hypothesis that increased Rg would provide increased color discrimination ability, and instead suggests the opposite. While Rf and Rg are both significant predictors of mean total error score, the fits are notably weak, with a large range of error scores occurring within a fixed value of Rf or Rg. A combination of Rf and Rg produces the strongest model that can be built for predicting TESadj (without the use of any Rcs,hi) and is also a notably weak model (r² = 0.61, r²pred = 0.49).[21] These results suggest that Rg is not a defensible predictor of TESadj, even when paired with other metrics.

[21] A stronger model (r² = 0.73) can be built for TESadj when Rcs,h15 is included. There is no theoretical evidence that prediction of TES should favor a single hue angle bin since the colored caps of the test span the entire hue circle. Thus, this model was not included.


This is consistent with the findings of Royer and others [2012], which suggest Rg does not correlate well with total error scores for highly structured spectra (all spectra in this study were highly structured).

7.5.2 Modeling the adjusted TES—a segmented approach
Modeling the tray-specific adjusted error scores—of which the TES is comprised—shows that most metrics have differential predictive power across the 4 separate trays (Table 7-2). As an example, CVG best-fit ellipse semi-minor axis length (b) is a statistically significant linear predictor of mean error scores for Tray A, B, and C. It happens to be highly insignificant for predicting mean error scores for Tray D and, consequently, is not a significant predictor of the mean total error score. It appears that when the tray-specific error scores are combined to create the total error score, the differential predictive ability of any single metric across the trays causes most metrics to be mediocre predictors of TESadj. The strongest single predictor is Rcs,h15 (quadratic fit) (p = 0.002, r² = 0.611). Hue angle bin 15 contains FM100 chips directly in the center of Tray D, which has a statistically lower average TESadj than Trays B and C and never experiences any transposition of caps across all experimental light sources. Rcs,h15 is a statistically significant predictor of mean error score for all trays and is among the strongest predictors of Tray D mean error scores, alongside Rg. Because all metrics are notably poor predictors of Tray D mean error scores, the consistency of Rcs,h15 across all tray-specific error scores and its relative strength in predicting Tray D mean error scores may explain why it is the strongest individual predictor of TESadj.

Because individual metrics are not universally strong predictors across all tray-specific error scores, TESadj absorbs these poor fits when the sum of the tray-specific scores is computed. A more accurate prediction of TESadj can be made by modeling the tray-specific scores individually, then combining them into a predicted total error score. A segmented approach to predicting TESadj is as follows: (1) model tray-specific mean error scores individually (i.e., 4 separate models) (Table 7-3, “TM30 & Ellipse Parameters”); (2) based on these models, compute new (predicted) error scores for each tray under each light source; and (3) compute a new total error score—following the logic of the FM100 calculation of TES—which is the sum of the predicted error scores for each of the trays. The resulting predicted total error score (TESpred) is a significantly stronger fit (p = 0.000, r² = 0.78, r²pred = 0.75) than the Rf/Rg model prediction of TESadj (Table 7-3). This suggests that the differential predictive power of metrics across the error scores for the 4 trays causes any metric (or model) to be a poor fit to TESadj. A more segmented approach, which individually considers the tray-specific error scores, appears to be a more robust predictive method.

7.5.3 Modeling the adjusted TES—a segmented approach with unifying model
While the segmented approach appears to be a more precise method than the direct approach for predicting mean adjusted total error score, the differential predictive ability of any single metric leads to different best-fit models for each of the tray-specific mean error scores (Table 7-3). A less prescriptive method is desired; theoretically, a single unifying model should exist which moderately explains each of the tray-specific mean error scores and whose tray-specific predictions combine to produce a predicted mean total error score that is an equal or better predictor than the segmented method (TESpred) previously described. Two assumptions underlie the model: (1) a light source which compresses the chips of a single tray within color space, and decreases the average hue angle difference between adjacent chips within that tray, will make chips harder to distinguish and increase the average number of errors made; and (2) a light source which transposes the chips in a tray (based on the hue angle in a′b′ space, relative to the illuminant under which the chips were designed) will further increase the average number of errors made. The proposed underlying model for the four tray-specific error scores is as follows:


Rd,i = α0 + α1 Rdt,i + α2 (Rdt,i)² + α3 Δh̄i + ε (Equation 7-5)

Where,
Rd,i is the mean tray-specific predicted error score;
α0, α1, α2, α3 are regression coefficients to be estimated;
i = 1, 2, 3, and 4 for FM100 trays A, B, C, and D, respectively;
Rdt,i is an FM100-based, light source-specific error calculation based on the transpositions caused by the light source in comparison to the caps’ natural order under CIE Standard Illuminant C (the source under which they were designed). If a light source does not transpose any caps (relative to the reference), this value is equal to zero and Rd,i is computed entirely from the average hue angle difference between the colored caps in the tray (Figure 7-14);
Δh̄i = Σj=1..n (∠hj+1 − ∠hj)/n is the average hue angle difference between adjacent color caps in the FM100 tray (n = the number of caps in the tray minus one; n = 21 for Tray A and n = 20 for Trays B, C, and D).

Once this equation has been estimated for each of the four trays, a new total error score can be computed—following the logic of the FM100 calculation of TES—which is the sum of the predicted error scores for each of the four trays:

RTES = Σi=1..4 Rd,i (Equation 7-6)

Where,
RTES is the predicted total error score for the light source;
Rd,i is the predicted tray-specific error score.

Once the predicted total error score has been calculated, it can be fit as a linear predictor of the mean total error scores:

Rd = β0 + β1 RTES + ε (Equation 7-7)

Where,
Rd is the final predicted error score;
β0, β1 are regression coefficients to be estimated;
RTES is the sum of the tray-specific error predictions.
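The following is a minimal sketch of Equations 7-5 through 7-7, assuming the caller supplies the per-tray hue angles (in a′b′ space, in their correct order under the test source), the transposition scores Rdt,i, and the fitted coefficients; the coefficient values themselves must come from the regression fits described above.

    import numpy as np

    def mean_hue_step(hue_angles_deg):
        # Delta-h-bar in Equation 7-5: the average hue angle difference
        # between adjacent caps in a tray.
        h = np.asarray(hue_angles_deg, dtype=float)
        return float(np.mean(np.diff(h)))

    def tray_error(alpha, Rdt, dh_bar):
        # Equation 7-5: Rd,i = a0 + a1*Rdt,i + a2*Rdt,i^2 + a3*dh_bar
        a0, a1, a2, a3 = alpha
        return a0 + a1 * Rdt + a2 * Rdt ** 2 + a3 * dh_bar

    def predicted_tes(alphas, Rdt_by_tray, dh_by_tray, beta=(0.0, 1.0)):
        # Equation 7-6 sums the four tray-specific predictions; Equation
        # 7-7 then rescales the sum with a final linear fit (beta0, beta1).
        r_tes = sum(tray_error(a, r, d)
                    for a, r, d in zip(alphas, Rdt_by_tray, dh_by_tray))
        b0, b1 = beta
        return b0 + b1 * r_tes

For Trays A and D, Rdt,i is identically zero under all 24 experimental sources, so their tray-specific predictions reduce to the intercept plus the Δh̄ term.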

The performance of the unifying model for each of the four trays (and for TESadj) is shown in Table 7-3 (“Custom Metrics”).[22] The unifying model is slightly weaker than the best-fit models using IES TM-30-15 metrics and CVG best-fit ellipse parameters for Tray A, significantly stronger for Tray B, slightly stronger for Tray C, and significantly worse for Tray D. For predicting TESadj, the custom metric RTES is a significantly stronger predictor (p = 0.000, r² = 0.87, r²pred = 0.84) than the IES TM-30-15 metrics and CVG best-fit ellipse parameters, and still stronger than the segmented approach (TESpred) detailed in Section 7.5.2. Overall, this model does not solve the problem of uniformly predicting all tray-specific responses. Nonetheless, it is still a stronger prediction method than any combination of the available metrics considered.

[22] Note that Rdt,i is equal to zero for Trays A and D because no caps in these trays were transposed under any of the 24 experimental light sources. The predicted r-squared (r²pred) cannot be estimated for Tray C error scores when (Rdt,C)² is included, because the quadratic fit cannot be estimated when SPD 14 is removed (Figure 7-14, middle left).


7.6 Figures


Figure 7-1 Best-fit ellipse parameters as a function of Rf and Rg. (Top left) Semi-major axis length (a). (Top right) Semi-minor axis length (b). (Bottom left) Rotation angle (ψ). (Bottom right) Eccentricity (e).


Figure 7-2 Independent variables as predictors of mean naturalness rating. (Top left) Rf as a predictor of mean naturalness rating. Rf is a significant linear predictor (p = 0.006), though the fit is notably weak (r² = 0.30). The fit is improved significantly when the interaction between Rf and ellipse rotation is included (the interaction effect is shown here with CB to simplify graphing) and is evident from the intersecting best-fit lines for CB1 SPDs and CB7 SPDs. (Top right) Rg as a predictor of mean naturalness. Rg is a significant quadratic predictor (p = 0.031), though the fit is notably weak (r² = 0.21). (Bottom left) Color Vector Graphic orientation (CB) as a predictor of mean naturalness rating. CB had a statistically significant effect (p = 0.011), where CB7 SPDs were, on average, rated higher than CB1 SPDs. (Bottom right) Chroma shift for hue angle bin 1 (Rcs,h1)—among the strongest Rcs,hi—as a predictor of mean naturalness rating. Rcs,h1 is a statistically significant predictor (p = 0.031) of mean naturalness rating.


Figure 7-3 Best-fit ellipse parameters as predictors of mean naturalness rating. (Top left) Best-fit ellipse eccentricity (e) is a statistically significant linear predictor (p = 0.027) of mean naturalness rating but is a notably poor fit (r² = 0.20). (Top right) Best-fit ellipse rotation (ψ) is not a statistically significant predictor (p = 0.079) of mean naturalness rating. (Bottom left) Best-fit ellipse semi-major axis length (a) is not a statistically significant predictor (p = 0.136) of mean naturalness rating. (Bottom right) Best-fit ellipse semi-minor axis length (b) is not a statistically significant predictor (p = 0.120) of mean naturalness rating.


Figure 7-4 Independent variables as predictors of mean vividness rating. (Top left) Rf is not a statistically significant predictor (p = 0.838) of mean vividness rating. (Top right) Rg is a statistically significant linear predictor (p = 0.000) of mean vividness rating and is a fairly strong fit (r² = 0.67). The dashed red and green lines are the best-fit regression lines for the CB1 and CB7 SPDs, respectively. Their parallel nature suggests that there is no interaction effect between CB and Rg, but that CB1 SPDs are generally rated more vivid than CB7 SPDs. (Middle left) CB had a statistically significant effect on mean vividness rating (p = 0.027), where CB1 SPDs were rated, on average, more vivid than CB7 SPDs. (Middle right) Chroma shift for hue angle bin 16 (the strongest of any Rcs,hi) is a statistically significant linear predictor (p = 0.000) of mean vividness rating. Full analysis shows that this is also the best model for predicting mean vividness rating. (Bottom) An enlarged portion of the mean ratings for sources with an Rg equal to 100 (where the CVGs are most strongly contrasted). For three of the four pairs (Rf = 65, 75, and 85), CB1 has a statistically higher mean vividness rating than CB7. This is not true for the pair at Rf = 95, where the CVGs are theoretically contrasted but negligibly different due to their high fidelity.


Figure 7-5 Best-fit ellipse parameters as predictors of mean vividness rating. (Top left) Best-fit ellipse eccentricity (e) is not a statistically significant predictor (p = 0.620) of mean vividness rating. (Top right) Best-fit ellipse rotation (ψ) is a statistically significant quadratic predictor (p = 0.000) of mean vividness rating and is a notably strong fit (r² = 0.817). (Bottom left) Best-fit ellipse semi-major axis length (a) is a statistically significant linear predictor (p = 0.002) of mean vividness rating. CB1 SPDs were shown to have a statistically higher mean vividness rating (on average), which can easily be seen here, where the best-fit linear regression line for CB1 SPDs is higher than (and parallel to) the regression line for CB7 SPDs. (Bottom right) Best-fit ellipse semi-minor axis length (b) is a statistically significant linear predictor (p = 0.001) of mean vividness rating.


Figure 7-6 Independent variables as predictors of mean preference rating. (Top left) Rf is not a statistically significant predictor (p = 0.100) of mean preference rating. There is, however, a statistically significant interaction effect between Rf and CB, which can be seen here as the intersection between the best-fit regression lines for CB1 and CB7 SPDs. (Top right) Rg is a statistically significant quadratic predictor (p = 0.012) of mean preference rating and is a notably strong fit (r² = 0.66). (Bottom left) CB is not a statistically significant predictor (p = 0.727) of mean preference rating. (Bottom right) Chroma shift for hue angle bin 15 (the strongest of the Rcs,hi) is a significant quadratic predictor (p = 0.011) of mean preference rating and is a notably strong fit (r² = 0.70).


Figure 7-7 Best-fit ellipse parameters as predictors of mean preference rating. (Top left) Best-fit ellipse eccentricity (e) is not a statistically significant predictor (p = 0.051) of mean preference rating. (Top right) Best-fit ellipse rotation (ψ) is a statistically significant quadratic predictor (p = 0.002) of mean preference rating but is a poor fit (r² = 0.38). (Bottom left) Best-fit ellipse semi-major axis length (a) is not a statistically significant predictor (p = 0.230) of mean preference rating. (Bottom right) Best-fit ellipse semi-minor axis length (b) is a statistically significant linear predictor (p = 0.000) of mean preference rating and is a mediocre fit (r² = 0.56).


Figure 7-8 Independent variables as predictors of mean skin preference rating. (Top left) Rf is a statistically significant linear predictor (p = 0.048) of mean skin preference rating but is a notably poor fit (r² = 0.17). (Top right) Rg is a statistically significant quadratic predictor (p = 0.004) of mean skin preference rating and is a notably strong fit (r² = 0.72). (Middle left) CB is not a statistically significant predictor (p = 0.345) of mean skin preference rating. (Middle right) Chroma shift for hue angle bin 16 (the strongest of the Rcs,hi) is a significant quadratic predictor (p = 0.005) of mean skin preference rating and is a notably strong fit (r² = 0.66). (Bottom) Rf,skin (the IES TM-30-15 complexion-specific fidelity index) is a significant linear predictor (p = 0.013) of mean skin preference rating. Though it is a notably poor fit (r² = 0.25), it is a stronger predictor of mean skin preference than Rf.


Figure 7-9 Best-fit ellipse parameters as predictors of mean skin preference rating. (Top left) Best-fit ellipse eccentricity (e) is a statistically significant linear predictor (p = 0.012) of mean skin preference rating, though it is a notably poor fit (r² = 0.25). (Top right) Best-fit ellipse rotation (ψ) is a statistically significant quadratic predictor (p = 0.002) of mean skin preference rating but is a poor fit (r² = 0.44). (Bottom left) Best-fit ellipse semi-major axis length (a) is not a statistically significant predictor (p = 0.269) of mean skin preference rating. (Bottom right) Best-fit ellipse semi-minor axis length (b) is a statistically significant linear predictor (p = 0.000) of mean skin preference rating and is a notably strong fit (r² = 0.69).



Figure 7-10 Summary of model performance—OVERALL subjective ratings. Performance is shown for the bolded models in Table 7-1. (Top left) The best-fit naturalness model has an r² of 0.92 and an r²pred of 0.84 (not shown). (Top right) The best-fit vividness model has an r² of 0.86 and an r²pred of 0.83 (not shown). (Bottom left) The best-fit preference model has an r² of 0.86 and an r²pred of 0.70 (not shown). (Bottom right) The best-fit skin preference model has an r² of 0.85 and an r²pred of 0.80 (not shown).


[Figure 7-11 appears here as three scatter plots of COMPOSITE versus OVERALL ratings: like (R² = 0.8527, Pearson’s r = 0.923), natural (R² = 0.7568, Pearson’s r = 0.870), and vivid (R² = 0.9189, Pearson’s r = 0.959); p = 0.000 for all.]

Figure 7-11 OVERALL rating versus COMPOSITE rating. An analysis was performed to determine the relationship between the OVERALL ratings provided by participants and a COMPOSITE score created by averaging the responses for individual objects. The ratings for the 12 objects in the booth were averaged together for each rating scale under each lighting condition, creating a composite rating along each of the three scales for each of the 24 SPDs. (Top left) Overall rating versus composite rating for the preference (like) scale. The two quantities are highly correlated and the relationship is significant (p = 0.000). (Top right) Overall rating versus composite rating for the naturalness scale. The two quantities are highly correlated and the relationship is significant (p = 0.000). (Bottom) Overall rating versus composite rating for the vividness scale. The two quantities are highly correlated and the relationship is significant (p = 0.000). Overall, the composite ratings agree very well with the measured OVERALL ratings.


Figure 7-12 Consumer Products vs. Natural Foods—OBJECT SPECIFIC subjective ratings. See section 7.4 Subjective Ratings—OBJECT SPECIFIC for discussion of results.


Figure 7-13 Independent variables as predictors of mean adjusted Total Error Score (TESadj). (Top left) Rf is a significant (p = 0.005), albeit weak, predictor of mean adjusted total error score. (Top right) Rg is a significant quadratic predictor (p = 0.018) of mean adjusted total error score. While it is a stronger predictor than Rf, it is still notably weak. (Bottom left) CB appears to have no effect on the mean adjusted total error score. (Bottom right) The chroma shift for hue angle bin 15 (the strongest of any Rcs,hi) is a significant quadratic predictor of mean adjusted total error score and is a fairly strong fit.



Figure 7-14 Light source transposition error score (Rdt,i) (tray-specific and total). (Top left) The mean error score for Tray A as a function of the custom light source transposition error score (Rdt,A). Note that the chips in tray A were never transposed under the experimental SPDs, so Rdt,A is equal to 0 for all sources. (Top right) The mean error score for Tray B as a function of the custom light source transposition error score (Rdt,B). Tray B experienced the most transpositions of caps under the experimental light sources, and Rdt,B is a strong quadratic predictor of mean error score for Tray B (p = 0.002). (Middle left) The mean error score for Tray C as a function of the custom light source transposition error score (Rdt,C). Chips in Tray C only experienced transpositions under 3 of the experimental SPDs, and Rdt,C is a notably strong predictor (p = 0.000). Note that r²pred cannot be estimated when (Rdt,C)² is included in the model, so it was excluded from the analysis (see Table 7-3). (Middle right) The mean error score for Tray D as a function of the custom light source transposition error score (Rdt,D). Note that the chips in tray D were never transposed under the experimental SPDs, so Rdt,D is equal to 0 for all sources. (Bottom) The total light source transposition error score (calculated as the sum of all Rdt,i) is a significant linear predictor of mean adjusted total error score.


7.7 Tables

Table 7-1 Summary of model-included terms—OVERALL subjective ratings. For rating scales where two models were comparably strong (vivid, like, and skin preference), two models are shown. The bolded model can be considered the stronger model. Representations of the performance of the bolded models are shown in Figure 7-10.

Base Terms | Min    | Mean  | Max   | Range | NAT* | VIV  | VIV  | LIKE* | LIKE | SKIN | SKIN
Rf         | 64     | 75    | 96    | 31.4  | x    |      |      | x     | x    |      | x
Rf,skin    | 63     | 78    | 96    | 33.6  |      |      |      |       |      |      |
Rg         | 80     | 100   | 120   | 39.4  |      |      |      |       |      |      | x
a          | 1.01   | 1.11  | 1.22  | 0.2   |      |      |      |       |      |      |
b          | 0.76   | 0.90  | 0.99  | 0.2   |      |      | x(2) |       |      | x    |
ψ          | -88.3  | -25.6 | 40.7  | 129.0 | x    |      | x(2) | x     |      |      | x
e          | 0.19   | 0.56  | 0.72  | 0.5   |      |      |      |       |      |      |
Rcs,h1     | -31.3% | 0.8%  | 23.1% | 54.4% | x(2) |      |      |       |      |      |
Rcs,h2     | -20.8% | 0.9%  | 18.5% | 39.3% |      |      |      |       |      |      |
Rcs,h3     | -8.3%  | 0.6%  | 8.4%  | 16.7% |      |      |      |       |      |      |
Rcs,h15    | -19.3% | 0.2%  | 18.6% | 37.9% |      |      |      |       |      |      | x(2)
Rcs,h16    | -27.1% | 0.8%  | 22.5% | 49.7% |      | x    |      | x(2)  | x(2) | x(2) |
r²         |        |       |       |       | 0.92 | 0.86 | 0.89 | 0.86  | 0.78 | 0.85 | 0.88
r²pred     |        |       |       |       | 0.84 | 0.83 | 0.82 | 0.70  | 0.69 | 0.80 | 0.80

*Model also includes an Rf*ψ interaction. (2) indicates that the model also includes the quadratic term.


Table 7-2 Various metrics as predictors of adjusted error scores. Values in bold red are statistically significant at α = 0.05. This table demonstrates the variable predictive power of any single metric across the tray-specific error scores and explains why no single metric is a strong predictor of TES.

[Table 7-2 body: for each metric (rows Rf, Rg, b, a, e, Rcs,h7, Rcs,h5, ψ, Rcs,h15, Rcs,h11, and CB), the linear r² and p values and the quadratic r² and p* values against the Tray A, Tray B, Tray C, Tray D, and TES error scores. CB, a categorical variable, was fit with a linear term only.]

*p-value for the coefficient of the squared term in the quadratic model.


Table 7-3 Summary of model-included terms—FM100 hue test. The column group titled “TM30 & Ellipse Parameters” shows the best models that could be built using available IES TM-30-15 metrics and the best-fit ellipse parameters proposed in this study (7.1 Modeling the Color Vector Graphic). Note that the best model is different for each tray-specific model, and the best model for TES—which includes Rf and Rg—is notably weak. The column group titled “Custom Metrics” details the model-included terms when model building with the custom metrics detailed in section 7.4.3 Modeling the adjusted TES—a segmented approach with unifying model. Note that Rdt,i cannot be estimated for Trays A and D because caps in those trays were never transposed under the experimental light sources, and that (Rdt,C)² was not included in the analysis because r²pred cannot be estimated when SPD 14 is removed from the dataset (see Figure 7-14, middle left).

           |        |       |       |       | TM30 & Ellipse Parameters  | Custom Metrics
Base Terms | Min    | Mean  | Max   | Range | A    | B    | C    | D    | TES  | A    | B    | C    | D    | TES
Rf         | 64     | 75    | 96    | 31.4  |      | x    |      |      | x    | -    | -    | -    | -    | -
Rg         | 80     | 100   | 120   | 39.4  | x    | x    | x    |      | x(2) | -    | -    | -    | -    | -
a          | 1.01   | 1.11  | 1.22  | 0.2   | x    |      | x    | x    |      | -    | -    | -    | -    | -
b          | 0.76   | 0.90  | 0.99  | 0.2   |      |      |      | x    |      | -    | -    | -    | -    | -
ψ          | -88.3  | -25.6 | 40.7  | 129.0 |      |      |      |      |      | -    | -    | -    | -    | -
e          | 0.19   | 0.56  | 0.72  | 0.5   | x    |      |      | x    |      | -    | -    | -    | -    | -
Rcs,h5     | -28.7% | 2.1%  | 31.6% | 60.3% |      | x(2) |      |      |      | -    | -    | -    | -    | -
Rcs,h11    | -16.1% | -1.6% | 10.6% | 26.7% |      |      | x(2) |      |      | -    | -    | -    | -    | -
Rdt,i      | var.   | var.  | var.  | var.  | -    | -    | -    | -    | -    |      | x(2) | x    |      | -
Δhi        | var.   | var.  | var.  | var.  | -    | -    | -    | -    | -    | x    | x    | x    | x    | -
RTES       | 20.2   | 36.8  | 61.8  | 41.6  | -    | -    | -    | -    | -    |      |      |      |      | x
r²         |        |       |       |       | 0.77 | 0.87 | 0.90 | 0.62 | 0.61 | 0.59 | 0.97 | 0.90 | 0.09 | 0.87
r²pred     |        |       |       |       | 0.71 | 0.80 | 0.84 | 0.36 | 0.49 | 0.52 | 0.96 | 0.88 | 0.00 | 0.84

(2) indicates that the model also includes the quadratic term. “Var.” indicates that the quantity varies per tray.


8. DISCUSSION

8.1 Subjective ratings

8.1.1 Comparison to Royer and others [2016]
The study detailed herein is similar enough to the recent work of Royer and others [2016]—though designed and executed independently—to warrant a direct comparison. Important differences include: a smaller object set, a different experimental apparatus, a larger time investment from participants, and a different method for distinguishing light sources with the same average fidelity and gamut area (the CB variable used in this study versus the max/min of Rcs,h1 used by Royer and others).

Despite the differences noted above, the resultant best-fit models are strikingly similar between these two studies. The naturalness model presented in the current study (Equation 7-1) takes the exact form of the Royer “normalness” model (shown in bold in Equation 7-1)—Rf and a second-order polynomial fit for Rcs,h1—with the addition of the CVG best-fit ellipse rotation (ψ) and an Rf*ψ interaction effect. A similar trend exists for the preference model, where the CVG best-fit ellipse rotation (ψ) and an Rf*ψ interaction effect are appended to the Royer model (without the cubic term Rcs,h16³, shown in bold in Equation 7-3). Lastly, the vividness model takes exactly the same form as the Royer model, which includes Rcs,h16 as the only predictor.

When compared 1-for-1, the models detailed in the present study have slightly lower performance (in terms of r²) than the models of Royer and others. In the closing questionnaire several participants commented that they would “think back” to the first day of the experiment (the calibration session) when making their ratings, and gauge their response accordingly. Because only one spectrum was run per day, over 13 days, the participants who used this strategy relied heavily on their memory when making judgements over the duration of the experiment. This reliance on participants’ long-term memory may explain the dilution of the strength of the model fits in the current study. Nonetheless, the consistency of the results, from two studies with several fundamental methodological differences, suggests the strength of these relationships. Future work may further investigate the validity of these models.

8.1.2 Is it all about the reds?
An Rcs,hi for a nominally red hue angle bin was included in every best-fit model and most of the runner-up models; the only exception was the runner-up vividness model, which instead accounted for this effect using a combination of ψ and b. Ellipse rotation (ψ) was a statistically significant quadratic predictor for 3 of the 4 perceptual responses (all but naturalness), and all 3 of those models peaked in hue angle bin 16. Additionally, the closing questionnaires revealed the larger influence of the red, orange, and yellow objects on participant ratings. The large influence of (nominally) red rendering is consistent with several past research studies [Ohno 2005, 2015; Wei and others 2014, 2016].

Figure 8-1 (top) shows the maximum and minimum Rcs,hi for each of the 16 hue angle bins across all 24 light sources (the smaller bar shows the average). Hue angle bins 5 and 1 experienced the largest ranges across all light sources (respectively). If we conceptualize each Rcs,hi as the potential for the experimental light sources to affect objects in each of those bins, we might expect, for example, that objects in hue angle bin 5 would experience larger average shifts across the experiment. Looking at the average ΔEjab for each of the experimental objects—because the light source–object interaction is the stimulus participants are actually responding to—we see that the object located in hue angle bin 5 (green apple) had the 3rd lowest average color shift across the experiment (Figure 8-1, bottom). This suggests that while hue angle bin 5 had large potential to color shift objects located within its bin, in reality it had less of an effect than hue angle bin 1, which had a lower range of Rcs,hi but created the largest average ΔEjab across the experiment (onion). Additionally, past researchers have commented on the difficulty of color-shifting nominally blue objects [Ohno and others 2016]. The present study found this to be true for the blueberries (likely due to their very low chroma) but did not find it to be true for the Pepsi can, which had the 5th largest average ΔEjab with a very low range of Rcs,hi (i = 11).
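ΔEjab here is the Euclidean distance between an object's CAM02-UCS coordinates under the test source and under the reference illuminant, so the per-object averages above reduce to a simple computation; a minimal sketch, assuming (n, 3) arrays of J′a′b′ coordinates:

import numpy as np

def mean_delta_e_jab(jab_test, jab_ref):
    # jab_test, jab_ref: (n, 3) arrays of CAM02-UCS J'a'b' coordinates for the
    # same object under n test SPDs and the corresponding reference conditions.
    diff = np.asarray(jab_test) - np.asarray(jab_ref)
    return np.linalg.norm(diff, axis=1).mean()  # average color shift for the object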

The hue angle bins with the largest values of Rcs,hi correspond to the participant responses regarding the most influential objects. This result agrees with the phenomenon eloquently described by Royer and others [2016]:

“these results do not necessarily demonstrate that observers are more sensitive to certain hues. Rather . . . observers were more influenced by hues that had the greatest change in rendition under the range of sources.”

It is conceivable, then, that participants generally responded more powerfully to the CB1 SPDs (which were generally causing large red color shifts) than to the CB7 SPDs (which were causing much smaller green color shifts). This may have led to the “separating” effect between the CB1 and CB7 SPDs, explaining the necessity of the Rf*CB (and Rf*ψ) interaction effects that can be observed in the naturalness and preference models. While the interaction effects were necessary for accurate modeling of the experimental data, it is suggested that the differences in the trends between the CB1 and CB7 SPDs are a result of the choice of the object set (which also heavily favors hue angle bins 1 through 4) and not a fundamental difference in the perception of light source spectra with CVGs oriented in opposing directions.

8.1.3 The Color Vector Graphic is primary information
Several hue/chroma shift-specific graphics have been proposed by past researchers—the Color Quality Chart (CQC) by Žukauskas and others [2009], the graphic by van der Burgt (a segmented bin average of their Color Rendering Vectors (CRV)) [2010], and the Color Saturation Icon (CSI) by Davis and Ohno [2010]—and they are commonly presented as tertiary material to supplement other metrics (typically averaged values). A recent proposal by de Beer and others [2015], however, advocates the primary use of graphics (which communicate hue and chroma shifts) to overcome the limitations of single-number average metrics.

Several previous studies have suggested the importance of the shape of a color vector graphic—the CVG in Wei and others [2016] and Royer and others [2016], and the CSI in Wei and others [2014]—and the present study expands on them. Every best-fit model detailed herein (Table 7-1) contains an index extracted from the CVG (either an Rcs,hi or a best-fit ellipse parameter), which is also true of the models detailed by Royer and others [2016].

While the CVG contains a wealth of mathematically based information, the numerical complexity of the graphic does not easily lend itself to use in specification. The best-fit ellipse approach detailed herein, however, provides an alternative method for capturing the complexity of the CVG shape with fewer parameters. Additionally, the ellipse parameters provide a numerical way to directly compare the entire shape of the CVG for different light sources, which currently requires visual evaluation.23 Further work may investigate appropriate thresholds for when an ellipse is not a sufficient fit to the CVG. Because a least-squares fit was utilized, the residual sum of squares may be an appropriate metric for specifying the quality of the fit.24
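A minimal sketch of this approach is shown below, pairing the direct least-squares ellipse fit of Fitzgibbon and others [1999] with extraction of (a, b, ψ) and an approximate RSS. The (16, 2) input array of CVG bin coordinates and the dense-sampling approximation of the perpendicular distances are assumptions for illustration, not the dissertation's exact implementation:

import numpy as np

def fit_cvg_ellipse(pts):
    # pts: (n, 2) array of the 16 CVG hue-angle-bin coordinates (assumed input).
    x, y = pts[:, 0], pts[:, 1]
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = D.T @ D
    C = np.zeros((6, 6))
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    # Direct least-squares fit [Fitzgibbon and others 1999]: the ellipse is the
    # generalized eigenvector of (S, C) associated with the positive eigenvalue.
    w, v = np.linalg.eig(np.linalg.inv(S) @ C)
    conic = v[:, np.argmax(w.real)].real          # [A, B, C, D, E, F]
    if conic[0] + conic[2] < 0:                   # normalize sign so the quadratic
        conic = -conic                            # form is positive definite
    A, B, Cc, Dd, E, F = conic

    # Center of the ellipse: where the gradient of the conic vanishes.
    cx, cy = np.linalg.solve([[2 * A, B], [B, 2 * Cc]], [-Dd, -E])
    k = A * cx * cx + B * cx * cy + Cc * cy * cy + Dd * cx + E * cy + F
    evals, evecs = np.linalg.eigh([[A, B / 2], [B / 2, Cc]])
    axes = np.sqrt(-k / evals)                    # semi-axes; major axis first
    psi = np.degrees(np.arctan2(evecs[1, 0], evecs[0, 0]))
    psi = (psi + 90.0) % 180.0 - 90.0             # fold the rotation into (-90, 90]

    # Approximate the perpendicular-distance RSS of footnote 24 by sampling the
    # fitted ellipse densely and taking each point's nearest sampled neighbor.
    t = np.linspace(0.0, 2.0 * np.pi, 3600)
    ring = (np.array([[cx], [cy]])
            + evecs @ np.vstack([axes[0] * np.cos(t), axes[1] * np.sin(t)]))
    rss = (((pts[:, :, None] - ring[None, :, :]) ** 2).sum(axis=1)).min(axis=1).sum()
    return axes[0], axes[1], psi, rss

The sign normalization and eigen-decomposition of the quadratic form are one of several equivalent ways to recover the axes; the fitted ellipse itself is unaffected by that choice.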

Hue and chroma shift-specific information has consistently been shown to be crucial in the prediction of perceptual responses to light sources. Because light sources with the same average fidelity and gamut can have different effects on the visual appreciation of objects, consideration of such a graphic is recommended for any design scenario where color rendering is of even modest importance.

8.1.4 The top ranked light sources
The top 6 most preferred light sources in this experiment were SPDs 13, 18, 12, 21, 10, and 9, respectively. All of these SPDs have a nominal average gamut greater than or equal to a value of 100 (Rg ≥ 100) and chroma shifts in hue angle bin 16 greater than or equal to zero percent (Rcs,h16 ≥ 0%).

The top 6 most vivid light sources were SPDs 1, 2, 9, 6, 3, and 13, respectively. All 6 SPDs have a nominal average gamut greater than or equal to a value of 100 (Rg ≥ 100), and 5 have an average gamut greater than or equal to 110 (Rg ≥ 110). All six have chroma shifts in hue angle bin 16 greater than or equal to 11 percent (Rcs,h16 ≥ 11%). The SPD ranked 6th has an Rcs,h16 of +2%, while the SPDs ranked first and second (SPDs 1 and 2, respectively) have Rcs,h16 values of 8 and 12%, respectively.

The top 6 most preferred light sources for skin rendering were SPDs 21, 9, 12, 14, 13, and 18, respectively. All 6 SPDs have a nominal average gamut greater than or equal to a value of 100 (Rg ≥ 100), and 5 have an average gamut greater than or equal to 110 (Rg ≥ 110). All six have chroma shifts in hue angle bin 16 greater than or equal to zero percent (Rcs,h16 ≥ 0%). Four of the six SPDs have Rcs,h16 greater than or equal to 2%, and the top two most preferred SPDs for skin preference have Rcs,h16 values of 6 and 14%, respectively.

The top 6 most natural light sources in this experiment were SPDs 12, 16, 18, 15, 14, and 11, respectively. All 6 SPDs had a nominal average gamut greater than or equal to a value of 90 (Rg ≥ 90) and less than or equal to a value of 110 (Rg ≤ 110). None of the top 6 sources had Rg values at the extremes (Rg = 80 or 120), which is also true of the 7th and 8th ranked SPDs (both with relatively high average fidelity values). Four of the top six were CB7 SPDs, which may have been a result of the effect described in 8.1.2 Is it all about the reds?. That is, if participants were noticing and responding more strongly to the large color shifts of the red objects, it is conceivable that participants would generally rate CB1 SPDs as less natural (on average) than the CB7 SPDs that, by comparison, were not causing large color shifts of objects.

A summary of the top ranked sources for each scale can be seen in Figure 8-2. Across all 4 scales, all SPDs except 2—SPDs 16 and 23, both on the naturalness scale—had an average gamut greater than or equal to a value of 100 (Rg ≥ 100). Given that SPD 16 has a low average gamut and strongly desaturates reds (Rcs,h1 = -19%), it would have been expected to produce unnatural color shifts. The high rating of this SPD cannot be rationalized with the logic that light sources which strongly desaturate colors (relative to a natural reference source) will be perceived as unnatural. SPD 16, however, has a statistically high residual, indicating that its naturalness rating was much higher than the best-fit model would predict. This suggests that the SPD may have been rated generously, perhaps due to an experimental condition not considered. While SPD 23 has an average gamut lower than 100, it has the second highest average fidelity (Rf = 85). Several participants indicated—some in their exit survey and some unsolicited during their experimental sessions—that they would base their ratings on the scenes they observed the day before. Though the order of the SPDs was randomized, it happened that SPD 23 was the first SPD shown on the first day of the first block and the second day of the third block (block 3 was a replication of block 1, with a different order of SPDs and a different set of participants). It is possible that SPD 23, being presented so early in both of its experimental blocks, contributed to the unpredictability of its naturalness rating (this SPD also had a high residual).

23 Note that the TM-30-15 metrics and the accompanying CVGs should not be compared for light sources with very different CCTs.
24 The residual sum of squares (RSS) is the sum of the squared distances between the coordinates of the 16 Rcs,hi and the closest (perpendicular) point on the ellipse. The RSS is equal to 0 when the ellipse is a perfect fit.

Despite the peculiarities of SPD 16 and 23, the results generally suggest that low average gamut areas are undesirable. The tendency for the top ranked sources to also have average gamut areas greater than (or equal to) a value of 100 can be seen in Figure 8-2 and Figure 8-3.

8.2 Color discrimination

8.2.1 A new approach to predicting error score
It was shown in section 7.5 Color discrimination that the independent variables (Rf, Rg, and CB) were not particularly strong predictors of adjusted total error score. A direct modeling approach—which considered the transposition and average hue angle of caps in color space—was therefore developed. The custom models show, at least preliminarily, that a direct approach to modeling the FM100 error scores may be a more conceptually thorough, and statistically robust, way to predict the color discrimination ability of a light source.

The top 6 light sources for each tray-specific error score (i.e. with the lowest error scores) are shown in Figure 8-4. The top 6 light sources are quite different for each of the four trays, which is consistent with the statistics in Table 7-2, namely that most metrics have variable predictive ability for the error scores of different trays. What is evident from Figure 8-4 is that the tray-specific error scores reflect the varied effect of the experimental light sources on the color shifts of objects in the different sections of the color space. The results suggest that the SPDs have variable effects on the color shifts of objects in different portions of color space (by design), which is reflected in the tray-specific error scores, leading to the variable predictive power of any single metric and ultimately resulting in the poor predictability of TESadj.

The 6 light sources with the lowest TESadj are shown in Figure 8-5. All 6 light sources have an average gamut less than or equal to a value of 100 (Rg ≤ 100). The distribution of the top 6 light sources with the lowest TESadj is remarkably similar to that of tray B, where 5 of the 6 are exactly the same. It is apparent that the total error score, which is a sum of the tray-specific error scores, is heavily biased towards the effect of the experimental spectra in hue angle bins 5 through 7—where the caps in tray B are located. Visual inspection of Figure 8-5 would suggest, naively, that minimizing average gamut at any given average fidelity would be best for optimizing the color discrimination ability of a light source (as measured by decreasing error score of the FM100 hue test). Figure 7-13 (top right) shows that Rg is not a particularly strong predictor of TESadj, and making conclusions based on Rg alone would be inappropriate (no matter how strong the trend appears to be visually). The top six light sources, instead, are also 6 of the 7 experimental light sources which did not cause any transposition of caps, in any of the trays (Rdt = 0) (Figure 8-6, top). In fact, Rdt—which is the sum of the tray-specific, light source-specific error scores based on the transposition of caps in color space25—is a stronger single predictor than any of the metrics considered (p = 0.029, r² = 0.85) (Figure 8-6, bottom). The analysis detailed in Section 7.5 and the strength of Rdt in predicting the top ranked light sources suggest that a more direct, nuanced approach is able to predict the color discrimination ability of a light source (measured with the FM100 hue test and modified error calculation).
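The following sketch illustrates the transposition idea behind Rdt. It arranges a tray's caps by their hue angle under the test source and scores that arrangement against the reference (illuminant C) order with a simplified FM100-style neighbor-difference sum; the exact formulation in section 7.4.3 may differ in detail:

import numpy as np

def light_source_transposition_score(ref_order, test_hue_angles):
    # ref_order: cap IDs in their 'correct' order under the reference illuminant.
    # test_hue_angles: each cap's hue angle under the test source, listed in the
    # same order as ref_order.
    ref_order = np.asarray(ref_order)
    arranged = ref_order[np.argsort(test_hue_angles)]  # ideal order under the test source
    position = {cap: i for i, cap in enumerate(ref_order)}
    ranks = np.array([position[c] for c in arranged])
    # Adjacent rank differences sum to (n - 1) for an untransposed tray, so
    # subtracting that baseline yields 0 when the source causes no transpositions.
    return int(np.abs(np.diff(ranks)).sum() - (len(ranks) - 1))

# e.g. a source that swaps two adjacent caps in color space scores 2:
# light_source_transposition_score([1, 2, 3, 4], [10.0, 14.0, 12.0, 20.0]) -> 2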

8.2.2 Towards a categorization of light source color discrimination
Several past research studies have suggested the use of ordinal rating scales (or categorization) of light sources (see 3.1.3 Ordinal Based Color Rendering Scales), and only one known study attempted to develop such a scale [Bodrogi and others 2011]. Houser and others [2013] support the partitioning of (a two-metric) color rendition space26 into numerically defined boundaries with associated descriptors. They did not, however, suggest what the boundaries should be, or even which dimensions of color rendition this would apply to. The results of the present work suggest that the numerical boundaries may differ depending on the dimension of color rendition considered (i.e. fidelity, preference, or discrimination).

None of the studies considered suggest the categorization of a light source specific to color discrimination, possibly due to lack of a robust method to predict color discrimination capability. The method proposed in this study (section 7.4.3 Modeling the adjusted TES—a segmented approach with unifying model) provides a new approach to the prediction of color discrimination ability and is much more robust than many of the other models considered (i.e. combinations of average fidelity, average gamut, gamut shape, and special indices) and significantly more robust than any single metric.

The light source error score (Rdt)—an error score attributed to the light source based on how many colored caps the light source transposes in color space—is a robust predictor of TESadj and correctly categorized the best and worst 6 spectra in the present study (Figure 8-7). Six of the seven spectra with the lowest Rdt also had the 6 lowest mean TESadj scores.

While it is commonly agreed upon that the industry needs metrics that are simple and easily interpreted [Houser and others 2010, 2013; Rea and Freyssinier 2008, 2010; Rea 2010; Freyssinier and Rea 2010], researchers have not developed the ability to easily quantify (let alone communicate) the color discrimination ability of a light source. Rdt provides a promising method, with a solid conceptual backbone, to categorize a light source’s discrimination ability. It is recommended, initially, that the scale contain three categories: “poor” (indicating poor color discrimination ability, possibly still acceptable for general application), “acceptable” (indicating that the color discrimination of the source is acceptable for most general lighting applications), and “superior” (indicating that the light source has superior color discrimination ability and is appropriate for lighting applications where color discrimination is among the top design considerations). These descriptors should be the subject of targeted research to identify which words would best imply these interpretations.
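In practice, such a categorization could be as simple as the sketch below. The three labels follow the recommendation above, but the numeric Rdt cutoffs are purely illustrative placeholders—boundary-setting is explicitly left to the targeted research described next:

def discrimination_category(r_dt: float) -> str:
    # The cutoffs below are hypothetical placeholders, not proposed values.
    if r_dt == 0:      # the source transposes no caps in color space
        return "superior"
    if r_dt <= 12:     # illustrative mid-range boundary
        return "acceptable"
    return "poor"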

25 See section 7.4.3 Modeling the adjusted TES—a segmented approach with unifying model and Figure 7-14 (bottom). 26 The Rf-Rg space is considered in the present study. Qa and Qg were suggested by the authors as it was the most complete two-metric system developed at the time.


The actual numerical boundaries for these descriptors (Rdt values, for example) should be the result of vetted discussion between researchers (armed with targeted research on this topic) and industry professionals with knowledge of the specification and color discrimination needs of their clients.

8.3 Color rendition and luminous efficacy
The luminous efficacy of radiation for the experimental sources—calculated as the ratio of luminous flux to radiant power—is shown in Figure 8-8. It is apparent that the orientation of the color vector graphic has an impact on the luminous efficacy of the experimental spectra. CB is a statistically significant predictor of LER (p = 0.000, r² = 0.68) for these sources, where CB7 SPDs have a higher LER, on average, than the CB1 SPDs. This can be explained by the photopic luminous efficiency function V(λ) used in the definition of luminous flux, which peaks at 555 nm (nominally green). The effect of CVG orientation on LER can be seen very easily in Figure 8-9 (top), where LER is plotted as a function of SPD ID (with SPDs ordered according to increasing LER). The dashed lines show the average LER for a subset of commercially available sources (various types) and commercially available LED products.27 All CB1 SPDs fall below the average LER, demonstrating a tradeoff between optimizing a light source for red rendering and lumen output. The other panels of Figure 8-9 highlight the LER for the top 6 SPDs along each rating scale. Four of the most preferred SPDs have an LER less than or equal to the average for the commercially available sources, and 3 of those are much lower. A similar trend can be seen for the top 6 preferred light sources for skin rendering, where 3 of the most preferred SPDs have an LER less than or equal to the average for the commercially available sources, and 2 of those are much lower.
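For reference, LER follows directly from an SPD; a minimal sketch, assuming the SPD and the tabulated photopic luminous efficiency function V(λ) are sampled on a common wavelength grid:

import numpy as np

def luminous_efficacy_of_radiation(wavelengths_nm, spd, v_lambda):
    # LER (lm/W) = 683 * integral(V * S) / integral(S). The V(lambda) values
    # must come from the CIE table, sampled on the same grid as the SPD.
    luminous_flux = 683.0 * np.trapz(v_lambda * spd, wavelengths_nm)
    radiant_power = np.trapz(spd, wavelengths_nm)
    return luminous_flux / radiant_power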

The commercially available sources, themselves, demonstrate symptoms of the optimization of lumen output. Sixty-two percent (62%) of the 210 sources analyzed fell above the blackbody locus (67% for commercially available LEDs), and those above the blackbody locus have a larger average Duv than those below, indicating that light sources above the blackbody locus are being pushed closer to green to increase the lumen output (Figure 8-10, top). Figure 8-10 (bottom) shows that, on average, the commercially available light sources heavily desaturate reds (average Rcs,h1 = -11%) and oversaturate yellow-greens (average Rcs,h5 = 8% and Rcs,h6 = 6%). This suggests that, on average, commercial light sources have been optimized for LER at the expense of red rendering. The results of this study suggest that this is problematic, given that the six most preferred (and skin preferred) sources have a positive chroma shift in hue angle bin 16 (Rcs,h16 ≥ 0%). The average chroma shift in hue angle bin 16 for the commercially available light sources is -4% (Rcs,h16 = -4%). In an optimization of a light source for luminous efficacy, the top 6 most preferred (and skin preferred) sources in this study would have been completely eliminated.

The commercial light sources examined exhibit a wide range of average fidelity values (mostly between 60 and 100), but have an average gamut of 96.7 (Rg = 96.7) and strongly favor the lower portion of the Rf-Rg space (93% of sources have Rg < 103). Figure 8-11 shows the commercial sources plotted in Rf-Rg space overlaid with the 6 top rated experimental SPDs from each of the 4 OVERALL scales. The top rated light sources in this experiment are largely secluded from the overall cluster of commercial light sources. The high average gamut of the top rated light sources in this experiment is directly at odds with the tendency for commercial light sources to have low average gamut. No commercial light sources, for example, plot remotely close to the top two preferred light sources in this experiment (SPDs 13 and 18, see Figure 8-2).

27 The commercially available sources were taken from the IES TM-30-15 Excel calculator library.


While luminous efficacy is an important consideration in the spectral design of light sources, it is not the only consideration. The results detailed herein demonstrate that color quality, itself, is complex and multidimensional, and that the importance of high average gamut and red rendering is undeniable. These results suggest that spectral optimization which generally allows desaturation of reds—as is implicit in optimizing for luminous efficacy—is likely to result in subpar light sources where color preference or skin rendering is of importance.


8.4 Figures

[Figure 8-1 panels: (top) minimum, maximum, and average Rcs,hi per hue angle bin, with object counts noted for occupied bins (7, 2, 2, and 1 objects); (bottom) average ΔEjab per hue angle bin, labeled by object: onion; cola can / red apple; Orange Crush bottle; mustard / orange / lemon; green apple; Sprite can; Pepsi can; blueberries; Grape Crush.]

Figure 8-1 Light source potential to color shift vs actual shifts. (Top) The minimum, maximum, and average chroma shift in each hue angle bin (Rcs,hi) across the 24 experimental SPDs. The smaller bar (outlined in black) shows the average Rcs,hi for all 24 SPDs. The maximum chroma shift occurs in hue angle bin 5 (Rcs,h5) and the minimum in hue angle bin 1 (Rcs,h1). The large max values of Rcs,hi in hue angle bins 5, 6, and 7 would suggest the most potential to shift the (nominally) green objects. (Bottom) The average chroma shift—between the test and the reference—for each hue angle bin, under the 24 experimental SPDs. Objects which occupy the same hue angle bin were averaged together. The onion (hue angle bin 1) experienced the largest average color shift, while objects in hue angle bin 4 and 5 were 2nd and 3rd, respectively. While the Min/Max/Avg Rcs,hi for the experimental SPDs would suggest that the nominally green objects would experience the greatest (average) color shift, the nominally red/orange objects actually experienced the largest average color shifts.


[Figure 8-2 panels: four Rf–Rg maps (Rf = 65–95, Rg = 80–120) marking the top 6 ranked light sources for preference, naturalness, vividness, and skin preference.]

Figure 8-2 Top ranked SPDs for OVERALL ratings—by scale. (Top left) Top 6 preferred SPDs, all of which have an average gamut greater than or equal to a value of 100 (Rg ≥ 100). (Top right) Top 6 most natural SPDs (shaded in green), 4 of which have an average gamut greater than or equal to a value of 100 (Rg ≥ 100), and 2 of which have an average fidelity greater than or equal to a value of 85 (Rf ≥ 85). (Bottom left) Top 6 most vivid SPDs, all of which have an average gamut greater than or equal to a value of 100 (Rg ≥ 100), and 5 of which are greater than or equal to a value of 110 (Rg ≥ 110). Five of the 6 are also CB1 SPDs and all have a chroma shift in hue angle bin 16 (Rcs,h16) greater than or equal to eleven percent (Rcs,h16 ≥ 11%). (Bottom right) Top 6 most preferred SPDs for skin rendering, all of which have an average gamut greater than or equal to a value of 100 (Rg ≥ 100), and 5 of which have an average gamut greater than or equal to a value of 110 (Rg ≥ 110).


[Figure 8-3 panel: a single Rf–Rg map combining the top 6 SPDs from the LIKE, NAT, VIV, and SKIN scales; SPD 16 and SPD 23 are the only entries below Rg = 100.]

Figure 8-3 Top ranked SPDs for OVERALL ratings—combined. The top 6 SPDs for each rating scale are combined into one map. The importance of an average gamut area greater than or equal to a value of 100 (Rg ≥ 100) can be easily seen. Twenty-two of the top 24 SPDs meet this criterion, the exceptions being SPD 16 and SPD 23, both on the naturalness scale. Both SPDs have large residuals in the best-fit naturalness model (SPD 16’s is statistically significant), which may be a result of experimental parameters (possibly presentation order) rather than fundamentally natural rendering. Despite the peculiarities of these two SPDs, the trend towards large average gamut areas (especially for preference, vividness, and skin preference) is evident.


[Figure 8-4 panels: four Rf–Rg maps marking the 6 light sources with the lowest FM100 error scores for Trays A, B, C, and D.]

Figure 8-4 Top 6 light sources—FM100, tray specific. (Top left) Lowest 6 mean error scores for Tray A, 5 of which are CB7 SPDs. For Tray A error scores, CB1 SPDs had a statistically higher mean error score than CB7 SPDs (p = 0.002). This may be because CB1 SPDs generally oversaturate colors in the nominally red hue angle bins, which is generally where the Tray A caps are located. (Top right) Lowest 6 error scores for Tray B. A strong model can be built for Tray B error scores with Rf, Rg, and ψ (r² = 0.81 and p = 0.002, 0.000, and 0.031, respectively). (Bottom left) Lowest 6 error scores for Tray C, 5 of which are CB7 SPDs. For Tray C error scores, CB1 SPDs had a statistically higher mean error score than CB7 SPDs (p = 0.002). This may be because CB1 SPDs generally oversaturate colors in the nominally blue hue angle bins, which is generally where the Tray C caps are located. (Bottom right) Lowest 6 error scores for Tray D, which seem to be generally unpredictable. None of the considered metrics are strong predictors and a strong model cannot be built (r²pred = 0.36 for the strongest model). The variable strength of any metric (or model) across these tray-specific scores explains the overall difficulty in developing a strong model to predict TESadj.


[Figure 8-5 panel: Rf–Rg map marking the 6 light sources with the lowest adjusted TES.]

Figure 8-5 Top 6 light sources—FM100, TESadj. The 6 SPDs with the lowest error scores are strikingly similar to the 6 with the lowest tray B error score, where 5 of the six are exactly the same. Because the caps in tray B experienced the most transpositions under the range of experimental light sources—and have a statistically higher mean error score than the other trays—the resultant TESadj is heavily biased towards the effect of the experimental light sources on the colored caps in tray B.


[Figure 8-6 panels: (top) Rdt plotted against SPD ID, ordered and labeled by increasing TESadj; (bottom) TESadj plotted against Rdt (quadratic fit, R² = 0.8604, p = 0.035), with the six lowest-TESadj sources—all with an Rdt of 0—annotated.]

Figure 8-6 Light source-specific error score (Rdt) is a strong predictor of TESadj. (Top) Rdt is plotted as a function of SPD ID, ordered according to increasing TESadj (numbered labels are TESadj values). The six SPDs with the lowest TESadj are also 6 of the 7 experimental SPDs which have an Rdt of 0 and thus an Rdt,B of 0 (meaning no transpositions of caps in Tray B). (Bottom) Rdt is a significant quadratic predictor (p = 0.035) of mean adjusted total error score. The six SPDs with the lowest mean TESadj had an Rdt of zero (all of which are labeled).


[Figure 8-7 data (SPD, TESadj, Rdt), ordered by increasing TESadj:]

SPD | TESadj | Rdt
24  | 20.2   | 0
20  | 21.2   | 0
12  | 21.2   | 0
17  | 24.4   | 0
22  | 28.8   | 0
11  | 29.8   | 0
7   | 30.0   | 4
19  | 32.4   | 8
6   | 33.6   | 4
16  | 34.6   | 8
3   | 36.2   | 4
21  | 36.2   | 12
18  | 36.4   | 8
23  | 36.8   | 0
2   | 38.2   | 8
8   | 38.2   | 12
5   | 39.0   | 8
10  | 39.0   | 8
4   | 43.0   | 12
9   | 45.4   | 12
15  | 47.2   | 12
14  | 50.2   | 20
1   | 56.8   | 24
13  | 61.8   | 40

Figure 8-7 Rdt can be used to rank-order adjusted total error score (TESadj). In the figure, the left column is ordered in terms of increasing Rdt (light source transposition error score) and the right column is ordered by increasing TESadj. Rdt was able to correctly determine the best 6 spectra (connected at the top with green lines) and the worst 6 spectra (connected at the bottom with red lines).


[Figure 8-8 panel: Rf–Rg map annotated with the luminous efficacy of radiation (LER) for each SPD pair. Rg = 120: 196, 268; Rg = 110: 243, 308 / 257, 346 / 247, 329; Rg = 100: 244, 311 / 256, 300 / 247, 324 / 298, 306; Rg = 90: 251, 302 / 257, 343 / 253, 324; Rg = 80: 282, 341.]

Figure 8-8 LER values for the experimental SPDs. It is apparent that the orientation of the CVG has an effect on the LER, and CB is a statistically significant predictor (p = 0.000, r² = 0.68), where CB7 SPDs have a higher LER, on average, than the CB1 SPDs.


Figure 8-9 LER as a function of SPD ID—Top 6 sources highlighted. SPDs are ordered by increasing LER. The dashed line represents the average LER for the commercially available sources in the IES TM-30-15 calculation spreadsheet. (Top) Coded for CB. The effect of the visual efficiency function is apparent, where CB1 SPDs (with CVGs oriented in the nominally red direction) have a lower average LER than CB7 SPDs. (Bottom 4) Top 6 rated sources for the like, natural, vivid, and skin preference scale, respectively. Many of the most preferred and skin preferred sources, and all of the top 6 vivid sources, have a lower LER than the average for the commercially available sources.


[Figure 8-10 panels: (top) CIE 1960 (u, v) chromaticities of the commercial sources relative to the blackbody locus; (bottom) average hue angle bin chroma shift (Rcs,hi) for bins 1–16, separated into sources below and above the blackbody locus.]

Figure 8-10 Select color statistics for commercially available light sources. (Top) Commercially available SPDs color coded for their location relative to the blackbody locus (green for those above, red for those below). The light sources above the blackbody locus are, on average, further from the locus than the sources below, meaning they are pushed closer to green. (Bottom) The average Rcs,hi for the commercially available sources. It is apparent that, on average, the commercially available sources desaturate reds (Rcs,h1 and Rcs,h2) and oversaturate yellow-greens (Rcs,h5 and Rcs,h6). The trend in both of these panels indicates, generally, the effect of spectral optimization driven by luminous efficiency (defined by V(λ), which peaks in the nominally yellow-green portion of the visible spectrum).


Figure 8-11 Rf/Rg values for 210 commercially available sources. The top 6 rated sources along each OVERALL scale are overlaid. The top rated sources generally favor high average gamut (Rg ≥ 100) and are fairly isolated from the cluster of commercially available sources which have a mean Rg of 96.7 (93% of which have an Rg < 103). The high average gamut of the top rated sources is directly at odds with the low average gamut of the commercially available sources.


9. CONCLUSION
The present study was designed to investigate color discrimination capacity and observer ratings of naturalness, vividness, preference, and skin preference over a wide range of average fidelity (Rf), average gamut (Rg), and gamut shapes. No previous experiment has examined such a wide array of spectra for color discrimination ability or the visual appreciation of skin.

Two gamut shapes were specified at each of the 12 Rf/Rg combinations—one nominally oriented in the direction of hue angle bin 1 and the other in the direction of hue angle bin 7—to distinguish between sources with the same average fidelity and average gamut. The designed spectra generally exhibit strongly contrasted gamut shapes.

A least-squares ellipse fit has been proposed to more simply capture the numerical complexity of the color vector graphic. With this method, the color vector graphic can be described by the best-fit ellipse’s semi-major axis length (a), semi-minor axis length (b), and angular rotation (ψ).

A combination of the best-fit ellipse parameters and IES TM-30-15 metrics exhibited strong predictive power of participant ratings:

• The best naturalness model (r² = 0.92) includes Rf, a quadratic fit for Rcs,h1, and best-fit ellipse rotation (ψ).28 None of the 21 single-parameter linear models—using Rf, Rg, a, b, ψ, or Rcs,h1–16—had an r² greater than 0.40.

• The best vividness model (r² = 0.86) contains only Rcs,h16, a proxy for red rendering. Rg was a statistically significant linear predictor (p = 0.000), but a poorer fit than Rcs,h16 (r² = 0.67). While it does seem that increased gamut is correlated with mean vividness rating, Rcs,h16 appears to be a much stronger indicator.

• The best preference model (r² = 0.86) contains Rf, a quadratic fit for Rcs,h16, and best-fit ellipse rotation (ψ).28 This model is very similar to the naturalness model; the only difference is the use of a different hue angle bin chroma shift.

• The best skin preference model (r² = 0.85) contains the best-fit ellipse semi-minor axis length (b) and a second-order polynomial fit for Rcs,h16. The positive coefficient for parameter b in the regression model indicates that the skin preference rating increases as b increases, which also means the light source causes less desaturation. The second-order fit for Rcs,h16 accounts for oversaturation, where skin rendering would no longer be preferred.

Overall, the results of the subjective ratings suggest the high importance of red rendering in the visual appreciation of chromatic objects in a visual scene. Though Rcs,h16 contributed to the best-fit models for preference and skin preference, and Rcs,h1 contributed to the best-fit model for naturalness, both are proxies for red rendition. Tuning a light source for color rendering, then, is at direct odds with spectral optimization for luminous efficacy. To increase the overall quality of light sources it will be necessary, in the future, to more specifically consider color rendering. The CVG hue angle bin chroma shifts and best-fit ellipse parameters may have good utility for such optimizations.

28 The model also includes an Rf*ψ interaction. See section 7.3 Subjective ratings—OVERALL.

Color discrimination of the experimental light sources was tested using the Farnsworth-Munsell 100 hue test. An adjusted error score calculation (TESadj) has been proposed with the assumption that an error should not be attributed to a participant who orders caps in the test tray the same as the caps are ordered in color space under the test illuminant (an error which would be attributed by the standard scoring software). No single metric among the main TM-30 metrics and best-fit ellipse parameters has a higher r² than 0.57 (which is for a quadratic fit of the semi-major axis length, a). Rg was a fairly poor predictor of TESadj (r² = 0.47), which is consistent with recent studies showing that average gamut indices are not strong indicators of color discrimination for LED light sources.

The poor predictive power of the considered metrics prompted a post-hoc development of custom measures to predict TESadj. These measures are based on two assumptions: 1. a light source which transposes the colored caps is more likely to cause difficulty discriminating between those caps (measured by the custom metric Rdt); and 2. a light source which compresses caps in color space will make it more difficult to discern between adjacent caps (i.e. a decreasing average hue angle difference between the caps in a tray, Δhi). This custom approach to modeling the adjusted total error score demonstrates reasonably strong predictive ability (r² = 0.87) and is significantly stronger than any single metric or metric combination considered in this study.

A suggestion has been made for the development of an ordinal-based scale for the specification of color discrimination ability. While the error tolerances and labels for these categories should be the subject of future targeted research, the predictive strength of the custom metric Rdt—which calculates an error score for the light source based on how many transpositions it causes in reference to the ‘correct’29 order of caps—makes it a strong candidate for the development of such criteria. The results have shown the poor ability of current metrics to predict color discrimination ability and seem to solidify that average gamut measures are not predictive. A more targeted approach, which attempts to model occurrences in color space, is warranted.

Overall, the results of this study suggest that even a two-measure system cannot fully encapsulate the nuances of the visual appreciation of chromatic objects. Every best-fit model of the subjective ratings includes some parameter extracted from the color vector graphic (either an Rcs,hi or a best-fit ellipse parameter), which is necessary for a strong fit to the data. These results confirm that the CVG is primary information and should be considered for any general lighting application where color rendering is of even modest importance. The proposed best-fit ellipse approach to quantifying the color vector graphic provides a way to distill the graphic into a few parameters to ease specification of its shape.

29 The ‘correct’ order of caps is assumed to be the order of the caps under standard illuminant C, the source under which they were designed.


REFERENCES

[CIE] Commission Internationale de l’Eclairage. 1948. Compte Rendu. CIE 11th Session. Paris (France). 5 p.

[CIE] Commission Internationale de l’Eclairage. 1965. Method of measuring and specifying colour rendering properties of light sources. Paris (France): CIE. Publication No. CIE 13-1965. 34 p.

[CIE] Commission Internationale de l’Eclairage. 1974. Method of measuring and specifying colour rendering properties of light sources. Vienna (Austria): CIE. Publication No. CIE 13.2-1974. 81 p.

[CIE] Commission Internationale de l’Eclairage. 1987. International lighting vocabulary. CIE. Publication No. CIE 17.4-1987.

[CIE] Commission Internationale de l’Eclairage. 1995. Method of measuring and specifying colour rendering properties of light sources. 3rd ed. Vienna (Austria): CIE. Publication No. 13.3:1995. 16 p.

[CIE] Commission Internationale de l’Eclairage. 1999. Colour Rendering, TC1-33 closing remarks. CIE. Publication No. 13.5-1999. 14 p.

[CIE] Commission Internationale de l’Éclairage. 2004. Colorimetry. 3rd ed. Vienna (Austria): CIE. Publication No. 15:2004. 72 p.

[CIE] Commission Internationale de l’Éclairage. 2006. CIE Colorimetry – Part 1: Standard Colorimetric Observers. ISO 11664-1:2007(E)/CIE S 014-1/E:2006. Accessed 2015 March 30. http://cie.co.at/index.php?i_ca_id=483

[CIE] Commission Internationale de l’Éclairage. 2007. Colour rendering of white LED light sources. Vienna (Austria): CIE. Publication No. 177:2007. 72 p.

[CIE] Commission Internationale de l’Éclairage. 2012. CIE, Division 1: vision and color, meeting minutes. Taipei, Taiwan. 16.

Bodrogi P, Bruckner S, Tran Quoc K. 2011. Ordinal scale based description of colour rendering. Color research & application 36(4):272-85.

Bouma PJ. 1937. Colour reproduction in the use of different sources of “white” light. Philips Tech Rev. 2:1-7.

Boyce PR. 1977. Investigations of the subjective balance between illuminance and lamp color properties. Lighting research & technology 9(1):11-24.

David A, Fini P, Houser K, Ohno Y, Royer M, Smet K, Wei M, Whitehead L. 2015. Development of the IES method for evaluating the color rendition of light sources. Opt Expr 23(12):15888.

Davis W, Ohno Y. 2005. Toward an improved color rendering metric. In: Ferguson IT, Carrano JC, Taguchi T, Ashdown IE, editors. The Proceedings of SPIE, Volume 5941. Fifth International Conference on Solid State Lighting; San Diego, CA. 59411G.

Davis W, Ohno Y. 2010. Color quality scale. Optical Engineering 49(3):033602.

de Beer E, van der Burgt P, van Kemenade J. 2015. Another color rendering metric: do we really need it, can we live without it? Leukos 12(1-2):51–59. DOI: 10.1080/15502724.2014.991793


[DOE] United States Department of Energy. 2016. Frequently asked questions. Web: http://energy.gov/eere/ssl/tm-30-frequently-asked-questions

Farnsworth D. 1957. The Farnsworth-Munsell 100-Hue test for the examination of colour discrimination: Manual. Munsell Color Company. 9 p.

Figueiro MG, Appleman K, Bullough JD, Rea MS. 2006. A discussion of recommended standards for lighting in the newborn intensive care unit. Journal of Perinatology. 26:S19-26.

Fitzgibbon AW, Pilu M, Fisher RB. 1999. Direct least square fitting of ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence. 21(5). p 476–480.

Fotios SA. 1997. The perception of light sources of different colour properties. [Manchester, UK]: UMIST.

Freyssinier JP, Rea MS. 2010. A two-metric proposal to specify the color-rendering properties of light sources for retail lighting. San Diego, CA: Tenth International Conference on Solid State Lighting, Proceedings of SPIE 7784(77840V).

Freyssinier JP, Rea MS. 2012. Class A color classification for light sources used in general illumination. Proceedings of Light Sources 2012: The 13th International Symposium on the Science and Technology of Lighting. June 24–29. Troy, New York. 337–338.

Guo X, Houser KW. 2004. A review of colour rendering indices and their application to commercial light sources. Lighting Res Technol. 36(3):183-197.

Hashimoto K, Nayatani Y. 1994. Visual clarity and feeling of contrast. Color research & application. 19:171–185.

Hashimoto K, Yano T, Shimizu M, Nayatani Y. 2007. New method for specifying color-rendering properties of light sources based on feeling of contrast. Color research & application 32:361-71.

Hering E, Hurvich LM. 1964. Outlines of a theory of the light sense (1st edition). Harvard University Press. p. 344.

Houser KW, Wei M, David A, Krames MR, Shen XS. 2013. Review of measures for light-source color rendition and considerations for a two-measure system for characterizing color rendition. Optics Express. 21(8):10393-10411.

Hunt RGW. 1982. A model of colour vision for predicting colour appearance. Color research & application. 7:95-112.

[IES] DiLaura DL, Houser KW, Mistrick RG, Steffy GR. 2011. The lighting handbook: reference and application. 10th ed. New York, NY: The Illuminating Engineering Society of North America. 1328 p.

[IES] Illuminating Engineering Society. 2014. PS-8-14 Color Rendering Index (CRI). New York (NY): Illuminating Engineering Society. 2 p.

Illuminating Engineering Society of North America. 2015a. IES-TM-30-15 Method for Evaluating Light Source Color Rendition. New York, NY: The Illuminating Engineering Society of North America. Excel calculation program.

Illuminating Engineering Society of North America. 2015. IES-TM-30-15 Method for Evaluating Light Source Color Rendition. New York, NY: The Illuminating Engineering Society of North America. 26 p.


Ishihara S. 1972. Tests for Colour-Blindness. 24 Plates Edition. Tokyo, Japan: University of Tokyo. 33 p.

Jerome CW. 1972. Flattery vs color rendition. J IES. 1(3):208–211.

Jerome CW. 1973. The flattery index. J IES. XX. 351-354.

Jost-Boissard S, Fontoynont M, Blanc-Gonnet J. 2009. Perceived lighting quality of LED sources for the presentation of fruit and vegetables. J Mod Opt. 56(13):1420-1432.

Jost-Boissard S, Avouac P, Fontoynont M. 2014. Assessing the colour quality of LED sources: Naturalness, attractiveness, colourfulness and colour difference. Lighting research & technology. 0:1-26.

Judd DB. 1967. A flattery index for artificial illuminants. Illuminating engineering 62(10):593-598.

MacAdam DL. 1981. Color Measurement: Theme and Variation. New York, NY: Springer-Verlag Berlin Heidelberg. 228 p.

Mallows CL. 1973. Some comments on Cp. Technometrics 15(4):661–675.

Narendran N, Deng L. 2002. Color rendering properties of LED light sources. Proceedings of the SPIE. 4776:61-67.

Neter J, Wasserman W, Kutner M. 1983. Applied Linear Regression Models. 2nd ed. McGraw-Hill/Irwin. p 450.

Newhall SM, Burnham RW, Clark JR. 1957. Comparison of successive with simultaneous color matching. JOSA. Vol 47. 43.

Nickerson D, Jerome CW. 1965. Color rendering of light sources: CIE method of specification and its application. Illum Eng (IESNA). 60(4):262-271.

Ohno Y. 2005. Spectral design considerations for white LED color rendering. Opt Eng 44(11):111302.

Ohno Y, Fein M, Miller C. 2015. Vision experiment on chroma saturation for color quality preference. Proceedings of the 28th CIE Session. Manchester, UK. CIE Publication 216:2015 1(1).

Pointer MR. 1986. Measuring colour rendering—a new approach. Lighting research & technology 18(4):175-184.

Rea MS. 2010. A practical and predictive two-metric system for characterizing the color rendering properties of light sources used for architectural applications. Proceedings of SPIE-OSA. International Optical Design Conference. Vol 7652. 765206-1–765206-7.

Rea MS, Deng L, Wolsey R. 2004. NLPIP Lighting Answers: Light Sources and Color. Troy, NY: Rensselaer Polytechnic Institute. Accessed 2016 October 4. http://www.lrc.rpi.edu/nlpip/publicationDetails.asp?id=901&type=2

Rea MS, Freyssinier-Nova JP. 2008. Color rendering: A tale of two metrics. Color Research and Application. 33(3). 192-202.

Rea MS, Freyssinier-Nova JP. 2010. Color rendering: beyond pride and prejudice. Color Research and Application. 35(6). 401–409.

Rea MS, Freyssinier-Nova JP. 2011. White lighting. Color research & application 38(2):82-92.


Rea MS, Freyssinier JP. 2012. The Class A color designation for light source. Lighting Research Center, Rensselaer Polytechnic Institute. Troy, NY. 4 p.

Royer MP, Houser KW, Wilkerson AM. 2011. Color discrimination capability under highly structured spectra. Color Research and Application. 37(6):441-449.

Royer MP, Wilkerson A, Wei M, Houser KW, Davis R. 2016. Human judgements of color rendition vary with average fidelity, average gamut, and gamut shape. Lighting Research & Technology. Published online before print August 10, 2016. doi: 10.1177/1477153516663615

Sanders CL. 1959. Color preference for natural objects. Illuminating Engineering. Vol 54. 452 p.

Sandor N, Schanda J. 2006. Visual colour rendering based on colour difference evaluations. Lighting Research and Technology. 38(3):225-239.

Schanda J. 1985. A combined colour preference - colour rendering index. Lighting research and technology 17(1):31-34.

Schanda J. 2007. Colorimetry: understanding the CIE system. John Wiley & Sons, Inc., Hoboken, New Jersey. p. 459.

Smet KAG, Ryckaert WR, Pointer MR, Deconinck G, Hanselaer P. 2010a. Colour appearance rating of familiar real objects. Color Res Appl. 36(3):192-200.

Smet KAG, Ryckaert WR, Pointer MR, Deconinck G, Hanselaer P. 2010. Memory colors and color quality evaluation of conventional and solid-state lamps. Optics express 18(25):26229-26244.

Smet KAG, Ryckaert WR, Pointer MR, Deconinck G, Hanselaer P. 2011. Correlation between color quality metric predictions and visual appreciation of light sources. Optics express 19(9):8151-8166.

Smet KAG, Schanda J, Whitehead L, Luo RM. 2013. CRI2012: A proposal for updating the CIE colour rendering index. Lighting Research and Technology. 45(6):689–709.

Szabo F, Bodrogi P, Schanda J. 2009. A colour harmony rendering index based on predictions of colour harmony impression. Lighting research & technology 41:165-82.

Thornton WA. 1971. Luminosity and color-rendering capability of white light. J of the Opt Soc of Am. 61(9):1155-1163.

Thornton WA. 1972. Color-discrimination index. Journal of the optical society of America 62(2):191-194.

Thornton WA. 1972b. Color-rendering capability of commercial lamps. Applied optics 11(5):1078-1086.

Thornton WA. 1974. A validation of the color-preference index. Journal of the illuminating engineering society 4(1):48-52.

van der Burgt P, van Kemenade J. 2010. About color rendition of light sources: the balance between simplicity and accuracy. Color research & application 35(2):85-93.

Wei M. 2011. Effects of spectral modification on perceived brightness and color discrimination [dissertation]. University Park (PA): The Pennsylvania State University. 109 p. Available from: Penn State’s eTD database, https://etda.libraries.psu.edu/catalog/26742.

Wei M, Houser KW. 2012. Colour discrimination of seniors with and without cataract surgery under illumination from two fluorescent lamp types. CIE x037:359-368.


Wei M, Houser KW. 2016. What is the cause of apparent preference for sources with chromaticity below the blackbody locus? Leukos 12(1-2):95–99.

Wei M, Houser KW. 2016a. Color preference under light stimuli characterized by a two-measure system: a pilot study. IES Research Symposium III–Light + Color. Gaithersburg (Maryland), USA.

Wei M, Houser KW, Allen GR, Beers WW. 2014. Color preference under LEDs with diminished yellow emission. Leukos. 10(3):119–131.

Wei M, Houser KW, David A, Krames MR. 2014a. Perceptual responses to LED illumination with CIE General Color Rendering Indices of 85 and 97. Lighting Res Technol 47(7):810–827.

Wei M, Houser KW, David A, Krames MR. 2016. Color gamut size and shape influence color preference. Lighting Research & Technology. Published online before print August 13, 2016. doi: 10.1177/1477153516651472

Whitehead LA, Mossman MA. 2012. A Monte Carlo method for assessing color rendering quality with possible application to color rendering standards. Color research & application 37(1):13-22.

Worthey JA. 1982. Opponent-colors approach to color rendering. Journal of optical society of America 72(1):74-82.

Xu H. 1984. Color rendering capacity of illumination. Journal of the illuminating engineering society 13:270-276.

Xu H. 1993. Colour rendering capacity and luminous efficiency of a spectrum. Lighting research & technology 25(3):131-132.

Yaguchi H, Takahashi Y, Mizokami Y. 2013. Categorical colour rendering index based on the CIECAM02. 12th Congress of the International AIC Colour Association; Newcastle, UK:1441-1444.

Žukauskas A, Vaicekauskas R, Ivanauskas F, Vaitkevicius H, Shur M. 2008. Rendering a color palette by light-emitting diodes. Applied physics letters 93(2):021109.

Žukauskas A, Vaicekauskas R, Ivanauskas F, Vaitkevicius H, Vitta P, Shur M. 2009. Statistical approach to color quality of solid-state lamps. IEEE journal of selected topics in quantum electronics 15(4):1189-1198.

Žukauskas A, Vaicekauskas R, Shur M. 2010. Colour-rendition properties of solid-state lamps. Journal of physics D: applied physics 43(35):354006.

Žukauskas A, Vaicekauskas R, Vitta P, and others. 2012. Color rendition engine. Optics Expr 20(5):5356–5367.

103

APPENDIX A: The block effect on OVERALL subjective rating scales

Comparison of Block 1 vs Block 3: Rating Scales

Purpose: During Block 1 it was noticed that the experimental apparatus exhibited some color difference between the left and right sides of the booth, caused by the mechanical iris used for dimming to the target illuminance (the iris unequally occluded the channels used to create the composite spectra). No correction to the testing method was applied during Block 1. Block 3, which is a replication of Block 1, eliminated the mechanical iris and instead used a composition of Rosco diffusion gels for dimming to the target illuminance. This test between Block 1 and Block 3 therefore determines whether there was a statistically significant difference between the results from the two blocks, which used slightly different dimming mechanisms (and thus had slightly different color uniformity characteristics).

Method: Several ANOVA tests were conducted to determine whether there were statistically significant differences between participant responses from Block 1 and Block 3. For each of the 4 OVERALL rating scales, the block effect was considered, as was the interaction of Block with each of the three independent variables (Rf, Rg, and CB = Chroma Bin Orientation), for a total of 16 statistical tests.

Results: Of the 16 statistical tests performed, none showed statistical significance. Full results are given in the tables that follow.

Verdict: With no tests showing statistical significance, it is concluded that there is no block effect and no block interaction effect. With the two blocks deemed statistically equivalent, there is no reason to believe the change in dimming mechanism had any effect on the results.
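For reference, the form of these tests can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the analysis script used for this work: the file name ratings.csv and its columns (Rf, Rg, CB, Block, Like) are assumptions, and sum-to-zero factor coding is used so that the Type III (adjusted) sums of squares mirror the tables that follow.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.read_csv("ratings.csv")    # one row per participant rating (assumed layout)
    df = df[df["Block"].isin([1, 3])]  # Block 1 vs. its replication, Block 3

    # Block effect plus the interaction of Block with each independent
    # variable, with all predictors treated as categorical factors.
    model = smf.ols(
        "Like ~ C(Rf, Sum) + C(Rg, Sum) + C(CB, Sum) + C(Block, Sum)"
        " + C(Rf, Sum):C(Block, Sum) + C(Rg, Sum):C(Block, Sum)"
        " + C(CB, Sum):C(Block, Sum)",
        data=df,
    ).fit()
    print(sm.stats.anova_lm(model, typ=3))  # adjusted (Type III) sums of squares

Repeating the fit for each of the four OVERALL scales yields the block and block-interaction rows reported below.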


ANALYSIS OF VARIANCE: Block 1 versus Block 3, OVERALL Ratings

OVERALL “LIKE”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    2.033   0.67751      0.57    0.637
Rg              4   24.884   6.22092      5.21    0.000
CB              1    0.921   0.92083      0.77    0.381
Block           1    0.022   0.02152      0.02    0.893
Rf*Block        3    0.407   0.13565      0.11    0.952
Rg*Block        4    2.367   0.59177      0.50    0.739
CB*Block        1    0.775   0.77532      0.65    0.421
Error         222  265.004   1.19371
  Lack-of-Fit   6   10.314   1.71892      1.46    0.194
  Pure Error  216  254.690   1.17912
Total         239  307.049

OVERALL “NATURAL”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    6.502    2.1672      1.64    0.182
Rg              4    7.515    1.8789      1.42    0.229
CB              1   19.278   19.2783     14.55    0.000
Block           1    0.361    0.3608      0.27    0.602
Rf*Block        3    1.511    0.5038      0.38    0.767
Rg*Block        4    5.576    1.3940      1.05    0.381
CB*Block        1    0.581    0.5808      0.44    0.509
Error         222  294.192    1.3252
  Lack-of-Fit   6   11.235    1.8725      1.43    0.204
  Pure Error  216  282.957    1.3100
Total         239  362.082

Note (from the original color coding): GREEN = statistically insignificant, which, for this analysis of Block 1 vs. Block 3, is favorable.

OVERALL “VIVID”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    1.135    0.3785      0.41    0.744
Rg              4   53.250   13.3124     14.53    0.000
CB              1   21.948   21.9481     23.95    0.000
Block           1    0.410    0.4100      0.45    0.504
Rf*Block        3    0.492    0.1641      0.18    0.910
Rg*Block        4    0.214    0.0536      0.06    0.994
CB*Block        1    0.032    0.0316      0.03    0.853
Error         222  203.460    0.9165
  Lack-of-Fit   6    4.439    0.7398      0.80    0.569
  Pure Error  216  199.021    0.9214
Total         239  328.149

OVERALL “SKIN”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    2.559    0.8531      0.57    0.635
Rg              4   34.177    8.5442      5.71    0.000
CB              1    1.715    1.7154      1.15    0.285
Block           1    1.085    1.0850      0.73    0.395
Rf*Block        3    4.283    1.4276      0.95    0.415
Rg*Block        4    3.412    0.8530      0.57    0.685
CB*Block        1    2.551    2.5506      1.70    0.193
Error         222  332.185    1.4963
  Lack-of-Fit   6    6.385    1.0641      0.71    0.645
  Pure Error  216  325.800    1.5083
Total         239  389.394


Comparison of Block 2 vs Block 4: Rating Scales

Purpose: Just as Block 3 is a replication of Block 1, Block 4 is a replication of Block 2. The goal of this analysis was to determine whether there was a statistically significant difference between the responses from Block 2 and Block 4, which were based on the same spectra (i.e., replication) but separated in time.

Method: Several ANOVA tests were conducted to determine whether there were statistically significant differences between participant responses from Block 2 and Block 4. For each of the 4 OVERALL rating scales, the block effect was considered, as was the interaction of Block with each of the three independent variables (Rf, Rg, and CB = Chroma Bin Orientation), for a total of 16 statistical tests.

Results: Of the 16 statistical tests performed, 1 showed statistical significance: the Rg*Block interaction effect for the vividness scale.

Verdict: With only 1 of 16 tests showing statistical significance (6.25% of the tests performed, close to the roughly 5% false-positive rate expected by chance at α = 0.05), it is concluded that block effects and block interaction effects are negligible. With the two blocks deemed approximately the same (statistically), there is no reason to believe that the separation in time had any substantial effect on the results.


ANALYSIS OF VARIANCE: Block 2 versus Block 4, OVERALL Ratings

OVERALL “LIKE”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    5.616   1.87185      2.20    0.089
Rg              4   24.340   6.08492      7.16    0.000
CB              1    1.278   1.27833      1.50    0.221
Block           1    0.021   0.02094      0.02    0.875
Rf*Block        3    1.030   0.34349      0.40    0.750
Rg*Block        4    1.569   0.39232      0.46    0.764
CB*Block        1    0.246   0.24573      0.29    0.591
Error         222  188.741   0.85019
  Lack-of-Fit   6    9.928   1.65467      2.00    0.067
  Pure Error  216  178.813   0.82784
Total         239  254.375

OVERALL “NATURAL”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    9.961    3.3205      2.59    0.054
Rg              4    2.848    0.7119      0.56    0.696
CB              1    4.716    4.7161      3.68    0.056
Block           1    0.427    0.4266      0.33    0.565
Rf*Block        3    3.955    1.3184      1.03    0.381
Rg*Block        4    3.257    0.8143      0.63    0.638
CB*Block        1    0.190    0.1899      0.15    0.701
Error         222  284.750    1.2827
  Lack-of-Fit   6   19.423    3.2371      2.64    0.017
  Pure Error  216  265.327    1.2284
Total         239  331.692

Note (from the original color coding): GREEN = statistically insignificant, which, for this analysis of Block 2 vs. Block 4, is favorable.

OVERALL “VIVID”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    3.940    1.3133      1.44    0.232
Rg              4   43.564   10.8911     11.93    0.000
CB              1    6.946    6.9456      7.61    0.006
Block           1    0.188    0.1878      0.21    0.651
Rf*Block        3    2.066    0.6887      0.75    0.521
Rg*Block        4   13.408    3.3521      3.67    0.006
CB*Block        1    0.146    0.1456      0.16    0.690
Error         222  202.587    0.9126
  Lack-of-Fit   6    3.613    0.6021      0.65    0.687
  Pure Error  216  198.974    0.9212
Total         239  272.508

Note (from the original color coding): RED = statistically significant (here, the Rg*Block interaction), which, for this analysis of Block 2 vs. Block 4, is unfavorable.

OVERALL “SKIN”
Source         DF   Adj SS    Adj MS   F-Value  P-Value
Rf              3    6.428   2.14256      2.06    0.106
Rg              4   19.404   4.85089      4.68    0.001
CB              1    0.329   0.32923      0.32    0.574
Block           1    0.070   0.06996      0.07    0.795
Rf*Block        3    1.300   0.43346      0.42    0.740
Rg*Block        4    2.292   0.57304      0.55    0.698
CB*Block        1    0.967   0.96692      0.93    0.335
Error         222  230.342   1.03758
  Lack-of-Fit   6    1.193   0.19888      0.19    0.980
  Pure Error  216  229.149   1.06087
Total         239  283.666


APPENDIX B: Spectra stability and uniformity

EXAMPLE INTERPRETATION FROM BLOCK 3

[Example chromaticity-stability plot for the SPD with Rf,n = 65, Rg,n = 110, CB1, showing the 3500 K isoline, the ANSI bin center, the blackbody radiation (BBR) point, and ellipses at ΔC = 1, 2, and 4.]

L  Measurements taken on the left (‘L’) side of the booth.
C  Measurements taken at the center (‘C’) of the booth; also the calibration point.
R  Measurements taken on the right (‘R’) side of the booth.

Markers distinguish the start, middle, and end of the experiment. A solid line marks the CB1 SPD; a dashed line marks the CB7 SPD.

"If ∆C = 1, then the chromaticity difference between the standard (center of ellipse) and the sample is equal to the standard deviation of matching in that direction, by PGN. If ∆C = 2, the chromaticity is just noticeably different to such an observer" [MacAdam 1981] Interpretation: For ∆C = 2, any point on the ellipse will be just noticeably different (1 JND) from the center of the ellipse, which, in this case, is 3500 K blackbody radiation. For ∆C = 2, this means that any two points on opposite ends of the ellipse, are twice just-noticeably different (2 JND) from one-another. By halving the size of the ellipse, to ∆C = 1, we ensure that any two points on opposite sides of the ellipse are within 1 JND.


FIRST EXPERIMENTAL BLOCK

[Chromaticity-stability plots for the 12 SPDs of the first experimental block, in the format of the example interpretation above; graphics not reproduced. Panels (Rf,n/Rg,n/CB): 65/120/CB1, 75/90/CB1, 75/100/CB7; 65/110/CB1, 85/100/CB1, 85/100/CB7; 65/90/CB1, 65/110/CB7, 85/90/CB7; 75/110/CB1, 65/80/CB7, 95/100/CB7. Each panel shows the BBR point, the L/C/R measurement locations, and the ΔC = 1, 2, and 4 ellipses.]


Illuminance Measurements: Block 1
(Readings at the center (C), right (R), and left (L) of the booth at the start, middle, and end of each session; dashes mark positions not measured during Block 1.)

SPD (Rf,n/Rg,n/CB)   C: Start/Mid/End    R: Start/Mid/End    L: Start/Mid/End
65/120/cb1           59.8 / 60 / 59.9    - / - / -           - / - / -
65/110/cb1           60 / 59.8 / 59.7    53.5 / 52.8 / 53.1  60.6 / 60.5 / 60.2
65/90/cb1            59.8 / 59.7 / 59.7  - / - / -           - / - / -
75/110/cb1           60.6 / 60.2 / 60.3  54.1 / 53.9 / 53    63.1 / 63.4 / 62.6
75/90/cb1            59.8 / 59.4 / 59.2  - / - / -           - / - / -
85/100/cb1           60 / 59.5 / 59.2    - / - / -           - / - / -
65/110/cb7           59.8 / 59.8 / 59.2  - / - / -           - / - / -
65/80/cb7            60.4 / 60 / 59.8    54.6 / 53.9 / 53.2  59.1 / 58.9 / 57.9
75/100/cb7           59.9 / 60 / 59.9    - / - / -           - / - / -
85/100/cb7           60.2 / 59.6 / 59.5  - / - / -           - / - / -
85/90/cb7            59.7 / 59.5 / 59.9  - / - / -           - / - / -
95/100/cb7           60.2 / 59.7 / 59.7  - / - / -           - / - / -


SECOND EXPERIMENTAL BLOCK

[Chromaticity-stability plots for the 12 SPDs of the second experimental block; graphics not reproduced. Panels (Rf,n/Rg,n/CB): 65/100/CB1, 85/90/CB1, 65/90/CB7; 65/80/CB1, 95/100/CB1, 75/110/CB7; 75/100/CB1, 65/120/CB7, 75/90/CB7; 85/110/CB1, 65/100/CB7, 85/110/CB7.]


Illuminance Measurements: Block 2

SPD (Rf,n/Rg,n/CB)   C: Start/Mid/End    R: Start/Mid/End    L: Start/Mid/End
65/100/cb1           59.8 / 59.8 / 59.8  59.1 / 58.9 / 58.6  60 / 59.7 / 59.2
65/80/cb1            60.3 / 59.8 / 59.7  57.7 / 57.4 / 56.6  58.7 / 58.2 / 57.1
75/100/cb1           60.7 / 60.4 / 60.1  59.4 / 59.1 / 58.4  60.9 / 60.5 / 59.5
85/110/cb1           59.7 / 59.5 / 59.5  57.7 / 57.5 / 57.2  58.2 / 58 / 58.2
85/90/cb1            60.5 / 60.3 / 60.2  57.7 / 57.5 / 56.8  58.5 / 58.4 / 57.3
95/100/cb1           60.1 / 59.8 / 59.7  60.1 / 59.9 / 59.8  60.6 / 60.4 / 60.3
65/120/cb7           60.2 / 60.1 / 60.1  59.4 / 59.3 / 58.5  59.7 / 59.6 / 58.9
65/100/cb7           59.6 / 59.4 / 59.3  59.3 / 58.8 / 57.9  60.4 / 60 / 59.4
65/90/cb7            60 / 59.8 / 59.7    57 / 56 / 56        57.9 / 58 / 57.2
75/110/cb7           59.5 / 59.2 / 59    58.6 / 58.2 / 57.5  59.4 / 59.3 / 58.4
75/90/cb7            59.8 / 59.6 / 59.6  59.3 / 58.8 / 59.6  59.6 / 59.2 / 59.2
85/110/cb7           60.4 / 60.4 / 60.3  59.9 / 59.7 / 59    60.3 / 60.1 / 59.4


THIRD EXPERIMENTAL BLOCK

[Chromaticity-stability plots for the 12 SPDs of the third experimental block (the same SPDs as the first block); graphics not reproduced. Panels (Rf,n/Rg,n/CB): 65/120/CB1, 75/90/CB1, 75/100/CB7; 65/110/CB1, 85/100/CB1, 85/100/CB7; 65/90/CB1, 65/110/CB7, 85/90/CB7; 75/110/CB1, 65/80/CB7, 95/100/CB7.]


Illuminance Measurements: Block 3
(Dashes mark sessions in which no middle reading was taken.)

SPD (Rf,n/Rg,n/CB)   C: Start/Mid/End    R: Start/Mid/End    L: Start/Mid/End
65/120/cb1           59.9 / - / 59.6     58.8 / - / 58.3     57.1 / - / 56.3
65/110/cb1           60.1 / 60 / 60      59.7 / 59.3 / 58.5  58.4 / 58.2 / 57.7
65/90/cb1            60.1 / 59.8 / 59.8  59 / 58.7 / 58.1    58.7 / 58.4 / 57.4
75/110/cb1           60 / 59.7 / 59.6    59.6 / 59.2 / 58.4  58.9 / 58.6 / 58.1
75/90/cb1            60.4 / 60.1 / 60    59.7 / 59.3 / 58.6  58.9 / 58.6 / 57.9
85/100/cb1           59.5 / 59 / 59      58.1 / 58 / 57.2    57.6 / 57.2 / 57.1
65/110/cb7           60.4 / 60.2 / 60.2  59.1 / 59 / 58.1    58.1 / 57.8 / 57.4
65/80/cb7            60.1 / 59.8 / 59.5  58.7 / 58.5 / 57.6  57 / 57.2 / 56.1
75/100/cb7           60.1 / - / 59.6     59 / - / 58.3       58.2 / - / 57.4
85/100/cb7           59.5 / 59.3 / 59.1  59.3 / 59.3 / 58.1  59.8 / 59.4 / 58.7
85/90/cb7            60 / 59.5 / 59.2    59.3 / 59.1 / 58.8  59.8 / 59.4 / 59
95/100/cb7           60.5 / 60 / 60      60.6 / 60.4 / 59.3  61 / 60.6 / 59.6


FOURTH EXPERIMENTAL BLOCK

[Chromaticity-stability plots for the 12 SPDs of the fourth experimental block (the same SPDs as the second block); graphics not reproduced. Panels (Rf,n/Rg,n/CB): 65/100/CB1, 85/90/CB1, 65/90/CB7; 65/80/CB1, 95/100/CB1, 75/110/CB7; 75/100/CB1, 65/120/CB7, 75/90/CB7; 85/110/CB1, 65/100/CB7, 85/110/CB7.]


Illuminance Measurements: Block 4
(Dashes mark sessions in which no middle reading was taken.)

SPD (Rf,n/Rg,n/CB)   C: Start/Mid/End    R: Start/Mid/End    L: Start/Mid/End
65/100/cb1           59.6 / 59.6 / 59.4  59.2 / 59.3 / 58.6  58.7 / 58.6 / 57.9
65/80/cb1            60.4 / - / 60.3     59.4 / - / 58.9     58.3 / - / 57.9
75/100/cb1           60 / 59.7 / 59.6    59.4 / 59.5 / 58.6  59.2 / 59 / 58.4
85/110/cb1           60 / 59.9 / 59.9    59.9 / 59.8 / 59.5  58.6 / 58.4 / 57.8
85/90/cb1            59.8 / 59.5 / 59.2  59.6 / 58.7 / 57.8  58.1 / 57.5 / 56.6
95/100/cb1           59.5 / - / 59.1     59.6 / - / 59.2     59.7 / - / 59.7
65/120/cb7           60 / 59.6 / 59.5    59.5 / 59.2 / 58.8  58.8 / 58.8 / 57.9
65/100/cb7           59.1 / 59 / 58.7    59.5 / 59.2 / 58.5  58.9 / 58.7 / 58.1
65/90/cb7            60.5 / - / 60.1     59.6 / - / 59.1     58.6 / - / 57.5
75/110/cb7           59.7 / 59.5 / 59.2  59.2 / 59 / 58.7    58.8 / 58.3 / 57.7
75/90/cb7            59.8 / 59.6 / 59.2  59.8 / 59.3 / 58.2  58.5 / 58.6 / 57.6
85/110/cb7           60.2 / 60.2 / 60.1  60 / 59.8 / 59      59.6 / 59.7 / 58.8


APPENDIX C: Color Vector Graphics

[Color Vector Graphics for the 24 experimental SPDs, one graphic per SPD; not reproduced in this text version.]

APPENDIX D: Object specific questionnaires

[The six object-specific questionnaires follow, one page per object group; images not reproduced.]

NOTE: each questionnaire image was scaled to fit within the margins of its page. Actual rating scales were 5.0” long.

APPENDIX E: Mean OBJECT SPECIFIC ratings

Table E-1 Mean OBJECT SPECIFIC ratings for the red and orange objects.

(L = “like”, N = “natural”, V = “vivid” mean ratings.)

                 R1: Coca Cola Can   R2: Apple         O1: Orange Crush   O2: Orange
ID  Rf,n  Rg,n    L    N    V         L    N    V       L    N    V         L    N    V
 1   65   120    4.4  3.1  4.3       4.2  2.9  4.3     3.5  3.2  3.2       3.7  2.5  4.6
 2   65   110    4.1  2.8  4.3       3.8  2.5  4.4     3.8  2.9  3.7       3.5  2.0  4.5
 3   65   100    3.9  3.0  4.2       4.0  3.2  3.9     3.7  3.0  3.7       3.3  2.5  4.4
 4   65    90    4.0  3.4  3.6       3.9  3.5  3.7     3.6  3.4  3.2       3.7  2.8  4.1
 5   65    80    3.6  3.3  2.9       3.4  3.3  2.7     3.3  3.1  2.8       3.6  3.3  3.9
 6   75   110    4.4  3.3  4.1       4.2  3.6  4.0     3.9  3.3  3.4       3.8  2.9  4.5
 7   75   100    4.3  3.7  4.0       4.1  3.9  3.9     4.0  3.4  3.4       4.1  3.3  4.2
 8   75    90    4.3  3.7  3.5       4.4  3.9  3.6     3.5  3.3  2.9       3.8  3.2  3.8
 9   85   110    4.4  3.5  4.2       4.4  3.8  4.1     3.8  3.5  3.5       4.2  3.5  4.6
10   85   100    4.3  3.7  3.9       4.3  3.9  3.8     3.8  3.6  3.1       4.2  3.5  4.3
11   85    90    4.2  3.7  3.1       4.1  4.1  3.2     3.2  3.0  2.7       4.2  3.9  3.5
12   95   100    4.3  3.7  3.6       4.2  4.1  3.5     3.8  3.5  3.3       4.3  4.0  3.9
13   65   120    4.4  3.4  4.0       4.1  3.7  4.1     3.6  3.2  3.0       4.0  3.4  4.0
14   65   110    4.0  3.9  3.3       4.1  3.8  3.5     3.5  3.4  2.8       4.1  3.5  3.8
15   65   100    4.0  3.5  3.0       3.9  3.8  3.2     3.2  2.9  2.6       4.2  4.0  3.1
16   65    90    4.0  3.7  3.0       3.9  4.0  3.0     3.2  3.1  2.6       3.9  4.0  3.2
17   65    80    2.7  2.4  1.7       2.8  2.9  1.9     2.5  2.6  2.0       3.0  3.0  2.5
18   75   110    4.3  3.8  3.6       4.1  4.0  3.4     3.7  3.6  3.3       4.3  3.8  4.1
19   75   100    4.0  3.8  2.7       3.8  3.9  2.7     3.1  3.3  2.3       3.8  3.8  3.3
20   75    90    3.4  3.0  2.3       3.1  3.0  2.4     3.4  3.2  2.8       3.9  3.5  3.1
21   85   110    4.2  3.3  3.6       4.1  3.9  3.5     3.7  3.4  3.1       4.3  3.8  4.1
22   85   100    4.0  3.6  2.7       3.8  3.8  2.7     3.7  3.4  2.6       4.0  3.9  3.4
23   85    90    3.6  3.4  2.6       3.5  3.5  2.6     3.2  3.3  2.6       3.8  3.7  3.2
24   95   100    4.2  3.8  3.3       4.3  4.1  3.3     3.8  3.8  2.9       4.2  3.7  3.8


Table E-2 Mean OBJECT SPECIFIC ratings for the yellow and green objects.

                 Y1: Mustard         Y2: Lemon         G1: Sprite Can     G2: Apple
ID  Rf,n  Rg,n    L    N    V         L    N    V       L    N    V         L    N    V
 1   65   120    3.7  3.2  3.2       3.8  3.4  3.6     4.2  3.6  3.6       4.1  3.7  3.8
 2   65   110    3.2  2.4  3.5       4.1  3.3  3.6     4.2  3.3  3.7       4.2  3.6  3.6
 3   65   100    3.0  2.7  3.5       3.5  3.1  3.9     4.0  3.6  3.5       3.6  3.4  3.5
 4   65    90    3.1  2.7  3.3       3.1  3.0  2.8     3.6  3.2  3.2       3.2  3.3  2.7
 5   65    80    2.7  2.6  2.9       2.9  2.8  2.8     3.7  3.2  2.9       1.9  2.1  1.8
 6   75   110    3.8  3.2  3.2       3.8  3.5  3.3     4.2  3.6  3.6       4.3  3.9  3.8
 7   75   100    3.5  3.0  3.3       4.0  3.9  3.6     4.3  3.7  3.7       4.1  4.1  3.3
 8   75    90    3.5  3.3  3.1       3.9  3.6  3.2     4.0  3.6  3.0       3.8  3.8  2.9
 9   85   110    3.9  3.6  3.7       4.1  3.8  4.0     4.3  3.6  3.8       4.3  4.0  4.1
10   85   100    4.0  3.8  3.4       4.0  3.7  3.5     4.2  3.7  3.0       4.4  4.2  3.7
11   85    90    3.7  3.4  2.8       4.0  3.8  3.1     4.2  3.8  3.0       3.5  3.6  2.7
12   95   100    3.9  3.8  3.4       4.2  3.9  3.6     4.3  3.8  3.5       4.2  4.3  3.6
13   65   120    3.7  3.5  3.0       3.8  3.5  3.6     4.4  3.8  3.7       4.4  3.9  3.8
14   65   110    3.9  3.5  3.0       3.7  3.3  3.4     3.9  3.7  3.1       4.1  3.9  3.4
15   65   100    3.2  3.1  2.5       3.5  3.5  3.0     4.1  3.4  3.1       4.3  4.0  3.2
16   65    90    3.7  3.4  2.9       3.6  3.5  3.2     4.2  3.8  3.3       4.0  4.0  3.2
17   65    80    2.8  2.8  2.5       3.0  3.0  2.8     3.4  3.1  2.4       3.1  3.1  2.5
18   75   110    3.9  3.8  3.4       4.1  4.0  3.8     4.2  3.6  3.6       4.2  4.1  3.7
19   75   100    3.6  3.3  2.8       3.6  3.4  3.1     4.0  3.6  2.9       4.2  4.1  3.4
20   75    90    3.7  3.5  2.9       3.8  3.7  3.1     3.9  3.5  3.0       4.0  3.9  3.0
21   85   110    3.8  3.8  3.4       3.9  3.6  3.7     4.1  3.5  3.7       4.1  3.9  3.6
22   85   100    3.7  3.6  2.7       3.7  3.7  3.1     3.9  3.5  2.6       4.0  4.0  3.0
23   85    90    3.4  3.3  2.6       3.5  3.4  2.8     3.8  3.3  2.7       3.8  3.8  3.0
24   95   100    4.0  3.8  3.4       4.3  3.8  3.4     4.1  3.6  3.2       3.8  3.8  3.0


Table E-3 Mean OBJECT SPECIFIC ratings for the blue and purple objects.

                 B1: Pepsi Can       B2: Blueberries   P1: Grape Crush    P2: Onion
ID  Rf,n  Rg,n    L    N    V         L    N    V       L    N    V         L    N    V
 1   65   120    3.9  3.5  3.7       3.9  3.8  3.0     3.9  2.8  3.6       4.0  2.8  4.3
 2   65   110    4.0  3.4  3.5       4.1  3.9  3.1     4.0  2.8  3.9       4.0  3.2  3.9
 3   65   100    4.0  3.6  4.0       3.5  3.5  3.3     3.6  3.4  3.5       3.5  3.1  3.9
 4   65    90    3.5  3.2  3.3       3.4  3.4  2.9     3.7  3.3  3.5       3.5  3.0  3.3
 5   65    80    3.7  3.2  3.2       3.1  3.2  2.4     3.0  2.9  2.7       3.3  3.2  3.2
 6   75   110    4.1  3.6  3.4       4.2  3.9  3.1     3.9  3.3  3.7       4.0  3.3  3.9
 7   75   100    4.0  3.7  3.5       3.7  4.0  3.0     3.7  3.4  3.3       3.6  3.3  3.5
 8   75    90    3.9  3.6  3.4       3.8  3.7  2.9     3.9  3.2  3.2       3.9  3.5  3.4
 9   85   110    4.2  3.5  3.9       4.0  3.9  3.6     3.7  3.3  3.6       3.8  3.2  4.2
10   85   100    3.9  3.6  3.5       4.2  4.1  3.2     4.0  3.6  3.7       3.7  3.3  3.6
11   85    90    4.2  3.6  3.5       3.6  3.8  2.9     3.2  3.1  2.6       3.8  3.4  3.2
12   95   100    4.3  3.8  3.9       4.0  4.1  3.2     3.7  3.5  3.3       4.1  3.7  3.7
13   65   120    4.2  3.4  3.7       4.0  4.0  3.1     3.7  3.3  3.5       3.7  3.3  3.6
14   65   110    3.9  3.5  3.2       3.8  3.8  2.8     3.7  3.5  3.0       3.7  3.1  2.8
15   65   100    4.1  3.4  3.5       3.7  3.9  2.7     3.5  3.3  2.9       3.7  3.5  3.0
16   65    90    4.0  3.8  3.5       3.6  3.9  2.7     3.5  3.3  3.0       3.8  3.6  3.2
17   65    80    3.5  3.1  3.0       3.2  3.4  2.3     3.1  2.8  2.2       3.2  2.5  2.2
18   75   110    4.1  3.4  3.9       3.9  4.0  3.3     3.7  3.4  3.4       3.8  3.6  3.7
19   75   100    3.9  3.4  3.4       3.9  4.0  2.7     4.0  3.5  3.3       3.7  3.3  3.6
20   75    90    4.1  3.5  3.3       3.6  3.8  2.8     3.5  3.1  2.8       2.7  2.5  2.3
21   85   110    4.0  3.5  3.7       4.0  4.1  3.4     3.8  3.5  3.4       3.9  3.6  3.7
22   85   100    3.8  3.5  3.1       3.7  4.0  2.6     3.7  3.4  2.8       3.7  3.1  2.8
23   85    90    3.8  3.3  3.0       3.5  3.6  2.3     3.3  3.1  2.6       3.6  3.7  2.8
24   95   100    4.0  3.7  3.5       3.9  3.8  2.7     4.1  3.7  3.2       3.9  3.6  3.2


APPENDIX F: FM-100 hue test, correct chip arrangement

The graph below shows the sequential order of colored caps under CIE Illuminant C. The graphics on subsequent pages show the order of caps under the indicated experimental SPD and the correct order of caps that was used when calculating TESadj. The light source transposition error score (Rdt) is also shown for each SPD. An Rdt of zero (Rdt = 0) indicates that the light source did not cause any transpositions relative to the order of caps under CIE Illuminant C.
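The scoring rule itself can be sketched compactly. The snippet below is a hedged illustration rather than the scoring code used for this work: it assumes circular cap adjacency and Kinnear-style cap scores, and it treats TESadj as the same rule applied after re-ranking caps by their correct order under the test SPD (for the classic score, the correct order is simply the Illuminant C cap numbering). The exact bookkeeping used in the dissertation may differ.

    def total_error_score(arranged, correct):
        # Re-number caps by their position in the given correct order, so a
        # perfect arrangement becomes 0, 1, 2, ... and scores zero.
        rank = {cap: i for i, cap in enumerate(correct)}
        seq = [rank[cap] for cap in arranged]
        n = len(seq)

        def circ(p, q):
            d = abs(p - q)
            return min(d, n - d)  # distance around the closed hue circle

        # Each cap's difference from its two neighbors, minus 2.
        return sum(circ(seq[i], seq[i - 1]) + circ(seq[i], seq[(i + 1) % n]) - 2
                   for i in range(n))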

[Figure: FM100 chromaticity comparison. The 85 FM-100 caps, numbered 1-85, are plotted around the hue circle in the a'-b' plane, with both axes spanning -40 to 40; labels A-D appear in the original graphic.]

Key to the experimental SPD identification numbers used on the cap-order pages:

              Rf,n = 65   Rf,n = 75   Rf,n = 85   Rf,n = 95
Rg,n = 120       1, 13           -           -           -
Rg,n = 110       2, 14       6, 18       9, 21           -
Rg,n = 100       3, 15       7, 19      10, 22      12, 24
Rg,n =  90       4, 16       8, 20      11, 23           -
Rg,n =  80       5, 17           -           -           -

[The cap-order graphics, one page per experimental SPD (IDs 1-24), are not reproduced here. Each page showed the order of caps under the indicated SPD, the correct order of caps used when calculating TESadj, and that SPD's light source transposition error score Rdt.]

APPENDIX G: CVG Best-fit ellipses

[Best-fit ellipses overlaid on the Color Vector Graphics of the 24 experimental SPDs, one graphic per SPD; not reproduced in this text version.]

APPENDIX H: Best-fit model statistics

Variable Definitions
Rf        IES TM-30-15 average fidelity
Rcs,hi    IES TM-30-15 hue-angle bin chroma shift (i = 1 to 16)
ψ (phi)   Best-fit ellipse rotation angle
b         Best-fit ellipse semi-minor axis length
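To make these definitions concrete, the sketch below fits an ellipse to a set of CVG vertices and extracts the semi-axis lengths and rotation angle. It is a simplified Python illustration using a plain least-squares conic fit, standing in for the direct least-squares ellipse fit described earlier in the dissertation; the function name and interface are assumptions for illustration only.

    import numpy as np

    def ellipse_params(x, y):
        # Fit the conic A x^2 + B xy + C y^2 + D x + E y = 1 to the CVG
        # vertices (x, y) by ordinary least squares.
        x, y = np.asarray(x, float), np.asarray(y, float)
        M = np.column_stack([x * x, x * y, y * y, x, y])
        A, B, C, D, E = np.linalg.lstsq(M, np.ones_like(x), rcond=None)[0]
        F = -1.0
        # Center of the conic (where both partial derivatives vanish).
        den = 4 * A * C - B * B
        x0 = (B * E - 2 * C * D) / den
        y0 = (B * D - 2 * A * E) / den
        # About the center the ellipse is (p - p0)^T Q (p - p0) = -k.
        Q = np.array([[A, B / 2], [B / 2, C]])
        k = A * x0**2 + B * x0 * y0 + C * y0**2 + D * x0 + E * y0 + F
        evals, evecs = np.linalg.eigh(Q)
        semi = np.sqrt(-k / evals)          # semi-axis lengths
        a, b = semi.max(), semi.min()       # semi-major, semi-minor
        major = evecs[:, np.argmin(evals)]  # smaller eigenvalue = longer axis
        psi = np.degrees(np.arctan2(major[1], major[0])) % 180.0
        return a, b, psi

The returned b and ψ correspond to the ellipse predictors used in the models that follow.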

BEST-FIT MODEL FOR “NATURALNESS” RATING

Analysis of Variance
Source          DF   Adj SS   Adj MS  F-Value  P-Value
Regression       5  6.24432  1.24886    42.11    0.000
  Rf             1  1.05989  1.05989    35.74    0.000
  phi            1  1.67637  1.67637    56.53    0.000
  phi*Rf         1  1.43982  1.43982    48.55    0.000
  Rcsh1          1  0.00791  0.00791     0.27    0.612*
  Rcsh1*Rcsh1    1  2.19052  2.19052    73.86    0.000
Error           18  0.53381  0.02966
Total           23  6.77813

Model Summary
       S    R-sq  R-sq(adj)  R-sq(pred)
0.172210  92.12%     89.94%      84.30%

Regression Equation
NAT = 1.464 + 0.02674 Rf + 0.188 Rcs,h1 - 15.41 Rcs,h1² - 0.05305 ψ + 0.000602 Rf*ψ

*Term was retained to maintain a hierarchical model

BEST-FIT MODEL FOR “VIVIDNESS” RATING

Analysis of Variance
Source      DF  Adj SS   Adj MS  F-Value  P-Value
Regression   1   7.481  7.48093   138.96    0.000
  Rcsh16     1   7.481  7.48093   138.96    0.000
Error       22   1.184  0.05383
Total       23   8.665

Model Summary
       S    R-sq  R-sq(adj)  R-sq(pred)
0.232023  86.33%     85.71%      82.63%

Regression Equation
VIV = 3.3315 + 4.594 Rcs,h16


BEST-FIT MODEL FOR “PREFERENCE” RATING

Analysis of Variance
Source           DF  Adj SS   Adj MS  F-Value  P-Value
Regression        5  5.0970  1.01939    22.67    0.000
  Rf              1  1.1804  1.18037    26.25    0.000
  phi             1  1.7347  1.73466    38.58    0.000
  Rcsh16          1  3.1684  3.16845    70.47    0.000
  Rcsh16*Rcsh16   1  0.6203  0.62032    13.80    0.002
  Rf*phi          1  1.4493  1.44931    32.24    0.000
Error            18  0.8093  0.04496
Total            23  5.9062

Model Summary
       S    R-sq  R-sq(adj)  R-sq(pred)
0.212036  86.30%     82.49%      69.92%

Regression Equation
LIKE = 1.629 + 0.02686 Rf - 0.04866 ψ + 3.423 Rcs,h16 - 10.01 Rcs,h16² + 0.000566 Rf*ψ

BEST-FIT MODEL FOR “SKIN PREFERENCE” RATING

Analysis of Variance
Source           DF  Adj SS   Adj MS  F-Value  P-Value
Regression        3  4.3348  1.44492    37.28    0.000
  b               1  0.9834  0.98336    25.37    0.000
  Rcsh16          1  0.2829  0.28291     7.30    0.014
  Rcsh16*Rcsh16   1  0.5555  0.55548    14.33    0.001
Error            20  0.7751  0.03876
Total            23  5.1099

Model Summary
       S    R-sq  R-sq(adj)  R-sq(pred)
0.196869  84.83%     82.56%      79.55%

Regression Equation
SKIN = 0.128 + 3.758 b + 1.161 Rcs,h16 - 8.41 Rcs,h16²
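For convenience, the four best-fit models above can be transcribed directly into code. The sketch below is only a transcription of the regression equations as printed (the function names are arbitrary); predictors must be supplied in the units used to fit the models: Rf and Rcs,hi as defined in TM-30-15, and b and ψ from the best-fit ellipse.

    def nat(Rf, Rcs_h1, psi):
        # "Naturalness" rating model.
        return (1.464 + 0.02674 * Rf + 0.188 * Rcs_h1 - 15.41 * Rcs_h1 ** 2
                - 0.05305 * psi + 0.000602 * Rf * psi)

    def viv(Rcs_h16):
        # "Vividness" rating model.
        return 3.3315 + 4.594 * Rcs_h16

    def like(Rf, Rcs_h16, psi):
        # "Preference" rating model.
        return (1.629 + 0.02686 * Rf - 0.04866 * psi + 3.423 * Rcs_h16
                - 10.01 * Rcs_h16 ** 2 + 0.000566 * Rf * psi)

    def skin(b, Rcs_h16):
        # "Skin preference" rating model; b is the semi-minor axis length.
        return 0.128 + 3.758 * b + 1.161 * Rcs_h16 - 8.41 * Rcs_h16 ** 2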


VITA: TONY ESPOSITO

Tony was born and raised in Philadelphia, PA and moved to State College, PA in 2007 after high school to study at Penn State University. Tony completed a combined Bachelor/Master of Architectural Engineering (B.A.E./M.A.E.) in 2012. Prior to receiving his Master's degree, Tony completed internships at The Lighting Practice (lighting design), Pusey Electric (electrical contracting), SmithgroupJJR (electrical engineering), and Lighting Design Alliance (lighting design), and won AE awards for outstanding performance in Lighting/Electrical in the fourth year and first place Lighting/Electrical senior thesis. After receiving his master's degree, Tony went to work for The Lighting Practice in Philadelphia, PA as a full-time lighting designer, before returning to Penn State in 2013 to pursue a Ph.D. in Architectural Engineering.

For the first two years of Tony's doctoral program he served as a graduate fellow to the NSF GK-12 CarbonEARTH program, where he developed science curricula and worked closely with middle schools in Philipsburg and Harrisburg, PA. During his time at Penn State, Tony won the R.J. Besal Scholarship in 2010, 2011, 2014, and 2015, won several individual and group travel stipends from the IES and IALD, served as a teaching assistant in the Rome Study Abroad Program, advised numerous undergraduate students, served as a peer reviewer for ARCH 542 poster sessions on game theory, and rehabilitated the Architectural Engineering Graduate Student Association (AE GSA).

Tony has a passion for education and has given many guest lectures at Penn State. To spread the word about his research and educate the lighting industry, he has traveled to give lectures on color science in Harrisburg, Baltimore, Philadelphia, New York, Frankfurt, Germany, and Boston. Tony plans to continue his education and research initiatives and will begin his professional research career as the Lighting Quality Researcher for Philips Research North America.
