GRAPHICAL CONTEXT AS AN AID TO CHARACTER RECOGNITION

by

THEODORE THOMAS KUKLINSKI

B.S., Drexel University (1972)

S.M., Massachusetts Institute of Technology (1975)

E.E., Massachusetts Institute of Technology (1975)

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

FEBRUARY, 1979

Signature of Author: [Signature redacted]
Department of Electrical Engineering and Computer Science, February 20, 1979

Certified by: [Signature redacted]
Thesis Supervisor

Accepted by:
Chairperson, Departmental Committee on Graduate Students

GRAPHICAL CONTEXT AS AN AID TO CHARACTER RECOGNITION

by

THEODORE THOMAS KUKLINSKI

Submitted to the Department of Electrical Engineering and Computer Science on February 14, 1979 in partial fulfillment of the requirements for the Degree of Doctor of Philosophy.

ABSTRACT

Contextual information is of great use to humans in the recognition of both machine and hand printed characters. The incorporation of contextual information, particularly a form known as graphical context, having to do with stylistic consistency within and between characters, may lead to better recognition than that currently achievable. The role and importance of character recognition technology in our society are examined, as are the problems involved and the state of the art in both machine and handprinted character recognition. An examination of the importance of the use of various forms of contextual information in character recognition is presented.

A functional attribute theory of character recognition is reviewed. This theory incorporates graphical context as a modulator of rules which transform between the physical and psychological domains. The theory is based on ambiguous characters which form the boundaries between letters. These boundaries can be found using a variety of methodologies which are described. One of these, goodness rating, is used throughout the remainder of the thesis for testing graphical contextual effects.

The range frequency model is described as a potentially useful candidate model for describing contextual effects on goodness curves and is traced historically and derived mathematically. It is based on the premise that judgment is a weighted compromise between a range principle and a frequency principle. The range tendency has to do with the stimulus range while the frequency principle deals with the effects of stimulus distribution. The model provides a method by which predictions can be made from the results of one category rating experiment to the results of another. A formulation to deal with prediction onto partial range experiments is developed.

Eight category rating experiments, spanning a variety of stimulus ranges and stimulus distributions, are described and provide a test of the graphical context model. The test consists of predicting the results of one experiment from the results of another. The procedures for making such predictions via the model are described in detail. The idea of a weighting factor varying along the dimension is developed.

Three variants of the model are tried and three sets of predictions made. The first, a naive traditional RF model, is found to be an inadequate predictor. Next, excellent predictions are obtained with some of the parameters having been optimized on one group of experiments (involving V-Y) and then tested on another group (involving C-F). It is discovered that the range principle is dominant, particularly in the stimulus region around V and C. Finally, very good results are obtained using only the range principle. The results of each set of predictions are compared using a sum squared error metric. Letter archetypes are shown to be important determinants of stimulus range. It is concluded that plasticity effects in the perception of characters can be modeled successfully using a range type model. Applications of the model to graphical contextual analysis are discussed and possible future work is described.

THESIS SUPERVISOR: Barry A. Blesser
TITLE: Associate Professor of Electrical Engineering

ACKNOWLEDGEMENTS

I am very grateful to my thesis supervisor, Barry Blesser, for the help, encouragement, support and patience he has shown to me over the course of my stay at MIT. His comments have guided me along in the course of this thesis work. Likewise the constructive comments and insights of my thesis readers, Murray Eden and Mary Naus, as well as Ching Suen, are also gratefully acknowledged and appreciated.

Thanks are due to Tim Waldron, Robin Kuklinski, as well as Robert Shillman for assistance in running some of the experiments reported here. Many people in the CIPG laboratory and RLE were very helpful in many ways over the past few years. I especially thank Al Rudnick, our night custodian, John Hewitt, RLE Librarian, and John McKenzie, CIPG PDP-9 Doctor. The friendship and kind assistance of various other people in the CIPG lab, among them Charlie Cox, Don Levinstone, and Bob Babcock, is appreciated.

I wish to thank the Cognitive Information Processing Group for their support of my work indirectly through the use of their computer facilities. I am grateful to the National Science Foundation for their support of me as a research assistant through May 1977 under Grants #NSF-ENG-7417459 and ENG74-24344, as well as MIT's Research Laboratory of Electronics for support in Fall 1977 under an industrial fellowship.

To my parents I will ever be in debt for their encouragement, prayers and support. My true gratitude and love goes to my wife Hsueh-Rong for her help and understanding. This thesis is dedicated to her.

Deo Gratias.

TABLE OF CONTENTS

TITLE PAGE
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 INTRODUCTION
1.2 OUTLINE OF INTRODUCTORY AND BACKGROUND MATERIAL
1.3 OUTLINE OF RANGE FREQUENCY WORK

CHAPTER 2 CHARACTER RECOGNITION
2.1 INTRODUCTION
2.2 THE THREE "R's" - A STATE OF THE ART REPORT
2.3 MACHINE PRINTED CHARACTER RECOGNITION - THE PROBLEM
2.4 HANDPRINTED CHARACTER RECOGNITION - THE PROBLEM
2.5 ENGINEERING LITERATURE IN CHARACTER RECOGNITION
2.6 PSYCHOLOGICAL LITERATURE IN CHARACTER RECOGNITION
2.7 SOME USES OF CHARACTER RECOGNITION
2.8 SUMMARY

CHAPTER 3 CONTEXT IN CHARACTER RECOGNITION
3.1 OVERVIEW
3.2 THE POWER OF CONTEXT: AN ILLUSTRATION
3.3 HIGHER LEVELS OF CONTEXT
3.4 CULTURAL, HISTORICAL AND GENERATIVE CONTEXT
3.5 GRAPHICAL CONTEXT

CHAPTER 4 A FUNCTIONAL ATTRIBUTE BASED THEORY OF CHARACTER RECOGNITION
4.1 INTRODUCTION
4.2 THE CONCEPT OF ATTRIBUTE
4.3 A SIMPLIFIED MODEL
4.4 SUMMARY

CHAPTER 5 METHODOLOGIES FOR A FUNCTIONAL ATTRIBUTE BASED THEORY OF CHARACTER RECOGNITION
5.1 INTRODUCTION
5.2 A REVIEW OF EXPERIMENTAL WORK
5.3 THE GOODNESS METHODOLOGY
5.4 SOME OTHER METHODOLOGIES
5.5 PLASTICITY OF INTERLETTER BOUNDARIES
5.6 SUMMARY

CHAPTER 6 RANGE FREQUENCY: A THEORY OF RELATIVITY FOR PSYCHOPHYSICS
6.1 INTRODUCTION
6.2 SOME TERMINOLOGY
6.3 ROOTS IN ADAPTATION LEVEL THEORY
6.4 EARLY RANGE FREQUENCY FORMULATION
6.5 THE LIMEN MODEL
6.6 THE "SIMPLIFIED" RANGE FREQUENCY MODEL
6.7 SUMMARY

CHAPTER 7 RANGE FREQUENCY: FURTHER DEVELOPMENT
7.1 THEORETICAL DEVELOPMENT
7.2 FUNCTIONAL MEASUREMENT
7.3 RANGE FREQUENCY - THEORETICAL DEVELOPMENT
7.4 PSYCHOPHYSICAL LAW
7.5 PSYCHOLOGICAL LAW: RANGE FREQUENCY MODEL
7.5.1 THE RANGE PRINCIPLE
7.5.2 THE FREQUENCY PRINCIPLE
7.6 THE RANGE FREQUENCY COMPROMISE
7.7 RESTRICTED RANGE CASE
7.8 SUMMARY

CHAPTER 8 EXPERIMENTS
8.1 INTRODUCTION
8.2 GOODNESS EXPERIMENTS AS A TEST OF RF THEORY
8.3 THE ATTRIBUTE "LEG" AS A TEST OF RF THEORY
8.4 CONTEXT DETERMINING VARIABLES
8.5 OVERVIEW OF EXPERIMENTS
8.6 RANGE VARIATION EXPERIMENTS
8.6.1 BACKGROUND
8.6.2 METHOD
8.6.2.1 STIMULI
8.6.2.2 SUBJECTS
8.6.2.3 PROCEDURE
8.6.3 RESULTS
8.7 EXPERIMENT 7: FULL RANGE
8.7.1 BACKGROUND
8.7.2 METHOD
8.7.2.1 STIMULI
8.7.2.2 SUBJECTS
8.7.2.3 PROCEDURE
8.7.3 RESULTS
8.8 EXPERIMENT 8: FULL RANGE
8.8.1 BACKGROUND
8.8.2 METHOD
8.8.2.1 STIMULI
8.8.2.2 SUBJECTS
8.8.2.3 PROCEDURE
8.8.3 RESULTS
8.9 EXPERIMENT 9: FULL RANGE
8.9.1 BACKGROUND
8.9.2 METHOD
8.9.2.1 STIMULI
8.9.2.2 SUBJECTS
8.9.2.3 PROCEDURE
8.9.3 RESULTS
8.10 THE QUESTION OF LETTER ARCHETYPE
8.11 SUMMARY

CHAPTER 9 PROCEDURE FOR PREDICTION ACROSS EXPERIMENTS
9.1 INTRODUCTION
9.2 PREDICTION PROCEDURE
9.3 DERIVATION OF GOODNESS CURVES
9.4 DERIVATION OF THE FREQUENCY FUNCTION
9.5 A MODEL FOR W, THE WEIGHTING FACTOR
9.6 RANGE CURVE PROCESSING
9.7 OPTIMIZATION PROCEDURE
9.8 THE ROLE OF THE COMPUTER IN ANALYZING THE DATA
9.9 SUMMARY

CHAPTER 10 PREDICTIONS ACROSS EXPERIMENTS
10.1 INTRODUCTION
10.2 SET A PREDICTIONS
10.2.1 INTRODUCTION
10.2.2 DISCUSSION OF SET A RESULTS
10.3 SET B - OPTIMIZED PREDICTIONS
10.3.1 INTRODUCTION
10.3.2 OPTIMIZATION
10.3.3 PREDICTION RESULTS
10.3.4 LOW RANGE PREDICTIONS
10.3.5 HIGH RANGE PREDICTIONS
10.3.6 FULL RANGE PREDICTIONS
10.3.7 GENERAL COMMENTS
10.4 SET C PREDICTIONS
10.5 COMPARISON OF RESULTS
10.6 CONCLUSIONS

CHAPTER 11 SUMMARY AND CONCLUSION
11.1 INTRODUCTION
11.2 A BRIEF REVIEW
11.3 CONCLUSIONS
11.4 FUTURE WORK

BIBLIOGRAPHY

BIOGRAPHICAL NOTE

LIST OF FIGURES

2.1 OCR-A, A COMMON MACHINE ORIENTED TYPE STYLE
2.2 THE ANSI CHARACTER SET FOR HANDPRINTING [204]
2.3 SOME SUBSTITUTION ERRORS FROM GRAFIX-I HANDPRINT RECOGNITION SYSTEM [85]
2.4 A "V" AND "Y" FROM NEISSER AND WEENE'S DATA [140]

3.1 TWO CHINESE CHARACTERS EXHIBITING LITTLE DIFFERENCE TO WESTERNERS
3.2 SEVERAL WORDS EXTRACTED FROM THE "DECLARATION OF INDEPENDENCE" [from 215]
3.3 "EUROPEAN" 1 AND 7 CONTRASTED WITH THOSE OF THIS COUNTRY
3.4 CHINESE IDEOGRAPHS WHICH LOOK SIMILAR TO WESTERNERS
3.5 TWO V'S ILLUSTRATING HOW THE METHOD OF GENERATION MAY INFLUENCE PERCEPTION
3.6 AN E WITH INTERSTROKE LINES VISIBLE
3.7 AN EXAMPLE OF THE USE OF INTERCHARACTER GRAPHICAL CONTEXT
3.8 EXAMPLES OF THE VIOLATION OF INTRA AND INTER CHARACTER GRAPHICAL CONTEXT IN THE PRINTED CHARACTER DOMAIN

4.1 THREE LEVELS OF THE ATTRIBUTE LEG
4.2 CHARACTERS ALONG V-Y, C-F, AND U-H TRAJECTORIES
4.3 A SIMPLIFIED MODEL OF THE CHARACTER RECOGNITION PROCESS
4.4 MODEL INCORPORATING GRAPHICAL CONTEXT FOR THE ATTRIBUTE LEG

5.1 COMPARISON OF RESULTS ACROSS GOODNESS, LABELING AND REACTION TIME PARADIGMS FOR V-Y LETTER PAIR
5.2 COMPARISON OF RESULTS ACROSS GOODNESS, LABELING AND REACTION TIME PARADIGMS FOR C-F LETTER PAIR
5.3 ILLUSTRATION OF PLASTICITY OF INTERLETTER BOUNDARIES FOR THE C-F LETTER PAIR (from [115])
5.4 LABELING RESULTS ALONG V-Y, D-P AND O-C TRAJECTORIES AFFECTED BY ADAPTING CONDITIONS OF FIGURE 5.3 (from [115])

6.1 ILLUSTRATION OF DIFFERENT DISCRIMINABILITY FOR PAIRS OF CHARACTERS DIFFERING BY THE SAME PHYSICAL AMOUNT
6.2 ILLUSTRATION OF VARIOUS TYPES OF STIMULUS DISTRIBUTIONS
6.3 A COMPARISON OF THE ADAPTATION LEVEL AND RANGE FREQUENCY APPROACHES
6.4 TWO DIFFERENT STIMULUS DISTRIBUTIONS ILLUSTRATING THE METHOD OF DERIVING THE FREQUENCY FUNCTION
6.5 PREDICTOR GOODNESS AND FREQUENCY FUNCTIONS AND INFERRED RANGE FUNCTION
6.6 PREDICTING A GOODNESS FUNCTION FROM ANOTHER STIMULUS DISTRIBUTION

7.1 GENERAL FUNCTIONAL MEASUREMENT DIAGRAM (following Anderson [11])
7.2 FUNCTIONAL MEASUREMENT DIAGRAM FOR RANGE FREQUENCY
7.3 SIMPLIFIED FUNCTIONAL MEASUREMENT DIAGRAM FOR RANGE FREQUENCY
7.4 AN ILLUSTRATION OF THE DENSITY FUNCTIONS AT VARIOUS STAGES IN THE FUNCTIONAL MEASUREMENT DIAGRAM
7.5 A HYPOTHETICAL EXAMPLE OF THE PSYCHOPHYSICAL STAGE TRANSFORM
7.6 DENSITY FUNCTION ILLUSTRATION OF THE RANGE PRINCIPLE
7.7 DENSITY FUNCTION ILLUSTRATIONS OF THE FREQUENCY PRINCIPLE
7.8 DENSITY FUNCTION ILLUSTRATION OF THE RANGE FREQUENCY COMPROMISE
7.9 THE RELATION OF RANGE CURVES BETWEEN DIFFERENT RANGES OF STIMULI

8.1 ILLUSTRATION OF STIMULUS DISTRIBUTIONS FOR EXPERIMENTS INVOLVING V-Y
8.2 ILLUSTRATION OF STIMULUS DISTRIBUTIONS FOR EXPERIMENTS INVOLVING C-F
8.3 EXAMPLES OF V-Y AND C-F CHARACTERS USED IN EXPERIMENTS 1-6
8.4 RANGE ADAPTING SHEETS FROM EXPERIMENTS 1-6
8.5 RESULTS FROM V-Y EXPERIMENT 1 (FULL RANGE)
8.6 RESULTS FROM V-Y EXPERIMENT 2 (LOW RANGE) AND EXPERIMENT 3 (HIGH RANGE)
8.7 RESULTS FROM C-F EXPERIMENT 4 (FULL RANGE)
8.8 RESULTS FROM C-F EXPERIMENT 5 (LOW RANGE) AND EXPERIMENT 6 (HIGH RANGE)
8.9 EXAMPLES OF CHARACTERS USED IN EXPERIMENT 7
8.10 STIMULI FROM THE V-Y AND C-F TRAJECTORIES OF EXPERIMENT 7
8.11 GOODNESS RESULTS FOR V-Y FROM EXPERIMENT 7
8.12 GOODNESS RESULTS FOR C-F FROM EXPERIMENT 7
8.13 V-Y STIMULI USED IN EXPERIMENT 8
8.14 V-Y GOODNESS RESULTS FROM EXPERIMENT 8
8.15 STIMULI FROM THE 420 V-Y TRAJECTORY IN EXPERIMENT 9
8.16 V-Y GOODNESS RESULTS FROM EXPERIMENT 9

9.1 AN ILLUSTRATION OF ANALOGOUS (a) HUMAN AND (b) RF MODEL BEHAVIOR IN GOODNESS EXPERIMENTS
9.2 ILLUSTRATION OF THE PREDICTION PROCESS FROM ONE EXPERIMENT TO ANOTHER
9.3 AN ILLUSTRATION OF THE NOISE AMPLIFICATION EFFECT WHEN USING THE UNSMOOTHED GOODNESS CURVE TO INFER THE RANGE CURVE
9.4 AN ILLUSTRATION OF THE SMOOTHING PROCESS USED ON THE PREDICTOR GOODNESS CURVES
9.5 AN EXAMPLE OF THE SMOOTHING PROCESS APPLIED TO THE VYY1 GOODNESS CURVE
9.6 FREQUENCY FUNCTIONS FOR V-Y EXPERIMENTS INVOLVING DIFFERENT RANGES
9.7 FREQUENCY FUNCTIONS FOR FULL RANGE V-Y EXPERIMENTS
9.8 FREQUENCY FUNCTIONS FOR ALL C-F EXPERIMENTS
9.9 THE PARAMETRIC MODEL FOR W(x), THE WEIGHTING FUNCTION
9.10 DERIVATION OF THE RANGE FRACTION FROM THE PSYCHOLOGICAL SCALE FUNCTION
9.11 THE EXHAUSTIVE OPTIMIZATION PROCEDURE FOR THE W(x) FUNCTION
9.12 THE SIMPLE HILL CLIMBING PROCEDURE USED IN OPTIMIZING THE RANGE FRACTION
9.13 OVERVIEW OF THE COMPUTER ANALYSIS SYSTEM

10.1 THE FLAT WEIGHTING FUNCTION FOR SET A, W(x) = 0.55
10.2 GRAPHS OF PREDICTED AND EMPIRICAL V-Y GOODNESS CURVES FOR SET A
10.3 GRAPHS OF PREDICTED AND EMPIRICAL C-F GOODNESS CURVES FOR SET A
10.4 IMPROVED RVYV PREDICTION ASSUMING RANGE FRACTION, f = 0.83
10.5 RF COMPROMISE FOR VYY1 ASSUMING W(x) = 0.55
10.6 THE OPTIMAL W(x) FUNCTION FOR THE V-Y PREDICTIONS
10.7 GRAPHS OF PREDICTED AND EMPIRICAL V-Y GOODNESS CURVES FOR SET B
10.8 GRAPHS OF PREDICTED AND EMPIRICAL C-F GOODNESS CURVES FOR SET B
10.9 RANGE FREQUENCY COMPROMISE FOR VYY1 WITH OPTIMAL W(x)
10.10 WEIGHTING FUNCTION FOR SET C, W(x) = 1.0
10.11 GRAPHS OF PREDICTED AND EMPIRICAL V-Y GOODNESS CURVES FOR SET C
10.12 GRAPHS OF PREDICTED AND EMPIRICAL C-F GOODNESS CURVES FOR SET C

LIST OF TABLES

3.1 RESULTS FROM "F" COUNTING EXPERIMENT
4.1 A SURVEY OF EXPERIMENTAL WORK PERFORMED BASED ON A FUNCTIONAL ATTRIBUTE THEORY OF CHARACTER RECOGNITION
8.1 SUMMARY OF EXPERIMENTS
8.2 RESULTS FROM EXPERIMENTS 1-6
8.3 TABLE OF ARCHETYPES ACROSS LETTER PAIR
9.1 PARAMETERS USED IN OPTIMIZATION
10.1 BASIC SET OF PREDICTION PAIRS
10.2 SET A PREDICTIONS, W(x) = 0.55
10.3 TEN BEST W(x) FUNCTIONS FOR V-Y PREDICTIONS
10.4 COMPARISON OF INDIVIDUAL FILE OPTIMUM RESULTS WITH OVERALL V-Y OPTIMUM RESULTS
10.5 SET B PREDICTIONS, W(x) = (V-Y OPTIMUM)
10.6 SET C PREDICTIONS, W(x) = 1.0
10.7 COMPARISON OF ERROR ACROSS SETS
10.8 COMPARISON OF EMPIRICAL AND PREDICTED GOODNESS CROSSOVERS ACROSS SETS

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

There has long been a dream of having a machine that could read in as facile a manner as humans. Such an ultimate device would be able to read printed material, typed material, handprinted material and cursive script. The truly educated devices would of course be able to read other alphabets such as Arabic or Cyrillic, and non-alphabetic symbols such as Chinese characters, while other more cultured ones would read and play music (perhaps for their own amusement).

Returning to present reality, however, we find no device as versatile as that postulated above, even in the less exotic tasks mentioned. Some predictions of machine capabilities made ten years ago [99, 166] have failed to materialize. Today's machine is more likely to deal with one or at most a few fonts at a time. If handprinting is recognized, it is likely to be fairly constrained and probably limited to numbers only. The commercial handwriting recognizer does not yet exist. There have been inroads in these areas, but usually at great price.

In a recent evaluation of the current state of the information industry, Withington [213] states that, "Simulation of human behavior will continue to be difficult. Humans excel at pattern recognition without knowing how they do it." It comes down to the fact that the human being is the only device currently available capable of the desired tasks. Part of the reason for human success at character recognition is the ability to bring contextual factors at various levels to bear on the recognition of characters. Even humans, powerful recognizers as they are, still exhibit on the order of a 4% error rate [140] in recognizing unconstrained handprinted characters in the absence of context. Information other than that contained within a character itself must be utilized in order to account for the near perfect recognition rate we would like to claim.

The ultimate goal to which this thesis work may lead is better performing optical character recognition (OCR) devices through the use of one particular form of contextual information, known as graphical context. In short, we are working towards a means for adaptive, context dependent character recognition which would be economically feasible. The immediate goal of this thesis is to develop a usable model for graphical contextual analysis, a form of context utilizing the stylistic consistency within and among characters in the recognition process.

Given the task of recognizing a particular character, the human observer makes use of the knowledge of the rest of the set of characters being recognized. There may be characters which are somewhat ambiguous as to their identity as one or the other of two letters. The boundary between such letter pairs might be expected to change with the set of characters to be recognized, due to the graphical characteristics of the set. We would like a character recognition machine to extract the same graphical ensemble impression from the set that the human observer would. It is toward this type of model that this thesis is directed.

A model that mimics human performance in its analysis of context might successfully be applied to the machine interpretation of handprinted material. The need for a cheap, efficient, accurate recognition device for handprinting in particular is currently unfulfilled and will probably remain so for the immediate future. The potential applications of such systems are great. The incorporation of graphical contextual analysis may improve the accuracy and performance of such recognition devices.

1.2 OUTLINE OF INTRODUCTORY AND BACKGROUND MATERIAL

This and the following section will attempt to provide a preview of the upcoming material in this document. We will briefly describe the contents and intent of each of the upcoming chapters and how the information in each relates to the other chapters.

Chapter 2 will concern itself with the general problem of character recognition. We will review performance in this task by both the human and the various machines developed thus far. The state of the art in both machine printed and hand printed character recognition will be surveyed. Character recognition is a subject which has been studied intensely over the last twenty-five years without fully achieving all the goals envisioned. We will review the engineering literature in the field, covering academic endeavors as well as the commercial and practical viewpoint, encompassing what is currently available in OCR technology.

This thesis is heavily grounded in the field of psychology, and a brief overview of the parts of this field relevant to the present research is given. Finally we will review some of the uses to which character recognition technology can be put.

The use of one particular form of context is the central theme of this thesis and, in light of this, Chapter 3 will discuss the general topic of context as an aid to character recognition. We will discuss the meaning of context and then, through examples, examine some of the various levels of context which are applicable to the character recognition task. Among these will be such higher levels of context as syntax and semantics, in addition to others such as dictionary, n-gram or other probability techniques. By example we will look at other important, though not usually well covered, aspects of context: cultural, historical and generative contextual effects.

Finally, we will discuss the important topic of graphical context, the major topic of this thesis. This type of context has to do with the stylistic consistency of characters, both within characters and between adjacent characters. This level of context is a very important component of a functional attribute based theory of character recognition upon which the thesis rests. Graphical context as dealt with in this thesis is probably more amenable to use with handprinted characters than machine printed characters. We will later describe experiments involving contextual factors which may be considered analogous to graphical context.

Chapters 4 and 5 provide a discussion of a functional attribute based theory of character recognition. This theory provides the basic underpinning of the thesis and the bulk of the work described herein is done in the context of this theory. For this reason a review type discussion of the theory is given in Chapter 4. We will distinguish several different levels of attributes, the essential components of which letters are made. A simplified model of the character recognition process will be described. In this model we will see how context fits into the theory as a modulator of so called Physical to Functional Rules (PFR's). These rules act as a go-between for the physical and psychological domains of perception. The idea of discovering these PFR's through the use of ambiguous characters is developed. Ambiguous characters form the boundaries between letters. Examples are given for the ideas of this chapter in terms of the letter pair V-Y, which varies in the presence or absence of the functional attribute LEG.

Chapter 5 presents a review of the methodologies used in testing this theory. We will cover the idea that the boundaries between letters can be discovered by experiments involving trajectories between characters varying along some physical dimension or combination of dimensions. Various experiments can be used to locate the ambiguous character along this trajectory and thus discover the boundary between the two letter extrema.

One of these methodologies, a category rating procedure (goodness), is given special emphasis, since it is this paradigm which will be used in testing the model for graphical context to be developed. In this method, each character along the interletter trajectory is rated on a numerical scale, for example 0-10, as to how good a representation it is of a particular letter. This rating procedure is done for both letters of the letter pair. By plotting the rating curves, we will see that they intersect and that the intersection point may be taken as a measure of where the interletter boundary lies.
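The idea of taking the intersection of the two goodness curves as the interletter boundary can be sketched as follows. This is only a minimal illustration, assuming linear interpolation between adjacent stimuli; the function name `crossover` and the rating values are hypothetical and are not taken from the experiments reported in this thesis.

```python
def crossover(positions, goodness_a, goodness_b):
    """Return the stimulus position at which two goodness curves intersect.

    positions  -- stimulus positions along the interletter trajectory
    goodness_a -- mean goodness ratings for the first letter (e.g. V)
    goodness_b -- mean goodness ratings for the second letter (e.g. Y)

    Linear interpolation between neighbouring stimuli is an assumption of
    this sketch. Returns None if the curves never cross.
    """
    diffs = [a - b for a, b in zip(goodness_a, goodness_b)]
    for i in range(len(positions) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return positions[i]
        if d0 * d1 < 0:
            # Fraction of the way from stimulus i to stimulus i + 1
            # at which the difference between the curves reaches zero.
            t = d0 / (d0 - d1)
            return positions[i] + t * (positions[i + 1] - positions[i])
    if diffs[-1] == 0:
        return positions[-1]
    return None


# Made-up ratings on a 0-10 scale for five stimuli along a V-Y trajectory.
positions = [0, 1, 2, 3, 4]
v_goodness = [10, 8, 6, 2, 0]
y_goodness = [0, 2, 4, 8, 10]
boundary = crossover(positions, v_goodness, y_goodness)  # 2.25 for these made-up ratings
```

A shift in this crossover position from one stimulus set to another is the kind of boundary movement that the later chapters attempt to predict.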

Other paradigms discussed will include labeling, reaction time, direct choice, ABX discrimination, and generation. It will be seen that, for a given letter pair and stimulus set, the boundary found by using different paradigms remains essentially the same. Another discovery is that different letter pairs, presumably involving the same attribute, show similar behavior in their placement of the interletter boundary, indicating that attribute behavior may be the important characteristic, and not merely the specific letters involved.

Some experiments will be briefly described which will show that the interletter boundaries can move around with a change in the range of the stimuli presented. A surprising result shown in this chapter is that when the interletter boundary for one letter pair is shifted by some manipulation of the graphical context, then the boundaries for other letter pairs involving that attribute are also affected. Letter pairs not involving that attribute remain unaffected and their boundaries unmoved. This implies that if the rules for the movement of attributes can be predicted, then the interletter boundaries for all letter pairs involving that attribute can be predicted.

The plasticity of the boundary obtained in such experiments is a primary concern in this thesis in that we are looking for a model to explain this behavior. Previously it had been shown that the plasticity phenomenon existed and that the direction of boundary shifts could be rationalized. Now we would like to be able to predict such movements in a more precise manner. The Range Frequency Theory, which will be described in the chapters following Chapter 5, is a promising model to explain contextual effects in character recognition.

1.3 OUTLINE OF RANGE FREQUENCY WORK

Our goal in this thesis is to account for contextual effects, analogous to those from graphical context, which occur in the goodness experiments performed. A most promising model in this regard is one based on the Range Frequency Theory, primarily developed by Parducci [150]. We hope to use this theory, with modifications, to account for the contextual effects experienced in performing goodness experiments with various different stimulus distributions.

This particular model has found some application in psychophysics in explaining some of the effects of context in experiments. Chapter 6 is devoted to laying out the principles of this very useful model. An understanding of this model is necessary for understanding the remainder of the work presented in this thesis. For this reason we give an historical review of the development of this model. We will follow it through various stages of development up to the so called "simplified" formulation of Parducci and Perrett [157]. This theory is based on the presumption that judgment is a compromise between two conflicting principles, a range principle, depending on the range of stimuli, and a frequency principle, depending on the relative frequencies and distribution of the stimulus set. These principles will be explained by example in order to give the reader a better understanding of their workings. We will examine a means by which predictions from one experiment to another can be made using the formulation of the simplified range frequency model.
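As a rough preview of the compromise just described, the sketch below computes a judgment as a weighted average of a range value and a frequency value. It assumes the commonly cited form J = wR + (1 - w)F, with R the position of a stimulus within the stimulus range and F its percentile rank in the stimulus set; the formulation actually used in this thesis is developed in Chapters 6 and 7 and may differ in detail. The function names, the weight and the stimulus values are hypothetical.

```python
from collections import defaultdict


def range_value(s, lo, hi):
    """Range principle: position of s within the stimulus range (0 to 1)."""
    return (s - lo) / (hi - lo)


def frequency_values(stimuli):
    """Frequency principle: mean percentile rank (0 to 1) of each distinct stimulus."""
    ordered = sorted(stimuli)
    ranks = defaultdict(list)
    for rank, s in enumerate(ordered):
        ranks[s].append(rank)
    n = len(ordered)
    return {s: (sum(r) / len(r)) / (n - 1) for s, r in ranks.items()}


def rf_judgments(stimuli, w):
    """Weighted compromise J = w * R + (1 - w) * F for each distinct stimulus.

    Assumes at least two distinct stimulus values, so that the range is nonzero.
    """
    lo, hi = min(stimuli), max(stimuli)
    freq = frequency_values(stimuli)
    return {s: w * range_value(s, lo, hi) + (1 - w) * freq[s] for s in freq}


# Hypothetical, positively skewed stimulus set: low values are presented more often.
skewed = [1, 1, 1, 2, 5]
judgments = rf_judgments(skewed, w=0.5)
```

With this skewed set, stimulus 2 has a range value of 0.25 but a frequency value of 0.75, so the compromise with w = 0.5 places it at 0.5. Setting w = 1.0 leaves only the range principle in effect, and for a uniformly spaced set with equal frequencies the two principles agree, so the weight has no effect.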

The following chapter, Chapter 7, will continue the treatment of the range frequency model. There it will be put in the context of the functional measurement viewpoint of Anderson [8]. The range and frequency principles will be covered on a more mathematical and less intuitive basis. We will arrive at a formulation of the RF theory identical to that which was obtained in a less rigorous manner in Chapter 6. Finally we will examine the important case of making predictions from a full range of stimuli to that of a partial range. This procedure has received relatively little coverage in the literature to date. With these basic tools, we will be able to make predictions from our goodness experiments to other goodness experiments as a means of testing the formulation of the predictive model.

Chapter 8 will describe experiments which will be used to test the model which we develop. First the philosophy of using the changes in goodness experiments as a test of a graphical context model will be discussed, followed by a discussion of the motivation for the particular choice of experiments which are to be analyzed. The selection of the experiments reflects a desire to investigate several different aspects. One of these is related to prediction to both partial and full range experiments. The other is the desire to investigate more than a single letter pair. We plan to derive parameters for our model from the data from experiments with one letter pair, V-Y, and to test them with data from experiments with another letter pair, C-F.

Most of Chapter 8 will be devoted to the description of the methodologies of the experiments themselves. There we will describe in detail the stimuli, subjects, procedure, and so forth, from each of the experiments. The actual results from the experiments will be presented in the form of goodness curves.

Chapter 9 deals with procedures for predicting across experiments, the means by which we will be testing our model. The prediction procedure to be used will be discussed in fine detail. We will be using the experiments described in Chapter 8 in this prediction process, predicting results from full range experiments to low and high range experiments, as well as other full range experiments.

The computations involved, and the derivations and implications of each of the important variables involved in the prediction process, will be explained. Several parameters of the model will be optimized, and the procedure and philosophy of this optimization will be provided. The chapter lays the foundation for the actual predictions to be made in Chapter 10.

Chapter 10 provides the actual main results from the thesis. In this chapter we will use the procedures of Chapter 9 in order to make predictions for a variety of paired files.

This will involve taking a goodness curve from one experiment and using it to predict the equivalent goodness curve for another experiment. Several variants of the range frequency model will be tried in making such predictions. The first will be a naive approach done in a traditional manner. Next we will try optimization of some of the model parameters. Finally we will make a very simplifying assumption and assume that only the range principle is in effect.

Chapter 10 will present many comparison plots for each of the variations of the models tried. These plots show a predicted goodness curve for a particular experiment along with the actual empirical curve which was obtained in that experiment. Next, comparisons will be made between the results obtained with each of the three variants of the model tested. We will compare the sum-squared error terms obtained for each method, as well as the boundary offsets for the predicted curves versus the empirical results. Some of the conclusions drawn from these results will then be discussed.

Lastly in Chapter 11, we will summarize the thesis, results and conclusions. We will consider their implications for the design of a character recognition machine. Finally we will indicate some possible areas for future research in this field of endeavor.

CHAPTER 2

CHARACTER RECOGNITION

2.1 INTRODUCTION

This chapter will deal with the topic of character recognition by both human and machine. We will discuss the problems of reading characters, both those printed by machine and those printed by human hand. Some of the approaches to the problems, from an engineering as well as a psychological viewpoint, will be covered. The capabilities and performance of current machines will be reviewed. Finally we will review some of the areas where character recognition technology might be usefully applied.

2.2 THE THREE "R's" - A STATE OF THE ART REPORT

Educators these days often stress the importance of the three "R's", Reading, wRiting and aRithmetic. This is part of the "back to basics" movement which has spawned such book titles as Why Johnny Can't Read ... [75]. Meanwhile computer scientists are expressing much the same concerns over the abilities of their computers.

The aRithmetic aspect seems to pose little problem for the computers of today. Even current low cost programmable calculators (functionally almost small computers) are capable of computations equivalent to those performed in high school and beyond. Bigger computers are becoming faster and faster, performing millions of arithmetic operations a second. The MACSYMA system [132] performs calculus at college level using such human methods as integration by parts.

The wRiting ability of computers is certainly also improving quite a bit, with new technologies such as ink jet printing, laser printing and direct xerography available. High density multiple pass matrix printers can do a tolerable job at reproducing handwriting, Farsi (an Arabic script) and Chinese characters, not to mention standard typefonts. Text formatting programs such as NROFF [146] allow almost complete specification of format, in some systems including a choice of a wide variety of intermixed typestyles. Line printers today can print thousands of lines per minute, while computer output on typewriter-like daisy wheel printers is becoming commonplace. All of these innovations are part of the goal to make the computer output easier for the human to read and to understand. Otherwise we might have to start learning to interpret 1's and 0's directly. It's a question of whether we talk the computer's language or the computer talks ours (even speech output is currently becoming more feasible).

In general, so far, it seems to be a much easier task to have the computer communicate to us graphically on our terms than it is to teach the computer to read our output. The Reading ability of computers is improving, but is still not up to its other abilities. In this realm there seems to be a much larger "communication gap". Computers can communicate very well with their own kind but when it comes to interacting with humans, they still prefer communication on their own terms. They understand buttons and toggle switches very well, and the punched card, paper and magnetic tape media are also very much to their liking. The now ubiquitous bar codes are a delight for them to read, if still a source of puzzlement to most humans. Direct keyboard input is also quite acceptable.

In the case of optical reading of characters however, computers are a little less advanced. If they must read alphanumerics they prefer typed or printed input, preferably in one of their own stylized type fonts such as E13B (the Magnetic Ink font found on bank checks), OCR-A or OCR-B. Some manufacturers have even gone to the extreme of having a tiny bar code typed below the text in order to facilitate computer recognition [144]. Figure 2.1 below illustrates OCR-A, one of the most common machine oriented alphabets.

[Figure: the OCR-A upper case alphabet and the numerals 0 through 9]

FIGURE 2.1: OCR-A, A COMMON MACHINE ORIENTED TYPE STYLE.

Although many machines indeed do have a much broader reading ability than a single type font, it is usually at great expense (currently on the order of 50 thousand to several million dollars). Those machines that do read handprint often force writers to highly constrain their printing and quite likely read only numeric characters. Thus the balance is still much more toward the convenience of the computer than toward that of the human. The computer age will truly have arrived when the computer-human interaction problem has been solved, partially through better readers. Then there will no longer be a need for a book entitled "Why My Computer Can't Read". The mythical "Johnny" still maintains the edge in this important category.

Using a functional attribute based theory of character recognition as a foundation, this thesis is geared toward helping to close this reading communication gap in some small way. It will deal with techniques for incorporating graphical context into the recognition process and hopefully lead to improved recognition rates, comparable to those of the human computer. The next two sections will outline some of the problems to be faced in reading machine printed and handprinted characters.

2.3 MACHINE PRINTED CHARACTER RECOGNITION - THE PROBLEM

The task of computer reading of machine printed characters, while it may appear to be a simpler problem than handprinting, is in the general case somewhat difficult. In addition to the many typewriter styles available, there are currently in excess of some 20,000 printed typefaces in existence [105]. This number may be expected to grow with the advent of new technologies, where fonts can be stored as parameters in computer memory [50] instead of in metal.

Within printing there are many diverse families of type, many sizes and many weights, all with different, often subtle, special characteristics. A recognition algorithm which works for a Bodoni Light typefont may not work for Bodoni Bold, particularly with a template matching scheme.

With typed material there is often the problem of imperfect, broken, smudged, or faint characters, due to the nature of the process. Not everyone types in OCR-A using a carbon ribbon on OCR bond paper. There may also be a further degradation of the original image due to Xerox or carbon copies, if these are to be used.

In printed character recognition, many of the same problems arise. In addition to those just mentioned, there may be touching or misaligned characters. Serifs (the bulges at the ends of strokes in printed characters) may cause problems with some algorithms as adjacent serifs merge together. Characters also can overlap somewhat on the x dimension but this is usually not so in the case of typed characters. Thus, even though we think of printed characters as discrete entities, sometimes there is the problem of segmenting one from another.

Print quality is indeed an important factor in the recognition process; some discussion in this area can be found in [32, 21.7].

There are many machines available to recognize one or a small number of fonts such as OCR-A or B or other common typing fonts such as Courier 72 [59]. In these cases, where the within letter category variability is low, template or simple feature matching schemes can provide low error rates. Template matching schemes consist of comparing the unknown image to the various templates of the alphabet until a best match or overlap is obtained. In other similar schemes the unknown character is viewed through a peephole mask. Different combinations of peepholes in which the unknown character is visible indicate different letters. The problem is partially one of tailoring the machine to the application. It is important to know the consistency of the input material. If the input can be controlled, the simple reader may be the most reasonable solution. In other more uncontrolled environments, such as postal applications, the broader approach is needed.
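The template matching scheme just described can be illustrated with a minimal sketch. The 5x5 templates, the pixel-agreement score, and the function names below are assumptions made for illustration only; they are not taken from any of the commercial machines cited.

```python
# Hypothetical 5x5 binary templates ('#' = ink, '.' = background).
# Real readers would use a much finer grid and a full alphabet.
TEMPLATES = {
    "I": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def overlap_score(image, template):
    """Count the pixel positions on which the image and the template agree."""
    return sum(
        1
        for image_row, template_row in zip(image, template)
        for a, b in zip(image_row, template_row)
        if a == b
    )


def classify_by_template(image, templates=TEMPLATES):
    """Return (label, score) for the template with the highest overlap."""
    best = max(templates, key=lambda label: overlap_score(image, templates[label]))
    return best, overlap_score(image, templates[best])


# A "T" with one spurious ink pixel, standing in for a smudged character.
noisy_t = ["#####",
           "..#..",
           "..#..",
           ".##..",
           "..#.."]

print(classify_by_template(noisy_t))
```

A scheme of this kind tolerates a small amount of noise but has no notion of stroke weight or style, which is why a template set built for one typefont may fail on a bolder variant of the same family.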

In order to achieve more versatile machine performance, many schemes have been tried. In some cases the approach is to store the great variety of templates and then, using some efficient architecture, compare against them all. Again this is not a really general approach and does have its limitations. A feature approach of some sort is the usual tack. The presence or absence of certain distinctive features determines the character identity. A desirable property of features is relative invariance among members of a single letter class, together with relative variability from class to class. Many features are geometrical or topological in nature, such as height to width ratio or the presence of closed loops. Some have tried various mathematical transforms of the characters, such as the Fourier Transform, in the hope that the transformed feature space would yield better separation of the characters. A danger with this approach is that it may be merely shifting one's ignorance around from one domain to another.
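As a hedged illustration of the two geometrical and topological features named above, the following sketch computes a height to width ratio and counts closed loops in a small binary image. The image encoding ('#' for ink), the function names, and the use of a 4-connected flood fill are assumptions chosen for brevity, and the image is assumed to contain at least one ink pixel.

```python
def height_to_width_ratio(image):
    """Height-to-width ratio of the bounding box of the inked pixels."""
    rows = [r for r, line in enumerate(image) if "#" in line]
    cols = [c for line in image for c, ch in enumerate(line) if ch == "#"]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return height / width


def count_closed_loops(image):
    """Count background regions completely enclosed by ink."""
    h, w = len(image), len(image[0])
    # Pad with a background border so all outside background is one region.
    grid = ["." * (w + 2)] + ["." + line + "." for line in image] + ["." * (w + 2)]
    seen = set()

    def flood(start):
        # Iterative 4-connected flood fill over background pixels.
        stack = [start]
        seen.add(start)
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < h + 2 and 0 <= nc < w + 2
                if inside and (nr, nc) not in seen and grid[nr][nc] == ".":
                    seen.add((nr, nc))
                    stack.append((nr, nc))

    flood((0, 0))  # mark the outer background first
    loops = 0
    for r in range(h + 2):
        for c in range(w + 2):
            if grid[r][c] == "." and (r, c) not in seen:
                loops += 1  # an unvisited background region is an enclosed hole
                flood((r, c))
    return loops


letter_o = ["###", "#.#", "#.#", "###"]
digit_8 = ["###", "#.#", "###", "#.#", "###"]
letter_l = ["#..", "#..", "###"]

for name, glyph in (("O", letter_o), ("8", digit_8), ("L", letter_l)):
    print(name, height_to_width_ratio(glyph), count_closed_loops(glyph))
```

The loop count stays the same when a character is stretched or slanted slightly, which is the kind of within-class invariance the text asks of a good feature; the ratio, by contrast, shifts with such distortions.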

Within a given typefont there is a certain graphical consistency of form. Techniques relying on this graphical consistency of characters may be of use in improving recognition, and will be described in more detail in the following chapter.

2.4 HANDPRINTED CHARACTER RECOGNITION - THE PROBLEM

This section will deal with the problem of recognizing handprinted characters. It has been estimated [178] that 90 percent of all data processed by computer originates in handprinted or handwritten form. Capturing data at the source could have significant economic benefits and handprint character recognition could be very important in achieving them. Unfortunately there is still a great deal of progress to be made in this area. In this section we will discuss some of the difficulties associated with handprinted character recognition and some of the possible solutions.

Usually the term "handprinted characters" refers to upper case block style printing of letters or Arabic numerals. Nevertheless many people print in mixed upper and lower case and occasionally lapse into script fonts. Despite this, we will, for the purposes of discussion, be considering handprinting only as defined above.

In handprinting, even if only in the block style, there are potentially an infinite number of forms. Even within the letter forms of a single individual there is often a great deal of variety, due to the printer's mood, speed, writing instrument, and other factors. From my own observations, most people are unaware even of the stroke order that they use in forming characters unless you give them a pencil and let them draw the characters slowly. This is somewhat akin to asking someone to describe the motions of tying a shoelace. Suen [197] points out that there are over thirty different handprint models being taught in the elementary school systems of North America. It is little wonder that a great variety of form exists.

Wright [214] has done extensive work on the writing of Arabic numerals and, for instance, lists some 162 different styles of making a "2". He distinguishes some 6 different tops, 3 different middles and 11 different bottoms for this one basic number. There are several studies documenting the multiple varieties of handprinted character forms [60]. For instance Rengger and Parks [170] list some 142 different common varieties of the 36 letters.

In light of the great variety in handprinting it should come as no surprise that computers have a difficult time dealing with such material. There are relatively few full alphanumeric handprint readers commercially available, although much research has been performed on the subject. Most of the available devices are geared toward recognition of numerals and perhaps a few special symbols [59]. Many designers thought that algorithms developed for printed characters could, with slight modification, be adapted to recognizing handprint. The tremendous intraletter variability precluded the success of such modifications.

The tack most have taken, in the event of failure to recognize the unconstrained variety of handprint, is to constrain the form of letters used. If the machine cannot read what we print then we will just have to print what the machine can read. Several standard handprint sets have been developed. The most well known and widely used in this country is the ANSI (American National Standards Institute) Set [6], illustrated below in Figure 2.2.

[Figure: the ANSI handprint character set, showing the numerals, the upper case letters, and special symbols such as hyphen, period, and comma]

FIGURE 2.2: THE ANSI CHARACTER SET FOR HANDPRINTING [204].

Such sets have been designed in order to make the job easier for the machine readers. Some of these forms may look strange but they do serve a purpose. For instance, the tail on the S serves to distinguish it from 5, an easily confused pair. In the event that only alphabetic characters were possible, the tail on the S could be dropped in favor of the more natural form. Along the same lines, the special form of the letter "oh" serves to distinguish it from zero.

Another form of constraint is that of special forms in which to print the characters. These may involve boxes, dots to write around, or subsections of boxes to write through [13, 192]. A collection of papers dealing with the experiences of users in dealing with various types of constraints was published recently [202]. Other studies on the subject can be found in [60]. The use of constraints buys accuracy but not without some cost in throughput. A study by Apsey [13] showed a one-third degradation in throughput using the ANSI constraints versus unconstrained handprint. Another more highly constrained set reduced throughput by two thirds. He concluded that, "Although minimally constrained handprint data entry applications continue to enjoy some success, throughput losses associated with highly constrained fonts and the availability of other lightly constrained methods with similar performance argue against severely constrained handprint fonts as a viable data entry method". Thus we see that performance depends on many factors. Constraints may be appropriate in some applications, while in others, unconstrained input may be unavoidable (e.g., postal applications).

Goff [82] has pointed out that error rates are highly dependent on the printing environment. He says that currently, in a highly controlled environment, commercial error rates of 0.05% might be expected, compared with 0.5% to 2% for a semicontrolled environment, and from 2% to 15% in an uncontrolled environment. Reporting on another relatively flexible commercial system, Griffith [86] claims an error rate of less than 0.1% with a reject rate of 5%. A recent report by Iwata et al. [103] claimed an error rate of only 0.1% and remarkably a reject rate of only 0.3%. In these cases the input was constrained, though to what extent is not greatly elaborated. One must be very careful in interpreting error rates. A low error rate accompanied by a high reject rate may be a meaningless result. Also, as we have seen, the error and reject rates are highly dependent on the constraints imposed upon the printer. For example the low reject rate of Iwata et al. [103] may indicate that the constraints were fairly rigid.
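The caution about interpreting error rates can be made concrete with a small sketch. It assumes, for illustration, that both rates are expressed as fractions of all characters presented to the machine; the cited reports may use other conventions (for example, errors as a fraction of accepted characters only). The function name and the sample data are hypothetical.

```python
def recognition_rates(outcomes):
    """Return (error_rate, reject_rate) as fractions of all characters presented.

    outcomes is a list of (intended_label, machine_label) pairs, where a
    machine_label of None means the machine rejected the character.
    """
    total = len(outcomes)
    rejects = sum(1 for _, machine in outcomes if machine is None)
    errors = sum(
        1 for intended, machine in outcomes
        if machine is not None and machine != intended
    )
    return errors / total, rejects / total


# Hypothetical run: 94 correct reads, 1 substitution (S read as 5), 5 rejects.
sample = [("A", "A")] * 94 + [("S", "5")] + [("B", None)] * 5
print(recognition_rates(sample))

# A machine that rejects everything makes no substitution errors at all.
print(recognition_rates([("A", None)] * 10))
```

The second call shows why the two figures must be read together: an error rate of zero is trivially attainable if the reject rate is allowed to approach 100%.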

The standard of performance, as viewed in this thesis, will be that of the literate human. This brings us to the question of how well humans can recognize handprinted characters.

It is important to have some criterion for success when evaluating the performance of handprinted character recognition algorithms. In this thesis we will be comparing performance against what we consider the ultimate recognizer, the literate human. However, given the task of classifying isolated, unconstrained handprinted characters, even the sophisticated human may not perform as well as we might expect. In the classic study by Neisser and Weene [140], individual subjects had error rates ranging from 3.5% to 5.1%. Even the subjects' pooled best guess still resulted in a 3.2% error rate. The error rate is there because of the occurrence of ambiguous characters, which may be as easily called one letter as some other letter. The Neisser and Weene study shows that such characters do often arise. However, when encountered in our daily lives, they are hardly noticed, being mostly resolved by the context of the situation. In a constrained handprint case judged by humans we might expect an error rate of almost 0% on characters.

In another study, Suen and Shillman [201] studied the U-V confusion pair, one of the most prone to ambiguity effects. This in fact was one of the most troublesome confusions in the Neisser and Weene study. Such ambiguous characters and confusion pairs form the basis of the theory of character recognition upon which this thesis is based. Using U's and V's from Munson's data set [135, 160], Suen and Shillman compared human performance to that of their algorithm based on a functional attribute theory of character recognition. On the average their algorithm performed correct separations with only a 3.74% error, while the individual human error rate was 8.7%. Applying a majority vote to derive the human decision resulted in a 3.4% error rate. Also noteworthy was the fact that the error trends of the algorithm were similar to those of the human performance, e.g., more V's were called U's than vice versa in both cases. Again this study demonstrated the fallibility of judgment, both for human and machine, in the absence of context.
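The difference between an individual error rate and a pooled or majority vote error rate, as reported in the studies above, can be sketched as follows. The judgments are invented for illustration and do not reproduce any data from the cited studies; the function names and the tie-breaking rule (first label encountered wins) are likewise assumptions.

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label among the judges; ties go to the first one seen."""
    return Counter(labels).most_common(1)[0][0]


def human_error_rates(judgments, intended):
    """Return (mean individual error rate, majority-vote error rate).

    judgments[i] holds the labels that each judge gave to character i, and
    intended[i] is the letter the writer meant to produce.
    """
    n_chars = len(intended)
    n_judges = len(judgments[0])
    individual_errors = sum(
        label != truth
        for labels, truth in zip(judgments, intended)
        for label in labels
    )
    pooled_errors = sum(
        majority_vote(labels) != truth
        for labels, truth in zip(judgments, intended)
    )
    return individual_errors / (n_chars * n_judges), pooled_errors / n_chars


# Invented example: four characters, three judges each.
intended = ["U", "V", "V", "U"]
judgments = [
    ["U", "U", "V"],
    ["V", "V", "V"],
    ["U", "U", "V"],  # an ambiguous V that most judges call U
    ["U", "V", "U"],
]
print(human_error_rates(judgments, intended))
```

Pooling removes the scattered disagreements of individual judges, but a genuinely ambiguous character, such as the third one here, is still misclassified by the majority.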

Another recent study by Niemann [141] concluded that the human visual system performed a better job on handprinted numeric characters, while it was inferior in recognizing single font printed characters degraded by additive noise. In investigating classification of such handprinted pairs as 5-S, 8-B, 6-G, and O-D, Niemann found that his algorithm had error performance similar to that of humans but used different strategies. This contrasts with the Suen and Shillman results which were based on psychologically derived rules rather than mathematically derived rules.

In this light we can look at some of the quoted machine error rates with caution. For instance, in Figure 2.3 below are shown some of the substitution errors from a test of the GRAFIX-I handprint recognition system [85].

FIGURE 2.3: SOME SUBSTITUTION ERRORS FROM GRAFIX-I HANDPRINT RECOGNITION SYSTEM [85].

This system, an earlier version of that reported by Griffith [86], performed with a 6% reject rate and a 0.4% error rate. The question arises whether humans would make the same errors. In an informal test with several subjects, few of these machine error cases posed any difficulty. This might lead us to believe that, for this data set, human performance is still an order of magnitude above the machine algorithm used. It also indicates that the testing data set was somewhat constrained input, as the human error rate on it is relatively small.

In unconstrained cases of handprinting, there is the problem of defining what letter a given character actually is. In the cases considered here, the correct letter is defined by the intention of the character generator. It is not inconceivable that the generator, upon viewing the character shortly thereafter, might classify it as a different letter.

An alternative definition might be the best choice among a panel of literate humans, familiar with the alphabet in question. For example, in Figure 2.4 below, "U-T is in the eye of the beholder", if not in that of the generator (intended as V-Y).

FIGURE 2.4: A "V" AND A "Y" FROM NEISSER AND WEENE'S DATA [140].

This latter form of definition we would consider as the real test of a character recognition algorithm. We might consider the character recognition task done if, given the same set of data, both humans and machine algorithm arrive at the same classifications.

There has been much research in the field of handprinted character recognition. As we have seen, the challenge is great and many approaches have been tried. The problem of handprint recognition is difficult enough that it has in fact become a testing ground for pattern recognition algorithms in general. In order to facilitate such tests and to compare different algorithms, the I.E.E.E. Computer Society [160] has available several standard character data bases, both for handprinted and machine printed characters. There are four such data bases available for handprinting, two of which are numeric only. The alphanumeric font data sets are binary images, one quantized to a 24x24 [135] and the other to a 12x12 [96] resolution. This image resolution may actually not be fine enough for some recognition. Another problem is that the binarization may have been done at the wrong threshold for good recognition results. In fact, adaptive thresholding is often employed commercially to aid in recognition. If a character is rejected, sometimes it is rescanned at a different binary threshold; this often leads to a recognizable character. Thus care is needed in the image preprocessing if successful recognition is to take place.
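The rescanning strategy mentioned above, in which a rejected character is binarized again at a different threshold, might be sketched as below. Everything here is an assumption made for illustration: the grayscale convention (larger values are darker ink), the list of thresholds, and in particular the toy recognizer, which merely stands in for a real classifier that returns None on rejection.

```python
def binarize(gray, threshold):
    """Turn a grayscale image (rows of ints, larger = darker) into 0/1 pixels."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]


def recognize_with_rescan(gray, recognizer, thresholds=(128, 96, 160)):
    """Try each threshold in turn until the recognizer accepts the character.

    Returns (label, threshold) for the first acceptance, or (None, None)
    if the character is rejected at every threshold.
    """
    for threshold in thresholds:
        label = recognizer(binarize(gray, threshold))
        if label is not None:
            return label, threshold
    return None, None


def toy_recognizer(binary):
    """Hypothetical stand-in: accept a clean vertical stroke as 'I', else reject."""
    middle = len(binary[0]) // 2
    has_stroke = all(row[middle] == 1 for row in binary)
    is_clean = sum(map(sum, binary)) == len(binary)
    return "I" if has_stroke and is_clean else None


# A faintly printed stroke: ink value 110 on a background of 20.
faint_i = [[20, 110, 20]] * 3
print(recognize_with_rescan(faint_i, toy_recognizer))
```

With the first threshold the faint stroke vanishes entirely and the character is rejected; lowering the threshold recovers it. This is exactly the information that is lost when a data base is distributed only in already binarized form.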

We have seen that, in the isolated letter recognition case, humans do make errors to a surprising extent. However this effect does not usually show up in our daily lives since we utilize some form of context in recognizing characters. This essentially reduces the error rate to nil. In fact many commercial OCR systems contain some reject processing facility. Characters rejected by the machine algorithm are passed on to a human operator for classification, usually displayed in the context of their surrounding characters [177].

This whole area of context will be dealt with in detail in the following chapter. The graphical context, or stylistic consistency of characters, is the topic of this thesis. The use of this form of context is particularly applicable to the recognition of handprinted characters.

2.5 ENGINEERING LITERATURE IN CHARACTER RECOGNITION

In a field such as character recognition, which has been around for so long, a vast amount of literature has accumulated. The reader is therefore referred to several of the many surveys of the field which have appeared. This section will serve more as a guide to the literature.

Among the earlier surveys of the field is the report by Stevens [194] tracing the earliest history of OCR through 1961 with numerous references. A later survey is provided by Harmon (1972) [90]. The bibliography by Shillman et al. (1974) [187] contains references in both the engineering and psychological domains.

Ullmann's book [208], though dealing with pattern recognition generally, is illustrated primarily in terms of character recognition applications with an extensive reference list. A follow up chapter (1976) [209] deals specifically with character recognition in great detail and again contains many references. Two very recent surveys, one by Suen (1978) on character recognition [199], and one by Suen, Berthod, and Mori (1978) specifically on handprint recognition [200], bring us up to date and again contain an extensive listing of references in the field. Aside from these, each of the series of bibliographies on image processing by Rosenfeld, for example [170], contains a section dealing with character recognition. In addition, other sources are the proceedings of pattern recognition conferences and workshops, for example [163, 164], which usually have one or more sessions devoted to character recognition.

In order to be commercially viable, character readers should have both speed and accuracy. Many laboratory techniques cannot meet these requirements. Nonetheless there are quite a variety of commercial machines available. The DATAPRO Research Report on Optical Readers [59] provides the most comprehensive listing available. It gives an introductory overview of OCR applications, a summary of the various recognition techniques and a user survey of various manufacturers' machines. Finally there is a comprehensive list of manufacturers and individual models, annotating the capabilities, recognition techniques, special features, price, number sold, and so forth for each machine. A recent report [59], dated May 1977, lists a total of 24 manufacturers of OCR machines. Of these 14 were offering a handprint option in some form. In contrast, their survey of September 1970 [56] showed 20 makers with only 8 offering a handprint option. Thus we see that the field is growing, albeit slowly. More and more it is being recognized that handprint is a valuable addition. Nonetheless the recognition techniques for handprinting still leave much to be desired.

The high cost of the available machines is also hindering wider use of OCR technology. Some perspectives on the economics of incorporating OCR input versus other means are available from Schantz [180] and also from Bush and Weaver [36], while [32] also deals with this and many other practical considerations such as paper properties, print quality, document handling, and so forth. A current view of the commercial OCR field can be obtained in OCR Today magazine [143], the official publication of the OCR Users Association. This journal contains articles by both users and manufacturers on some of the more practical aspects of OCR. Other sources of information on the industry are provided by reports such as Optical Scanning News [145], as well as others [144, 145]. As prices continue to fall and recognition techniques improve, we may expect to see more use being made of commercial character recognition technology.

2.6 PSYCHOLOGICAL LITERATURE IN CHARACTER RECOGNITION

Character recognition is a common meeting ground for many disciplines, among them engineering, computer science, artificial intelligence, linguistics, education, paleography, typography, and psychology. This last field of psychology is one with particular relevance to this thesis and one which we will review briefly in this section.

The human capacity for such tasks as recognizing faces or scenes has been intriguing to many psychologists, among them Neisser [139], who called the problem of pattern recognition ubiquitous in psychology. Letters have often been used as stimuli in psychological experiments for testing many diverse human abilities such as memory, cognition, perception, and others.

Character recognition, as well as pattern recognition in general, has been approached in often quite similar manners by both engineers and psychologists. In both disciplines there seem to be two major approaches to pattern recognition, template matching and feature analysis. The template matching theory is usually dismissed because of its inconsistency with the apparent human ability to generalize, for instance to recognize a slanted, rotated or magnified version of a letter or object almost as well as the letter or object itself. Reed [169] has pointed out however that many psychological experiments utilize only one form for each category, and indeed a template match may be the most efficient strategy in some of these special situations. For example, Neisser [139] would even allow template matching in such specialized visual pattern recognition tasks as fingerprint matching. However most psychologists agree with some sort of feature approach to character recognition.

For example, Neisser's model [139] postulates a hierarchy of feature analyzers all working in parallel and feeding their weighted outputs to still higher analyzers until enough evidence has accumulated in favor of one particular class. In support of feature analysis he cites work with stopped images on the retina, where parts (or features) of letter images appear and disappear as units rather than the whole character disappearing. He also points out that small details can have a great influence on pattern categorization and template matching is not sensitive to that sort of variation.
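The idea of feature analyzers feeding weighted outputs to higher level analyzers can be given a rough computational sketch. This is not Neisser's formulation: the two-level structure, the feature names, the weights, and the decision threshold are all hypothetical and chosen only to show weighted evidence accumulating in favor of one class.

```python
# Hypothetical weights from feature analyzers to letter analyzers.
LETTER_WEIGHTS = {
    "A": {"diagonal": 1.0, "horizontal": 0.5, "closed_loop": 0.5},
    "H": {"vertical": 1.0, "horizontal": 0.5},
    "O": {"curve": 1.0, "closed_loop": 1.0},
}


def letter_evidence(feature_outputs, weights=LETTER_WEIGHTS):
    """Weighted sum of feature analyzer outputs for each letter analyzer."""
    return {
        letter: sum(w * feature_outputs.get(name, 0.0) for name, w in links.items())
        for letter, links in weights.items()
    }


def decide(feature_outputs, threshold=1.0):
    """Return the best supported letter, or None if the evidence is too weak."""
    evidence = letter_evidence(feature_outputs)
    best = max(evidence, key=evidence.get)
    return best if evidence[best] >= threshold else None


# Feature analyzer outputs in the range 0 to 1 (assumed, for illustration).
print(decide({"curve": 0.9, "closed_loop": 0.8}))
print(decide({"horizontal": 0.4}))
```

In this sketch a weak horizontal stroke alone supports both "A" and "H" slightly but neither enough to reach a decision, which loosely mirrors the notion of waiting until enough evidence has accumulated.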

Gibson's model [80] also relies on a feature approach to grapheme perception, one very similar to that developed for phonemes in speech [41]. She developed a table for a particular alphabet describing each letter by the presence or absence of certain distinctive features, such as straight (horizontal, vertical, or diagonal) segments, curves, intersection, or symmetry. As a test of the model she and her colleagues postulated confusion matrices based on the set of features used, and compared them to those actually obtained using human subjects. There was some correlation but the conclusion was that further work was needed [80].

Another important factor is the spatial relationship between features and global properties, which models such as Gibson's do not emphasize. Grammatical descriptions fall into such structural models. The emphasis here is on how the parts fit together rather than what parts are present. Reed's chapter [169] on structural descriptions provides a good survey of this aspect of psychological pattern recognition.

Reed's book (1973) [169] provides perhaps the best overview of the pattern recognition field from the psychologist's viewpoint. He reviews most of the current models, mostly in terms of visual pattern recognition. Other relevant books in the field are those of Neisser [139] on cognitive psychology, Gibson [80] on perceptual learning and development, and Dodwell [62] on visual pattern recognition with a more physiological view.

A functional attribute theory of character recognition, developed by Blesser, Shillman, Kuklinski, Cox, Ventura and Eden [28, 29], contains aspects of the feature analysis approach, but uses psychologically based features called functional attributes to distinguish letters from each other. This theory, upon which this thesis is based, will be described in detail in Chapter 4.

2.7 SOME USES OF CHARACTER RECOGNITION

We have seen some of the problems and current capabilities in the field of character recognition. As the reading ability of machines continues to improve, we may expect to see an increase in their usage in many aspects of both business and our personal lives. This section will touch upon some of these uses, current and future, of character recognition technology.

Character readers can play a large role in solving the so called "Data Entry Problem" [100, 184]. More and more data is being stored and manipulated via electronic computer. While the amount of this data to be processed is huge and mounting daily, processing speeds of computers are growing faster and faster. However there is a bottleneck in getting the data from its source into the computer in an economical manner. It has been estimated that 50% of the cost associated with a modern data processing center is from data entry, mostly done by keypunching [35]. To a large extent character readers could replace labor intensive methods such as keypunching, and would compete with other data entry methods such as key to disk. Errors in the translation step via keypunching can be relatively high (on the order of 3% [180]), causing a need for additional keypunch verification. The closer one gets to capturing the original data in machine form, the fewer errors and the more efficiency could result. With the increasing influx of computers into all phases of business, the information bottleneck is growing. Efficient encoding of such huge information sources as tax returns, bank checks, credit card transactions, and mail orders into computer compatible form could be of great economic benefit.

There are many applications for truly versatile character readers, some of these in areas where one might not traditionally expect them. As an example, one of these areas is the library. One can envision the library catalog of the future as a CRT terminal where the user types or even prints or writes the name of a book, author or subject. A central computer then provides the information that the user previously found on cards. Many libraries now type their card files into computer data bases which then generate the physical card. Nonetheless there are huge backlogs of cards which have been generated in the past in a variety of different typefonts and different formats. Converting these into computer form in a system such as the Library of Congress or a large university library is a monumental task and one currently under consideration. In the future the entire texts of books will be in computer files. Again there will be the problem of scanning the information into the computer, especially the older volumes (previous to computer typesetting), here complicated by the inclusion of figures which also would have to be recognized and handled in a different manner [36]. Even now newspapers have been making use of OCR technology in preparing stories off line on typewriters, scanning them in, editing them, and then sending the finished product to computer typesetters. Schantz [179] foresees significant applications in the areas of credit transactions, banking, government health care, and retail department stores.

Another usage of character recognition is within the postal systems of the world. First there is the mail sorting problem, that of sorting the incoming mail into the correct output bin according to the address or some form of zipcode. Here there is the problem of reading innumerable different typewriter styles, not to mention the infinite variety of handprinting. Here we might mention that in many foreign countries typewriters may not be as common as in this country and the ability to read handprint is indeed foremost. In the postal environment, errors in destination are very costly. Nonetheless efficiency of the postal system must be improved if it is to remain a viable means of communication. OCR is one way of increasing efficiency.

Looking to the future in postal service, electronic mail is on the horizon. The United States Postal Service is now studying the problem and OCR plays a large role in their current plans [72]. It may be more efficient to use OCR to convert the text where possible and transmit this text information rather than to digitize the whole page with a facsimile process and send it.

One of the earliest goals of OCR research was as an aid to the handicapped in the form of a reading machine for the blind, that is, a recognizer coupled with a speech synthesizer. Much work has been done in this area both in the past [13, 130] and presently [119, 199]. Commercial machines are currently available from at least one group [118]. In such applications omni-font readers are desirable for maximum versatility. Currently such machines do not address the problem of handprinting, but such ability might be a valuable option. The uses of intelligent OCR appear to be legion. We have mentioned only a few.

2.8 SUMMARY

In this chapter we have seen some of the problems of reading characters, both machine and handprinted. We have surveyed some of the techniques and models used in both the engineering and psychological domains. Lastly we have seen some of the uses to which working character recognition techniques could be put. This chapter has dealt with character recognition in general. The next chapter will deal with the application of various forms of context as an aid to improving the performance of character recognition algorithms. Human utilization of contextual information is one factor which still gives humans an edge over current reading machines.

CHAPTER 3

CONTEXT IN CHARACTER RECOGNITION

3.1 OVERVIEW

One reason that humans are so good at the character recognition task is that they have the ability to utilize various forms of context as an aid to recognition. Letters are usually not perceived as single entities except in special circumstances, but rather as parts that make up syllables, which make up words, which make up phrases, sentences, paragraphs, and so forth.

This chapter will deal briefly with some of these forms of context. It will deal in more detail with the concept of graphical context or stylistic consistency, which forms the basis of this thesis.

What is context? In general, context is the situational setting within which something is judged. Toussaint [205] describes the effect of context as a phenomenon where "an entity Z is seen as one thing in context A and another in context B." It is this sort of context which concerns us, since letters are particularly susceptible to external influences.

There are many levels of context available to aid us normally in our decoding of a particular character. Usually we draw on this help unconsciously. Perhaps in a case where we are stumped by someone's handwriting, we would go back and perform a letter-by-letter analysis, comparing the letter style in one word with that in another. Human ability in this area of synthesizing contextual information is still quite a bit superior to that of machines, although attempts have been made to incorporate context where possible. A recent survey paper by Toussaint [205] covers the incorporation of contextual analysis into pattern recognition at many levels, particularly as it applies to the task of text recognition where the text is in the form of hand or machine printed characters. Another excellent survey of the most common contextual techniques applied in character recognition is found in the thesis by Fisher [74]. A listing of papers dealing with the utilization of context in character recognition is provided in the bibliography by Shillman et al. [187]. The forms of context to be discussed here are also covered elsewhere by Cox et al. [52] and Kuklinski [115]. Let us now consider some individual areas where context is applicable.

Since context acts at so many levels, there is often the question of where to begin an analysis utilizing context. Those approaches which start at the very lowest levels (e.g. individual letters, in text recognition) and try to form bigger and bigger units of meaning are called bottom-up approaches, while those starting at higher levels (e.g. semantics) are naturally called top-down approaches. Often a hybrid of both approaches is used.

3.2 THE POWER OF CONTEXT: AN ILLUSTRATION

We will now consider a relatively easy bottom-up analysis. The reader is asked to perform the straightforward task of counting the number of F's in the following sentence.

FINISHED FILES ARE THE RE-
SULT OF YEARS OF SCIENTIF-
IC STUDY COMBINED WITH
THE EXPERIENCE OF YEARS.

When this task was given to a random sample of fifty people at MIT, the following results were obtained.

TABLE 3.1

RESULTS FROM "F" COUNTING EXPERIMENT

                                  INITIAL COUNT (NO. OF F's)
SUBJECTS*                           3      4      5      6

50 people - TOTAL                  33      5      2     10
38 native Roman alphabet           29      3      2      4
13 native Non Roman alphabet        4      2      1      6

As can be seen, the most common answer is three, while close inspection will reveal that the correct answer is six. Several subjects gave their answer and then, upon rereading the sentence, changed their answer to a lower number. The strangely "hard to find" F's are hidden within the three occurrences of "OF" in the sentence. Dunn-Rankin [65] explains the failure to find the F's within "OF" as due both to the high frequency of the word, which tends to make one almost skip over it, and to the phonetic pronunciation of F as a voiced "V" sound instead of the characteristic unvoiced fricative sound. Curiously, subjects whose initial symbol system was not the Roman alphabet (in this case Chinese and Arabic native speakers) did much better at the task, probably because they were less familiar with the language habits, which forced them to analyze in a different fashion. Some other winning strategies in the task were reading the text backwards and looking at the shortest words first.
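The letter count discussed above can be checked mechanically. The following is a minimal sketch in Python; the function names are illustrative and are not part of the original experiment. It counts every F in the sentence and, separately, those that fall inside the word "OF".

```python
# The sentence from the counting task, with the line-break hyphens removed.
SENTENCE = (
    "FINISHED FILES ARE THE RESULT OF YEARS OF SCIENTIFIC STUDY "
    "COMBINED WITH THE EXPERIENCE OF YEARS."
)


def count_letter(text: str, letter: str) -> int:
    """Count occurrences of a single letter, ignoring case."""
    return text.upper().count(letter.upper())


def count_letter_in_word(text: str, letter: str, word: str) -> int:
    """Count occurrences of `letter` inside whole-word matches of `word`."""
    words = [w.strip(".,") for w in text.upper().split()]
    return sum(w.count(letter.upper()) for w in words if w == word.upper())


total_f = count_letter(SENTENCE, "F")
f_in_of = count_letter_in_word(SENTENCE, "F", "OF")

# Total F's, F's hidden in "OF", and F's that remain outside "OF".
print(total_f, f_in_of, total_f - f_in_of)
```

The number of F's outside "OF" equals the most common answer in Table 3.1, which is consistent with the explanation that the overlooked letters are those inside the word "OF".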

This informal experiment provides a very strong demonstration that it is often the high levels of context which are used. Individual letters are easily missed, even if one is searching specifically for them. The strong influence of the higher levels of context leads us to ignore such low level parts as letters.

3.3 HIGHER LEVELS OF CONTEXT

Given that there is some structure in the material being generated, there are some features and methods which may be usefully employed in contextual analysis. The amount of structure limits the amount of application. If the text is known to be English words for instance, there are certain rules to their formation. Studies have shown that when reading, humans seem generally not to recognize individual letters (e.g. [37]), but to use more of the word level clues, including such graphological properties as word shape. Phonology plays a role as well, as the words are sounded out. Related to this are spelling rules which tell us that certain letter combinations (e.g. "QR") are unacceptable. We saw some of these effects in the experiment above.

At the higher levels are semantic and syntactic constraints. Semantics deals with meaning and world knowledge, while syntax is concerned with the aspect of grammar and structure. Even though the phrase "EYE OF HIS BIRTHDAY" may be grammatically correct, that is, the various words are the correct parts of speech for their position (e.g., "EYE" is a noun in a position where a noun is expected), our knowledge of semantics precludes this as being the correct interpretation. The above phrase is syntactically but not semantically correct. With "EVE" substituted for "EYE", it makes both syntactic and semantic sense. In the phrase "THE E*E OF THE HURRICANE" (where * represents a character which is ambiguously a V or a Y), both semantics and syntax leave us undecided as to the meaning. EVE and EYE are both acceptable choices. In such a case we might have to appeal to still higher levels of semantics, or take a guess based on word probability.

Such syntactic and semantic context has not been used extensively in character recognition but has found more use in the field of speech recognition [109], where the analysis of the individual phonemes is not the usual procedure. In continuous speech analysis there is the serious problem of segmentation which may be more readily solved using a top-down approach rather than a bottom-up procedure.

An example application of syntactic constraints was in the work of Munson [135] with the recognition of handprinted Fortran code. The constraints of the Fortran programming language provided some aid to the letter recognizer.

Most of the work in the application of context to character or text recognition has occurred around the word level. Toussaint's review article [205], mentioned earlier, concentrates mostly on techniques at this level: dictionary, Markov or probability methods, or hybrid approaches. Another very good appraisal of this area is provided in the thesis by Fisher [74].

In the situation where the characters are in organized meaningful groups such as words, dictionary methods may be applicable. Given the output of the individual character recognizer, the words may be compared to entries in a dictionary. If no match is found, depending on the material, errors may even be corrected. Where the number of possible words is limited, a dictionary approach may be feasible, while in other cases the dictionary may be so prohibitively large as to be uneconomical in terms of money, time or computer memory required.
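The dictionary idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny dictionary, its relative word frequencies, and the function names are all hypothetical, and the recognizer output is assumed to be a list of letter alternatives for each character position.

```python
from itertools import product

# Hypothetical mini-dictionary with made-up relative word frequencies.
DICTIONARY = {"EVE": 0.2, "EYE": 0.8, "THE": 5.0, "OF": 4.0}


def candidate_words(alternatives):
    """Expand per-position letter alternatives into every possible string."""
    return ["".join(letters) for letters in product(*alternatives)]


def dictionary_lookup(alternatives, dictionary=DICTIONARY):
    """Return the dictionary words consistent with the recognizer output,
    ordered from most to least frequent."""
    matches = [w for w in candidate_words(alternatives) if w in dictionary]
    return sorted(matches, key=lambda w: dictionary[w], reverse=True)


# "E*E": the middle character is ambiguously a V or a Y.
print(dictionary_lookup(["E", "VY", "E"]))

# The middle character is ambiguously an H or an N; only one word survives.
print(dictionary_lookup(["T", "HN", "E"]))
```

In the first call both candidates survive the lookup, so the dictionary alone cannot decide and the ordering falls back on the assumed word frequencies. In the second call the dictionary resolves the ambiguity outright.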

The redundancy of the language in use provides yet another opportunity for contextual analysis. Many letters can be left out of English text yet it may still be quite readable. The possible letters for a particular position are strongly constrained by those letters which have gone before. A common approach in recognizing text utilizes the statistics of the language as represented by n-gram relationships. These techniques have to do with the probabilities of various letter combinations occurring, such as letter pairs (digrams) or triplets (trigrams). Much work has been done in this area and is well covered by Toussaint [205] and Fisher [74]. Positional dependent binary n-grams have also been successfully applied by Riseman and colleagues [71, 171, 172].

Hanson, Riseman and Fisher [89] have experimented with integrating an independent contextual post processor (CPP) into a regular character classifier. They point out that even relatively low individual character error rates can lead to very high word error rates. Using positional binary n-grams, they were able to reduce a 45% word error rate (9.7% character error rate) to a 2% word error rate with a 1% reject rate. Also their system allowed for feedback to the character analyzer from the CPP to make possible additional measurements on questionable characters. Another CPP approach with postal applications is represented in the recent work by Doster [63]. Fisher [74] deals in particular with postal applications and makes use of the redundancy in the city and state information and the zipcode.
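As a rough illustration of how positional binary n-grams can flag a questionable character, the sketch below builds binary digram tables from a word list and intersects the position pairs that fail. It is one simplified reading of the idea, not the procedure of the cited systems; the word list and function names are hypothetical.

```python
from itertools import combinations


def build_binary_digrams(words):
    """Map (word length, i, j) to the set of letter pairs seen at positions
    i and j (i before j) in any word of that length. Membership in the set
    plays the role of a binary (allowed / not allowed) table entry."""
    table = {}
    for w in words:
        for i, j in combinations(range(len(w)), 2):
            table.setdefault((len(w), i, j), set()).add((w[i], w[j]))
    return table


def suspicious_positions(word, table):
    """Return the positions shared by every failing digram check.
    An empty list means that no check failed."""
    n = len(word)
    failing = [
        {i, j}
        for i, j in combinations(range(n), 2)
        if (word[i], word[j]) not in table.get((n, i, j), set())
    ]
    if not failing:
        return []
    return sorted(set.intersection(*failing))


# Hypothetical four-word dictionary.
table = build_binary_digrams(["CAT", "CAR", "BAT", "BAR"])

print(suspicious_positions("CAT", table))  # every digram is allowed
print(suspicious_positions("CQT", table))  # failures point at the middle letter
```

Because only membership is stored, the tables record whether a letter pair can occur at a pair of positions and not how often it does, which keeps the check cheap. A position singled out in this way is a natural candidate for the additional measurements mentioned above.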

3.4 CULTURAL, HISTORICAL AND GENERATIVE CONTEXT

Cultural, historical and generative context are some of those forms of context whose effect is often subtle but nevertheless real. This section will attempt to give some background in these easily overlooked areas of contextual influence.

Our cultural background can often have some influence on our strategy of analyzing letters. This was pointed out earlier in the experiment of counting F's, where subjects whose native alphabet was non-Roman performed better. Cultural differences have long been noted in speech perception, for instance in the classic Lisker and Abramson study [127] of the perception of voiced initial stops. Likewise visual perception differences have been noted between different cultural groups; for example, the susceptibility to the Müller-Lyer illusion has been extensively studied (for example, see [171]).

Small apparent differences may have minimal meaning to one cultural group while being very distinct to another. Consider the two characters in Figure 3.1.

To most native English speakers the two characters shown probably appear unambiguously as the number 2. To Chinese viewers however, these are two very different characters, even though both are drawn in an almost identical manner, differing only in the slight extension of the left vertical above the center horizontal. It is this slight extension which distinguishes the two characters. Further examples of this type and more discussion of cultural context may be found in Cox et al. [52].

Historical context is related to cultural context, differing only in the time rather than the geographical dimension. The forms of our present day letters have evolved over the ages. Consequently, in analyzing them, they must be viewed in proper historical perspective. The words in Figure 3.2 may appear strange to us now but George Washington would have no trouble at all with them. What appears to us now as a scripty lower case "f" was actually the symbol for "s" in certain situations, even in cases where there was actually a very similar "f" used.

FIGURE 3.1: TWO CHINESE CHARACTERS EXHIBITING LITTLE DIFFERENCE TO WESTERNERS.

FIGURE 3.2: SEVERAL WORDS EXTRACTED FROM THE "DECLARATION OF INDEPENDENCE".

FIGURE 3.3: "EUROPEAN" 1 AND 7 CONTRASTED WITH THOSE OF THIS COUNTRY.

Wright [214] points out that the forms of our numerals have varied quite a bit over time. He envisions them as existing in a letter space and bumping up against each other. A change in one causes a subsequent change in another for the purpose of avoiding ambiguity. A modern example of this type of behavior, perhaps a little closer to cultural context, is that of the so-called European "1" in Figure 3.3.

In the context of various 1's of this country, the European "1" might be classified as a "7". Europeans avoid this problem by crossing the 7 with a horizontal bar in order to avoid ambiguity with "1".

The method by which a person generates characters may also affect how that person perceives characters. This idea has been used in the Eden analysis by synthesis [70, 67, 133] technique for analyzing cursive script, where the analysis process is guided by what possibilities of generation there are. In Chinese character generation this concept is taken to extremes. Formally there is a set order of strokes in making a particular ideograph. Although the three ideographs of Figure 3.4 may appear to us to be very similar, they have in fact very different meanings and are formed with different stroke sequences. The differences to Chinese viewers are strongly rooted in the stroke order.

FIGURE 3.4: CHINESE IDEOGRAPHS WHICH LOOK SIMILAR TO WESTERNERS.

FIGURE 3.5: TWO V'S ILLUSTRATING HOW THE METHOD OF GENERATION MAY INFLUENCE PERCEPTION.

FIGURE 3.6: AN E WITH INTERSTROKE LINES VISIBLE.

As pointed out earlier, there are many possible ways of forming a particular letter. Some of these methods may intrude on another letter's space more than others. It is conceivable that the human perceiver's way of generating characters may weight the decision toward one or the other as shown in Figure 3.5.

Many times in printing quickly, the pen or pencil is not lifted between strokes, resulting in heavier and fainter lines in the character, as in Figure 3.6. One might question whether the character in Figure 3.6 is an E or a B. A knowledge of the method of generation might be necessary to make the decision.

Another aspect of the generation context is the problem of left-handed people. Left-handed printers often form their letters differently from right-handers. For instance, they might tend to draw horizontal strokes, as in E, from right to left instead of from left to right.

Factors such as those just discussed could be very relevant to character recognition systems, especially to the design of online dynamic character recognition systems, where the stroke order information is of more importance.

3.5 GRAPHICAL CONTEXT

The concern in this thesis is mainly with an area of contextual influence known as graphical context. This form of context has to do with the stylistic consistency both between neighboring characters and within a character itself. These are called respectively intercharacter and intracharacter graphical context.

Consider for a moment the example below in Figure 3.7. Either a V or a Y interpretation is possible from a word point of view. The identification, however, can be inferred by examining neighboring characters. The middle horizontal stroke on the E defines the middle of the character, thus permitting the second character to be treated as artifact, and the character interpreted as V, even though the overhang is physically longer than that of the middle character in Figure 3.7. Intercharacter context, as illustrated below, can be of considerable use in resolving ambiguity.

In the printed character domain, a letter from one type font would look distinctly out of place in a set of letters from a different type font, and would violate the principle of intercharacter graphical context as illustrated in Figure 3.8a below. It might seem as much out of place as a Stephen Foster melody in a program of Bach works. In a similar manner, if the top of a letter had serifs while the bottom did not, this would violate the principle of intracharacter context (see Figure 3.8b).

(a) (b)

FIGURE 3.7: AN EXAMPLE OF THE USE OF INTERCHARACTER GRAPHICAL CONTEXT.

FIGURE 3.8: EXAMPLES OF THE VIOLATION OF INTRA AND INTER CHARACTER GRAPHICAL CONTEXT IN THE PRINTED CHARACTER DOMAIN.

Stylistic consistency is information which can be used in recognition. This type of context is almost universally available even when other, more commonly used forms are not. Despite this, in many situations it is ignored or even considered as some form of undesirable noise. There is also a large class of characters for which other sophisticated contextual techniques may be of little or no use. In the machine recognition of numbers, dictionary lookup, n-grams or word recognition techniques are usually all to no avail (except perhaps in structured numbers such as zipcode). Yet even here there is still the stylistic information available.

The question arises as to whether it is possible to characterize the underlying style of a particular font, whether machine or handprinted. Some attempts have been made on the generation end. Along these lines there is Eden's analysis-by-synthesis approach [67] which incorporates generative rules in the analysis of cursive script. Coueignoux [50] attempts to provide a parametric description of typestyle, while Cox, Blesser, and Eden [51] enumerate type style rules for a number of different type fonts. The use of grammatical approaches in pattern recognition has been growing and today is recognized as a substantial subset of the field. Likewise the application to character recognition has been growing. The purpose is to develop some form of a grammar of characters, often along the line of Chomsky's transformational grammar [43, 124] approach. Eden's pioneering work in this area dealt with a grammar for cursive script [67]. Other representative work involving some form of grammar of characters is provided for handprinted characters by Ali and Pavlidis [2, 161], for printed characters by Herrick [96, 97], and for Chinese characters by Rankin [168].

While many of these approaches are very general in that they specify a broad class of alphabets, the Cox, Blesser and Eden [51, 52] and Coueignoux [50] work emphasizes the particular qualities unique to a particular type font. For example, Cox et al. [51] have stroke style rules which specify the placement of thick and thin strokes within a printed character and marker rules which specify the placement of the serifs on the strokes. These apply to a range of different typestyles. Also given are dialect rules, additional rules peculiar to a particular typefont which, with the previous stroke and marker rules, completely specify the stylistics of the type font.

Coueignoux [50] specifies characters in much finer detail, with such parameters as the degree of curvature in the join between a serif and the stroke to which it is attached. These two approaches do indeed incorporate the concept of graphical context in that the stylistic consistency idea is tied up in the stroke and marker rules or parameters. This information can be used in the recognition phase as an aid to the recognizer.

Graphical context has thus far received relatively little attention as a means of improving performance in character recognition systems. The concept of graphical context plays an important role in a theory of character recognition developed by Blesser et al. [29] to be discussed in the following chapter. There we see specifically how graphical context fits into a functional attribute based character recognition system and how rules which discriminate between letters can vary as a function of graphical context.

Our emphasis on graphical context is not meant to downplay the importance of any of the other forms of context. Graphical context is one form of context and the contribution to be made here is only one step in our total understanding of the role of context in recognition systems. An approach which integrates context at all levels would make a powerful recognition system indeed, but such a system will probably remain in the future for some time.

CHAPTER 4

A FUNCTIONAL ATTRIBUTE BASED THEORY OF CHARACTER RECOGNITION

4.1 INTRODUCTION

The work of the proposed thesis will be studied for the most part in terms of a theory of character recognition proposed by Blesser, Shillman, Kuklinski, Cox, Eden, and Ventura [29]. This theory and its associated methodologies have exhibited a moderate amount of success in describing human perception of characters. For example, as mentioned earlier, Suen and Shillman [201] found a close agreement between human performance and their algorithm in recognizing unconstrained U's and V's using this approach. The details of the theory [28, 29] and methodologies [26, 188] can be found in the references.

A brief description, along with examples, of the workings of the theory will now be given in this chapter. A survey of the methodologies for this theory will follow in Chapter 5. An understanding of the basic ideas here is necessary for understanding the motivation and development of the main part of this thesis.

4.2 THE CONCEPT OF ATTRIBUTE

The theory we are discussing has, as its basis, the description of letters in terms of functional attributes, which are abstract descriptors of letters. In the theory, three levels of attributes are postulated: physical, perceptual and functional. The term attribute with its different levels is used rather than the more common term "feature" or "distinctive feature" since these latter terms do not make clear the level to which they are referring.

The meaning of the different levels of attributes is illustrated below in Figure 4.1. The presence of a particular level of attribute is indicated by a plus sign, while its absence is indicated by a minus sign. The attribute in this case is that of LEG which distinguishes the letters V and Y.

If one starts with the letter V and gradually extends the right diagonal stroke past the bottom intersection, eventually the point will be reached where most people would be more willing to call the character Y than V. This line extension past the bottom intersection we choose to call a "LEG".

Physical attributes are properties of the character which make up the physical image. In this class are such descriptors as lines, angles and intersections. On the physical level, Characters 2, 3 and 4 in Figure 4.1 have a physical LEG. In the case of Character 2, it is physically present, although too small to be read in normal viewing. The state of the attribute is determined by physical measurement for the presence of a line extension.

FIGURE 4.1: THREE LEVELS OF THE ATTRIBUTE LEG. (Rows: Physical LEG, Perceptual LEG, Functional LEG, Letter Label.)

FIGURE 4.2: CHARACTERS ALONG V-Y, C-F, AND U-H TRAJECTORIES.

Perceptual attributes are closely related to physical attributes but depend on whether the human perceives that the attribute is present. The critical question in this case is "Do you see a line extension?". In this sense Character 2 would not have a perceptual LEG, while Character 3 above would definitely have one since if asked whether one saw a LEG, the answer would probably be in the affirmative.

The functional attribute is more abstract, but really provides the underlying description of letters. The presence of this level of attribute can be determined by asking the question "Is the character V or Y?". Under this test, only Character 4 contains the functional attribute LEG. Although Character 3 has a readily perceived LEG, it is not enough for one to call it a Y.

The relations between the functional attributes, which specify the letter's identity, and the physical attributes, which are derived from the physical image, are called Physical to Functional Rules (PFR's). These rules are not constant and, according to the theory, may be modulated by external contextual factors. Kuklinski [115] has demonstrated that such PFR's do change as a function of an experimentally controlled contextual situation. The manner in which and the degree to which these rules are affected by context, especially by graphical context, is the primary interest of this thesis.

Let us now further examine the V-Y example. Consider the middle character in the V-Y series of Figure 4.2a. This character might be classified as either the letter V or the letter Y depending on whether or not the tail extending beyond the intersection is considered artifact. The theory relies heavily on such ambiguous characters. They form the boundaries between different letters in some hypothetical n-space. In the theory it is hypothesized that the functional attribute LEG is involved in the distinction between the two letters V and Y, as well as between C and F and between U and H as illustrated in Figures 4.2b and 4.2c. We formalize this distinction by the following rule:

$$\text{Functional LEG:}\qquad k/L \;\underset{\text{not present}}{\overset{\text{present}}{\gtrless}}\; q \tag{4.1}$$

that is, if the ratio of the tail, k, to length, L, is greater than a certain percentage, q, the leg is considered functionally present and the character is identified as the letter Y by the human observer. The value of q in the above rule might be expected to differ under the influence of various contextual environments. When the nature of such PFR's is fully specified, they may become the basis for algorithms in automatic character recognition systems.
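Rule 4.1 translates directly into a comparison. The following is a minimal Python sketch; the function names and the numerical values of k, L and q are hypothetical, chosen only to show how the same physical measurement can receive different labels under different thresholds.

```python
def functional_leg_present(k: float, length: float, q: float) -> bool:
    """Rule 4.1: LEG is functionally present when the tail-to-length
    ratio k/L exceeds the threshold q."""
    if not length > 0:
        raise ValueError("stroke length L must be positive")
    return (k / length) > q


def label_v_or_y(k: float, length: float, q: float) -> str:
    """Label a character on the V-Y continuum according to Rule 4.1."""
    return "Y" if functional_leg_present(k, length, q) else "V"


# A midrange character with k/L = 0.2 (values are illustrative only).
k, length = 2.0, 10.0

# The same character under two assumed thresholds.
print(label_v_or_y(k, length, q=0.30))  # tail treated as artifact
print(label_v_or_y(k, length, q=0.15))  # tail treated as a LEG
```

The physical measurement is identical in both calls; only q changes. This is the sense in which a contextual factor that moves q can change the letter label without any change in the image.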

4.3 A SIMPLIFIED MODEL

A simplified model for the character recognition process based on a functional attribute theory is presented below in Figure 4.3. This model may be used as a representation of either a human or machine character recognition process.

In machine terms, the character on the page would be entered into the machine through some input device, for example a laser scanner or CCD array. The representation in the machine at this point might consist of a two dimensional array of black and white points. Measurements would be taken on the machine representation in order to extract a set of physical attributes. The set of Physical to Functional Rules (PFR's) provides a mapping from the physical domain into the functional domain. The assignment of letter label follows from the resulting functional attribute representation. In this model the PFR's are modulated by various levels of contextual factors, such as graphical context or syntactic and semantic constraints. We will primarily be concerned with the effect of graphical context upon the PFR's.
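The stages just described (physical attributes, PFR's, functional attributes, letter label) can be sketched as a small Python pipeline. Everything here is an illustrative assumption rather than the thesis's implementation: the feature extractor is taken as given, the `base_shape` field is a hypothetical stand-in for the remaining attributes of the character, and the thresholds are made-up numbers. The letter pairs are those the text associates with LEG.

```python
from dataclasses import dataclass


@dataclass
class PhysicalAttributes:
    """Output of a (not shown) physical feature extractor."""
    base_shape: str   # hypothetical label of the LEG-free skeleton: "V", "C" or "U"
    tail: float       # k, the extension past the intersection
    length: float     # L, the length used to normalize k


# Letter pairs separated by the functional attribute LEG.
WITH_LEG = {"V": "Y", "C": "F", "U": "H"}


def pfr_leg(phys: PhysicalAttributes, q: float) -> bool:
    """Physical to Functional Rule for LEG (Rule 4.1)."""
    return phys.tail / phys.length > q


def to_functional(phys: PhysicalAttributes, thresholds: dict) -> dict:
    """Map physical attributes into the functional domain. The thresholds
    are where contextual factors would enter the model."""
    return {"base": phys.base_shape, "LEG": pfr_leg(phys, thresholds["LEG"])}


def decide(functional: dict) -> str:
    """Decision algorithm: assign a letter label from functional attributes."""
    base = functional["base"]
    return WITH_LEG[base] if functional["LEG"] else base


def recognize(phys: PhysicalAttributes, thresholds: dict) -> str:
    return decide(to_functional(phys, thresholds))


print(recognize(PhysicalAttributes("C", 3.0, 10.0), {"LEG": 0.2}))
print(recognize(PhysicalAttributes("U", 1.0, 10.0), {"LEG": 0.2}))
```

Keeping the thresholds in a separate argument mirrors the model's separation between the rules themselves and the contextual factors that modulate them.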

FIGURE 4.3: A SIMPLIFIED MODEL OF THE CHARACTER RECOGNITION PROCESS. (Blocks: character, input device, physical feature extractors, physical attributes, functional attributes, decision algorithm, letter label. Contextual factors: graphical context, semantic constraints, syntactic constraints, etc.)

FIGURE 4.4: MODEL INCORPORATING GRAPHICAL CONTEXT FOR THE ATTRIBUTE LEG. (Blocks: input device, physical feature extractors, PFR for LEG comparing k/L with q, other PFR's, decision algorithm, graphical context analyzer supplying q; inputs labeled current character, future input characters, and already processed characters.)

The diagram in Figure 4.3 is not meant to imply that feedback is not employed. Some form of feedback is implicit in the use of contextual constraints. A given character is considered relative to its environment. This means that past and future characters and characteristics of characters are utilized in the contextual analysis.

Consider the following example of how graphical context could have an effect on the PFR for LEG. The classification of a midrange character in a V-Y continuum of the type we have been considering could very well depend on the other types of characters with which it is presented. If such a character were embedded in a set of relatively sloppy characters where lines were commonly extended past intersections, then a classification as V might more likely result. Equivalently the threshold value, q, in Relation 4.1 might be expected to be somewhat higher, indicating more tolerance for LEG hangoff. Under neater circumstances, the opposite effect might occur, with the threshold being lowered. Effects of this type were illustrated in the last chapter in Figure 3.7.

The hypothesis of this thesis is that the threshold, q, could be determined, given the set of characters involving that particular attribute. In our example above, if we knew where along the k/L continuum the previous characters involving LEG fell, then we would be able to determine the value of q at that particular instant in time. A description of the model that determines this value is a primary goal of the work to be described. We will try to formulate such a model from experiments involving a variety of stimuli along interletter continua.

A slightly more embellished description of our simplified character recognition model, incorporating feedback from a graphical context analyzer, is shown in Figure 4.4. This particular description considers the use of information from past characters in the analysis of the current character. The k/L values of characters involving the attribute LEG are stored and analyzed by the graphical context model to arrive at a threshold value of q for the current character being analyzed. The presence or absence of functional LEG in the current input character is determined by comparison of the current character's k/L value with the current value of q. Following this, a revised graphical context analysis, incorporating the current character's k/L value, is performed in order to predict the threshold for the following input character involving a LEG test. Similar analyses could be done for attributes other than LEG. This gives a simplified view of where the interest of this thesis lies. We will try to develop a better idea of the contents of the graphical context box.
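The store, compare, and update cycle described above can be sketched as a small stateful object. The rule used here to turn stored k/L values into a threshold (an equal-weight blend of the midpoint of their range and their median) is purely a placeholder assumption so that the sketch runs; it is not the model the thesis goes on to develop. The class name, `default_q`, and `weight` are likewise hypothetical.

```python
from statistics import median
from typing import List


class GraphicalContextAnalyzer:
    """Stores k/L values of past LEG characters and proposes a threshold q."""

    def __init__(self, default_q: float, weight: float = 0.5) -> None:
        self.default_q = default_q      # used until some context has accumulated
        self.weight = weight            # hypothetical mixing weight (assumption)
        self.history: List[float] = []  # k/L values of already processed characters

    def threshold(self) -> float:
        """Placeholder rule: blend of the range midpoint and the median of the history."""
        if len(self.history) < 2:
            return self.default_q
        midrange = (min(self.history) + max(self.history)) / 2.0
        return self.weight * midrange + (1.0 - self.weight) * median(self.history)

    def classify(self, k_over_l: float) -> bool:
        """Return True if LEG is functionally present, then add the value to the context."""
        present = k_over_l > self.threshold()
        self.history.append(k_over_l)
        return present


# Two made-up contexts followed by the same midrange character (k/L = 0.2).
sloppy = GraphicalContextAnalyzer(default_q=0.16)
sloppy.history.extend([0.30, 0.35, 0.40])
neat = GraphicalContextAnalyzer(default_q=0.16)
neat.history.extend([0.00, 0.02, 0.05])
print(sloppy.classify(0.2), neat.classify(0.2))
```

Keeping the threshold rule in a single method means a different context model can be substituted without touching the compare and update steps.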

4.4 SUMMARY

This chapter has described the basics of the functional attribute based theory of character recognition. These are briefly summarized in the following paragraphs.

There are three levels of attributes: physical, perceptual, and functional. There are physical to functional rules (PFR's) which map from physical measurements on a character to a functional attribute state. Letters are distinguished from one another by functional attributes. A letter is specified by its functional attributes and their relationships to one another.

Ambiguous characters are the key to discovering functional attributes. PFR's are a function of contextual factors. One of these contextual factors is graphical context. A knowledge of the set of characters to be identified contains information from which the PFR threshold can be determined. It is this step which is the focus of this thesis.

Thus far we have limited ourselves to only a theoretical discussion of matters with no mention of how these PFR's are empirically determined. The following chapter will review the methodologies developed for this purpose. Of particular interest there will be the so called "goodness" methodology which will be used in unraveling the mysteries of graphical contextual effects.

CHAPTER 5

METHODOLOGIES FOR A FUNCTIONAL ATTRIBUTE BASED

THEORY OF CHARACTER RECOGNITION

5.1 INTRODUCTION

This chapter continues our discussion of the functional attribute based theory of character recognition. Here we review the experimental work which has been done in investigating this theory and examine the variety of methodologies which have been employed. Of particular interest will be the goodness methodology, which we will be using to investigate graphical contextual effects.

Because the functional attributes are themselves abstractions inherent in the identity of a character, we cannot study them directly. Rather we can only study the PFR's which relate the abstract attribute to the physical image and thereby infer certain properties of the functional attribute. In effect, the operational definition of a functional attribute is represented by the empirically derived rules. The justification of attributes is found in their ability to compactly and accurately describe experimental observations.

As we have seen, the functional attribute theory of character recognition relies on ambiguous characters, characters with a functional attribute in transition. Such ambiguous characters point out which are the important attributes and areas of characters which should be studied. The magnitude of the problem of specifying rules for all ambiguous characters can be grasped by looking at the chart of ambiguous characters provided by Blesser et al. [28]. This chart provides over 190 examples of characters in transition from one letter to another. Were numerals and lower case letters also to be included, the number of possible ambiguities would be large indeed. Shillman and Babcock [185] provide an overview of the process of investigating a functional attribute given an interesting letter pair. In their case, they describe the analysis of the 2-Z confusion pair and explain the process of finding interesting candidate attributes to investigate this particular pair. Our next section will deal with a review of the methodologies involved in such investigations.

5.2 A REVIEW OF EXPERIMENTAL WORK

Experiments with human observers provide us with the means to ascertain the makeup of the Physical to Functional rules. These experiments involve working with sequences of characters that change their identity from one letter to another. The subjects' task is to indicate, in a direct or indirect manner, where the boundary on the interletter continuum lies. For example, given the V-Y interletter continuum of Figure 4.2, we would like to know where the boundary between V and Y lies on the k/L dimension.

Table 5.1 below provides a survey of the experimental work performed which has, as a basis, the functional attribute based theory of character recognition. This table lists the references where the work may be found, in addition to other relevant data. As can be seen from the table, there have been a fair number of letter pairs studied, involving several different attributes, using a variety of different paradigms.

While much of the work has involved studies on the attribute LEG, there have been studies with several other attributes, for example the important distinction between U and V mentioned earlier. The work cited here represents studies of only a small fraction of the potentially large number of interletter ambiguities which exist. Shillman [182] presents a list of important functional attributes and an analysis of the 26 uppercase letters in terms of them.

It is not only the functional attributes which are important but also the spatial relationships between them.

Most of the experimental work to date has dealt with cases where there is only a single functional attribute in transition. The interaction of several functional attributes at once is a fairly common occurrence, but has not been addressed in the work to date.

TABLE 5.1

A SURVEY OF EXPERIMENTAL WORK PERFORMED BASED ON A FUNCTIONAL ATTRIBUTE THEORY OF CHARACTER RECOGNITION

AUTHOR                              REF.     LETTER PAIR                     ATTRIBUTE             PARADIGMS

Babcock                             [15]     2-Z                             SEGMENTATION          G, L

Chow                                [44]     V-Y                             LEG                   L, DC, M

Jans                                [104]    [illegible]                     CORNER, PARALLELISM   L, DC

Kuklinski                           [115]    V-Y, C-F, U-H, D-P              LEG                   G, L [partly illegible]
Blesser, Kuklinski, Shillman        [26]     C-O                             CLOSURE               L
                                             [illegible]-U                   BAY                   L

Kuklinski, T., Kuklinski, R.        [116]    V-Y, C-F, D-P, O-P, V-X, U-H    LEG                   G, L
                                             O-A, C-I                        [illegible]           [illegible]

Lee                                 [123]    V-Y, F-E, L-C                   LINE ADDITION         L

Naus, M., Shillman                  [138]    V-Y, C-F, U-H                   LEG                   DC, GEN

Sefkow                              [181]    R-A, B-8                        SYMMETRY              G, L

Shillman                            [182]    V-Y, C-F, U-H                   LEG                   G, L, RT
Shillman, Kuklinski, Blesser        [189]    M-K                             [illegible]           [illegible]
                                             O-C, O-U                        CLOSURE               G, L, RT

Shillman, Naus, G.                  [190]    O-D                             SYMMETRY              L, DC

Suen, Shillman                      [201]    U-V                             Various               L

Waldron                             [211]    V-Y, D-P                        LEG                   G, L
Kuklinski, Waldron                  [117]

Yasuhara                            [218]    C-F, O-P                        LEG                   SD
                                             C-O, F-P, 5-9, 6-8, 5-6, 9-8    CLOSURE               SD

Yasuhara, Kuklinski                 [219]    V-X                             LEG                   L, RT, ABX

PARADIGM CODE:

ABX - ABX Discrimination
DC  - Direct Choice
G   - Goodness
GEN - Generation
L   - Labeling
M   - Matching
RT  - Reaction Time
SD  - Same-Different

5.3 THE GOODNESS METHODOLOGY

In order to investigate functional attributes, a variety of methodologies have been developed. One of the most important of these is the goodness or category rating method. Stimuli are drawn on a continuum between two archetypal forms, for example V and Y as shown in Figure 4.2 previously, along some changing physical variable, in this case the ratio k/L. Subjects are shown each stimulus and asked to rate how well it represents each of the archetypes (V and Y) on a numerical scale (say from 0 to 5, where 0 would mean no or very poor representation and 5 would indicate excellent representation). The ratings obtained for each stimulus are averaged and plotted as curves (from [26]), one reflecting the goodness of each stimulus as V (GV), the other as Y (GY), as in Figure 5.1a below. The place where the two curves cross is known as the Point of Subjective Equality (PSE), and the physical value associated with it may be taken as the value of q for the particular context of the experiment. The value of q represents the boundary value between the letters of the letter pair on that particular continuum. In Figure 5.1a, q is approximately 0.16.
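Locating the PSE from two averaged goodness curves amounts to finding where their difference changes sign. The following is a minimal sketch, assuming the curves are sampled at the same stimulus values and that linear interpolation between neighboring stimuli is acceptable; the function name and the ratings are made up for illustration and are not data from [26].

```python
from typing import Optional, Sequence


def point_of_subjective_equality(
    x: Sequence[float], g_v: Sequence[float], g_y: Sequence[float]
) -> Optional[float]:
    """Return the x value where the two mean goodness curves cross, or None."""
    diff = [v - y for v, y in zip(g_v, g_y)]
    for i in range(len(diff) - 1):
        d0, d1 = diff[i], diff[i + 1]
        if d0 == 0.0:
            return x[i]
        if d0 * d1 < 0.0:
            # Linear interpolation of the zero of (GV - GY) between x[i] and x[i + 1].
            t = d0 / (d0 - d1)
            return x[i] + t * (x[i + 1] - x[i])
    if diff and diff[-1] == 0.0:
        return x[-1]
    return None


# Invented mean ratings on a 0 to 5 scale along a k/L continuum.
k_over_l = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
g_v = [5.0, 4.0, 2.0, 1.0, 0.5, 0.0]
g_y = [0.0, 1.0, 3.0, 4.0, 4.5, 5.0]
q = point_of_subjective_equality(k_over_l, g_v, g_y)
print(q)  # about 0.175 for these made-up ratings
```

The returned value plays the role of q for the context of that one experiment only.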

[Figure 5.1 panels: goodness ratings GV and GY (scale 0 to 5), labeling probabilities PV and PY (0 to 1.0), and reaction time in seconds (0.5 to 2.5), each plotted against k/L from 0 to 0.5.]

FIGURE 5.1: COMPARISON OF RESULTS ACROSS GOODNESS, LABELING, AND REACTION TIME PARADIGMS FOR V-Y LETTER PAIR.

[Figure 5.2 panels: goodness ratings GC and GF (scale 0 to 5), labeling probabilities PC and PF (0 to 1.0), and reaction time in seconds (0.5 to 2.5), each plotted against k/L from 0 to 0.5.]

FIGURE 5.2: COMPARISON OF RESULTS ACROSS GOODNESS, LABELING, AND REACTION TIME PARADIGMS FOR C-F LETTER PAIR.

The goodness rating method has shown itself to be a relatively sensitive measure, and the technique of category rating has found use in many areas. It has been used generally for scaling of psychophysical quantities [30, 66], and for rating such qualities as handwriting legibility [142], television picture quality [4, 5, 162], and speech quality [25, 84]. More recently it has been used, in a manner similar to our own, to quantify interphonemic boundaries in studies of speech perception [47, 48].

The sensitivity of the goodness paradigm has led us to use it as the paradigm for studying the sometimes subtle effects of graphical context. We will discuss this more in our upcoming experimental chapter.

5.4 SOME OTHER METHODOLOGIES

Other methods of determining interletter boundaries have been employed, among them labeling and reaction time. In the labeling paradigm, subjects are shown stimuli along the interletter continuum and asked to decide which archetype a particular stimulus best represents (essentially a two category rating task). Reaction time experiments involve measuring the subjects' time to identify each stimulus along the continuum. The hypothesis is that the subject will take longest to identify a stimulus at the interletter boundary.

Results from the use of these other two paradigms (from [26]) are illustrated in Figures 5.1b and 5.1c. The labeling curves cross at approximately the same location along the k/L axis as the goodness curves, while the reaction time peaks around this same location. The similarity of results across paradigms provides evidence that the resulting rule for LEG is consistent and independent of the testing method.
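The boundary estimates from the labeling and reaction time paradigms described above can be computed in a similarly simple way: the labeling boundary as the point where the proportion of one label crosses one half, and the reaction time estimate as the stimulus with the longest mean time. This is a sketch with hypothetical function names and invented data, assuming linear interpolation for the labeling crossover.

```python
from typing import Optional, Sequence


def labeling_boundary(x: Sequence[float], p_y: Sequence[float]) -> Optional[float]:
    """x value where the proportion of 'Y' labels crosses 0.5 (linear interpolation)."""
    for i in range(len(x) - 1):
        lo, hi = p_y[i] - 0.5, p_y[i + 1] - 0.5
        if lo == 0.0:
            return x[i]
        if lo * hi < 0.0:
            return x[i] + (lo / (lo - hi)) * (x[i + 1] - x[i])
    if len(p_y) > 0 and p_y[-1] == 0.5:
        return x[-1]
    return None


def reaction_time_peak(x: Sequence[float], rt: Sequence[float]) -> float:
    """Stimulus value with the longest mean reaction time."""
    return x[max(range(len(rt)), key=lambda i: rt[i])]


# Invented results along a k/L continuum.
k_over_l = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
p_y = [0.0, 0.2, 0.7, 0.95, 1.0, 1.0]        # proportion of trials labeled Y
rt_seconds = [0.8, 1.2, 2.1, 1.3, 0.9, 0.8]  # mean time to identify each stimulus

print(labeling_boundary(k_over_l, p_y))        # close to 0.16 for these values
print(reaction_time_peak(k_over_l, rt_seconds))
```

Because the reaction time estimate can only fall on a presented stimulus, agreement between the two estimates is limited by the spacing of the stimuli.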

It is also hypothesized that the same attribute LEG is involved in ambiguities involving other letter pairs, such as C-F. We expect that the PFR for this letter pair might be similar to that for V-Y. The results (from [26]), with the same paradigms as above, shown in Figures 5.2a, b, and c appear to support this contention. There is a similar matching of behavior across paradigm, and some similarity of results across letter pair.

Aside from the goodness, labeling and reaction time paradigms, several other methods have been employed in our group's studies. For instance, Naus and Shillman [138] employed a generation task where subjects were instructed to draw both ambiguous and archetypal characters. In the case of V-Y for example, subjects would be given the right diagonal and asked to draw in the left diagonal such that the resulting character could as easily be called a V as a Y by that particular subject. This study showed that indeed the archetype is important in specifying the PFR. It appears from this work that it is really the ratio of ambiguous to archetype that remains constant and not merely the k/L ratio. In a sense the generated character provides some sort of a neutral graphical context for the generator's characters. The generator's archetype provides a context of sorts for other characters that the generator provides. It is an anchor against which other characters are judged. We will deal more with this ambiguous to archetype constancy in Chapter 8, citing evidence from other experiments.

Another paradigm used for pinpointing the interletter boundary is that of direct choice employed by Chow [44], Jans [104], and Naus and Shillman [138]. The method is simple and requires little time or effort. Subjects are shown a series of characters along an interletter continuum and asked to choose the most ambiguous character, that character which would as easily be called one letter as the other. Archetypes are also chosen in this manner.

Most of the stimuli in these experiments have been very thin line characters with none of the variation in stroke thickness found in our normal printed characters. Chow [44] investigated, through a matching paradigm, which ambiguous characters of different stroke widths corresponded to each other. This would be important in any practical recognition system, since our relatively thin experimental characters are the exception rather than the rule. For example in the case of the rule for LEG, the physical variable, k/L, needs to be calculated. If the character is thick stroked, there is some problem in deciding just where the intersection should be defined. Chow found some reasonable parameters to handle these more practical cases of measurement.

Yasuhara and Kuklinski [219] have investigated yet another method which points to a consistent interletter boundary between different paradigms. In speech perception there is a well known phenomenon known as categorical perception, which seems to occur between certain classes of speech sounds. Basically the phenomenon consists of very poor discrimination ability, at or near chance, between stimuli within the same class, with excellent discriminability between stimuli of different classes. In the case of graphemes, Yasuhara and Kuklinski [219] first, using labeling and reaction time paradigms, measured the interletter boundary for several different subjects along a V-X continuum involving the attribute LEG. As in the previous studies, these results matched fairly well in their placement of the interletter boundary, that is, the reaction time peak occurred at the same position as the labeling crossover.

They then used an ABX paradigm to investigate the discriminability between adjacent stimuli along the continuum. In such a paradigm three stimuli are presented sequentially. The first two stimuli (A and B) are different, while the third (X) is identical to either the first or the second. The subject's task is to determine which of the first two stimuli the third or X stimulus matches. The percentage correct for each stimulus pair is a measure of the discriminability along the continuum. As hypothesized, they discovered a distinct peaking in discrimination which corresponded exactly with the interletter boundary determined from labeling and reaction time experiments for that particular subject. Another group of subjects performed only the ABX task. They were shown the same stimuli upside down, where they would not as likely be identified as letter stimuli. These subjects did not exhibit a similar peaking in discriminability. In a letter mode then, there is some evidence of boundary discriminability enhancement, but not when in a mode where the characters are not perceived as letters. The results of this experiment lend more credibility to the hypothesis of consistency of results across experimental paradigm.
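The scoring of an ABX experiment as described above reduces to counting, for each stimulus pair, how often the subject's answer matched the identity of X. The sketch below assumes a hypothetical trial record of four stimulus indices (A, B, X, and the subject's answer); the trials shown are invented and are not data from [219].

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical trial record: (index of A, index of B, index of X, index the subject chose).
Trial = Tuple[int, int, int, int]


def abx_percent_correct(trials: Iterable[Trial]) -> Dict[Tuple[int, int], float]:
    """Percentage of correct matches for each unordered stimulus pair."""
    correct: Dict[Tuple[int, int], int] = defaultdict(int)
    total: Dict[Tuple[int, int], int] = defaultdict(int)
    for a, b, x, answer in trials:
        pair = (min(a, b), max(a, b))
        total[pair] += 1
        if answer == x:  # the subject picked the stimulus that X actually repeated
            correct[pair] += 1
    return {pair: 100.0 * correct[pair] / total[pair] for pair in total}


# Invented trials for three adjacent pairs along a continuum.
trials = [
    (0, 1, 0, 1), (0, 1, 1, 1),
    (1, 2, 1, 1), (1, 2, 2, 2),
    (2, 3, 3, 2), (2, 3, 2, 2),
]
scores = abx_percent_correct(trials)
peak_pair = max(scores, key=scores.get)
print(scores, peak_pair)
```

The pair with the highest score is the candidate location of a discriminability peak, which can then be compared with the boundary found by labeling or reaction time for the same subject.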

These have been examples of a few of the experimental paradigms used. More adequate descriptions can be found in the individual references.

5.5 PLASTICITY OF INTERLETTER BOUNDARIES

We will now consider some experiments very relevant to the main topic of this thesis, graphical context. We have a strong interest in the manner in which and the degree to which the interletter boundaries are plastic, that is, how susceptible they are to graphical context.

A set of experiments, performed by Kuklinski [115], demonstrated that interletter boundaries, obtained through the use of goodness and labeling methodologies, could be shifted under the influence of the experimental context, namely the range of stimuli. With different ranges of stimuli, different interletter boundaries were obtained, as illustrated below for the C-F letter pair in Figure 5.3. There we see that the interletter boundaries, as indicated by the crossover of the goodness curves, are quite different. For the low range set, the boundary has shifted lower, while for the high range set, the boundary has shifted higher, relative to where it might normally be found in a full range experiment such as that illustrated in Figure 5.2 earlier. We can rationalize the shift in the following manner.

[Figure 5.3 plot: goodness curves against k/L (0 to 0.50) for two stimulus range conditions (Condition 1 and Condition 2), with error bars.]

FIGURE 5.3: ILLUSTRATION OF PLASTICITY OF INTERLETTER BOUNDARIES FOR THE C-F LETTER PAIR (from [115]).

Low range stimuli, when considered in the context of just other low range stimuli, appear somewhat more Y-like. For instance, the highest of the low range stimuli appears more Y-like due to the lack of competition from higher valued stimuli which more closely resemble an archetypal Y. This effectively lowers the interletter boundary. A similar effect happens when only high range stimuli are shown. This phenomenon is a demonstration of the well known psychological contrast effect [61]. If, after completing an experiment such as that described above, the subject is then shown some test stimuli involving another letter pair (involving the same attribute), a boundary shift is obtained for that pair also. If, on the other hand, a different attribute is involved in the letter pair, no such shift is observed. For example, the labeling results of Figure 5.4 show this important behavior. After shifting a C-F boundary (Figure 5.3) via the goodness paradigm, the V-Y and D-P labeling boundaries of Figure 5.4 are significantly different, while the O-C boundary, hypothesized to involve the attribute CLOSURE, has not shifted.

[Figure 5.4 plots: labeling probability against k/L for the V-Y, D-P, and O-C trajectories, with error bars.]

FIGURE 5.4: LABELING RESULTS ALONG V-Y, D-P, AND O-C TRAJECTORIES AFFECTED BY ADAPTING CONDITIONS OF FIGURE 5.3 (from [115]).

Such an experiment provides a means for testing the relevance of a particular attribute to a particular ambiguity.

From the above results, it appears that shifting a subject's PFR by manipulating the context will shift the PFR for all other characters containing that attribute. It would not affect characters not involving the attribute in question. Were this true, then it would provide a great simplification in any contextual analysis. In this case, the contextual effect would only have to be obtained for a relatively small number of functional attributes.

Our main goal in this thesis will be to account for behavior of the sort found in the experiments described here. We will be analyzing these experiments, as well as others, in order to arrive at a suitable model to describe the effects found. The most promising model in this regard is one based on the so called Range Frequency Theory to be described in the next chapter.

5.6 SUMMARY

In this chapter we reviewed the methodologies which are used to investigate the PFR's of the functional attribute based theory of character recognition. The goal in running experiments has been to determine the interletter boundaries between various letter pairs. This is done by looking for the most ambiguous character on the continuum between the two archetype letters.

We saw that there were many different types of paradigms helpful in ascertaining the interletter boundary. One of the most important of these was the goodness methodology, which we will use later, in which the interletter boundary is specified by the crossover of two category rating curves. Other important methods examined were labeling or identification experiments, where the boundary is specified by the stimulus which is labeled as one letter half of the time. Other methods covered included reaction time, direct choice, generation, and discriminability studies. One principle of our particular character recognition theory is that there should be consistency across experimental paradigm when measuring an interletter boundary in a given letter pair. For example goodness, labeling and reaction time should all point to the same boundary.

Another principle is that there should be some consistency of results with different letter pairs involving the same attribute in transition. For example V-Y and C-F would have very similar boundaries in terms of k/L under this principle, provided the corresponding archetypes had similar k/L values. We are presuming a constancy of ambiguous to archetype ratio.

We also saw that the PFR's can be somewhat plastic, shifting with the contextual situation. The change in graphical context can be simulated by changing the range or distribution of stimuli in the experiment. We noted that shifting the PFR for one letter pair shifts the PFR for another letter pair which involves the same attribute.

These last plasticity effects are of prime interest to us. The remainder of the thesis will be devoted to the development and evaluation of a model for these effects. This model will be primarily based on the so called Range Frequency Theory. Our next chapter begins delving into this aspect of the thesis.

CHAPTER 6

RANGE FREQUENCY: A THEORY OF RELATIVITY FOR PSYCHOPHYSICS

6.1 INTRODUCTION

The old saying, "Everything is relative," applies in a wide variety of situations from Einstein to psychophysics. Although both large diamonds and small boulders are rocks of sorts, we are not likely to confuse their size. The case of letters and letter identification is no exception to the general principle of relativity. The judgment of one particular character may depend on other characters which have been seen, in addition to many other factors which form some sort of context of judgment. Our main goal in this thesis is the development and testing of a model to quantify such effects.

One of the most promising models in this regard is one based on the so called Range Frequency Theory (henceforth called RF Theory), primarily developed by Parducci and his colleagues [150, 157]. This theory has shown itself to be a fairly good model of human judgment in a wide variety of situations, including some which utilize a paradigm similar to our own "goodness" experiments described earlier in Chapter 5. The types of judgments to which RF Theory has been applied have varied from the somewhat psychophysical in nature, such as judging numerosity of dots, lengths of lines, sizes of squares, and heaviness of lifted weights [151], to applications in more abstract areas such as moral judgments, choice of jobs, utility theory, and so forth [149].

The present chapter will focus on an explanation of this potentially useful model. This is in fact the model which will help us to quantify the context effects occurring in the goodness experiments we have carried out. The particular set of stimuli in our goodness experiments sets up some sort of context for judgment. For instance, we showed earlier in Chapter 5 that the goodness curves shifted with a change in the range of stimuli, thus effecting a change in the apparent interletter boundary. Similarly they might be expected to change somewhat with the distribution of stimuli within a given range. We would like to be able to predict exactly how these curves behave and thus how the interletter boundary moves around. The manner in which we will eventually test the model is to see how well it can predict the results of one experiment from another experiment. Thus, knowing the boundary from goodness curves in a so called neutral context would enable us to predict it for some other non-neutral context.

As we shall see, the RF Theory is especially suited to the task at hand since it specifically deals with the effects of the variables that have concerned us in the goodness experiments, the range (dealing with stimulus range) and the frequency (dealing with the relative frequency of different stimuli and their distribution within a given range). Basically what RF Theory says is that human judgment is a compromise between two often conflicting principles, a range principle and a frequency principle. The range principle, as its name indicates, evaluates a stimulus in the context of the total range of stimuli to which a subject is exposed. The frequency principle, on the other hand, is more concerned with the effects of varying numbers of stimuli or the spacing of stimuli within that particular range. The meaning of these two principles will become clearer shortly.
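The compromise between the two principles described above can be illustrated numerically. The sketch below uses one common way of stating the two components, which is an assumption here and not necessarily the formulation developed later in the thesis: the range value of a stimulus is its position between the smallest and largest stimuli presented, the frequency value is its rank position among the presented stimuli, and the judgment is a weighted average of the two with a hypothetical weight `w`. It assumes at least two stimuli that are not all equal.

```python
from typing import List, Sequence


def range_values(stimuli: Sequence[float]) -> List[float]:
    """Position of each stimulus between the minimum and maximum presented (0 to 1)."""
    lo, hi = min(stimuli), max(stimuli)
    return [(s - lo) / (hi - lo) for s in stimuli]


def frequency_values(stimuli: Sequence[float]) -> List[float]:
    """Rank position of each stimulus among those presented (0 to 1); ties share a mean rank."""
    n = len(stimuli)
    values = []
    for s in stimuli:
        below = sum(1 for t in stimuli if t < s)
        equal = sum(1 for t in stimuli if t == s)
        mean_rank = below + (equal - 1) / 2.0  # zero-based mean rank among ties
        values.append(mean_rank / (n - 1))
    return values


def rf_judgment(stimuli: Sequence[float], w: float = 0.5) -> List[float]:
    """Weighted compromise between range and frequency values (w weights the range)."""
    r = range_values(stimuli)
    f = frequency_values(stimuli)
    return [w * ri + (1.0 - w) * fi for ri, fi in zip(r, f)]


# An invented stimulus set bunched toward the low end of its range.
skewed = [0.0, 0.05, 0.10, 0.15, 0.50]
print(range_values(skewed))
print(frequency_values(skewed))
print(rf_judgment(skewed, w=0.5))
```

In this example the stimulus at 0.15 lies only 30 percent of the way along the range but ranks above three of the four other stimuli, so its compromise judgment falls between the two values. That pull between the two principles is the conflict the text refers to.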

Because the RF model is the primary model under consideration and the one which will be used in our experimental predictions, a thorough understanding of it is necessary. That is the purpose of this chapter. First we will trace the development of Range Frequency Theory from an historical perspective. This will entail a comparison with its roots in Adaptation Level Theory. As it developed, RF Theory went through several stages and these will be covered primarily by way of examples. This semi-historical approach will serve as a survey of the literature on the topic of Range Frequency Theory. Finally in this survey we will arrive at a method employed by Parducci and Perrett [157] for predicting from one rating curve to another, the method which we will employ later for predicting goodness curves.

We will continue our treatment of the RF model in the next chapter but on a more mathematical and less intuitive basis. There we will derive in a more mathematical sense the same end result for prediction as in this chapter, and then proceed to modify it for use in predicting for cases with limited ranges of stimuli. This limited range case is one of prime concern to us but has been little dealt with in any of the RF literature to date.

The roots of RF Theory go back to studies in the early 1960's by Parducci and colleagues [151, 155]. The theory has been developing even up to the present time, to the point where it has achieved some degree of acceptance. However, despite this acceptance, it has received relatively little use outside its originators' own research community.

The basic groundwork for the theory lay in the numerous experimental results noting the existence of context effects. The psychological literature of the time [3, 45, 73, 79, 91, 101, 102, 129] well demonstrates that subjective ratings were quite susceptible to various contextual influences and that such effects were looked upon with dismay. The so called "absolute judgment" (as some rating paradigms were called) was not so absolute. Much attention was paid to the use of so called "anchor stimuli" introduced to nail down the ends of the judgment scale. This was done to alleviate the flexibility of rating scales (for example, see Guilford [88], Torgerson [206] or Nunnally [142] for discussions of the literature on anchoring).

For the most part any contextual effects were looked upon as sources of error and as certainly undesirable. Context, however, really is an inherent part of perception, not merely noise to be explained away. We saw in Chapter 3 many examples of the important role which context plays. In contrast to the "context undesirable" viewpoint, we will try to use the contextual effects in our experiments in analyzing the human perception of characters. Range Frequency Theory shares this view as regards the usefulness of contextual effects; in fact this is its very basis.

6.2 SOME TERMINOLOGY

Before we get into the details of the various theories, it would be appropriate to set straight some of the terminology we will be using, in order to avoid ambiguity.

Most of the discussion of the theories will be in terms that apply to our own goodness experiments as described earlier. In the goodness experiments considered in this thesis, a set of characters of various values of k/L are presented to subjects for judgment as to their "goodness" as a representative of a particular letter. Thus k/L is the physical variable in use. In our goodness graphs we have plotted the horizontal axis in terms of k/L. In the discussion here and in the following chapter, we will be talking in terms of a generalized physical variable, along which the stimuli can vary. This generalized physical variable we will refer to as "x", and k/L is an example of such an "x" variable. The minimum and maximum physical values are referred to as x0 and xm respectively.

At some points in our analysis it will be more convenient to work with a transformation of this physical variable into a scale of equal discriminability. Such a scale is also referred to as a psychological scale or Thurstone scale [206]. This is a monotonic function of x, such that equal divisions on the new scale correspond to stimuli that would be equally discriminable from each other. For example, in the case of k/L, in the region around k/L = 0, we might expect that subjects could discriminate two stimuli (differing by the same amount in terms of k/L) better than around k/L = 0.5. Figure 6.1 below shows two such pairs of stimuli. Both pairs differ by the same amount in the physical variable, k/L, yet the left pair of characters are probably more easily discriminated from each other.

FIGURE 6.1: ILLUSTRATION OF DIFFERENT DISCRIMINABILITY FOR PAIRS OF CHARACTERS DIFFERING BY THE SAME PHYSICAL AMOUNT.

The kind of transform needed in the above case is one which would spread out the low end and squeeze the high end. We are interested in how the stimuli make a psychological impression, not a physical impression. There is a mathematical procedure known as Thurstone scaling, which is capable of performing this operation. It basically determines the discriminability of adjacent stimuli by an analysis of the dispersion of the ratings assigned to each stimulus. Torgerson [206] and Guilford [88] both provide a more detailed discussion of scaling and the computations involved. A convenient means of deriving such a scale from empirical data is through the use of what we call a cumulative sensitivity function (CSF). This is also based on the scaling idea, and the computations and ideas are embodied more recently in a theory of intensity resolution developed by Durlach and Braida [66, 30]. We will discuss this further in Chapter 9 where we will be using such scaling functions in our work. For the purposes of discussion we will use the term "psychological scale" when referring to this concept.
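As a rough illustration of how a scale of this kind spreads out one end of the range and squeezes the other, the sketch below places each stimulus at the running sum of the discriminability between its lower neighbours. This is only a sketch under stated assumptions: the function name `cumulative_scale` and all of the numbers are hypothetical, and it is not the CSF computation taken up in Chapter 9.

```python
from itertools import accumulate


def cumulative_scale(adjacent_sensitivity):
    """Return scale values from the sensitivities between neighbouring stimuli.

    The lowest stimulus is placed at y = 0 and every later stimulus is placed
    at the running sum of the sensitivities below it, so the result is a
    monotonic function of stimulus order.
    """
    return [0.0] + list(accumulate(adjacent_sensitivity))


# Hypothetical, equally spaced physical values (k/L).
x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
# Hypothetical discriminability of each adjacent pair, larger near k/L = 0.
sensitivity = [2.0, 1.5, 1.0, 0.75, 0.5]

y = cumulative_scale(sensitivity)
for xi, yi in zip(x, y):
    print(f"x = {xi:.1f}  ->  y = {yi:.2f}")
```

With these assumed sensitivities, equal steps in x near the low end become large steps in y, while equal steps near the high end become small ones.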

The point to keep in mind in the discussion ahead is that we will be using a transformed version of the physical values in the computation. This scaled variable will be y = J(x), where J is the monotonic scale transformation necessary to change the physical value to a scale value. The minimum and maximum values on the psychological scale will be referred to as y0 and ym. Recall that one of the main reasons for choosing k/L as our physical variable is that it is convenient. It does have some nice properties in that it is a ratio measure and thus size independent. In fact, in the earlier papers dealing with the functional attribute based theory of character recognition [188], we used a different though related physical measure, l1/l2, rather than k/L, where l1 = k and l2 = L - k. We could easily have chosen the absolute physical length of the leg (except for the size independence convenience). In this light, the scale transformation should not be too disturbing.

Other distinctions important in an experimental setting will be those dealing with the words "stimulus", "stimulus value", "scale value", and so forth. The following definitions are intended to make clear the meaning of these words as used in the upcoming discussions. They will be described as to their use in experiments of the goodness type.

The term "stimulus" refers to the actual item which is presented for judgment, the card or the character on the card which is to be judged. The term "stimulus value" is used to locate the stimulus at a particular place along the stimulus continuum. It depends on the scale in use. When referring to the physical scale, x, it would be the particular k/L value associated with a particular stimulus. When referring to the psychological scale, it would be the scale value, y, associated with a particular stimulus. Note that a given stimulus has both an x and a y stimulus value associated with it. The term "scale value" will refer to the "stimulus value" on the psychological scale, y.

At a given stimulus value in a particular experiment, it is possible to have more than one stimulus put up for presentation. We will discuss this possibility of multiple stimuli at one stimulus value later on in this chapter. The term "stimulus distribution" refers to the set of stimuli presented, usually the relative numbers presented at each stimulus value. Figure 6.2 illustrates some of the various forms of stimulus distribution. The distribution is uniform if, at each stimulus value, the same number of stimuli are presented (see Figure 6.2a). There can also be a skewed distribution, one in which stimulus values at one end of the range receive relatively more stimulus presentations than those at the other end of the range (Figure 6.2b). Another type of skewing can occur where stimulus values are more closely spaced at one end of the range and relatively wider apart at the other end (Figure 6.2c). Naturally both types of effects could be present together (Figure 6.2d).
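The four cases just described can be told apart mechanically from the stimulus values and the number of presentations at each. The following is a minimal sketch, assuming a distribution is given as a sorted list of stimulus values with a matching list of presentation counts. The function name and the sample data are hypothetical, and any unequal set of counts is simply labelled "frequency skewed", which is a simplification (a U-shaped or peaked distribution would receive the same label).

```python
def describe_distribution(values, counts, tol=1e-9):
    """Label a stimulus distribution in the terms used for Figure 6.2.

    values: sorted, distinct stimulus values
    counts: number of presentations at each stimulus value
    """
    gaps = [b - a for a, b in zip(values, values[1:])]
    equal_counts = len(set(counts)) == 1
    equal_spacing = max(gaps) - min(gaps) <= tol

    if equal_counts and equal_spacing:
        return "uniform"
    if equal_spacing:
        return "frequency skewed"
    if equal_counts:
        return "spacing skewed"
    return "frequency and spacing skewed"


# Hypothetical distributions along a physical dimension x.
print(describe_distribution([0, 1, 2, 3, 4, 5], [2, 2, 2, 2, 2, 2]))
print(describe_distribution([0, 1, 2, 3, 4, 5], [1, 1, 2, 3, 4, 5]))
print(describe_distribution([0, 2, 3.5, 4.5, 5], [2, 2, 2, 2, 2]))
print(describe_distribution([0, 2, 3.5, 4.5, 5], [1, 2, 3, 4, 5]))
```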

Due to our subject area of letters, another important distinction in most of our discussion in this thesis is that between the terms "letter" and "character". A "letter" is taken to mean one of the elements of an alphabet or alternatively the name of that element, such as "ay", "bee", "cee", etc. A "character", as used here, is a grapheme to which a letter label may be assigned; a character may assume the name of one or more letters after it has been classified. For instance, a single character could, depending on the circumstances, be classified as the letter V or the letter Y. The capitalized form (e.g., V) will refer to the letter name.

FIGURE 6.2: ILLUSTRATION OF VARIOUS TYPES OF STIMULUS DISTRIBUTIONS. Panels: (a) uniform; (b) frequency skewed; (c) spacing skewed; (d) frequency and spacing skewed. Horizontal axis in each panel: physical dimension (x).

In the experiments we are considering, the subject makes a rating of the character as to how well it represents one or the other of two letters, either V or Y, in most of the cases considered here. The rating is categorized into one of several categories (11 categories [0-10 rating scale] in the case of most of our goodness experiments). The quality being rated in our experiments is either "V"ness or "Y"ness.

The categories used in the experiment, in our case, 0-10,

are referred to as the "rating scale". In our experiments,

ratings of stimuli at a particular stimulus value are averaged across subjects. This averaged goodness rating is called G.

The minimum and maximum categories on the rating scale are

referred to as G0 (0 in our experiments) and Gm (10 in our experiments) respectively.

6.3 ROOTS IN ADAPTATION LEVEL THEORY

Range Frequency Theory actually received its start by explaining some of the discrepancies found when an earlier approach, Adaptation Level Theory, was applied to certain situations. An understanding of this earlier theory will help us somewhat in comprehending the later RF Theory. This section will attempt to provide a short background in Adaptation Level Theory.

In the 1950's, the primary model for taking contextual factors into account was Adaptation Level Theory (AL Theory). The originator and chief proponent of AL Theory was Helson, whose book [93] on the subject covers much of the literature and applications of the theory.

The basic assumption of Adaptation Level Theory is that the psychological effects of the stimulus context can be represented in some sense by a single "average" stimulus value called the adaptation level or AL. This value is the pooled effect of focal stimuli, background stimuli and other residual effects.

The equivalent effect of all the other stimuli could be

obtained by presenting only a single stimulus at the AL

stimulus value, rather than the entire stimulus set. Under AL

Theory, stimuli are judged with reference to the current AL.

A stimulus presented, which had a stimulus value equal to the

AL, would, under the theory, receive a rating as the middle or

neutral category in the rating scale in use.

Experimentally, in a category rating task, as is the nature of our goodness experiments, the AL is defined by Helson as the stimulus value (either physical value or scale value) associated with the stimulus which would be assigned the neutral or middle value on the rating scale in use. For example, in a goodness rating task, using a goodness rating scale of 0-10, a judgment of 5 would be the neutral category for this rating scale. In this instance the defined AL would be that stimulus value such that, if a stimulus of that value were presented, a rating of 5 for that stimulus would most surely result. Although there are really several operational methods of specifying this defined AL in an experimental setting [115], one popular method [147] is to average the stimulus values of stimuli rated as the neutral category.

Thus in our goodness experiments, we would note those stimuli

eliciting a rating of 5 and then average their scale values

together to calculate the defined AL. An alternate method

would be to tabulate a goodness curve and see which stimulus

value coincided with the G = 5 point on the goodness curve.

In any event we cannot determine the defined AL until the

actual experiment is run and the results analyzed. The test

of AL Theory comes about by making a prediction as to what the defined AL will be, given a certain set of stimuli. If the

match between predicted and defined is good, then the theory

is somewhat validated.
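The two ways of locating the defined AL described above can be sketched as follows. This is an illustrative sketch only: the function names and the data are hypothetical, and linear interpolation between neighbouring points of the goodness curve is our assumption about how the G = 5 point would be read off.

```python
def defined_al_by_average(judgments, neutral):
    """Average the scale values of all stimuli that were rated as neutral.

    judgments: list of (scale_value, rating) pairs, one per judgment
    """
    chosen = [y for y, rating in judgments if rating == neutral]
    return sum(chosen) / len(chosen) if chosen else None


def defined_al_from_curve(scale_values, goodness, neutral):
    """Find where the goodness curve crosses the neutral rating.

    Linear interpolation between adjacent points is assumed.
    """
    points = list(zip(scale_values, goodness))
    for (y0, g0), (y1, g1) in zip(points, points[1:]):
        if g0 == neutral:
            return y0
        if (g0 - neutral) * (g1 - neutral) < 0:
            return y0 + (neutral - g0) * (y1 - y0) / (g1 - g0)
    if points[-1][1] == neutral:
        return points[-1][0]
    return None  # the curve never reaches the neutral rating


# Hypothetical data on a 0-10 rating scale (neutral category 5).
judgments = [(2, 3), (3, 5), (4, 5), (5, 5), (6, 7)]
print(defined_al_by_average(judgments, neutral=5))

curve_y = [0, 2, 4, 6, 8, 10]
curve_g = [1.0, 2.0, 4.0, 6.0, 8.0, 9.5]
print(defined_al_from_curve(curve_y, curve_g, neutral=5))
```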

Under AL Theory, the AL is predicted from a given stimulus set by the mean stimulus value (on the psychological scale) of all stimuli affecting a particular rating. If y1, y2, ..., yN represent the scale stimulus values associated with the stimuli of the experiment, then the traditional AL prediction (designated ALp) would be given by:

$$AL_p = \frac{1}{N} \sum_{i=1}^{N} y_i \tag{6.1}$$

Knowing the AL we should be able to predict what the rating judgment would be for a particular stimulus. This judgment or goodness value for the ith stimulus value, designated Gi, is determined by the ratio of the scale stimulus value of that stimulus, yi, to the current value of AL. Knowing that a stimulus at AL should receive the neutral category on the rating scale, the rating of the new stimulus at yi is scaled accordingly. The neutral category is merely the average of the maximum possible rating, Gm, and the minimum possible rating, G0. Thus the goodness rating at the ith stimulus value is given by:

$$G_i = \frac{y_i}{AL_p} \cdot \frac{G_m - G_0}{2} \tag{6.2}$$

This equation predicts that the goodness rating or judgment will be a linear function of the scale stimulus value, yi. Under the AL formulation, if goodness were plotted versus scale stimulus value, it would always turn out to be a straight line. We will illustrate this by example in the next section where we will compare the AL and RF approaches.

The AL approach was relatively successful in accounting for a large mass of psychophysical data and phenomena such as anchoring effects. In the next section we will deal with our main topic of concern, that of the Range Frequency approach.

6.4 EARLY RANGE FREQUENCY FORMULATION

Range Frequency received its start by explaining some of the discrepancies found when AL Theory was applied to certain situations. Parducci et al. (1960) [151] found that the mean of the stimulus distribution values (Thurstone scaled) was a good predictor of AL when its value fell between the midpoint and median of the series of stimuli. However they found that judgment shifted under shifts in either the midpoint or median of the series of stimuli, even when the mean stimulus value was held constant. This was effected by using skewed, U-shaped and peaked stimulus distributions, in addition to a uniform distribution. They attributed the success of AL to the fact that, for most types of distributions in use, the mean would indeed fall between the midpoint and the median, thus disguising any effect.

This led to the proposal of an early Range Frequency (RF) Theory [147], but still one in the AL mold. Under this formulation, AL was a weighted average of the midpoint of the range and the median of the stimulus set. Under this early RF approach [147] the RF compromise took the following form, where ALrf is the adaptation level predicted from the RF compromise:

$$AL_{rf} = W \cdot \mathrm{MIDPOINT} + (1 - W) \cdot \mathrm{MEDIAN} \tag{6.3}$$

$$AL_{rf} = W \cdot \frac{y_0 + y_m}{2} + (1 - W) \cdot y_{(N+1)/2} \tag{6.4}$$

where W is a weighting factor for the range principle versus the frequency principle, which can vary between W = 0 (total frequency principle dominance) and W = 1 (total range principle dominance). In the early studies of Parducci [147], a value of W = 0.55 was found to fit the data the best, that is, in making the predicted and defined curves match up the best.

The midpoint, of course, reflects a dependence on range, since it is calculated from only the endpoints. The median (the most central stimulus by virtue of its having equal numbers of stimuli on either side) has a dependence on the distribution of stimuli. A stimulus distribution skewed to the left, for instance, will have a low median and vice versa.

The example below indicates the discrepancy between the RF midpoint and median compromise and the Adaptation Level approach and will serve to better illustrate the range and frequency principles.

Consider the following hypothetical goodness experiment with a stimulus distribution as in Figure 6.3a. In this experiment stimuli would be shown to subjects who would rate them as to how well they possessed some attribute. The exact number of categories in the rating scale is not of immediate concern. Of more interest is the stimulus distribution, where we have a total of nine stimuli at seven different stimulus values. Each stimulus is indicated by a square at a certain scale stimulus value. The lowest stimulus occurs at a scale value (y) of y0 = 0, while the maximum value is ym = 12. Stimuli at y = 1 and y = 7 are presented twice while those at other scale stimulus values are presented only once. Given the above stimulus distribution, we will now proceed to calculate a predicted AL under both the AL (ALp) and RF (ALrf) approaches.

FIGURE 6.3: A COMPARISON OF THE ADAPTATION LEVEL AND RANGE FREQUENCY APPROACHES. Panels: (a) adaptation level, mean = ALp = 4.22; (b) range frequency, median and midpoint; (c) range frequency compromise, ALrf = 4.65; (d) range principle; (e) frequency principle; (f) RF principle; (g) goodness curves. Horizontal axis in each panel: psychological scale (y), 0 to 12.

Under the AL approach our task is to predict the subjective average position along the y continuum, that place where a stimulus, if presented, would be judged "average" in the sense that it would be rated as the neutral category. Helson's AL approach would involve calculating the mean of all the stimulus values associated with the individual stimuli. Performing this operation as in Equation 6.1 yields a value of ALp = 4.22 (indicated by the arrow in Figure 6.3a).

Since the range principle evaluates a stimulus only in relation to the endpoints, using this principle we would choose as the "average" stimulus value a value equidistant between stimuli at y = 0 and y = 12, that is at a value y = 6 (the right arrow in Figure 6.3b). This choice is irrespective of whatever stimuli lie in between the two endpoints. Parducci [147] formally described the range principle at that time as "a tendency of judgment to divide the range of stimuli into proportionate subranges, each category of judgment covering a fixed proportion of the range." In light of this statement, finding the AL might be looked at as a task of breaking the stimuli down into two categories, those stimuli "below average" and those "above average", with AL forming the dividing line between these two classes of stimuli. The proportionate subranges mentioned by Parducci are then the two halves of the stimulus range. Actually the number of proportionate subranges is equal to the number of rating categories, as we shall see shortly.

Parducci describes the frequency principle as "a tendency to use the categories of judgment with proportionate frequencies, each category being used for a fixed proportion of the total number of judgments." A subject using this principle would want to call half the stimuli "below average" and half "above average". In our example experiment with 9 stimuli, the first 4 on the y axis are "below average", the last 4 are "above average", and the fifth stimulus at y = 3 is the median (see left arrow in Figure 6.3b) by virtue of its having equal numbers of stimuli on either side.

Applying the RF compromise, as in Equations 6.3 and 6.4, we average the midpoint (y = 6) and the median (y = 3) using Parducci's weighting factor (W = 0.55) to arrive at a predicted ALrf = 4.65, a value slightly higher than Helson's ALp = 4.22. These are the respective predictions for the stimulus value which would be judged as the neutral category on whatever scale we were using.
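The arithmetic of this example is easy to check. The sketch below recomputes both predictions for the nine stimuli of the example (scale values 0, 1, 1, 2, 3, 5, 7, 7, 12, as read from the text); the function names are ours and are used only for illustration.

```python
from statistics import mean, median

# The nine stimuli of the example, by scale value (y = 1 and y = 7 appear twice).
stimuli = [0, 1, 1, 2, 3, 5, 7, 7, 12]


def al_mean(scale_values):
    """AL prediction of Equation 6.1: the mean scale value."""
    return mean(scale_values)


def al_range_frequency(scale_values, w=0.55):
    """Early RF prediction of Equations 6.3 and 6.4.

    A weighted compromise between the midpoint of the range (range
    principle) and the median of the stimuli (frequency principle).
    """
    midpoint = (min(scale_values) + max(scale_values)) / 2
    return w * midpoint + (1 - w) * median(scale_values)


print(f"ALp  = {al_mean(stimuli):.2f}")
print(f"ALrf = {al_range_frequency(stimuli):.2f}")
```

Setting `w=1` or `w=0` in `al_range_frequency` returns the midpoint or the median alone, corresponding to total range or total frequency dominance.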

The above example illustrates that the AL and RF approaches are often not far apart in predicting a neutral stimulus value. Nonetheless, part of the weakness of the AL approach was that, with certain distributions of stimuli, the ALrf was a better predictor than ALp. At this stage in the development of Range Frequency theory no attempt was made to predict the form of curves, but merely the location of the Adaptation Level.

6.5 THE LIMEN MODEL

The next advance in RF Theory (1965) [148] took place with the advent of the "limen" model. A limen is simply the boundary between two categories; in practice it is defined [148] as the

stimulus value that, if presented, would be judged half the time as the lower of the two categories. Continuing with our example case, consider the following. Suppose that we did an experiment, with our stimulus set, where the subject was to classify each stimulus into one of three categories - "0",

"1", or "2". We use numerical categories for ease of computa- tion of goodness results. They could just as easily be called

"A", "B" and "C".

The range principle uses the three categories to subdivide the psychological range into subranges, each corresponding to one category. This is illustrated in Figure 6.3d. The range 0-12 is cut into thirds, since we have 3 categories. The frequency principle says that the subject wants to use each category for an equal portion of the judgments made. In Figure 6.3e, with the limens drawn as shown, three stimuli are placed in category "0", three in category "1", and three in category "2". The compromise is effected by averaging the positions of the range and frequency limens to yield the limens of Figure 6.3f. We then see how the stimuli are classified with these new RF compromise limens.

Knowing how stimuli will be categorized enables one to

calculate predicted average judgment values for each stimulus, in other words, goodness curves. In Figure 6.3f we see that

stimuli at y = 0, 1, 2 fall in the "0" category, while those

at y = 3, 5 fall into "1". The two stimuli at y = 7 fall on the "1-2" boundary, so we assign one to each category, while the stimulus at y = 12 definitely falls in "2". Averaging the different ratings at each of the stimulus values yields the predicted goodness curve (solid line) of Figure 6.3g. This is a simplified result assuming perfect discriminability for the purposes of illustration.
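The limen calculation for this example can be sketched as follows. Two points are assumptions on our part rather than statements from the text: each frequency limen is placed halfway between the last stimulus of one category and the first stimulus of the next, and the number of stimuli is taken to divide evenly among the categories. The function names are hypothetical. The range and frequency limens are averaged with equal weight, as in the text.

```python
def range_limens(y_min, y_max, n_categories):
    """Cut the range into equal subranges, one per category."""
    width = (y_max - y_min) / n_categories
    return [y_min + width * k for k in range(1, n_categories)]


def frequency_limens(stimuli, n_categories):
    """Place limens so each category receives the same number of stimuli.

    Assumption: a limen sits halfway between the neighbouring stimuli.
    """
    ordered = sorted(stimuli)
    per_category = len(ordered) // n_categories
    return [(ordered[k * per_category - 1] + ordered[k * per_category]) / 2
            for k in range(1, n_categories)]


def compromise_limens(r_limens, f_limens, w=0.5):
    """Average the range and frequency limens (equal weights by default)."""
    return [w * r + (1 - w) * f for r, f in zip(r_limens, f_limens)]


def mean_rating(y, limens):
    """Category of a stimulus at y; one lying on a limen is split evenly."""
    below = sum(1 for limen in limens if limen < y)
    on = sum(1 for limen in limens if limen == y)
    return below + on / 2


stimuli = [0, 1, 1, 2, 3, 5, 7, 7, 12]
r = range_limens(0, 12, 3)
f = frequency_limens(stimuli, 3)
rf = compromise_limens(r, f)
curve = {y: mean_rating(y, rf) for y in sorted(set(stimuli))}
print("range limens:", r)
print("frequency limens:", f)
print("RF limens:", rf)
print("predicted goodness:", curve)
```

Under these assumptions the stimulus value y = 7 lies exactly on the upper RF limen, so its two presentations are split between categories "1" and "2", as described above.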

The question arises as to what AL Theory would predict in this same example case. From Equation 6.2 we obtain the following:

$$G_{al}(y) = \frac{y}{AL_p} \cdot \frac{G_m - G_0}{2} = \frac{y}{4.22} \cdot \frac{2 - 0}{2} \tag{6.5}$$

$$G_{al}(y) = 0.237\,y \tag{6.6}$$

This is the equation of the goodness curve predicted from AL Theory. It is plotted as the dashed line in Figure 6.3g. Note the flattening at the stimulus value y = 8.44, due to the inability of the rating scale to give any ratings beyond G = 2.0. We see that in prediction the AL approach yields a straight line goodness curve and that the RF approach indeed offers a different result.
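The AL prediction for the example can be evaluated directly from Equation 6.2. In the minimal sketch below, the clipping of the straight line at the ends of the rating scale is written out explicitly; the function name is hypothetical, and the three-category scale (G0 = 0, Gm = 2) is the one used in the example.

```python
from statistics import mean

stimuli = [0, 1, 1, 2, 3, 5, 7, 7, 12]
al_p = mean(stimuli)  # Equation 6.1


def al_goodness(y, al, g_min=0, g_max=2):
    """Goodness predicted by AL Theory (Equation 6.2), limited to the scale."""
    g = (y / al) * (g_max - g_min) / 2
    return min(max(g, g_min), g_max)


slope = 1 / al_p            # coefficient of y in Equation 6.6
saturation = 2 * al_p       # scale value at which the line reaches g_max
print(f"slope = {slope:.3f}, flattening at y = {saturation:.2f}")
for y in sorted(set(stimuli)):
    print(f"y = {y:2d}  G_al = {al_goodness(y, al_p):.2f}")
```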

The limen model, as applied by Parducci, used results from experiments with a uniform distribution of stimuli (in the frequency sense) to predict results for other non-uniform distributions. His procedure was to use the range limens derived from the uniform distribution and calculate frequency limens for the remaining distributions. From this, predictions were made of the distribution of ratings for each stimulus, from which averaged ratings were calculated for comparison to the empirical data. In general, this model did a fair job of predicting the trends in the data for the non-uniform distribution curves, far surpassing the linear predictions of Adaptation Level Theory.

6.6 THE "SIMPLIFIED" RANGE FREQUENCY MODEL

In 1971, Parducci and Perrett [157] published an account of what they called the "simplified" RF model. Rather than being concerned with the rating distribution for stimuli or with category limens, this model dealt primarily with averaged ratings (such as our goodness ratings). Under this model, as with the previous limen model, predictions could be made from the results of some baseline frequency distribution experiment onto other distributions with the same range. This prediction capability we will find useful in analyzing our own experimental results later on in this thesis. For this reason a good understanding of the simplified approach is desirable. Under the simplified model, the RF compromise formally states that the mean judgment of the ith stimulus value, Gi, is a weighted average of an associated range value, Ri, and a frequency value, Fi, as follows:

$$G_i = W R_i + (1 - W) F_i \tag{6.7}$$

where W is a weighting constant between 0 and 1. We will now investigate the meanings of the range and frequency values.

Under this approach we now have both range and frequency curves using the same axes as for the goodness curves (a set of Gi values). The range curve (a set of Ri values) is that goodness curve which would be obtained were the subject using only the range principle. Likewise the frequency curve (a set of Fi values) is that goodness curve which might be expected were only the frequency principle in use. First we will restate the range and frequency principles and then we will again use an example case to illustrate these principles.

The simplified RF model still adheres to the previously stated range and frequency principles but in slightly different terms. The RF model assumes that each stimulus has a particular so called range value on the scale of judgment, designated Ri for the ith stimulus value. This range value is largely determined by the relationship between the presented stimulus and the extreme stimuli defining the end points on the scale of judgment. Parducci and Perrett discover these range values indirectly, as we will see shortly.

They define the frequency values as the mean of the ratings that a given stimulus would elicit if each category was used with equal frequency. We have seen this idea in our previous example where we had three categories and three stimuli per category. In practice, the frequency values, designated Fi for the ith stimulus value, are obtained as follows. First we calculate the number of stimulus presentations per category. Individual stimulus presentations are then sorted into the categories, from the lowest to the highest. Sometimes a stimulus will be partitioned, part in one category, part in another. After this process, the ratings for a given stimulus value are averaged to obtain the mean rating for that stimulus value. This we designate Fi.

The example to follow will illustrate the general method of determining the range and frequency curves and of making predictions from the results of one experiment onto another. This is essentially the method we will be using later in our own predictions. The example will involve two hypothetical goodness experiments with two different stimulus distributions, one in which there is an equal number of presentations at each stimulus value and one in which there is not. Figure 6.4a shows a stimulus distribution which is uniform as far as relative frequency is concerned, but is nevertheless skewed somewhat to the right by virtue of there being more stimulus values at that end of the range. Figure 6.4b shows the distribution for the experiment whose results are to be predicted. This particular distribution is also skewed to the right, both by the stimulus spacing and by the relative number of presentations. These figures are actually histograms showing the relative numbers of stimulus presentations at different stimulus values. In this example we are using the scale or psychological dimension, y, as the horizontal axis. We are also assuming the number of categories, C, to be 11, that is a 0-10 goodness rating scale, and the number of distinct stimulus values, N, to be 11. The total number of stimulus presentations, T, in this example case is 55.

FIGURE 6.4: TWO DIFFERENT STIMULUS DISTRIBUTIONS ILLUSTRATING THE METHOD OF DERIVING THE FREQUENCY FUNCTION. Panels: (a) and (b) number of presentations versus psychological scale; (c) goodness versus psychological scale, showing the frequency curves F1 (from Figure 6.4a) and F2 (from Figure 6.4b).

Our first order of business will be to calculate frequency values and curves for both of these distributions. First we obtain the number of presentations per category, in our example cases, T/C = 55/11 = 5. In order to calculate the set of frequency values, Fi, we must apportion the stimulus presentations equally into each of the 11 rating categories.

The procedure is relatively simple in the case of the first stimulus distribution. Since T/C = 5, we know that we must give 5 ratings for each of the 11 categories. Starting at the left in Figure 6.4a we label each of the stimulus presentations with the rating it would receive under the frequency principle. The first five stimulus presentations are all at a stimulus value, y = 0, and thus all receive ratings as the first rating category, "0". Continuing on to those presentations at y = 1, all 5 presentations are rated as the category "1", while the 5 presentations at y = 3 are rated as the category "2". This process continues until finally all presentations are rated, as illustrated in Figure 6.4a. Next an average rating is calculated for each stimulus value by averaging all the ratings accorded a particular stimulus value. Thus, averaging the 5 ratings as "4" for the stimulus value, y = 7, we obtain an Fi value of 4. Doing this for the other stimulus values, we can draw the frequency curve for this distribution as in Figure 6.4c (the upper curve, filled triangles, solid line).

Turning to the stimulus distribution of Figure 6.4b, we begin counting off presentations into the 11 categories as illustrated. We see that, under the frequency principle, the presentations at both stimulus values, y = 0 and y = 1, will be categorized as "0". However at y = 9, we have the situation where categories "4", "5" and "6" would all be used. Following this step, the values in each column are averaged to calculate the Fi values, which are then plotted in Figure 6.4c (the lower curve, open triangles, dashed line).

Were the frequency principle totally dominant, these Fi values would be the average judgments accorded our stimulus values in their respective distributions. We will obtain the Ri values indirectly as we will see in the example prediction to follow.
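The counting-off procedure for the frequency values can be sketched as follows. The sketch assumes that the rating categories are labelled 0 to C - 1 and that a distribution is given simply as the number of presentations at each stimulus value, ordered from lowest to highest. The function name is hypothetical. Only the equal-frequency case (five presentations at each of eleven stimulus values) and the nine-stimulus example of Figure 6.3 are taken from the text; the exact counts of Figure 6.4b are not reproduced here.

```python
from fractions import Fraction


def frequency_values(counts, n_categories):
    """Mean rating of each stimulus value under the frequency principle.

    counts: presentations at each stimulus value, lowest value first
            (every count is assumed to be greater than zero)
    Presentations are counted off into the categories in order, with
    total / n_categories presentations per category; a stimulus value that
    straddles a category boundary is partitioned between the categories.
    """
    per_category = Fraction(sum(counts), n_categories)
    values = []
    start = Fraction(0)  # presentations already counted off
    for n in counts:
        end = start + n
        weighted = Fraction(0)
        for c in range(n_categories):
            low, high = c * per_category, (c + 1) * per_category
            overlap = min(end, high) - max(start, low)
            if overlap > 0:
                weighted += c * overlap
        values.append(float(weighted / n))
        start = end
    return values


# Equal numbers of presentations: T = 55, C = 11, five per stimulus value.
print(frequency_values([5] * 11, 11))
# The nine stimuli of Figure 6.3 (values 0, 1, 2, 3, 5, 7, 12), three categories.
print(frequency_values([1, 2, 1, 1, 1, 2, 1], 3))
# A hypothetical case in which one stimulus value is partitioned.
print(frequency_values([2, 4], 3))
```

Exact fractions are used internally so that a stimulus value split across categories is averaged without rounding error.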

Now we will go through an example of the use of Equation 6.7, in the manner of Parducci and Perrett [157], for predicting rating curves for distributions with the same range. The Ri values (the goodness values which would be obtained were the range principle totally dominant) are obtained as follows, given that we actually run a goodness experiment using the stimulus distribution of Figure 6.4a. The empirical data from this hypothetical experiment, Gi (the average rating accorded a particular stimulus value), are plotted in Figure 6.5 (middle curve, filled circles, solid line). Also plotted are the Fi values for this distribution which were obtained earlier, the lower curve in Figure 6.5. The range function, the upper curve in Figure 6.5, is inferred from the other two curves by Equation 6.7. For present purposes we will assume the value of W to be 0.55. Thus Equation 6.7 then takes the form:

$$R_i = \frac{G_i - (1 - W) F_i}{W} \tag{6.8}$$

$$R_i = \frac{G_i - 0.45\,F_i}{0.55} \tag{6.9}$$

We essentially find that Ri value which, when compromised with the known Fi value, would yield the goodness curve that we obtained empirically.
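The inversion of the RF compromise for the range values is a one-line computation. The following sketch uses hypothetical goodness and frequency values (none are taken from Figure 6.5) and a function name of our own, and checks that recombining the inferred range values with the frequency values recovers the goodness values.

```python
def infer_range_values(goodness, frequency, w=0.55):
    """Solve G_i = W R_i + (1 - W) F_i for R_i at every stimulus value."""
    return [(g - (1 - w) * f) / w for g, f in zip(goodness, frequency)]


# Hypothetical empirical goodness values and frequency values.
G = [0.8, 2.9, 5.2, 7.6, 9.4]
F = [0.0, 2.0, 4.0, 7.0, 10.0]

R = infer_range_values(G, F)
for g, f, r in zip(G, F, R):
    # Recombining R and F with the same weight must give back G.
    recombined = 0.55 * r + 0.45 * f
    print(f"G = {g:.1f}  F = {f:.1f}  R = {r:.2f}  check = {recombined:.1f}")
```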

FIGURE 6.5: PREDICTOR GOODNESS AND FREQUENCY FUNCTIONS AND INFERRED RANGE FUNCTION. Curves: R (inferred), G (empirical) and F, plotted as goodness versus psychological scale.

FIGURE 6.6: PREDICTING A GOODNESS FUNCTION FROM ANOTHER STIMULUS DISTRIBUTION. Curves: R, G' (predicted) and F2, plotted as goodness versus psychological scale.

Parducci [147, 149, 157] has found the particular weighting used above, W = 0.55, or one very close to it, suitable for a large class of experiments. You will recall that this is the same weighting which was employed for the AL type analysis (Equation 6.3), where the midpoint (related to range) received a 0.55 weighting, and the median (related to rank) received a 0.45 weighting. Figure 6.5 graphically illustrates judgment as a range frequency compromise, with the goodness curve in between the range and frequency curves.

Since the Ri values presumably depend only on the endpoints of the range, the range function should remain constant if the spacing or frequency distribution of the stimulus set is altered, providing that we maintain the same range. Now assume that we wanted to predict the judgment function for another stimulus distribution, for example, the frequency skewed distribution of Figure 6.4b. We would take the range function obtained from our first distribution (the upper curve in Figure 6.5) along with the frequency function determined by the new distribution (the lower curve of Figure 6.6), and apply Equation 6.7 to obtain the predicted judgment function shown as the middle curve in Figure 6.6. This prediction would then be compared to empirical data obtained from actually using the second distribution of stimuli. We will be using a modified version of this procedure later to make predictions of our own.
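The whole prediction procedure can be put together in a few lines. This is a minimal sketch, assuming that both experiments share the same stimulus values and range; the function name and all numerical values are hypothetical and are not read from Figures 6.5 or 6.6.

```python
def predict_goodness(g_baseline, f_baseline, f_target, w=0.55):
    """Predict goodness for a target distribution from a baseline experiment.

    Step 1: infer the range values from the baseline goodness and frequency
            values (Equation 6.8).
    Step 2: recombine those range values with the frequency values of the
            target distribution (Equation 6.7).
    """
    r = [(g - (1 - w) * f) / w for g, f in zip(g_baseline, f_baseline)]
    return [w * ri + (1 - w) * f for ri, f in zip(r, f_target)]


# Hypothetical baseline results and frequency values at three stimulus values.
g1 = [2.0, 5.0, 8.0]   # empirical goodness, baseline distribution
f1 = [1.0, 5.0, 9.0]   # frequency values, baseline distribution
f2 = [0.5, 3.0, 7.0]   # frequency values, distribution to be predicted

g2 = predict_goodness(g1, f1, f2)
print([round(g, 3) for g in g2])
```

Because the range values are held fixed, the predicted curve differs from the baseline curve by (1 - W) times the change in the frequency values at each stimulus value.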

Parducci and Perrett demonstrated a modest degree of success with the simplified RF model in predicting curves for a wide variety of stimulus distributions within a given range. The errors of prediction were significant but the model still accounted for some eighty percent of the variance associated with context [157]. Their results also further disproved the AL claim that the entire goodness curve could be predicted from a single stimulus value, AL.

6.7 SUMMARY

This chapter has attempted to give an historical overview of the development of Range Frequency Theory from its roots in Adaptation Level Theory up to the "simplified RF model." It is a variation of this simplified RF model which we will be using in Chapters 9 and 10 as a test of the model's applicability to contextual analysis in character recognition.

We have covered the basic ideas of Range Frequency Theory. Further work has been done by Parducci and colleagues and some of this work will be mentioned in the upcoming chapters. The early RF work includes [147, 151, 154, 155], while the period of development cited in this chapter is represented in [148, 149, 152, 156, 157, 158]. Some of the more recent work and discussion in Range Frequency Theory (since 1975) is provided in [9, 10, 21, 150, 153, 176]. Thus we see that RF Theory has a long history and that there is still recent work in the field.

Our next chapter will continue with the topic of range frequency theory from a more mathematical perspective and will try to justify the methods of the simplified model. Also we will see how the case of predicting goodness curves when the ranges are different is handled under the RF approach.

CHAPTER 7

RANGE FREQUENCY: FURTHER DEVELOPMENT

7.1 THEORETICAL DEVELOPMENT

This chapter will attempt to give a more theoretical background to the discussion of range frequency theory. In the last chapter we discussed RF theory in a more intuitive fashion with relatively little theoretical groundwork. The discussion here will look at RF theory from yet another viewpoint, in terms of what is called the functional measurement view as espoused by Anderson [8, 9]. The development of the range frequency model in terms of functional measurement in this chapter is, for the most part, an elaboration on Birnbaum's treatment [21]. In addition this chapter will develop the rules needed to extend RF Theory to the case of limited range experiments. These cases are of some importance in the functional attribute based theory of character recognition.

7.2 FUNCTIONAL MEASUREMENT

The basic idea in functional measurement is that a psychological measurement can be broken down into several stages, as illustrated below in a functional measurement diagram, Figure 7.1 (following Anderson [11]).

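[Diagram: physical stimuli $S_1$, $S_2$, $S_3$ -> PSYCHOPHYSICAL LAW, $s_i = V(S_i)$ -> psychological stimuli $s_1$, $s_2$, $s_3$ -> PSYCHOLOGICAL LAW, $r = I(s_1, s_2, s_3, \ldots)$ -> internal response r -> PSYCHOMOTOR LAW, $R = M(r)$ -> overt response R.]

FIGURE 7.1: GENERAL FUNCTIONAL MEASUREMENT DIAGRAM (following Anderson [11]).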

Physical stimuli, $S_i$, impinge on the organism and are converted by the evaluation process into psychological stimuli, $s_i$. Next the psychological stimuli are combined in some manner by a psychological law, I, to yield an internal response, r. Finally the implicit internal response is converted to an overt response, R, by the psychomotor law, M.

As an illustration of these various stages, consider the following hypothetical experiment involving the classical "Same or Different" paradigm, for instance in a tone loudness experiment. A subject hears two tones in sequence and is asked whether the two tones are of the same loudness. If they are the same, the subject is instructed to push a button labeled "Same", and if different, to push a button labeled "Different". On the physical stimulus level we have some physical measurement of loudness for the tones produced, for example, sound intensity, I, in watts/m^2. The human ear may not respond in a linear fashion; it responds to loudness on something like a log scale [76]. The first tone, $S_1$, may produce an impression, $s_1$; the second tone, $S_2$, produces an impression, $s_2$, with the transform being something like

(7.1)  $s_i = (a \log S_i) + b$

which corresponds to the psychophysical law or pure perception aspect. Then comes the psychological stage in which cognition and contextual factors come into play. The subject considers the two tones, and perhaps all the tones heard before in the experiment, and makes a decision as to whether the two tones were the same or not. At this point the subject has made a covert response, r, through applying the psychological law to the $s_i$'s. Now the subject converts the covert response to an overt one by commanding the fingers to push the proper button, or perhaps by voicing the choice, depending on the paradigm. This is termed the psychomotor law.

This last motor function will usually be a fairly linear or one to one mapping, but not necessarily so, as Anderson warns [9]. Suppose for instance that the subject's reaction time was the actual dependent variable being measured, and this measure was thus considered the response, R. There may be some difference in the amount of time it takes to pronounce the words "Same" and "Different". This would imply a non-linear psychomotor mapping.

In summary, this has been the basic functional measurement framework. This framework can be, and has been, used to evaluate a large variety of paradigms in many areas of psychology and sociology [8, 9, 11].
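The three stages of the "Same or Different" example can be sketched in code as a chain of functions. This is only an illustration of the framework, not a model from the thesis: the base-10 logarithm, the constants a and b, the tolerance-based decision rule, and all function names are assumptions made for the sketch.

```python
import math

def psychophysical_law(S, a=1.0, b=0.0):
    """V: physical intensity S (> 0) -> psychological impression s (form of Equation 7.1)."""
    return a * math.log10(S) + b  # base-10 log and the values of a, b are assumptions

def psychological_law(s1, s2, tolerance=0.1):
    """I: combine two impressions into a covert response r (True means 'same')."""
    return abs(s1 - s2) < tolerance  # hypothetical decision rule

def psychomotor_law(r):
    """M: covert response r -> overt response R (the button pressed)."""
    return "Same" if r else "Different"

def judge(S1, S2):
    """Chain the three stages: R = M(I(V(S1), V(S2)))."""
    s1 = psychophysical_law(S1)
    s2 = psychophysical_law(S2)
    return psychomotor_law(psychological_law(s1, s2))

print(judge(1e-6, 1.1e-6))  # two intensities (watts/m^2) that are close in loudness
print(judge(1e-6, 1e-5))    # a tenfold intensity difference
```

Keeping the stages as separate functions mirrors the diagram: any one law can be replaced (for example, a reaction time based psychomotor law) without touching the others.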

7.3 RANGE FREQUENCY THEORETICAL DEVELOPMENT

The class of experiments with which we are dealing comes under the heading of a category rating task. Subjects are shown stimuli and asked to rate them as to some characteristic. In our case the goodness experiments described earlier are in this class. Subjects are shown characters and asked to rate them as to how well they represent a particular letter.

The theoretical analysis for this type of experiment will now be considered under the functional measurement framework.

In Figure 7.2 below we see a hypothetical illustration for our analysis. Our functional measurement diagram is as shown.

[Diagram: x -> PSYCHOPHYSICAL LAW, $y = J(x)$ -> PSYCHOLOGICAL LAW, $z = H(y)$ -> PSYCHOMOTOR LAW, $G = M(z)$ -> G.]

FIGURE 7.2: FUNCTIONAL MEASUREMENT DIAGRAM FOR RANGE FREQUENCY.

Our final judgment, G, is the result of three functions: the psychophysical function, y = J(x), relating physical stimuli to psychological impressions; a psychological function, z = H(y), relating impression to internal response; and a psychomotor function, G = M(z), relating internal to external response. In a sense these new x and y variables are consistent with those in the last chapter, x being the physical dimension and y being the transformed scale value.

For the present discussion we will make the assumption that M(z), or the transform from psychological impression to physical response, is a unity operation and will ignore it from now on, that is:

(7.2)  $G = M(z) = 1 \cdot z$

(7.3)  $G = z$

Our diagram then becomes a two stage process as shown in Figure 7.3 below.

[Diagram: x -> PSYCHOPHYSICAL LAW, $y = J(x)$ -> PSYCHOLOGICAL LAW, $z = H(y)$ -> G.]

FIGURE 7.3: SIMPLIFIED FUNCTIONAL MEASUREMENT DIAGRAM FOR RANGE FREQUENCY.

We make the following definitions concerning our variables:

$x$: The physical measure of stimulus value.

$x_0$, $x_m$: The minimum and maximum stimulus values, respectively, on the physical dimension.

$p_x(x)$: The probability density function (PDF) of x on the physical dimension.

$P_x(x)$: The cumulative distribution function of x on the physical dimension.

$y$: The psychological impression of x.

$y_0$, $y_m$: The minimum and maximum scale values, respectively, on the psychological dimension.

$p_y(y)$: The probability density function (PDF) of y.

$P_y(y)$: The cumulative distribution function of y on the psychological dimension.

$G$: The response variable (on some rating scale).

$G_0$, $G_m$: The minimum and maximum acceptable or possible responses on the response continuum.

$p_G(G)$: The probability density function (PDF) of G on the response continuum.

$P_G(G)$: The cumulative distribution function of G on the response continuum.

Note: These variables are continuous random variables.

The variable x, of course, corresponds to k/L in our regular goodness experiments, while G is essentially the same as our goodness ratings except for the continuity aspect. That is to say, for our current analysis we are assuming continuous random variables. Treating x, y, and G as continuous random variables will ease the analysis and provide a more meaningful description of the processes. We could in principle perform the same analyses with discrete random variables. For the x dimension, this is in contrast to our usual paradigm of presenting stimuli only at certain discrete stimulus values. The judgment scale, G, is also continuous instead of our usual quantized category ratings, "0", "1", "2", etc. This might correspond to some graphic rating scale, for instance, a paradigm where a subject would move and place a sliding pointer along a slot to indicate the appropriate goodness rating.

Our functions are the following:

(7.4)  $y = J(x)$

(7.5)  $G = H(y)$

Joining these two functions in series we obtain:

(7.6)  $G = H[J(x)]$

We also make the assumption that J(x) and H(y) are monotonic increasing functions of their respective variables. This ensures that the inverse functions, $J^{-1}$ and $H^{-1}$, exist. A hypothetical illustration of the density functions at the various stages in the functional measurement diagram, and their relationships to each other, is given below in Figure 7.4. We note that, for ease of presentation on the graph, the extreme values ($x_0$, $x_m$, $y_0$, $y_m$, $G_0$, $G_m$) on each continuum have been aligned. In our hypothetical example, we have assumed a quadratic physical stimulus distribution which, through some J(x) transform, maps to a linear distribution. The H(y) transform in this particular case implements the RF model. We will examine these various transforms in detail in the upcoming sections.

[Figure: two columns of plots, probability density functions (left) and cumulative distribution functions (right), shown for the x, y, and G continua, linked by the J(x) and H(y) transforms.]

FIGURE 7.4: AN ILLUSTRATION OF THE DENSITY FUNCTIONS AT VARIOUS STAGES IN THE FUNCTIONAL MEASUREMENT DIAGRAM.

7.4 PSYCHOPHYSICAL LAW

We will begin our study of the system with an examination of the initial stage, y = J(x). The psychophysical law mediates between the physical domain and the psychological domain. In many ways it is analogous to the step between the physical and perceptual levels of the functional attribute based theory of character recognition. Likewise the psychological law stage might be looked at as that stage between the perceptual and functional levels discussed in Chapter 4. The PFR, or Physical to Functional Rule, is a composite of the two stages we will be discussing.

Given that we know the probability density function for one variable, and the transformation applied to that variable, it is possible to calculate the probability density function for the transformed variable. From probability theory we know that the y PDF is related to the x PDF by the following relation:

(7.9)  $p_y(y) = p_x(x) \cdot \dfrac{dx}{dy}$

We will be using this relation in this and following sections to derive the relations between successive stages in the model.

For the purposes of illustrating the first stage transformation, y = J(x), a hypothetical function is given below in Figure 7.5. This time we have assumed a linearly skewed physical stimulus distribution and a squaring psychophysical transform. The uniform psychological scale distribution can be derived as follows utilizing Equation 7.9:

Example:

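(7.10)  $J(x) = y = x^2$

(7.11)  $x = y^{1/2}$

(7.12)  $\dfrac{dx}{dy} = \dfrac{1}{2}\, y^{-1/2}$

Applying Equation 7.9, with the linearly skewed density taken to be $p_x(x) = 2x$ on $0 \le x \le 1$ (the normalization implied by the steps that follow), we obtain:

(7.13a)  $p_y(y) = x \cdot y^{-1/2}$

(7.13b)  $p_y(y) = y^{1/2} \cdot y^{-1/2}$

(7.13c)  $p_y(y) = 1$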

In this particular case we get a uniform distribution for y, as illustrated in the lower curve of Figure 7.5.
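The worked example can also be checked numerically. The sketch below assumes the linearly skewed density is $p_x(x) = 2x$ on [0, 1] (an assumption consistent with the uniform result, not a value stated in the text), draws stimuli from it, applies the squaring transform, and tabulates the fraction of y values falling in each tenth of the range. The function names and sample size are illustrative choices.

```python
import random

def sample_x(rng):
    """Draw x from p_x(x) = 2x on [0, 1] by inverting its CDF, P_x(x) = x**2."""
    return rng.random() ** 0.5

def fractions_of_y(n=100_000, bins=10, seed=0):
    """Histogram of y = J(x) = x**2, returned as the fraction of samples per bin."""
    rng = random.Random(seed)
    counts = [0] * bins
    for _ in range(n):
        y = sample_x(rng) ** 2                      # squaring psychophysical transform
        counts[min(int(y * bins), bins - 1)] += 1   # clamp guards the top edge
    return [c / n for c in counts]

fractions = fractions_of_y()
print([round(f, 3) for f in fractions])  # each entry should be close to 1/10
```

A flat histogram is what Equation 7.13c predicts: every tenth of the y scale receives about the same share of the stimuli.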

[Figure: probability density functions (left) and cumulative distribution functions (right) for x (upper curves) and y (lower curves), linked by the J(x) transform.]

FIGURE 7.5: A HYPOTHETICAL EXAMPLE OF THE PSYCHOPHYSICAL STAGE TRANSFORM.

We also note the property that equal areas under the probability density function in the physical domain map to equal areas under the density function in the y domain, or equivalently:

(7.14)  $P_x(x_0) = 0$

(7.15)  $P_y(y_0) = 0$

(7.16)  $P_y[J(x)] - P_y[J(x_0)] = P_x(x) - P_x(x_0)$

(7.17)  $P_y(y) = P_x(x)$

Next we will be examining the following stage in the model, that involving psychological law.

7.5 PSYCHOLOGICAL LAW: RANGE FREQUENCY MODEL

The range frequency concept fits nicely into the functional measurement view, primarily in the secondary, H(y), stage of processing. It is at this stage that contextual effects are applied. The RF model provides a functional mapping between the subjective stimulus values, y, and the judgmental response, G. We will now analyze both the range and frequency principles in the framework which we have been developing. This will entail deriving the relationships between the density functions at the psychological scale level, y, and the judgmental response level, G.

7.5.1 THE RANGE PRINCIPLE

The range principle asserts that differences in response tend to be proportional to differences in psychological magnitude. This principle says that differences in response are directly proportional to the differences in the subjective values, y, and inversely proportional to the range of stimuli. These ideas are embodied in the following:

(7.18)  $dG = (G_m - G_0) \cdot \dfrac{a}{(y_m - y_0)}\, dy$

The "a" is a constant of direct proportionality. This is to say that, in a given range, a small change in subjective value would cause a proportional change in the judgment rating. Likewise, if the range of subjective values of the stimuli, $(y_m - y_0)$, increases, the same rating scale must be spread over a wider span, implying a smaller change in G for a given change in y. The $(G_m - G_0)$ factor merely reflects that dG depends on the size of the rating scale in use. On a scale of 0-10, for example, dG would be ten times larger than on a scale of 0-1. This, then, is our normalizer for the rating scale in use.

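With such a range principle acting alone, we now investigate the form of the function, H(y). Integrating Equation 7.18 over both G and y domains we obtain:

(7.19a)  $\displaystyle \int_{G_0}^{G} dG = (G_m - G_0)\, \frac{a}{(y_m - y_0)} \int_{y_0}^{y} dy$

(7.19b)  $\displaystyle G\,\Big|_{G_0}^{G} = (G_m - G_0)\, \frac{a}{(y_m - y_0)}\; y\,\Big|_{y_0}^{y}$

(7.19c)  $\displaystyle G - G_0 = (G_m - G_0)\, a\, \frac{(y - y_0)}{(y_m - y_0)}$

(7.20)  $\displaystyle G = G_0 + \left[ (G_m - G_0) \cdot a \cdot \frac{(y - y_0)}{(y_m - y_0)} \right]$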

Going further, we obtain the PDF of G under the range principle by the following relations:

(7.21)  $p_G(G) = p_y(y) \cdot \dfrac{dy}{dG}$

(7.22)  $\dfrac{dG}{dy} = a \cdot \dfrac{(G_m - G_0)}{(y_m - y_0)}$

Thus, inverting dG/dy, we obtain:

(7.23)  $p_G(G) = p_y(y) \cdot \dfrac{(y_m - y_0)}{a \cdot (G_m - G_0)}$

Under the range principle alone, the form of the $p_G(G)$ function is identical to that of the $p_y(y)$ function, except for a constant factor set by the range of stimuli. This implies that whatever frequency distribution the stimuli have on the y dimension, the judgments will have the same form of distribution. This is illustrated below in Figure 7.6a. Changing the range on the psychological or y dimension has no effect on the form of the G probability density function, as seen in Figure 7.6b, if the G rating scale is kept the same.
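As a small sketch of the range principle acting alone, the two functions below evaluate Equations 7.20 and 7.23 directly. The function names, the default 0-10 rating scale, and the default a = 1 are assumptions chosen for illustration.

```python
def range_response(y, y0, ym, G0=0.0, Gm=10.0, a=1.0):
    """Equation 7.20: G is a linear function of y under the range principle alone."""
    return G0 + (Gm - G0) * a * (y - y0) / (ym - y0)

def range_density(p_y, y0, ym, G0=0.0, Gm=10.0, a=1.0):
    """Equation 7.23: density of G, given the density p_y at the corresponding y."""
    return p_y * (ym - y0) / (a * (Gm - G0))

# The midpoint of any stimulus range maps to the midpoint of the rating scale.
print(range_response(0.5, 0.0, 1.0))
print(range_response(1.0, 0.0, 2.0))

# A uniform density of 1 on a unit y range becomes a uniform density on the G scale.
print(range_density(1.0, 0.0, 1.0))
```

Because the mapping is linear, the density of G is simply a rescaled copy of the density of y, which is the point made about Figure 7.6.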

[Figure: two panels, (a) and (b), each showing probability density functions (left) and cumulative distribution functions (right) for y and for G; panel (b) uses a different range on the y dimension.]

FIGURE 7.6: DENSITY FUNCTION ILLUSTRATION OF THE RANGE PRINCIPLE.

7.5.2 THE FREQUENCY PRINCIPLE

The frequency principle says that changes in response, dG, are directly proportional to the perceived frequency distribution of the stimuli, that is, the probability density function, $p_y(y)$. This reflects that subjects tend to use equal portions of the response continuum with equal frequency. This principle can be written as:

(7.24)  $dG = b\, (G_m - G_0)\, p_y(y)\, dy$

where b and $(G_m - G_0)$ are constants of direct proportionality.

In a similar manner to that for the range principle, we now investigate the form of the function, H(y), if the frequency principle were dominant. Integrating Equation 7.24 over both G and y domains we get:

(7.25a)  $\displaystyle \int_{G_0}^{G} dG = b\, (G_m - G_0) \int_{y_0}^{y} p_y(y)\, dy$

(7.25b)  $\displaystyle G\,\Big|_{G_0}^{G} = b\, (G_m - G_0) \int_{y_0}^{y} p_y(y)\, dy$

(7.25c)  $G - G_0 = b\, (G_m - G_0)\, P_y(y)$

(7.26)  $G = G_0 + \left[\, b\, (G_m - G_0)\, P_y(y) \,\right]$

Again going further to obtain the PDF of G under the frequency principle, we have, by rearranging Equation 7.24 and inverting:

(7.27)  $\dfrac{dG}{dy} = b\, (G_m - G_0)\, p_y(y)$

(7.28)  $\dfrac{dy}{dG} = \dfrac{1}{b\, (G_m - G_0)\, p_y(y)}$

Then by Equation 7.21, we have:

(7.29)  $p_G(G) = p_y(y) \cdot \dfrac{1}{b\, (G_m - G_0)\, p_y(y)}$

(7.30)  $p_G(G) = \dfrac{1}{b \cdot (G_m - G_0)}$

This we see is a uniform distribution, which is exactly what the frequency principle says, that is, equal portions of the response continuum used with equal frequency. A picture of this is shown in Figure 7.7a below. With frequency principle dominance, no matter what distribution appears on the psychological impression scale, y, the distribution always spreads out on the G response scale into a uniform distribution.
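The flattening effect of the frequency principle can be sketched numerically. In the illustration below, the skewed set of impressions, the empirical estimate of $P_y(y)$, the 0-10 rating scale, and b = 1 are all assumptions made for the example; only the mapping itself follows Equation 7.26.

```python
import bisect
import random

def make_cdf(samples):
    """Empirical estimate of P_y(y): the fraction of samples <= y."""
    ordered = sorted(samples)
    return lambda y: bisect.bisect_right(ordered, y) / len(ordered)

def frequency_response(y, P_y, G0=0.0, Gm=10.0, b=1.0):
    """Equation 7.26: G = G0 + b (Gm - G0) P_y(y)."""
    return G0 + b * (Gm - G0) * P_y(y)

rng = random.Random(1)
ys = [rng.random() ** 3 for _ in range(10_000)]   # impressions crowded near y = 0
P_y = make_cdf(ys)
ratings = [frequency_response(y, P_y) for y in ys]

# Fraction of ratings falling in each fifth of the 0-10 scale.
bins = [0] * 5
for G in ratings:
    bins[min(int(G / 2.0), 4)] += 1
fractions = [count / len(ratings) for count in bins]
print(fractions)   # roughly equal shares despite the skewed input
```

Although well over half of the impressions lie below y = 0.2, each fifth of the rating scale ends up used about equally often.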

We note that a uniform distribution is not always the proper distribution to assume. There may in fact be some underlying biases in some cases; for example, subjects may tend to use the end categories more frequently. Biases such as these are felt in the frequency principle; $p_G(G)$ changes to something other than a uniform distribution. The case where there is a tendency to use the end categories more is illustrated below in Figure 7.7b. The probability distribution along the judgmental axis is no longer flat under the frequency principle. Instead the bias causes the distribution to be higher at the ends than in the middle section of the range.

Next we will take up the topic of combining the effects of these two principles and derive the RF compromise solution.

[Figure: two panels, (a) and (b), each showing probability density functions (left) and cumulative distribution functions (right) for y and for G; panel (b) shows the case of a bias toward the end categories.]

FIGURE 7.7: DENSITY FUNCTION ILLUSTRATIONS OF THE FREQUENCY PRINCIPLE.

7.6 THE RANGE FREQUENCY COMPROMISE

The total response is a compromise between the two preceding principles, the range tendency and the frequency tendency. Combining Equation 7.18 and Equation 7.24, we arrive at the following statement of the range frequency principle:

(7.31)  $dG = (G_m - G_0) \left[ \dfrac{a}{(y_m - y_0)}\, dy + b\, p_y(y)\, dy \right]$

Again, solving for G in the same manner as before:

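(7.32a)  $\displaystyle \int_{G_0}^{G} dG = (G_m - G_0) \left[ \frac{a}{(y_m - y_0)} \int_{y_0}^{y} dy + b \int_{y_0}^{y} p_y(y)\, dy \right]$

(7.32b)  $\displaystyle G\,\Big|_{G_0}^{G} = (G_m - G_0) \left[ \frac{a}{(y_m - y_0)}\; y\,\Big|_{y_0}^{y} + b \int_{y_0}^{y} p_y(y)\, dy \right]$

(7.32c)  $\displaystyle G - G_0 = (G_m - G_0) \left[ a\, \frac{(y - y_0)}{(y_m - y_0)} + b \int_{y_0}^{y} p_y(y)\, dy \right]$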

Now since we know that the integral of the PDF is the cumulative distribution function as below,

(7.33)  $\displaystyle P_y(y) = \int_{y_0}^{y} p_y(y)\, dy$

then substituting this into Equation 7.32c yields:

(7.34)  $\displaystyle G = G_0 + (G_m - G_0) \left[ a\, \frac{(y - y_0)}{(y_m - y_0)} + b\, P_y(y) \right]$

The probability density function on the judgmental scale, G, becomes just a compromise between the two distributions derived earlier under the assumptions of the range principle (Equation 7.23) or the frequency principle (Equation 7.30) acting alone. The three stages of processing are illustrated in Figure 7.8 below, using the same illustration as in Figures 7.6 and 7.7. The RF compromise is illustrated in the lower section of Figure 7.8. It can be seen that $p_G(G)$ falls between the density functions obtained using only the range or frequency principle alone. Using procedures similar to those used in the previous sections, we can derive the actual form of this distribution which turns out to be the following:

[Figure: probability density functions (left) and cumulative distribution functions (right) for x, y, and G, linked by the J(x) and H(y) transforms. The G curves are shown for the range principle alone, the frequency principle alone, and the RF compromise.]

FIGURE 7.8: DENSITY FUNCTION ILLUSTRATION OF THE RANGE FREQUENCY COMPROMISE.

(7.35)  $p_G(G) = \dfrac{(y_m - y_0)\; p_y(y)}{(G_m - G_0) \left[\, a + b\; p_y(y)\, (y_m - y_0) \,\right]}$
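The intermediate steps are not spelled out in the text; one way to fill them in, following the same procedure used for the range and frequency principles separately, is to differentiate Equation 7.34 with respect to y and then apply Equation 7.21:

```latex
\begin{align*}
\frac{dG}{dy} &= (G_m - G_0)\left[\frac{a}{y_m - y_0} + b\,p_y(y)\right] \\[4pt]
p_G(G) &= p_y(y)\,\frac{dy}{dG}
        = \frac{p_y(y)}{(G_m - G_0)\left[\dfrac{a}{y_m - y_0} + b\,p_y(y)\right]} \\[4pt]
       &= \frac{(y_m - y_0)\,p_y(y)}{(G_m - G_0)\left[\,a + b\,(y_m - y_0)\,p_y(y)\,\right]}
\end{align*}
```

As a check on this sketch, setting b = 0 recovers the range principle density of Equation 7.23, while setting a = 0 cancels $p_y(y)$ and leaves a constant, the uniform density of the frequency principle.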

We will now continue to simplify Equation 7.34. In line with this, for convenience we will define $y_0$ and $y_m$ as follows:

(7.36)  $y_0 = 0$

(7.37)  $y_m = 1$

This implies merely that the stimulus endpoints are held constant. As long as this is the case, the labeling of the psychological continuum is really arbitrary. With this assumption, our statement of the RF principle (Equation 7.34), in a given fixed range, then becomes:

(7.38)  $\displaystyle G = G_0 + (G_m - G_0) \left[ a\, \frac{(y - 0)}{(1 - 0)} + b\, P_y(y) \right]$

(7.39)  $G = G_0 + (G_m - G_0) \left[\, a\, y + b\, P_y(y) \,\right]$

Noting that the "a" and "b" we have been using are proportional weighting constants for the range and frequency principles respectively, we replace them with their weighting factor (W) equivalents as follows:

(7.40)  $a = W$

(7.41)  $b = (1 - W)$

where

(7.42)  $0 < W < 1$

Now recalling Equation 7.4, y = J(x), and Equation 7.17, $P_y(y) = P_x(x)$, Equation 7.39 then becomes:

(7.43)  $G = G_0 + (G_m - G_0) \cdot [\, W \cdot J(x) + (1 - W) \cdot P_x(x) \,]$

Thus we see that judgment, G, is a weighted function of J(x), the psychophysical transform function (range principle), and $P_x(x)$, the cumulative distribution function on the stimulus dimension (frequency principle). This expression (Equation 7.43) gives meaning to the $R_i$ and F values of the simplified RF model of the last chapter (Equation 6.7). The $R_i$ values correspond to J(x), and the F values are really just approximations to the $P_x(x)$ values. The $G_m$ and $G_0$ merely normalize to the rating or judgment scale in use.
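To make Equation 7.43 concrete, the sketch below evaluates the predicted judgment of one stimulus under two different stimulus distributions sharing the same range. Everything specific here is an assumption for illustration: the identity psychophysical function, W = 0.5, the 0-10 scale, the two stimulus sets, and the use of an empirical CDF (fraction of stimuli at or below x) as the stand-in for $P_x(x)$.

```python
import bisect

def make_cdf(stimuli):
    """Empirical stand-in for P_x(x): the fraction of presented stimuli <= x."""
    ordered = sorted(stimuli)
    return lambda x: bisect.bisect_right(ordered, x) / len(ordered)

def rf_judgment(x, J, P_x, W=0.5, G0=0.0, Gm=10.0):
    """Equation 7.43: G = G0 + (Gm - G0) [W J(x) + (1 - W) P_x(x)]."""
    return G0 + (Gm - G0) * (W * J(x) + (1.0 - W) * P_x(x))

def identity(x):
    """Hypothetical psychophysical function J(x) = x on the range [0, 1]."""
    return x

evenly_spaced = [i / 10 for i in range(11)]        # stimuli spread evenly over [0, 1]
low_skewed = [(i / 10) ** 2 for i in range(11)]    # same range, crowded toward 0

G_even = rf_judgment(0.5, identity, make_cdf(evenly_spaced))
G_skew = rf_judgment(0.5, identity, make_cdf(low_skewed))
print(G_even, G_skew)
```

With the range term fixed, only the frequency term changes between the two contexts: the same stimulus receives a higher rating when most of the other stimuli lie below it.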

This completes our look at the derivation of the RF formulation of the simplified RF model. We have interpreted the range and frequency principles in terms of the behavior of probability density functions at different stages in a functional measurement diagram. Next, building on some of the mathematical groundwork developed thus far, we will examine the relevant topic of cases where the range of stimuli is not constant.

7.7 RESTRICTED RANGE CASE

Thus far we have dealt with the case where the stimulus range has been held constant and the frequency distribution of stimuli has been varied. Now we will consider the effect of varying the stimulus range. Most of our own goodness experiments have been of this variety. Specifically we are looking for the relationship of the range function, R(y), in one range context to that in a different range context. This information is needed in order to make predictions in the manner described in the section on the simplified RF model. The difference is that we want to make predictions, not for a distribution with the same range of stimuli, but with a different range.

Let us define the context of an experiment as the particular range and frequency distribution for that experiment. We will designate the kth context by the "k" subscript. Thus in context k, from Equations 7.34, 7.40 and 7.41, we have:

(7.44)  $\displaystyle G_k = G_{0k} + (G_{mk} - G_{0k}) \left[ W \cdot \frac{(y - y_{0k})}{(y_{mk} - y_{0k})} + (1 - W) \cdot P_{xk}(x) \right]$

Consider the relation between two different contexts:

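(7.45)  $\displaystyle G_1 = G_{01} + (G_{m1} - G_{01}) \left[ W \cdot \frac{(y - y_{01})}{(y_{m1} - y_{01})} + (1 - W) \cdot P_{x1}(x) \right]$

(7.46)  $\displaystyle G_2 = G_{02} + (G_{m2} - G_{02}) \left[ W \cdot \frac{(y - y_{02})}{(y_{m2} - y_{02})} + (1 - W) \cdot P_{x2}(x) \right]$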

Let us make the assumption that the judgment scales are the same for both contexts, that is,

(7.47)  $G_{m1} - G_{01} = G_{m2} - G_{02}$

(7.48)  $G_{01} = G_{02}$

(7.49)  $G_{m1} = G_{m2}$

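Thus we have:

(7.50)  $\displaystyle G_1 = G_0 + (G_m - G_0) \left[ W \cdot \frac{(y - y_{01})}{(y_{m1} - y_{01})} + (1 - W) \cdot P_{x1}(x) \right]$

(7.51)  $\displaystyle G_2 = G_0 + (G_m - G_0) \left[ W \cdot \frac{(y - y_{02})}{(y_{m2} - y_{02})} + (1 - W) \cdot P_{x2}(x) \right]$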

We also assume that y is in an overlap region, on the psychological scale, between the two different range contexts, that is:

(7.52)  $y_{01} \le y \le y_{m1}$

(7.53)  $y_{02} \le y \le y_{m2}$

As we said before, the term involving y above is equivalent to the R value of the simplified RF model. What we are looking for is the relation between $R_1$ and $R_2$, where

(7.54)  $R_1 = \dfrac{(y - y_{01})}{(y_{m1} - y_{01})}$

(7.55)  $R_2 = \dfrac{(y - y_{02})}{(y_{m2} - y_{02})}$

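Rearranging Equation 7.54 we have:

(7.56)  $y = [\, R_1\, (y_{m1} - y_{01}) \,] + y_{01}$

Substituting into Equation 7.55 yields:

(7.57)  $R_2 = \dfrac{R_1\, (y_{m1} - y_{01}) + y_{01} - y_{02}}{(y_{m2} - y_{02})}$

(7.58)  $R_2 = \dfrac{(y_{m1} - y_{01})}{(y_{m2} - y_{02})}\, R_1 + \dfrac{(y_{01} - y_{02})}{(y_{m2} - y_{02})}$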

This is just a linear relationship, the slope depending only on the ratio of the two ranges on the psychological dimension.

Thus, given that we have the $R_1(y)$ function for a particular range, we can infer the $R_2(y)$ function for a different range, for those sections of the two ranges that overlap.
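The linear conversion between range values can be sketched as follows. The function names and the example ranges (a full range of 0 to 1 and a "high" range of 0.5 to 1) are hypothetical; the arithmetic follows Equations 7.54, 7.55, and 7.58.

```python
def range_value(y, y0, ym):
    """Equations 7.54 and 7.55: position of y within a range, as a fraction of that range."""
    return (y - y0) / (ym - y0)

def convert_range_value(R1, y01, ym1, y02, ym2):
    """Equation 7.58: map a range value R1 from context 1 to R2 in context 2."""
    slope = (ym1 - y01) / (ym2 - y02)        # ratio of the two ranges
    intercept = (y01 - y02) / (ym2 - y02)    # offset between the lower endpoints
    return slope * R1 + intercept

full = (0.0, 1.0)   # hypothetical full range context
high = (0.5, 1.0)   # hypothetical restricted (high) range context

y = 0.75                                     # lies in the overlap of both ranges
R1 = range_value(y, *full)
R2 = convert_range_value(R1, *full, *high)
print(R1, R2, range_value(y, *high))         # the converted and direct values agree
```

For a y outside the overlap region the converted value falls outside the interval from 0 to 1, which is a convenient signal that no prediction can be made for that stimulus.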

Some example illustrations of these relationships are shown in Figure 7.9. Assuming a range function as in Figure 7.9a, we plot the inferred range functions for a variety of other ranges, (b), (c), (d), and (e). Ranges (b), (c) and (d) are fully included within range (a), while (e) has only a partial overlap. Thus, only that section of (e) within (a) can be predicted from (a). Note also that reverse predictions could be made from (b), (c), (d) and (e) to (a), but again only for that part of (a) included in each.

[Figure: five panels, (a) through (e). Panel (a) shows the baseline range function $R_a$ over the range $y_{0a}$ to $y_{ma}$; panels (b) through (e) show the inferred range functions $R_b$, $R_c$, $R_d$, and $R_e$ over their respective ranges.]

FIGURE 7.9: THE RELATION OF RANGE CURVES BETWEEN DIFFERENT RANGES OF STIMULI.

We will find this formulation very useful in the upcoming chapters where the range function for some limited range will be inferred from a full range function. This is a necessary step in making predictions of goodness curves from one range of stimuli to another.

7.8 SUMMARY

We have thus far covered the basics of Range Frequency Theory. It provides us with a possible model for predicting contextual effects related to the set of stimuli presented in goodness type experiments. Here we have reviewed the main ideas of RF Theory from a mathematical viewpoint and derived a parallel formulation to that of the simplified model of the previous chapter. Lastly we have extended the formulation to the case of stimulus sets with different ranges.

The following chapters will test the applicability of a model based on RF Theory to the case of letter judgments and goodness experiments described in Chapter 4. We will try to predict the results of one of our empirical experiments from the results of another using some variations of the RF model as a basis for the prediction.

CHAPTER 8

EXPERIMENTS

8.1 INTRODUCTION

In the previous two chapters we examined the foundations, theory and practice of the range frequency approach. The current chapter will describe the design and motivation for experiments to test this approach's applicability to letter recognition. In addition it will describe the methodology of the experiments themselves, as well as the experimental results. In the following chapter we will explore the methods employed in predicting the results of one experiment from another under the range frequency approach.

8.2 GOODNESS EXPERIMENTS AS A TEST OF RF THEORY

Goodness experiments have been the principal vehicle through which we have studied human perception of letters. This type of experiment has shown itself to be a very good indicator of interletter boundary. Our interest in range frequency theory stems from its supposed ability to describe contextual effects, particularly in category rating tasks, a class into which goodness experiments fall. We will observe the success that range frequency theory has in describing how the interletter boundary, obtained from goodness experiments, moves around under contextual influence. These contextual influences will take the form of changes in the range and distribution of stimulus values in goodness experiments. In this manner then, the goodness experiments serve as a test of the range frequency theory. We will now review some of the characteristics of goodness type experiments which lend themselves to our present purposes.

Shillman [182] has pointed out that goodness, as a psychological paradigm, possesses high reliability and construct validity, is easy to obtain, and is quite sensitive to subtle changes in the perceiver's judgment over the entire range of stimulus possibilities. Labeling, on the other hand, while sensitive at the boundary, is not a good measure at other places along an interletter continuum.

As we discussed in Chapter 4, the category rating method has often been used in psychophysical experiments. There are elaborate procedures for interpreting the information gained from such experiments in the form of the Law of Categorical Judgment, described in both Guilford [88] and Torgerson [206], and based on the work of Thurstone. Another related and similar analysis is provided in the Preliminary Theory of Intensity Perception of Durlach and Braida [66, 30]. Recently researchers have taken to using rating paradigms for studying speech phenomena [47, 48, 111]. Likewise some of the work of Braida and Durlach [30] in auditory tone perception utilizes rating scales. They have compared the discriminability scales derived from their experiments by several different paradigms and found a fairly good match.

Many researchers, such as S. S. Stevens [195, 196], have argued against the use of rating scales in favor of other methods such as fractionation. Much of the argument against rating scales arises from the variability of these rating judgments under outside influences. Nonetheless it is just this variability which enables us to study contextual effects. The fact that the goodness curves are somewhat variable helps us in that it mirrors the movable interletter boundaries of the letter perceiver. Birnbaum [21] provides a further defense of this approach of using contextual effects to benefit in a range frequency framework. Also in a range frequency setting, Anderson [10], borrowing an old phrase, describes the use of such contextual effects as getting "a silk purse from a sow's ear."

What we have said in this section may be summarized as follows. We will be using goodness experiments using the V-Y and C-F letter pairs as a test of the range frequency theory.

The goodness paradigm has a history of use in the area of psychophysics and is susceptible to contextual influences, a property we consider an asset. The individual goodness curves may shift with changes in experimental condition and thus the interletter boundary may also be expected to move. Next we will consider the attribute LEG, which we will be studying using a range frequency approach.

8.3 THE ATTRIBUTE "LEG" AS A TEST OF RF THEORY

The attribute we have studied most is that of LEG, particularly as it pertains to the V-Y and C-F interletter pairs. It is this attribute which we will also use in conjunction with our experiments investigating the range frequency approach, partially due to our familiarity with its properties using a variety of paradigms. The V-Y and C-F letter pairs are used in the experiments to be described, again since we have accumulated a good deal of data and experience in their behavior. There is also the desire to compare results across letter pairs as was done before in Chapter 4. Later on we will discuss how using two letter pairs provides a fairer test of the model (we will optimize model parameters on one letter pair and test on the other).

With goodness experiments involving the attribute LEG, the interletter boundary has been defined as that value of k/L where the two goodness curves cross. Knowing how these individual curves move around under contextual influence tells us how the boundaries move. This will be the goal of our range frequency investigation.
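As a sketch of how such a crossover might be located from sampled goodness curves, the function below finds the first sign change in the difference between two curves and interpolates linearly between the neighboring sample points. The function name, the interpolation scheme, and the goodness values are all made up for illustration; they are not data or procedures from the experiments.

```python
def crossover(k_over_L, curve_a, curve_b):
    """Return the first k/L value where two sampled goodness curves cross.

    Uses linear interpolation between adjacent sample points; returns None
    if the curves never cross within the sampled range.
    """
    diffs = [a - b for a, b in zip(curve_a, curve_b)]
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return k_over_L[i]                  # curves meet exactly at a sample
        if d0 * d1 < 0:                         # sign change: crossing lies between
            t = d0 / (d0 - d1)
            return k_over_L[i] + t * (k_over_L[i + 1] - k_over_L[i])
    if diffs and diffs[-1] == 0:
        return k_over_L[-1]
    return None

# Hypothetical goodness ratings on a 0-10 scale at five k/L values.
k_over_L = [0.0, 0.1, 0.2, 0.3, 0.4]
goodness_V = [9, 8, 6, 3, 1]
goodness_Y = [1, 2, 4, 7, 9]

boundary = crossover(k_over_L, goodness_V, goodness_Y)
print(boundary)
```

Because the boundary depends on both curves, a shift in either one under a change of context moves the crossover, even if the other curve stays put.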

The matter of watching the movement of the boundary may be somewhat complicated by the fact that the individual goodness curves may react in different ways to contextual influence. Whereas most psychological category rating experiments rely on verbal labels (e.g., very small, small, average, big, very big), where the middle category is explicitly defined as the neutral category, such is not the case in our goodness experiments; the middle category may not be the neutral category. This argues against relying on the information from a single goodness curve to establish the boundary. It is the goodness crossover point which is important. For example, we might a priori assume that a goodness rating of 5 on a GV scale (assuming we are using a 0-10 rating scale) would correspond to the k/L value of the V-Y interletter boundary. This is not necessarily the case, as can be seen earlier in Figure 5.3 from a C-F experiment, where the curves actually cross at a goodness value of around 4. The interletter boundary may be strongly influenced by what might seem rather subtle changes in the behavior of the individual goodness curves. In our later range frequency analysis we will first consider the goodness curves on an individual basis and then in combination to yield the interletter boundary.

Next we will take up the variables involved in this contextual influence which we have been discussing.

8.4 CONTEXT DETERMINING VARIABLES

As its name implies, there are two important variables to be considered in the range frequency approach, the range of stimuli and the frequency distribution of stimuli. These variables, as previously pointed out, have been relevant in interpreting the results of psychological experiments. The thrust of this thesis is based on the premise that these two variables form a graphical context of judgment for character recognition. This context is the same whether letters are actually being read or goodness judgments are being made on the characters. In line with the above we want to test these two aspects of the theory by a suitable choice of experiments.
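To make the two variables concrete, the following is a minimal sketch of a range frequency compromise in its generic textbook form: a judgment is a weighted average of a range value (position of the stimulus between the smallest and largest stimulus presented) and a frequency value (rank of the stimulus among those presented). The formulation developed in Chapter 7 may differ in detail; the weight `w`, the function name and the stimulus values below are illustrative assumptions.

```python
def range_frequency_judgments(stimuli, w=0.5):
    """Return a judgment in [0, 1] for each stimulus in the presented set.

    Generic form (an assumption, not necessarily the thesis formulation):
        J = w * range_value + (1 - w) * frequency_value
    """
    s_min, s_max = min(stimuli), max(stimuli)
    ordered = sorted(stimuli)
    n = len(ordered)
    judgments = []
    for s in stimuli:
        range_value = (s - s_min) / (s_max - s_min)
        # Mean 0-based rank of s among the presented stimuli (handles ties).
        ranks = [i for i, v in enumerate(ordered) if v == s]
        frequency_value = (sum(ranks) / len(ranks)) / (n - 1)
        judgments.append(w * range_value + (1 - w) * frequency_value)
    return judgments


# Hypothetical k/L values: the same stimulus (0.2) judged in two contexts.
full_range = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
low_range = [0.0, 0.1, 0.2]
j_full = range_frequency_judgments(full_range)[2]
j_low = range_frequency_judgments(low_range)[2]
print(round(j_full, 3), round(j_low, 3))  # 0.4 1.0
```

In this sketch the stimulus at k/L = 0.2 sits in the middle of the scale when the full range is presented but at the top of the scale when only the low range is presented, which is the kind of contextual effect the experiments below are designed to test.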

We investigate the range aspect by a series of experiments run under a similar format but using different ranges of stimuli. There should be some relationships between the results from these different range experiments. These relations are basically those developed in Section 7.7. For contrast, in the experiments to be described, we have used a so called full range (spanning the full interletter V-Y and C-F continuums), low range (spanning the low end of the continuum, in terms of k/L, approximately to the vicinity of the interletter boundary of the full range experiment), and high range (from approximately near the full range interletter boundary up to the upper end, in terms of k/L).

The frequency aspect is covered by a series of experiments all with the same full range but differing in the distribution of stimuli within that range. For the V-Y letter pair there will be four different full range experiments, while for the C-F pair there will be two. In the studies reported here, these two variables will be the main experimental variables as regards graphical context. The next section will deal with the precise components of the experimental program.

8.5 OVERVIEW OF EXPERIMENTS

In this section, we will describe the experiments relevant to the current research in summary form. Detailed methodological descriptions of each experiment will follow in later sections.

A summary of the major variables in the experiments is given below in Table 8.1. These include the letter pair, paradigm, rating categories, stimulus range, number of stimuli and number of subjects. It should also be mentioned that after the goodness data were obtained in the experiments described here, additional tasks such as labeling, which require the goodness rating as a conditioning paradigm, may have been run subsequently. These data are not relevant to the present discussion and will not be discussed further. Thus the results to be presented here are from so called first line experiments with relatively naive subjects.

TABLE 8.1 SUMMARY OF EXPERIMENTS

EXP.  FILE  STIM.  PARA-  CATE-   RANGE       STIM.  NO.    SUBJ.  NO.
NO.   NAME  PAIR   DIGM   GORIES  (k/L)       DIST.  STIM.  GROUP  SUBJ.

1     VYV1  V-Y    GV     0-10    0.00-0.50   A      12     a      24
1     VYY1  V-Y    GY     0-10    0.00-0.50   A      12     a      24
2     VYV2  V-Y    GV     0-10    0.00-0.18   B      10     b      22
2     VYY2  V-Y    GY     0-10    0.00-0.18   B      10     b      22
3     VYV3  V-Y    GV     0-10    0.18-0.50   C      10     c      10
3     VYY3  V-Y    GY     0-10    0.18-0.50   C      10     c      10
4     CFC1  C-F    GC     0-10    0.00-0.50   D      12     d      24
4     CFF1  C-F    GF     0-10    0.00-0.50   D      12     d      24
5     CFC2  C-F    GC     0-10    0.00-0.19   E      10     e      36
5     CFF2  C-F    GF     0-10    0.00-0.19   E      10     e      36
6     CFC3  C-F    GC     0-10    0.145-0.50  F      10     f      12
6     CFF3  C-F    GF     0-10    0.145-0.50  F      10     f      12
7     RVYV  V-Y    GV     0-10    0.00-0.50   G      12     g      10
7     RVYY  V-Y    GY     0-10    0.00-0.50   G      12     g      10
7     RCFC  C-F    GC     0-10    0.00-0.50   G      12     g      10
7     RCFF  C-F    GF     0-10    0.00-0.50   G      12     g      10
8     TVYV  V-Y    GV     0-10    0.00-0.50   H      11/5x  h      3
8     TVYY  V-Y    GY     0-10    0.00-0.50   H      11/5x  h      3
9     SVYV  V-Y    GV     0-5     0.00-0.485  I      7      i      10
9     SVYY  V-Y    GY     0-5     0.00-0.485  I      7      i      10

For reference to later graphs and as a means of talking about different experiments and parts of experiments, a descriptive file name is associated with the data from each of the experimental goodness curves. The file name, which is a maximum of 5 letters long, will contain within it the letter pair under consideration, e.g., V-Y. Following this letter pair will be one of the two letters of the letter pair. For example VYV would imply that the file contains goodness ratings as V while VYY implies ratings as Y. A number in final position, if present, refers to the range of the experiment, that is, 1 = full range, 2 = low range, 3 = high range. An initial code letter before the letter pair, if present, such as R, S or T, refers to the particular paradigm and stimulus distribution from which the data was obtained. If there is any doubt, Table 8.1 will make clear the type of experiment performed. These file names merely provide a convenient means of referring to the various experimental results in our later analysis. Some examples of file names are given below:

CFF2 - a low range C-F goodness experiment with ratings as to how well the characters represent the letter F (generally the companion file would be CFC2, i.e., ratings as C).

RVYV - a V-Y experiment with ratings as V, using the full range "R" distribution.
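The naming convention just described can be expressed as a small parser. This is only a sketch of the stated rules (optional code letter, letter pair, rated letter, optional range digit); the function name `parse_file_name` and the dictionary layout are hypothetical and are not part of the original data handling.

```python
RANGE_CODES = {"1": "full", "2": "low", "3": "high"}


def parse_file_name(name):
    """Split a goodness-curve file name such as 'CFF2' or 'RVYV' into parts."""
    body = name.upper()
    if not 3 <= len(body) <= 5:
        raise ValueError(f"unexpected file name length: {name!r}")

    # Optional trailing digit: 1 = full, 2 = low, 3 = high range.
    range_label = None
    if body[-1].isdigit():
        if body[-1] not in RANGE_CODES:
            raise ValueError(f"unknown range code in {name!r}")
        range_label = RANGE_CODES[body[-1]]
        body = body[:-1]

    # Optional leading code letter (e.g. R, S or T) before the letter pair.
    code = None
    if len(body) == 4:
        code, body = body[0], body[1:]

    # What remains must be the letter pair followed by one of its letters.
    if len(body) != 3 or body[2] not in body[:2]:
        raise ValueError(f"cannot parse file name: {name!r}")

    return {
        "code": code,
        "pair": f"{body[0]}-{body[1]}",
        "rated_as": body[2],
        "range": range_label,
    }


print(parse_file_name("CFF2"))
print(parse_file_name("RVYV"))
```

Note that file names with an initial code letter carry no range digit, so the sketch reports their range as `None`; Table 8.1 remains the reference for the range actually used.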

In the experimental program we want to explore both the range and frequency aspects of the theory. The choice of experiments reflects this desire. Figures 8.1 and 8.2 below illustrate the stimulus distributions for the experiments involving the V-Y and C-F letter pairs respectively. Each experiment occupies a different vertical position, with its stimulus values plotted on a single horizontal line along the k/L LEG dimension.

In the V-Y experiments the range variation is explored in Experiments 1, 2 and 3 in the sense that later we will try to make predictions for the results of Experiments 2 and 3 (limited range experiments) from those of Experiment 1 (a full range experiment). The same holds true for the C-F Experiments 4, 5 and 6 where we will try to make predictions later for the results of Experiments 5 and 6 from those of Experiment 4.

The frequency variation is explored in a variety of full range experiments. In the V-Y case we have data from four separate full range experiments, each with a different stimulus distribution. These are Experiments 1, 7, 8, and 9, shown in Figure 8.1. In the C-F case there are two full range experiments, Experiments 4 and 7. Range Frequency Theory predicts that these separate stimulus distributions should result in different curves for the different experiments. We will explore this topic in the following chapters.

[Figure: stimulus values of each experiment plotted as a row of points along the LEG RATIO (k/L) axis, .00 to .50. Rows: VYV2-VYY2 (Exp. 2), VYV3-VYY3 (Exp. 3), VYV1-VYY1 (Exp. 1), RVYV-RVYY (Exp. 7), TVYV-TVYY (Exp. 8), SVYV-SVYY (Exp. 9).]

FIGURE 8.1: ILLUSTRATION OF STIMULUS DISTRIBUTIONS FOR EXPERIMENTS INVOLVING V-Y.

[Figure: same format. Rows: CFC2-CFF2 (Exp. 5), CFC3-CFF3 (Exp. 6), CFC1-CFF1 (Exp. 4), RCFC-RCFF (Exp. 7).]

FIGURE 8.2: ILLUSTRATION OF STIMULUS DISTRIBUTIONS FOR EXPERIMENTS INVOLVING C-F.

In the remainder of this chapter, each of the experiments from Table 8.1 will be described in a more detailed manner as to subjects, methodology, and so forth. Following the methodological description, the experimental results in the form of goodness curves will be presented and discussed.

8.6 RANGE VARIATION EXPERIMENTS

V-Y EXPERIMENTS

Experiment 1 VYV1 - VYY1
Experiment 2 VYV2 - VYY2
Experiment 3 VYV3 - VYY3

C-F EXPERIMENTS

Experiment 4 CFC1 - CFF1
Experiment 5 CFC2 - CFF2
Experiment 6 CFC3 - CFF3

8.6.1 BACKGROUND

This set of experiments had as its purpose the exploration of the range variable as an influence on the interletter boundary. It was hypothesized that the interletter boundary would move around with a change in the range of stimuli presented.

The mechanism of boundary shift can be roughly explained as follows in terms of the present experiments with V-Y stimuli in low and full range experiments. In a full range experiment the ratings tend to extend to the full range of possibilities (0-10) since there are stimuli over the entire range of interletter possibilities (k/L = 0 to 0.5) and due to the rating definitions given the subject. Now consider the case of a low range experiment, for example a V-Y experiment with ratings as Y. In such a case the high end stimulus values of this experiment would have received middle category ratings in a full range experiment. Now in the context of the larger numbers of low range stimuli and dearth of high range or long LEGged stimulus values, the mid full range stimulus values appear much better in goodness than before and the effect is to raise their goodness ratings by several points. This effect takes place along the entire GY curve, least at the very lowest stimulus values and successively increasing until the mid full range. The same effect takes place with the other goodness curve, GV, only the entire curve is effectively lowered. The actual effect is something like a rotation around an axis, the point at k/L = 0 being anchored. With both of the curves being moved somewhat, the GY being raised and the GV being lowered, the net effect is a leftward shift in the interletter boundary.

The present experiments will attempt to demonstrate this effect, while in the next chapter we will call upon the RF approach to account for the exact mechanism of the change. Here we will show the effect for both V-Y and C-F letter pairs. The experiments described here are partially derived from some of the experiments described earlier in Chapter 5. Some of the experimental data here is identical to that reported by Kuklinski [115].

8.6.2 METHOD

8.6.2.1 STIMULI

There were three experiments (Experiments 1, 2, 3) performed with V-Y stimuli in Group 1, each experiment covering a different range of stimuli and each experiment involving a different group of subjects. Likewise there were three experiments (Experiments 4, 5, 6) performed with C-F stimuli, again each involving a different range and different subjects.

The stimuli for these experiments were characters along either the V-Y or the C-F continuum, varying in the physical parameter k/L, the percentage length of line extension, as shown below in Figure 8.3. The characters were drawn centered on 5 in. x 8 in. (12.7 cm. x 20.3 cm.) white index cards with black India ink using a Mars #00 technical pen (0.2 mm. line width). The height of all characters was 1 inch (2.54 cm.). The angle between the top arms in the V-Y stimuli was kept constant at 42°. The width for the C-F stimuli was kept constant at 0.67 in. (1.7 cm.).

In Experiment 1 (full range) the stimuli consisted of 12 V-Y characters varying from k/L = 0 (V) to 0.5 (Y). In Experiment 2 (low range) the stimuli consisted of 10 V-Y characters varying from k/L = 0 to 0.18, the approximate boundary of Experiment 1. In Experiment 3 (high range) the V-Y stimuli ranged from k/L = 0.18 to 0.50. In Experiment 4 (full range) the stimuli consisted of 12 C-F characters varying from k/L = 0 (C) to 0.5 (F), while Experiment 5 (low range) involved 12 C-F characters from k/L = 0.0 to 0.19. In Experiment 6 (high range) there were also 12 C-F stimuli, ranging from k/L = 0.145 to 0.50. The actual stimulus spacings were shown in Figures 8.1 and 8.2 and can also be seen in the results.

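[Figure: example V-Y and C-F characters, annotated with the length L and the 42° angle between the top arms.]

FIGURE 8.3: EXAMPLES OF V-Y AND C-F CHARACTERS USED IN EXPERIMENTS 1-6.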

Six 8.5 in. x 11 in. (21.6 cm. x 27.9 cm.) range adapting sheets were also prepared, one for each experiment. On each sheet were reproduced all the stimuli the subject would see during the experimental session. These characters were placed on the sheet in a random arrangement. The purpose of the range adapting sheet was to give the subject an idea of the range of the stimuli beforehand, rather than have the subject deduce this in the rating part of the experiment. This would help in reducing rating variability; otherwise the subjects' ratings might be more inconsistent throughout the time course of the experiment. The range adapting sheets for Experiments 1 through 6, with the actual k/L values for each character added, are shown below in Figures 8.4 a-f.

Within each experiment, each subject received a different counterbalanced order of presentation.

8.6.2.2 SUBJECTS

Twenty-four members of the MIT community served as subjects in Experiment 1. Twenty-two different subjects served in Experiment 2, while ten subjects different from those in Experiments 1 or 2 served in Experiment 3. The subjects serving in Experiments 4, 5 and 6 were different from those in Experiments 1, 2 and 3. Experiments 4 and 5 each involved twenty-four subjects, while Experiment 6 involved twelve subjects (see the experimental summary in Table 8.1). Subjects in these and following experiments (unless otherwise specified) were members of the MIT community. For these particular experiments, subjects were paid a candy bar or piece of fruit for participating in the experiment.

[Figure: the Experiment 1 stimulus characters in random arrangement, each labeled with its k/L value. Legible labels: .051, .095, .070, .500, .127, .167, .281, .219, .038.]

FIGURE 8.4a: RANGE ADAPTING SHEET FROM EXPERIMENT 1.

[Figure: same format. Legible labels: .145, .051, .000, .038, .130, .127, .095, .070.]

FIGURE 8.4b: RANGE ADAPTING SHEET FROM EXPERIMENT 2.

[Figure: same format. Labels as printed: .248, .18, .281, .500, .219, .351, .429, .315, .219, .390.]

FIGURE 8.4c: RANGE ADAPTING SHEET FROM EXPERIMENT 3.

[Figure: same format. Labels: .281, .038, .167, .000, .351, .500, .219, .127, .051, .095, .070, .429.]

FIGURE 8.4d: RANGE ADAPTING SHEET FROM EXPERIMENT 4.

[Figure: same format. Labels: .127, .027, .095, .000, .145, .190, .111, .083, .038, .070, .051, .167.]

FIGURE 8.4e: RANGE ADAPTING SHEET FROM EXPERIMENT 5.

[Figure: same format. Labels: .390, .167, .315, .145, .429, .500, .351, .281, .190, .248, .219, .462.]

FIGURE 8.4f: RANGE ADAPTING SHEET FROM EXPERIMENT 6.

8.6.2.3 PROCEDURE

The description of procedure below is in terms of a V-Y experiment. The procedure for the C-F experiments was identical and for Experiments 4, 5 and 6, the C-F letter pair would be substituted in the procedural description (i.e. C for V, F for Y).

For present purposes, each subject basically performed two tasks: 1) the subject rated each letter as to how well it represented the letter V; 2) the subject rated each letter as to how well it represented the letter Y. For half the subjects, Y was the first letter considered for rating purposes in the steps above. Some of the subjects in these experiments performed other tasks following these two tasks, such as labeling which letter the stimulus characters best represented. All subjects did have the above two goodness rating tasks in common, however.

The procedure was as follows. The subject was seated at a desk and read the following instructions:

"This is an experiment in character perception. We are interested in finding out how people will rate different looking letters. On this page (the subject was shown the range adapting sheet at this time) are the characters which will be used in the experiment. Please look over this page carefully now. Each of these various representations of the letter V will be shown to you on a card. For each character shown to you, use one of the numbers, zero through ten, to rate how well the character represents the letter V. A zero rating means no or very poor representation while a ten indicates excellent representation of the letter V. The procedure will be the following: I will hand you a card and then after consideration you will tell me your rating of the character as to how well it represents the letter V. Then place the card face down on the desk. Please remember that this experiment is in no sense a test of your ability. There are no right or wrong answers."

When the subject was finished looking at the range adapting sheet it was placed face down on the desk. The subject held each stimulus card in turn at a normal reading distance, studied it, and then voiced a rating of the character, on a scale from zero to ten, to the experimenter, who recorded the ratings on a pad not visible to the subject. The subject then placed the stimulus card face down on the desk. This process continued until all the stimulus cards were exhausted.

When finished rating each character as V, the subject was shown the range adapting sheet again and told that the next task would be to rate the same characters as to how well they represented the letter Y. The original rating instructions were partially reread, substituting Y for V, in order to refresh the subject's memory as to the meaning of the numerical ratings (i.e., zero means no or very poor representation, while ten indicates excellent representation, this time for the letter Y). The physical procedure for rating Y was the same as that for rating V. The presentation order was the same in both cases.

Following this, some of the subjects in some of the experiments performed other tasks which will not presently concern us, for example a labeling task. In such a case, the subject was told that next, the same characters would be presented and (s)he would be asked to tell which letter, V or Y (Y or V for half the subjects), the stimulus best represented. The cards were shown in the same order as in the first two tasks and after consideration, the subject voiced a decision as to the character's identity to the experimenter, who recorded it on a pad not visible to the subject.

8.6.3 RESULTS

The results from the full range V-Y experiment (Experiment 1) are shown in Figure 8.5 while those from V-Y Experiments 2 and 3 are both shown in Figure 8.6. Likewise results from C-F Experiment 4 are given in Figure 8.7 and those for C-F Experiments 5 and 6 are shown in Figure 8.8.

In all of these cases the ratings, averaged across subjects for each stimulus value, are plotted along a horizontal k/L axis. Additionally, error bars representing ±1σ, that is one standard deviation of the mean goodness value, are provided at each stimulus value on each of the graphs. This serves to supply a rough confidence interval for the goodness curves. These will prove useful later on, in that they provide a visual goodness of fit measurement for the predictions of these curves which we will make.
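The averaging and error bars described above can be sketched as follows. The sketch reads "one standard deviation of the mean" as the standard error of the mean (sample standard deviation divided by the square root of the number of subjects); that reading, the function name `goodness_curve` and the ratings used are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def goodness_curve(ratings):
    """Return (means, error_bars) for a table of ratings.

    ratings[s][i] is the rating given by subject s to stimulus i.
    The error bar is taken to be the standard error of the mean,
    which is an assumed reading of "standard deviation of the mean".
    """
    n_subjects = len(ratings)
    per_stimulus = list(zip(*ratings))  # one tuple of ratings per stimulus
    means = [mean(column) for column in per_stimulus]
    error_bars = [stdev(column) / sqrt(n_subjects) for column in per_stimulus]
    return means, error_bars


# Hypothetical ratings from three subjects for two stimulus values.
example_ratings = [
    [10, 4],
    [8, 6],
    [9, 5],
]
means, error_bars = goodness_curve(example_ratings)
print(means, [round(e, 3) for e in error_bars])
```

A predicted point that falls within the error bar of the corresponding observed mean would then count, informally, as a good fit at that stimulus value.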

For each experiment, two goodness curves are drawn, one representing ratings as the unLEGged letter of the pair (shown as open circles), while the other represents ratings as the LEGged letter of the pair (shown as open squares). The curves are labeled according to the nomenclature adopted in Section 8.5. The intersection of the two related goodness curves is a measure of the interletter boundary. Table 8.2 below provides the interletter boundaries for these six experiments obtained from the crossover of the two appropriate goodness curves.

[Figure: GOODNESS (0-10) versus LEG RATIO (k/L), .00 to .50; the two goodness curves of Experiment 1.]

FIGURE 8.5: RESULTS FROM V-Y EXPERIMENT 1 (FULL RANGE).

[Figure: same axes; curves VYV2, VYY2, VYV3 and VYY3.]

FIGURE 8.6: RESULTS FROM V-Y EXPERIMENT 2 (LOW RANGE) AND EXPERIMENT 3 (HIGH RANGE).

[Figure: same axes; curves CFC1 and CFF1.]

FIGURE 8.7: RESULTS FROM C-F EXPERIMENT 4 (FULL RANGE).

[Figure: same axes; the goodness curves of Experiments 5 and 6.]

FIGURE 8.8: RESULTS FROM C-F EXPERIMENT 5 (LOW RANGE) AND EXPERIMENT 6 (HIGH RANGE).

TABLE 8.2 RESULTS FROM EXPERIMENTS 1-6

EXPERIMENT  RANGE  CURVES        INTERLETTER
NUMBER             INVOLVED      BOUNDARY (k/L)

1           full   VYV1 - VYY1   0.185
2           low    VYV2 - VYY2   0.106
3           high   VYV3 - VYY3   0.227
4           full   CFC1 - CFF1   0.227
5           low    CFC2 - CFF2   0.145
6           high   CFC3 - CFF3   0.288

It can be seen that the boundaries for the three separate ranges are substantially different, in the case of both the V-Y and C-F experiments. This follows the expected trend. For a low range experiment, the boundary moves lower, while for high range experiments, the boundary moves higher. We saw this phenomenon before in Section 5.5.
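The size of these shifts follows directly from the boundary values in Table 8.2. The short calculation below simply subtracts the full range boundary from the low and high range boundaries for each letter pair; the variable and function names are illustrative.

```python
# Interletter boundaries (k/L) taken from Table 8.2.
boundaries = {
    "V-Y": {"full": 0.185, "low": 0.106, "high": 0.227},
    "C-F": {"full": 0.227, "low": 0.145, "high": 0.288},
}


def boundary_shifts(table):
    """Return each limited-range boundary minus the full-range boundary."""
    return {
        pair: {
            rng: round(value - values["full"], 3)
            for rng, value in values.items()
            if rng != "full"
        }
        for pair, values in table.items()
    }


shifts = boundary_shifts(boundaries)
print(shifts)
# {'V-Y': {'low': -0.079, 'high': 0.042}, 'C-F': {'low': -0.082, 'high': 0.061}}
```

For both letter pairs the low range boundary lies about 0.08 below the full range boundary, while the high range boundary lies 0.04 to 0.06 above it.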

Another point of interest is the fact that the corresponding boundaries for the V-Y and C-F experiments are not the same, with the C-F boundaries appearing to be consistently higher than the corresponding V-Y boundaries. This variation may be due to several effects. The first possibility is an effect of stimulus distribution. We note that the stimulus distributions of the corresponding ranges are not the same. In the low and high range experiments, it should also be noted that the V-Y and C-F ranges, in terms of k/L, are indeed different. That such differences may have an effect on the rating curves is one of the tenets of RF theory, one which we will take up when we do RF analyses on these results.

Another possibility for this difference is the presence of different archetypes in the perceptions of the different letters by the different groups. Naus and Shillman [138] have claimed that it is really the ambiguous to archetype ratio which is invariant across letter pair and not just the k/L ratio. The subject's personal archetype may form some part of the total contextual experience. We will take up a discussion of this topic following the rest of these experiments.

8.7 EXPERIMENT 7: FULL RANGE

RVYV-RVYY

RCFC-RCFF

8.7.1 BACKGROUND

Experiment 7 provides us with another full range experiment to compare with the results from Experiment 1 (V-Y) and Experiment 4 (C-F). The stimulus distribution is different in this case and might be expected to yield different results. This experiment, originally reported by Kuklinski and Kuklinski [116], had as its original purpose the comparison of boundaries between different letter pairs, all presumed to involve the common functional attribute LEG. The letter pairs V-Y, C-F, D-P, O-P, V-X, U-H, O-A, and C-I were involved. Although all these letter pairs were used in the experiment, for the purposes of this thesis we are only interested in the V-Y and C-F letter pairs, since, as mentioned above, they provide comparison results for Experiments 1 and 4. In contrast to the paradigm of these previous experiments, stimuli from the different letter pairs were intermixed during the rating process in the present experiment.

8.7.2 METHOD

8.7.2.1 STIMULI

This experiment used eight interletter trajectories presumed to involve the functional attribute LEG. There were five single LEGged cases, V-Y, C-F, D-P, O-P and C-I (a horizontal leg), and three double LEGged cases, V-X, U-H and O-A. Example characters from each of these interletter trajectories are shown below in Figure 8.9.

[Figure: example characters from the V-Y, D-P, V-X, C-F, O-P and O-A interletter trajectories, annotated with k and L.]

FIGURE 8.9: EXAMPLES OF CHARACTERS USED IN EXPERIMENT 7.

All characters were drawn centered on 5 in. x 8 in. (12.7 cm. x 20.3 cm.) white index cards with black India ink using a Mars #00 technical pen (0.2 mm. line width). The height of all characters was 1 inch (2.54 cm.). For the V-Y and V-X trajectories, the angle between the top arms in the stimuli was kept constant at 42°. For all other trajectories the width was kept constant at 0.67 in. (1.7 cm.).

Each of the 8 full range trajectories consisted of 12 stimuli varying in the LEG variable from k/L = 0 to 0.5. The stimulus spacing was locally linear in k/L with an inter-stimulus interval of 0.025 in the midrange region and further apart at the ends. Reproductions of the stimulus characters for the V-Y and C-F trajectories from this experiment, along with the actual k/L values, are shown in Figures 8.10a and 8.10b below. The stimulus spacing was illustrated earlier in Figures 8.1 and 8.2. Each of the 96 stimuli was assigned a randomly chosen number from 1 to 96. This code number was printed in the upper left hand corner of the stimulus card. There was a different random order of presentation of the stimulus cards for each subject.
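The coding and ordering scheme just described can be sketched as below. This is only an illustration of the bookkeeping (a random one-to-one code number per stimulus, then an independent shuffle per subject); the function names, the use of a seeded generator and the number of subjects in the example are assumptions, and the original cards were of course prepared by hand.

```python
import random


def assign_code_numbers(n_stimuli=96, seed=None):
    """Give each stimulus a distinct, randomly chosen code number 1..n."""
    rng = random.Random(seed)
    codes = list(range(1, n_stimuli + 1))
    rng.shuffle(codes)
    return codes  # codes[i] is the code number printed on stimulus i


def presentation_order(codes, rng):
    """Return a fresh random ordering of the coded cards for one subject."""
    order = list(codes)
    rng.shuffle(order)
    return order


codes = assign_code_numbers(seed=1)
subject_rng = random.Random(2)
orders = [presentation_order(codes, subject_rng) for _ in range(10)]
print(len(codes), len(orders))
```

Because the response sheet is indexed by code number, the subject can locate the correct line whatever order the cards arrive in.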

In this experiment, subjects were to write down their ratings on special data sheets, one for rating as unLEGged letters, one for rating as LEGged letters and another for labeling. On the data sheet, the spaces were numbered from 1 through 96. Next to the number was a letter indicating that the particular stimulus was to be rated as to how well it represented that letter. One sheet contained all unLEGged letters (C, D, O, U, V) while the other rating sheet contained only LEGged letters (A, F, H, I, P, Y, X). The third sheet contained only numbers and was for labeling purposes.

[Figure: the 12 V-Y stimulus characters, labeled with k/L values .000, .050, .100, .125, .150, .175, .200, .225, .250, .300, .400, .500.]

FIGURE 8.10a: STIMULI FROM THE V-Y TRAJECTORY OF EXPERIMENT 7.

[Figure: the 12 C-F stimulus characters, labeled with k/L values .000, .050, .100, .125, .150, .175, .200, .225, .250, .300, .400, .500.]

FIGURE 8.10b: STIMULI FROM THE C-F TRAJECTORY OF EXPERIMENT 7.

8.7.2.2 SUBJECTS

Ten people volunteered to serve as subjects in this experiment. They were mostly female around 17 years of age.

These subjects were not participants in any of the other experiments and were relatively naive to this task.

8.7.2.3 PROCEDURE

The stimulus deck was shuffled beforehand to ensure a random order of presentation and the response sheet was given to the subject. The subject was instructed as follows:

"This is an experiment in character perception. On these cards you will see various representations of the letters C, D, O, U and V (A, F, H, I, P). I would like you to read the code number in the upper left hand corner of the card and find the corresponding number on your rating sheet. Then I would like you to rate the character on the card as to how well it represents the corresponding letter on your response sheet. Please use the numbers zero through ten to rate the characters, where zero means no or very poor representation and ten means excellent representation of the particular letter in question. Please remember that this experiment is in no sense a test of your ability. There are no right or wrong answers."

The subject read the code number, found the corresponding line on the response sheet, and then rated the character on the card as to how well it represented the corresponding letter on the response sheet. The subject wrote down the chosen rating on the response sheet on the proper line. This process continued until all the stimulus cards were exhausted. Next the subject was given a second response sheet where ratings as the LEGged character would be written. The procedure was the same except for the substitution of the LEGged letters (A, F, H, I, P) in the instructions above where indicated.

Finally a blank response sheet was given to the subject and the subject was asked which letter each of the characters best represented. The subject went through the cards and wrote the choice of letter on the line corresponding to the stimulus code number. In each of these three tasks the same random order of presentation was preserved.

8.7.3 RESULTS

The results for the V-Y and C-F letter pairs are shown in Figures 8.11 and 8.12 below. The graphs for the other curves obtained in this experiment are not shown here but are available [116]. The graph format of Figures 8.11 and 8.12 is again the same as for the previous experiments and will be the same in the others to follow.

The V-Y experimental results, with the curves designated RVYV and RVYY, show a goodness crossover at k/L of 0.175. This compares favorably with that obtained in Experiment 1 of k/L = 0.185. The corresponding result for the C-F letter pair, involving curves labeled RCFC and RCFF, was a k/L value of 0.200, again fairly close to that from Experiment 4 of k/L = 0.227.

Again it seems that the V-Y boundary is lower than the corresponding C-F boundary. Close examination of the RVYV and RCFC curves reveals them to be fairly close in shape. On the other hand there are differences between the RVYY and RCFF curves, involving rating as the LEGged letters, Y or F. The RVYY curve appears to peak at around k/L = 0.4, while the RCFF curve peaks at k/L = 0.5. This indicates a higher archetype for F than for Y, in terms of k/L, for the same group of subjects. Because of this, the RCFF curve is further to the right than the RVYY curve, thus effecting a higher boundary for C-F. Aside from this the general shapes of the two curves are similar. We will discuss these archetype effects later in Section 8.10.

[Figure: GOODNESS (0-10) versus LEG RATIO (k/L), .00 to .50; curves RVYV and RVYY.]

FIGURE 8.11: GOODNESS RESULTS FOR V-Y FROM EXPERIMENT 7.

[Figure: same axes; curves RCFC and RCFF.]

FIGURE 8.12: GOODNESS RESULTS FOR C-F FROM EXPERIMENT 7.

8.8 EXPERIMENT 8: FULL RANGE

TVYV - TVYY

8.8.1 BACKGROUND

This experiment is similar in many respects to the V-Y experiment described previously but differs in some points. Again this is a full range experiment which can be compared with Experiment 1. The purpose of this experiment was to investigate within subjects behavior rather than behavior across subjects as with most of the other experiments. Consequently the data here was obtained with only three subjects. However each subject was exposed to the stimuli many times. The goodness results studied here are those pooled from these three subjects.

The experimental data here was obtained as part of a study performed by Waldron and Kuklinski and a full description of this work can be found in [211, 117]. This study was geared toward investigating the effects of varying the frequency of presentation of the stimuli at different stimulus values as discussed in the Chapter 6 discussion on Range Frequency Theory. The same subject would be run with several different frequency distributions on different days. The data presented here will only deal with the data collected where each stimulus value was presented an equal number of times. The following will be an abbreviated version of the procedural description but adequate for the subset of data considered here.

8.8.2 METHOD

8.8.2.1 STIMULI

This experiment utilized stimuli on a full range V-Y continuum. In this experiment there were a total of 55 stimuli, 5 at each of 11 different stimulus values, ranging from k/L = 0.0 to 0.5. The stimulus spacing was determined such that the median stimulus of the series fell at k/L = 0.17 and that there was a constant fractional change between each successive stimulus value (except for the lowest); this makes the spacing essentially logarithmic in k/L. Reproductions of the stimuli at each of the different 11 stimulus values are shown below in Figure 8.13.
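The spacing rule can be checked with a short calculation. The sketch below assumes that the constant ratio between successive nonzero values is fixed by two stated facts: the median (sixth of eleven) value is k/L = 0.17 and the largest value is 0.5. The function name and its parameters are illustrative, not taken from the original stimulus-generation routines.

```python
def log_spaced_stimuli(median=0.17, top=0.5, n_values=11):
    """Return (values, ratio) for a series with a constant fractional step.

    The lowest value is fixed at k/L = 0; the remaining values form a
    geometric series passing through the median and ending at the top.
    """
    median_index = n_values // 2                  # 0-based index of the median
    steps_to_top = (n_values - 1) - median_index  # steps from median to top
    ratio = (top / median) ** (1 / steps_to_top)
    values = [0.0]
    for i in range(1, n_values):
        values.append(median * ratio ** (i - median_index))
    return values, ratio


values, ratio = log_spaced_stimuli()
print(round(ratio, 3))
print([round(v, 3) for v in values])
```

Under these assumptions each nonzero stimulus is about 24 percent larger than the one before it, and the resulting series agrees to three decimal places with the values .000, .072, .089, .110, .137, .170, .211, .262, .325, .403 and .500 labeled in Figure 8.13.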

[Figure: the 11 V-Y stimulus characters, labeled with k/L values .000, .072, .089, .110, .137, .170, .211, .262, .325, .403, .500.]

FIGURE 8.13: V-Y STIMULI USED IN EXPERIMENT 8.

In total then, there were a large number of ratings across subjects made at each stimulus value. There were 5 ratings as V at a particular value in each subsession, or 15 ratings per session per subject, for a total of 45 ratings per stimulus value. Each point on the goodness curve in the results is the average of 45 ratings of a character at that particular stimulus value.

In order to ensure some degree of uniformity between different tokens at the same stimulus value, the stimuli for this experiment were computer generated using routines developed by Babcock [15]. These routines, given a small number of parameters, were capable of generating an entire interletter continuum.

The stimuli themselves were drawn on a Calcomp plotter with black India ink using a Mars #00 technical pen (0.2 mm. line width). The centered characters drawn on the Calcomp plotter were cut out as 4 in. x 6 in. (10.2 cm. x 15.2 cm.) pieces and glued onto blank white index cards of the same size. The height of all characters was 1 in. (2.54 cm.) while the angle between the top arms was kept constant at 42°.

8.8.2.2 SUBJECTS

The three subjects participating in the V-Y experiment were members of the MIT community. All were native speakers of English and had not participated in any other letter recognition experiments. Each subject was paid $3.00 for participating in this session, which lasted about an hour.

8.8.2.3 PROCEDURE

The experimental session consisted of three almost identical subsessions. Each subsession consisted of three tasks: goodness rating of each of the 55 stimuli as the letter V, goodness rating of each of the 55 stimuli as the letter Y, and labeling the character as either V or Y. In each subsession the order of rating was reversed from the previous subsession.

The sequence of tasks in the entire session was this: Subsession 1 [GV, GY, L], Subsession 2 [GY, GV, L], Subsession 3 [GV, GY, L] (where GX indicates a goodness rating as the letter X, and L indicates a labeling task). Two subjects had this ordering of tasks while one subject rated the goodness of Y initially instead of V. Each subsession used a different random ordering of the stimulus cards. In order to familiarize the subject with the full range as quickly as possible, the first five stimuli in each ordering contained both endpoints, that is, stimuli with values of k/L = 0.0 and 0.5.

At the beginning of each goodness rating session, a set of instructions was read to the subject. These were essentially the same instructions as those used earlier in Experiments 1-6. The subject was handed the stimulus cards in such a manner that only one character could be seen at a time. The subject held each card at a normal reading distance, studied it and then voiced a rating of the character to the experimenter, who recorded it on a pad not visible to the subject. The subject then placed the card face down on the desk.

When finished rating all stimulus cards, the subject was reread the previous instructions with Y substituted for V and then proceeded to rate each stimulus in the same order as previously. Finally the subject was shown the cards again in the same order and asked which letter each character best represented. Following this the experimenter left the room, reshuffled the stimulus cards and returned. The procedure of goodness rating and labeling was then repeated as in the first subsession, but with the order of the goodness ratings as the two letters reversed. A third subsession with the same procedure followed.

The complete experiment involved not only the letter pair V-Y but also the letter pair D-P, which two additional subjects rated. Here we have only described the first session in a series of five for each of these subjects. The other sessions dealt with the same stimulus values but varied the relative numbers of presentations of each. The session described here was the one from that total experiment in which the distribution had equal numbers of presentations at each stimulus value.

8.8.3 RESULTS

The goodness results from this experiment are shown below in Figure 8.14. The ratings from the three subjects have been averaged together. In this case we find an interletter boundary of k/L = 0.187, again relatively close to the boundaries of 0.185 and 0.175 found in Experiments 1 and 7 respectively. The form of the goodness curves does exhibit differences however. The low end of the TVYV curve varies somewhat from that of VYV1. The TVYY curve appears to be shifted a little to the right of that for VYY1. The archetype in the TVYY case again appears to be higher in k/L than for VYY1 and this may contribute to the shift. These differences we will try to account for later in the RF analysis.

[Figure 8.14: goodness curves plotted against LEG RATIO (K/L), 0 to .50.]

FIGURE 8.14: V-Y GOODNESS RESULTS FROM EXPERIMENT 8.

8.9 EXPERIMENT 9: FULL RANGE

SVYV SVYY

8.9.1 BACKGROUND

This V-Y experiment was originally performed by Shillman [182] and has in fact been the standard example which we have used to illustrate the basic concepts of the functional attribute based theory of character recognition as far as consistency across experimental paradigm and across letter pair [26, 88] are concerned. The experimental description here is adapted from Shillman [182]. Again it is a full range experiment, lending itself to comparison with Experiments 1, 7 and 8.

This experiment differs from the others in several respects. The character size is half that of the other experiments and the goodness rating scale used had only 6 categories (0-5) rather than our usual 11 (0-10). Since our physical variable (k/L) is a ratio, it should remain relatively size invariant. Parducci and Perrett [157] have noted little difference in results when their rating scale was changed from 6 to 9 categories. We thus expect just a linear scaling of the results. For compatibility with our other results, this is in fact what we will do in the analysis.

8.9.2 METHOD

8.9.2.1 STIMULI

This experiment dealt with four V-Y trajectories with four different angles between the top arms: 18°, 42°, 66°, and 90°. There were seven stimuli in each trajectory, varying from k/L = 0.0 to 0.484 on the 42° trajectory and to 0.5 on the others. The stimuli were originally drawn in black India ink with a Mars #0 technical pen (0.3 mm line width). The characters were all 0.5 in. (1.3 cm) high. Reproductions of the characters in the 42° trajectory are shown below in Figure 8.15. The stimuli were intermixed and placed in random order on an 8.5 in. x 11 in. (21.6 cm x 27.9 cm) rating sheet with space for rating as V and as Y. This allowed the subjects to view all the stimuli before assigning goodness values.

[Figure 8.15: reproductions of the seven characters, labeled .000, .083, .145, .259, .315, .429 and .485.]

FIGURE 8.15: STIMULI FROM THE 42° V-Y TRAJECTORY IN EXPERIMENT 9.

8.9.2.2 SUBJECTS

Ten members of the MIT community participated as subjects in this experiment. None had been involved in letter rating experiments previously, and each received a candy bar for participating.

8.9.2.3 PROCEDURE

Following a brief motivational statement, each subject was given the rating sheets and told to rate each character (using the integers zero through five) according to its goodness as a representation of a particular letter. A rating of zero indicated that the character was a very poor representation of the indicated letter, while a rating of five indicated that the character was an excellent representation of the indicated letter. Subjects were allowed to proceed in any order. The session took approximately 15 minutes to complete.

8.9.3 RESULTS

The results here are the same as those presented earlier in Figure 5.1c. They are replotted below in Figure 8.16 with a different vertical axis. The data has been scaled for compatibility with our other experiments (0-10 goodness) by multiplication by a factor of 2. Thus an actual goodness rating of five becomes ten. Only the 42° results from the experiment are plotted here. The other trajectories are available in the original description of the experiment by Shillman [182].

[Figure 8.16: goodness curves SVYV and SVYY plotted against LEG RATIO (K/L), 0 to .50.]

FIGURE 8.16: V-Y GOODNESS RESULTS FROM EXPERIMENT 9.

The goodness crossover for these two curves, designated SVYV and SVYY, occurs at k/L = 0.153. This crossing is a little lower than those for the previous full range curves such as in Experiment 1 (k/L = 0.185). Both the SVYV and SVYY curves are shifted somewhat from the corresponding VYV1 and VYY1 curves in such a manner as to push the interletter boundary lower. The archetype for this experiment appears lower than k/L = 0.5, as is also the case in Experiment 1. The topic of archetypes will be discussed in the next section.

The shapes of the curves here nevertheless are similar to those generated in Experiment 1, where 11 rating categories were used instead of only the 6 here.
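The rescaling and the reading of a goodness crossover can be sketched as follows. This is an illustration only: the ratings below are made-up numbers, not data from Experiment 9, and the crossover is located by linear interpolation between adjacent stimulus values, which is an assumption about how such a boundary might be read off rather than a procedure stated in the text. The function names are likewise hypothetical.

```python
def rescale(ratings, factor=2.0):
    """Map ratings on the 0-5 scale onto the 0-10 scale by multiplication."""
    return [factor * rating for rating in ratings]


def goodness_crossover(x, goodness_a, goodness_b):
    """Return the x at which the two curves first cross, or None.

    Assumption: the curves are joined by straight lines between
    adjacent stimulus values.
    """
    for i in range(len(x) - 1):
        d0 = goodness_b[i] - goodness_a[i]
        d1 = goodness_b[i + 1] - goodness_a[i + 1]
        if d0 == 0:
            return x[i]
        if d0 * d1 < 0:                  # sign change inside this interval
            t = d0 / (d0 - d1)           # fraction of the way across the interval
            return x[i] + t * (x[i + 1] - x[i])
    if goodness_b[-1] == goodness_a[-1]:
        return x[-1]
    return None


# Hypothetical ratings on the six-category (0-5) scale.
k_over_l = [0.0, 0.1, 0.2, 0.3]
as_v = [5.0, 4.0, 1.0, 0.0]
as_y = [0.0, 1.0, 4.0, 5.0]

boundary_raw = goodness_crossover(k_over_l, as_v, as_y)
boundary_scaled = goodness_crossover(k_over_l, rescale(as_v), rescale(as_y))
```

Because both curves are multiplied by the same factor, the crossover position is the same before and after rescaling; only the vertical axis changes.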

8.10 THE QUESTION OF LETTER ARCHETYPE

This section will deal with the topic of letter archetype referred to in earlier sections. As we said, the archetype that the subject or group of subjects tends to use may influence the interletter boundary. We have noted some effects of this type in the discussions of the earlier experiments.

Naus and Shillman [138], in studying the LEG attribute, surmised that perhaps the factor that remains constant across letter pair is not necessarily the k/L value, but rather the ratio of the k/L value of the ambiguous character at the interletter boundary to the k/L value of the letter archetype of the LEGged letter. This archetype would occur somewhere in the vicinity of k/L = 0.5, but may be higher or lower for some letter pairs or for some subjects. This question of archetype will become relevant in our later discussion of predictions made with RF theory, so we discuss it here.

From the results of Experiment 7, considered in this chapter, Table 8.3 below was generated. This table lists, for each letter pair, the goodness crossover, an estimate of the LEG character archetype, and the computed ambiguous to archetype ratio. Data is included from all the letter pair results of Experiment 7 since this particular experiment was geared toward the comparison of results across letter pair. All eight letter pairs involving LEG are listed and compared.

In the results from Experiment 7 there is some variation in the k/L interletter boundary for the different letter pairs, from a low of k/L = 0.165 for the O-A letter pair to a high of k/L = 0.220 for the C-I letter pair. We note in particular that the C-F letter pair has a somewhat higher boundary (k/L = 0.200) than that for V-Y (k/L = 0.175). This is consistent with the results (Table 8.2) from Experiments 1 and 4, where again the C-F boundary (k/L = 0.227) was higher than the V-Y boundary (k/L = 0.185).

TABLE 8.3  TABLE OF ARCHETYPES ACROSS LETTER PAIR

FILE    EXPERIMENT    RANGE    GOODNESS     POSITION         AMBIGUOUS TO
NAME    NO.                    CROSSOVER    MAX. GOODNESS    ARCHETYPE
                               (k/L)        (k/L)            RATIO

RVYY    7             full     0.175        0.40             0.44
RCFF    7             full     0.200        0.50             0.40
RDPP    7             full     0.175        0.40             0.44
ROPP    7             full     0.200        0.50             0.40
RVXX    7             full     0.200        0.50             0.41
RUHH    7             full     0.205        0.50             0.41
ROAO    7             full     0.165        0.40             0.41
RCII    7             full     0.220        0.50+            0.44-

The archetypes in the table are only rough estimations based on the maximum goodness value on the curve, since there are only stimulus values at certain points near the high end of the k/L scale. A "+" in Table 8.3 marks an archetype estimate for which the actual archetype appears to be out of range, that is, to have a k/L value greater than 0.50. This is hinted at by the fact that the maximum goodness is less than a value of 10. This effectively makes the calculated ratio less than the given value, as indicated by "-". The archetypes do appear to vary somewhat depending on the letter pair in question. We recall that this experiment was performed with intermixed stimuli from all the given letter pairs with the same group of subjects.

The variation in boundary and the variation in archetype appear to be somewhat in synchronization, as indicated by the relatively small spread in the ratio of ambiguous/archetype for Experiment 7 (0.40 to 0.44). These results agree somewhat with those reported by Naus and Shillman [138], who found an ambiguous to archetype ratio of approximately 0.40 for the V-Y, C-F and U-H letter pairs, using a direct choice paradigm. Thus the seemingly divergent results in terms of k/L are actually not so divergent. The data from Experiment 7 illustrates the different archetypes for different letter pairs within a single group of subjects.
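The ratio computation behind Table 8.3 is simple enough to restate as a short sketch. The crossover, archetype and ratio values below are copied from the table; the entries marked "0.50+" and "0.44-" for RCII are entered as plain 0.50 and 0.44, which is a simplification made here, and the function and variable names are invented for illustration.

```python
# (goodness crossover k/L, position of maximum goodness k/L, tabulated ratio)
TABLE_8_3 = {
    "RVYY": (0.175, 0.40, 0.44),
    "RCFF": (0.200, 0.50, 0.40),
    "RDPP": (0.175, 0.40, 0.44),
    "ROPP": (0.200, 0.50, 0.40),
    "RVXX": (0.200, 0.50, 0.41),
    "RUHH": (0.205, 0.50, 0.41),
    "ROAO": (0.165, 0.40, 0.41),
    "RCII": (0.220, 0.50, 0.44),   # "0.50+" and "0.44-" in the table
}


def ambiguous_to_archetype_ratio(crossover, archetype):
    """Ratio of the boundary k/L to the estimated archetype k/L."""
    return crossover / archetype


computed = {
    name: ambiguous_to_archetype_ratio(crossover, archetype)
    for name, (crossover, archetype, _) in TABLE_8_3.items()
}
spread = max(computed.values()) - min(computed.values())

for name, ratio in computed.items():
    print(f"{name}: {ratio:.3f}")
```

Recomputed this way, the ratios agree with the tabulated ones to within about 0.01, and their spread is much narrower in relative terms than the spread of the boundaries themselves.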

The seemingly higher k/L boundaries for C-F than for V-Y, in both Experiment 7 and Experiments 1 and 4 seen earlier, seem to be due to a somewhat higher archetype for the letter F than for the letter Y. Yet in these cases the ambiguous to archetype ratio does not vary a great deal. Thus the idea of a relatively constant ambiguous to archetype ratio seems to be supported in the data here. However, caution must be exercised in comparing such ratios across experiments with different distributions, since there may be range and frequency effects present to perturb the boundary.

The insights gained here will be applied in Chapter 10. There we will analyze predictions from experiment to experiment, and the concept of archetype will play an important role in interpreting those results.

8.11 SUMMARY

This chapter has provided the experimental work upon which the later work relevant to range frequency theory is based. We have explored the motivation for using this particular set of experiments. We have looked at goodness as an appropriate experimental paradigm for exploring contextual effects. The contextual effects we will use are based upon variations in stimulus range and stimulus distribution.

We overviewed the nine experiments and studied the stimulus distributions of each. The methodology of each experiment was described and the results presented in the form of goodness curves. We discussed these results and their relations to each other. Finally we looked at the topic of letter archetype, a subject which will be more relevant when we discuss the results of predicting from one experiment to another in Chapter 10.

Our next chapter will take up the procedure and calculations required to predict one experimental result from another under the range frequency approach.

CHAPTER 9

PROCEDURE FOR PREDICTION ACROSS EXPERIMENTS

9.1 INTRODUCTION

The previous chapters attempted to give the reader some insight into the design and motivation of the experimental program, and the experiments themselves were described. Following this we examined some of the individual experimental results. The current chapter will delve into the process of applying the theory developed so far, in order to predict the results of one experiment from another. This will be the main criterion for success of the RF based model.

The RF theory holds that, given a knowledge of one experimental situation, the results of other experimental situations are predictable. In brief, the prediction method is to take a chosen goodness curve (the predictor), calculate a frequency function with the knowledge of the predictor stimulus values, and infer a range curve, just as described earlier in Section 5.6. By a suitable transform, depending on the relative size of the ranges of the predictor experiment and that predicted, a range curve for this second experiment is derived. Now, using the second (predicted) experiment's frequency function and this derived range function, along with the weighting factor, we infer the second experiment's goodness curve. This predicted goodness curve is then compared with the empirical goodness curve actually obtained in the second experiment, and an index of fit is computed. By the theory then, our goal of predicting the form of goodness curves should be achievable.

Consider an example of the type of predictions to be done. Recalling the notation from the experiments of Chapter 8, we would use, for instance, the VYY1 goodness curve to predict the VYY3 curve. Both of these curves involve ratings as the letter Y of the V-Y letter pair. We will not be doing predictions across letter pairs, such as from V-Y to C-F (as from VYY1 to CFF1). We will however use the parameters of the RF model obtained from V-Y predictions to make predictions within the C-F letter pair.

The main content of this chapter will be in explaining the subtleties and details of the prediction process. First there will be an overview of the prediction method by means of a diagram. Following this, each of the variables, computations and their consequences in the prediction process will be thoroughly examined. We will actually make and view the predictions in the following chapter, for which this chapter lays the groundwork. Several varying methods of prediction will be tried in the next chapter. These include prediction in the traditional manner of Parducci and Perrett [157], optimized predictions, and predictions using only the range principle. We will cover the relevant material for all these types of predictions in this chapter.

9.2 PREDICTION PROCEDURE

This section will describe in an overview manner the method employed in predicting the results of one experiment from another. This will take the form of using a goodness curve from one experiment to predict the equivalent curve for another experiment.

In a goodness experiment, in a sense, we may consider the human subject as something of a black box. We put in a certain set of stimulus values and get out a certain set of goodness ratings. This is illustrated in Figure 9.1a below. It is the behavior of this black box which we are trying to model. The analogous behavior of the RF based model is illustrated in Figure 9.1b. Here the box is provided with the same information as the human subjects in the form of the set of stimulus values along the x axis, in our case, k/L. From this set of stimulus values, a frequency curve, F(x), can be generated. This curve, along with the range curve, R(x) (which is essentially determinate given the range of stimuli), and the weighting function, W(x), allows us to arrive at a set of goodness ratings in the form of a goodness curve, G(x). These will match those of the human subjects if the model is an accurate reflection of the judgment process.

[Figure 9.1: (a) the set of stimulus values [xi] enters the box HUMAN SUBJECTS IN GOODNESS EXPERIMENT, which outputs G(x); (b) [xi] yields F(x), which together with R(x) and W(x) enters the box RANGE-FREQUENCY ANALYSIS, G = (W)·R + (1-W)·F, which outputs G(x).]

FIGURE 9.1: AN ILLUSTRATION OF ANALOGOUS (a) HUMAN AND (b) RF MODEL BEHAVIOR IN GOODNESS EXPERIMENTS.

Figure 9.1b essentially implements the standard range frequency relationship,

(9.1)   $G(x) = W \cdot R(x) + (1 - W) \cdot F(x)$

Actually, given any three of the four variables involved in this equation, we could solve for the fourth. We will use this technique for making predictions. A schematic overview of the general prediction process is given in Figure 9.2 below. The type of analysis described there actually involves several functions of the physical variable, k/L. For generality, the following definitions and discussions will be in terms of a generalized physical variable, x, to which k/L corresponds for the attribute LEG. A subscript 1 will indicate a variable associated with the predictor experiment, while a subscript 2 indicates association with the experiment being predicted; this is only a notational convenience and there is no relation specifically to Experiments 1 and 2 of Chapter 8. The absence of a numerical subscript indicates generality in that the given function could refer to either predictor or predicted experiments. Before we go into an explanation of the system, let us define the variables with which we will be working.

x - The physical variable (for the cases discussed here this will be k/L for the attribute LEG).

[x]1 - The set of stimulus values for the predictor experiment (1).

G1(x) - Empirical goodness curve of the predictor experiment (1).

G1s(x) - Smoothed goodness curve of the predictor experiment (1).

F1(x) - Frequency function derived from the predictor experiment stimulus array, [x]1.

R1(x) - Inferred range function of the predictor experiment.

J1(x) - Scale function of the predictor experiment (actually the cumulative sensitivity function), derived from the predictor goodness results.

[x]2 - The set of stimulus values for the predicted experiment (2).

G2(x) - Empirical goodness curve of the experiment (2) being predicted.

G2'(x) - Predicted goodness curve for the second experiment.

E - The sum squared error between G2'(x) and G2(x).

F2(x) - Frequency function derived from the stimulus array of the experiment being predicted, [x]2.

R2'(x) - Range function of the second experiment derived as a function of the predictor experiment's range function, R1(x).

J2'(x) - Scale function of the second experiment inferred from the second experiment's stimulus array and the predictor experiment's scale function, J1(x).

W(x) - Weighting factor as a function of the physical variable, x.

In the prediction diagram, Figure 9.2, the variables at various stages are circled, while processes and calculations are in boxes. The actual procedure in prediction is very similar to the example case given in Section 6.6.

[Figure 9.2: block diagram with boxes EXPERIMENT 1, EXPERIMENT 2, SMOOTHER, FREQUENCY FUNCTION, MODEL, TRANSFORM and COMPARE; labeled variables include G1(x), G2(x), F(x), R1(x) and E.]

FIGURE 9.2: ILLUSTRATION OF THE PREDICTION PROCESS FROM ONE EXPERIMENT TO ANOTHER.

Let us assume that we have run two goodness experiments, both involving ratings as a particular letter. The results of one of these experiments (the predictor) will be used to predict the goodness curve for the other experiment.

The human transfer functions are shown as the two boxes labeled Experiment 1 (predictor) and Experiment 2 (predicted). Given a set of stimulus values in either case, the output is a set of goodness values. A byproduct of the goodness values in the case of Experiment 1 is a set of psychological scale values, J(x), in the form of a cumulative sensitivity function. These come into play in making the transform from one range to another.

The set of stimulus values, [x]1, is used to generate the frequency function, F1(x), while the set, [x]2, is used to generate F2(x) for the predicted experiment. A weighting factor function, W(x), is shown in the diagram. Whereas Parducci and colleagues have usually assumed the weighting factor to be a constant across the range, e.g., W = 0.55, we will explore later in this chapter the idea that this weighting factor can vary along the x axis and depends somewhat on the range of stimuli.

In the case of the first experiment, we now have three of the variables needed for the RF relationship. We use G1s(x), F1(x) and W(x) to infer the curve R1(x). Under RF theory, we can use this range curve to approximate any other experiment's range curve through a suitable transformation. If the range of the other experiment is exactly the same, then the transformation is unity. Otherwise, we rely on Equation 7.58, seen earlier, which relates the range curve in one experiment to that in another. This transformation takes the predictor range curve, R1(x), scale function, J1(x), and the set of stimulus values, [x]2, as input values, and yields a range curve, R2'(x), for the predicted experiment.

We take this predicted range curve and utilize it in conjunction with the same W(x) function used in the predictor experiment and the frequency function, F2(x), of the predicted experiment. Again using the general RF relation, Equation 9.1, we arrive at the predicted goodness curve, G2'(x). We can compare this predicted goodness curve with that actually obtained in the predicted experiment, G2(x). A sum squared difference, E, between empirical and predicted goodness curves is calculated and this is used as a goodness of fit measure.

The above describes the basic prediction mechanism. We will also perform some amount of optimization in two areas of the prediction process. One area of endeavor will be to find an optimal W(x) function which yields the best match, or lowest E, between predicted and empirical goodness curves. We will develop a model for W(x) incorporating five separate parameters. By trying different combinations of these parameters we will search for an optimum.

Another optimization will be done in the step of transforming the range curve from the predictor experiment to that predicted via the variable of the "range fraction", the ratio of the predictor stimulus range to the predicted stimulus range. Different range fractions will be tried until a best fit is obtained. This is necessary since the implied range fraction cannot always be obtained accurately from the relationships between the two stimulus sets.

The following sections will cover in detail the various variables and processes involved in Figure 9.2 above. Discussed in turn will be the derivation of goodness, frequency, and range curves, the W(x) function and the optimization procedure.
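The basic prediction mechanism of this section can be summarized in a short sketch for the simplest case. Several assumptions are made here that go beyond the text: the weighting factor is taken as a constant (W = 0.55, the example value quoted above), the two experiments are assumed to share the same stimulus range so that the range transformation is unity, all curves are assumed to be sampled at a common set of stimulus values, and every number in the arrays is hypothetical. The function names are invented for illustration.

```python
def infer_range(goodness, frequency, weight):
    """Solve G = W*R + (1 - W)*F for R at each stimulus value."""
    return [(g - (1.0 - weight) * f) / weight
            for g, f in zip(goodness, frequency)]


def predict_goodness(range_curve, frequency, weight):
    """Apply G = W*R + (1 - W)*F at each stimulus value."""
    return [weight * r + (1.0 - weight) * f
            for r, f in zip(range_curve, frequency)]


def sum_squared_error(predicted, empirical):
    """Sum squared difference E between two curves on the same grid."""
    return sum((p - e) ** 2 for p, e in zip(predicted, empirical))


W = 0.55                                   # constant weighting (example value)
g1_smoothed = [0.0, 1.5, 5.0, 8.5, 10.0]   # hypothetical predictor goodness
f1 = [0.0, 2.5, 5.0, 7.5, 10.0]            # hypothetical predictor frequency values
f2 = [0.0, 1.0, 3.0, 6.0, 10.0]            # hypothetical frequency values, experiment 2
g2_empirical = [0.0, 0.8, 4.0, 7.9, 10.0]  # hypothetical goodness, experiment 2

r1 = infer_range(g1_smoothed, f1, W)       # predictor range curve
r2 = r1                                    # same range assumed: transform is unity
g2_predicted = predict_goodness(r2, f2, W)
error = sum_squared_error(g2_predicted, g2_empirical)
print(g2_predicted, error)
```

When the ranges differ, the line `r2 = r1` would be replaced by the range transformation referred to above, which is not reproduced in this sketch.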

9.3 DERIVATION OF GOODNESS CURVES

This section will concern itself with the processing involved for the goodness curves obtained in the predictor and predicted experiments. This will include the calculation of the goodness curves and the smoothing procedure for the predictor curve.

The goodness curve, G1(x), as we saw in Chapter 5, is formed by a set of averaged goodness values at particular stimulus values. The goodness at a particular stimulus value, in the predictor curve for instance, is given by:

(9.2)   $G_1(x_i) = \frac{1}{N} \sum_{j=1}^{N} G_{1j}(x_i)$

where G1j(xi) represents the jth of N separate ratings given the stimuli at the ith stimulus value. Such curves were plotted for all the experimental data in Chapter 8.

As with any empirical data there is some uncertainty in the measurement process, and most of the goodness curves do not appear perfectly smooth. There are several justifications for the process of smoothing the predictor experiment's goodness curve. The first is the assumption that the underlying goodness curve should really be, to some degree, inherently smooth. This property we would expect of most psychophysical functions. Our second reason for smoothing the predictor goodness curve is related to the inference of the predictor range curve.

In this regard we also make the assumption that the underlying range curve, at least in the LEG case, should be monotonic and smooth. Using a raw unsmoothed goodness curve to infer the range curve is not consistent with this criterion, as we will explain. This is illustrated below in Figure 9.3, where the frequency curve and unsmoothed goodness curve are used to infer the range curve.

[Figure 9.3: curves G (empirical) and R (inferred); vertical axis GOODNESS (G), 0 to 10; horizontal axis PHYSICAL DIMENSION (x), 0 to .5.]

FIGURE 9.3: AN ILLUSTRATION OF THE NOISE AMPLIFICATION EFFECT WHEN USING THE UNSMOOTHED GOODNESS CURVE TO INFER THE RANGE CURVE.

The frequency curve is dependent only on the stimulus array and hence, for the cases studied here, is inherently smooth. On the other hand, the range curve is derived from both G(x) and F(x) by the relation:

(9.3)   $R(x) = \frac{G(x) - (1 - W)\,F(x)}{W}$

Since the range curve is inferred from the goodness and frequency curves, it is relatively sensitive to small variations in the goodness curve. Taking the derivative with respect to G, we obtain:

(9.4)   $\frac{dR(x)}{dG(x)} = \frac{1}{W}$

For example, if W = 0.5, then dR(x)/dG(x) = 2. For every change in the goodness curve, the change in the range curve would be twice as large. Thus we have somewhat of an amplification effect, the amplification factor being determined by 1/W. We opt not to use a range curve generated from a noisy goodness curve in predicting another goodness curve. In order to get around these difficulties, we adopted the process of smoothing the predictor goodness curve in order to make predictions. We will not however smooth the predicted experiment's empirical goodness curve when making comparison with the goodness curve we predict, in order to make an honest comparison. Smoothing at this stage might tend to bias one into believing the prediction results were better than they are.
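The amplification effect can be seen with a single stimulus value. The numbers in this sketch are hypothetical, including the size of the noise term, and the function name is invented for illustration; only the relation between R, G, F and W is taken from Equation 9.3.

```python
def infer_range_value(goodness, frequency, weight):
    """Equation 9.3 evaluated at a single stimulus value."""
    return (goodness - (1.0 - weight) * frequency) / weight


weight = 0.5
frequency = 5.0
goodness = 6.0
noise = 0.2          # hypothetical measurement error in the goodness value

clean = infer_range_value(goodness, frequency, weight)
noisy = infer_range_value(goodness + noise, frequency, weight)

# Change in the inferred range value per unit change in goodness.
amplification = (noisy - clean) / noise
print(clean, noisy, amplification)
```

With W = 0.5 the error in the inferred range value is about twice the error in the goodness value; smaller values of W would enlarge it further.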

Most of the remainder of this section will be devoted to the computational details of the smoothing process. This process description has been included for completeness.

The complete smoothing process is illustrated below in Figure 9.4. The raw data is generally spaced at nonequal intervals, while the smoothing algorithm which we will use requires equally spaced points. A linear interpolator is used to transform the data to equally spaced intervals (around 0.025 k/L units in this application). The function of the linear interpolator is the following. Given a set of known x and G values (in the case here, x represents the stimulus values in terms of k/L while G represents the averaged goodness values of stimuli at those stimulus values), we want to determine approximate G values for a new set of x values, in this case a set of equally spaced stimulus values. This same linear interpolation scheme will be used later in the prediction routines. The interpolation formula is the following:

For $x_i \le x \le x_{i+1}$:

(9.5)   $G(x) = G(x_i) + \frac{(x - x_i)}{(x_{i+1} - x_i)} \left[ G(x_{i+1}) - G(x_i) \right]$
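A minimal sketch of such a linear interpolator follows. It assumes that the known x values are strictly increasing, and it holds the end values constant for any requested x outside the known range; that clamping is a choice made here and is not specified in the text. The function name is hypothetical.

```python
def interpolate(x_known, g_known, x_new):
    """Piecewise-linear interpolation of g_known (given at x_known) onto x_new.

    Assumptions: x_known is strictly increasing; values requested outside
    the known range are clamped to the nearest end value.
    """
    result = []
    for x in x_new:
        if x <= x_known[0]:
            result.append(g_known[0])
            continue
        if x >= x_known[-1]:
            result.append(g_known[-1])
            continue
        i = 0
        while x > x_known[i + 1]:        # find the interval containing x
            i += 1
        fraction = (x - x_known[i]) / (x_known[i + 1] - x_known[i])
        result.append(g_known[i] + fraction * (g_known[i + 1] - g_known[i]))
    return result


# Hypothetical unequally spaced data resampled onto a few new points.
resampled = interpolate([0.0, 0.1, 0.3], [0.0, 2.0, 6.0], [0.05, 0.2, 0.3])
print(resampled)
```

The same function serves in both directions: from the original spacing to an equally spaced grid, and back again after smoothing.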

After achieving equally spaced stimulus values, the actual smoothing function is applied. This function, given below, performs an averaging of adjacent points, weighting the central point at 0.5 and the two points on either side at 0.25 each. The end points are not averaged.

For i = 1

(9.6a)   $G_{1s}(x_1) = G_1(x_1)$

For i = 2,...,N-1

(9.6b)   $G_{1s}(x_i) = \frac{G_1(x_{i-1}) + 2\,G_1(x_i) + G_1(x_{i+1})}{4}$

For i = N

(9.6c)   $G_{1s}(x_N) = G_1(x_N)$
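The three-point weighting of Equations 9.6a through 9.6c can be written compactly as below. This is a sketch of the smoothing step alone; it assumes its input is already equally spaced, and the function name and the sample values are invented for illustration.

```python
def smooth_three_point(values):
    """Weighted average with weights 0.25, 0.5, 0.25; end points unchanged."""
    if len(values) < 3:
        return list(values)              # nothing to average
    smoothed = [values[0]]               # first point kept as is
    for i in range(1, len(values) - 1):
        smoothed.append((values[i - 1] + 2 * values[i] + values[i + 1]) / 4.0)
    smoothed.append(values[-1])          # last point kept as is
    return smoothed


# Hypothetical noisy curve on an equally spaced grid.
print(smooth_three_point([0, 4, 0, 4, 8]))
```

A curve that is already a straight line passes through this step unchanged, which is one reason a filter of this kind removes local irregularities without distorting the overall trend very much.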

Following the smoothing procedure, linear interpolation is used to transform back to the original stimulus value spacing.

[Figure 9.4: flow diagram. Raw data at unequally spaced stimulus values; linear interpolator to equally spaced stimulus values; equally spaced data points; smoothing algorithm; smoothed equally spaced data points; linear interpolator back to original data point spacing; smoothed data points at original spacing.]

FIGURE 9.4: AN ILLUSTRATION OF THE SMOOTHING PROCESS USED ON THE PREDICTOR GOODNESS CURVES.

[Figure 9.5: VYY1 goodness curve and its smoothed version plotted against LEG RATIO (K/L), 0 to .50.]

FIGURE 9.5: AN EXAMPLE OF THE SMOOTHING PROCESS APPLIED TO THE VYY1 GOODNESS CURVE.

We now have a smoothed goodness curve with the original stimulus spacing. This processed goodness curve, G1s(x), is then used in further predictive processing. The effect of the whole smoothing operation on the data from VYY1 is illustrated above in Figure 9.5. The goal of taking out local noise variability and at the same time preserving the salient properties of the curve appears to have been achieved in the illustrated case.

This completes our look at the processes involved with the goodness curves. Next we will take a look at the procedures used for calculating the frequency function.

9.4 DERIVATION OF THE FREQUENCY FUNCTION

The frequency function, F(x), was described in detail earlier in Section 6.6. Given a stimulus distribution, in our case a set of k/L stimulus values, and the number of presentations of each stimulus, the F values can be obtained by the stimulus packing method described by example in Figure 6.4. For the case where there are an equal number of presentations at each stimulus value, as is the case with the experiments considered here, the alternate formula for the frequency value of the ith stimulus value,

(9.7)   $F(x_i) = (i)\,\frac{(C - 1)}{(N - 1)}$

may be used in cases where C, the number of rating categories, is approximately equal to N, the number of stimulus values. This merely says that the frequency values are proportional to the order along the x axis, indicated by i. Generally, an F(x) curve will vary with the number of categories.

The frequency functions for all the experiments involving V-Y are shown in Figures 9.6 (different ranges) and 9.7 (full range), while those for the C-F experiments are given in Figure 9.8. These are actually the frequency functions for the LEGged ratings. The F(x) curves for the unLEGged ratings are naturally just the inverted versions of these curves. As can be seen, the curves are relatively smooth, merely indicating that the stimulus values are in a reasonably regular order. It can also be seen that the curves reflect the frequency principle, that is, the values along the goodness axis within each curve are equally spaced. If subjects were solely using the frequency principle in their judgments, these are the goodness curves that we might expect. Inherent in these curves is the assumption that subjects have an innate tendency to use the various categories with equal frequency.

FIGURE 9.6: FREQUENCY FUNCTIONS FOR V-Y EXPERIMENTS INVOLVING DIFFERENT RANGES. [upper]

FIGURE 9.7: FREQUENCY FUNCTIONS FOR FULL RANGE V-Y EXPERIMENTS. [lower]

FIGURE 9.8: FREQUENCY FUNCTIONS FOR ALL C-F EXPERIMENTS.

Figures 9.6 and 9.8 illustrate how widely the frequency functions vary for different ranges of stimuli, although the basic shapes are fairly similar. Figure 9.7 shows the different full range V-Y frequency functions. The curves all vary from each other, as they should under RF theory. It is this diversity which would lead to different empirical goodness curves, since under RF theory the range curve should remain constant. If all these experiments had the same frequency function, then our predictions would have little purpose since, in that case, all goodness curves should be the same.

Of the curves in Figure 9.6, those for VYY1 and TVYY are most similar in shape. We are primarily interested in the difference of the other curves from that for VYY1, since it is the VYY1 goodness curve which would be used as the basis for predicting the other curves. We note that the frequency curves for RVYY and RCFF are identical since they were run concurrently with exactly the same stimulus spacing.

This completes the discussion on the derivation of the frequency function. Our next topic will be that of the weighting factor, W(x).

9.5 A MODEL FOR W, THE WEIGHTING FACTOR

In all the previous RF experimental work, the assumption has always been that W, the relative weight accorded the range and frequency principles, was a constant over the range of stimuli and usually that W was some fixed value. We recall that W is the weighting factor for the relative importance of the range or frequency principle in judgment, that is,

(9.8)    $G(x) = W\,R(x) + (1 - W)\,F(x)$

A value of W = 1 implies total range dependence, while W = 0 implies total frequency dependence.

In most of Parducci and colleagues' early work [147, 148, 157], a value of W somewhere near W = 0.55 was found to provide the best match to the empirical data. However, in some single subject studies [156], Parducci and Perrett noted different properties of W in different subjects. Of four subjects tested (judging sizes of squares), two subjects failed to show strong contextual effects. A best fitting W value for these subjects was 0.9 (quantized to 0.1 units of W) while the other two subjects showed a best fitting value of W = 0.6, this latter value being much closer to other of their results with multiple subjects. One surprising result of this was an initial best fitting W of 1.0 and 0.9 for the two context affected subjects, thus indicating a change in W with time.

In a more recent study by Parducci, Knobel and Thomas [153] on ratings of the sizes of squares and circles, it was found that the best fitting W's were in the region W = 0.75 to 0.80. These W values were thus somewhat higher than in previous studies. This group also noted that, in earlier RF work [157] with narrow ranges, the results improved when they assumed a higher W than they had previously used. This indicates that perhaps W depends somewhat on the range. Such an effect might be expected based on the knowledge that discriminability is better when the range is smaller (the well known 7±2 phenomenon [134]).

It appears then, that an optimal W value may vary from experiment to experiment, from subject to subject, and even within an individual subject. In our predictions we will concern ourselves with investigating some of these issues. For example, in individual predictions, we will search for individual optimal W's and compare them to each other.

Still another possible degree of freedom for W is that it could vary as a function of the physical variable, x. Instead of being some constant such as W = 0.55, we would have a W(x) function. Anderson [10] mentions this possibility and suggests that the range effect may have greater weight nearer the endpoints, since stimuli may be more discriminable in this region. This idea seems reasonable in terms of the discrimination enhancement at edges, the so called edge enhancement effect observed by Braida and Durlach [30, 18] and others. In the present case we probably have some sort of anchoring effect. The stimulus at k/L = 0 (V in the V-Y case) acts as an anchor. Other stimuli in this vicinity are more easily directly compared with the reference V, implying a greater reliance on the underlying psychological V scale than on possible frequency effects. Such effects also might be expected to occur at the high end, although the anchor or archetype Y is not so well defined in terms of k/L value as V.

In the case of LEG, the better discriminability at the low end is documented in several studies. Using an ABX paradigm, Yasuhara and Kuklinski [219] showed maximal discriminability (in terms of percent correct) at k/L = 0 for the V-X case, while Pastore [159] showed a similar effect for the V-Y and D-P cases. Likewise, discriminability measures from the present experiments, both with single subjects and across subjects, have shown a high discriminability at k/L = 0. Berliner et al. [18] postulate that one of two modes of stimulus evaluation, called context coding, is more likely around the edges of the range of stimuli. This mode involves judging a stimulus in the immediate context of other well defined stimuli such as those at the edges of the range. They describe the approach as measuring the distance to the given stimulus with a noisy ruler. One knows where one is by reference to other local stimuli.

All of these ideas are largely based on our knowledge of the underlying scale. Near edges or anchors, we know fairly well where we are with respect to that scale and, consequently, the anchors have well defined ratings. In midrange we are much more at sea, so to speak, with respect to the underlying scale and may be more prone to be swayed by frequency effects.

We would expect the W(x) function to be a relatively continuous smooth function. The W(x) function would have the property of being higher (tendency toward the range principle) at the edges than in the midrange, based on both discrimination and memory effects. For example, in the V-Y case, a tendency to the range principle at the low end in k/L for ratings as V may be due to both physical discriminability and strong memory effects for the absolute anchor V at k/L = 0. At the high end, the physical discriminability is actually quite low, but there is the memory anchor provided by the archetype Y.

For experimental purposes, a parametric model for W(x) was developed consistent with the above assumptions. The model is parametric to allow for optimization of the W(x) function in making a certain set of predictions.

The model and its parameters are illustrated below in Figure 9.9. The W(x) function basically consists of two variable amplitude raised cosine functions joined by a variable length flat section. This function is a convenient one for implementing the desirable properties mentioned above, namely higher values of W(x) at the edges and somewhat lower values in the central region.

FIGURE 9.9: THE PARAMETRIC MODEL FOR W(x), THE WEIGHTING FUNCTION. [Example W(x) plotted against LEG RATIO (k/L) with a = 0.50, b = 0.20, c = 0.25, d = 0.15, e = 0.15.]

The parameters of the model are defined as follows:

a - Amplitude of the low end W(x) difference (the vertical distance between the highest and lowest W(x) values).

b - Horizontal distance to minimum W(x); defines the width of the low end range principle dominance.

c - Amplitude of the high end W(x) above the minimum W(x).

d - Width of flat midrange constant W(x) zone; defines the width of minimum range principle dominance.

e - Amount the maximum W(x) falls below 1.0 which is the absolute maximum possible.

Analytically, W(x) is defined as follows in terms of the above five parameters and M, the maximum value on the horizontal axis (in the case of k/L, M = 0.5).

For $0 \le x \le b$

(9.9a)    $W(x) = \dfrac{a}{2}\left[\cos\!\left(\dfrac{\pi x}{b}\right) - 1\right] + 1 - e$

For $b \le x \le b + d$

(9.9b)    $W(x) = 1 - a - e$

For $b + d \le x \le M$

(9.9c)    $W(x) = -\dfrac{c}{2}\left[\cos\!\left(\pi\,\dfrac{x - (b + d)}{M - (b + d)}\right) - 1\right] + 1 - a - e$

The following set of constraints apply:

(9.10a)    $0.0 \le W(x) \le 1.0$

(9.10b)    $a + e \le 1.0$

(9.10c)    $b + d < 1.0$

(9.10d)    $c < a + e$

where a, b, c, d, and e are all greater than or equal to zero.
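The piecewise definition can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `weighting` is hypothetical, x is assumed to lie between 0 and M, and the handling of the degenerate cases b = 0 and b + d >= M (where a cosine segment has zero width) is a choice made here, not something the text specifies.

```python
import math


def weighting(x, a, b, c, d, e, M=0.5):
    """W(x): two raised-cosine segments joined by a flat midrange section."""
    floor = 1.0 - a - e  # minimum W(x), the level of the flat section
    if x < b:
        # Low-end segment: falls from 1 - e at x = 0 to the floor at x = b.
        return 0.5 * a * (math.cos(math.pi * x / b) - 1.0) + 1.0 - e
    if x <= b + d or b + d >= M:
        # Flat section (also used when no room is left for a high-end segment).
        return floor
    # High-end segment: rises from the floor at x = b + d to floor + c at x = M.
    phase = math.pi * (x - (b + d)) / (M - (b + d))
    return 0.5 * c * (1.0 - math.cos(phase)) + floor


# Parameter set shown with Figure 9.9.
example = dict(a=0.50, b=0.20, c=0.25, d=0.15, e=0.15)
for x in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(x, round(weighting(x, **example), 4))
```

Setting a = c = 0 reduces the function to the constant 1 - e, which is how a fixed weight such as W = 0.55 would be expressed in this parameterization.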

The above model allows great freedom of variation, from a completely flat function at a specific W value, such as 0.55, to a widely varying W(x) function as in Figure 9.9 above, depending on the choice of parameter set. This freedom will be very useful in the optimization stage where we will search for an optimum W(x) function to achieve the best possible match between predicted and empirical goodness curves. We will discuss this topic more in an upcoming section dealing with optimization.

The next section will deal with the topic of the range curve, now that we have covered all the variables necessary for inferring it.

9.6 RANGE CURVE PROCESSING

As noted previously, the range curve for the predictor is obtained by inference from the frequency and goodness curves. Solving Equation 9.8 for the range term gives the relation:

(9.11)    $R_1(x) = \dfrac{G_{1s}(x) - [1 - W(x)]\,F_1(x)}{W(x)}$

At each stimulus value, we have an empirically determined but smoothed goodness value, G1s(x), a stimulus determined frequency value, F1(x), a weighting value, W(x), and an inferred range value, R1(x). This method of inference for obtaining the range curve has been employed by Parducci and colleagues in their work on inferring other curves from some baseline curve [157].

Let us now consider a few restrictions on the form of the range curve. Such restrictions provide a means of ascertaining when we have an unreasonable set of parameters in the model.

We first of all recall the meaning of the range curve. It is the goodness curve that might be expected if subjects were using only the range principle in their judgments, that is, if subjects were judging stimuli only in their relation to the endpoints on the psychological scale. Under such a condition the values of the range curve are constrained to the set of values that could be obtained in a goodness experiment. For example, using 11 categories, 0-10, the lowest allowable range value would be 0, while the highest value would be 10. In the processing done in making predictions, we provide for a limiter on the range curve such that if the range value were outside the allowable values, it would be made equal to the end value, either 0 or 10, as follows:

(9.12a)    For $R(x) \le G_0$:  $R_1(x) = G_0$

(9.12b)    For $G_0 < R(x) \le G_m$:  $R_1(x) = R(x)$

(9.12c)    For $G_m < R(x)$:  $R_1(x) = G_m$

where G0 and Gm represent the minimum and maximum values respectively of the rating scale and R1(x) represents the range curve processed through the limiter.
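The inference step and the limiter can be sketched together. The function below is a hypothetical illustration, not the processing program used for the predictions; it assumes the range value follows from solving G = W R + (1 - W) F for R, that the three curves are sampled at the same stimulus values, and that W(x) is strictly positive.

```python
def infer_range(goodness, frequency, weight, g_min=0.0, g_max=10.0):
    """Infer range values from G = W*R + (1 - W)*F, then limit them to the scale."""
    limited = []
    for g, f, w in zip(goodness, frequency, weight):
        if w <= 0.0:
            raise ValueError("W(x) must be positive to infer a range value")
        r = (g - (1.0 - w) * f) / w                   # inferred range value
        limited.append(min(max(r, g_min), g_max))     # clip to [g_min, g_max]
    return limited


# Three made-up points: one inside the scale, one below it, one above it.
print(infer_range([5.0, 1.0, 9.0], [4.0, 4.0, 6.0], [0.5, 0.5, 0.5]))
```

A parameter set that forces many points onto the limits is a sign that the chosen W(x) is unreasonable for the data at hand.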

In the case of the k/L physical continuum, we presume that concurrent with it, there lies a somewhat parallel scale of V'ness, as in the case of the VYV1 file for example. As one goes higher in the LEG direction, we would reasonably expect that the character possesses less and less of the attribute V'ness and that this V'ness scale would be monotonically decreasing. Likewise there would be a Y'ness scale, measured from the VYY1 experiment for example, such that it would be a minimum at k/L = 0, and would increase with more LEGness to a maximum somewhere in the vicinity of k/L = 0.5. It would then decrease slowly as Y'ness deteriorates in the vicinity of k/L = 1. Inasmuch as the range curve is related to the underlying scale, we would expect that it would be bound by many of the same properties. Since the underlying scale is presumed relatively monotonic, the range curve should be relatively monotonic over the range of interest. A possible exception might be in the Y'ness scale at the high end where it might start a slow decline.

Related to this idea, we might also expect that there should be no sudden discontinuities in the range function, and thus it would be relatively smooth. We went to the step of smoothing the goodness curve in order to minimize the resultant variation in the inferred range curve. However, due to the nature of the scale here, where we have a natural dichotomy between letter classes, a local increase in slope at the interletter boundary should not be unexpected. We might, after all, expect a degradation from V to be faster when letter identity is itself changing.

Thus far we have discussed some general properties of the range curve. Next we will study the means for the transformation of this predictor range curve to a range curve for the predicted experiment. The value of the range curve as a prediction medium is that aspect in which we are most interested.

Now we will consider the step of transforming the range function from the predictor experiment, R(x), to another one, R2(x), suitable for predicting G2(x). A basic assumption of RF theory is that there is an underlying range function or residual scale which is invariant in all cases with the same range and differs by a linear transform in experiments with different ranges. We recall Equation 7.58 which relates a range curve from one stimulus range (Context 1) to that in another (Context 2):

(9.13)    $R_2(x) = \dfrac{y_{m1} - y_{o1}}{y_{m2} - y_{o2}}\,R(x) + \dfrac{\cdots}{y_{m2} - y_{o2}}$

where we recall that yo# and ym# represent the minimum and maximum psychological scale values in their respective ranges (#). This equation provides the basic mechanism by which the transform is done. The terms which we need to compute in order to carry out the transform are obtained from the psychological scales of the two experiments. The multiplicative factor above depends on the ratio of the widths of the two psychological ranges. We will define the ratio of the predicted experiment's width to that of the predictor experiment's width as the "range fraction", f, as in Equation 9.14 below.

(9.14)    $f = \dfrac{y_{m2} - y_{o2}}{y_{m1} - y_{o1}}$

Thus the multiplicative factor of Equation 9.13 becomes 1/f. In those cases where we will be predicting from one full range experiment to another full range experiment, this factor would be 1. In the other prediction cases we will be predicting from a full range experiment to a narrow range experiment.

The additive component depends only on the relative placement of the low ends of the two ranges. If they are coincident, then this term is zero. Otherwise this term gives the y axis (goodness) intercept of the predicted range curve.

Bear in mind that the transformation of Equation 9.13 is done in terms of the psychological scale and not the k/L physical scale. In order to do the transform we need some grasp of the underlying psychological scale. For the purpose of calculation, a close approximation to the underlying scale is provided by the cumulative sensitivity function. Parducci and Perrett [157] have used Thurstone scaling to set up the horizontal axis for their experiments. A description of scaling in general and this particular type of calculation is provided in Torgerson [206]. The cumulative sensitivity function as used by Braida and Durlach [30] involves essentially the same calculations as those for Thurstone scaling and the differences are very slight between these two methods. This function is merely formed by the sum of discriminability measures (d' values) between each successive stimulus. We have adopted the cumulative sensitivity function as the psychological scale in making predictions.

We use the scale function, y = J(x), from the predictor for two purposes, as illustrated below in Figure 9.10. One purpose is to approximate the value of the range fraction, f. The x values of both the predictor and predicted experiments are converted to y values using the predictor experiment's scale function. The range fraction and additive constant of Equation 9.13 can then be calculated. The second purpose is to use the y scale instead of the x physical scale in order to interpolate a baseline range curve at the stimulus values of the predicted experiment. We take the range curve, R1, as a function of the scale value, y, and the set of new stimulus values for the predicted experiment, [y]2, and interpolate basically the same range curve, R1(y), but at different stimulus values, those corresponding to the physical stimulus values of the experiment to be predicted. This range curve, R1(y), is then transformed using Equation 9.13 into R2(y) for use in inferring the predicted goodness curve.

FIGURE 9.10: DERIVATION OF THE RANGE FRACTION FROM THE PSYCHOLOGICAL SCALE FUNCTION.

There were some problems associated with using this type of scale function. Occasionally it was impossible to calculate the scale difference between two stimulus values due to the set of ratings received. The d' value between such stimulus values was large but indeterminate. In these cases, a value based on the shape of the cumulative sensitivity function was estimated. Such approximations lead to some inherent variability in the calculation of the range fraction. As we will see in the next chapter, the results using the calculated range fraction were not necessarily of high quality. It was found that the prediction error was extremely sensitive to variation in the range fraction and we set this up as one of the parameters to be optimized.
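The two uses of the scale function can be illustrated with a short sketch. Everything named here is hypothetical: the d' values are made-up numbers rather than data from the experiments, linear interpolation between tabulated scale points is an assumption, and the predicted experiment's stimulus values are assumed to lie within the predictor's range.

```python
from itertools import accumulate


def cumulative_scale(d_primes):
    """Scale value y at each stimulus: running sum of successive d' values."""
    return [0.0] + list(accumulate(d_primes))


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at x; xs increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the tabulated range")
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def range_fraction(xs1, ys1, xs2):
    """f = (ym2 - yo2) / (ym1 - yo1), all read off the predictor's scale."""
    yo2 = interpolate(xs2[0], xs1, ys1)
    ym2 = interpolate(xs2[-1], xs1, ys1)
    return (ym2 - yo2) / (ys1[-1] - ys1[0])


# Hypothetical full range predictor and narrower predicted experiment.
xs1 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
ys1 = cumulative_scale([2.0, 1.5, 1.0, 0.5, 0.5])
xs2 = [0.2, 0.3, 0.4, 0.5]
f = range_fraction(xs1, ys1, xs2)
ys2 = [interpolate(x, xs1, ys1) for x in xs2]
print(round(f, 3), ys2)
```

Because f is a ratio of two differences of summed d' values, an error in any single estimated d' shifts it directly, which fits the sensitivity to the range fraction noted above.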

Several other complications having to do with archetypes arise in defining the width of the psychological range. In light of Section 8.10, it is quite possible that there are different archetypes in different experiments. The effect of such different archetypes would be to stretch or shrink the effective psychological range, depending upon their position.

This type of effect can occur even when an archetype is not included within the range of the stimulus set. The situation of rating letters might be considered a classic case of this phenomenon. In this case, the letter archetypes are well defined anchors as seen earlier. Most of us have been exposed to letter type stimuli since we were very young. Thus it should not be surprising that, in the type of experiment described here, if one of these archetypes was not included in the actual set of stimuli, the archetype may still exert some influence on judgment. There may be a tendency to treat the stimulus set as if the archetype were there anyway, with a consequent effect on the psychological range of the stimuli.

A clearcut example of this in the present experiments might be in the following situation. The VYV3 file consists of rating data as to how well stimuli represent the letter V. The stimuli in this experiment range from k/L = 0.17 to 0.50. The standard V archetype at k/L = 0 is thus not included in the stimulus set. There is some question as to the psychological range for this experiment. It would seem that the V would indeed be a member of the psychological set of stimuli. The very nature of the rating task, that of rating how good a V the given stimulus is, dictates this. Subjects, although not physically viewing the stimulus V (k/L = 0) in this task, would psychologically extend their range to include it. Thus we would expect the results of such an experiment to be close to those obtained in a full range experiment.

The same would not necessarily be true for the VYY3 case, at least not to the same extent. The subject presumably is aware of the V-Y nature of the experiments and does have some knowledge of V in this experiment, but now the subject is rating characters as to how well they are Y-like. Thus we have the curious situation where two tasks, both using the same physical range of stimuli, nonetheless have different psychological ranges (at least in the functional sense).

Such anchoring effects play an important role in the present work and are well recognized in the literature. Parducci [150] provides a discussion and references to anchoring in light of range frequency theory. Parducci, Knobel and Thomas [153] have postulated such an effect in their work with size ratings of squares and circles. They surmised that the lighter field against which they projected stimuli could serve as a distant anchor in that it would represent the largest stimulus that could be presented under the given experimental conditions. Their data seemed to reflect such an effect. They attributed the position of the psychological endpoints to four possible factors: (1) the physical value of the two extreme stimuli presented for judgment, (2) background stimulus anchors, (3) the presence of irrelevant or unjudged stimuli, and (4) the skewness of the distribution of values presented for judgment. This last effect they attribute to the hypothesis that a skewed distribution suggests the possibility of even more extreme stimuli at the skewed end, and thus an extension of the psychological range in that direction.

As a result of their experiments, they proposed a modification of the simple RF model we have been using thus far, to allow for a linear transform of the range function within the same psychological range as follows:

(9.15)    $G(x) = W\,[s\,R(x) + t] + (1 - W)\,F(x)$

where s and t are the multiplicative and additive terms for the range function. This is certainly consistent with our limited range derivation, which also provided a linear transform of the range function. The above is motivation for allowing some optimization in the range fraction, f, even when predicting from one full range curve to another.

This finishes our discussion of the range curve, its derivation and transformation. Next we will cover a related topic, that of optimizing the fit between predicted and empirical results, which is done partly by varying the range fraction discussed in this section.

9.7 OPTIMIZATION PROCEDURE

This section will deal with the procedure of optimizing the predictions from one experiment to another. First we will discuss the philosophy behind the optimization and some justifications for this step. Following this we will look at the error measure, E, for the prediction fit and reasons for choosing the sum squared difference between predicted and empirical goodness curves as this error measure. Next the details of the optimization method will be explained. The actual results using the optimized parameters for prediction will be explained in the following chapter. The optimized results are only one of several sets of predictions which will be presented in the next chapter. Thus the procedures described here are applicable to that particular set of optimization results.

In doing predictions from one experiment to another, there are potentially a large number of free variables and parameters of the model which could be adjusted. Naturally, one would like the model to fit the data perfectly, but this is a rare if not unattainable situation. However the fit can usually be improved somewhat by adjusting the parameters of the model. With enough free variables, a perfect fit could be asymptotically approached. In general, of course, any model can be made to fit any data given enough free parameters, and thus such a model may be considered to have very little power. We would have an excellent fit but probably a vacuous model. On the other hand, obtaining a good fit with only a few simple, reasonable modifications of parameters would lend much more credence to the model.

We would like to avoid the difficulties mentioned above if possible. We take a lesson from the standard pattern recognition procedure of using a design set and a test set. Traditionally, in pattern recognition algorithm design, the sample data to be recognized is randomly split into two groups. Half the data (the design set) is used to estimate the boundaries between classes as accurately as possible. A high correct recognition rate is expected with this set. The second half of the data is then used as an independent test of the worth of the partitioning algorithm. Performance is generally poorer on this test set.

In our present situation, we have a two fold purpose. One purpose is to develop the model and discover just what are the optimal parameters. The other purpose is to test the model using the optimized parameters. We have two fairly comparable sets of data for two separate letter pairs, V-Y and C-F. Analogous to the pattern recognition approach above, we will do most of our parameter estimation work on the V-Y data and then use these optimal parameters to make the C-F predictions. This procedure provides a fairer test of the model.

The remainder of this section will outline the optimization procedure to be followed in order to obtain the best prediction from one experiment to another, and consequently the best fit of the existing model to the data.

The goal of optimization is to have the predicted goodness function, $\hat{G}_2(x)$, match the empirical results, $G_2(x)$, as closely as possible. As with any optimization procedure, some error metric is required to indicate how good the fit is currently, and to direct the future parameter adjustments. We chose as this metric the sum squared difference between the predicted and empirical results. The differences are computed between the predicted and empirical goodness curves at the stimulus values of the predicted experiment, as given below.

(9.16)    $E = \sum_{i=1}^{N} \left[\hat{G}_2(x_i) - G_2(x_i)\right]^2$

where xi represents the i'th stimulus value.

This error term was chosen over the correlation coefficient, another standard measure, for several reasons. High correlation can be misleading in its indication of the fit of the data. Regular deviations from linearity can still yield high r measures. The correlation coefficient, r, is not a very sensitive measure of the fit, while the sum squared difference does have this desirable property. The correlation coefficient measures the trend but without regard for the slope or for the y axis offset. These latter factors can be important indicators of systematic deviations from the fit. In the cases studied here, a slope of 1 and y offset equal to zero in the least squares line between predicted and empirical results would indicate the closest fit. In the literature, Birnbaum and others [20, 22, 174] have commented at great length on the demerits and merits of using correlation alone as a measure of model fit.
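A small numerical illustration may make the contrast concrete. The data below are invented for the purpose: the "predicted" curve is a straight-line distortion of the "empirical" one, with the wrong slope and a nonzero offset. The helper names are hypothetical.

```python
import math


def sum_squared_error(predicted, empirical):
    """E: sum of squared differences between predicted and empirical values."""
    return sum((p - g) ** 2 for p, g in zip(predicted, empirical))


def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


empirical = [1.0, 3.0, 5.0, 7.0, 9.0]
predicted = [0.5 * g + 3.0 for g in empirical]  # slope 0.5, offset 3: a poor fit

print("r =", correlation(predicted, empirical))
print("E =", sum_squared_error(predicted, empirical))
```

Here r is at its maximum even though every point but the middle one is mispredicted, while E is well above zero; reporting the slope and intercept alongside r exposes the same distortion.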

We will, in the case of the predictions done, provide a correlation coefficient, along with the slope and intercept of the least squares line between predicted and empirical results. This will not be used as the criterion in the optimization procedure, but is provided in order to give a truer indication of the model fit. In the actual optimization procedure, we will use the minimum squared error term, E. In addition the visual comparison of predicted and empirical goodness curves provides perhaps the best and most honest measure of all.

There are many approaches to the problem of optimization. The classical problem can be stated as follows. Given a function, u = v(w1, w2, w3, ..., wN), find those values of w1, w2, ..., wN that minimize u over all allowable values of the variables. If u were some well defined function, there may be some analytic solution to the optimization problem. There are of course sophisticated iterative methods which will converge to a minimum. Unfortunately we are not guaranteed that it is a global minimum. In order to obtain a global minimum, one could perform an exhaustive grid search, but the practicality of this decreases with a higher number of variables and a smaller iteration step size for each variable, the so called curse of dimensionality. For example, consider the case of six individual variables (the number we will be using), with a step size such that there are only ten possible values each variable could assume. Assuming one second per iteration, an exhaustive search would require 10^6 iterations, requiring over 11 days to compute all possibilities.

In view of the above difficulties with the various methods, a relatively simple hybrid approach was taken. The optimization procedure consisted of a coarse exhaustive search for an optimal W(x) function along with a finer hill climbing procedure for optimizing the range fraction.

The variables to be optimized for the W(x) function, a, b, c, d, and e, are those discussed in Section 9.5. The purpose in optimizing W(x) was to see if some of the ideas of Section 9.5 were borne out in the data. We wanted to ascertain whether the W(x) function did indeed peak at the edges of the range. We also wanted to investigate whether similar optimal W(x) functions would be found for different experimental prediction pairs. However, with five parameters involved, a detailed exhaustive search would be out of the question. A rough exhaustive search would still allow us to achieve our goals.

The set of iterative values for each variable in the optimization procedure for both the W(x) parameters and the range fraction are listed below in Table 9.1.

TABLE 9.1

PARAMETERS USED IN OPTIMIZATION

VARIABLE   MINIMUM   MAXIMUM   STEP SIZE   ITERATIONS

a          0.00      0.40      0.10        5
b          0.00      0.50      0.10        6
c          0.00      0.40      0.10        5
d          0.00      0.40      0.10        5
e          0.00      0.50      0.10        6
f          0.40      1.00      0.02        *

The step values of 0.10 for the W(x) function were chosen in order to limit the number of iterations to a reasonable number. This works out to a maximum of 4500 different combinations for the W(x) function (5 x 6 x 5 x 5 x 6). The exhaustive method was used for several reasons. One reason is that the space under consideration does not necessarily have a single peak, and thus more efficient optimization techniques are not guaranteed to converge to the desired minimum; they may find only a local optimum. Another reason was for the purpose of comparison across different prediction pairs. One of our goals was to sum the error of all the V-Y predictions to obtain the optimum W(x) function over all the V-Y predictions. This required that each prediction pair be optimized with exactly the same set of parameters. This requirement could be met only with the exhaustive search method. This optimal W(x) function for V-Y predictions could then be tested in performing the C-F predictions.
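The combination count quoted above can be checked directly from the limits and step sizes of Table 9.1. The following is a minimal sketch in Python (the original programs were written in Fortran IV); the names `grid`, `W_GRID` and `F_GRID` are illustrative assumptions and do not come from the original system.

```python
from itertools import product

def grid(minimum, maximum, step):
    """Return the evenly spaced values from minimum to maximum inclusive."""
    count = round((maximum - minimum) / step) + 1
    return [round(minimum + i * step, 10) for i in range(count)]

# Limits and step sizes taken from Table 9.1.
W_GRID = {
    "a": grid(0.00, 0.40, 0.10),
    "b": grid(0.00, 0.50, 0.10),
    "c": grid(0.00, 0.40, 0.10),
    "d": grid(0.00, 0.40, 0.10),
    "e": grid(0.00, 0.50, 0.10),
}
F_GRID = grid(0.40, 1.00, 0.02)

# Every W(x) parameter set visited by the coarse exhaustive search.
w_combinations = list(product(*W_GRID.values()))
print(len(w_combinations))  # 4500 sets of W(x) parameters
print(len(F_GRID))          # 31 candidate range fractions
```

The range fraction grid is listed only to show its size; in the procedure described here f is searched by hill climbing rather than exhaustively.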

Although by RF theory there are derivable values for the range fraction, the optimization procedure is justifiable on the grounds that the calculations to derive f are based on some relatively uncertain quantities obtained from the underlying scale, as discussed in the previous section. Finding an optimal f value may give us a clearer indication of the true underlying scale. In initial tests, the predictions were found to be most sensitive to the range fraction, f. For this reason we wanted to optimize this value to a fine degree. The step value in this case was chosen to be 0.02 and f was permitted to range from 0.40 to 1.00. Holding the W(x) function constant and searching over the range of f should likely yield only a single local optimum. Thus, for this part of the optimization, a simple hill climbing routine was used, which will be explained below.

Diagrams of the total optimization procedure are shown in Figure 9.11 below and the details for the optimization of the range fraction are shown in Figure 9.12.

FIGURE 9.11: THE EXHAUSTIVE OPTIMIZATION PROCEDURE FOR THE W(x) FUNCTION.

FIGURE 9.12: THE SIMPLE HILL CLIMBING PROCEDURE USED IN OPTIMIZING THE RANGE FRACTION.

The main optimization routine steps through each possible set of W(x) parameters, a, b, c, d and e. With this set of parameters, a prediction from one experiment to another is made (e.g., from VYV1 to VYV2). A seed range fraction (f = 0.40) is supplied in each case to get the process started. The prediction is compared with the empirical results, and the sum squared error, E, is computed.

Next, a local optimization with respect to the range fraction, f, is performed (Figure 9.12). The fraction, f, is incremented one step (0.02), the prediction is made, and the error is computed. If this error term decreases, we continue to step f until no further decrease in error is noted. When this occurs, f is stepped negatively until again no further improvement results. The f value then is that which results in the best prediction for the given set of W(x) parameters.
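The local search over f described above can be sketched as follows. This is only an illustration in Python, not the original Fortran routine; the function name `hill_climb_f` and the callable `error_fn` (which stands in for "make the prediction and compute E") are assumptions introduced here.

```python
def hill_climb_f(error_fn, seed=0.40, step=0.02, f_min=0.40, f_max=1.00):
    """Step f upward while the error falls, then downward while it falls."""
    n_steps = round((f_max - f_min) / step)

    def to_f(index):
        # Work on integer grid indices to avoid accumulating float drift.
        return round(f_min + index * step, 10)

    i = min(max(round((seed - f_min) / step), 0), n_steps)
    best = error_fn(to_f(i))
    for direction in (+1, -1):
        while 0 <= i + direction <= n_steps:
            trial = error_fn(to_f(i + direction))
            if trial >= best:      # no further decrease: stop this direction
                break
            i, best = i + direction, trial
    return to_f(i), best

# Toy single-minimum error curve standing in for a real prediction error.
f_opt, e_min = hill_climb_f(lambda f: (f - 0.66) ** 2)
print(f_opt)
```

Because the search accepts only strict decreases, it stops at the first grid point from which neither neighbor improves the error, which is the global optimum only if the error has a single minimum in f, as assumed in the text.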

At this point, the W(x) parameters, range fraction, and error term are written out to a computer file. Next the W(x) parameters are incremented and the process repeated with the optimal range fraction, f, from the last set of W(x) parameters becoming the seed fraction for the next iteration of the W(x) parameters. This process continues until all possible sets of W(x) parameters have been tried.

At the completion of the above, we have a record of the error, E, at each particular set of W(x) parameters, but with the optimal range fraction for that particular set. The optimal W(x) function for this particular prediction pair is found by looking through the accumulated list for the minimum value of E. For each of the prediction pairs, the same procedure is followed.
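The complete procedure for one prediction pair (coarse exhaustive search over the W(x) parameters, with a local search over f seeded from the previous optimum) might be sketched as below. This is a hedged illustration in Python rather than the original code; `climb`, `exhaustive_search` and `toy_error` are hypothetical names, and the toy error surface is invented purely so the example runs.

```python
from itertools import product

F_MIN, F_STEP, F_COUNT = 0.40, 0.02, 31  # range fraction grid of Table 9.1

def climb(error_at, index):
    """Hill climb over grid indices: step up while E falls, then down."""
    best = error_at(index)
    for direction in (+1, -1):
        while 0 <= index + direction < F_COUNT:
            trial = error_at(index + direction)
            if trial >= best:
                break
            index, best = index + direction, trial
    return index, best

def exhaustive_search(w_grids, error_fn):
    """Try every W(x) parameter set; locally optimize f for each one."""
    records = []
    seed = 0  # the first seed is f = 0.40
    for params in product(*w_grids):
        # The optimum index found here becomes the seed for the next set.
        seed, error = climb(lambda i: error_fn(params, F_MIN + i * F_STEP), seed)
        f_opt = round(F_MIN + seed * F_STEP, 2)
        records.append((params, f_opt, error))  # one line of the output file
    return records

# Toy error surface standing in for a real prediction; NOT thesis data.
def toy_error(params, f):
    target = (0.2, 0.4, 0.1, 0.0, 0.3)
    return sum((p - t) ** 2 for p, t in zip(params, target)) + (f - 0.8) ** 2

def tenths(top):
    return [i / 10 for i in range(top + 1)]

grids = [tenths(4), tenths(5), tenths(4), tenths(4), tenths(5)]  # a, b, c, d, e
records = exhaustive_search(grids, toy_error)
best_params, best_f, best_error = min(records, key=lambda r: r[2])
print(len(records), best_params, best_f)
```

Keeping every record, rather than only the running minimum, is what later allows the errors to be summed across prediction pairs at identical parameter sets.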

Using the information contained in the computer files generated for each prediction pair, it is therefore possible to find an overall best W(x) function by summing the errors across the various prediction pairs. To find the optimum W(x) function for the V-Y case, the error is summed across all prediction pairs involving V-Y at each iteration of the W(x) parameters. As in the individual case, we look for that W(x) which results in minimum summed overall error. Recalling the pattern recognition training and test set idea, we use this function now on the C-F prediction pairs. If the results are good, then the model is somewhat validated.
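The summation across prediction pairs can be illustrated with a short Python sketch. The record layout (parameter set, optimal f, error E) follows the description of the output files above, but the function name `overall_best`, the pair labels, and all numbers are made up for illustration.

```python
def overall_best(records_by_pair):
    """Sum E over prediction pairs at each W(x) parameter set; return the minimum."""
    totals = {}
    for records in records_by_pair.values():
        for params, _f_opt, error in records:
            totals[params] = totals.get(params, 0.0) + error
    best_params = min(totals, key=totals.get)
    return best_params, totals[best_params]

# Hypothetical two-parameter records for two prediction pairs (made-up numbers).
records_by_pair = {
    "VYV1->VYV2": [((0.0, 0.0), 0.56, 9.0), ((0.1, 0.0), 0.58, 4.0), ((0.2, 0.0), 0.60, 6.0)],
    "VYY1->VYY2": [((0.0, 0.0), 0.60, 3.0), ((0.1, 0.0), 0.62, 5.0), ((0.2, 0.0), 0.64, 1.0)],
}
best_params, total_error = overall_best(records_by_pair)
print(best_params, total_error)  # (0.2, 0.0) 7.0
```

In this toy case the overall optimum (0.2, 0.0) differs from the individual optimum of the first pair (0.1, 0.0), which is why every pair must be tabulated over exactly the same parameter sets.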

The actual optimized predictions will be presented in the following chapter. There we will compare the optimal W(x) functions from each of the prediction pairs. We will arrive at the optimal W(x) function for all the V-Y predictions in the manner described above. We will use this optimal V-Y W(x) function to make predictions for both the V-Y and C-F prediction pairs. An analysis and discussion of these predictions will be presented. This section has laid the groundwork for this upcoming description of the optimization results in Chapter 10.

9.8 THE ROLE OF THE COMPUTER IN ANALYZING THE DATA

Over the past several years a great variety of experiments have been carried out in our group's research in the field of character recognition, many of which were of the goodness or labeling variety. Part of this thesis effort was devoted to developing a system for encoding the results of all these experiments in a uniform computer readable form. The system would then take this data and perform calculations, plot graphs, and make predictions from one experiment to another. The effort to develop this system was a necessity as far as the aims of this thesis are concerned, since it would be much more difficult, if not impossible, to perform the types of data analysis and optimization desired without the aid of a computer and a computerized data base. This section will describe the computer analysis system and some of its useful features.

An overview of the whole system is shown below in Figure 9.13. Several computer programs were developed to implement the various functions required and these will now be described in order.

The initial goodness experiment is performed and the raw data is put into an ordered computer file via an interactive program, CRF (CReate File), the first of the analysis programs. This program also permits the addition of new data to an already existing file. Provision is made for creating files of goodness, labeling or reaction time data. The files created in this manner are then stored on magnetic tape (PDP-9 DECtape). These files form the experimental data base.

FIGURE 9.13: OVERVIEW OF THE COMPUTER ANALYSIS SYSTEM. [Flow: GOODNESS EXPERIMENTS -> RAW DATA SHEETS -> CRF (CREATE FILE) -> RAW DATA FILES -> GOOD (GOODNESS ANALYSIS) -> PROCESSED DATA FILES -> OPT (OPTIMIZATION ROUTINES), PRED (PREDICTION PACKAGE), PLOT (PLOTTING PACKAGE) -> PREDICTIONS, OUTPUT GRAPHS]

The raw data from the experiments is stored in an ASCII format for ease of human readability, ease of moving the data base to other computers, and ease of modification. Putting the data into computer files has the added advantage that it provides a backup copy of the original data sheets.

The file format is able to handle a variety of different types of experiments under the same general organization. Experiments with different sets of stimulus values, different groups of subjects and different numbers of rating categories can all be accommodated. Within each file the following information is recorded: a file name, the number of stimulus values, the number of subjects, the number of presentations, the number of rating categories, the stimulus values, and the goodness ratings accorded each of the stimulus values.

The file system is oriented toward ease of addition to the existing data. If, in the future, a duplicate experiment were performed, one would have the capability of adding this new data and analyzing the merged old and new data.
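The file contents listed above, and the ability to merge a duplicate experiment into an existing file, could be modeled roughly as follows. This Python sketch is hypothetical: the class `GoodnessFile`, its field names, and the rule that merging adds the subject counts are assumptions for illustration and are not taken from the original ASCII file format.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class GoodnessFile:
    """Hypothetical in-memory form of one raw data file."""
    name: str
    n_subjects: int
    n_presentations: int
    n_categories: int
    stimulus_values: list   # e.g. leg ratios k/L
    ratings: list           # ratings[i] holds all ratings given to stimulus i

    def merge(self, other):
        """Combine with a duplicate experiment run on the same stimuli."""
        if (self.stimulus_values != other.stimulus_values
                or self.n_categories != other.n_categories):
            raise ValueError("experiments are not comparable")
        return GoodnessFile(
            name=self.name,
            n_subjects=self.n_subjects + other.n_subjects,  # assumed rule
            n_presentations=self.n_presentations,
            n_categories=self.n_categories,
            stimulus_values=list(self.stimulus_values),
            ratings=[a + b for a, b in zip(self.ratings, other.ratings)],
        )

    def mean_goodness(self):
        """Mean rating at each stimulus value."""
        return [mean(r) for r in self.ratings]

# Made-up data: two stimuli rated on an eleven category (0-10) scale.
old = GoodnessFile("DEMO", 2, 1, 11, [0.10, 0.30], [[2, 4], [8, 9]])
new = GoodnessFile("DEMO", 1, 1, 11, [0.10, 0.30], [[3], [10]])
merged = old.merge(new)
print(merged.n_subjects, merged.mean_goodness())
```

Storing the individual ratings rather than only their means is what makes a later merge possible without returning to the original data sheets.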

Next the second program, GOOD (GOODness analysis), accesses the raw data files and, from this data, calculates mean goodness values, variances, confidence intervals, scale values, d' values between adjacent stimuli, and range frequency information. This information, particularly the scaling, is relatively time consuming to compute. Thus, in the situation where our main concern will be prediction, we do not want to recompute the same basic information each time. For this reason it was decided to set up a second file system to store this processed information for quick and easy access by the plotting and prediction routines.

Thus the output of GOOD contains the same basic experimental information as the output of CRF, together with the processed information mentioned above. These files are also stored on magnetic tape.

A third set of programs, PRED (PREDiction), was developed to take this processed data from individual experiments and, with it, to predict and compare the results from more than one experiment by means of the RF based model under consideration. This essentially involves all the steps described in this chapter. With this program one could choose any experiment and try to predict the results for any other comparable experiment, with a choice of whatever set of W(x) parameters or range fraction is desired.

Closely associated with the prediction section were the optimization routines, OPT (OPTimization), which enabled one to adjust the free parameters discussed in previous sections until an optimal prediction was obtained. The optimization routines were constructed to be very flexible. Any combination of variables could be adjusted. Also changeable were the allowable minimum and maximum limits of the variables. Another important parameter for optimization is the step size, which allows an optimum to be found to any desired accuracy.

For implementing the W(x) function in the optimization procedure, a subroutine was provided in the prediction part which, given the parameters a, b, c, d, and e, returned the W(x) function defined at intervals of 0.025 on the k/L range from 0 to 0.5. The value of the W(x) function at experimental stimulus values was obtained by the linear interpolation procedure discussed earlier.
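Linear interpolation of a W(x) table sampled every 0.025 on the k/L range from 0 to 0.5 can be sketched as below. This Python fragment is illustrative only: the function name `interpolate_w` is an assumption, and the table used in the example is an arbitrary straight-line stand-in, not a W(x) function produced from the parameters a through e.

```python
def interpolate_w(table, x, x_min=0.0, step=0.025):
    """Linearly interpolate a W(x) table sampled every `step` from x_min."""
    x_max = x_min + step * (len(table) - 1)
    if not x_min <= x <= x_max:
        raise ValueError("x lies outside the tabulated range")
    position = (x - x_min) / step
    i = min(int(position), len(table) - 2)   # index of the left-hand sample
    fraction = position - i
    return table[i] + fraction * (table[i + 1] - table[i])

# Stand-in table: 21 samples on k/L = 0 ... 0.5, falling linearly from 1.0 to 0.7.
table = [1.0 - 0.3 * k / 20 for k in range(21)]
print(round(interpolate_w(table, 0.26), 3))  # 0.844
```

Clamping the left-hand index to the second-to-last sample lets the upper end point, k/L = 0.5, be evaluated without reading past the end of the table.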

Another important routine is the PLOTting package, PLOT. This routine has the capability of plotting the large variety of functions in which one might be interested. Such variables as goodness, sensitivity function, d', and so forth, can be plotted. We can also plot goodness, range and frequency curves together from an individual file. The goodness curves seen earlier in Chapter 8 are the result of processing by these plotting routines. The prediction routine calls on the plot package to plot predicted and empirical goodness curves together for comparison. We will see these in the next chapter.

The computer analysis system was implemented in the language Fortran IV on the PDP-9 BIP computer of the Cognitive Information Processing Group at M.I.T.'s Research Laboratory of Electronics. The optimization routines were converted for use on this same group's PDP-11/40 computer running the UNIX operating system. As a background task, the individual prediction pair optimizations took about six hours to compute. This completes the overview of the use of the computer in analyzing the data.

9.9 SUMMARY

This chapter has dealt with the process of applying RF theory to predicting goodness curves from one experiment to another. We overviewed the procedure used in making predictions. Following this we looked in detail at each of the stages involved in the prediction process.

We examined the processing of the goodness curve and the frequency curve. Then we described the model for the RF weighting function, W(x). Following this was an extensive discussion on the range curve and its use as a vehicle for making the prediction from one range to another. Next the philosophy behind the optimization of the predictions was presented and also the optimization procedure. Finally we reviewed the computer implementation of the prediction system.

This chapter laid the groundwork for the actual predictions which will be made in the next chapter. There the procedures described here will be used in several sets of predictions. We will examine the naive range frequency approach and compare it with a model optimized as far as the W(x) function and range fraction are concerned. There we will also take the optimized parameters from V-Y and use them to predict results for C-F experiments. The material of this chapter, while quite detailed, is useful for understanding the significance of the results in Chapter 10.

CHAPTER 10

PREDICTIONS ACROSS EXPERIMENTS

10.1 INTRODUCTION

In this chapter we will discuss the results obtained in making predictions from one experiment to another. Here we will examine a variety of predictions, using the procedures described in the last chapter. The predictions will be made from a full range experimental goodness curve to low range, high range or other full range experiments. Predictions for both the V-Y and C-F letter pairs will be performed. We will also use several different variants of the prediction model. This chapter is organized around these variants.

For comparison among the different model variants, we have chosen a basic set of file prediction pairs from the experiments of Chapter 8. The prediction pairs used are listed below in Table 10.1. The notation for designation of the goodness curves involved is that adopted in Chapter 8, with each file name referring to a particular goodness curve.

TABLE 10.1

PREDICTION PAIRS

FROM (PREDICTOR)            TO (PREDICTED)
FILE    RANGE   EXP.        FILE    RANGE   EXP.

VYV1    full    1           VYV2    low     2
VYY1    full    1           VYY2    low     2
VYV1    full    1           VYV3    high    3
VYY1    full    1           VYY3    high    3
VYV1    full    1           RVYV    full    7
VYY1    full    1           RVYY    full    7
VYV1    full    1           TVYV    full    8
VYY1    full    1           TVYY    full    8
VYV1    full    1           SVYV    full    9
VYY1    full    1           SVYY    full    9
CFC1    full    4           CFC2    low     5
CFF1    full    4           CFF2    low     5
CFC1    full    4           CFC3    high    6
CFF1    full    4           CFF3    high    6
CFC1    full    4           RCFC    full    7
CFF1    full    4           RCFF    full    7

There are several characteristics of this set of prediction pairs. Predictions are made only from a full range experimental file. It would be impossible to predict an entire full range curve from a partial range curve. The full range predictor files chosen were those from Experiment 1 (VYV1 and VYY1) and Experiment 4 (CFC1 and CFF1), by virtue of their being the most archetypal of the full range goodness experiments performed.

In both V-Y and C-F cases, predictions are made onto low range, high range, and full range experiments. Predictions are only made within a letter pair, that is, from one V or Y file to another and from one C or F file to another. Predictions are also only made from a curve involving one letter of a letter pair to another curve involving the same letter. We do not make predictions from a LEG file to a non-LEG file, for example from V to Y as in VYV1 to VYY2.

The prediction set of Table 10.1 will be used to compare three different RF model variants. The first set of predictions, designated SET A, will be done using the simple RF model in the fashion of Parducci and Perrett [157]. These predictions will involve no parameter optimization and will be done with a constant range frequency weighting function, W(x) = 0.55.

The second set of predictions, designated SET B, will be optimized with regard to W(x) function and range fraction. The W(x) optimization will be over all the V-Y prediction pairs involved, and then used to generate predictions for the C-F cases. Again the procedures of the last chapter will be followed.

Our third set of predictions, designated SET C, will involve the simplest procedure, in that it will assume that the range factor is totally dominant, that is W(x) = 1.0. The purpose here is to see how much degradation from the optimum is experienced by making this simplifying assumption.

We will now examine each of the above sets of predictions in detail. We will note the points of the model peculiar to each set and discuss implications. The results of each prediction will be graphically compared with the actual empirical data. We will then discuss the individual fits.

10.2 SET A PREDICTIONS

10.2.1 INTRODUCTION

This section describes the first of several sets of predictions, SET A. This set of predictions will be obtained in a relatively naive manner, adopting as much as possible from the simple RF model described by Parducci and Perrett [157]. In their studies they did no prediction from full to partial range experiments, while this is an important part of our current effort. The basic procedure for prediction was explained in the previous chapter. With this set of predictions, no parameter optimization will be applied. The range fraction, f, in predicting from full to partial ranges, will be that value derived by taking the ratio of the scale functions corresponding to the respective physical ranges.

The weighting function in this case will be flat with a value of W(x) = 0.55. This W(x) is illustrated below in Figure 10.1. We recall that the value, W = 0.55, was the weighting constant in the early work on range frequency [147].

A summary of the results for SET A is presented in Table 10.2 below. This table first lists the predictor and predicted files, the range of the predicted file and the range fraction which was used in making the prediction. The next four columns of the table are devoted to the parameters of fit between the empirical and predicted results. The sum squared error, E, gives a general idea of the fit. Comparison of the sum squared error between different ranges should be done with caution since different numbers of points and different regions of k/L may be involved. The correlation coefficient, r, slope, a', and offset, b', provide another measure of fit between predicted and empirical curve. They are derived from the following least squares line, the regression of predicted on empirical results.

G_pred(x) = a' G_emp(x) + b'          (10.1)

In this case, a low error term (E), high correlation (r), slope (a') close to 1.0, and offset (b') close to 0.0, are desirable.
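The four fit measures reported for each prediction (E, r, a' and b') can be computed from a pair of curves as in the following Python sketch. It assumes ordinary least squares for the regression of predicted on empirical values; the function name `fit_statistics` and the example numbers are illustrative and are not taken from the original programs or data.

```python
from math import sqrt

def fit_statistics(empirical, predicted):
    """Return (E, r, slope a', offset b') for predicted regressed on empirical."""
    n = len(empirical)
    mean_e = sum(empirical) / n
    mean_p = sum(predicted) / n
    s_ee = sum((e - mean_e) ** 2 for e in empirical)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_ep = sum((e - mean_e) * (p - mean_p) for e, p in zip(empirical, predicted))
    slope = s_ep / s_ee                    # a'
    offset = mean_p - slope * mean_e       # b'
    r = s_ep / sqrt(s_ee * s_pp)           # correlation coefficient
    sse = sum((p - e) ** 2 for e, p in zip(empirical, predicted))  # E
    return sse, r, slope, offset

# Made-up curves: the prediction is exactly 2 * empirical + 1.
sse, r, slope, offset = fit_statistics([1, 2, 3, 4], [3, 5, 7, 9])
print(sse, r, slope, offset)  # 54 1.0 2.0 1.0
```

The example shows why r alone is not sufficient: the correlation is perfect even though the slope and offset are far from the desired 1.0 and 0.0, and the sum squared error is large.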

The other important measure of model worth is provided by the graphs of predictions and empirical data together. For the V-Y case, prediction graphs are provided in Figures 10.2a-j, while those for the C-F case are shown in Figures 10.3a-f. These graphs match and are in the same order as the results in Table 10.2. The corresponding figure numbers for each individual prediction are given in the last column of the table.

These graphs were computer generated using the prediction and plotting programs described in the last chapter. When interpreting the prediction graphs, bear in mind that the actual empirical data for a particular file is shown by open circles with ±1σ error bars (one standard deviation of the mean goodness value) and a solid interconnecting line between points on the curve. The RF model predictions are shown by solid squares connected with closely dotted lines. The notation G_VYY2 indicates empirical goodness data from the file VYY2, while on the same graph, G_VYY1 indicates the predicted goodness curve for the file VYY2 predicted from the file VYY1. The table and figure format for the upcoming sets of predictions will be similar to the format in this section, and for this reason we have explained the format in detail here.

FIGURE 10.1: THE FLAT WEIGHTING FUNCTION FOR SET A, W(x) = 0.55. [Plot of W against LEG RATIO (K/L) from 0 to 0.50.]

TABLE 10.2

SET A PREDICTIONS

W(x) = 0.55

PREDICTION               SCALE RANGE   SUM SQUARED   CORR.
FROM    TO      RANGE    FRAC. f       ERROR E       COEFF. r   SLOPE a'   OFFSET b'   FIG. NO.

VYV1    VYV2    low      0.56          11.20         .964       1.06        0.48       10.2a
VYY1    VYY2    low      0.59           4.11         .970       1.11       -0.50       10.2b
VYV1    VYV3    high     0.44          82.81         .980       2.11       -0.49       10.2c
VYY1    VYY3    high     0.41          31.44         .988       1.49       -4.33       10.2d
VYV1    RVYV    full     1.00           3.93         .992       0.96        0.59       10.2e
VYY1    RVYY    full     1.00           7.37         .992       0.89        0.06       10.2f
VYV1    TVYV    full     1.00           0.93         .997       0.98        0.20       10.2g
VYY1    TVYY    full     1.00           1.55         .995       0.97        0.25       10.2h
VYV1    SVYV    full     1.00           9.10         .985       0.73        0.91       10.2i
VYY1    SVYY    full     1.00          44.69         .987       0.72       -0.48       10.2j
CFC1    CFC2    low      0.61          13.61         .978       0.85        1.82       10.3a
CFF1    CFF2    low      0.44           4.69         .964       1.13       -0.51       10.3b
CFC1    CFC3    high     0.50          69.13         .988       1.49       -4.33       10.3c
CFF1    CFF3    high     0.66          15.60         .986       1.34       -0.49       10.3d
CFC1    RCFC    full     1.00          27.50         .980       0.89        1.87       10.3e
CFF1    RCFF    full     1.00           6.14         .987       0.90        0.06       10.3f

FIGURE 10.2: GRAPHS OF PREDICTED AND EMPIRICAL V-Y GOODNESS CURVES FOR SET A. [Panels a-j: goodness (0-10) against LEG RATIO (K/L), W = 0.55.]

FIGURE 10.3: GRAPHS OF PREDICTED AND EMPIRICAL C-F GOODNESS CURVES FOR SET A. [Panels a-f: goodness (0-10) against LEG RATIO (K/L), W = 0.55.]

FIGURE 10.4: IMPROVED RVYY PREDICTION ASSUMING RANGE FRACTION, f = 0.83. [Goodness against LEG RATIO (K/L), W = 0.55.]

FIGURE 10.5: RF COMPROMISE FOR VYY1 ASSUMING W(x) = 0.55. [Curves plotted against LEG RATIO (K/L).]

10.2.2 DISCUSSION OF SET A RESULTS

Now we will discuss the results presented in Table 10.2 and Figures 10.2 and 10.3, commenting on the interesting points of each of the predictions. The results for the low range predictions for both V-Y and C-F cases appear not to be too far off from the empirical results, particularly for the VYY2 (Figure 10.2b) and CFF2 (Figure 10.3b) cases. For both VYV2 (Figure 10.2a) and CFC2 (Figure 10.3a), there appears to be a slight but consistent elevation of the predicted above the empirical results. There does appear to be a consistency across letter pair. Both V-Y and C-F predictions deviate from the empirical results in approximately the same manner, in the corresponding LEG and non-LEG rating cases. This is also reflected in the closely matching error, E, in the corresponding V-Y and C-F files.

For the high range cases (Figures 10.2c-d and 10.3c-d), the predictions are not nearly as good. In these cases the predicted curves appear to traverse almost the entire range of possible goodness values. A limiter, described earlier, was used on the inferred range curve for these cases lest the predicted goodness curves, like the inferred range curve, extend outside the allowable range (0-10). This means that somewhere the model is in error, indicated by the above violation of principles.

The large deviation of the predicted goodness curve is due to its derivation from an inappropriate inferred range function for the predicted experiment. It appears that this situation is due to multiplying the predictor range curve by too high a factor in making the range transformation. Alternatively, it can be stated that the range fraction was too low. This may be due to the relatively naive and potentially inaccurate manner in which it was derived. The high range difficulty in the VYY3 (Figure 10.2d) and CFF3 (Figure 10.3d) cases is most likely due to range extension towards V, with these cases equivalently becoming full range experiments. If this were true, then the range fraction should be higher than those obtained here. Another possible source of difficulty might be the apparent peaking of the empirical high range curves at a goodness value less than 10 (at least in the VYY3 case). This will be dealt with more thoroughly in the comments on the next set of predictions.

The full range curves (Figures 10.2e-j and 10.3e-f) present a much better picture. Here, of course, we assume a range fraction of 1.0, since the predictor and predicted curves have the same physical range. This might be different for subjects using different archetypes and for different experiments.

In the RVYV and RVYY cases (Figures 10.2e-f), the predictions are within error bars except at the high (with respect to k/L) end. The RVYY case affords a good example of the improvement which can result by changing the assumed range fraction. It turns out that, in this case, a range fraction of 0.83 gives the optimal fit. The improved prediction, shown in Figure 10.4, is quite dramatic. This result implies a smaller psychological range for the RVYY case than for the VYY1 case. This certainly seems a possibility after comparing the two goodness curves, Figures 8.5 and 8.11. Referring to Table 8.3 and these figures, the archetype Y for RVYY (approximately k/L = 0.40) appears to be slightly lower than for VYY1 (approximately k/L = 0.43). We should also note that Experiment 7 (RVYV-RVYY) was done with a different paradigm than the VYV1-VYY1 experiment, in that many different letter pairs were intermixed. Thus some interaction between these letter pairs might be expected to have some effect.

We will not dwell too much on an explanation of the discrepancy here, but will defer that until the next set of predictions, where our main focus of attention rests. Turning to the full range SET A C-F case, we see that the RCFC prediction (Figure 10.3e) gets the trend right, but the prediction is displaced too high. The RCFF prediction (Figure 10.3f) is good at the low end and displaced at the high end, but again has the correct trend.

The matches for TVYV and TVYY (Figures 10.2g-h) are relatively good. This experiment (Experiment 8) was run under very similar conditions to the VYV1-VYY1 experiment (Experiment 1), although it involved many judgments per stimulus from three subjects versus only one judgment per stimulus from a larger number of subjects. The stimulus sets were different and the low error and high correlation obtained here are encouraging.

The predictions for SVYV and SVYY (Figures 10.2i-j) are relatively poor. Here the predicted extent of goodness ratings fell short of those actually obtained. In this case the paradigms were much different, with this experiment (Experiment 9) using a different number of rating categories and a different rating procedure. Note that the vertical scale has been normalized to 10. The actual goodness curve (in fact using only ratings up to 5) was scaled by a factor of 2 for presentation here and to facilitate the prediction process. Appropriate adjustments were made to the frequency function, since this was a six category case (0-5) instead of an eleven category (0-10) case.

In general, the predictions using W(x) = 0.55 are far from satisfactory. We reiterate that these predictions were generated in a naive manner under the RF approach, using no optimization. Some of our basic assumptions, such as the 0-10 limitation on goodness values, were violated. Other assumptions were also violated. For instance, Figure 10.5 illustrates the range frequency compromise for the VYY1 file with the W(x) = 0.55 assumption. We note in this case, at the low end, the nonmonotonic behavior of the inferred range curve. It is this range curve which was used in predicting the other V-Y LEGged letter rating experimental curves. Examination of some of Parducci and Perrett's [157] RF graphs reveals a similar problem.
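The nonmonotonic inferred range curve noted above can be illustrated with a small Python sketch. It assumes the standard form of the range frequency compromise, G = W R + (1 - W) F, with a constant W and with all three curves on the same 0-10 scale, and it models the limiter as simple clipping to 0-10. The exact formulation of the previous chapter may differ, and the function names and data values here are invented for illustration.

```python
def inferred_range(goodness, frequency, w):
    """Solve G = w*R + (1 - w)*F for R at each stimulus, then limit to 0-10."""
    raw = [(g - (1.0 - w) * f) / w for g, f in zip(goodness, frequency)]
    return [min(10.0, max(0.0, r)) for r in raw]   # limiter (assumed: clipping)

def is_monotonic(values):
    """True if the curve never decreases from one stimulus to the next."""
    return all(b >= a for a, b in zip(values, values[1:]))

# Made-up goodness and frequency values on a 0-10 scale (not thesis data).
goodness = [0.5, 1.0, 3.0, 6.0, 9.0]
frequency = [0.2, 2.0, 4.0, 6.0, 9.5]

range_055 = inferred_range(goodness, frequency, 0.55)
range_100 = inferred_range(goodness, frequency, 1.0)
print(is_monotonic(range_055), is_monotonic(range_100))  # False True
```

With a low weighting such as 0.55, a frequency curve that rises faster than the goodness curve at the low end pulls the inferred range value down, producing the dip; with W = 1.0 the inferred range curve is simply the goodness curve itself.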

Factors such as those above lead us to reject as a viable model this naive RF approach. The next section, dealing with the SET B predictions, will try to overcome some of the shortcomings cited here.

10.3 SET B - OPTIMIZED PREDICTIONS

10.3.1 INTRODUCTION

This set of predictions, which we consider our main focus of attention, was generated using the RF model allowing for a non-constant W(x) function and an optimized range fraction. This procedure allows us to get almost the best predictions available with the general formulation of the RF model we have been using. The optimization methods described in the last chapter will find their application here in SET B.

First we will briefly review the optimization strategy. Then we will present the results of optimizing the individual file predictions. We will then arrive at an optimal W(x) function for all files involving V-Y and then use this single W(x) function to make all predictions for both V-Y and C-F cases. The results of these predictions will be presented and analyzed in a manner similar to that used for SET A.

10.3.2 OPTIMIZATION

This subsection will deal with the topic of optimization. Here we will use the optimization procedure described in Section 9.7 to arrive at an optimal V-Y W(x) function and optimal range fractions for each prediction. We optimize for several reasons. We want to see the best predictions the model can achieve. Optimizing the range fraction within each prediction pair tells us something about the underlying scale reality. Optimizing the W(x) function tells us something about the underlying weighting function reality. The optimization procedure is somewhat justified by the training-testing set way in which we are doing it. We train on the V-Y data and test on the C-F data. We are not actually putting the optimization routine to its maximum possible use. This is to say that we are not getting the best possible prediction for every prediction pair. We will maintain the integrity of the model by using only a single W(x) function to arrive at the prediction results for SET B.

Let us now briefly review the total plan for our optimization. We will optimize the two factors in the model, the weighting function, W(x), which has five parameters, and the range fraction, f. An individual tabulation (in computer output file form) was done for each of the prediction pairs by trying all possible values for the W(x) function within certain limits. For each iteration through W(x), f was locally optimized by a hill climbing routine, using E as the error criterion. Thus for each prediction pair we accumulated a computer file containing a list of all the W(x) values, the optimal f for that set of W(x) parameters, and the sum squared error, E, associated with prediction using these parameters.
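The local optimization of f can be pictured with the following minimal sketch. It is an illustration only, not the routine of Section 9.7: the function name hill_climb_f, its starting point, its bounds, and the toy error surface are assumptions introduced here, and the 0.02 step merely mirrors the quantization of f reported for SET B. The callable passed as `error` stands in for whatever computes the sum squared error E of an RF prediction at a given f.

```python
def hill_climb_f(error, f_start=0.5, step=0.02, f_min=0.02, f_max=1.0):
    """Locally minimise error(f) by moving one grid step at a time.

    `error` is any callable returning the sum squared error E for a
    candidate range fraction f (a hypothetical stand-in for the RF
    prediction error; it is not defined in the text).
    """
    f = f_start
    best = error(f)
    while True:
        # Neighbouring grid points that stay inside the allowed interval.
        neighbours = [round(f + d, 10) for d in (-step, step)
                      if f_min <= round(f + d, 10) <= f_max]
        if not neighbours:
            return f, best
        candidate = min(neighbours, key=error)
        if error(candidate) >= best:
            return f, best  # no neighbour improves: local optimum reached
        f, best = candidate, error(candidate)


# Toy error surface with its minimum at f = 0.64 (illustrative values only).
def toy_error(f):
    return 1.36 + 40.0 * (f - 0.64) ** 2


f_opt, e_opt = hill_climb_f(toy_error)
```

A hill climb of this kind finds only a local minimum, which is adequate when the error is a smooth, single-dipped function of f, as the toy surface is assumed to be.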

Coincidentally we could obtain the optimum W(x) and f for each of the predictions by looking through the individual lists for the minimum error. However our goal was to find a global optimum for all the V-Y prediction pairs. This was done by summing across all the individual computer files at each set of W(x) parameters and looking for the minimum total error.

We then used this optimal W(x) function to predict all the files in our SET B predictions, including C-F predictions.
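The selection of the global optimum from the individual tabulations can be sketched as follows. This is a hedged illustration under stated assumptions: the name global_optimum, the dictionary layout, and all numbers are hypothetical, and each tabulation is assumed to list the same W(x) parameter sets.

```python
def global_optimum(tabulations):
    """Return (params, total_error) minimising the error summed over files.

    `tabulations` maps each prediction pair to a dict of the form
    {w_params: (best_f, E)}; every dict is assumed to share the same keys.
    """
    first = next(iter(tabulations.values()))
    totals = {params: sum(tab[params][1] for tab in tabulations.values())
              for params in first}
    best = min(totals, key=totals.get)
    return best, totals[best]


# Made-up tabulations for two prediction pairs and two candidate W(x) settings.
tabs = {
    "pair_1": {(0.3, 0.5, 0.0, 0.0, 0.0): (0.54, 3.0),
               (0.4, 0.4, 0.0, 0.0, 0.2): (0.60, 2.5)},
    "pair_2": {(0.3, 0.5, 0.0, 0.0, 0.0): (0.64, 1.0),
               (0.4, 0.4, 0.0, 0.0, 0.2): (0.70, 2.0)},
}
best_params, total = global_optimum(tabs)
```

In this toy case the second parameter set is the individual optimum for pair_1, yet the first set wins once the errors are summed. This is the sense in which the global optimum need not be the best choice for every prediction pair.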

With the above procedure, an optimal W(x) function was found. Its parameters were the following: a = 0.3, b = 0.5, c = 0.0, d = 0.0, and e = 0.0. This W(x) function is graphed in Figure 10.6 below. As can be seen, at the low end the weighting function starts at W(0) = 1.0, and gradually dips to a minimum of 0.7 at the high end of the range, k/L = 0.5. The main feature of this function is that it represents a relatively high value of W(x), indicating a somewhat stronger dependence on the range principle than that presumed in the early RF work. Nonetheless our values appear quite in line with the range of the values for W found in some of the later RF work [153].

This function indicates a strong range tendency at the low end. This agrees with our surmise that one knows where one is within the range relatively better in the vicinity of the strong V anchor. This is also due to a Weber's law type effect of discriminability enhancement at edges. There is no rise at the high end for this particular function as we may have postulated earlier. This may be due to the fact that the Y archetype as an anchor stimulus is not as effective as V, which is physically well defined.

Table 10.3 below lists the ten best W(x) functions for predicting the V-Y files. The main characteristic of this particular set of functions is their similarity in form to the optimal. All have approximately the same shape. The only slight exception is the W(x) function ranked 6, which exhibits a slight rise at the high end.

[Figure 10.6: W(x) plotted against LEG ratio (k/L) from 0.00 to 0.50; W = OPT, a = 0.30, b = 0.50, c = 0, d = 0, e = 0.]

FIGURE 10.6: THE OPTIMAL W(x) FUNCTION FOR THE V-Y PREDICTIONS.

TABLE 10.3

TEN BEST W(x) FUNCTIONS FOR V-Y PREDICTIONS

RANK   TOTAL ERROR   W(x) PARAMETERS
 #         E          a     b     c     d     e

 1       24.83       0.3   0.5   0.0   0.0   0.0
 2       25.10       0.2   0.5   0.0   0.0   0.0
 3       25.15       0.2   0.4   0.0   0.0   0.0
 4       25.22       0.3   0.4   0.0   0.0   0.0
 5       25.36       0.4   0.5   0.0   0.0   0.0
 6       25.65       0.3   0.4   0.1   0.0   0.0
 7       25.95       0.1   0.4   0.0   0.0   0.0
 8       26.04       0.1   0.3   0.0   0.0   0.0
 9       26.05       0.1   0.5   0.0   0.0   0.0
10       26.06       0.2   0.3   0.0   0.0   0.0

Table 10.4 below compares the error obtained with predictions done using the overall optimal W(x) function with individual prediction optimums. Also given are the optimized range fractions for the respective global and local optimizations. It can be seen that in most cases, the individual curve optimization error, Em, is not far below that achieved for the overall V-Y optimum. The overall V-Y optimum was usually highly ranked within the individual file optimizations. In fact most of the individual optimized W(x) functions are of the same general shape as the overall optimum, that is, they retain the basic characteristic of high value in the low range portion and lower value in the high range portion. It is also interesting that the corresponding high range files VYY3 and CFF3 had exactly the same optimal W(x) function (a = 0.4, b = 0.4, c = 0.0, d = 0.0, e = 0.2), as was the case for the corresponding RVYY and RCFF full range files (a = 0.4, b = 0.5, c = 0.0, d = 0.0, e = 0.3).

For comparison's sake we also evaluated an overall optimal W(x) function for the C-F files. This turned out to be the following: a = 0.4, b = 0.4, c = 0.0, d = 0.0, and e = 0.0. This function matches fairly closely the optimum obtained for the V-Y files. By all the above evidence, we can have some confidence that our optimal W(x) function, to be used in this set of predictions, is a reasonable approximation to the underlying weighting function reality.

TABLE 10.4

COMPARISON OF INDIVIDUAL FILE OPTIMUM RESULTS WITH OVERALL V-Y OPTIMUM RESULTS

                         OPT.     SUM               MIN.
                         RANGE    SQUARED   MIN.    ERROR
PREDICTION               FRAC.    ERROR     ERROR   FRAC.
FROM   TO     RANGE      f        E         Em      fm

VYV1   VYV2   low        0.54      3.62      3.60   0.54
VYY1   VYY2   low        0.64      1.36      1.15   0.64
VYV1   VYV3   high       1.00      1.00      0.80   1.00
VYY1   VYY3   high       0.72      3.41      2.63   0.96
VYV1   RVYV   full       0.98      3.23      3.23   1.00
VYY1   RVYY   full       0.90      4.67      1.94   0.74
VYV1   TVYV   full       0.94      1.66      0.92   1.00
VYY1   TVYY   full       1.00      3.38      1.21   0.96
VYV1   SVYV   full       0.88      0.51      0.39   0.88
VYY1   SVYY   full       0.76      0.84      0.74   0.80

CFC1   CFC2   low        0.48      1.85      1.74   1.00
CFF1   CFF2   low        0.58      1.59      1.56   1.00
CFC1   CFC3   high       0.98      1.24      0.91   1.00
CFF1   CFF3   high       0.82     10.49      7.26   1.00
CFC1   RCFC   full       1.00     22.59     22.50   1.00
CFF1   RCFF   full       0.92      6.92      3.49   0.74

The next subsection will deal with the predictions made using the optimal V-Y W(x) function. There we will comment on the individual fits and the optimized range fractions obtained.

10.3.3 PREDICTION RESULTS

This SET B of predictions, which we consider the main set, was generated with the RF model using the optimal W(x) function for the V-Y predictions, while the range fraction, f, was optimized for the best fit.

Table 10.5 below presents the results and the fits obtained, in a manner similar to that for SET A. An exception is that the range fraction used, f, is the optimum value, not the scale derived value as in SET A. This f value was obtained by the hill climbing technique described earlier in Section 9.7. These f values may give us more insight into the underlying scale reality. However, for comparison, the scale derived values used in SET A are presented in the ninth column of the table. Following this table, in Figures 10.7a-j and 10.8a-f, are the corresponding plots of predictions compared with empirical results. The corresponding figure lettering is the same in all of our prediction sets.

TABLE 10.5

SET B PREDICTIONS

W(x) (V-Y OPTIMUM)

                         OPT.     SUM       L.S.                       SCALE
                         RANGE    SQUARED   CORR.*   L.S.    L.S.      RANGE   FIG.
PREDICTION               FRAC.    ERROR     COEFF.   SLOPE   OFFSET    FRAC.   NO.
FROM   TO     RANGE      f        E         r        a       b         f'      #

VYV1   VYV2   low        0.54      3.62     .984     1.13    -0.44     0.56    10.7a
VYY1   VYY2   low        0.64      1.36     .986     1.02    -0.14     0.59    10.7b
VYV1   VYV3   high       1.00      1.00     .987     1.07    -0.03     0.44    10.7c
VYY1   VYY3   high       0.72      3.41     .975     1.06    -0.14     0.41    10.7d
VYV1   RVYV   full       0.98      3.23     .993     0.91     0.66     1.00    10.7e
VYY1   RVYY   full       0.90      4.67     .990     0.89     0.72     1.00    10.7f
VYV1   TVYV   full       0.94      1.66     .994     0.95     0.34     1.00    10.7g
VYY1   TVYY   full       1.00      3.38     .994     0.92     0.71     1.00    10.7h
VYV1   SVYV   full       0.88      0.51     .998     0.98    -0.04     1.00    10.7i
VYY1   SVYY   full       0.76      0.84     .996     0.97     0.23     1.00    10.7j

CFC1   CFC2   low        0.48      1.85     .988     1.03    -0.13     0.61    10.8a
CFF1   CFF2   low        0.58      1.59     .984     1.05    -0.21     0.44    10.8b
CFC1   CFC3   high       0.98      1.24     .991     0.93     0.31     0.50    10.8c
CFF1   CFF3   high       0.82     10.49     .978     1.17    -0.32     0.66    10.8d
CFC1   RCFC   full       1.00     22.59     .980     0.82     1.93     1.00    10.8e
CFF1   RCFF   full       0.92      6.92     .981     0.87     0.68     1.00    10.8f
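The fit columns of Table 10.5 (sum squared error E, correlation coefficient r, least squares slope a and offset b) can be illustrated with the sketch below. It rests on assumptions: the function name fit_statistics is hypothetical, the data are invented, and the direction of the regression (empirical values regressed on predicted values) is a guess, since the text of this section does not state it.

```python
import math


def fit_statistics(predicted, empirical):
    """Return (E, r, slope, offset) comparing a predicted and an empirical curve.

    E is the sum of squared differences; slope and offset come from an
    ordinary least squares line empirical = slope * predicted + offset
    (the regression direction is an assumption made for this sketch).
    """
    n = len(predicted)
    sse = sum((p - e) ** 2 for p, e in zip(predicted, empirical))
    mean_p = sum(predicted) / n
    mean_e = sum(empirical) / n
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_ee = sum((e - mean_e) ** 2 for e in empirical)
    s_pe = sum((p - mean_p) * (e - mean_e)
               for p, e in zip(predicted, empirical))
    r = s_pe / math.sqrt(s_pp * s_ee)
    slope = s_pe / s_pp
    offset = mean_e - slope * mean_p
    return sse, r, slope, offset


# Invented goodness values: the empirical curve sits 0.5 above the prediction.
pred = [1.0, 2.0, 3.0, 4.0]
emp = [1.5, 2.5, 3.5, 4.5]
E, r, a, b = fit_statistics(pred, emp)
```

The toy case shows why the columns are complementary: a pure vertical offset leaves r at 1 and the slope at 1, while E and the offset both register the discrepancy.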

[Figure 10.7a-j: predicted and empirical goodness curves for the V-Y prediction pairs of Table 10.5, plotted against LEG ratio (k/L) with W = OPT and the optimal f for each panel.]

FIGURE 10.7: GRAPHS OF PREDICTED AND EMPIRICAL V-Y GOODNESS CURVES FOR SET B.

[Figure 10.8a-f: predicted and empirical goodness curves for the C-F prediction pairs of Table 10.5, plotted against LEG ratio (k/L) with W = OPT and the optimal f for each panel.]

FIGURE 10.8: GRAPHS OF PREDICTED AND EMPIRICAL C-F GOODNESS CURVES FOR SET B.

The first point to be made with regard to this set of results is that the predicted functions are, with few exceptions, fairly accurate representations of the empirical data. We note that the sum squared error terms for the SET B files are substantially less than those for SET A. We will discuss the low, high and full range predictions separately in the following three subsections.

10.3.4 LOW RANGE PREDICTIONS

In these cases, both for VYV2, VYY2 (Figures 10.7a-b) and for CFC2, CFF2 (Figures 10.8a-b), the predictions seem to follow the results quite well, particularly for VYY2 and CFC2. With very few exceptions, the predicted values are within the empirical error bars. In this low range, the W(x) values are all above 0.90.

We notice several things about the optimal range fractions obtained in these various cases. First, concerning the low range predictions (VYY2, f=0.64; VYV2, f=0.54; CFF2, f=0.59; CFC2, f=0.48), we see that the optimum range fractions for ratings as the LEGged letter (Y or F) are higher by about the same amount (0.10, 0.11) in both the V-Y and C-F cases. This says that, even though the physical range fraction is the same whether characters are rated as the letter with LEG or as the letter without LEG, the psychological range fractions are not the same.

Again we will rely on the anchor argument to explain this phenomenon. For example, in the VYY2 experiment, the archetype Y is not included in the judged set of stimuli, causing a psychological extension of the range toward the missing archetype in the manner of Parducci's hypothesis discussed earlier in Section 9.6. Under normal circumstances, if we did not have a remote out of range archetype, we might even expect a range fraction corresponding more closely with the physical range. However, due to the characters being rated as to how well they represent the missing archetype, the effective range extends toward the archetype Y. In the case of VYV2 however, the archetype is included at the low end of the stimulus set. Thus this effect does not occur (except to the extent that the archetype Y may still be considered part of the set, representing a zero rating).

The above explanation may account for the difference between VYV2 and VYY2 and between CFC2 and CFF2 results. A question still remains about the cross letter pair difference between VYV2 (f=0.54) and CFC2 (f=0.48) and between VYY2 (f=0.64) and CFF2 (f=0.58). Again we note that the differences are of the same order of magnitude (0.06, 0.05). We saw before that the actual physical range for the C-F low range case was k/L = 0 to 0.19, actually slightly higher than the 0 to 0.17 range for the V-Y low range case. This might lead us to expect that the range fraction in the C-F case would be higher than in the V-Y case, not lower as was actually obtained. At this point we recall that from Table 8.3 earlier, we concluded that the archetype F was somewhat higher in k/L value than the archetype Y, as outlined in Section 8.10 (dealing with the constancy of ambiguous/archetype ratio across letter pair). The higher archetype for C-F provides a rationale for the behavior. The effective range for C-F is larger than for V-Y due to this higher F archetype. Approximately the same low range of stimuli for both the V-Y and C-F cases covers a lesser fraction of the distance to the archetype for C-F than it does for V-Y. This is then reflected in the slightly higher range fractions for the V-Y predictions.

Even though the predictions were done with a W(x) optimized for V-Y, the C-F results are quite satisfactory. There seem to be no outstanding violations of the RF model assumptions. This completes our discussion of the low range results.

10.3.5 HIGH RANGE PREDICTIONS

Turning to the case of the high range experiments we find that the predictions for both VYV3 (Figure 10.7c) and CFC3 (Figure 10.8c) are particularly good, without the high offsets observed in SET A. The fit for VYY3 (Figure 10.7d) is fairly good, with some vertical offset over most of the curve, while the CFF3 (Figure 10.8d) prediction is the poorest.

We find phenomena similar to those for the low range case probably occurring as far as the optimal range fractions are concerned. Let us first consider the predictions for VYV3 from VYV1 and that of CFC3 from CFC1. For both of these prediction cases the optimal range fraction is close to 1.0. This indicates that, for all practical purposes, ratings for these files were made on the same basis as those for a full range experiment. This is explainable in terms of an unrated anchor at k/L = 0, a V or C archetype. The stimulus at k/L = 0 was part of the psychological stimulus set, particularly by way of its implicit use in the rating scale definition. This has the effect of extending the psychological range to that of a full range experiment.

Next we consider the results from the high range VYY3 file (Figure 10.7d). For this case, in contrast to the CFF3 file, the model provides a fairly good predictor. However, we see that the optimal range fraction in this case is 0.72, considerably less than the 1.00 found with its sister file, VYV3. In this case we do expect that the range extension effect toward V would not be as strong, since in this task only ratings as Y are called for. There may still be some lower extension of the psychological range, since V could act as a distant anchor, with an almost defined rating as Y equal to 0. That range extension of some sort is taking place is indicated by the 0.72 range fraction for VYY3 and the 0.64 fraction for VYY2. Were there no range extension, we would expect these values to add up to 1.0, since the high and the low ranges meet at the same physical value. In this case, if we knew that the VYY3 file took up 72% of the full range, this would leave 28% of the full psychological range for a low range stimulus distribution.
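The arithmetic behind the range extension argument can be made explicit with a short worked example. It uses only the two optimal range fractions from Table 10.5; the variable names are ours.

```python
f_high = 0.72  # optimal range fraction for VYY3 (high range), Table 10.5
f_low = 0.64   # optimal range fraction for VYY2 (low range), Table 10.5

# With no range extension the two fractions would sum to 1.0.
total = f_high + f_low
excess = total - 1.0

# Share of the full psychological range left for a low range
# distribution if the high range file really occupied f_high of it.
remaining_for_low = 1.0 - f_high

print(f"sum = {total:.2f}, excess = {excess:.2f}, "
      f"remaining = {remaining_for_low:.2f}")
```

The two fractions overlap by roughly a third of the full range, which is the quantitative sense in which some range extension must be taking place.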

Another interesting result is the failure to register a goodness rating of near 10 in the upper range regions in either the VYY3 file (maximum goodness = 9.0) or VYY1 file (maximum goodness = 9.3). This similarity may account for the relatively better match of the VYY3 prediction from VYY1 than for the CFF3 prediction from CFF1. The archetype Y, for some reason, may not elicit a rating of 10 in this case as expected. This may be due to a conception of a truly global archetype as a character with a straight upright LEG as in "Y", rather than a slanted LEG as in our common lower case "y". Still another possibility is that the angle used here (42°) was not the subjects' ideal angle. Different subjects and groups of subjects may have different global archetypes. This may account for the poor prediction in the CFF3 case, where the CFF3 subjects' archetype may be somewhat higher than the CFF1 subjects' archetype.

Overall these high range predictions were much improved over those used in SET A. Most of the improvement is attributable to the choice of range fraction in this set, particularly recognizing the fact that range extension of some sort is taking place.

10.3.6 FULL RANGE PREDICTIONS

In the full range cases (Figures 10.7e-j and Figures 10.8e-f) the predicted results are not too divergent from the empirical results, except in the case of the RCFC-RCFF prediction pair. In general the optimal range fraction is high, as it should be, for the non-LEG files, while usually lower for those cases with ratings as LEGged letters.

The RVYV fit (Figure 10.7e) appears to be fine except for some divergence at the bottom of the curve in the higher range. The empirical curve bottoms out early (k/L = 0.30), giving justification to a lower Y archetype for RVYY than for VYY1. This may be borne out by the best fitting range fraction of 0.90 for RVYY (recall the argument for the lower f value in the SET A discussion). This says that even though RVYY and VYY1 have the same physical range, the functional range for RVYY is only 90% of that for VYY1. The RVYY fit (Figure 10.7f) is generally good except in the low range area, where some divergence occurs.

The fit for TVYV (Figure 10.7g) is fairly good, but with some consistent discrepancy over much of the range. The optimal f in this case was 0.94, close to the 1.0 we would expect.

With TVYY (Figure 10.7h) the prediction is not quite as good, but follows the trend well over the entire range. An optimal f of 1.00 was found for this file using the optimization procedure. However this procedure limited f to values no greater than 1.00. It turns out that an f higher than 1.00 would yield an even better match, which would imply that the psychological range may be slightly higher for TVYY than for VYY1. Again this may be an archetype phenomenon, since the archetype for TVYY does appear to be higher than for VYY1 (recalling the results of Chapter 8), as indicated by the TVYY curve peaking close to k/L = 0.5.

The predictions onto the six category Experiment 9 are quite good, for both the SVYV (Figure 10.7i) and SVYY (Figure 10.7j) files. Predictions are within empirical error bars and match the empirical curves very closely. This is particularly encouraging since the stimulus sets were so different, with the spacing much farther apart in Experiment 9. Recall that the paradigms used were quite different, as were the number of categories. The inferred range fractions in these cases were somewhat low however, indicating a lower archetype for this experiment. This could very well be the case, since there were actually several concurrent continua of V-Y (with some angles different from 42°) being run in the experiment.

Finally, the RCFC and RCFF (Figures 10.8e-f) predictions appear to be the worst of the full range cases. The RCFC prediction appears regularly higher than the empirical case. This seems to be due to the empirical CFC1 curve (Figure 8.7) not fully bottoming out at the high end of the range. This effectively ties any reasonable prediction endpoint to this higher offset goodness value. It seems likely that the RCFC psychological range is somewhat smaller than the CFC1 range. The goodness curves seem to indicate this, since the archetype F appears to be lower in Experiment 7. The RCFF case appears to show a steep goodness change in midrange. The empirical data are also nonmonotonic. The prediction mimics neither of these features. Perhaps this is a case where the W(x) function was quite different for the two experiments. This should not be wholly unexpected since the subjects and paradigms were somewhat different.

This concludes our look at the full range predictions using the optimal W(x) for V-Y. With only the full range C-F exception, they were fairly good. Next we will overview all the SET B predictions.

10.3.7 GENERAL COMMENTS

The results obtained in SET B were generally quite satisfactory. They did not seriously violate the RF assumptions, as did some of the SET A predictions. Looking at the implied range curve for VYY1 in Figure 10.9 below, using our current optimal W(x) function, we see that it is indeed monotonic and relatively smooth. When plotted on a CSF scale instead of the k/L scale it does approximate a straight line function except when it flattens at the high end of the range. This flattening is behavior which we might expect of a range curve. Past a certain point in k/L, psychologically, stimuli considered as Y are pretty much the same. The range function behavior seen above is typical for the other file range curves.

The major problem with this approach is that of determining the optimal range fraction in an actual predictive situation. The means we have tried here, that of inferring the range fraction from the scale values, appears to be an inadequate predictor of the optimal range fraction. Certainly, a knowledge of the underlying scale for some baseline distribution is a necessity. This is true in light of the different behavior we have seen for differences between both low and high range cases, and between ratings as LEGged and unLEGged letters.

[Figure 10.9: implied range curve for VYY1 under the optimal W(x) function; the plot is not recoverable from the extracted text.]

In this set of predictions we did learn something of the underlying weighting function. We also learned something about the actual range fractions in the various predictions. Using our optimized W(x) function we were able to make a fairly good set of predictions, not only for the V-Y files from which W(x) was derived but also for the C-F files which served as a test set. We were successful in using the test set idea in that the C-F results using the V-Y optimized W(x) function were much improved over the naive approach of SET A. This is true in spite of the fact that several of the predictions for C-F did not provide a particularly close match. The results obtained were excellent however, in comparison with the optimal match which could have been obtained under the circumstances. In light of this, the optimization was justified and the results can be considered successful.

The success with both limited and full range curve predictions demonstrates the versatility of the model. In light of the results obtained in this set of predictions, the RF model seems to be somewhat justified, although there are a number of precautions which must be observed in using it.

10.4 SET C PREDICTIONS

We present here the results of making predictions with the W(x) function set equal to 1.0 along the whole range. In this situation, frequency effects are presumed to be minimal and the range effect is totally dominant. This is actually a very simple case to analyze since the range function is equal to the goodness curve. For different ranges, the predicted goodness curve is just a linear function of the predictor goodness curve. Again, here the main problem is deciding on the coefficients of the linear transform or equivalently the range fraction. The W(x) function for SET C is illustrated below in Figure 10.10.
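The reason the range function coincides with the goodness curve when W(x) = 1.0 can be seen from the weighted compromise itself. The sketch below assumes the usual range-frequency form, judgment = W * range value + (1 - W) * frequency value; the function name rf_judgment and the sample curves are hypothetical, and the exact formulation used in Chapter 9 may differ in detail.

```python
def rf_judgment(w, range_value, frequency_value):
    """Weighted compromise between a range value and a frequency value.

    Assumes the standard convex-combination form of the range-frequency
    model; w = 1.0 gives pure range behavior, w = 0.0 pure frequency.
    """
    return w * range_value + (1.0 - w) * frequency_value


# Invented range and frequency values at four stimulus positions.
range_curve = [0.0, 0.4, 0.9, 1.0]
frequency_curve = [0.0, 0.25, 0.5, 1.0]

# SET C: W(x) = 1.0 everywhere, so the judgment reduces to the range curve.
set_c = [rf_judgment(1.0, r, f)
         for r, f in zip(range_curve, frequency_curve)]

# For contrast, an equal weighting pulls the judgment toward the frequency curve.
mixed = [rf_judgment(0.5, r, f)
         for r, f in zip(range_curve, frequency_curve)]
```

With the frequency term weighted by zero, the stimulus distribution drops out entirely, which is why only the range fraction remains to be chosen in SET C.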

Much of Parducci and colleagues' results seem to indicate a model where range is at least a larger effect than frequency, particularly for naive subjects. These results were obtained from situations where subjects were not necessarily accustomed to making the type of judgments called for in the experiments. The range in their experiments was not usually something naturally defined except within the experimental context.

In our case however, subjects are quite familiar with the type of stimuli used, and probably have a natural range of possibilities for rating these stimuli. Given such familiarity with letters, it would not be surprising that subjects could place intermediate stimuli along the range with a fair degree of accuracy. Thus they may be less prone to be swayed by frequency or spacing effects. The implication then is that, in this task, a W(x) close to 1.0 would not be unusual. In fact, in our optimizations from SET B, the optimal W(x) function exhibited a strong range tendency, particularly at the low end of the range.

The same group of predictions as for SET A and SET B were performed. Table 10.6 below presents these results in a form similar to that for SET B. Following this, in Figures 10.11a-j (V-Y) and 10.12a-f (C-F), are the predicted results plotted along with the empirical results for all the cases under consideration.

Looking at Table 10.6 we see that, as in the results for SET B, the error is generally low. The optimal range fractions are usually very close to those for SET B. Most of the discrepancy here was due to the fact that in the optimization done in SET B, the range fraction was quantized to 0.02 steps while here optimization was done in 0.01 steps. In general the fits for the SET C curves are also quite good, the exceptions again being the CFF3, RCFC and RCFF files.

There is really little apparent difference between the B and C sets of results. The comments and analyses for the SET B predictions hold for this set also. Consequently there will be no detailed comment on the individual fits given here.

[Figure 10.10: W(x) plotted against LEG ratio (k/L) from 0.00 to 0.50; W = 1.00, a = 0, b = 0.50, c = 0, d = 0.]

FIGURE 10.10: WEIGHTING FUNCTION FOR SET C, W(x) = 1.0.

TABLE 10.6

SET C PREDICTIONS

W(x) = 1.0

                         OPT.     SUM       L.S.                       SCALE
                         RANGE    SQUARED   CORR.    L.S.    L.S.      RANGE   FIG.
PREDICTION               FRAC.    ERROR     COEFF.   SLOPE   OFFSET    FRAC.   NO.
FROM   TO     RANGE      f        E         r        a       b         f'      #

VYV1   VYV2   low        0.55      3.52     .985     1.12    -0.34     0.56    10.11a
VYY1   VYY2   low        0.64      1.15     .988     1.01    -0.11     0.59    10.11b
VYV1   VYV3   high       0.94      1.18     .979     1.03    -0.17     0.44    10.11c
VYY1   VYY3   high       0.67      4.04     .970     1.08    -0.30     0.41    10.11d
VYV1   RVYV   full       0.99      3.48     .993     0.89     0.70     1.00    10.11e
VYY1   RVYY   full       0.91      5.18     .988     0.89     0.74     1.00    10.11f
VYV1   TVYV   full       0.96      1.77     .994     0.94     0.35     1.00    10.11g
VYY1   TVYY   full       1.03      3.36     .994     0.89     0.72     1.00    10.11h
VYV1   SVYV   full       0.91      1.26     .995     0.98     0.25     1.00    10.11i
VYY1   SVYY   full       0.90      0.98     .996     0.98    -0.03     1.00    10.11j

CFC1   CFC2   low        0.49      1.96     .988     1.04    -0.15     0.61    10.12a
CFF1   CFF2   low        0.59      1.54     .984     1.04    -0.20     0.44    10.12b
CFC1   CFC3   high       0.91      1.70     .986     0.97     0.12     0.50    10.12c
CFF1   CFF3   high       0.77     13.56     .972     1.21    -0.57     0.66    10.12e
CFC1   RCFC   full       1.00     22.95     .979     0.81     1.87     1.00    10.12f
CFF1   RCFF   full       0.93      7.68     .979     0.86     0.73     1.00    10.12g

[Figure 10.11a-j: predicted and empirical goodness curves for the V-Y prediction pairs of Table 10.6, plotted against LEG ratio (k/L) with W = 1.00 and the optimal f for each panel.]

FIGURE 10.11: GRAPHS OF PREDICTED AND EMPIRICAL V-Y GOODNESS CURVES FOR SET C.

[Figure 10.12: predicted and empirical goodness curves for the C-F prediction pairs of Table 10.6, plotted against LEG ratio (k/L) with W = 1.00 and the optimal f for each panel.]

FIGURE 10.12: GRAPHS OF PREDICTED AND EMPIRICAL C-F GOODNESS CURVES FOR SET C.

We note that most of the analysis before was couched in terms of effective range and not really concerned so much with frequency effects. It does indeed appear that range is the most important variable.

In our next section we will compare the results from these three sets of predictions and draw some general conclusions.

10.5 COMPARISON OF RESULTS

This section will compare the results obtained from using the methods of SETs A, B and C in making predictions of experiments under various conditions.

From Tables 10.2, 10.5 and 10.6, it appears that SETs B and C provide predictions much closer to the empirical data than SET A. The error terms from each of the sets are compared below in Table 10.7. As to the relative merits of the sets, we computed the average sum squared error for each and compared them. This showed an average error of 4.25 for SET B and 4.71 for SET C. The corresponding error for SET A was 20.86. Considering only the V-Y files, the results for SETs B, C and A respectively were 2.33, 2.59 and 19.71. We see that in most cases the corresponding errors for SET B are less than those for SET A and SET C. There was also a similar ranking for average correlation coefficient: SET B (.987), SET C (.985) and SET A (.981). From the above evidence we see that SET B seems to have a slight edge over SET C as to predicting the empirical curve.
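The averages quoted above can be recomputed from the sum squared errors as printed in Table 10.7. The sketch below is only a check of that arithmetic; the variable names are ours, and the values are copied from the table without correction.

```python
# Sum squared errors as printed in Table 10.7, in table row order.
vy = {
    "A": [11.20, 4.11, 82.81, 31.44, 3.93, 7.37, 0.93, 1.55, 9.10, 44.69],
    "B": [3.62, 1.36, 1.00, 3.41, 3.23, 4.67, 1.26, 3.38, 0.51, 0.84],
    "C": [3.52, 1.15, 1.18, 4.04, 3.48, 5.18, 1.77, 3.36, 1.26, 0.98],
}
cf = {
    "A": [13.61, 4.69, 69.13, 15.60, 27.50, 6.14],
    "B": [1.85, 1.59, 1.24, 10.49, 22.59, 6.92],
    "C": [1.96, 1.54, 1.70, 13.56, 22.95, 7.68],
}


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


averages = {
    s: {"V-Y": mean(vy[s]), "C-F": mean(cf[s]), "overall": mean(vy[s] + cf[s])}
    for s in ("A", "B", "C")
}

for s in ("A", "B", "C"):
    row = averages[s]
    print(f"SET {s}: V-Y {row['V-Y']:.3f}  C-F {row['C-F']:.3f}  "
          f"overall {row['overall']:.3f}")
```

The recomputed means agree with the AVERAGE ERROR rows of Table 10.7 to within rounding of the second decimal place.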

Let us now turn to the evaluation of the goodness crossover. After all, our real practical goal is to learn where the interletter boundary is, not necessarily the exact form of the goodness curves. It is possible that even a large divergence in the form of the predicted curve may still result in only a minor deviation in the curve crossover.

Table 10.8 below provides a comparison of the actual empirical crossovers (and those of the smoothed empirical curves) with the crossovers derived by plotting the appropriate pairs of our predictions (e.g., VYV1 and VYY1) for SETs A, B and C. All crossovers are in terms of k/L, our LEG physical variable.
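As a rough illustration of how a goodness crossover can be located numerically, the sketch below finds the point where two sampled goodness curves intersect. The function name `crossover` and the sample ratings are hypothetical (they are not data from these experiments), and linear interpolation between sample points is an assumption, not necessarily the procedure used to obtain the crossovers in Table 10.8.

```python
def crossover(x, g_a, g_b):
    """Return the first x at which curves g_a and g_b cross.

    x, g_a and g_b are equal-length sequences and x must be increasing.
    The crossing is located by linear interpolation inside the first
    interval where the difference g_a - g_b changes sign.
    Returns None if the curves never cross on the sampled range.
    """
    for i in range(len(x) - 1):
        d0 = g_a[i] - g_b[i]          # signed gap at the left sample
        d1 = g_a[i + 1] - g_b[i + 1]  # signed gap at the right sample
        if d0 == 0:
            return x[i]
        if d0 * d1 < 0:               # sign change: crossing lies in this interval
            t = d0 / (d0 - d1)        # fraction of the interval where the gap is zero
            return x[i] + t * (x[i + 1] - x[i])
    if g_a[-1] == g_b[-1]:
        return x[-1]
    return None


# Hypothetical goodness ratings sampled along the k/L continuum.
k_over_l = [0.10, 0.15, 0.20, 0.25, 0.30]
goodness_as_v = [9.0, 7.0, 4.0, 2.0, 1.0]  # stimuli rated as a V
goodness_as_y = [1.0, 3.0, 6.0, 8.0, 9.0]  # stimuli rated as a Y

boundary = crossover(k_over_l, goodness_as_v, goodness_as_y)
print(f"estimated interletter boundary at k/L = {boundary:.3f}")
```

Because the estimate depends only on where the two curves meet, a shift applied equally to both curves leaves it unchanged, which is one way to see why the crossover can be more stable than either curve taken alone.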

Generally, the predicted crossovers are not too far from the empirical crossovers. The worst case difference is only 0.054 (for SVYV-SVYY, SET A). Here again we computed the average error for each of the three sets. Again SET B turned out to be superior (E = 0.011), with SET C slightly worse (E = 0.013) and SET A about twice as bad (E = 0.027) as SETs B and C.

Again this indicates a slight superiority of the SET B predictions and associated W(x) function. It is encouraging that the more easily implemented SET C results are not far behind.

Our next section will draw some conclusions from the results of this chapter.

TABLE 10.7

COMPARISON OF ERROR ACROSS SETS

PREDICTION                 SUM SQUARED ERROR

FROM    TO      RANGE      SET A    SET B    SET C

VYV1    VYV2    low        11.20     3.62     3.52
VYY1    VYY2    low         4.11     1.36     1.15
VYV1    VYV3    high       82.81     1.00     1.18
VYY1    VYY3    high       31.44     3.41     4.04
VYV1    RVYV    full        3.93     3.23     3.48
VYY1    RVYY    full        7.37     4.67     5.18
VYV1    TVYV    full        0.93     1.26     1.77
VYY1    TVYY    full        1.55     3.38     3.36
VYV1    SVYV    full        9.10     0.51     1.26
VYY1    SVYY    full       44.69     0.84     0.98

CFC1    CFC2    low        13.61     1.85     1.96
CFF1    CFF2    low         4.69     1.59     1.54
CFC1    CFC3    high       69.13     1.24     1.70
CFF1    CFF3    high       15.60    10.49    13.56
CFC1    RCFC    full       27.50    22.59    22.95
CFF1    RCFF    full        6.14     6.92     7.68

AVERAGE ERROR

                           SET A    SET B    SET C

V-Y ONLY:                  19.71     2.33     2.59
C-F ONLY:                  22.78     7.45     8.23
OVERALL:                   20.86     4.25     4.71
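The average errors in Table 10.7 can be checked directly from the individual sum squared errors. The short sketch below does this arithmetic; the variable and function names are illustrative only, and the numbers are transcribed from the table.

```python
# Sum squared errors from Table 10.7, one (SET A, SET B, SET C) tuple per prediction.
V_Y = [
    (11.20, 3.62, 3.52), (4.11, 1.36, 1.15), (82.81, 1.00, 1.18),
    (31.44, 3.41, 4.04), (3.93, 3.23, 3.48), (7.37, 4.67, 5.18),
    (0.93, 1.26, 1.77), (1.55, 3.38, 3.36), (9.10, 0.51, 1.26),
    (44.69, 0.84, 0.98),
]
C_F = [
    (13.61, 1.85, 1.96), (4.69, 1.59, 1.54), (69.13, 1.24, 1.70),
    (15.60, 10.49, 13.56), (27.50, 22.59, 22.95), (6.14, 6.92, 7.68),
]


def column_means(rows):
    """Average each column (SET A, SET B, SET C) over the given rows."""
    n = len(rows)
    return tuple(sum(row[j] for row in rows) / n for j in range(3))


averages = {
    "V-Y ONLY": column_means(V_Y),
    "C-F ONLY": column_means(C_F),
    "OVERALL": column_means(V_Y + C_F),
}
for label, (a, b, c) in averages.items():
    print(f"{label:9s} SET A {a:7.3f}  SET B {b:7.3f}  SET C {c:7.3f}")
```

The computed means agree with the tabulated averages to within rounding.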

TABLE 10.8

COMPARISON OF EMPIRICAL AND PREDICTED GOODNESS CROSSOVERS ACROSS SETS

CURVES INVOLVED                GOODNESS CROSSOVERS (k/L)

                               EMPIRICAL    PREDICTED

nonLEG    LEG      RANGE                    SET A    SET B    SET C

VYV1      VYY1     full        0.185
VYV2      VYY2     low         0.106        0.130    0.117    0.117
VYV3      VYY3     high        0.227        0.276    0.217    0.214
RVYV      RVYY     full        0.175        0.193    0.182    0.180
TVYV      TVYY     full        0.187        0.189    0.189    0.189
SVYV      SVYY     full        0.153        0.207    0.157    0.172

CFC1      CFF1     full        0.227
CFC2      CFF2     low         0.145        0.149    0.138    0.137
CFC3      CFF3     high        0.288        0.325    0.267    0.265
RCFC      RCFF     full        0.200        0.230    0.225    0.222

AVERAGE ERROR: Σ |Empirical Cross. - Predicted Cross.| / N

                                            SET A    SET B    SET C

V-Y ONLY (N = 5)                            0.029    0.007    0.010
C-F ONLY (N = 3)                            0.023    0.018    0.018
OVERALL (N = 8)                             0.027    0.011    0.013
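The crossover error averages in Table 10.8 can be recomputed from the individual rows. The sketch below assumes that the error measure is the mean absolute difference between empirical and predicted crossover, which is consistent with the tabulated averages; the names used are illustrative, and the values are transcribed from the table. It also reports the largest single difference.

```python
# Crossovers (k/L) from Table 10.8 as (empirical, SET A, SET B, SET C).
V_Y = {
    "VYV2-VYY2": (0.106, 0.130, 0.117, 0.117),
    "VYV3-VYY3": (0.227, 0.276, 0.217, 0.214),
    "RVYV-RVYY": (0.175, 0.193, 0.182, 0.180),
    "TVYV-TVYY": (0.187, 0.189, 0.189, 0.189),
    "SVYV-SVYY": (0.153, 0.207, 0.157, 0.172),
}
C_F = {
    "CFC2-CFF2": (0.145, 0.149, 0.138, 0.137),
    "CFC3-CFF3": (0.288, 0.325, 0.267, 0.265),
    "RCFC-RCFF": (0.200, 0.230, 0.225, 0.222),
}
SETS = ("SET A", "SET B", "SET C")


def mean_abs_error(rows):
    """Mean |empirical - predicted| for each set over the given crossover rows."""
    n = len(rows)
    return tuple(
        sum(abs(row[0] - row[j + 1]) for row in rows.values()) / n
        for j in range(len(SETS))
    )


def worst_case(rows):
    """Return (difference, curve pair, set name) for the largest single difference."""
    return max(
        (abs(row[0] - row[j + 1]), pair, SETS[j])
        for pair, row in rows.items()
        for j in range(len(SETS))
    )


all_rows = {**V_Y, **C_F}
errors = {
    "V-Y ONLY": mean_abs_error(V_Y),
    "C-F ONLY": mean_abs_error(C_F),
    "OVERALL": mean_abs_error(all_rows),
}
worst = worst_case(all_rows)

for label, values in errors.items():
    print(label, " ".join(f"{v:.4f}" for v in values))
print(f"largest difference: {worst[0]:.3f} ({worst[1]}, {worst[2]})")
```

The recomputed means agree with the tabulated averages to within about 0.001.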

10.6 CONCLUSIONS

From the experiments and analyses which we have just examined, there are several conclusions which seem to be indicated.

Given that we know the goodness behavior for one experiment, the results of other experiments can be predicted. This holds for experiments with different distributions and with different ranges.

A knowledge of the true psychological scale is necessary, including knowledge of all stimuli, overt and covert, in the stimulus set. In the letter case particularly, unseen and unrated letters can still exert influence (acting as anchors). These must be taken into account for accurate predictions.

In the case of rating characters, effective psychological range is the most significant variable in determining a prediction. This was indicated by the extreme sensitivity of the prediction to the variation of the range fraction. Frequency does play a definite role, but range is by far the dominant factor.

The optimal W(x) function in the LEG case appears to be close to 1.0 at the low end of the range and then to taper off to a lower value (around 0.70) at the high end of the range. This was indicated by the optimal W(x) found for all the V-Y predictions. The individual prediction optima and the C-F optimum also support this idea. This would agree with much evidence in the literature on the greater discriminability near edges.

For all practical purposes, the W(x) function can be assumed to be 1.0 without too much consequence in the prediction error. We saw above that the SET C results were on the same order as those obtained in a more optimal manner. This would mean that the frequency effects can in essence be ignored. We have total range principle dominance. In any practical application this would prove a great simplification in the analysis.
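The effect of setting W(x) = 1.0 can be illustrated with a small sketch of a range-frequency judgment. The form used here, J = W·R + (1 - W)·F, with R the position of a stimulus within the stimulus range and F its relative rank within the stimulus set, is the common textbook statement of the range-frequency compromise and is an assumption; it does not reproduce this thesis's own formulation, psychological scale or range fraction. The function names and stimulus values are hypothetical, and W is treated as a single number evaluated at the stimulus in question.

```python
def range_value(s, stimuli):
    """Position of s within the stimulus range (0 at the minimum, 1 at the maximum)."""
    lo, hi = min(stimuli), max(stimuli)
    return (s - lo) / (hi - lo)


def frequency_value(s, stimuli):
    """Relative rank of s in the stimulus set (0 for the lowest, 1 for the highest).

    s must be a member of stimuli; tied values share their mean rank.
    """
    ordered = sorted(stimuli)
    ranks = [i for i, v in enumerate(ordered) if v == s]
    return (sum(ranks) / len(ranks)) / (len(ordered) - 1)


def rf_judgment(s, stimuli, w):
    """Weighted compromise between the range value and the frequency value."""
    return w * range_value(s, stimuli) + (1.0 - w) * frequency_value(s, stimuli)


# Hypothetical k/L stimulus set, bunched toward the low end of the range.
stimuli = [0.10, 0.12, 0.14, 0.16, 0.30, 0.50]

for w in (1.0, 0.70):
    judgments = [rf_judgment(s, stimuli, w) for s in stimuli]
    print(f"W = {w:.2f}:", " ".join(f"{j:.3f}" for j in judgments))
```

With W = 1.0 the judgment depends only on the endpoints of the range, so the spacing of the intermediate stimuli has no influence; with W below 1.0 the bunched low-end stimuli are pulled apart by the rank term.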

In a given experiment, the underlying scale and psychological range may differ depending on which of the two letters defining the continuum is being rated. There was ample evidence of this in the analyses above.

There may be some differences between different letter pairs involving the same attribute (such as V-Y and C-F). Such differences are most likely due to differing archetypes. Archetype letters for particular subjects or groups of subjects should be taken into consideration in estimating any boundaries. Table 8.3 illustrated that the archetypes could be somewhat different but still keep approximately the same ambiguous-to-archetype ratio.

From the individual curves predicted, the interletter boundary can be obtained fairly accurately from the goodness crossover point. Merely choosing the point on a single goodness curve where G=5 is not sufficient. The boundary prediction will remain fairly stable even under conditions where the individual curve predictions have some variation. We saw in the last section that even though there was considerable variation in the error for the different sets, the variation in goodness crossover point was not very great.

The boundary is relatively stable under most conditions (spacing effects and probably frequency effects [211] as well), and is most prone to limited range effects. Under limited range we saw great variation in the interletter boundaries in the data from Experiments 1-6.

These are some of the specific conclusions made as a result of examining the results of this chapter. Our next chapter will discuss the results of this thesis more generally. There we will also review the thesis and consider future extensions of the work.

CHAPTER 11 SUMMARY AND FUTURE WORK

11.1 INTRODUCTION

This chapter will act as a summary of the thesis work and a pointer to possible future work. First we will briefly review the content of the thesis. Next we will discuss some of the immediate conclusions which may be drawn from the results presented here, and then some more general conclusions. Finally we will examine some future extensions of the present work.

11.2 A BRIEF REVIEW

We started out by examining the role and importance of character recognition technology in our society, looking at the problems involved and the state of the art in both machine and handprinted character recognition. We saw that contextual information is of great use to humans in the recognition of both machine and hand printed characters. The incorporation of contextual information, particularly a form known as graphical context, having to do with stylistic consistency within and between characters, might lead to better recognition than that currently achievable.

A functional attribute theory of character recognition was described. This theory incorporated graphical context as a modulator of rules which transform between the physical and psychological domains. The theory was based on ambiguous characters, which form the boundaries between letters. These boundaries, we saw, could be found using a variety of methodologies which were described. One of these, the goodness rating methodology, was used throughout the remainder of the thesis for testing graphical contextual effects. We reviewed some of the results obtained with this theory. Among the findings were that we could find the interletter boundary for a letter pair, e.g., V-Y, involving LEG, and that different methodologies yielded essentially the same interletter boundary. We also saw some degree of consistency in the results from different letter pairs involving a common attribute.

A third important result was the finding that the interletter boundaries were plastic in the sense that they would move around depending on the particular stimulus set in use. It is this behavior, a form of graphical contextual influence, that the remainder of the thesis was aimed at modeling.

The range frequency model was considered as a potentially useful candidate model for describing contextual effects on goodness curves. This model was first traced historically through the use of examples. Next we looked at the theory from a more mathematical viewpoint. We saw a method for predicting from the results of one experiment to the results of another experiment that had a different stimulus distribution. A formulation for predicting onto partial ranges was developed.

Eight category rating experiments, spanning a variety of stimulus ranges and stimulus distributions, were described. These experiments provided a test of a graphical contextual model based on Range Frequency Theory. The test consisted of predicting the results of one experiment from the results of another. The procedures for making such predictions via the model were described in detail.

Three variants of the model were tried and three sets of predictions made. The first, a naive traditional RF model, was found to be an inadequate predictor. However, using the next variant, where some of the parameters were optimized on V-Y files and then tested on C-F files, excellent predictions were obtained. It was discovered that the range principle is dominant, particularly at the low end of the LEG scale. In fact, very good results were obtained in a set of predictions made using the range principle alone. The results of each different set of predictions were compared using a sum squared error metric.

11.3 CONCLUSIONS

This section will discuss some of the conclusions to be drawn from the results presented in this thesis. These will be somewhat more general than the specific conclusions made in Section 10.6.

Our main goal was to formulate a model for taking graphical context into account in the character recognition process. We examined the Range Frequency Theory as a possible basis for such a model and found that it was indeed a reasonable approach. We can conclude, however, that a blind naive application of the RF model leads to poor results. For adequate results one needs to know something of the underlying scale as well as end anchors or archetypes.

Range appears to be the important variable as far as letter rating tasks are concerned. The effect of stimulus spacing appears to be a minor factor in comparison to the range effect. This makes the analysis relatively simple in terms of the PFR's. Essentially only the endpoints of the range determine the value of the threshold for the PFR. The stimulus spacing or relative frequencies need not be measured since apparently the PFR would stay fairly constant with changes in these variables.

Experiment 7 bolstered the idea of a constant ambiguous-to-archetype ratio. The position of the archetype was shown to be important in making some of our predictions. Some archetypes apparently are more flexible than others. For instance, we saw that the V archetype, which has a well defined physical basis, is well anchored, while the Y archetype is much less stable. The range effects may very well depend on how tied down the archetypes are. If they are really strongly tied down, then range may not have any effect. This is indicated by the fact that some of the high range experiments were equivalent to full range experiments as far as the psychological range was concerned, despite the physical range not being full.

The question arises as to how we would use the results presented here in a practical manner. Given that we have a certain set of characters entering a machine, we would like to know the PFR for that particular input set. This goal seems workable along the lines of Figure 4.4 earlier, where we would use the formulation developed in this thesis in the box labeled graphical context analysis.

In short, the present methods have promise for the incorporation of graphical context into character recognition machines. We have certainly not solved here all the problems necessary to accomplish this, but we have made a start.

More research is needed and the next section takes up some possible areas of endeavor.

11.4 FUTURE WORK

This thesis represents just the beginning of an understanding of one aspect of contextual influence. This section will outline some of the possible future extensions of the work described herein.

In the realm of the functional attribute based theory of character recognition, the current work has extended our knowledge in the area of plasticity of interletter boundaries. We have only dealt in this thesis with the attribute LEG and a limited number of letter pairs, V-Y and C-F. Certainly the same type of methods used here should be tried with other letter pairs involving the attribute LEG, as well as with other attributes.

There is also the important question of dealing with combinations of attributes. There are many interletter ambiguities which depend on several attributes in transition. We would like to know how these attributes interact with each other.

The work to date on the functional attribute theory has dealt only with uppercase letters. It would certainly be desirable to extend the work to numbers and perhaps to lower case printing as well. Numbers are often much more relevant than letters in many practical applications. Other important attributes may possibly be discovered in this manner.

Still another important facet is the general structure of a character recognizer based on a functional attribute theory of character recognition. If the theory is to have any relevance, it must be capable of being incorporated into a viable machine in some manner.

Turning to specific aspects of future extensions as regards range frequency theory, we find several possible areas of endeavor.

It would appear that we need a better way to estimate the range fraction in making predictions. This involves a better characterization of the nature of the range curve and the psychological scale.

Further work may also be needed in looking at frequency effects. Experiments could be designed directly to investigate this aspect of the theory in a better manner. More work needs to be done with full range predictions. There should be more study of the effects of relative frequency of occurrence of different stimuli as a follow up to some initial work in this area [211].

There is also the possibility of further investigation of the form of the W(x) function and its relation to the topic of anchor stimuli. There could be different W(x) functions in different ranges. We failed to investigate any effects of this sort in the present study.

There is also the question of the interaction of different letter pairs involving the same attribute. Most of the present experiments have been confined to investigating a single letter pair. It has been shown that moving the interletter boundary for one letter pair involving LEG also moves it for another letter pair involving the same attribute [115]. Experiments along the same lines could be carried out in the present framework. This would involve experiments with different ranges and distributions with a mixture of letter pairs. Some initial work was done in this area [211] but more is needed. This is critical for an assessment of graphical contextual influence in character recognition. We would like to know how the PFR's for a small number of distinct attributes change, rather than how the individual rules for a large number of possible interletter pairs change. There is the complication of different archetypes, as we saw in some of the experiments reported here.

Another possible area for investigation is that of time effects of contextual influence. If an interletter boundary is shifted, for how long does it stay shifted? Do boundaries shift away from some neutral context value and then decay back to that value eventually? In an earlier experiment by Kuklinski [115] there was some evidence for time order effects. Labeling identification functions made on characters farther removed in time from a goodness adapting situation were not shifted as much as those made closer in time to the adapting session. Thus far the experiments performed have been analyzed under the assumption of no time decay and have presumed a relatively narrow experimental time span. Any actual handprint recognition system will have a definite sequence of characters coming in. Each new character should be considered in the graphical temporal context of those before it. It would be desirable to look into how an RF model could take the time variable into account to develop a "dynamic graphical contextual theory." Some hint of this approach was presented earlier in Section 4.3.

The experiments performed thus far have dealt with relatively structured characters varying along some well defined parameters such as k/L in the V-Y case. It may be informative to work with more realistic characters in a graphical contextual situation closer to reality. For example, one could embed some test characters involving the attribute LEG in a set of relatively sloppy characters where lines were commonly extended past intersections. Different distributions of extra hangoffs could be used and rating or identification methods could be utilized. The graphical context model should be able to make predictions in cases such as these.

Further in the future it would be wise to look at how the type of graphical contextual model examined here could be tied in with other methods of applying contextual analysis. Perhaps a look at the speech literature would be helpful in this regard. What are the interactions with other levels and types of context?

The comparison with human performance is the ultimate test of the character recognition theory and associated models. If human response to a set of input characters turns out to be the same as that for the machine incorporating the theory, then the theory is validated. Due to limited time and resources, we have not been able to fully pursue all the interesting possibilities and open questions in this work. Perhaps some day we will see character recognition machines with much of the resourcefulness of the human being. This ability, we would hope, will be partially due to better graphical contextual analysis.

FINIS

BIBLIOGRAPHY

[1] Ades, A.E., "Vowels, Consonants, Speech, and Nonspeech," Psychological Review, Vol. 84, No. 6, Theoretical Note, pp. 524-530, 1977.

[2] Ali, F., and Pavlidis, T., "Syntactic Recognition of Handwritten Numerals," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-7, pp. 537-541, 1977.

[3] Allnatt, J., "Opinion Distribution Model for Subjective Rating Studies," Int. J. Man-Machine Studies, Vol. 5, pp. 1-15, 1973.

[4] Allnatt, J.W., and Corbett, J.M., "Adaptation in Observers During Television Quality Grading Tests. I: Adaptation as a Function of the Conditioning Situation," Ergonomics, Vol. 15, pp. 353-366, 1972.

[5] Allnatt, J.W., and Corbett, J.M., "Adaptation in Observers During Television Quality Grading Tests. II: Progress of Adaptation During the Experiment," Ergonomics, Vol. 15, pp. 491-504, 1972.

[6] American National Standards Institute, Character Set for Handprinting, 1974.

[7] Anderson, J.A., Silverstein, J.W., Ritz, S.A., and Jones, R.S., "Distinctive Features, Categorical Perception, and Probability Learning: Some Applications of a Neural Model," Psychological Review, Vol. 84, No. 5, pp. 413-451, 1977.

[8] Anderson, N.H., "Functional Measurement and Psychophysical Measurement," Psychological Review, Vol. 77, No. 3, pp. 153-170, 1970.

[9] Anderson, N.H., "Algebraic Models in Perception," Chapter 8 in Carterette, E.C., and Friedman, M.P. (Eds.), Handbook of Perception, Vol. II. Psychophysical Judgment and Measurement, New York: Academic Press, 1974.

[10] Anderson, N.H., "On the Role of Context Effects in Psychophysical Judgment," Psychological Review, Vol. 82, No. 6, pp. 462-482, 1975.

[11] Anderson, N.H., "Note on Functional Measurement and Data Analysis," Perception and Psychophysics, Vol. 21 (3), pp. 201-215, 1977.

[12] Appley, M.H. (Ed.), Adaptation Level Theory, A Symposium, New York: Academic Press, 1971.

[13] Apsey, R.S., "Human Factors of Constrained Handprint for OCR," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-8, No. 4, 1978.

[14] Attneave, F., "A Method of Graded Dichotomies for the Scaling of Judgments," Psychological Review, Vol. 56, pp. 334-340, 1949.

[15] Babcock, R.T., "Simulation Method of Feature Selection for Unconstrained Handprinted Characters," S.M. Thesis, Dept. of Electrical Engineering and Computer Science, M.I.T., 1977.

[16] Baty, Gordon, OCR Applications Sketchbook, Burlington, Mass.: Context Corporation, 1976.

[17] Berliner, J.E., and Durlach, N.I., "Intensity Perception. IV. Resolution in Roving-Level Discrimination," Journal of the Acoustical Society of America, Vol. 53, No. 5, pp. 1270-1287, 1973.

[18] Berliner, J.E., Durlach, N.I., and Braida, L.D., "Intensity Perception. VII. Further Data on Roving-Level Discrimination and the Resolution and Bias Edge Effects," Journal of the Acoustical Society of America, Vol. 61, No. 6, pp. 1577-1585, 1977.

[19] Berliner, J.E., Durlach, N.I., and Braida, L.D., "Intensity Perception. IX. Effect of a Fixed Standard on Resolution in Identification," Journal of the Acoustical Society of America, Vol. 64, letter to editor, pp. 687-689, 1978.

[20] Birnbaum, M.H., "The Devil Rides Again: Correlation as an Index of Fit," Psychological Bulletin, Vol. 79, No. 4, pp. 239-242, 1973.

[21] Birnbaum, M.H., "Using Contextual Effects to Derive Psychophysical Scales," Perception and Psychophysics, Vol. 15, pp. 89-96, 1974.

[22] Birnbaum, M.H., "Reply to the Devil's Advocates: Don't Confound Model Testing and Measurement," Psychological Bulletin, Vol. 81, No. 11, pp. 854-859, 1974.

[23] Birnbaum, M.H., and Veit, C.T., "Scale-free Tests of an Additive Model for the Size-Weight Illusion," Perception and Psychophysics, Vol. 16 (2), pp. 276-282, 1974.

[24] Birnbaum, M.H., Parducci, A., and Gifford, R.K., "Contextual Effects in Information Integration," Journal of Experimental Psychology, Vol. 88, No. 2, pp. 158-170, 1971.

[25] Blesser, B., "Judging Vowel Quality of the Swedish /i/," STL-QPSR 4, pp. 39-42, 1969.

[26] Blesser, B., Kuklinski, T., and Shillman, R., "Empirical Tests for Feature Selection Based on a Psychological Theory of Character Recognition," Chapter 3 in Holt, A.W. (Ed.), Some Concepts of Character Recognition, Oxford: Pergamon Press, 1976. (Also in Pattern Recognition, Vol. 8, No. 2, pp. 77-85, 1976.)

[27] Blesser, B., and Peper, E., "The Relevance of Psychology to Character Recognition," R.L.E. Quarterly Progress Report, No. 97, Massachusetts Institute of Technology, p. 150, 1970.

[28] Blesser, B., Shillman, R., Cox, C., Kuklinski, T., Eden, M., and Ventura, J., "Character Recognition Based on Phenomenological Attributes," Visible Language, Vol. 8, No. 3, pp. 209-223, 1973.

[29] Blesser, B., Shillman, R., Kuklinski, T., Cox, C., Eden, M., and Ventura, J., "A Theoretical Approach for Character Recognition Based on Phenomenological Attributes," International Journal of Man-Machine Studies, Vol. 6, pp. 701-714, 1974. (Also presented at the First International Joint Conference on Pattern Recognition, Washington, 1973.)

[30] Braida, L.D., and Durlach, N.I., "Intensity Perception. II. Resolution in One Interval Paradigms," Journal of the Acoustical Society of America, Vol. 51, No. 2 (Part 2), pp. 483-502, 1972.

[31] Braida, L., Chase, S., Colburn, H., Durlach, N., Houtsma, A., Lim, J., Rabinowitz, W., and Reed, C., "Intensity Perception and Loudness," M.I.T. Research Laboratory of Electronics Progress Report, No. 120, ppJT35-131, Jan. 1978.

[32] British Computer Society, Character Recognition 1971, London, 1971.

[33] Broadbent, D.E., and Ladefoged, P., "Vowel Judgments and Adaptation Level," Proc. Royal Soc. Britain, Vol. 151, pp. 384-399, 1960.

[34] Bruning, J.L., and Krantz, B.L., Computational Handbook of Statistics, Glenview: Scott, Foresman and Co., 1968.

[35] Burstyn, H. Paris, "Reading Writing," M.I.T. Reports on Research, Vol. 4, No. 3, November 1976.

[36] Bush, D.A., and Weaver, J.A., "OCR and Its Application to Documentation - A State of the Art Report," March 1976, NATO AGARD-OGRAPH-216, NTIS No. AD-A024 401.

[37] Carson, D.H., "Letter Constraints Within Words in Printed English," Kybernetik, Vol. 1, No. 1, pp. 46-54, Jan. 1961.

[38] Carterette, E.C., and Friedman, M.P. (Eds.), Handbook of Perception, Vol. II. Psychophysical Judgment and Measurement, New York: Academic Press, 1974.

[39] Caskey, D.L., and Coates, C.L., "Machine Recognition of Handprinted Characters," Information Systems Research Laboratory, University of Texas at Austin, Technical Report No. 126, May 1, 1972.

[40] Cermak, G.W., "Performance in a Delayed Comparison Discrimination Task as a Function of Stimulus Interpretation," Perception and Psychophysics, Vol. 21 (1), pp. 69-76, 1977.

[41] Cherry, C., On Human Communication, Cambridge, Mass.: MIT Press, 1966.

[42] Chodrow, M., "A Study of Handprinted Character Recognition Techniques," Information Dynamics Corp., Reading, Mass., 1966.

[43] Chomsky, N., Aspects of the Theory of Syntax, Cambridge, Mass.: MIT Press, 1965.

[44] Chow, Chun-Ling, "Effect of Line Width on Recognition of V's and Y's," B.S. Thesis, M.I.T., 1975, unpublished.

[45] Cohen, Nathan E., "The Relativity of Absolute Judgments," American Journal of Psychology, Vol. 49, pp. 93-100, 1937.

[46] Cooper, W.E., "Selective Adaptation to Speech," in Restle, F., Shiffrin, R., Castellan, N., Lindman, H., and Pisoni, D. (Eds.), Cognitive Theory, Potomac, Md.: Lawrence Erlbaum Associates, 1975.

[47] Cooper, W.E., Billings, D., and Cole, R.A., "Articulatory Effects on Speech Perception: A Second Report," Journal of Phonetics, Vol. 4, pp. 219-232, 1976.

[48] Cooper, W.E., Ebert, R.R., and Cole, R.A., "Perceptual Analysis of Stop Consonants and Glides," Journal of Experimental Psychology: Human Perception and Performance, Vol. 2, No. 1, pp. 92-104, 1976.

[49] Corso, John, "Adaptation-Level Theory and Psychophysical Scaling," in Appley, M.H. (Ed.), Adaptation-Level Theory, A Symposium, New York: Academic Press, 1971.

[50] Coueignoux, P., "A Parametric Representation of Roman Printed Fonts," Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, M.I.T., June 1975, unpublished.

[51] Cox, C., Blesser, B., and Eden, M., "The Application of Type Font Analysis to Automatic Character Recognition," Proc. of the Second International Joint Conference on Pattern Recognition, Copenhagen, Denmark, IEEE Cat. No. 74CH0885-4C, pp. 226-232, 1974.

[52] Cox, C., Blesser, B., and Eden, M., "The Graphical Context of Printed Characters," Visible Language, in press, 1979.

[53] Cox, C., and Coueignoux, P., "Concise Letter/Type Font Description: Theory and Computer Implementation," Technical Association of the Graphic Arts 1976 Conference Proceedings, Philadelphia, Pa., May, 1976.

[54] Cox, C., Coueignoux, P., Blesser, B., and Eden, M., "Skeletons: A Link Between Theoretical and Physical Letter Descriptions," submitted for publication, 1979.

[55] Creelman, C.D., "Discriminability and Scaling of Linear Extent," Journal of Experimental Psychology, Vol. 70, No. 2, pp. 192-200, 1965.

[56] Datapro Research Corporation, "All About Optical Readers," Feature Report 70F-320-01a, September 1970, Philadelphia, Pa.

[57] Datapro Research Corporation, "All About Optical Readers," Feature Report 70F-320-01, November 1972, Delran, N.J.

[58] Datapro Research Corporation, "All About Optical Readers," Peripherals Report 70D-010-78, October 1975, Delran, N.J.

[59] Datapro Research Corporation, "All About Optical Readers," Peripherals Report 70D-010-78, May 1977, Delran, N.J.

[60] Devoe, D.B., and Graham, D.N., "Evaluation of Hand Printed Character Recognition Techniques," Final Report, Sylvania Electronic Systems, RADC-TR-68-103, January 1968.

[61] Diehl, R.L., Elman, J.L., and McCusker, S.B., "Contrast Effects on Stop Consonant Identification," Journal of Experimental Psychology: Human Perception and Performance, Vol. 4, No. 4, pp. 599-609, 1978.

[62] Dodwell, P.C., Visual Pattern Recognition, New York: Holt, Rinehart and Winston, 1970.

[63] Doster, W., "Contextual Postprocessing System for Cooperation with a Multiple Choice Character Recognition System," IEEE Transactions on Systems, Man, and Cybernetics, Vol. C-26, No. 11, pp. 1090-1101, 1977.

[64] Duda, R.O., and Hart, P.E., "Experiments in the Recognition of Handprinted Text: Part II, Context Analysis," in Proceedings of the Fall Joint Computer Conference, AFIPS, pp. 1139-1149, Vol. 33, Washington, D.C.: Thompson Book Co., 1968.

[65] Dunn-Rankin, P., "The Visual Characteristics of Words," Scientific American, January 1978, pp. 122-130.

[66] Durlach, N.I., and Braida, L.D., "Intensity Perception. I. Preliminary Theory of Intensity Resolution," Journal of the Acoustical Society of America, Vol. 46, No. 2 (Part 2), pp. 372-383, 1969.

[67] Eden, M., "Handwriting Generation and Recognition," Chapter 5 in Kolers, P., and Eden, M. (Eds.), Recognizing Patterns, Cambridge, Mass.: MIT Press, 1968.

[68] Eden, M., "The Application of Character Recognition Techniques to the Development of Reading Machines for the Blind," in Image Processing in Biological Science, pp. 35-55, Univ. of California Press, 1969.

[69] Eden, M., "Visual Image Processing in Animals and Man," in Simon, J.C., and Rosenfeld, A. (Eds.), Digital Image Processing and Analysis, 1978.

[70] Eden, M., and Halle, M., "The Characterization of Cursive Writing," Proc. 4th London Symposium on Information Theory, London: Butterworth, 1961, pp. 152-155.

[71] Ehrich, R.W., and Riseman, E.M., "Contextual Error Detection," COINS Tech. Report 70C-4, University of Massachusetts, Amherst, 1971.

[72] "Electronic Mail Net Blueprint Is Unveiled By USPS," Data Communications, Dec. 1977, pp. 15-17.

-371- [73] Eriksen, C.W., and Hake, H.W., "Absolute Judgements as a Function of Stimulus Range and Number of Stimulus and Response Categories," Journal of Experimental Psychology, Vol. 49, pp. 323-332, 1955.

[74]. Fisher, E.G., "The Use of Context in Character Recogni- tion," COINS Technical Report 76-T2, U? Mass., Amherst , July 1976. [75] Flesch, R., Why Johnny Can't Read and What You Can Do About It, New York: Harper-Row, 1966.

[76] Fletcher, H., and Munson, W.A., "Loudness, Its Defini- tion, Measurement and Calculation," reprinted in Harris, J.D., Forty Germinal Papers in Human Hear- ing, Groton, Conn.: Journal of Auditory Research, 1969, pp. 175-184.

[77] Fodor, J.A., Bever, T.G., and Garrett, M.F., The Psychology of Language, New York: McGraw-Hill, 1974.

[78] Freedman, M.D., "Optical Character Recognition," IEEE Spectrum, March 1974, pp. 44-52.

[79] Garner, W.R..,."Context Effects and. the Validity of Loud-ness Scales," Journal of Experimental Psychol- ogy, Vol. 48, pp. 218-224, 1954. [80] Gibson, E.J., Principles of Perceptual Learning and

Development,. New York: Appleton-Century-Crofts, 1969. [81] Gibson, E.J., and Levin, H., The Psychology of Reading, Cambridge, Mass.: MIT Press, 1975.

[82] Goff, W.J., "How Not to Re-Invent the OCR Wheel," OCR Today, Vol. 1, No. 2, pp. 14,17, Feb. 1978.

[83] Gold, B., "Machine Recognition of Hand-Sent Morse Code,". IRE Trans. Information Theory, Vol. IT-5, pp. 17-24, March 1959.

[84] Grether, C.B., and Stroh, R.E., "Subjective Evaluation of Differential Pulse-Code Modulation Using the Speech 'Goodness' Rating Scale," IEEE Trans. Audio and Electroacoustics, Vol. AU-21, pp. 179-184, 1973.

[85] Griffith, A.K., "Handprint Recognition on the GRAFIX I: A Commercial Application of Heuristic Programming," Proceedings A.C.M. Annual Conference, San Diego, November 1974, pp. 368-372.

[86] Griffith, A.K., "Handprint Recognition on the GRAFIX I: Recent Progress," Proceedings of the 1976 Milwaukee Symposium on Automatic Computation and Control, pp. 149-150, 1976.

[87] Griffith, A.K., "The GRAFIX I System and Its Application to Optical Character Recognition," Proceedings of the Third International Joint Conference of Pattern Recognition, Coronado, California, pp. 650-652, 1977.

[88] Guilford, J.P., Psychometric Methods, New York: McGraw Hill, 1954.

[89] Hanson, A.R., Riseman, E.M., and Fisher, E., "Context in Word Recognition," Pattern Recognition, Vol. 8, pp. 35-43, 1976.

[90] Harmon, L.C., "Automatic Recognition of Print and Script," Proc. IEEE, Vol. 60, pp. 1165-1176, 1972.

[91] Harvey, O.J., and Campbell, D.T., "Judgments of Weight as Affected by Adaptation Range, Adaptation Duration, Magnitude of Unlabeled Anchor, and Judgmental Language," Journal of Experimental Psychology, Vol. 65, pp. 12-21, 1963.

[92] Haton, J., "A Practical Application of a Real-Time Isolated-Word Recognition System Using Syntactic Constraints," IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-26, pp. 416-419, 1974.

[93] Helson, H., Adaptation-Level Theory, an Experimental and Systematic Approach to Behavior, New York: Harper and Row, 1964.

[94] Herrick, E.M., "A Linguistic Description of Roman Alphabets," M.A. Thesis, Hartford Seminary Foundation, 1966, unpublished.

[95] Herrick, E.M., "A Taxonomy of Alphabets and Scripts," Visible Language, Vol. 8, pp. 5-32, 1974.

[96] Highleyman, W.H., "An Analogue Method of Character Recognition," IRE Transactions on Electronic Computers, Vol. 1, pp. 502-512, 1961.

[97] Hildebrand, F.B., Introduction to Numerical Analysis, New York: McGraw-Hill, 1972.

[98] Himmel, D.P., "Some Real-World Experiences with Handprinted Optical Character Recognition," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-8, No. 4, pp. 288-292, 1978.

[99] Holt, A.W., "What Was Promised - What We Have - and What is Being Promised in Character Recognition," AFIPS Proceedings, Vol. 33, Pt. 2, 1968 Fall Joint Computer Conference.

[100] Huffman, D., "Why Optical Character Recognition," OCR Today, Vol. 1, No. 2, pp. 15,16, Feb. 1978.

[101] Hunt, William A., "Anchoring Effects in Judgment," American Journal of Psychology, Vol. 54, pp. 395-403, 1941.

[102] Hunt, William A., and Flannery, Jane, "Variability in the Affective Judgment," American Journal of Psychology, Vol. 51, pp. 507-513, 1938.

[103] Iwata, K., Yoshida, M., Yamamoto, E., Masui, T., Kabuyama, Y., and Shimizu, S., "Recognition System for Handprinted Characters," Proceedings of IFIPS '77, p. 35, 1977.

[104] Jans, Christopher L., "An Investigation of U-V Discrimination," B.S. Thesis, M.I.T., 1975, unpublished.

[105] Jaspert, W.P., Berry, W.T., and Johnson, A.F., The Encyclopaedia of Type Faces, Bloomington: McKnight and McKnight, 1970.

[106] Johnston, V.S., and Chesney, G.L., "Electrophysiological Correlates of Meaning," Science, Vol. 186, pp. 944-946, 1974.

[107] Kavanaugh, J.F., and Mattingly, I.G., Language by Ear and Eye: The Relationships Between Speech and Reading, Cambridge, Mass.: MIT Press, 1972.

[108] Kegel, A.G., Giles, J.K., and Ruder, A.H., "Observations of Selected Application of Optical Character Readers for Constrained Numeric Handprint," IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-8, No. 4, pp. 282-285, 1978.

[109] Klatt, D.H., "Review of the ARPA Speech Understanding Project," Journal of the Acoustical Society of America, Vol. 62, No. 6, pp. 1345-1366, 1977.

[110] Klatt, D.H., "Speech Perception: A Model of Acoustic-Phonetic Analysis and Lexical Access," Journal of Phonetics, in press, 1979.

[111] Klatt, D.H., and Cooper, W.E., "Perception of Segment Duration in Sentence Contexts," presented at the 89th Meeting of the Acoustical Society of America, Austin, Tx., April 7-11, 1975.

[112] Kolers, P.A., and Eden, M., (Eds.), Recognizing Patterns, Studies in Living and Automatic Systems, Cambridge, Mass., MIT Press, 1968.

[113] Kotovsky, K., "The Effects of Word Rarity, Word Structure, and Information Value Per Word on the Ability of English Speaking Subjects to Identify Tachistoscopically Presented, Four Letter Words," B.S. Thesis, M.I.T., May 1961, unpublished.

[114] Krantz, D.L., and Campbell, D.T., "Separating Perceptual and Linguistic Effects of Context Shifts Upon Absolute Judgments," Journal of Experimental Psychology, Vol. 62, pp. 35-42, 1961.

[115] Kuklinski, T.T., "Plasticity Effects in the Perception of Handprinted Characters," S.M. and E.E. Thesis, M.I.T., 1975, unpublished.

[116] Kuklinski, T., and Kuklinski, R., "A Comparison of Interletter Boundaries Across Letter Pair," MIT, R.L.E., C.I.P.G. Character Recognition Subgroup, internal memorandum, 1978.

[117] Kuklinski, T., and Waldron, T., "Frequency Effects in Goodness Experiments," MIT, R.L.E., C.I.P.G. Character Recognition Subgroup, internal memorandum, 1977.

[118] Kurzweil Computer Products, Inc., The Kurzweil Report, Vol. 1, No. 1, Spring, 1978.

[119] Kurzweil, R., "The Kurzweil Reading Machine, An Overview," Science Writers Seminar in Ophthalmology, Research to Prevent Blindness, Inc., Reston, Va., May 8-11, 1976.

[120] Labov, William, "The Boundaries of Words and their Meanings," Paper presented at the Conference on New Ways of Analyzing Variation in English, Washington, D.C., 1972.

[121] Ladefoged, P., and Broadbent, D.E., "Information Conveyed by Vowels," Journal of the Acoustical Society of America, Vol. 72, pp. 275-309, 1965.

[122] Lane, H., "The Motor Theory of Speech Perception: A Critical Review," Psychological Review, Vol. 72, pp. 275-309, 1965.

[123] Lee, Thomas, "The Phenomenon of Line Addition in Character Recognition," B.S. Thesis, M.I.T., 1975, unpublished.

[124] Liles, B.L., An Introductory Transformational Grammar, Englewood Cliffs: Prentice-Hall, 1971.

[125] Lim, J.S., Rabinowitz, W.M., Braida, L.D., and Durlach, N.I., "Intensity Perception. VIII. Loudness Comparisons Between Different Types of Stimuli," Journal of the Acoustical Society of America, Vol. 62, No. 5, pp. 1256-1267, 1977.

[126] Lippman, R.P., Braida, L.D., and Durlach, N.I., "Intensity Perception. V. Effect of Payoff Matrix on Absolute Identification," Journal of the Acoustical Society of America, Vol. 59, No. 1, pp. 129-134, 1976.

[127] Lisker, L., and Abramson, A.S., "A Cross-Language Study of Voicing in Initial Stops: Acoustical Measurements," Word, Vol. 20, pp. 384-422, 1964.

[128] Management Information Corporation, "OCR Update 78," Data Entry Awareness Report, Vol. 6, No. 9, September 1978.

[129] Manis, Melvin, "Context Effects in Communication, Determinants of Verbal Output and Referential Decoding," in Appley, M.H. (Ed.), Adaptation-Level Theory, a Symposium, New York: Academic Press, 1971.

[130] Mason, S.J., and Clemens, J.K., "Character Recognition in an Experimental Reading Machine for the Blind," Chapter 6 in Recognizing Patterns, Studies in Living and Automatic Systems, Cambridge, Mass., MIT Press, 1968.

[131] Massaro, D., "A Stage Model of Reading and Listening," Visible Language, Vol. 12, No. 1, pp. 3-26, 1978.

[132] Mathlab Group, M.I.T., MACSYMA Reference Manual, Version 7, Cambridge, Mass., 1974.

[133] Mermelstein, P., and Eden, M., "Experiments on Computer Recognition of Handwritten Words," Information and Control, Vol. 7, pp. 255-270, 1964.

[134] Miller, G.A., "The Magical Number Seven, Plus or Minus Two, or Some Limits on Our Capacity for Processing Information," Psychological Review, Vol. 63, pp. 81-96, 1956.

[135] Munson, J.H., "Experiments in the Recognition of Hand-Printed Text," Fall Joint Computer Conference, AFIPS Proceedings, Vol. 33, Part 2, pp. 1125-1138, 1968.

[136] Munson, J.H., "The Recognition of Hand-Printed Text," from Kanal, L., (Ed.), Pattern Recognition, New York: Thompson, 1968, pp. 115-140.

[137] Nagy, George, "State of the Art in Pattern Recognition," Proc. IEEE, Vol. 56, pp. 836-862, 1968.

[138] Naus, M.J., and Shillman, R.J., "Why a Y Is Not a V: A New Look at the Distinctive Features of Letters," Journal of Experimental Psychology: Human Perception and Performance, Vol. 2, pp. 394-400, 1976.

[139] Neisser, U., Cognitive Psychology, New York: Appleton-Century-Crofts, 1967.

[140] Neisser, U., and Weene, P., "A Note on Human Recognition of Hand-Printed Characters," Information and Control, Vol. 3, pp. 191-196, 1960.

[141] Niemann, H., "Classification of Characters by Man and Machine," Pattern Recognition, Vol. 9, No. 4, pp. 173-179, 1977.

[142] Nunnally, J.C., Psychometric Theory, New York: McGraw-Hill, 1967.

[143] OCR Today, official publication of the OCR Users Association, published quarterly by C/J Publications, Hackensack, N.J.

[144] "Optical Readers," Modern Data, October 1974, pp. 22-27.

[145] Optical Scanning News, Raines, G., (Ed.), Phila., Pa., North American Publishing.

[146] Ossanna, J.F., NROFF/TROFF User's Manual, Bell Laboratories Technical Memorandum 76-1271-13, October 11, 1976.

[147] Parducci, A., "A Range-Frequency Compromise in Judgment," Psychological Monographs, Vol. 77 (2, Whole No. 565), 1963.

[148] Parducci, A., "Category Judgment: A Range-Frequency Model," Psychological Review, Vol. 72, pp. 407-418, 1965.

[149] Parducci, A., "The Relativism of Absolute Judgments," Scientific American, Vol. 219, pp. 84-90, 1968.

[150] Parducci, A., "Contextual Effects: A Range-Frequency Analysis," in Carterette, E.C., and Friedman, M.P. (Eds.), Handbook of Perception, Vol. II, New York: Academic Press, 1975.

[151] Parducci, A., Calfee, R.C., Marshall, L.M., and Davidson, L.P., "Context Effects in Judgment: Adaptation Level as a Function of the Mean, Midpoint, and Median of Stimuli," Journal of Experimental Psychology, Vol. 60, pp. 65-77, 1960.

[152] Parducci, A., and Haugen, R., "The Frequency Principle for Comparative Judgments," Perception and Psychophysics, Vol. 2, pp. 81-82, 1967.

[153] Parducci, A., Knobel, S., and Thomas, C., "Independent Contexts for Category Ratings: A Range-Frequency Analysis," Perception and Psychophysics, Vol. 20(5), pp. 360-366, 1976.

[154] Parducci, A., and Marshall, L.M., "Context Effects in Judgments of Length," American Journal of Psychology, Vol. 70, pp. 576-583, 1961.

[155] Parducci, A., and Marshall, L.M., "Supplementary Report: The Effects of the Mean, Midpoint, and Median Upon Adaptation Levels in Judgment," Journal of Experimental Psychology, Vol. 61, pp. 1226-1262, 1961.

[156] Parducci, A., and Perrett, L.F., "Contextual Effects for Category Judgments by Practiced Subjects," Psychonomic Science, Vol. 9, pp. 357-358, 1967.

[157] Parducci, A., and Perrett, L.F., "Category Rating Scales: Effects of Spacing and Frequency of Stimulus Values," Journal of Experimental Psychology Monograph, Vol. 89, pp. 427-452, 1971.

[158] Parducci, A., and Sandusky, A., "Distribution and Sequence Effects in Judgment," Journal of Experimental Psychology, Vol. 69, No. 5, pp. 450-459, 1965.

[159] Pastore, R.E., "Phonemes and Alphanumeric Characters: Possible Components of Parallel Human Communication Systems," Visible Language, Vol. 12, No. 1, pp. 27-42, 1978.

[160] "Pattern Recognition Data Bases," Computer, Vol. 10, No. 4, 1977.

[161] Pavlidis, T., and Ali, F., "Computer Recognition of Handwritten Numerals by Polygonal Approximations," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-5, No. 6, pp. 610-614, 1975.

[162] Pearson, D.E., "Methods for Scaling Television Picture Quality: A Survey," in Huang, T., and Tretiak, O. (Eds.), Picture Bandwidth Compression, pp. 47-95, New York: Gordon and Breach, 1972.

[163] Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, June 6-8, 1977, R.P.I., Troy, New York.

[164] Proceedings of the Second International Joint Conference on Pattern Recognition, Copenhagen, Denmark, IEEE Catalog No. 74CH0885-4C, pp. 195-202, 1974.

[165] Pynn, C.T., Braida, L.D., and Durlach, N.I., "Intensity Perception. III. Resolution in Small-Range Identification," Journal of the Acoustical Society of America, Vol. 51, No. 2 (Part 2), pp. 559-566, 1972.

[166] Rabinow, J., "The Present State of the Art in Reading Machines," in Kanal, L.N. (Ed.), Pattern Recognition, City: Thomson, 1968.

[167] Rabinowitz, W.M., Lim, J.S., Braida, L.D., and Durlach, N.I., "Intensity Perception. VI. Summary of Recent Data on Deviations from Weber's Law for 1000-Hz Tone Pulses," Journal of the Acoustical Society of America, Vol. 59, No. 6, pp. 1506-1509, 1976.

[168] Rankin, B.K., "A Grammar for Component Combination in Chinese Characters," National Bureau of Standards Technical Note No. 296, Washington, D.C., 1966.

[169] Reed, Stephen K., Psychological Processes in Pattern Recognition, New York: Academic Press, 1973.

[170] Rengger, R.E., and Parks, J.R., A Survey of Handprinting, Auto T.M. (68), 9. National Physical Laboratory, Teddington, England, March 1968.

[171] Richardson, S., Chan, H.L., Lee, A., and Teo, S.T., "The Muller-Lyer Illusion: A Cross Cultural Study in ," Ergonomics, Vol. 15, pp. 293-298, 1972.

[172] Riseman, E.M., and Ehrich, R.W., "Contextual Recognition Using Binary Digrams," IEEE Transactions on Computers, Vol. C-20, pp. 397-403, 1971.

[173] Riseman, E.M., and Hanson, A.R., "A Contextual Postprocessing System for Error Detection and Correction in Character Recognition," Tech. Report 72C-I, Dept. of Computer and Information Science, Univ. of Massachusetts, Amherst, 1972.

[174] Rorer, L.G., "What, Can the Devill Speake True?," Psychological Bulletin, Vol. 81, No. 6, pp. 355-357, 1974.

[175] Rosenfeld, A., "Survey: Picture Processing 1976," Computer Graphics and Image Processing, Vol. 6, No. 2, pp. 157-183, 1977.

[176] Sarris, V., and Parducci, A., "Multiple Anchoring of Category Rating Scales," Perception and Psychophysics, Vol. 24 (1), pp. 35-39, 1978.

[177] Scan-Data Corporation, "2250/1 OCR System," Literature No. MHDM-2250/1OS-11/73, Norristown, Pa., 1973.

[178] Scan-Data Corporation, "Alphabetic Handprint," Literature No. MADM-AHP-3/75, Norristown, Pa., 1975.

[179] Schantz, H.F., "Optical Character Recognition, The Impact of a Maturing Technology on Future User Applications," OCR Today, pp. 6,7,25, November 1977.

[180] Schantz, H.F., "Economic Considerations in the Selection of OCR Systems," OCR Today, Vol. 2, No. 1, pp. 19-26, May 1978.

[181] Sefkow, T., "Symmetry as an Aspect of Character Recognition," S.B. Thesis, M.I.T., 1973, unpublished.

[182] Shillman, R.J., "Character Recognition Based on Phenomenological Attributes: Theory and Methods," Ph.D. Thesis, M.I.T., 1974, unpublished.

[183] Shillman, R.J., "The Application of Optical Character Recognition to Machine Translation," paper presented to Seminar on Machine Translation, Foreign Broadcast Information Service, Rosslyn, Va., March 1976.

[184] Shillman, R.J., "Character Recognition and the Data Entry Problem," presented at Foreign Broadcast Information Service Seminar on Computer Support to Translation, Rosslyn, Virginia, May 11-12, 1978.

[185] Shillman, R.J., and Babcock, R.T., "Preliminary Steps in the Design of Optical Character Recognition Algorithms," Proceedings of the IEEE Computer Society Conference on Pattern Recognition and Image Processing, R.P.I., Troy, N.Y., June 1977.

[186] Shillman, R., and Blesser, B., "The Use of Ambiguous Characters in Measuring Functional Invariants," R.L.E. Quarterly Progress Report, M.I.T., No. 109, p. 155, 1973.

[187] Shillman, R., Cox, C., Kuklinski, T., Ventura, J., Blesser, B., and Eden, M., "A Bibliography in Character Recognition: Techniques for Describing Characters," Visible Language, Vol. 8, pp. 151-166, 1974.

[188] Shillman, R.J., Kuklinski, T.T., and Blesser, B.A., "Experimental Methodologies for Character Recognition Based on Phenomenological Attributes," Proceedings of the Second International Joint Conference on Pattern Recognition, Copenhagen, Denmark, IEEE Catalog No. 74CH0885-4C, pp. 195-202, 1974.

[189] Shillman, R., Kuklinski, T., and Blesser, B., "Psychophysical Techniques for Investigating the Distinctive Features of Letters," International Journal of Man-Machine Studies, Vol. 8, pp. 195-205, 1976.

[190] Shillman, R.J., and Naus, G.J., "The Distinctive Features of the Letters O and D," Progress Report, Research Laboratory of Electronics, M.I.T., Cambridge, Mass., No. 118, pp. 233-238, 1976.

[191] Siegel, S., Nonparametric Statistics, New York: McGraw-Hill, 1956.

[192] Spanjersberg, A.A., "Experiments with Automatic Input of Handwritten Numerical Data into a Large Administrative System," IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-8, No. 4, pp. 286-288, 1978.

[193] Spiegel, M.R., Statistics, New York: McGraw-Hill, 1961.

[194] Stevens, M.E., "Automatic Character Recognition - A State of the Art Report," National Bureau of Standards Technical Note 112, May 1961.

[195] Stevens, S.S., "Adaptation-Level vs. the Relativity of Judgment," American Journal of Psychology, Vol. 71, pp. 633-646, 1958.

[196] Stevens, S.S., "Perceptual Magnitude and Its Measurement," Chapter 11 in Carterette, E.C., and Friedman, M.P., (Eds.), Handbook of Perception, Volume II, New York: Academic Press, 1974.

[197] Suen, C., "Human Factors in Character Recognition," Proceedings of the 1974 Intl. Conference on Systems, Man, and Cybernetics, Dallas, Tx., October 1974, IEEE CHO 908-4SMC, pp. 253-258.

[198] Suen, C.Y., "Handwriting Education - A Bibliography of Contemporary Publications," Visible Language, Vol. 9, No. 2, pp. 145-158, 1975.

[199] Suen, Ching Y., "Advances in Optical Character Recognition," Proceedings of the Canadian Computer Conference, , , May 1978.

[200] Suen, C.Y., Berthod, M., and Mori, S., "Advances in Recognition of Handprinted Characters," Proceedings of the Fourth International Joint Conference on Pattern Recognition, Kyoto, Japan, November, 1978.

[201] Suen, C.Y., and Shillman, R.J., "Low Error Rate Optical Character Recognition of Unconstrained Handprinted Letters Based on a Model of Human Perception," IEEE Trans. Systems, Man and Cybernetics, Vol. SMC-7, No. 6, pp. 491-495, 1977.

[202] Tersoff, A.I., (Ed.) Special Section on Man-Machine Considerations in Automatic Handprint Recognition, IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-8, No. 4, pp. 279-296, 1978.

[203] Tersoff, A.I., "Man-Machine Considerations in Automatic Handprint Recognition," IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-8, No. 4, p. 279, 1978.

[204] Thompson, R.C., "Alphabetic Handprint Reading," OCR Today, Vol. 1, No. 2, pp. 9,10,12,23, Feb. 1978.

[205] Toussaint, G.T., "The Use of Context in Pattern Recognition," Pattern Recognition, Vol. 10, No. 3, pp. 189-204, 1978 (an earlier version of this paper appeared in: Proc. IEEE Computer Society Conf. Pattern Recognition and Image Processing, Troy, N.Y., June 1977, pp. 1-10.)

[206] Torgerson, W.S., Theory and Methods of Scaling, New York: Wiley, 1958.

[207] Troxel, D.E., "Feature Selection for Low Error Rate OCR," Pattern Recognition, Vol. 8, No. 2, pp. 73-76, 1976.

[208] Ullmann, J.R., Pattern Recognition Techniques, New York: Crane, Russak and Co., 1973.

[209] Ullmann, J.R., "Picture Analysis in Character Recognition," Chapter 6 in Rosenfeld, A. (Ed.), Digital Picture Analysis, Topics in Applied Physics, Vol. 11, pp. 295-343, Berlin: Springer-Verlag, 1976.

[210] Ventura, J., "Letter Recognition and the Analysis of Confusion Matrices," S.M. Thesis, Dept. of Psychology, Brandeis University, 1971.

[211] Waldron, T.P., "Effects of Biased Context on Inter-Character Boundaries," S.B. Thesis, Dept. of Electrical Engineering and Computer Science, M.I.T., 1977.

[212] Wang, S-Y., "Language Change," paper presented at New York Academy of Sciences Conference in Origins and Evolution of Language and Speech, New York, Sept., 1975.

[213] Withington, F.G., "Transformation of the Information Industries," Datamation, November 15, 1978, pp. 8-14.

[214] Wright, G.G.N., The Writing of Arabic Numerals, Scottish Council for Research in Education Series, No. 33, London: University of London Press, 1952.

[215] Wrolstad, M.E. (Ed.), Visible Language, Vol. 10, No. 3, cover, Summer 1976.

[216] Yacyk, J., "Alphabetic Handprint Reading," IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-8, No. 4, pp. 279-282, 1978.

[217] Yamasaki, I., "Quantitative Evaluation of Print Quality for Optical Character Recognition Systems," IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-8, No. 5, pp. 371-381, 1978.

[218] Yasuhara, M., "On Distinctive Attributes in Character Recognition - Some Evidence From Reaction Time Measurements," Research Laboratory of Communication Sciences, University of Electro-Communications, Tokyo, Japan. Working Paper No. 102.

[219] Yasuhara, M., and Kuklinski, T., "Category Boundary Effect for Grapheme Perception," Perception and Psychophysics, Vol. 23(2), pp. 97-104, 1978.

BIOGRAPHICAL NOTE

Theodore Kuklinski was born August 15, 1949, and raised in Philadelphia, Pennsylvania. He received a Jesuit education at St. Joseph's Preparatory School, graduating in 1967. He attended Drexel University in Philadelphia in Electrical Engineering, graduating in 1972 with highest honors. Under the cooperative education program, he worked in the Electrical Design Division at the Philadelphia Naval Shipyard, as well as in the Signal Processing Group of the Advanced Technology Laboratories at RCA Corporation in Camden, New Jersey. At RCA he became very interested in problems of pattern recognition.

His interest in the topic of pattern recognition led him to graduate school at the Massachusetts Institute of Technology working in the Cognitive Information Processing Group of the Research Laboratory of Electronics. There he became involved in an NSF sponsored project entitled "Pattern Recognition of Conventional Symbol Systems" working under the direction of Professors Murray Eden and Barry Blesser. This project involved the formulation of a new theory of character recognition based on functional attributes. He received the S.M. and E.E. degrees from M.I.T. in 1975, after completing a thesis entitled, "Plasticity Effects in the Perception of Handprinted Characters." His interest in character recognition was extended into the present thesis.

On October 8, 1977, Mr. Kuklinski was joined in matrimony to Hsueh-Rong Chang and they lived happily ever after! Mr. Kuklinski is also director of the MIT Kite Experimentation Laboratory. Other interests include the Chinese language, both written and spoken. He is an active volleyball player and official, as well as a bicycle enthusiast and folk guitarist. He is a member of the I.E.E.E., Eta Kappa Nu, Sigma Xi, Tau Beta Pi, and the American Kitefliers Association.
